Peer-to-Peer Communications with WebRTC (developer.mozilla.org)
210 points by memexy on June 6, 2020 | 74 comments



I have been working on a large WebRTC project, and it’s a great technology, but it’s so fragmented right now that it’s hard.

For iOS

1. Safari has a high-pass filter or something on the incoming audio. Send in some music on opus/48000 with the proper maxbitrate and listen. Now download Chrome for iOS and listen. Chrome has MUCH more low end. Running the audio before and after through a spectrum analyzer, Chrome is much closer to the original sound.

2. If you make the video full screen and then exit full screen, it pauses.

3. I couldn’t get incoming RTP that is opus/48000/2 to play in stereo. But, as with number 1, playing it in Chrome does play in stereo (see the sketch after this list).
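For reference, the usual workaround people suggest is munging the Opus fmtp line in the SDP before applying the remote description. A sketch, with parameters from RFC 7587 (Safari may still ignore them):

    // Sketch: ask for stereo Opus by munging the remote SDP before applying
    // it. stereo/sprop-stereo/maxaveragebitrate come from RFC 7587; whether
    // a given browser honors them is not guaranteed.
    function forceStereoOpus(sdp: string): string {
      const match = sdp.match(/a=rtpmap:(\d+) opus\/48000\/2/);
      if (!match) return sdp; // no Opus in this SDP
      const pt = match[1];
      return sdp.replace(
        new RegExp(`a=fmtp:${pt} (.*)`),
        `a=fmtp:${pt} $1;stereo=1;sprop-stereo=1;maxaveragebitrate=510000`
      );
    }

    // Usage, before applying the remote answer:
    // await pc.setRemoteDescription({ type: 'answer', sdp: forceStereoOpus(answer.sdp) });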

Android isn’t much better.

1. The fragmentation is such a HUGE pain. The manufacturer browsers on the phones seem to be very hit or miss. How many browsers come by default on Android phones nowadays?

2. The latest Samsung phones’ cameras can’t do 640x360 resolution for whatever reason. Previous generations could, and other Android phones can as well.

Even desktop isn’t all there.

1. If you want to do WebRTC screen sharing, there is no way to automatically select a screen or tab, not even CLI flags. So you can’t automate testing it. And you can’t even click the dialog in the Chrome remote debugger.

2. Firefox has nothing like Chrome’s chrome://webrtc-internals

A few other things

1. Simulcasting is ghetto

2. I wish server CPUs would support h264/vp8/vp9 hardware encoding and decoding on the CPUs

3. Want to collect metrics with getStats? Well, there is no easy way to record all of them, and the one SaaS is super super expensive (see the sketch after this list).
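Collecting the stats isn’t actually the hard part; the pain is storing and graphing them. A minimal sketch, assuming an established RTCPeerConnection pc:

    // Dump every entry from getStats() so it can be shipped to your own
    // metrics backend. Each entry is a flat RTCStats dictionary with
    // id/type/timestamp plus type-specific fields (packetsLost, jitter, ...).
    async function collectStats(pc: RTCPeerConnection): Promise<object[]> {
      const report = await pc.getStats();
      const all: object[] = [];
      report.forEach((stat) => all.push({ ...stat }));
      return all;
    }

    // e.g. setInterval(async () => upload(await collectStats(pc)), 5000);
    // (upload() is a hypothetical function that posts to your backend.)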


This is one of the reasons we’ve had to go the dreaded Electron route and really push the desktop app: it’s just not feasible to deliver the same UX across browsers/OSes. Really frustrating. I’ll add one to your list:

4. I wish RTCRtpTransceiver would allow reusing encoded MediaStreams instead of re-encoding. For video, dropped frames are acceptable without needing to renegotiate bandwidth.


Video codecs don't work like that. Most frames only encode the difference from the previous frame, meaning that if you drop one frame, all frames after it will be impossible to decode correctly.

A technique that's used to get around this is to use a special frame reference pattern so that some frames depend on a frame two or four frames earlier, allowing the intermediate frames to be dropped. This comes at a significant cost to encoding efficiency, though. To use it with a pre-encoded video, you would have to specifically encode it for this purpose, making the usefulness of such a feature in a client questionable.

Nothing stops you from building a server that does this though, for instance if you want to build a scalable streaming service based on WebRTC.


What you describe are called key frames. Nearly all video uses them: recorded video for seeking, and streaming for, well, streaming.

You missed the entire point of the comment.


Yup, I’m in the same boat. The original appeal was that it’s all in-browser and cross-platform, but in the end it’s not really.

And that’s a good one


We've been watching Apple undermine WebRTC for years now. There's been nothing stopping them from taking the existing library and running with it; nothing stopped them for the years they had no support, and nothing is stopping them now that they have nominal support.

Manufacturer browsers used to be a big deal, but for the last couple years it's been mostly Chrome or something nearly identical on most devices. Not having that specific resolution available is one thing, but it's nothing like having no support, or inexplicably degrading audio quality.


They accept pull requests... Have you tried asking on the mailing list if they'd be open to accepting a patch?


> If you want to webrtc screen share there is no way to automatically select a screen or tab, not even CLI flags. So you can’t automate testing it. And you can’t even click the dialog in the chrome remote debugger.

I've recorded a tab automatically before with Chrome [0]. Basically I have --auto-select-desktop-capture-source set to pick a tab and then named my tab something it could find; you can probably get close with the entire screen too.
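Roughly, with Puppeteer (a sketch; the tab title 'MyCaptureTab' is just an example, and --enable-usermedia-screen-capturing is only needed on some Chrome versions):

    import puppeteer from 'puppeteer';

    // Launch Chrome so getDisplayMedia auto-picks a capture source whose
    // title matches, skipping the picker dialog entirely.
    const browser = await puppeteer.launch({
      headless: false,
      args: [
        '--auto-select-desktop-capture-source=MyCaptureTab',
        '--enable-usermedia-screen-capturing', // needed on some versions
      ],
    });
    const page = await browser.newPage();
    await page.evaluate(() => { document.title = 'MyCaptureTab'; });
    // navigator.mediaDevices.getDisplayMedia({ video: true }) in the page
    // should now resolve without showing any dialog.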

> Simulcasting is ghetto

Agreed, but in my recent server-side project I realized that if I ignore the SSRCs and use the RIDs from the RTP packets, my SFU does what I want. While working with Chrome, upping the log visibility really helped me understand why my highest simulcast stream wasn't being sent, based on the estimated bitrate reported by the server (since then I adhere to REMB needs).

0 - https://github.com/cretz/chrome-screen-rec-poc/blob/master/a...


Thanks for both of those! Super helpful


> I wish server CPUs would support h264/vp8/vp9 hardware encoding and decoding on the CPUs

You should look into SVC. Nowadays there aren’t too many reasons why you’d need to decode/re-encode if all you want is to change resolution or quality, or drop frames gracefully.
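If anyone wants to see what that looks like in API terms, here's a sketch using the WebRTC-SVC extension's scalabilityMode (experimental and not supported everywhere; 'L3T3' means 3 spatial and 3 temporal layers):

    // Ask the encoder for layered (SVC) video, so a server can drop
    // resolution/framerate layers without decoding or re-encoding.
    const pc = new RTCPeerConnection();
    const stream = await navigator.mediaDevices.getUserMedia({ video: true });
    pc.addTransceiver(stream.getVideoTracks()[0], {
      direction: 'sendonly',
      sendEncodings: [{ scalabilityMode: 'L3T3' }], // experimental API
    });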


Didn’t know about them; that actually looks really, really nice. Not for the use case I was talking about, but SVC seems great.


You do know about about:webrtc in Firefox, right?


Ha, didn’t know about that. All I knew about was the plugin. Even when googling, it’s not that well known for Firefox; even the Firefox website talks about the plugin.


It is not in Google or Apple's best interest to work together on things like this, because then it'd accelerate the already fast-coming age of Progressive Web Apps (PWAs), which removes their, what, 30% cut from "apps"?

Which are all....websites smashed into "native" code + privacy invading trackers, anyway?


Quick question: how do you scale WebRTC server-side?

Do you use Kubernetes to scale jitsi/mediasoup/whatever servers up and down?

What kind of load balancers do you use, etc.? I come from the Kubernetes side and have never read anything about scaling these things... so I am super curious.


I just do unique URLs per server like serverid.server.blah.com or something. Then just take care of selecting which one in the app.


Interesting. And how do you scale/orchestrate them? Home-grown or something like Kubernetes?


> 2. I wish server CPUs would support h264/vp8/vp9 hardware encoding and decoding on the CPUs

Why is this? What advantages are there to doing it on a CPU rather than in a GPU?


Most servers don’t have GPUs and don’t require full GPU capabilities.


Which SaaS is that?


https://www.callstats.io is the only one I am aware of.

The CEO is one of the authors of the getStats RFC as well https://www.w3.org/TR/webrtc-stats/


Yup that’s the one. It’s just soooo expensive for starting out


We do have a free plan, it’s not publicly listed at the moment. If you do under 20 000 minutes or so a month, it should not prompt for a credit card.


I was pretty unimpressed by Zoom as I used to think along the lines of "bro just open up a WebRTC connection. You can use socket.io it's super simple"

Then... I attended a "Zoom Nightclub" and saw over 200 attendees streaming video to one another. All the encryption, routed-through-China, etc. issues aside, I was very, very impressed.


I don't think Zoom is P2P. It just routes through a media server, which merges all the streams and distributes them.


That's their point. P2P doesn't scale to 100s of people. For even more than 10 you need a server. One service I've used forever is whereby.com (used to be appear.in), but even they acquiesced, ditching pure P2P WebRTC in their professional option for more than 4 people.


Naive P2P doesn't scale like that: in a full mesh, every client has to send a separate copy of its stream to each of the other n-1 peers. You'd have to do something much more advanced, like a tree-based multicast algorithm. That's hard to get right. Seems like the engineering there is more expensive than just paying for a ton of bandwidth and doing a simple centralized aggregator.

Of course there could also be a surveillance motive.


We have experimented with a P2P multicast setup and WebRTC makes this very difficult to do. Ultimately what we’re rolling out soon, and what others like Zoom have done, is to send media over WebRTC data channels. The media channels are amazing if you’re building a demo project, but for anything serious the spec does not allow enough low-level control.

Also latency in P2P multicast starts to become a real problem.


How do you decode video from a data channel?

I haven't been able to get encoded video bytes from a buffer onto the screen while using hardware video decoding and without a multi-second lag, in either Safari or Chrome. Any tips?


wasm ffmpeg -> canvas

Granted, not tested on Safari


Can you do 1080p60 video on mobile with wasm ffmpeg? I'd imagine it has severe performance issues, since WASM is pretty bad at the vector and bit-twiddling operations heavily used in video codecs, and writing to a canvas on every frame requires the JavaScript (main) thread rather than the compositor thread (where videos normally decode and run), which means it'll end up janky if any other JavaScript runs at all.

Even basic canvas animations on the main thread like the dinosaur game (chrome offline page) are pretty janky, and they aren't doing much per frame at all.


Also a privacy issue. If your app reveals the IP of other users, it will lead to "interesting" effects like someone DoSing the home internet connection of a presenter whose stream they want to disrupt.


That's such a flaw in the Internet's architecture. IP was never designed for such a hostile environment.


DoS could be handled by stateful multicast firewalls, yes?


That requires infrastructure changes, which is pretty much impossible, as nobody has a vested interest and it's herding cats. P2P means over existing networks.

I forgot to write, though: I am not convinced this is that big a problem in the real world, and there are other mitigations. One would be a "poor man's Tor," onion routing over just 1-2 hops. Since you are propagating and aggregating P2P anyway, it's not going to be that expensive. It doesn't make DoS impossible, but it makes it tough enough to deter amateurs.


Latency is also a factor in a tree based algo.


You could just use WebRTC with a selective forwarding unit.


Zoom can supposedly handle up to 1000 people video calls.


Another nitpick: WebRTC connections are extremely fragile and a pain to herd. We’ve had to build both data and audio keepalives into Squawk[1] to ensure that we can hold connections for long durations (running in the background 24/7). And if the connection encounters a problem, often the only recourse is to renegotiate or recycle.

[1] https://www.squawk.to

EDIT: For anyone curious, we swap out the track from the RTCRtpSender so we’re not transmitting any data when we’re muted (muting the track locally means you’re still sending frames of silence every 20ms, which eats up CPU and bandwidth). And every 15 seconds we send a 60ms blast of silence, and verify on the receiver every minute.
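The track swap itself is tiny (a sketch; the silence-blast keepalive is app-specific and not shown):

    // Detach the track from the sender so no RTP is sent at all, instead of
    // streaming 20ms silence frames; pass the track back to unmute.
    // replaceTrack does not require renegotiation.
    async function setMuted(
      sender: RTCRtpSender,
      track: MediaStreamTrack,
      muted: boolean
    ): Promise<void> {
      await sender.replaceTrack(muted ? null : track);
    }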


Why not just use ICE connectivity checks? That's what they are for.

And I have proposed additions to the WebRTC API to allow control of when ICE checks are sent and how long a particular ICE candidate pair should stay alive (you could set it to "forever"):

https://github.com/w3c/webrtc-ice/pull/22


This seems like an issue with residential network infrastructure more than webRTC. Do other peer-to-peer technologies work better than webRTC?


I can’t speak to other P2P tech, but I presume so. As far as I know, no other P2P tech runs as such a big, sandboxed abstraction of what’s really going on. And therein lies the rub of WebRTC: it was supposed to be a portable high-level standard, but it isn’t portable, so we’re just left with the costs of being high-level.


I'm using webRTC for a video chat project that I'm developing now. I had a hard time making it work on iOS browsers and ended up supporting iOS Safari alone. Though all iOS browsers use the same WebKit, Apple still reserves a few things only for Safari I guess (maybe H264/VP8 video codec related stuff?!).

It may take some time for webRTC (or Apple) to get there, for webRTC to become a solid option for p2p video communications.

If anyone wants to quickly try webRTC, check this demo https://appr.tc/ out, you know, with multiple browsers/tabs.

ps: I used "simple-peer" library and it's quite good for beginners.


getUserMedia isn't supported by third-party browsers on iOS yet; see note 3:

https://caniuse.com/#feat=stream

The Judiciary Committee asked Apple about this last summer, see question 6:

https://docs.house.gov/meetings/JU/JU05/20190716/109793/HHRG...

I think it's close to landing though, need to revisit.


Sort of a tangent, but the committee asking Apple why they don’t support a specific web API is such a stark contrast from that senate panel with Zuckerberg. The Judiciary Committee seems so much more technically informed.


WebRTC mildly competes with their proprietary FaceTime URLs that, along with iMessage, "encourage" people to keep coming back into their stores. They could make it work exactly like the implementations on Windows and Linux (and the other macOS web browsers that are "required" to use WebKit) if they desired. You can still use an SDK, though, as long as they allow you to publish it to users.


There are native libraries available for both iOS and Android. I didn't want end users to install anything, just to click the invitation link and start chatting.


I am developing a video chat + screen share for a school. I am looking for a similar browser-based "click to start" application, without installing an app or downloading anything.


I use {reactJS, socket.io, simple-peer, nodeJS}; you can skip the reactJS. There are a lot of tutorials available for video chat apps using webrtc+socket.io. https://appr.tc/ is open-sourced but not actively maintained.


Check this out: https://peer.school/

A webRTC classroom app, open-sourced.


Webtorrent.io is probably the best thing I’ve come across so far with p2p WebRTC. Really amazing what came out of PeerCDN.


I think WebRTC is still getting started also :) I love the space, here are some of my favorites.

* Open Source classic game streaming - https://github.com/giongto35/cloud-game

* Controlling industrial construction equipment - https://twitter.com/shirokunet/status/1268011883816054784

* Run a web browser remotely, great for low spec computers - https://github.com/nurdism/neko

* Use NAT Traversal to access hosts. No more bastion hosts/3rd party services - https://github.com/mxseba/rtc-ssh

* Use Tor via WebRTC DataChannels - https://snowflake.torproject.org/

* Fly a drone via WASD in the browser - https://github.com/oliverpool/tello-webrtc-fpv

* Share files P2P in a cross-platform way. I HATE paying to transfer files, seems so basic - https://github.com/saljam/webwormhole

* Access a friend's terminal from your browser via P2P - https://github.com/maxmcd/webtty


These are great references. Thanks for the list.


Thanks :) I am really passionate about this stuff. WebRTC is really great, but the community is kind of anemic. I am involved with a project Pion that is trying to bring more community ownership (instead of commercial) to the WebRTC space.

If you are interested for more check out https://github.com/pion/webrtc, https://twitter.com/_pion and https://pion.ly/slack


Ya, it seems like good technology. I was trying to think of ways to use it in a peer-to-peer knowledge base sharing mechanism where users could decide which nodes in their personal knowledge graph would be public, and figured webRTC might be a good starting point, so that's how I stumbled on that page.

I've shelved that aspect at the moment but your link will help me with code samples I can re-use when I revisit the sharing aspect of the project.


Author of Peer Calls here, an open source, anonymous WebRTC multi-user conferencing solution. I recently ported the app to Go and added an SFU (thanks to pion/webrtc) to make support for 3+ users better. Works on all recent major browsers (including Safari on iOS).

See https://github.com/peer-calls/peer-calls for more information.

You can host it yourself or use peercalls.com

Happy to answer any questions in this thread.


Local audio is replayed locally, it seems, so I hear myself with a delay, which is disconcerting.


In the past there was no encryption for communications; then SRTP with DTLS became the way of doing it (secure signalling and media). When you negotiate SRTP keying material between parties, you can intercept the keys and inject your own, pretending to be the remote caller, and vice versa. https://bloggeek.me/is-webrtc-safe/ is a good read.


A bit off topic: recently multi-monitor screen sharing based on WebRTC was fixed in Chromium/Linux/x11 (landed in Chrome 83), making my Google Meet/Jitsi screen sharing support sessions much easier...


Yesterday Vivaldi & Brave screensharing on Fedora were dismal with Whereby (which presumably uses WebRTC) while Chrome 83 worked great. Why would that be? Because Whereby doesn’t test Chromium browsers on Linux?


Hi, one Whereby developer here. I use Vivaldi and Firefox on Linux mostly, so you could rather say we over-test Chromium-based browsers on Linux. We also test in "real" Chrome and open up Opera now and then, and Brave quite infrequently.

But the API we get from the browser is basically just `navigator.mediaDevices.getDisplayMedia()`, so what we do to screenshare is actually not much code at all. If we do something bad for Vivaldi/Brave though, we'll be quick to fix it, but this is likely their bugs, not ours :) Vivaldi screensharing on Linux broke half a year or so ago, but we're in the same town and know them, so they quickly fixed whatever bug it must have been after we pinged them.
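To illustrate how little code that is, the web-platform side is roughly this (a sketch):

    // Ask the user to pick a screen/window/tab and send it to the peer.
    // The picker UI is entirely up to the browser.
    async function startScreenShare(pc: RTCPeerConnection): Promise<MediaStream> {
      const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
      for (const track of stream.getTracks()) {
        pc.addTrack(track, stream);
      }
      return stream;
    }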

But the most likely thing here is just that they might not have updated to the newest Chromium with all these fixes. Or maybe they have not done the necessary platform integrations or UI for it. Having worked on Opera previously, I know anything involving chrome (UI) will often take a lot longer to do because you actually have to write a lot of custom code for it (in contrast to pure web-platform features and improvements, which will basically be free).


Why is `RTCPeerConnection.setLocalDescription()` an independent method? I've always wondered what use case might exist that would not set the local description right after calling `pc.createOffer()` or `pc.createAnswer()`. Similarly with `pc.setRemoteDescription()`.

Wouldn't it have been clearer to have createOffer + processAnswer (for a negotiation started locally), and processOffer + createAnswer (to respond to remote offers), and set the local/remote descriptions implicitly as part of these methods?


`setLocalDescription` is where all the logic happens! `createOffer`/`createAnswer` just create an easy config to pass.

You can change lots of undocumented stuff in the SDP! This list isn't exhaustive, but I have seen production apps do these.

* Change ice ufrag/upwd

* Enable simulcast

* Change codec preferences

* Change codec specific behavior (Opus ptime)

My guess is that it works this way so browser vendors don't have to actually standardize behavior: they don't need to go through the W3C every time they want to add something new. ORTC was released to try and fix this, though. "SDP munging" is a terrible idea and has caused me so many headaches :/
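To make the flow concrete, here's the shape of that munging (a sketch; the ptime tweak is one of the knobs mentioned above, and this kind of string surgery is exactly why it's fragile):

    // The split lets you rewrite the SDP between createOffer() and
    // setLocalDescription(). Here: advertise a 20ms Opus packet time by
    // inserting a ptime line after the Opus rtpmap.
    const offer = await pc.createOffer();
    offer.sdp = offer.sdp!.replace(
      /(a=rtpmap:\d+ opus\/48000\/2\r\n)/,
      '$1a=ptime:20\r\n'
    );
    await pc.setLocalDescription(offer);
    // ...then send offer over your signaling channel as usual.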


I'm not sure why it was originally made independent, but it helped me to play around with munging SDP.


I needed a very low traffic chat feature for a website built on Django, so I went looking at what existing code there was.

One was based on simple Ajax to the server and a server-backed message store.

About a dozen used WebRTC or other advanced techniques.

I wonder how many people who could quite easily cope with the expected traffic using brain-dead simple techniques end up going the complex route because they assume that's how you do chat?


Is your chat text only? I think for media WebRTC is always going to win in the browser (unless you have developers who can implement custom congestion control, etc.).

For text only I still see a couple advantages

* E2E Security. With WebRTC I don't have to worry about snooping. Why should I upload my messages to a server?

* Scaling/Bandwidth/Ops Burden. Massively reduced if all my clients are directly connected to each other

* Ajax (TCP) can't do as much. DataChannels give me lossy/out-of-order messaging. Not useful for chat, but really great if building anything gaming/real-time (see the sketch after this list).
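That last mode is just a channel config. A sketch, assuming an existing RTCPeerConnection pc (applyLatestState is a hypothetical handler):

    // Hypothetical handler for incoming state updates.
    declare function applyLatestState(data: unknown): void;

    // UDP-like semantics that Ajax/TCP can't give you: deliver in whatever
    // order packets arrive, and never retransmit lost ones.
    const channel = pc.createDataChannel('game-state', {
      ordered: false,     // out-of-order delivery is fine
      maxRetransmits: 0,  // a lost packet stays lost
    });
    channel.onmessage = (ev) => applyLatestState(ev.data);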


Yeah - I think all of that supports my point. Aside from security (which most users wouldn't be able to verify in any case) none of those things are relevant to low-traffic chat. i.e. "talk to our live advisors" or "collaborate with another user". The bandwidth and load is tiny and I can imagine handling hundreds or thousands of concurrent users with a traditional architecture and commodity hosting.


Before that, a webserver was still required to initiate the connection between 2 user-peers, which of course made little sense. I once asked about it and got an answer about "DDoS risks," and it made sense, because if you let people visit some webpage, it would be an easy way to DDoS.


Hi all,

I built a simple messenger using WebRTC. You can check it out at http://sambhashana.com/.

Please use it and share your feedback.


You need to provide more details. How does it compare to other chat applications? What improvements and features does it have that other chat applications don't?


Hi,

Except for room members' online status, nothing goes to the server. Every message goes peer-to-peer and nothing is stored, so the moment the user closes or refreshes the browser, everything is gone.


That's a good description. It should have been included with your original pitch.


Question to front end folks:

How hard is it to add real-time voice communication to a web page using this?

Can it be done without using third party libraries and frameworks?


It's pretty easy and it doesn't require any third-party libs. Check the demo:

https://webrtc.github.io/samples/src/content/peerconnection/...

code: https://github.com/webrtc/samples/tree/gh-pages/src/content/...
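If you just want the shape of it without the full sample, here's a minimal sketch (signaling is up to you; sendViaSignaling is a hypothetical stand-in for whatever channel you use to exchange the offer/answer and ICE candidates):

    // Hypothetical signaling stand-in (WebSocket, HTTP, copy/paste...).
    declare function sendViaSignaling(desc: RTCSessionDescriptionInit): void;

    // Browser-only audio call setup, no libraries.
    const pc = new RTCPeerConnection({
      iceServers: [{ urls: 'stun:stun.l.google.com:19302' }],
    });

    // Send our microphone to the peer.
    const mic = await navigator.mediaDevices.getUserMedia({ audio: true });
    mic.getTracks().forEach((t) => pc.addTrack(t, mic));

    // Play whatever audio the remote peer sends.
    pc.ontrack = (ev) => {
      const audio = new Audio();
      audio.srcObject = ev.streams[0];
      audio.play();
    };

    // Caller side: create an offer and ship it to the other peer.
    const offer = await pc.createOffer();
    await pc.setLocalDescription(offer);
    sendViaSignaling(offer);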



