
I have been working on a large WebRTC project, and it’s a great technology, but it’s so fragmented right now that it’s hard to work with.

For iOS

1. Safari has a high-pass filter or something on the incoming audio. Send in some music as opus/48000 with the proper max bitrate set and listen. Now download Chrome for iOS and listen: Chrome has MUCH more low end. Running the audio from before and after through a spectrum analyzer, Chrome is much closer to the original sound.

2. If you make the video full screen and then exit full screen, it pauses.

3. I couldn’t get incoming RTP that is opus/48000/2 to play in stereo. But as with number 1, playing it in Chrome does play in stereo. (A common SDP workaround is sketched just after this list.)
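
One workaround, for both the bitrate hint in number 1 and the stereo issue in number 3, is to munge the SDP so the Opus fmtp line asks for stereo and a higher target bitrate. Whether a given browser honors the hints is exactly the fragmentation at issue, so treat this as a sketch; the bitrate value is just an example:

    // Rewrite the Opus fmtp line in an SDP blob to request stereo playback
    // and a higher average bitrate. Browsers may or may not honor the hints.
    function forceOpusStereo(sdp: string, maxAverageBitrate = 256000): string {
      // Find the payload type negotiated for opus/48000/2.
      const match = sdp.match(/a=rtpmap:(\d+) opus\/48000\/2/);
      if (!match) return sdp;
      const pt = match[1];
      // Append the stereo and bitrate hints to that payload's fmtp line.
      return sdp.replace(
        new RegExp(`(a=fmtp:${pt} [^\\r\\n]*)`),
        `$1;stereo=1;sprop-stereo=1;maxaveragebitrate=${maxAverageBitrate}`
      );
    }

    // Usage: the receiver munges the description it sends, since stereo=1
    // tells the remote sender what this side wants to receive.
    // const answer = await pc.createAnswer();
    // answer.sdp = forceOpusStereo(answer.sdp!);
    // await pc.setLocalDescription(answer);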

Android isn’t much better.

1. The fragmentation is a HUGE pain. The manufacturer browsers that ship on the phones are very hit or miss. How many browsers come by default on Android phones nowadays?

2. The latest Samsung phones’ cameras can’t capture at 640x360 for whatever reason. Previous generations could, and other Android phones can as well. (A constraint-fallback sketch follows this list.)
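
If you need to handle devices like that at runtime, one approach is to ask for the resolution with exact constraints and fall back when the camera refuses. A minimal sketch:

    // Probe for exactly 640x360; devices that can't do it (like the Samsung
    // phones above) reject with OverconstrainedError, so fall back to "ideal".
    async function getCamera(): Promise<MediaStream> {
      try {
        return await navigator.mediaDevices.getUserMedia({
          video: { width: { exact: 640 }, height: { exact: 360 } },
        });
      } catch (err) {
        console.warn("exact 640x360 unsupported, falling back:", err);
        return navigator.mediaDevices.getUserMedia({
          video: { width: { ideal: 640 }, height: { ideal: 360 } },
        });
      }
    }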

Even desktop isn’t all there.

1. If you want to do WebRTC screen sharing, there is no way to automatically select a screen or tab, not even via CLI flags. So you can’t automate testing it. And you can’t even click the dialog in the Chrome remote debugger.

2. Firefox has nothing like Chrome’s chrome://webrtc-internals.

A few other things

1. Simulcasting is ghetto

2. I wish server CPUs supported h264/vp8/vp9 hardware encoding and decoding.

3. Want to collect metrics with getStats? There is no easy way to record all of them, and the one SaaS for it is super, super expensive. (A bare-bones alternative is sketched after this list.)
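
For what it’s worth, rolling your own collection isn’t much code if all you want is the raw reports. A bare-bones sketch; the /metrics/webrtc endpoint is hypothetical:

    // Poll getStats() and ship every report to your own collector endpoint.
    function collectStats(pc: RTCPeerConnection, intervalMs = 5000): () => void {
      const timer = setInterval(async () => {
        const report = await pc.getStats();
        const samples: Record<string, unknown>[] = [];
        // RTCStatsReport is map-like; each entry (inbound-rtp, outbound-rtp,
        // candidate-pair, ...) is a flat dictionary that serializes cleanly.
        report.forEach((stats) => samples.push({ ...stats }));
        navigator.sendBeacon("/metrics/webrtc", JSON.stringify(samples));
      }, intervalMs);
      return () => clearInterval(timer); // call the return value to stop
    }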




This is one of the reasons we’ve had to go the dreaded Electron route and really push the desktop app: it’s just not feasible to deliver the same UX across browsers/OSes. Really frustrating. I’ll add one to your list:

4. I wish RTCRtpTransceiver would allow reusing encoded MediaStreams instead of re-encoding. For video, dropped frames are acceptable without needing to renegotiate bandwidth.


Video codecs don't work like that: most frames only encode the difference from the previous frame, meaning that if you drop one frame, every frame after it will be impossible to decode correctly until the next keyframe.

A technique used to get around this is a special frame reference pattern where some frames depend on a frame two or four frames earlier, allowing the intermediate frames to be dropped. This comes at a significant cost to encoding efficiency, though. To use it with a pre-encoded video, you would have to specifically encode the video for this purpose, which makes the usefulness of such a feature in a client questionable.
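
A toy model of that reference pattern, assuming a simple two-temporal-layer layout:

    // With two temporal layers, even frames (layer 0) reference only the
    // previous layer-0 frame, so the odd layer-1 frames can be dropped
    // without breaking decode of anything that remains.
    interface Frame { seq: number; temporalLayer: 0 | 1; }

    function droppable(frames: Frame[]): Frame[] {
      // Nothing references the layer-1 frames, so only they are safe to drop.
      return frames.filter((f) => f.temporalLayer === 1);
    }

    const gop: Frame[] = Array.from({ length: 8 }, (_, i) => ({
      seq: i,
      temporalLayer: (i % 2) as 0 | 1,
    }));
    console.log(droppable(gop).map((f) => f.seq)); // [1, 3, 5, 7]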

Nothing stops you from building a server that does this though, for instance if you want to build a scalable streaming service based on WebRTC.


What you describe are called key frames. Nearly all video uses them: recorded video for seeking, and streaming for, well, streaming.

You missed the entire point of the comment.


Yup, I’m in the same boat. The original appeal was that it all runs in the browser, cross-platform, but in the end it’s not, really.

And that’s a good one


We've been watching Apple undermine WebRTC for years now. There's been nothing stopping them from taking the existing library and running with it; nothing stopped them for the years they had no support, and nothing is stopping them now that they have nominal support.

Manufacturer browsers used to be a big deal, but for the last couple years it's been mostly Chrome or something nearly identical on most devices. Not having that specific resolution available is one thing, but it's nothing like having no support, or inexplicably degrading audio quality.


They accept pull requests... Have you tried asking on the mailing list if they'd be open to accepting a patch?


> If you want to do WebRTC screen sharing, there is no way to automatically select a screen or tab, not even via CLI flags. So you can’t automate testing it. And you can’t even click the dialog in the Chrome remote debugger.

I've recorded a tab automatically before with Chrome [0]. Basically, I have --auto-select-desktop-capture-source set to pick a tab, then I name my tab something it can find; you can probably get close with the entire screen too.
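
For anyone wiring that up, here is a sketch using Puppeteer. The flag's behavior has shifted between Chrome versions (and newer Chrome may also require a user gesture for getDisplayMedia), so treat it as a starting point:

    // Launch headed Chrome with the auto-select flag, rename the tab so the
    // matcher finds it, then call getDisplayMedia() with no dialog to click.
    import puppeteer from "puppeteer";

    const browser = await puppeteer.launch({
      headless: false, // the capture picker only exists in headed Chrome
      args: ["--auto-select-desktop-capture-source=pick-me"],
    });
    const page = await browser.newPage();
    await page.goto("https://example.com"); // secure context for capture APIs
    await page.evaluate(() => { document.title = "pick-me"; });

    const trackCount = await page.evaluate(async () => {
      const stream = await navigator.mediaDevices.getDisplayMedia({ video: true });
      return stream.getVideoTracks().length;
    });
    console.log("video tracks:", trackCount);
    await browser.close();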

> Simulcasting is ghetto

Agreed, but in a recent server-side project I realized that if I ignore the SSRCs and just use the RIDs from the RTP packets, my SFU does what I want. While working with Chrome, I found that upping the log verbosity really helped me understand why my highest simulcast stream wasn't being sent, based on the estimated bitrate reported by the server (since then I adhere to REMB needs).

0 - https://github.com/cretz/chrome-screen-rec-poc/blob/master/a...


Thanks for both of those! Super helpful


> I wish server CPUs supported h264/vp8/vp9 hardware encoding and decoding

You should look into SVC. Nowadays there aren’t many reasons you’d need to decode/re-encode if all you want is to change resolution or quality, or to drop frames gracefully.
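
In Chromium-based browsers this is exposed through the scalabilityMode encoding parameter from the WebRTC-SVC spec; a minimal sender-side sketch:

    // Ask the encoder for three spatial x three temporal layers (VP9/AV1 in
    // Chromium). An SFU can then forward a subset of the layers to change
    // resolution or frame rate without ever re-encoding.
    const pc = new RTCPeerConnection();
    const cam = await navigator.mediaDevices.getUserMedia({ video: true });
    pc.addTransceiver(cam.getVideoTracks()[0], {
      direction: "sendonly",
      sendEncodings: [{ scalabilityMode: "L3T3" }],
    });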


Didn’t know about SVC; that actually looks really, really nice. Not for the use case I was talking about, but it seems great.


You do know about about:webrtc in Firefox, right?


Ha, didn’t know about that. All I knew about was the plugin. Even when googling, it’s not that well known for Firefox; even the Firefox website talks about the plugin.


It is not in Google or Apple's best interest to work together on things like this, because it'd accelerate the already fast-approaching age of Progressive Web Apps (PWAs), which would remove their, what, 30% cut from "apps"?

Which are all... websites smashed into "native" code plus privacy-invading trackers, anyway?


Quick question: how do you scale WebRTC server-side?

Do you use Kubernetes to scale jitsi/mediasoup/whatever servers up and down?

What kind of load balancers do you use, etc.? I come from the Kubernetes side and have never read anything about scaling these things... so I am super curious.


I just use unique URLs per server, like serverid.server.blah.com or something, then take care of selecting which one in the app. (Rough sketch below.)
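
Roughly like this, assuming a hypothetical allocator endpoint that reports per-server load:

    // The app asks an allocator which media server to use, then connects to
    // that server's unique hostname directly, so no load balancer sits in
    // the media path. The /api/servers endpoint is hypothetical.
    interface MediaServer { id: string; load: number; }

    async function pickServerUrl(): Promise<string> {
      const servers: MediaServer[] = await (await fetch("/api/servers")).json();
      const least = servers.reduce((a, b) => (a.load <= b.load ? a : b));
      return `wss://${least.id}.server.blah.com/signal`;
    }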


Interesting. And how do you scale/orchestrate them? Home-grown or something like Kubernetes?


> 2. I wish server CPUs supported h264/vp8/vp9 hardware encoding and decoding

Why is this? What advantages are there to doing it on a CPU rather than in a GPU?


Most servers don’t have GPUs and don’t require full GPU capabilities.


Which SaaS is that?


https://www.callstats.io is the only one I am aware of.

The CEO is one of the authors of the W3C getStats spec as well: https://www.w3.org/TR/webrtc-stats/


Yup that’s the one. It’s just soooo expensive for starting out


We do have a free plan; it’s not publicly listed at the moment. If you do under 20,000 minutes or so a month, it shouldn’t prompt for a credit card.



