FFmpeg to WebRTC (github.com/ashellunts)
215 points by ashellunts on Sept 22, 2021 | 46 comments



Note that there are a lot of tunings you may need depending on your latency and picture-quality tolerances. I would recommend following FFmpeg's streaming guide [0]; a rough example is sketched after the links below.

If you are trying to stream desktop, camera, and microphone to the browser, I would recommend pion's mediadevices package [1].

[0] - https://trac.ffmpeg.org/wiki/StreamingGuide

[1] - https://github.com/pion/mediadevices
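
As a hedged starting point, the kind of latency tunings meant looks something like this (the output URL, bitrates and preset are illustrative placeholders, not taken from the guide):

  # Sketch only: no B-frames, short GOP, zerolatency tune; adjust bitrate and
  # preset to your quality tolerance.
  ffmpeg -re -i input.mp4 -c:v libx264 -preset veryfast -tune zerolatency \
    -g 50 -bf 0 -b:v 2M -maxrate 2M -bufsize 1M -c:a aac -f flv rtmp://example.com/live/stream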


Thanks!

I've been wanting to build a hackday project that takes images captured from our satellites and builds a video stream. (We get hundreds of new shots from space every minute; for fun, I'd create a screensaver-like streaming video feed of interesting pictures.)

Perhaps I can build it with either ffmpeg or pion (using pkg/driver/screen as a model to create a virtual canvas to draw on).


What industry do you work in? Sounds interesting to have many satellite images streaming in!


I'm at Planet.com - our founders worked on PhoneSat at NASA, then went on to build an Earth-imaging company with more than 100 satellites up at any given time - all imaging as fast as possible :)


My all-time question about FFmpeg is what all those timestamp-correction flags and synchronization options are for:

* -fflags +genpts, +igndts, +ignidx

* -vsync

* -copyts

* -use_wallclock_as_timestamps 1

* And more that you find even when you thought you had seen all flags that might be related.

FFmpeg docs are a strange beast: they cover a lot of topics but are extremely shallow in most of them, so the overall quality ends up being pretty poor. I mean, it's like the kind of frowned-upon code comments such as "ignidx ignores the index; genpts generates PTS". No surprises there... but no real explanation, either.

What I'd love is a real, technical explanation of the consequences of each flag and, more importantly, the kinds of scenarios where they would make a desirable difference.

Especially for the case of recording live video that comes from an unreliable connection (RTP over UDP) and storing it as-is (no transcoding whatsoever): what set of flags would the FFmpeg authors recommend? Given that packets can get lost, timestamps can get garbled, UDP packets can be reordered in the network, or any combination of funny stuff.

For now I've sort of decided on using genpts+igndts and use_wallclock_as_timestamps, but that all comes from intuition and simple tests, not from actual evidence or from technical documentation of each flag.


Think of ffmpeg as a universal translator. There are hundreds of languages in use around the world, each with its own syntax, vocabulary, writing system, formal and informal conventions, etc.

A universal translator framework cannot provide a bespoke translation engine for all possible permutations of source and target language. Instead it provides a common engine which is meant to be suitable enough for the most common traits shared across languages.

When converting between any two languages picked at random, there will be quirks of the language, or errors and ambiguity in the source prose, which the engine cannot hope to automatically recognize and accommodate, so there are all these options that each do one specific thing and allow users to modify a step of the translation process. The docs cannot go into detail because the downstream ramifications of an option vary based on the exact properties of the source-target pair and the transformations requested of ffmpeg. Instead the docs describe the exact change directly triggered by the option.

----

As for the specific options,

* -fflags +genpts, +igndts, +ignidx

All of these apply to inputs only.

genpts: if input packets have missing presentation timestamps, this option will assign the decoding timestamp as PTS, if present.

igndts: unsets DTS if the packet's PTS is set.

ignidx: only applies to a few formats. These provide a keyframe index, which ffmpeg uses to populate its internal keyframe index for the stream. This option makes ffmpeg ignore the supplied index.

* -vsync

The option is misnamed; it's better thought of as an fps mode (newer ffmpeg versions in fact rename it to -fps_mode). It is most commonly used to drop or duplicate frames to achieve a constant-framerate stream.

* -copyts

FFmpeg will, by default, remove any starting offset from input timestamps, and adjust timestamps if they overflow (roll over) or have a large gap. copyts stops all that and relays the input timestamps. It's basically used if one wishes to manually examine and adjust timestamps using the setpts filter or the setts bitstream filter.

* -use_wallclock_as_timestamps 1

Discards input timestamps and assigns the system clock time at the moment a packet is handled as its PTS.
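
To make a couple of those concrete, here are illustrative sketches (not taken from the docs; file names are placeholders):

  # Remux an input with missing PTS and a suspect keyframe index, without re-encoding:
  ffmpeg -fflags +genpts+ignidx -i broken.avi -c copy repaired.mkv

  # Force a constant 25 fps output, duplicating or dropping frames as needed
  # (newer ffmpeg spells this -fps_mode cfr):
  ffmpeg -i input.mp4 -vsync cfr -r 25 -c:v libx264 out.mp4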


Thank you; I understand what you mean with the translator metaphor. Still, the project needs to pick some very common usage scenarios and discuss them in depth (in your own example, kind of "let's pick English and Spanish, two very commonly used languages, and talk about all the quirks and translation techniques").

You already provided better one-liners for some of the options than what their docs state, although I'd still like a small commentary on example situations where some of them would be useful.

For example: "igndts will unset dts if packet's pts is set"... OK but why would anyone want to do that? DTS is for Decoding, PTS is for Presentation, so wouldn't mixing them cause presentation issues?

As mentioned, I'm interested in storing UDP RTP as-is, and for that I'm using "-fflags +genpts+igndts -use_wallclock_as_timestamps 1", because intuitively it makes sense to me that potentially broken incoming timestamps should be ignored and new ones written from scratch. But now that you mention it, maybe "+genpts" is doing nothing in this scenario?
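
For context, the kind of command I'm playing with looks roughly like this (the SDP file and output name are placeholders):

  # Receive RTP described by an SDP file and store it without transcoding,
  # stamping packets with the wall clock as they arrive.
  ffmpeg -protocol_whitelist file,udp,rtp -fflags +genpts+igndts \
    -use_wallclock_as_timestamps 1 -i session.sdp -c copy recording.mkv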


> You already provided better lines for some of the options than what their docs state

I wrote the docs section you refer to :)

They are meant to be narrow and to the point, not didactic, not least because the codepath leading up to an option and following from it depends on internal evaluations and other options.

> why would anyone want to do that?

Because the input has weird or unreliable DTS. One would have to check the code history to see why an option was initially added; many of ffmpeg's options are there to deal with edge cases or weird inputs.

> -fflags +genpts+igndts -use_wallclock_as_timestamps 1

In theory, igndts is unsetting the wallclock DTS set by the last option, and genpts should have no effect, again because of that third option.


> They are meant to be narrow and to the point, not didactic

And that's understandable, but I believe some options cross a thin line between "this is not the place where you should be learning about the technicalities of this option" and "this was added for some obscure and undocumented reason and nobody will really know for sure why it's useful".

> [why would anyone want to do use +igndts?] Because the input has weird or unreliable DTS.

> In theory, igndts is unsetting wallclock dts set by the last option. genpts should have no effect, again due to the third option.

These two are perfect examples of concise clarifications that would help users a lot.

> I wrote the docs section you refer to :)

And I personally thank you for it; some documentation, even if IMHO a bit too short on details, is much better than no docs at all... and writing proper docs is hard, I know it very well from the project I maintain! :)


For `-f h264`, `-bsf:v h264_mp4toannexb` is not needed. It will be automatically inserted as needed, with ffmpeg 4.0 or later.

For latency, specify a short GOP size, e.g. `-g 50`
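
As a rough sketch (the capture device is an assumption; adapt it to your platform):

  # With ffmpeg 4.0+ the annex-b bitstream filter is inserted automatically for -f h264,
  # so only the short GOP (and no B-frames) is added for latency.
  ffmpeg -f v4l2 -i /dev/video0 -pix_fmt yuv420p -c:v libx264 -g 50 -bf 0 -b:v 2M -f h264 -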


Thank you, will try.


I personally use this project to proxy IP camera RTSP streams over WebSockets as fragmented MP4 - https://github.com/deepch/RTSPtoWSMP4f

I'm not affiliated with the project; it's just really performant and reliable.
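
If you want to experiment with the underlying idea without that project, a plain-ffmpeg sketch would be roughly this (camera URL is a placeholder, and something else still has to push the bytes over a WebSocket):

  # Remux an RTSP camera stream into fragmented MP4 suitable for Media Source
  # Extensions in the browser, writing to stdout.
  ffmpeg -rtsp_transport tcp -i rtsp://camera/stream -c copy \
    -movflags frag_keyframe+empty_moov+default_base_moof -f mp4 pipe:1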


Nice stuff; I did something similar with ffmpeg and pion.

It was for audio, and it was WebRTC to ffmpeg; I was streaming a group chat directly to S3.

It mostly worked; the only problem I ran into was syncing issues if a user had a spotty connection. The solution seemed to involve using RTMP to synchronize, but I didn't have a chance to go down that rabbit hole.


Hopefully WHIP takes off. It’s a standard protocol that would easily allow things to interface with WebRTC.

https://www.meetecho.com/blog/whip-janus/

https://millicast.medium.com/whip-the-magic-bullet-for-webrt...


I did something similar for Mac a while back [0]. I never really developed it much further because of the latency issues; since it was for surveillance cameras, that was a showstopper.

[0] https://github.com/RiftValleySoftware/RVS_MediaServer


I think we evaluated something like this (ffmpeg to WebRTC with Kurento) to broadcast the screen of mobile devices to a web browser. If I remember correctly, with the right ffmpeg settings, latency became more than acceptable.


I believe that. I'm sure that I could have greatly reduced the latency, but tuning ffmpeg is not for the faint of heart, and my heart wasn't really into it.

Anyway, HLS has latency, just by definition. The "H" stands for "HTTP" (a synchronous protocol, based on TCP). RT[S]P uses UDP or RDT, and is isochronous.


HLS has gotten better in this regard: https://developer.apple.com/documentation/http_live_streamin...

This provides much lower latency, including effective push by blocking on playlist update requests.


To the author: if you really want to be permissive about what others can do with your software, an MIT, BSD, or Apache 2 license (the latter being more complete in that it even includes a patent grant) seems to be more widely recognized and better tested than the Unlicense. Unless you chose that license for solid reasons, I'd suggest considering a switch to one of the better-regarded licenses.

* https://softwareengineering.stackexchange.com/questions/1471...

* https://news.ycombinator.com/item?id=3610208


I was just looking for something to do this, but couldn't find much. I need to serve up about 1000 cameras to both HLS (for the public) and WebRTC (for low-latency/PTZ admin use). Today we do it with paid packages, but I was exploring just using ffmpeg + nginx. HLS is easy enough, but since WebRTC is not HTTP, it needs its own piece. Anyone have ideas on this? I'm familiar with Wowza and Ant. Any other open-source utilities that do RTSP to both HLS and WebRTC?
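
For the HLS half, the per-camera command I had in mind is just a sketch along these lines (paths and segment settings are guesses):

  # Remux one RTSP camera into HLS segments that nginx can serve as static files.
  ffmpeg -rtsp_transport tcp -i rtsp://cam1/stream -c copy -f hls \
    -hls_time 2 -hls_list_size 6 -hls_flags delete_segments /var/www/hls/cam1/index.m3u8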


Would you mind sharing the paid packages you use?


Wowza. We put an open-source nginx cache layer in front, and CloudFront in front of that. It's a pretty cool setup, but I feel I could do most of the HLS side in nginx + ffmpeg.


I am also exploring WebRTC (ingest) to LL-HLS. FFmpeg doesn't seem to support LL-HLS right now, as it requires web-server intelligence. For the WebRTC side, mediasoup is nice, being Node-based.


Since we're on this topic, I want to ask a question:

How do I play video files stored on my VPS to a Chromecast?

I want my mom to watch a video from her TV, but I can't upload it to YouTube due to copyrighted content (yes, even if you set it as unlisted, YouTube will block it).


catt (Cast All The Things!) is your friend.

https://github.com/skorokithakis/catt

You might need a VPN, as whatever is running catt must be able to connect to your Chromecast, and the Chromecast must be able to pull from whatever is running catt.

We watch all the movies this way - just cast an mp4 file. Works great on a local network.
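
Basic usage is just this (file name and URL are placeholders; I believe it also accepts direct URLs):

  # Cast a local file; catt discovers the Chromecast on the LAN and serves the file to it.
  catt cast movie.mp4

  # Or point it at a URL the Chromecast itself can reach, e.g. on the VPS:
  catt cast http://your-vps.example.com/movie.mp4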


Plex and Jellyfin are options if it's a video they might want to watch repeatedly at later times.

Both let you stream videos to a Chromecast, last I checked.

Plex also has support for pictures, which might be interesting in some related cases.


[help request]

I created a commercial product, Video Hub App, and have been trying for a year to get streaming a video from a PC to an iPhone working (through a PWA, not a dedicated iOS app), with zero success. I could get the video stream to play on a separate laptop through Chrome, but iOS Safari kicks my ass.

Does anyone have suggestions / ideas?

https://github.com/whyboris/Video-Hub-App

https://github.com/whyboris/Video-Hub-App-remote


Easy thing: iOS on the iPhone does not support Media Source Extensions (MSE), so you can't use a <video> tag with a dynamic source.

You can, however, go the way described in the post: instead of requesting data through the data channel, you can initiate video/audio channels and make your streaming work pretty much like Google Hangouts, with your streamer as a participant.

It is not the recommended way, though. But there's no other way on iOS anyway.


I don't understand: why not use WebRTC or HLS on iOS Safari?


What's this? You have a server component on the PC, with access to local videos, and then you want to play those videos back remotely on the iPhone?


Correct! I've even tried transcoding via FFmpeg - successful video playback on a PC over WiFi, but not on the iPhone - https://github.com/whyboris/Video-Hub-App/pull/611


Yes, if your mobile app is a PWA, you would definitely need to transcode most videos. I'm not sure how you did that so far, but in my experience, web browsers support a few specific codecs and nothing else. Same for container formats: for example, I haven't been able to open a Matroska (MKV) video with either Firefox or Chrome. And if those browsers have their limitations, I can only assume iOS Safari is even stricter about what it can consume!

I think your idea was the safest one: transcode the file on the main PC, then send it to a <video> tag on the phone; I'm surprised that didn't work. What protocol is the transfer done with? Maybe iOS Safari doesn't support it, or if it's plain and simple HTTP, it might silently fail if the source is not HTTPS with a proper certificate.
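
If it's a codec/container problem, a conservative target would be H.264 + AAC in MP4; this is just a sketch, and the file names, profile and bitrates are assumptions:

  # Transcode to a combination that Safari and other browsers widely support.
  ffmpeg -i input.mkv -c:v libx264 -profile:v main -pix_fmt yuv420p \
    -c:a aac -b:a 128k -movflags +faststart output.mp4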

You could try WebRTC. Safari supports it, and it's well tested at this point. It's more complex, having to send SDP messages back and forth and having to care about browser limitations (e.g. videos cannot autoplay if they have audio, things like that). If you decide to go that route, this project might be just what you need for the task ;-)


Since there are probably some people experienced with ffmpeg here: is it possible to do image zooms with ffmpeg that go deeper than zoom factor 10?

I can zoom up to factor 10 like this:

ffmpeg -i someimage.jpg -vf "zoompan=z='10-on/100':d=1000:x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)':s=1920x1437" zoom.mp4

But everything above a zoom of 10 seems to fail. Is there a hard limit in the code for some reason? Some way to overcome this?

Or is there another nice Linux or online tool to do zooms into images?


You could just crop then scale, which should give you a lot more control.
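
I believe zoompan's zoom value is limited to the 1-10 range in the filter itself, which would explain the hard stop. A single-frame sketch of the crop-then-scale idea at factor 20 (centred crop; the output size just mirrors your example) would be:

  # Crop 1/20th of the image (centred by default) and scale it back up to full size.
  ffmpeg -i someimage.jpg -vf "crop=iw/20:ih/20,scale=1920:1437" -frames:v 1 zoomed.png

Animating it would then mean generating frames at successive crop sizes yourself and stitching them together, since this only produces one still.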


Great work! This makes WebRTC much more accessible; it not being available in ffmpeg makes people default to worse alternatives.


Hi, I don't intend to hijack this thread, and I don't know if this is considered [OT].

I came across your post[0] about KVS from a while ago. Thank you for your work on pion and KVS.

A quick question on the KVS C implementation: is it in any way tied to AWS Kinesis? Can it be used with Wowza, for instance?

[0] https://news.ycombinator.com/item?id=21951692


Yes, it can! It is vendor-agnostic and just provides a WebRTC API that can be used anywhere.


Awesome! Thank you.


Off-topic: are there any streaming gateways that can automatically insert CC subtitles into the video container on the fly?


As someone completely new to Go, how do I run this? I have Go installed but I can't seem to get any of the sample commands to work. I pulled the repo, cd'd into the directory, and ran the Go sample command provided in the source, but the terminal just hangs and blinks with no output.


This is what I got for:

go run . -rtbufsize 100M -f dshow -i video="Integrated Webcam" -pix_fmt yuv420p -c:v libx264 -bsf:v h264_mp4toannexb -b:v 2M -max_delay 0 -bf 0 - < SDP

Connection State has changed failed

Peer Connection State has changed: failed

Peer Connection has gone to failed exiting


When pasting the SDP back into the browser, make sure the text box is empty; it has an annoying space there that can mess things up.


Would using GStreamer instead of ffmpeg offer better or worse performance (less CPU usage on the sender side)? If anyone has experience with this setup, I'd love to know.


Cool! I wish there was an easy way to consume browser streams in FFmpeg — the other way around.



Sending one frame per message is quite expensive; I'd do some buffer aggregation instead.



