If you're converting .aax files you should consider using .m4b as the output, since it preserves chapters and remembers your last listened timestamp [0]:
> Audiobook and podcast files, which also contain metadata including chapter markers, images, and hyperlinks, can use the extension .m4a, but more commonly use the .m4b extension. An .m4a audio file cannot "bookmark" (remember the last listening spot), whereas .m4b extension files can.
The cool thing is that once you rip your activation bytes, it works for all your audiobooks. Would definitely recommend it.
There's no functional difference between a .m4b file and a .m4a file. Both use the MP4 container, so they adhere to the same specification and support all the same features (including bookmarks). FFmpeg even uses the same muxer and demuxer for both "formats".
The only difference is a non-standard convention used by certain software (like iTunes) to write audiobook-related metadata only to MP4 files that use the .m4b file extension.
You'll get exactly the same result if you just change the file extension after remuxing/transcoding.
Well, it's also worth noting that different file extensions can have different associations, so .m4b is more likely to open in an app that the user wants to use for audiobooks, rather than opening in a generic mp4 audio playing app.
Even iTunes, I think, would treat files differently between m4r (Ringtone) and m4a (audio) files, so despite there being no difference at all, using the 'correct' extension might be quite a bit more convenient in the long run.
Wait, so an m4b stores playback position in the file itself? As in, the checksum will change and file syncers like Syncthing will re-upload every time I hit pause?
I don't know Syncthing internals, but state of the art in file syncing is to use rolling checksums to identify which parts of the file have changed. If only a few bytes of the file are overwritten, only the immediate vicinity of these bytes would be synced.
The information online about M4A vs M4B is wrong. There is no difference other than the file extension. The Wikipedia article links to a lifewire.com article about the bookmark claim. This container format can store XMP metadata, and you can certainly have a player that saves a playback position in the file's XMP metadata, regardless of its M4A or M4B extension. But no player I know of does that. They store playback positions in their own internal database.
This claim seems to originate from the fact that the old iPods only remembered the last played position on M4B files. But that's entirely a player convention, not a file format convention.
So it writes the timestamp to the file metadata? That would cause issues with syncing, backups, running from a read-only filesystem, etc. My audiobook app already keeps track of current timestamp for me.
The amount of time I spent on correcting wrong chapter marks (like the first four books of the Wheel of Time) is masochistic. But I am absolutely happy that I have the possibility of doing so after the conversion :)
Maybe you would like to try m4b-tool chapter-adjustment by musicbrainz id or silence-detection? :-)
Disclaimer: I'm the author - https://github.com/sandreas/m4b-tool
I once planned a chapter database (https://www.chapter-db.org) to collect work like yours, provide an online chapter editor, and bind an API to m4b-tool, but I did not have the time to finish the project.
Heh, I just started The Dragon Reborn on my 3rd read-through. I have noticed the chapters aren't right, but it hasn't really bothered me. What does your workflow look like when fixing these? Is there a way to share chapter corrections so others can apply them to their files?
I have the corresponding ebooks too, so I open up the whole audiobook as one file and guess where the chapters are from the waveform. I listen to the guess and compare to the ebook, checking whether I am too far or too early. Once I find the correct position (and I got quite good at spotting it from the waveform), I set a marker and start with the next chapter. In the end I split it along the markers.
I was planning on writing something to spot when they say "chapter" as it is always the same but I never got around to that. Also, doing all that work was almost meditative :)
A way to share the corrections would be to export the markers from audacity but sadly I don't have that data anymore, though I could calculate the markers from the files I exported if you are interested.
Well, if you own the epub, you could try to find out the whole length of the audiobook, then extract the whole text of the epub split by chapters, and then match the relative text length to the audio length and put the chapters where the nearest silence is (chapter 2 is at 3.3845% of the whole text, so seek for a silence around 3.3845% of the audio length).
I got some pretty good matches with m4b-tool here, while it does not work for all audio books (you need the latest pre-release for this very experimental undocumented feature!):
# try to match my-book.epub on my-audiobook.m4b
# ignore first, second and last two epub-chapters for the match (dedication etc.)
# split chapters into sub chapters to ensure they are between 5 and 15 minutes
# create a backup of the original chapters (done automatically)
m4b-tool chapters -v --epub=my-book.epub --epub-ignore-chapters=0,1,-1,-2 --max-chapter-length=300,900 "my-audiobook.m4b"
# omg it did not work and messed up all chapters, please restore the original chapters
m4b-tool chapters -v --epub-restore "my-audiobook.m4b"
# ok, let's only dump the findings in chapter.txt format to do it manually
m4b-tool chapters -v --epub-dump --epub=my-book.epub my-audiobook.m4b
Yeah that does sound like a lot of work. I appreciate the offer but like I said it hasn't bothered me much. I don't know that I've ever relied on chaptering for audiobooks, other than for breaking the book into smaller pieces to make the scrubbing less sensitive. My mental model is much more of a linear monolith.
I wanted to create a slideshow a couple weeks ago and came across this article [0] on creating a Ken Burns Effect Slideshow. Very cool and is a great demo of some of ffmpeg's functionality.
Just so you know, FFMPEG has a long-standing bug (reported 8 years ago) [1] that affects precisely the type of work you're doing, converting static images to video.
The gist of it is that if your static image is in bgr24 format (as most BMP images are), the color will be distorted when converting to a typical video pixel format (like yuv420p).
This can be worked around by converting to rgb24 first (which is exactly why this bug is bizarre, since the two should be practically identical.)
(There is also the BT601/BT709 conversion thing, but that's not a bug, just something that needs to be taken care of.)
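As a hedged illustration of the workaround (filenames and duration are made up; the format filter chain is the relevant part):

# force an rgb24 intermediate before the final yuv420p conversion
ffmpeg -loop 1 -i slide.bmp -vf "format=rgb24,format=yuv420p" -t 5 -c:v libx264 slide.mp4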
On many systems I work with and have to debug, gnuplot won't be already installed (and won't be installable on a system not connected to the Internet) so 'rich' would be 'flush in packages with a full system available'.
Sometimes even imagemagick isn't there, but rsvg-convert is; you can still do amazing things with just bash+curl+svg...
Apparently gnuplot is unrelated to GNU, and the original developers were making a pun on "newplot." Its license is a bit complicated, which is why most systems don't have it.
Or the IT security guys that expertly hand-picked all your available packages for minimum attack surface removed it (but left imagemagick in... after some time you stop asking some questions :-)
If you look at it as more of a "composable" interface (as the sibling poster suggested), it makes a lot more sense.
No one is going to type out this one-liner from scratch, or have an easy time understanding what it means by reading it, but as it's made up of a series of smaller, more easily understood commands, in a shell or Python script it could be vastly more legible and, dare I say it, usable.
This is also the reason why there are so many frontends to ffmpeg, to simplify various specific tasks. I can't count how many one-off apps I've seen that do one thing and do it well, and just ship a full copy of ffmpeg to do that one thing. Making an actual GUI for all of this would be just... insane, really, but it's so versatile and flexible that you can basically do anything with it.
The problem with calling ffmpeg multiple times in a script is you often waste compute time.
If you can manage to cobble together a single command (as illegible as that may be), you might be able to do what you're looking to do in far less time
I've been working on making "pre-baked recipes" for a while to help with simple tasks like cutting/merging a video [0]. I recently made an npm package for making time lapses with effects from the command line with FFmpeg as well [1].
I'd recommend looking at something like kdenlive; it generates the ffmpeg command and script for you. It's still complex and difficult to do some things, but it can be much nicer than trying to work out the command line interface for something you want to do. It's also nice because you can save the project at a higher level and reopen it later without having to do all the work of figuring things out again.
The syntax is very difficult to wrap your head around, but beyond that, it is the implicit behavior that makes it difficult to reason about. I find the short ffmpeg commands that take advantage of automatic stream selection and omit the inputs/outputs of filters (because ffmpeg can hook them up automatically) much harder to reason about than the fully written out, fully explicit ffmpeg command lines.
It almost seems like the best way to understand an ffmpeg complex filter graph is to actually draw a graph...
I actually made my own CLI frontend just because I didn't want to try and memorize ffmpeg options to do the simple things that I want to do most of the time. Now I can just do `--h264 -s X -e Y` to do a h.264 encode from X timecode to Y timecode.
Yes, because that doesn't actually work properly. In order to seek fast while having the ability to start encoding at any point without issues with keyframes, I need to actually do `ffmpeg -ss X1 -i FILE -ss X2 -t (Y-X)` where X1 + X2 = X.
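A rough sketch of that idea as a wrapper (variable names and the 30-second split are made up, not the actual frontend):

# coarse keyframe seek before the input for speed, fine decode seek after it for accuracy;
# coarse + fine equals the desired start time X (all values in seconds)
coarse=$((X - 30)); fine=30
ffmpeg -ss "$coarse" -i "$FILE" -ss "$fine" -t "$((Y - X))" -c:v libx264 -c:a aac clip.mp4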
That's awesome.
Every section of the command is simple, but taken as a whole, it is pretty thorny.
I find that MoviePy (https://zulko.github.io/moviepy/) is a great tool for more complex operations like the above. A lot of MoviePy's functionality is derived from an FFMPEG wrapper, but it is just easier to split things up into a small script.
Just did a little 1-day hackathon project where we built out a video presentation automation tool. One of those components would compile together a series of photos, video, and audio clips (with optional text to go with it for subtitle generation) and it would build everything out. The final command would look insane, there being a line for each added clip and any spacing between them, but it worked perfectly.
I have used ffmpeg. It's a damn good project, and under current development and support.
It also appears to be the only game in town. Many commercial offerings are really just veneers over custom ffmpeg implementations.
Tuning it is also pretty crazy. Some folks can make entire careers out of just tuning ffmpeg.
I think the biggest issue with video software (besides it being difficult and performance-intensive), is the prevalence of a lot of old, highly-enforceable patents.
Video has been around a while, and companies like Ampex patented a heck of a lot of stuff that can easily be applied to current video.
ffmpeg actually has a couple of build configs that are designed to remove coercive-licensed components.
I'm not so happy about that, but it's the world we live in.
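For reference, the opt-in happens at configure time; a minimal sketch (exact component lists vary by version):

./configure                    # default build: LGPL components only
./configure --enable-gpl       # opt in to GPL components (needed for e.g. libx264, libvidstab)
./configure --enable-nonfree   # opt in to non-redistributable components (e.g. libfdk-aac)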
I do have a project that I was playing around with (and will get back to, sooner or later), where I made a simple MacOS wrapper for ffmpeg:
I use FFMPEG as a thermostat to keep my apartment nice and warm :)
I encode all of my videos (phone, DSLR, dash cameras) to h265 on my Ryzen workstation when it's not in use.
I have a primitive "PID-controller" script pulling temperatures from influxdb (data collected using a few esp8266 with ds18b20 sensors) and adjusting -threads parameter accordingly. It automatically adjusts presets (slow, veryslow, placebo, etc) depending on number of videos in the queue, so it never runs out of material to encode :)
It saves me from using stinky baseboard heaters and reduces my HDD bills!
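The control loop is roughly this shape (a made-up sketch, not the actual script; get_room_temp stands in for the InfluxDB query and all thresholds are illustrative):

TARGET=21                         # desired room temperature in °C
TEMP=$(get_room_temp)             # placeholder for the InfluxDB/ESP8266 reading
if (( $(echo "$TEMP < $TARGET - 1" | bc -l) )); then
  THREADS=8                       # cold room: full blast
elif (( $(echo "$TEMP < $TARGET" | bc -l) )); then
  THREADS=4
else
  THREADS=1                       # warm enough: idle along
fi
ffmpeg -i "$NEXT_VIDEO" -c:v libx265 -preset slow -threads "$THREADS" "${NEXT_VIDEO%.*}_x265.mkv"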
Can someone explain to me how FFmpeg seems to be the only open-source software that can do even just basic functionality with audio?
I was looking at getting the sound wave graph for a piece of audio a while ago, and not only was FFmpeg the only option I found to be able to do it, it was amazingly fast and also free.
It’s not perfect but it’s way easier to use for audio stuff than FFmpeg is. I have a bunch of scripts I reuse that do basic stuff like high-pass, normalize, automatically trim audio files, add fade-in or fade-out, downmix to mono, and then resample / dither to the right depth and size.
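A hedged example of that kind of chain (filenames and values are illustrative, and the silence-trimming step is left out for brevity):

# high-pass at 80 Hz, normalize to -1 dBFS, short fade-in, downmix to mono,
# resample to 44.1 kHz and dither down to 16-bit
sox in.wav -b 16 out.wav highpass 80 norm -1 fade t 0.02 remix - rate 44100 dither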
It also will spit out spectrograms.
Generally when I need to record a ton of sound clips, I chop the audio up and rename it in a GUI editor similar to Audacity, and then do all the processing in SoX. I might also do a bunch of work in a DAW beforehand.
The man pages are chock full of examples too, which is great because the tool does a lot. Some of the examples are really interesting too, such as the delay effect showing how to synthesise a guitar chord.
I use an audio player built largely around sox¹, and it allows you to take advantage of the power of sox.
SoX is amazing because it indeed makes very nice spectrograms which visually show how audio is encoded. It makes it easy to see if this really is a lossless FLAC or a crappy 192 kbps VBR MP3 source.
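Generating one is a one-liner (illustrative filenames):

# render a spectrogram PNG without producing any audio output
sox suspect.flac -n spectrogram -o suspect.png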
If you personally hear the difference is a completely different subject of course.
I hadn't even thought about SoX in about 10 years, 'till your comment. And looking at the page, there hasn't been a new release since 2015.
From what I recall, it only worked on wav files back in the day, but now it supports OGG. But a lot has changed in even 5 years - does it even support MP3, as patents expired since then?
> From what I recall, it only worked on wav files back in the day
It depends on your build, but on my system it supports: 8svx aif aifc aiff aiffc al amb amr-nb amr-wb anb au avr awb caf cdda cdr cvs cvsd cvu dat dvms f32 f4 f64 f8 fap flac fssd gsm gsrt hcom htk ima ircam la lpc lpc10 lu mat mat4 mat5 maud mp2 mp3 nist ogg paf prc pvf raw s1 s16 s2 s24 s3 s32 s4 s8 sb sd2 sds sf sl sln smp snd sndfile sndr sndt sou sox sph sw txw u1 u16 u2 u24 u3 u32 u4 u8 ub ul uw vms voc vorbis vox w64 wav wavpcm wv wve xa xi. You can check your own with `sox --help`.
I just use SoX for processing audio data, and then pass the result to LAME if I want an MP3. Each format has so many different options for encoding and metadata anyway. It’s not like video, where the sheer amount of data discourages you from working uncompressed.
Sure, there hasn’t been a new release since 2015… but would that be necessary? It’s not missing any features I want.
It's not important that it doesn't support mp3. That's not it's purpose - it doesn't need to. The unix philosophy. Feel free to pipeline it on either side with tools that do support MP3.
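Something along these lines (illustrative names):

# process with SoX, stream WAV to stdout, and let LAME handle the MP3 encoding
sox song.flac -t wav - norm -1 | lame -V2 - song.mp3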
according to https://github.com/chirlu/sox/commit/af261dcc91071cafd7d8305..., sox added support for Ogg Vorbis files in 2001, which is a little more than 5 years ago. since sox didn't exist until 1999 and vorbis didn't exist until 2000, that seems like pretty solid format support to me.
In addition to what others have said, there's also gstreamer and its suite of plugins. I find gstreamer a bit easier to work with, although both are very complex pieces of software and each have their own quirks.
If you're looking for audio production work, there's Ardour, although I haven't used it myself. http://ardour.org/
Indeed it does; it's about as complex as ffmpeg, and in my opinion has a somewhat more intuitive interface for building up complicated pipelines of processing steps:
You can use gst_parse_launch to create a pipeline using the launch syntax.
I've found this helpful to prototype with gst-launch-1.0 and then pull into a separate program down the road. I found it to be pretty hairy trying to create and link all the individual elements manually in complex pipelines.
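For a feel of the launch syntax, a trivial self-contained example (everything here is illustrative):

# synthesize a short test clip: test pattern -> H.264 -> MP4
gst-launch-1.0 videotestsrc num-buffers=300 ! x264enc ! mp4mux ! filesink location=test.mp4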
My DAW is bash+sox+ecasound because I don't want to be distracted by visuals when working with audio. However, I just started working on a project involving about 15 hours of digital audio recorded under less than ideal circumstances a couple of decades ago and need a reliable way to analyze the data. SoX produces spectrograms that are insufficient for my needs and I've had reliability issues with Audacity. So far, DFasma looks very promising:
What ever happened to Facebook's (or was it Netflix?) technology to create a new unit of time measurement to help align audio and video files? I believe it was called a "flick"...
Audacity is a great GUI for working with audio files. I would think it has a way to export a graph of the wave that it shows you when you open up an audio file.
You can install an FFMPEG plugin for Audacity if you need broader support of audio formats (either import or export).
- You don't trust so much complex logic, taking untrusted input, written in C and want to rewrite it in Rust.
- You want to code it all again using an API that doesn't expect to get its input from a blocking read() function.
- ...
I think the main reason there isn't any alternative is that it supports soooo many formats that the task seems impossible to anybody thinking about it.
> - You want to code it all again using an API that doesn't expect to get its input from a blocking read() function.
In which real world situation/scenario is this a problem? It is hard to think of one, but I am probably missing something?
In any case, if that was a real show-stopper, it would probably be much wiser to go with a fork that would modify that one thing, instead of re-writing the whole project.
I could see it being an issue if you were doing a bunch of streaming transcodes, and wanted that in an event loop instead of blocking... but
a) you're probably going to want to control the number of simultaneous streams to a low enough number that you could just fork
b) the responsible thing to do when decoding streams with ffmpeg is to disable all formats except your whitelisted format, but still sandbox the heck out of it, because there's been a lot of CVEs where a crafted input allows remote code execution
Sandboxing is going to be much more complete if the ffmpeg process is only dealing with one input fd, one output fd (maybe an error reporting fd), and no network or filesystem access --- you don't want a decoder error to influence media you're encoding/decoding for another user.
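A build-time sketch of that kind of lockdown (component names are illustrative; enable only what you actually accept):

# decode-only ffmpeg that understands a single container/codec, talks only through pipes,
# and has networking compiled out entirely
./configure --disable-everything --disable-network \
    --enable-protocol=pipe \
    --enable-demuxer=matroska --enable-decoder=h264 \
    --enable-encoder=rawvideo --enable-muxer=rawvideo
# then run it purely on file descriptors, e.g.: ffmpeg -i pipe:0 -f rawvideo pipe:1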
As a little side project, I've been trying to automate creation of those "1 second everyday" style videos [1], and used FFmpeg to achieve this.
For things like trimming and concatenating videos, one thing that surprised me was that it was slower than using a tool like ScreenFlow. Note, we're talking about hundreds of gigabytes worth of 4K videos.
slower = when I say slower, I mean that if I manually performed the same operation in a professional video editing tool like ScreenFlow, the time it took ScreenFlow to export the video was shorter than the time it took FFmpeg to finish executing the command.
Interestingly, there seems to be a fast and a slow way to do things in FFmpeg [2]. The slow way is free of quirks, whereas the fast way introduces something unexpected to the video, like half a second of a black screen with the audio continuing to play as normal.
I'm still curious as to how a tool like ScreenFlow can achieve faster trimming/concatenation/subtitle overlaying, than FFmpeg. I suspect if I read their documentation and do some more research, I'll discover a more optimal way of ordering the various flags on the command line which can speed up the execution, while preserving accuracy.
Tangent: you reminded me of one of the coolest auditory experiences I've ever had. Roughly one-and-a-half decades ago I attended a public lecture by Olivier Nijs, a sound design guy from my region[0], about how he built an automated set-up from an old desktop to record one second of 7:00 in the morning every day. Then he manually cut together one whole year. The amazing thing about it was that after a few seconds the long-term trends really started to become noticeable. The changing sounds of birds, people and other living things. How rains in spring were somehow just a little different than the rains in summer or autumn. It was really, really amazing.
(the artist himself wasn't that impressed with his own work - perhaps he saw someone else do it before and didn't feel like showing off with something unoriginal or something?)
Most likely you are transcoding the video instead of copying the raw stream. A lot of the more complicated stuff requires that, but things like trimming can be done the fast way by simply cutting off the irrelevant pieces in the raw encoded data itself, which is much faster. It's kind of a sport to find the exact string of flags that has the correct effect without transcoding :)
The reason it often fails is that ffmpeg can do so many things that any time you are using some curious combination of flag A, B and C is likely that no one else has ever done that and there are some side effects ;)
Anyway, some of it can be avoided by learning how containers and codecs work, what I-frames are, and all the other nitty-gritty details of the world of video, where there is so much to learn!
Yes, copying the raw stream is the way to go if you're not rescaling, etc. This is the command to extract 15 seconds of video starting at the first minute, and not reencoding; it should be quite fast and it's also quite self-explanatory:
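For example, with illustrative filenames:

# seek to 1:00, keep 15 seconds, and stream-copy (no re-encode)
ffmpeg -ss 00:01:00 -i input.mp4 -t 15 -c copy output.mp4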
> The slow way is free of quirks, whereas the fast way introduces something unexpected to the video, like half a second of a black screen with the audio continuing to play as normal.
The reason here is that the "fast way" and the "slow way" work very differently behind the scenes.
The "fast way" looks only needs to look at the container, the bits of metadata that tell a player which bits of data need to be given to the decoder at which time. It can just take a blob of data and stick it in another blob of data without looking at the contents.
The "slow way" actually decodes the frames, that is, it takes the blobs of compressed data and turns them into actual pixels, which especially in the case of 4K video, is very slow.
The reason the "fast way" might be less accurate is that the frame you're asking for might not be possible to obtain without decoding the video. Modern video codecs have different kinds of frames and some frames depend on the frames before or after them. If you took such a frame and just jammed it into another video, things would break, because the other frames it refers to are missing.
> I'm still curious as to how a tool like ScreenFlow can achieve faster trimming/concatenation/subtitle overlaying, than FFmpeg.
It's likely that they have optimisations that FFmpeg doesn't or cannot have. FFmpeg has a bit of an emphasis on being able to play and handle pretty much anything you throw at it, no matter how broken. It could be that accurate input seeking is difficult while preserving that reliability.
One option that's probably not an option for you but you might consider is encoding your video in an all-intra format. This is fairly standard in the video editing world. All-intra means that all your frames are independent of each other and can be moved around by editing software without decoding anything. Doing this will result in larger files, however.
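If you go that route, a hedged example of such a conversion (ProRes here purely as one common all-intra choice, filenames illustrative):

# every frame becomes a keyframe, so editors can cut anywhere without decoding neighbours
ffmpeg -i source_4k.mp4 -c:v prores_ks -profile:v 3 -c:a pcm_s16le intermediate.mov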
> The "slow way" actually decodes the frames, that is, it takes the blobs of compressed data and turns them into actual pixels, which especially in the case of 4K video, is very slow.
The "slow way" decodes and re-encodes the frames. Decoding is a little slow. Encoding is very very slow if done at high quality. (And even high quality is still lossy.)
> The reason the "fast way" might be less accurate is that the frame you're asking for might not be possible to obtain without decoding the video.
This is often possible to solve with an .mp4 "edit list". You include more data than is expected to be displayed along with instructions for the player to skip part of it. One obvious caveat is that the person you send the video to can remove the edit list, so the hidden frames shouldn't be anything you want to redact for privacy.
It depends what you're doing, there are many different ways to cut a section out of a video. If you do it copying the streams (not reencoding anything) ffmpeg should do it at almost the speed that it can read and write to disk. However if you are reencoding the video, the speed will depend on every parameter controlling the encoder. If that's what ScreenFlow is doing, it's probably using hardware accelerated encoding too.
Did you configure FFMPEG to encode your video with the exact same encoder as ScreenFlow? If Screenflow is using hardware accelerated encoding and FFMPEG was configured for software x264 encoding, that might explain the discrepancy.
This is where GUI's shine, as opposed to command-line.
Perhaps this is slightly off-topic, but my dream is an interface that combines the best of both worlds.
Kind of an automated GUI-builder for command-line tools, that analyzes the combinations of options used most, breaks them down into workflows with options (that can be manually named), and you can thus execute one-off commands easily and quickly without having to hunt through man pages, but still export the command as a command-line incantation for reuse, to use in a script, etc.
I had a similar idea (if not the same), while learning about ffmpeg a year or two ago, and quickly put together a small prototype of it. At least, for the GUI command builder part of it.
Seems like you'd need a universal CLI tool usage traverser and parser to figure out what's possible.
Likely this would produce a decision tree of sorts with different modes and options excluding or including new options.
We'd need a way to show all this, maybe nested tabs for modes and check boxes and other inputs at each appropriate level.
Layer on this a way to optimize for the most common cases like you said.
Further, if this becomes popular, CLI tools could emit some sort of standard description language that would optionally customize the GUI.
The GUI's output should be both the text command it constructed and the ability to run that command directly.
More future steps would be a way to reason about multiple commands, pipes, and other combinators.
Adding this to my side projects backlog. Thanks for the idea!
In the emacs world, there are textual interfaces like magit's [0] or dired's [1] interface, that tick many, perhaps most, boxes that you mention. Magit is basically that interface, but tailored to `git`, that also constructs the actual git command, should you want to see it. Dired is like that for ls, rm, cp, mv and other file utils. So, as a general design, they _may_ be of useful interest.
A tool to automatically parse `man` pages or help prompts from tools would be a dream come true, basically.
Apart from the possible commands, it may be useful for the GUI to also show some kind of state, for example filesize (akin to invoking `ls` before `ffmpeg`, as you would normally do on the CLI).
Done with the right abstractions, command combinations should come almost for free.
It's fine for simple transcoding jobs. Once it reaches certain complexity you at the very minimum want to outsource it to a shell script to organize the arguments in multiple lines or switch to some other language bindings that aren't limited by shell argument parsing.
Several years ago I was attempting to automate a very long chain of AV transformations using ffmpeg and such. I distinctly remember having a dream of a green-text black window command line prompt of ffmpeg incantations and that is when I realized that I had been perhaps digging too deep.
You probably want "-f bestaudio" instead of "--extract-audio". The former will download just the audio and skip the video which is significantly faster (and cheaper when on a metered connection). However you will have to do the conversion to MP3 yourself afterwards (e.g. with ffmpeg) if you want that format specifically.
These days, with AAC being supported pretty much everywhere, you're probably better off just using the "-f 140" option to get the 128 kbps m4a file, which at least saves the further (albeit subtle) degradation caused by another lossy to lossy transcode.
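For illustration (URL and filenames are placeholders):

# grab the best audio-only stream, then transcode to MP3 yourself if you really need MP3
youtube-dl -f bestaudio "https://www.youtube.com/watch?v=XXXXXXXXXXX"
ffmpeg -i downloaded_audio.webm -c:a libmp3lame -q:a 2 audio.mp3
# or take format 140 (128 kbps AAC in an m4a) and skip the extra lossy transcode
youtube-dl -f 140 "https://www.youtube.com/watch?v=XXXXXXXXXXX"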
`ffmpeg` reminds me of `convert` from ImageMagick, it's complex because there's a lot of media formats and the command lets you do a lot things to the media as well.
I liked using WinFF a while ago for simple conversion jobs though, where I didn't care too much about tweaking all the knobs.
Is WinFF still around in some capacity? The website doesn't load for me. Too bad if it's abandoned, because I remember it being pretty amazing.
EDIT: From Big Matt's blog:
> Unfortunately I lost the WinFf.org domain. I was broke but I don't think it's that important after all these many years. You can still go to github, video help, and others.
A plain `ffmpeg -i input output` works as well, but there's just _so_ many more options inherently involved in converting a video: most inputs and outputs will be at least 2 tracks, so that's 2 codecs, and the majority of containers used now (MP4, MKV, etc.) support all sorts of codecs, so while ffmpeg will "guess" what you want just like ImageMagick does in your short example, the chances that it guesses right are a lot lower just on the basic level of what format you wanted your output to be. But it's not exactly ffmpeg's fault.
I think gstreamer's gst-launch syntax is a bit easier to read and write, although it has a different set of capabilities than ffmpeg, so may not be a whole replacement for what you want to do.
ffmpeg can now extract data from the .tun & .pcm file formats (but it might not be able to decode it yet, depending on the encoding used in such files).
With so much video processing research using neural networks now, I hope ffmpeg gets better support for it. They have some filters that use them, like sr[0] for super resolution and dnn_processing for general processing, but the user experience isn't great. They need a model file that's not included, and there don't seem to be any pretrained ones shipped, so you have to train one yourself. Hopefully they add better support in the future, together with more dnn filters.
ffmpeg is fantastic. I'm an iOS dev and when I put something on Github, I first record the iOS simulator with QuickTime. Then I convert the resulting .mov file with ffmpeg:
ffmpeg -i example.mov -r 15 example.gif
Voilà, an animated gif. Quality is atrocious but it gets the message across, plus the filesize is not too big.
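If the quality bothers you, a palette-based variant usually looks much better (at the cost of a bigger file); something like:

# generate a per-clip palette and use it for the gif in one pass
ffmpeg -i example.mov -filter_complex "fps=15,split[a][b];[a]palettegen[p];[b][p]paletteuse" example.gif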
FFmpeg is hands down one of the most powerful and feature-packed tools I've used out there.[0] The associated complexity is also daunting, but thankfully there's a lot of documentation out there and it reflects the low level nuances of audiovisual formats.
I highly recommend anyone struggling to utilize it to write wrapper scripts around it so you only need to figure out things once. Here are some things I've done with it by that approach:
* Extracting any embedded subtitle files from MKVs. Nice if I want to search them or make changes.
* Back when GIFs were more popular, I converted any that were over 3MB to video to save space. If the output wasn't small enough, it would do a second pass with different settings to get it more compact. Not needed that much these days.
* "Barcodes" for videos, that is, it takes every second converted to a vertical sliver and combined you get an overview of how the average color of the film changes through its duration.
* A tool for creating video excerpts that lets me specify a start time and end time in more flexible timestamp formatting, and other things like a simple parameter for the output width.[1] It also allowed specifying a target filesize and did the math so the right bitrate would be chosen. I even include metadata so I know which original file it was made from and the parameters specified.
* Thumbnail previews. A lot of file sharing sites will include a file that includes some timestamped screenshots in a grid with encoding information at the top. This is good for movies so you see a high-level overview. The best part about doing this myself is that I could make it highly configurable, like choosing exactly how many images I want, the interval, whether I want timestamps, etc.
Note, for some of these, I also needed ImageMagick.
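For the barcode one, the rough shape of it is something like this (numbers and paths are illustrative; ImageMagick's montage does the final tiling):

# one frame per second, each squeezed into a 1-pixel-wide sliver
ffmpeg -i film.mkv -vf "fps=1,scale=1:240" slivers/%05d.png
# tile the slivers left to right into a single strip
montage slivers/*.png -tile x1 -geometry +0+0 barcode.png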
Also, when compiled with the right flags and libraries, FFmpeg has some really neat features: things like embedding subtitles, stabilizing video, hiding logos, etc. I recommend looking into the filters.
Thank you for all the manpower that goes into the project!
[0]: Two other ones that are also powerful are ImageMagick and Pandoc.
[1]: I initially wrote this in Bash, but later converted the code to Python to better handle command line arguments and allow things like using config files.
ffmpeg is absolutely phenomenal. I recently used it to combine multiple separate audio tracks from a webrtc session into a single file.
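A command along these lines can do that kind of mix (filenames illustrative; the amix filter does the work):

ffmpeg -i alice.opus -i bob.opus -i carol.opus -filter_complex "amix=inputs=3:duration=longest" mixed.wav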
For anyone that hates compiling ffmpeg from source, John Van Sickle does an amazing job of doing the work for you by making binaries publicly available for each version:
https://johnvansickle.com/ffmpeg/
I rely on those static builds and they are a lifesaver. Careful with linking to this website from automation scripts, etc., as the downloads are regularly swapped and only certain versions (IIRC latest in a point release) are kept available. E.g., 4.3.0 will be taken down once 4.3.1 is available; however, 4.3.1 might be permanently moved to the archive if it ends up being the last 4.3.x release before 4.4.0.
Great point. What I've done in the past is have a Jenkins job that points at the latest release and trigger the job manually when you know you want to upgrade to the latest version.
I don't usually rely on a specific version and am fine running a release (or five) behind. If you don't need the latest and greatest pointing to an archived old release may save you some hassle.
Good resource, but I usually end up building my own to take advantage of the improved AAC encoding with libfdk-aac which can't be distributed pre-compiled.
If anyone is compiling their own version for use with IP video streams, here is a modification that adds to the av_read_frame() function a call to the avformat interrupt callback.
This effectively adds the ability to monitor the IP stream for unexpected termination.
The current implementation of av_read_frame() will hang if, for example, a human trips over a camera's cables and the stream abruptly terminates. Without modification to the API, this change to av_read_frame() calls the avformat interrupt callback each loop through av_read_frame()'s reading of packets. All the callback needs to do is look at the time, and signal error if the time between callbacks exceeds something reasonable.
I am not sure why, but this change was not accepted by the ffmpeg developers. I find it essential for working with IP video and IP video cameras.
I've been an ffmpeg user for something like 15 years now. The project never ceases to amaze :heart:
I remember using it to encode MPEG-2 DV footage to FLV (yes, flash video) to live-stream footage captured over firewire from early prosumer HD cameras :) It's always been a solid video swiss-army knife!
FFmpeg is one of the prime examples of open-source software, but a lot depends on the 3rd party libraries it uses. I tried using my Android phone to convert my videos from H.264 to HEVC (H.265) for storage, but it was ~3.5 times slower in comparison to x86 CPUs, in terms of frames encoded per second per watts of power consumed (FPS/Watt): https://quanticdev.com/articles/h265-encoding-on-arm-cpus
Though still, I can't even imagine attempting this test without FFmpeg in the first place. It is available directly in Termux on Android.
FFmpeg started out as frustrating for me but the more I use it the more I love it. The ability to split videos into smaller segments and applying different filters to each segment and finally combining the segments is just great.
I've been working on a web-based video editor for app features: https://glitter.now.sh/ and i've had tons of fun tweaking FFmpeg.
My only wish would be that the documentation would include video samples for the example commands (I'd love to help with this).
I use ffmpeg to run a fun twitch stream VOD to highlight reel pipeline. Example with explicit language: https://www.youtube.com/watch?v=ETR3IXyGgEo.
ffmpeg handles everything: frame extraction (to feed a deep learning model), audio spectrogram calculation (also features for the model), video trimming (to cut the interesting clips), video concatenation (to join the clips), and the text overlay.
It's just a standard vision convnet like ResNet-18, or ResNet-50. It gets fed facecam with an audio spectrogram concatenated to it (pretty hacky, but seems to help). All it does is binary prediction of {interesting, not interesting}, and I use some heuristics to pick regions of video based on how many frames were "interesting" to the model.
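The ffmpeg side of a pipeline like that tends to boil down to a handful of one-liners, roughly (paths, rates and timestamps are illustrative):

# frames for the model, one per second
ffmpeg -i vod.mp4 -vf fps=1 frames/%06d.jpg
# cut an interesting region without re-encoding, then join the clips
ffmpeg -ss 01:23:45 -i vod.mp4 -t 20 -c copy clips/clip_001.mp4
ffmpeg -f concat -safe 0 -i clips.txt -c copy highlights.mp4   # clips.txt lists lines like: file 'clips/clip_001.mp4'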
Huge thank you to FFmpeg for making my Video Hub App possible: extracting screenshots from all videos in your video collection to create an easy to browse/search library.
Good way to describe it. A "Hub" is all the videos in a folder and all its subfolders. The demo will be 100% the same as the final app but with a 50 videos per hub restriction.
You can build your own (without 50 videos limitation) with just `npm install` and `npm run electron:windows` (or `mac`, or `linux`).
If you choose to buy, it's $3.50 minimum - you can pay more. $3.50 of every purchase goes to Against Malaria Foundation
Does anyone have a better documentation source for modern ffmpeg? Usually when I use it I get all sorts of different answers with different 'methods', and the official docs only confuse it more.
Also I hope to see pure GPU transcoding sometime. H264 to h265 transcodes in pure GPU space are Uber fast, but so far only done by other software.
It doesn't work pure GPU for transcoding between formats. You can do h264 to h264, but not h264 to h265, which can be done with other software if the hardware supports it.
What is wrong with these people? Why is "libvidstab" always disabled? Are they trying to save valuable disk space? -- Libvidstab works quite well, and better than anything else readily available on Linux.
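For anyone who hasn't used it, the usual two-pass invocation looks roughly like this (needs an ffmpeg built with --enable-libvidstab; filenames illustrative):

# pass 1: analyse the shake and write the transforms file
ffmpeg -i shaky.mp4 -vf vidstabdetect=result=transforms.trf -f null -
# pass 2: apply the transforms, plus a little sharpening as is commonly recommended
ffmpeg -i shaky.mp4 -vf vidstabtransform=input=transforms.trf,unsharp=5:5:0.8:3:3:0.4 stable.mp4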
My favorite memory of FFmpeg involved wowing a friend with the "magic of computer hacking" by slicing a subsection of a youtube video, cutting the source audio, and replacing it with audio from a song he had discovered "fit perfectly". Never mind that the entire process took about 20 minutes of googling and trial/error with the Windows Subsystem for Linux. It was better than the native windows alternative :P
nice, I’ve been toying with it recently because I wanted to get some video on the Apple TV I’ve got connected to my TV.
In the process I was reminded of how important Intellectual Property is in our field.
I downloaded a video from youtube [0] and learned that the video is in a WEBM container for certain types of video compression formats (VP8/9 + another one) as well as vorbis/opus audio. It turns out that to get the best quality on the Apple TV, I should encode to HEVC (H265) video and I guess aac audio.
There’s some sort of history behind this divide. A pain in the neck for people who want to toy around with video, but huge decisions for these companies in choosing the formats they use to move these bits around.
So I can use ffmpeg to re-encode into the new format, and I can play it on my apple devices if the file is local, but I can’t shoot it via UDP to the TV. When VLC (app on apple tv) is listening on a port for UDP I only get choppy audio:
Doing this on the sending side doesn’t seem to work:
We recently started building a video player for go-flutter-desktop using FFmpeg. It's been immensely frustrating as the audio and video is slightly out of sync and requires a bit of tuning.
I agree with the shared sentiment that FFmpeg is awesome.
It's the only way I can get mkv converted for playback in browsers. I believe that browsers don't support mkv/aac natively because of licensing but I would be interested if anyone has a different solution for browser playback.
Love me some ffmpeg, but would be nice to have some bullet point summaries of updates in the release notes instead of pointing to the changelog dump. I know there is some information loss, yadda yadda
I've been feeling stupid these past few days because I've been using ffmpeg for the first time. I've been trying to take an rtsp stream from a security camera and transcode it, restream it, to something that can be viewed in an html video tag. It works, with poor quality, in Firefox, but in Chrome just a green screen. (I also tried using vlc/cvlc's streaming). I've been feeling stupid because of all the copy/pasting from the web into the command line without understanding what I was doing.
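One commonly suggested shape for that, as a hedged sketch rather than a guaranteed fix (camera URL and segment settings are illustrative), is to repackage the RTSP stream as HLS and serve the playlist to the page, with a small JS player such as hls.js where the browser lacks native HLS support:

ffmpeg -rtsp_transport tcp -i "rtsp://camera.local/stream1" \
    -c:v libx264 -preset veryfast -g 50 -c:a aac \
    -f hls -hls_time 2 -hls_list_size 6 -hls_flags delete_segments stream.m3u8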
Does anyone have a good installation guide, which includes all the stuff that isn't supported in the default installation? ffmpeg is great, but a lot of times I go to use it and find a feature isn't supported because I didn't compile it with a flag set.
ffmpeg is a great tool. Once I needed to do some basic video editing stuff like cuts, a fixed logo, and some background sound on a low-resource computer; I quickly downloaded ffmpeg and the work was finished in 10 minutes with a few Google searches, without any load on the system.
I especially use yd and ypl constantly. Sites where youtube-dl doesn't work, this usually does: get the master m3u8 or an mp4 link from the page using Developer tools->Network[1], copy the link to the clipboard, and
vd myfile - downloads the video as myfile.mp4
vd () { youtube-dl -o "$1.mp4" "$(pbpaste)" ; }
[0] Save them in your .bash_profile or equivalent on your machine.
[1] i.e. with Network tab open, refresh page and start video playing.
Just a reminder: depending on your installation method of youtube-dl, you may have to manually install ffmpeg globally on your system in order to download/mux the highest quality videos.
And for good reason. Insane amount of customization, and once you finally have pieced together your 10-line command with 40 parameters, you can just give it a huge list of URLs and let it do all the work in the background.
This is useful if for some reason you want to archive them or play them in an app that doesn't constantly change its UI and bombard you with ads.
[0]: https://www.kylepiira.com/2019/05/12/how-to-break-audible-dr...