The companies that caused the HEVC patent mess had it coming.
So I understand AV1 is great for streaming content when you have a lot of encoder computing resources.
What about other video codec application areas?
Like real-time peer-to-peer low-latency streaming (<50 ms codec latency)? For remote desktops, remote gaming, video conferencing, etc. What rules the roost in that department?
More specifically, which codec would give the best image quality for low-latency 400-4000 kbit/s (~50-500 kB/s) 720p/1080p streaming across different computing resources (say: an ARM Cortex-A53, high-end ARM (flagship phone SoCs), a mobile GPU, a modern x86 chip, and a discrete GPU)? H.265 with B-frames disabled? Or something else?
What can achieve the lowest latency while maintaining low bandwidth requirements? (So not including obvious bandwidth-hogging techniques like streaming JPEGs or [codec x] I-frames.)
Streaming of computer screens is quite different from real-life footage. UIs contain very shallow gradients (so you want 10-bit+), lots of flat areas that are easy to entropy-encode without any frequency-domain quantization at all (so a modern codec with pass-through blocks helps; toy illustration below), and font rendering engines use subpixel anti-aliasing, so you want 4:4:4 chroma. And there's that annoying effect where image quality refines gradually after large-scale changes on screen (not enough bits allocated to the I-frame?).
Screencasting is basically quite a different beast from live-action or even animation footage. At a minimum you need the higher profiles of a modern codec, plus encoder settings tweaked for low latency AND a different bit allocation than usual.
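To see why those flat areas and shallow gradients are so cheap, here is a toy comparison (assuming Python with numpy installed; zlib stands in for a real entropy coder) of how a flat block, a shallow-gradient block, and a camera-noise-like block compress losslessly:

    import zlib
    import numpy as np

    rng = np.random.default_rng(0)

    flat = np.full((64, 64), 200, dtype=np.uint8)                # solid UI background
    gradient = np.tile(np.arange(64, dtype=np.uint8), (64, 1))   # shallow gradient
    noisy = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)  # sensor-noise-like block

    for name, block in [("flat", flat), ("gradient", gradient), ("noisy", noisy)]:
        print(name, len(zlib.compress(block.tobytes())), "bytes from", block.nbytes)

The flat and gradient blocks shrink to a handful of bytes while the noisy one barely compresses at all, which is why lossless/pass-through block modes pay off so much on UI content.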
> Streaming of computer screens is quite different from real-life footage. UIs contain very shallow gradients (so you want 10-bit+), lots of flat areas that are easy to entropy-encode without any frequency-domain quantization at all...
Well, you can get pretty good results with H.264 without B-frames. Modern composited desktops have made traditional changed-block lossless codecs (RLE, LZ) perform badly.
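As a rough sketch (not a tuned recommendation), a low-latency H.264 desktop stream along those lines might look like this, assuming an ffmpeg build with x11grab and libx264 on the PATH; the capture source, bitrate and output URL are placeholders:

    import subprocess

    cmd = [
        "ffmpeg",
        "-f", "x11grab", "-framerate", "30", "-i", ":0.0",  # grab the desktop
        "-c:v", "libx264",
        "-preset", "veryfast",
        "-tune", "zerolatency",   # disables lookahead/frame buffering in x264
        "-bf", "0",               # no B-frames, so no reordering delay
        "-b:v", "2500k", "-maxrate", "2500k", "-bufsize", "500k",
        "-pix_fmt", "yuv444p",    # keep full chroma for text, if your decoder handles Hi444PP
        "-f", "mpegts", "udp://127.0.0.1:5000",
    ]
    subprocess.run(cmd, check=True)

-tune zerolatency already disables B-frames in x264, but spelling out -bf 0 makes the intent obvious.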
> font rendering engines use subpixel anti-aliasing so you want 4:4:4 chroma
I think you can just gradually stream more (chroma) bits to stationary areas containing high-detail (high-frequency) information, like text. In other words, tunable bit allocation between motion and stationary detail (a rough sketch of the idea is below).
No matter what, there's no way to know the subpixel layout on the receiving end. Luckily, Retina/HiDPI displays are making subpixel rendering less and less important.
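Purely as a toy illustration of that "more bits for stationary detail" idea (not how any shipping encoder actually does it), here is a sketch that flags blocks that haven't changed since the previous frame and gives them a negative QP offset, i.e. finer quantization; numpy, the block size and the thresholds are all assumptions:

    import numpy as np

    def static_block_qp_offsets(prev, curr, block=16, threshold=2.0, qp_bonus=-6):
        # Negative offset (finer quantization) for blocks whose mean absolute
        # difference to the previous frame is below the threshold.
        h, w = curr.shape
        offsets = np.zeros((h // block, w // block), dtype=np.int8)
        for by in range(h // block):
            for bx in range(w // block):
                ys, xs = by * block, bx * block
                diff = np.abs(curr[ys:ys + block, xs:xs + block].astype(np.int16)
                              - prev[ys:ys + block, xs:xs + block].astype(np.int16))
                if diff.mean() < threshold:
                    offsets[by, bx] = qp_bonus  # stationary: spend more bits here
        return offsets

    # Mostly static 1080p luma frame with one moving region:
    prev = np.zeros((1080, 1920), dtype=np.uint8)
    curr = prev.copy()
    curr[100:200, 100:300] = 50
    offsets = static_block_qp_offsets(prev, curr)
    print("blocks boosted:", int((offsets < 0).sum()), "of", offsets.size)

A real encoder would feed a map like this into its adaptive-quantization/delta-QP machinery rather than running a post-hoc loop, and would need some hysteresis so quality doesn't visibly pop when a region stops moving.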
The thing is that 4:2:0 chroma is the default in most codecs because the main profiles only support that. You couldn't stream full-chroma video on Twitch even if you wanted to.
And subsampled chroma looks terrible on colored text, on edges between flat colored areas, and similar things.[0] So 4:4:4 is a must; only once you have that can you think about adaptive quantization for chroma.
I agree with that, but 4:4:4 also means there's twice as much data (rough numbers in the sketch below).
I think you should be able to get away with chroma subsampling for the moving portions while allocating bits in the stream to stationary areas without subsampling (using some heuristics, perhaps).
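For the "twice as much data" part, the factor of two is about raw samples; a quick back-of-the-envelope (plain Python) for a 1080p frame:

    w, h = 1920, 1080
    luma = w * h
    chroma_420 = 2 * (w // 2) * (h // 2)   # two chroma planes at quarter resolution
    chroma_444 = 2 * w * h                 # two chroma planes at full resolution

    print("4:2:0 samples/frame:", luma + chroma_420)   # 1.5 samples per pixel
    print("4:4:4 samples/frame:", luma + chroma_444)   # 3.0 samples per pixel
    print("raw ratio:", (luma + chroma_444) / (luma + chroma_420))  # 2.0

How much of that doubling survives into the compressed bitstream is a different question (see the reply below about chroma residuals).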
Chroma from luma happens to work particularly well for video game content. It's one of the reasons Twitch is interested in using AV1 as soon as possible. Here's a talk by Thomas Daede where he discusses chroma from luma:
4:4:4 doesn't mean twice the compressed bitrate, however. First, the chroma residual already takes a minority of the bits; the luma residual and prediction info take far more, and those are unchanged going to 4:4:4. Second, quantizing chroma more heavily so it uses the same number of bits as it would have after subsampling is approximately equivalent to subsampling chroma in the first place (okay, it's actually slightly worse, so you need maybe 1% greater bitrate to match 4:2:0 quality with modern codecs on sequences for which 4:2:0 subsampling is transparent).
> So I understand AV1 is great for streaming content when you have a lot of encoder computing resources.
> What about other video codec application areas?
Having lots of compute means you can run the algorithm in a software encoder on general-purpose hardware. This is slow/expensive, but it means you can deploy the encoder relatively quickly - say, weeks - to start getting benefits of saved bandwidth.
If you need to do encoding in a low-latency/real-time or low-power environment, basically you need to use hardware accelerated encoding/decoding. That means waiting years (sometimes decades, depending on the industry -- say, broadcast TV) to deploy the new codec. First the bitstream has to be frozen, then the hardware folks need to develop a hardware implementation, and then enough users need to buy devices with a new enough GPU or chip.
The AV1 encoder has a latency-allowance parameter which you can set to zero, something like av1enc ... lag-in-frames=0.
Obviously H.265 has the same low-latency option as H.264, in the form of not using B-frames (a rough flag comparison is sketched below).
The question remains, which one performs the best?
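For concreteness, here is a rough sketch of comparable low-latency flags for the three encoders through ffmpeg (assuming a build with libaom-av1, libx265 and libx264; the numbers are illustrative, not tuned recommendations):

    low_latency_flags = {
        "libaom-av1": ["-usage", "realtime", "-cpu-used", "8", "-lag-in-frames", "0"],
        "libx265":    ["-preset", "ultrafast", "-tune", "zerolatency", "-x265-params", "bframes=0"],
        "libx264":    ["-preset", "veryfast", "-tune", "zerolatency", "-bf", "0"],
    }

    def ffmpeg_cmd(encoder, bitrate="2000k", src="input.y4m", dst="out.ts"):
        # Assemble a single-pass, low-latency ffmpeg command line for one encoder.
        return (["ffmpeg", "-i", src, "-c:v", encoder]
                + low_latency_flags[encoder]
                + ["-b:v", bitrate, "-f", "mpegts", dst])

    for enc in low_latency_flags:
        print(" ".join(ffmpeg_cmd(enc)))

Which of those actually wins at a given bitrate and CPU budget is exactly the open question; the flags only make the comparison apples-to-apples on the latency side.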
Is AV1 with lag-in-frames=0 actually usable in practical applications on currently available hardware? How much does quality suffer from not being able to predict from future frames, compared to other codecs?
Anyway, I really like the ability to specify the balance between latency and quality in AV1 and VP9.
The referenced blog post by the founder/chairman of MPEG is worth reading.[1] In it, Chiariglione expresses regret not for the money and control his organization is losing, but for the loss of competition which he claims will slow innovation. It will be interesting to see if this comes to pass, but my hope is that the collaboration of these companies will outpace the historically siloed research consisting of a dozen companies duplicating work.
HEVC has been such a pain for me. You need to run High Sierra to use it; it's not just a plug-in for the Mac but a whole operating-system upgrade. Then GoPro switched to it, and as a moderator of the GoPro subreddit I've seen it cause so many support questions about why people can't edit the video they took. Yes, there are benefits, but man, it's been such a pain.
The article mentions that AV1 is slow. I seem to remember that H.264 was painfully slow in the beginning. Is this simply a matter of waiting for hardware to add support for AV1? Or does someone need to sit down at the spinning wheel and write some assembly?
Yes, the H.264 and HEVC reference implementations are also terribly slow; alternative implementations with performance optimizations, different tradeoffs and additional encoding features were developed over long time spans.
The bitstream formats provide enough flexibility for the encoders to go beyond what the reference implementation does.
Software encoders provide the best quality, hardware encoders are kind of meh but do their job in a low power envelope.
Well, yes, the reference implementations are just that: reference implementations. But the difference between the H.264/HEVC reference encoders and, say, the x264/x265 encoders was only on the order of 10-50x.
Probably a bit of both. H.264 hardware encoding is generally available, but it's more used for real-time communication (Skype, etc) or places where software encoding isn't viable (mobile devices with power limitations) than for reference-quality encodes.
HEVC is starting to get some usage in pirate distributions. Most TV and Movies are released with H.264 video. But some releases are also coming out in HEVC / x265. The files are typically 1/3 to 1/4 the size for the same visual quality.
The pirating community appears to have embraced HEVC, however, and they can have a lot of influence in determining which codecs/containers get supported on playback devices like game consoles, phones and TVs.