I'm more looking forward to future releases. The current version is mostly a bitstream freeze, and it is anywhere from a few hundred to a few thousand times slower at encoding. While the current version isn't optimized at all, I wonder how long it would take them to get to, say, within 5x the speed of x265.
In other news, we have xvc, which is taking most of the ideas proposed for H.266 into its implementation. And it is already showing 30% better compression than AV1.
Note: AOMedia has also updated its website.[1] One thing I noticed is that Apple is the only member not displaying its logo on the members page, which feels a little strange.
> I'm more looking forward to future releases. The current version is mostly a bitstream freeze.
It's not even a bitstream freeze. This 'release' was put out by the marketing folks and wasn't even discussed with people on the AOM list (I'm part of AOM via VideoLAN). The bitstream remains under development.
Near as I can tell this is just a PR piece before NAB.
Oh Christ. I thought they were moving fast; only last week they said they still had a few bugs left to fix. (And only last month they said they were still listening to hardware makers' suggestions for changes to improve decoding speed.)
Thanks for pointing this out.
This is one reason why I don't like / trust the folks from On2 / VP8. It's nice that AOMedia now has people like you to keep them honest and humble.
There is a concerted effort on the AOM list to finish and close all bugs or features that require normative bitstream changes, so I expect it won't be too much longer. The number of remaining issues is smallish, but not zero.
The involvement of hardware people has been a boon, though tough for software people at times :).
Oh no :( I monitor the AV1 bugs and repos and was indeed puzzled to see this. So it hasn't actually been released. OMG, this is really bad and could backfire.
Even following the things above, it can be pretty hard to see what's going on.
EDIT: the mailing list is not accessible; had some bits flipped in my wetware
I figured everyone understood this, but you are right, it bears spelling out:
Any modern [video] compression format is extremely flexible and allows great liberty to the compressor (whereas the decompressor is strictly defined). This means there is ample room to improve the quality and efficiency of the compressor for years to come.
In other words, AV1, the spec, isn't some magical unicorn. It's the sandbox in which the 'corns can be raised.
My personal analogy for this is that video encoding standards are basically toolkits for building houses, while encoding software is like a robot trained to build those houses using the tools in question. New standards bring new tools to the toolkit, but training the robots to use them effectively takes time, and new robots are often even worse than the previous ones in the beginning, simply because the older robots have become so proficient with their respective toolkits. x264 in particular is basically the top robot in this department, and the standard any new entrant should aim to beat both in quality (not necessarily too hard) and in time efficiency (the much bigger challenge).
I don't think time efficiency is the primary concern for the big encoders like Netflix and YouTube. They're more interested in lowering the bitrate while maintaining image quality and are willing to throw compute resources at slower encoders which achieve that. Netflix, for example, does multiple encodes per scene in multiple formats to get the best possible quality for each format for each scene.
VP9 outperforms H.264 (libvpx versus x264) for Netflix's use case. Here are some articles from the Netflix TechBlog on their experiences with VP9 and their encoding approach in general:
You're right, time efficiency certainly isn't as big of a concern for companies with effectively infinite resources to throw at encoding, but the rest of us probably want to finish our encodes within this decade :) (the reference AV1 encoder is extremely slow)
It is a concern even for Netflix. The numbers only start to make sense assuming a 30% bitrate reduction at roughly 5x the encoding time, which means the current encoder needs to speed up anywhere from 50x to 500x to get to that point.
Imagine Netflix spends $1M on encoding: even 5x is $4M more, and the current encoder would be anywhere from $250M to $2.5B more. That is not a small sum of money.
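To make that arithmetic concrete, here's a quick back-of-envelope sketch (the $1M baseline is the hypothetical figure above, and the slowdown factors are illustrative):

```python
# Back-of-envelope: extra encoding spend at various slowdown factors,
# assuming compute cost scales linearly with encoding time.
baseline = 1_000_000  # hypothetical current annual encoding spend, USD

for slowdown in (5, 250, 2500):
    extra = baseline * (slowdown - 1)
    print(f"{slowdown:>4}x slower -> ${extra:,} extra")

#    5x slower -> $4,000,000 extra
#  250x slower -> $249,000,000 extra
# 2500x slower -> $2,499,000,000 extra
```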
I think before they talk about speedups, they need to iron out all the bugs and actually freeze the bitstream.
Netflix have said they'll roll it out once it's below 5x slower.
But they can start with only the most popular videos to get the most bang for the buck. They'll also likely target geographic areas with low bandwidth but decently specced desktops, which means they're getting a return on investment by increasing reach, not just saving bandwidth.
So basically, there are plenty of niches where it makes sense almost as soon as the spec is frozen, and they can expand the rollout as it proves itself and speeds up.
Time efficiency is still important for live streaming. Last time I checked (OK, that was a year ago), x265 was way too slow for live, even at not-so-high resolutions.
Compressing a video looks like a huge decision tree to me, and newer standards add more interdependencies and more variables at each stage. I'd expect the resulting optimization problems approximated by encoders to be computationally difficult. (Which would explain why encoders keep getting better even after many years.)
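As a rough illustration of what a single node of that decision tree looks like, here's a minimal sketch of the classic Lagrangian rate-distortion decision encoders make over and over; the mode names and numbers are made up for illustration, not taken from any real encoder:

```python
# One node of an encoder's search: pick the coding choice that minimizes
# the Lagrangian cost D + lambda * R (distortion plus weighted bit cost).

def rd_cost(distortion, rate_bits, lam):
    return distortion + lam * rate_bits

# Hypothetical candidates for one block: (mode, distortion, bits to code it).
candidates = [
    ("split_into_4_subblocks", 120.0, 95),  # recurses in a real encoder
    ("intra_dc_64x64",         400.0, 12),
    ("inter_skip",             650.0,  2),
]

lam = 8.0  # in real encoders, lambda is derived from the quantizer
best = min(candidates, key=lambda c: rd_cost(c[1], c[2], lam))
print("chosen:", best[0])
```

Each "split" candidate spawns the same decision for its sub-blocks, so every new block size, prediction mode, or transform a standard adds multiplies the search space.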
Potentially unrelated but having run events with Apple as a sponsor in the past, they are especially stringent on how/if they allow you to use their logo or branding. We were also forced to go this route.
It is, if you think about it from a patents and licensing perspective. However, I'll try to put another spin on it, which might be something they're trying to do without actually saying it.
They implement all the H.266 JVET ideas and tools in xvc. Not every tool will make it into the standard, due to speed, politics, or whatever. Once H.266 has been standardized, they will have a decent working encoder in which they can switch on the features that made it into the standard, and be the first JVET encoder on the market.
The JVET working group knows there is a threat from royalty-free codecs like AV1, so I'm hoping they take patents and cost into account. But given how Qualcomm, Ericsson, and Sharp have been acting with HEVC, I'm not entirely sure it will be smooth, or whether JVET has a future at all.
I am COMPLETELY aware of the difference, and I was deliberate in writing that. We need the kind of love and hard, excellent work on AV1 software (the encoder being the harder part, since it's open-ended) that brought us the x264 encoder. AV1 decoders will come aplenty, I have no doubt, and I'm hoping for a good FPGA implementation.
>We're probably not going to see AV1 implementations on the same level as x264 for at least a few years.
True. On that note, over at the Doom9 forums, the x265 spokesperson said they will consider making an AV1 encoder should there be a market.
Given the massive amount of support gathered for AV1 from web and hardware giants, and how it's a royalty free codec and thus in a great position to be the next generation 'de facto' video standard on the web, I'd wager there is a good market for a third party encoder from excellent developers like those behind x265.
The base was VP10 plus tools from Thor (Cisco) and Daala (Xiph.Org/Mozilla), and it evolved from there into AV1 through a process of "experiments": testing, IP checks, and then enablement.
Assuming that VP10 shares a significant base design with VP9, it would not be surprising if parts of VP9 silicon decoders could be leveraged in consumer hardware while awaiting more dedicated circuits.
On the software end, libaom (the AOM reference implementation) is indeed a fork of libvpx. But that library is not broadly considered a good implementation, even for VP9.
Perhaps the people behind EVE for VP9 [1] will produce an AV1 implementation based on their codebase.
Interesting, but since I could conceivably be exposed to a malicious AV1 stream, I'd value a Rust AV1 decoder even more (which might exist, I don't know).
That's a very good point. I chose Rust more to get ergonomics and more reliable threading - security was not the primary motivation. That said, the encoder is exposed directly to the web via the MediaRecorder API.
Indeed! This implementation was started by some people working on AV1 to test the correctness of libaom's implementation of the specification.
There was a presentation about it at VDD17, if I remember correctly.
The GitHub counter is misleading: it is counting auto-generated C headers used for calling assembly code. The repo also contains a libaom submodule, but the only parts that are used are initialization tables and the transforms.
The stats aren't wrong. If it isn't 'real code', then you shouldn't store it in your repository; you should autogenerate it as part of your build process.
The C part is mostly for low-level functions and was brought in to help bootstrap development (it's easier to work on improving a working encoder than one that doesn't work yet). The amount of Rust code is expected to increase a lot over time, while the amount of C code is expected to either decrease or remain constant.
It's crap, and slow as molasses, built off an inferior earlier encoder's codebase. It proves the concept, but it's not something you want to rely on in production.
What is missing from the announcement is a link to a quality/bitrate comparison using a perceptual quality metric. Netflix seems to have a good automated perceptual metric. PSNR is still popular because it's "objective", but codecs optimized for PSNR produce ugly, blurry results, like all the VPx codecs. That said, a codec with great PSNR can probably be tuned for great perceptual results.
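For context, PSNR is just log-scaled mean squared error, which is why chasing it rewards blur: smoothing lowers MSE even when it wipes out texture the eye notices. A minimal sketch of the metric (using NumPy; the toy images are illustrative):

```python
import numpy as np

def psnr(ref, test, max_val=255.0):
    """Peak signal-to-noise ratio in dB between two 8-bit images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_val ** 2 / mse)

# Toy usage with a random "image"; real comparisons use decoded frames.
rng = np.random.default_rng(0)
ref = rng.integers(0, 256, (64, 64), dtype=np.uint8)
noisy = np.clip(ref.astype(int) + rng.integers(-5, 6, ref.shape), 0, 255)
print(f"{psnr(ref, noisy):.2f} dB")
```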
The Vorbis people, who were involved in AV1, have produced some impressive perceptual improvements even with inferior technology (Ogg Theora, based on VP3.2), so they know what to do and how: https://people.xiph.org/~xiphmont/demo/theora/demo9.html
It's not going to be decipherable without some help to point you to what to compare and how to read the charts, but https://arewecompressedyet.com/ contains many such comparisons and is what they use to evaluate branches and features, etc. The Daala team has mostly focused on four metrics, of which PSNR is just one and probably the least valued. PSNR-HVS-M is the "hardest" one, but it also includes SSIM and another SSIM variant.
I'm sure one of the team will chime in shortly pointing to some recent results.
This will shake up the digital media industry. You can safely bet on this format; it's going to be around for a long time and promises an extremely good compression-to-quality ratio.
> This will shake up the digital media industry. You can safely bet on this format…
(1) Maybe (2) Not today, and realistically not for a few years (if ever). "Standard" (whether de jure or de facto) compressed media formats (e.g. MP3, AAC, H.264) depend on an ecosystem of supporting products and services, and the existence of that ecosystem for any given format can't be assumed.
But how good is the quality? I watched the Mozilla demo, and at the 30-second mark the "robot hand" clip distorted the green trees in the background, even at 720p@800kbps. I haven't noticed this with HEVC encodes at similar bitrates.
I actually had to double-check this to make sure it wasn't in the source. It seems the encoder sees the trees as only a small difference, but visually it's quite a large one. The solution is a better distortion function in the encoder; libaom's is simple but dumb. (Luckily this means no decoder or format changes.)
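A toy illustration of that failure mode (this is not libaom's actual metric, just the generic weakness of squared-error distortion): a block whose texture has been flattened away and a block with equal-energy noise score nearly the same SSE, despite looking completely different.

```python
import numpy as np

rng = np.random.default_rng(1)
src = rng.integers(0, 256, (16, 16)).astype(np.float64)  # "textured" block

flat = np.full_like(src, src.mean())               # texture wiped out
noisy = src + rng.normal(0, src.std(), src.shape)  # same-energy noise added

def sse(a, b):
    return float(np.sum((a - b) ** 2))

print(f"SSE flat:  {sse(src, flat):,.0f}")   # similar numbers...
print(f"SSE noisy: {sse(src, noisy):,.0f}")  # ...very different appearance
```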
> It's hard to believe since MPEG-4 and HEVC are already everywhere.
HEVC is far from being everywhere. Huge services like YouTube, Amazon Video, Netflix, and others are going to use AV1, not H.265. H.265 is already dead; it just hasn't admitted defeat yet.
I don't think HEVC will go away; there's already a lot of fixed-function embedded hardware out there with HEVC decoders that will never get AV1 support. All the streaming players have to support those for the foreseeable future.
Hardware gets obsolete; fees for HEVC do not, so new hardware won't be using it. For legacy support, video services will use H.264, like YouTube does now. They just won't offer higher resolutions in those cases, which is fine.
There's already tons of devices with H.265 encoders and decoders in hardware. Those aren't going anywhere anytime soon. Plenty of high resolution security cameras use H.265.
Heh, it's not that easy. The reason is simple: high resolution. IIRC Netflix, for example, uses H.265 for 2K+ resolutions. They don't even allow 2K+ on hardware that doesn't have an H.265 hardware decoder, for that reason.
Not using high resolution for legacy cases would be an incentive to update devices to ones that support AV1, so it only helps this. YouTube already does exactly that.
> They don't even allow 2K+ on hardware that doesn't have an H.265 hardware decoder, for that reason.
So Netflix will swap this requirement from H.265 to AV1. Problem solved.
They won't do that for a long time, because that would mean they'd have to KILL support for TVs, Xboxes, and laptops that already have 4K support. They rely on hardware decoding (because of DRM).
Not kill; they'll just reduce the resolution for those devices and limit them to H.264, which they do support. That would be an incentive to get an AV1-capable device sooner if someone really needs that high resolution. I'm sure all the AV1 backers will do this at some point.
Oddly enough, the actual link to download the specification, reproduced below for your perusal, seems to link to outlook.com and includes a bunch of other interesting data:
Edit: I started reading the spec and found that the bulk of it appears to be mostly fragments of de-semicolon'd C code and plenty of lookup tables; in other words, a lot of "how" but not much in the way of "why" or "what". There's a noticeable lack of diagrams as well, which IMHO are very important for describing something as visual as a video codec. For comparison, I certainly found the H.264 spec to be much more understandable than this AV1 one.
That first link looks like it came from an email that went through Office 365, probably copy-pasted. We see the same in our Office 365 environment: links are rewritten to redirect through outlook.com's Safe Links service.
> And then patented stuff will start being phased out everywhere.
JPEG is patented, VP9 is patented, and AV1 is patented. Luckily, baseline JPEG is licensed under royalty-free terms and so are VP9 and AV1. The issue is never the patents but rather the licensing of those patents.
All the more disappointing to see both Google and Microsoft embrace HEIF for images in their next operating system versions. They really couldn't wait another couple of years to develop an open AV1-based image format? Come on.
Who cares about what Apple does or doesn't do? Apple certainly doesn't seem to care what Google and Microsoft do, which is why it continues to ignore open standards like Vulkan and now went ahead and adopted another one of MPEG-LA's proprietary and patent-encumbered formats.
Because they couldn't wait, we may now be stuck with another proprietary standard for the web for another 20 years. Not to mention there could still be others besides MPEG-LA claiming patents on HEIF and accusing developers of infringement, just as happened with HEVC. This mess could have been avoided with a little patience.
Honestly, couldn't they just adapt WebP for it? It's basically just a RIFF container with a VP8 key frame. Then again, it's WebP, and there's no need to bloat that thing any more.
Edit: Nevermind, I see. Going by the document, it's meant to be an exact mirror to HEIF, but for AV1. Some interesting features too.
My point is that if you claim to read HEIF files you need to read all possible HEIF files, including ones that contain H.264/5 data. Don't claim to support a standard if you don't support the most commonly-used part of it.
The .heic format now used by iPhones is the same idea (https://en.wikipedia.org/wiki/High_Efficiency_Image_File_For...). But wider use of that is going to be constrained by the cost problem HEVC has in general (two patent pools to deal with). Some of WebP is based on VP8 (not VP9) intra coding too.
There are a lot of tools packed into AV1's intra coding (and HEVC's, though I've read less about that). Block sizes range from 4x4 to 64x64, with a mix within one image, so the encoder can use the right size for the level of detail in each area. There are more ways to predict a block's content from what's above and to the left, which leaves less work for the JPEG-ish DCT part. And there's clever de-ringing post-processing that, in effect, blurs away many of the attention-getting JPEG-y DCT artifacts around edges, while 1) being aware of the direction of the main edge itself to avoid blurring that away and 2) using contrast thresholds to preserve as much other legitimate detail as it can. More about deringing at https://people.xiph.org/~jm/daala/deringing_demo/ .
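To give a flavor of the prediction step described above, here's a minimal sketch of DC intra prediction, about the simplest intra mode there is (AV1's real modes include many directional predictors and more; the block size and pixel values here are illustrative):

```python
import numpy as np

def dc_predict(top_row, left_col, size):
    """Fill a size x size block with the mean of the already-decoded
    neighboring pixels; the encoder then codes only the residual."""
    dc = (top_row.sum() + left_col.sum()) / (len(top_row) + len(left_col))
    return np.full((size, size), round(dc), dtype=np.int64)

top = np.array([118, 120, 121, 119])   # reconstructed pixels above the block
left = np.array([117, 118, 122, 120])  # reconstructed pixels to the left
print(dc_predict(top, left, 4))        # a 4x4 block of 119s
```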
Relatedly, given the complexity, I wouldn't expect this to take the world by storm in the next three months. The unoptimized encoder is still _really_ slow. Google has designed a hardware implementation, but of course hardware designs take time to get integrated, fabbed, and into shipping products. Given who's involved, I'm hopeful it does get wide support. (Would love to hear Apple's plans given their current support for HEVC; their decision to join AOMedia is a good sign at least.) Anyway, looking forward to seeing the results.
> "Some" here means WebP is literally a single VP8 frame shoved into a RIFF container :D
Though it's less used than the lossy format, lossless WebP is a thing too; if I said WebP was exactly VP8 intra someone might nitpick that. There's no winning, haha!
I think it's an ideal use case for WebAssembly. It'd be a good way to roll out any future AV1-based image format until native support arrives in all browsers.
Heh, fun consequence I just realized is that, in theory, you don't even have to wait for video-based still formats to be standardized and deployed if you're willing to play some games with single-frame videos.
So, like, H.264-based "images" can be shown in most browsers now, and VP9-based ones in many (but not Safari or IE). And you can fake AV1 images as soon as you get AV1 video.
Using video decoding in an odd way like that probably isn't especially practical or wise, but fun that the capability's there/reachable.
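For anyone who wants to play with the trick, here's a sketch of producing a one-frame VP9 "image" by shelling out to ffmpeg (file names are placeholders, and you'd swap in an AV1 encoder once one lands in your ffmpeg build):

```python
import subprocess

# Encode a still image as a one-frame VP9 WebM; browsers with VP9 support
# will display it in a <video> element.
subprocess.run([
    "ffmpeg", "-i", "photo.png",  # placeholder source image
    "-frames:v", "1",             # keep exactly one video frame
    "-c:v", "libvpx-vp9",         # VP9 for now; AV1 later
    "-b:v", "0", "-crf", "30",    # libvpx-vp9's constant-quality mode
    "photo.webm",
], check=True)
```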
There are differences--you're often willing to do more work decoding a still than a frame, say--and in the first story I linked there's a quote from a Googler wondering about whether to make the image format exactly track AV1 or to tweak it. So, there's a real question.
That said, the still demo at https://people.xiph.org/~tdaede/av1stilldemo/ is really encouraging, and Apple's seen a win in doing something parallel with HEVC/.heic. AV1's already gone through a lot of optimization and IP vetting (patents caused trouble for previously proposed replacements for JPEG), and has a hardware decoder design and a lot of companies signed on. Like everyone I just have to wait and see what they do, but I'm more or less on team "ship it" here.
On the other hand, the bar for still images is very low due to JPEG's dominance. That's why it makes sense to use video encoders for stills even if they're not ideal: you're still going to get much better results than JPEG.
As an aside: a wrapper within FFmpeg for libaom is only a few hours' work, but if you want to play with it right now, VLC has support. A native decoder will indeed take significantly longer, though.
[1] https://aomedia.org/membership/members/