There are some design decisions in Brotli I just don't quite understand [1][2][3], like what's going on with its dictionary [2]. One of the Brotli authors is active in this thread, so perhaps they can talk about this.
Zstandard is pretty solid, but lacks deployment in general-purpose web browsers. Firefox and Edge have followed Google's lead and have added, or are about to add, support for Brotli. Both Brotli and Zstandard see usage in behind-the-scenes situations, on the wire in custom protocols, and the like.
As for widespread use on files sitting on disk, on perhaps average people's computers, I think we're quite a few years away from replacing containers and compressors that have been around for a long time and are still being used because of compatibility and a lack of pressure to switch to a non-backwards-compatible alternative [4].
This is some sort of misunderstanding. If one replaces the static dictionary with zeros, one can easily benchmark brotli without it. Actually benchmarking it teaches two things:
1) With short (~50 kB) documents there is about a 7 % saving because of the static dictionary. There is still a 14 % win over gzip.
2) There is no compression density advantage for long documents (1+ MB).
Brotli's savings come to a large degree from algorithmic improvements, not from the static dictionary.
The transformations make the dictionary slightly more efficient without increasing its size. Of the 7 % savings that the dictionary brings, roughly 1.5 percentage points (~20 %) come from the transformations. However, the dictionary is 120 kB and the transformations less than 1 kB. So the transformations are more cost-efficient than the basic form of the dictionary.
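A back-of-the-envelope check of that cost-efficiency claim, using only the figures quoted above (nothing here is measured):

```python
# Cost efficiency of the static dictionary vs. its transforms, using the
# figures quoted above: 7 % total saving, ~1.5 points from transforms,
# 120 kB dictionary, <1 kB of transform tables.
dict_size_kb = 120.0       # static dictionary size
transforms_size_kb = 1.0   # upper bound on transform table size

total_saving = 7.0         # % saving from the dictionary overall
transform_saving = 1.5     # percentage points attributed to transforms
base_saving = total_saving - transform_saving

# Savings per kilobyte of decoder footprint.
base_efficiency = base_saving / dict_size_kb                   # ~0.046 %/kB
transform_efficiency = transform_saving / transforms_size_kb   # >= 1.5 %/kB

print(f"base dictionary: {base_efficiency:.3f} %/kB")
print(f"transforms:      {transform_efficiency:.3f} %/kB")
```

By this measure the transforms deliver more than 30x the saving per kilobyte of the raw dictionary entries.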
Brotli's dictionary was generated with a process that leads to the largest gain in entropy, i.e., every term and its ordering was chosen for the smallest size -- considering how many bits it would have cost to express those terms using other features of brotli. Even if the result looks disgusting or is difficult to understand, the process used to generate it was quite delicate.
The same goes for the transforms, but there it was mostly the ordering that we iterated on, generating candidate transforms with a large variety of tools.
Zstandard is superior to Brotli in most categories (decompression speed, compression ratios, and compression speeds). The real issue with Brotli is the second-order context modeling (compression level >8), which costs you ~50% of your compression speed for less than a ~1% gain in ratio [1].
I've spoken to the author about this on Twitter. They're planning to expand Brotli's dictionary features and context modeling in future versions.
Overall it isn't a bad algorithm. Brotli and ZSTD are head and shoulders above LZMA/LZMA2/XZ, pulling off comparable compression ratios in half to a quarter of the time [1]. They make GZip and Bzip2 look outdated (and frankly it's about time).
ZSTD really just needs a way to package dictionaries WITH archives.
[1] These are just based on personal benchmarks while building a tar clone that supports zstd/brotli files https://github.com/valarauca/car
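For anyone who wants to reproduce this kind of speed-vs-ratio tradeoff, here is a minimal benchmark sketch. zlib's compression levels stand in for brotli quality settings (the brotli bindings are a third-party package); the methodology is the same, only the numbers differ:

```python
import time
import zlib

# Sketch of a speed-vs-ratio benchmark across compression levels.
# zlib levels stand in for brotli quality settings here.
data = b"the quick brown fox jumps over the lazy dog " * 20000

for level in (1, 6, 9):
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    print(f"level {level}: ratio {ratio:6.2f}, {elapsed * 1000:7.2f} ms")
```

For real numbers you would run many iterations per level on representative corpora, not a single synthetic buffer.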
What use case do you have in mind for packaging dictionaries with archives? There is an ongoing discussion about a jump table format that could contain dictionary locations [1].
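One possible shape for such a feature, sketched with a made-up container layout of [4-byte dict length][dict bytes][compressed payload]. zlib's preset-dictionary support (`zdict`) stands in for zstd dictionaries here; the framing is purely illustrative, not any proposed format:

```python
import struct
import zlib

# Hypothetical container that carries its own dictionary alongside the
# compressed payload, so the receiver needs nothing out of band.

def pack(payload: bytes, dictionary: bytes) -> bytes:
    comp = zlib.compressobj(zdict=dictionary)
    body = comp.compress(payload) + comp.flush()
    return struct.pack(">I", len(dictionary)) + dictionary + body

def unpack(blob: bytes) -> bytes:
    (dict_len,) = struct.unpack(">I", blob[:4])
    dictionary = blob[4:4 + dict_len]
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob[4 + dict_len:]) + decomp.flush()

dictionary = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n"
payload = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\nAccept: */*\r\n"
blob = pack(payload, dictionary)
assert unpack(blob) == payload
```

The obvious tradeoff is that embedding the dictionary only pays off when it is shared by many frames in the same archive, which is presumably why a jump-table entry pointing at a shared dictionary is the direction being discussed.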
It is interesting that Japanese, Russian and Thai benefit more (30 %) from brotli than Latin-script languages (25 %). This is because of the UTF-8 context modeling in brotli.
I think the feature in question is declared in the source code here [1]. The RFC goes into some detail about what this means [2] and how it's used [3]. I'd love a whitepaper but the RFC is fairly descriptive and is the best source I can find.
The first draft of the article actually had that reason, but there is also a strong correlation between the size of the dict (these dicts are almost 1 MB, while those for other languages are closer to 500 kB) and the compression ratio improvement. Therefore I played it safe and attributed it to the window size.
Though for languages like Korean and Chinese (whose dict size is more in line with Latin-script languages) we see a 27.5% improvement, which is most likely due to context modeling.
Therefore I assume the ratio improvement is split ~50/50 between these two. It would have been easy to verify by compressing the data with `brotli --window 15` and comparing ratios, but I was lazy there. I'm sorry.
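The window-size effect itself is easy to demonstrate in miniature. Here zlib's `wbits` parameter (windows of 2^9 to 2^15 bytes) stands in for `brotli --window`: matches farther back than the window cannot be referenced, so the ratio drops.

```python
import random
import zlib

# Repetition only visible to a window of at least 16 kB: a random 16 kB
# block repeated 8 times. Small windows see only incompressible noise.
random.seed(0)
block = bytes(random.randrange(256) for _ in range(16 * 1024))
data = block * 8

for wbits in (9, 12, 15):
    comp = zlib.compressobj(level=9, wbits=wbits)
    size = len(comp.compress(data) + comp.flush())
    print(f"window 2^{wbits:2d}: {size:7d} bytes")
```

With the full 32 kB window the repeats collapse to near the size of one block; with a 512-byte window the output stays close to the 128 kB input.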
PS. I've also skipped the NFC/NFD part of the post, which is very interesting for Korean, where NFC-normalized text occupies 30% less space. It also gives an additional 5% ratio for brotli and 15% for gzip.
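The NFC/NFD point in miniature, with only the standard library (the percentages above come from the post, not from this toy): decomposed Korean (NFD) stores each syllable as multiple jamo code points, so the raw UTF-8 is much larger before any compressor runs.

```python
import gzip
import unicodedata

# Korean sample text; NFD decomposes each precomposed syllable into jamo.
text = "한국어 텍스트 정규화" * 1000
nfc = unicodedata.normalize("NFC", text).encode("utf-8")
nfd = unicodedata.normalize("NFD", text).encode("utf-8")

print(f"NFC: {len(nfc)} bytes, gzipped {len(gzip.compress(nfc))}")
print(f"NFD: {len(nfd)} bytes, gzipped {len(gzip.compress(nfd))}")
```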
I use brotli on my personal website, and usually see files 15-20% smaller than the gzip equivalent. Installing a brotli compressor is a bit of a pain, but after that it's very easy to add a single build step that compresses all static assets.
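A minimal sketch of such a build step, with stdlib gzip standing in for the compressor (with the third-party `brotli` package you would emit `.br` files via `brotli.compress` instead); the suffix list is my own choice, not anything prescribed:

```python
import gzip
from pathlib import Path

# Precompress static assets at build time so the server can hand out the
# encoded files directly instead of compressing per request.
ASSET_SUFFIXES = {".html", ".css", ".js", ".svg"}

def precompress(root: Path) -> None:
    for path in root.rglob("*"):
        if path.suffix in ASSET_SUFFIXES:
            data = path.read_bytes()
            path.with_suffix(path.suffix + ".gz").write_bytes(
                gzip.compress(data, compresslevel=9)
            )
```

The server then only needs to map `Accept-Encoding` to the matching precompressed file.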
Security patches only. IE11 exists on Windows 10 only for the benefit of enterprises that can't drop horrible legacy technologies like ActiveX. All new feature development is on Edge.
Haha, I don't think the previous poster was referring to the error bars. That's not "XKCD-style", it's just standard practice in statistics: https://en.wikipedia.org/wiki/Error_bar
I think what he was referring to was the fact that the graphs appear to be hand-drawn.
(Oh, speaking of which, in case you’re wondering why all the graphs are in xkcd style: no reason really, it’s just fun. If your eyes bleed from the Comi^WHumorSans typeface there are links to “boring” SVGs at the bottom.)
I also was wondering if it was meant to be a joke (did XKCD roast compression protocols?). But it was legitimately hard to read the graphs on a Retina screen, which amuses me because I'm sure that's what the Dropbox people use.
https://sites.google.com/site/powturbo/home/web-compression Only facts and numbers; no rumors, no speculation, no hype.