Deploying Brotli for static content (dropbox.com)
87 points by grey-area on April 7, 2017 | 30 comments



It may be interesting to look at a web content benchmark with the best gzip compatible compressors and peak memory usage:

https://sites.google.com/site/powturbo/home/web-compression Only facts and numbers, no rumors, no speculation, no hype,...


I wonder who will win in the long term between Brotli and zstd. http://facebook.github.io/zstd/


There are some design decisions in Brotli I just don't quite understand [1][2][3], like what's going on with its dictionary [2]. One of the Brotli authors is active in this thread, so perhaps they can talk about this.

Zstandard is pretty solid, but lacks deployment on general-purpose web browsers. Firefox and Edge have followed Google's lead and have added, or are about to add, support for Brotli. Both Brotli and Zstandard see use in behind-the-scenes situations, on the wire in custom protocols, and the like.

As for widespread use for files sitting on disk on average people's computers, I think we're quite a few years away from replacing containers and compressors that have been around for a long time and are still used because of compatibility and a lack of pressure to switch to a non-backwards-compatible alternative [4].

[1] https://news.ycombinator.com/item?id=12010313 [2] https://news.ycombinator.com/item?id=12003131 [3] https://news.ycombinator.com/item?id=12400379 [4] https://news.ycombinator.com/item?id=13171374


> https://news.ycombinator.com/item?id=12003131

This is some sort of misunderstanding. If one replaces the static dictionary with zeros, one can easily benchmark brotli without the static dictionary. If one actually benchmarks it, one can learn two things:

1) With short (~50 kB) documents there is about a 7 % saving because of the static dictionary. There is still a 14 % win over gzip.

2) There is no compression density advantage for long documents (1+ MB).

Brotli's savings come to a large degree from algorithmic improvements, not from the static dictionary.
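
If you want to reproduce the gzip comparison yourself, here is a minimal sketch in Python (assuming the `brotli` pip package; the file name is a placeholder for any ~50 kB web document):

    import gzip
    import brotli

    # Hypothetical ~50 kB web document saved locally.
    data = open("sample.html", "rb").read()

    gz = gzip.compress(data, compresslevel=9)
    br = brotli.compress(data, quality=11)

    print("original: %d B" % len(data))
    print("gzip -9:  %d B (%.1f%%)" % (len(gz), 100.0 * len(gz) / len(data)))
    print("brotli:   %d B (%.1f%%)" % (len(br), 100.0 * len(br) / len(data)))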

> https://news.ycombinator.com/item?id=12010313

The transformations make the dictionary a bit more efficient without increasing the size of the dictionary. Of the 7 % savings that the dictionary brings, about 1.5 percentage points (~20 %) come from the transformations. However, the dictionary is 120 kB and the transformations are less than 1 kB, so the transformations are more cost-efficient than the basic form of the dictionary.
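
As a toy illustration of the idea (RFC 7932 defines 121 transforms of the form prefix + word_transform(word) + suffix; the word and suffixes below are invented, not taken from the actual tables):

    # A dictionary word can be emitted verbatim, capitalized, or wrapped
    # with a prefix/suffix, without storing the variants themselves.
    def ferment_first(word: bytes) -> bytes:
        return word[:1].upper() + word[1:]

    word = b"information"
    print(b" " + word + b" ")            # identity, with space prefix/suffix
    print(ferment_first(word) + b". ")   # capitalized, ". " appended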

> https://news.ycombinator.com/item?id=12400379

Brotli's dictionary was generated with a process that leads to the largest gain in entropy, i.e., every term and its ordering was chosen for the smallest size -- considering how many bits it would have cost to express those terms using other features of brotli. Even if the results look disgusting or are difficult to understand, the process that generated them was quite delicate.

The same goes for the transforms, but there it was mostly the ordering that we iterated on; we generated candidate transforms using a large variety of tools.


ZSTD.

It is superior to Brotli in most categories (decompression speed, compression ratio, and compression speed). The real issue with Brotli is its second-order context modeling (compression level >8), which costs you ~50% of your compression speed for less than a ~1% gain in ratio [1].
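
A rough way to see this trade-off on your own data (a sketch assuming the `brotli` pip package; the corpus file is a placeholder):

    import time
    import brotli

    data = open("corpus.bin", "rb").read()  # hypothetical test corpus

    for q in (8, 9, 10, 11):
        t0 = time.perf_counter()
        out = brotli.compress(data, quality=q)
        dt = time.perf_counter() - t0
        print("q=%2d: %.2f%% of original in %.2fs"
              % (q, 100.0 * len(out) / len(data), dt))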

I've spoken to the author about this on Twitter. They're planning to expand Brotli's dictionary features and context modeling in future versions.

Overall it isn't a bad algorithm. Brotli and ZSTD are head and shoulders above LZMA/LZMA2/XZ, pulling off comparable compression ratios in a half to a quarter of the time [1]. They make gzip and bzip2 look outdated (which, frankly, is about time).

ZSTD really just needs a way to package dictionaries WITH archives.

[1] These are just based on personal benchmarks while building a tar clone that supports zstd/brotli files https://github.com/valarauca/car


What use case do you have in mind for packaging dictionaries with archives? There is an ongoing discussion about a jump table format that could contain dictionary locations [1].

[1] https://github.com/facebook/zstd/issues/395


For large files (>1 GiB) a dictionary + archive is often smaller than the archive on its own.


How are you compressing the data?

I would expect a dictionary to be useful if the data is broken into chunks, and each chunk is compressed individually.

If the data is compressed as one frame, I would be very interested in an example where the dictionary helps.
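
For the chunked case, here is a minimal sketch with the `zstandard` Python package (file name and chunk size are placeholders):

    import zstandard as zstd

    data = open("large.bin", "rb").read()  # hypothetical input
    chunks = [data[i:i + 65536] for i in range(0, len(data), 65536)]

    # Train a dictionary on the chunks, then compress each chunk with it.
    d = zstd.train_dictionary(110 * 1024, chunks)
    cctx = zstd.ZstdCompressor(dict_data=d)
    compressed = [cctx.compress(c) for c in chunks]

    total = sum(len(c) for c in compressed) + len(d.as_bytes())
    print("dictionary + compressed chunks: %d bytes" % total)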


In my benchmarks brotli compresses more densely, typically compresses faster to a given density, but decompresses slower.

I benchmark with internet-like loads, not with 50-1000 MB compression research corpora.


When I last ran the numbers a few months ago [1], for the same time spent in the compressor, zstd almost always produced smaller output than brotli.

1. https://code.ivysaur.me/compression-performance-test/


For now at least, Brotli is the winner. It's already in the browsers.


It is interesting that Japanese, Russian, and Thai benefit more (30 %) from brotli than Latin-script languages do (25 %). This is because of the UTF-8 context modeling in brotli.


I think the feature in question is declared in the source code here [1]. The RFC goes into some detail about what this means [2] and how it's used [3]. I'd love a whitepaper but the RFC is fairly descriptive and is the best source I can find.

[1] https://github.com/google/brotli/blob/master/dec/context.h [2] https://tools.ietf.org/html/rfc7932#section-2 [3] https://tools.ietf.org/html/rfc7932#section-7


The first draft of the article actually had that reason, but there is also a strong correlation between the size of the dict (these dicts are almost 1 MB, while those for other languages are closer to 500 kB) and the compression ratio improvement. Therefore I played it safe and attributed it to the window size.

Though for languages like Korean and Chinese (whose dict sizes are more in line with Latin-script languages) we see a 27.5% improvement, which is most likely due to context modeling.

Therefore I assume the ratio improvement is split ~50/50 between the two. It would have been easy to verify by compressing the data with `brotli --window 15` and comparing the ratios, but I was lazy there. I'm sorry.

PS. I've also skipped the NFC/NFD part of the post, which is very interesting for Korean: NFC-normalized text occupies 30% less space. It also gives an additional 5% ratio for brotli and 15% for gzip.
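
The Korean effect is easy to see with a quick Python sketch (any Korean sample text will do):

    import unicodedata

    text = "안녕하세요"  # hypothetical Korean sample
    nfd = unicodedata.normalize("NFD", text)  # decomposed jamo
    nfc = unicodedata.normalize("NFC", text)  # precomposed syllables
    # NFC needs one 3-byte code point per syllable; NFD needs two or three.
    print(len(nfd.encode("utf-8")), len(nfc.encode("utf-8")))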


There is no Thai or Korean in the dict. The total size of the dict (including all languages) is 120 kB.


By "dict" I meant the data we are compressing: these are basically dictionaries for "English to X" translation.

What I was saying is that there is a strong correlation between the size of the data I was compressing and the compression ratio improvement over gzip.


You can also deploy Brotli for dynamic content: at setting 4 it's both faster than gzip AND compresses better.

https://certsimple.com/blog/nginx-brotli
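
A quick way to sanity-check that claim on your own payloads (a sketch assuming the `brotli` pip package; the file name is a stand-in for a dynamic response body):

    import gzip
    import time
    import brotli

    payload = open("page.html", "rb").read()  # hypothetical response body

    t0 = time.perf_counter()
    gz = gzip.compress(payload)               # gzip default level (6)
    t_gz = time.perf_counter() - t0

    t0 = time.perf_counter()
    br = brotli.compress(payload, quality=4)  # brotli "setting 4"
    t_br = time.perf_counter() - t0

    print("gzip:   %d B in %.1f ms" % (len(gz), t_gz * 1e3))
    print("brotli: %d B in %.1f ms" % (len(br), t_br * 1e3))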


I use brotli on my personal website and usually see 15-20% smaller files than the gzip equivalents. It's a bit of a pain to install a brotli compressor, but then it's very easy to add a single build step to compress all static assets.
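
The build step can be as small as something like this (a sketch; the directory name and extension list are placeholders):

    import pathlib
    import brotli

    # Precompress static assets to sibling .br files for the server to serve.
    for path in pathlib.Path("public").rglob("*"):
        if path.suffix in {".html", ".css", ".js", ".svg"}:
            out = path.with_name(path.name + ".br")
            out.write_bytes(brotli.compress(path.read_bytes(), quality=11))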


Does anyone know if IE11 will get brotli support, or will IE11 only receive security patches?


Security patches only. IE11 exists on Windows 10 only for the benefit of enterprises that can't drop horrible legacy technologies like ActiveX. All new feature development is on Edge.


Seems like most of the image optimizations were undone by WordPress/CDN; I saved >60% on the xkcd-styled charts.


Why are their graphs xkcd-style?


I've seen it suggested before that it implies a bit of a margin of error in the numbers. I personally like it a lot.


Haha, I don't think the previous poster was referring to the error bars. Those aren't "XKCD-style"; they're just standard practice in statistics: https://en.wikipedia.org/wiki/Error_bar

I think what he was referring to was the fact that the graphs appear to be hand drawn.


> I think what he was referring to was the fact that the graphs appear to be hand drawn.

Yes, and so was your parent comment. Some people have suggested that the hand-drawn appearance communicates imprecision. For example:

> The rough, seemingly hand drawn nature of the graph provides a visual hint as to the imprecision of the results.

https://www.chrisstucchio.com/blog/2014/why_xkcd_style_graph...


(Oh, speaking of which, in case you’re wondering why all the graphs are in xkcd style: no reason really, it’s just fun. If your eyes bleed from the Comi^WHumorSans typeface there are links to “boring” SVGs at the bottom.)


I'm fine with stuff like that as long as the text is readable. On a 1080p screen unfortunately it is not. I don't know what they were thinking :/


I also was wondering if it was meant to be a joke (did XKCD roast compression protocols?). But it was legitimately hard to read the graphs on a Retina screen, which amuses me because I'm sure that's what the Dropbox people use.


I'm not sure if this is what they're using, but this plot style was added into matplotlib.

http://jakevdp.github.io/blog/2013/07/10/XKCD-plots-in-matpl...
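
It's just a context manager:

    import matplotlib.pyplot as plt

    # plt.xkcd() switches on the sketchy hand-drawn style for everything
    # drawn inside the context.
    with plt.xkcd():
        fig, ax = plt.subplots()
        ax.bar(["gzip", "brotli"], [100, 80])  # made-up numbers
        ax.set_ylabel("relative size (%)")
        fig.savefig("comparison.png")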


This is indeed what was used there. Here are all the matplotlib docs rendered with XKCD: https://matplotlib.org/xkcd/gallery.html



