Brunsli: Practical JPEG repacker (now part of JPEG XL) (github.com/google)
233 points by networked on March 1, 2020 | 72 comments



Preemptive disclaimer: I don't want to belittle the work the authors did here in any way. I'm actually excited, especially about the reversible, lossless jpeg<->brunsli coding, and about the fact that google's buy-in and network effects will most likely mean this comes to a viewer/editing software near you in the not-so-distant future (unlike lepton, which never got out of its tiny niche).

But seeing the 22% improvement figure reminded me that the typical JPEG file on the internet is rather unoptimized, even on write-once-read-many services like imgur or i.reddit.com which transform files (stripping metadata etc.) and do not preserve the original files. Just using the regular vanilla libjpeg encoder you can usually save 5%-10% by losslessly recoding the coefficients, and the better-yet-more-computationally-intense mozjpeg coder can get you a bit further than that.

Then again, the imgur single-image view page (desktop) I just opened by randomly clicking an item on their homepage transfers 2.9MiB of data with ads blocked (3.9MiB deflated), 385KiB of which was the actual image. That image can be losslessly recoded by mozjpeg to 361KiB (a 24KiB difference, a 6.2% reduction), so the 24KiB (0.8%) saving out of 2.9MiB of cruft hardly matters to them, I suppose, and may be cheaper in bandwidth and storage cost than the compute cost (and the humans writing and maintaining the code).

Using brunsli, that same 385KiB imgur file went down to 307KiB, so roughly a 20% reduction, but still only a 2.6% reduction of the massive 2.9MiB the imgur site transferred in total.
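
For reference, the kind of lossless recode I mean is a one-liner; a rough sketch with stock jpegtran (mozjpeg ships a drop-in jpegtran of its own that takes the same flags; the file names here are placeholders):

    # losslessly re-optimize the entropy coding; pixels are untouched
    jpegtran -copy all -optimize -outfile out.jpg in.jpg
    ls -l in.jpg out.jpg   # compare sizes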


> still only 2.6% reduction of that massive 2.9MiB ... in total.

This is an unfair argument from my point of view. When we deployed a 30 % improvement for fonts (woff to woff2), people argued that fonts are just a small part of websites. When we deployed a 30 % improvement for PNG-like images in WebP lossless, people argued that they are only a small part of the traffic. When we deployed a 15-30 % improvement for CSS, JS and HTML with Brotli, people argued that those are only a small part of the web. When we deploy improvements to video, people argue that 'yes it is a lot of data, but it is buffered so people are not waiting and they care less than for other types of data'.

Let's review each technology within its scope.


Yes, not so useful for small images, but when users view multi-megabyte images I guess the savings will start to be significant even with the page overhead. Say a 4MB image can be optimised to 2.5MB: the total download then comes down from 6.9MB to 5.4MB, which is not trivial, especially when the same page has multiple images (some people upload a gallery of images on sites like imgur) or when viewing lots of image URLs. Especially on a limited data plan these savings will start adding up.


Yes, it would be a win for the users, especially on limited data plans. Though I wouldn't hold my breath for most sites to actually implement it; most sites do not mind serving megabytes of data for a single page view, including megabytes of scripts, even to mobile users.

4MiB to 2.5MiB is out of range for brunsli, more like 3.2MiB if you're lucky.

I also tried with a 54 MiB JPEG[1] just now. The brunsli coding is 49MiB so not even a 10% reduction on this particular file. And it took a wallclock time of 16 seconds on my last gen Intel. Decoding it back to JPEG took 11 seconds on the same box.

[1] A picture of a wedding cake, taken with a NIKON D810, one of the largest JPEGs I had available. It was "exported" by Lightroom 9, it seems, from a NEF/RAW source, and it is full of metadata too, around 100KiB of it.
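
For anyone who wants to repeat this kind of test, a rough sketch with the command-line tools the brunsli repo builds (I believe they are named cbrunsli and dbrunsli; treat the names and argument order as assumptions and check your build output):

    time cbrunsli cake.jpg cake.br          # JPEG -> brunsli
    time dbrunsli cake.br cake.round.jpg    # brunsli -> JPEG
    cmp cake.jpg cake.round.jpg && echo "byte-exact roundtrip"
    ls -l cake.jpg cake.br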


> I also tried with a 54 MiB JPEG[1] just now. The brunsli coding is 49MiB so not even a 10% reduction on this particular file.

What about JPEG XL? That's the primary usefulness of brunsli IMO, it's built into JPEG XL as one of the ways it achieves much better compression (comparable to, or possibly better than, AVIF).


Just tried using their latest code. Same 54MiB JPEG yielded (with cjpegxl -q x)

    3.8 MiB (q=70)
    13 MiB (q=90) ("visually lossless (side by side). Default for generic input.")
    19 MiB (q=95) ("visually lossless (flicker-test).")
    40 MiB (q=100) ("mathematically lossless. Default for already-lossy input (JPEG/GIF)")
I'm guessing q=100 is essentially (mostly, minus rounding errors) lossless non-reversible brunsli JPEG-XL.

To compare, webp (cwebp -q x, also operating on a q=0...100 scale)

    9.9 MiB (q=70)
    18 MiB (q=90)
    23 MiB (q=95)
    28 MiB (q=100)
    65 MiB (z=6, medium lossless preset)
    
I didn't do exact timings, but JPEG XL was noticeably faster; then again it uses multithreaded (4 threads here) AVX2 code, which cwebp does not (I used the Debian-provided package for webp), so I wouldn't pay too much attention to this result.

Keep in mind that JPEG XL isn't finished, and neither are its defaults and recommendations.
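
If someone wants to redo the timing part on a more equal footing, cwebp does have a -mt flag for multithreading; something like the sketch below would be a fairer, if still rough, comparison (only -q is taken from above for cjpegxl, and the positional input/output arguments are an assumption):

    for q in 70 90 95 100; do
        /usr/bin/time -f "q=$q jxl:  %es" cjpegxl -q "$q" cake.jpg "cake-q$q.jxl"
        /usr/bin/time -f "q=$q webp: %es" cwebp -mt -q "$q" cake.jpg -o "cake-q$q.webp"
    done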


That does sound very impressive. (Keep in mind, though, that there's no reason to think the q= settings give equivalent quality between JPEG XL and webp.)

Also, your lossless encoding was only 40 MiB instead of 49 MiB for pure Brunsli, so presumably JPEG XL is capable of some stronger lossless compression.


Regarding brunsli vs JPEG XL "lossless": yes, JXL fared better there, but then again it did not have the requirement of reproducing the bit-exact JPEG back on demand. So (as far as I understood) JPEG XL at q=100 will go back to the pixel data and run the full JPEG XL coder on it (with no quantization, to avoid loss), while JPEG XL in reversible "bg" mode[1]/brunsli will only decode back to the coefficients (so there is no loss at all) and then use its superior lossless (brunsli) coding to store these coefficients more efficiently, along with some metadata to allow reconstructing the original file on demand.

I could be wrong; I only skimmed the docs and code, and I was quite tired at the time.

[1] JXL has a "bg" mode too that is reversible to the original JPEGs, but that's currently not exposed by the cjpegxl tool. I saw it in their benchmark tool, though, and there it produced essentially the same size files as the plain brunsli tool did, probably because it is essentially just the brunsli tool.


Yes, I believe that's right. JPEG XL is able to compress JPEGs in a way that is pixel for pixel identical to the output of a JPEG decoder, but can't be reversed into a bit exact copy of the original JPEG.


Just one clarification about this: As stated before in these comments, JPEG XL includes brunsli, so JPEG XL allows for byte exact transport of the original JPEG file (with -22 % in size). This can be practical for example for file system replication or for long term storage of a lot of original JPEGs. The other, non-brunsli JPEG XL modes can do pixel exact, but byte exact can be more appealing for some use cases.


I tried to order some takeout the other day and the place was busy so I got voicemail. The beginning was pretty solid, but as he went on you could tell that he was making it up as he went. By the end he was struggling to pull it out of a spiral and it sounded awkward.

This led to a conversation with my friend about how it was a recording, and I never needed to hear this cut of the message. You could do it over and over until you got it right and I would be none the wiser. But somehow when we record things we feel like it’s “out there” and we can’t take it back.

Or we make the reverse mistake and do things “live”, resulting in tremendous amounts of resources being spent to redo work that could have been one-and-done, or that really only changes infrequently. In the analog world, or with software.

In the middle, on the software side, are tools in the vein of continuous improvement. There’s that service that will file PRs to fix your dependencies. There should be linters and test fuzzers that do the same in the easy cases.

We have tools to scan the assets we already have and try to precompress them better. New ones like this one arrive from time to time. But running them prior to publication introduces friction and people push back. And once assets ship to our servers we erroneously believe it’s too late to change them, and I don’t know why.

Are we stuck in the old headspace of shrinkwrapped software that you can’t change without enormous difficulty? Or is something else going on?


I get what you’re saying, but when you’re storing millions of images and transferring them frequently, those small reductions in size are very significant when you get your storage and/or transit bill. A lot of the fluff and filler you mentioned is cached, so you’re not bringing it down every request. Even if it wasn’t you can assume it’s deemed becc


You'd think so, but imgur for example is literally in the business of serving images to a lot of users (and some videos, too), and yet it seems they did not even implement vanilla libjpeg's optimized coding. And I am not just picking on imgur; they are just today's example. Other similar services didn't do much better last I checked, neither when it comes to serving optimized JPEGs nor in overall page size.


I think that's his point - you can already reduce JPEG size a bit by using fancier libraries, but even huge sites like Imgur apparently don't bother.


Maybe Imgur doesn't bother, I don't know. I work for Cloudinary, and we do bother. A lot.


The small reductions in image size still pale into insignificance compared with the gains they'd get if they just cut 90% of the crud out of every imgur page load. In the past few years that site has become exponentially more bloated and resource intensive for zero apparent gain (to the users anyway).


Imgur is even more awful on mobile, with its infinitely loading interface. I wrote about a dozen rules for uBlock that try to get rid of everything but the image on Imgur's mobile site, and this seems to help quite a lot. Tends to break the page and require retooling pretty often though.


For a meaningful comparison, you have to divide a (large?) part of that 3.9 MiB by the number of times it will get reloaded from cache.

_If_ users, on average, look at lots of images, that divisor could be large.


Sorry, can someone here clarify --

Does this reduce the size of JPEG files, maintaining JPEG format?

Or is this about repacking JPEG files into JPEG XL files (while reducing size), while maintaining the ability to losslessly revert to JPEG?

The page never explicitly states what format the "repacking" is into, and it has so many references to JPEG XL that it no longer seems obvious that the target is just JPEG.


It does not maintain the JPEG format, though it will be part of the JPEG XL format.

It is repacking/recoding JPEGs in a lossless and reversible manner, so that clients supporting brunsli can be served the optimized version directly (their Apache and nginx modules seem to serve files with an image/x-j MIME type), while clients without support can be served the original JPEG file (or served a brunsli file and decoded via a wasm-brunsli -> native JPEG decoder pipeline, if wasm is an option). Either way, you only have to keep the brunsli (or original JPEG) file around.

Since JPEG XL isn't finished yet, there still might be minor differences in the end that make the current brunsli format incompatible with the JPEG XL format, so I wouldn't mass-convert your files just yet.


My understanding is it converts normal JPEGs to incompatible JPEG XL files, losslessly.

The README does a poor job explaining this.


This could be what provides the 'activation energy' to get large Web sites working with JXL; it avoids the situation where you don't want to touch a new standard until everyone else switches too. If Google ships a decoder in Chromium, I would give JXL decent odds of getting a lasting niche.

Also, 20% off JPEG1 sizes is better than it may sound; you only save about 30% more by switching to any of the new codecs (JXL's VarDCT or the video-codec-based ones) that apply a ton of new tools. Given JPEG1 was published in 1992, that just confirms to me Tim Terriberry's quip that it was "alien technology from the future."

Working against JPEG XL: it's four modern image formats in one (VarDCT, Brunsli, FUIF, and a separate lossless codec) so a ton of work to independently implement. Also, video codecs already have ecosystem support, and have or will probably get hardware support first.

Further out: "JPEG restoration" is something I mostly see experimental work about, but it could also take the old format a bit further. The idea is to use priors about what real images should look like to make an educated guess at the information that was quantized away, so you get less of the characteristic blocking and ringing from overcompressed images.

(For example, look at Knusperli, JXL's "edge-preserving filter", or AV1's CDEF. The "quantization constraint" supported by the first two of those is, to me, what makes it "restoration" and not just another loop filter: it can always return pixels that could have compressed to the exact DCT coefficients in the file.)


The separate lossless codec and FUIF and some additional ideas have been integrated into the modular mode. The modular mode and VarDCT can be mixed in the same image.

The JPEG XL decoder no longer performs the 'quantization constraint'; it is a classic loop filter, just with heuristics and control fields to maintain detail much better than loop filters in video codecs.

If you start with sharp, high-quality originals that don't have yuv420 in them, you will usually see 65 % savings with JPEG XL, i.e. more than 30 % on top of brunsli. You can think of VarDCT mode as guetzli (-35 %) + brunsli (-22 %) + format-specific changes (-30 %) like variable-sized DCTs, a better colorspace, adaptive quantization and loop filtering.


Thanks, I appreciate the corrections and detail.

I was mostly going by the committee draft on the arXiv (https://arxiv.org/abs/1908.03565 ), which still mentioned the quantization constraint (J.3) and a "mathematically lossless" mode (annex O) distinct from modular. "Four formats" was just imprecise wording on my part.

The material already public about JPEG XL's design rationale, testing, and so on has been really fun to follow as an outsider. I selfishly hope there's more of it as the work on it and rollout continue. Even with the standard itself public, a lot can be mysterious to those not immersed in this stuff.


Yes, quantization constraint was removed post-CD and Annex O was recently merged into the modular mode to simplify the standard. JXL's page count is 1/2 to 1/3 of contemporary video codecs.


Thanks. If there are places the public can follow progress post-CD, I'm curious. (I see https://gitlab.com/wg1/jpeg-xl/ was updated late Feb, yay.)


:) There are updates every ~3 months at https://jpeg.org/news.html.


Does anyone know how this compares to other projects such as Lepton?

https://github.com/dropbox/lepton

The goals and results appear similar. Is the primary difference that brunsli will likely actually be part of a standard (JPEG XL)?


The files created by lepton can't be displayed by any client. Brunsli is a converter for JPEG <-> JPEG XR which is lossless, and through the improved jpeg xr algorithms it decreases file size.

The interesting part is you can therefore convert between these formats thousands of times without visual regressions. You could (in a few years) only store the jpeg xr file and your webserver may transcode it to jpeg for legacy browsers.


>Brunsli is a converter for JPEG <-> JPEG XR which is lossless...

You mean JPEG XL?

JPEG XR is a completely different thing [1]

[1] https://en.wikipedia.org/wiki/JPEG_XR


Extremely good branding work there, JPEG.


Don't forget JPEG XS and JPEG XT.

https://jpeg.org/jpeg/index.html


They may still be comparable. If the point of Lepton is to save space, and if a round-trip through Brunsli costs less than Lepton encoding while saving similar amounts of space, then it could be a design alternative.


Their respective READMEs both claim a 22% size reduction, which sounds like an interesting coincidence. Have they identified a similar inefficiency in the format itself?


JPEG's entropy encoding is ancient. Adding modern arithmetic coding can save significant bits without changing the actual visual data.
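
As an aside, the JPEG standard itself already has a rarely used arithmetic-coding option, and stock jpegtran can losslessly switch a Huffman-coded file over to it, typically saving several percent; the catch is that most browsers and viewers won't decode such files, which is the kind of deployment problem brunsli's reversibility is meant to sidestep:

    # losslessly swap Huffman coding for JPEG's arithmetic coder
    # (smaller file, same coefficients, but poorly supported by decoders in the wild)
    jpegtran -arithmetic -outfile out-arith.jpg in.jpg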


Another inefficiency of JPEG is that each block (8x8 pixels in size) is compressed independently[^]. This means that the correlations between pixels that are adjacent across a block boundary are not used. If I were to take a JPEG image and randomly flip its blocks (mirror them along the x and/or y axis), the resulting JPEG would have a very similar file size, even though it's a much less likely image.

Brunsli and, IIUC, Lepton, make use of these correlations.

[^] the average color of blocks is not compressed strictly independently, but the space used on those is small compared to all the rest of the information about a block

Disclaimer: I've worked on Brunsli.


Very interesting. This independence across blocks can presumably be leveraged at decode time for faster decoding though. Surely there must be decoders out there parallelizing over the blocks on multi-core architectures / GPUs?

Do you know how Brunsli & Lepton fare when it comes to parallelizability?


I assume that you mean parallelizability of decoding and not of encoding.

JPEG's decoding is poorly parallelizable: the entropy decoding is necessarily serial; only the inverse DCTs can be parallelized.

Sharing the data about boundaries need not hamper parallelizability in its simplest sense: imagine a format where we first encode some data for each boundary, and then we encode all the blocks, each of which can only be decoded when provided the data for all four of its boundaries.

However, what often matters is the total number of non-cacheable memory roundtrips that each pixel/block of the image has to take part in: a large cost during decoding is memory access time. If we assume that a single row of blocks across the whole image is larger than the cache, then any approach similar to the one I described in the previous paragraph adds one roundtrip.

Another consideration is that a single block is often too small to be a unit of parallelization -- parallelizing entropy decoding usually has additional costs in filesize (e.g. for indices), and any parallelization has startup costs for each task.

The end result is that a reasonably useful approach to parallelization is to split the image into "large blocks" that are large enough to be units of parallelization on their own, and encode _those_ independently.


Brunsli and Lepton are both sequential.

In JPEG XL, there's both the original sequential Brunsli, and a tiled version of it (using groups of 256x256 pixels) which can be encoded/decoded in parallel. If you have a 4:4:4 JPEG (no chroma subsampling), you can also, instead of Brunsli, use the VarDCT mode of JPEG XL, where all the block sizes happen to be 8x8. Compression is similar to that of Brunsli, and it's even slightly faster (but it doesn't allow you to reconstruct the JPEG _file_ exactly; only the _image_ itself is lossless in that case).


That might prove to be a good measure of image compression 'efficiency'.

Present to a user two images, one an image compressed by image compressor X, and one compressed by the same image compressor with a single bit of output flipped.

In an ideal image compression scenario, the decompressed images would not be the same, but a user could not tell which was the 'correct' image, since both would look equally realistic.


If a scheme had something like that property and satisfied some simpler conditions, I would wager that it necessarily is a good compression scheme. However, this is very much not required of a good compression scheme:

Imagine that a compression scheme used the first bit to indicate if the encoded image is an image of a cat or not. Changing that bit would then have very obvious and significant implications on the encoded image.

If that example seems too unrealistic, imagine a modification of a compression scheme that, before decoding, xors every non-first bit with the first bit. Then flipping the first bit in the modified scheme is equivalent to flipping a lot of bits in the unmodified scheme, but they are equivalently good at encoding images.

Edit: To put it short, the important property is that equally-long encoded images are "equally plausible": it's not important how many bits differ between them, and it doesn't matter if they are similar to each other.


In the thought experiment, I don't think the user is told beforehand what the image is.

So you flip the cat bit and get an image of a helicopter, and they still can't tell which one is 'correct'.


Ah, thank you. I misread the GP. It seems that he is saying nearly[^] exactly what I wanted to say in the edit.

[^] the property should hold not only for single-bit changes, but all length-preserving changes -- it's perfectly fine for all single bitflips to e.g. result in invalid codestreams.


Single-threaded brunsli runs 2.5x faster than single-threaded lepton, but brunsli compresses 1 % less. Brunsli is able to compress more JPEGs (including more exotic JPEG configurations) than lepton.


Just to put numbers on this: on an Intel Skylake CPU, brunsli compressed a 1MB image in 0.16s, while lepton needed 0.5s, more than triple the CPU time. Brunsli peaked at 100MB of RAM while lepton peaked at 44MB.

Decoding with lepton took 0.23s and 8MB of RAM, while brunsli used 0.13s and 42MB of RAM.
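
For anyone reproducing numbers like these, GNU time reports both wall-clock time and peak RSS; a rough sketch (the binary names cbrunsli and lepton are assumed from the respective repos' build outputs):

    /usr/bin/time -v cbrunsli photo.jpg photo.br  2>&1 | grep -E "Elapsed|Maximum resident"
    /usr/bin/time -v ./lepton photo.jpg photo.lep 2>&1 | grep -E "Elapsed|Maximum resident"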


Have you tried testing on your machine? You'll see varying results with lossless/lossy image optimisers.

I got <22% with Brunsli.


And how does it compare against HEIF?


Not comparable. Brunsli and Lepton are lossless compressors for JPEG files; HEIF is a completely different lossy image encoder.

To compare the size of a Brunsli/Lepton encoded JPEG file with an HEIF image, you'd need to define some sort of quality equivalence between the two, which gets complicated fast.


HEIF can't compress JPEG images losslessly


Indeed, HEIF/HEIC is basically a slightly dumbed-down HEVC (h.265) i-frame (full frame) codec (HEIC)[1] plus a new container format (HEIF)[2], similar to WEBP being VP8 i-frames in a RIFF container. So they are used as full-blown codecs in practice, usually not in a lossless mode, so shifting JPEG to HEIC or WEBP will lose some quality.

[1] Decoding HEIC in Windows (10) requires you to have installed a compatible HEVC decoder, which is either 99 cents (plus the hassle of setting up a store account and payment processing with MS) or an alternative free one that uses the HEVC codec shipped with hardware such as newer Intels (QSV) or GPUs. Thank you, patent mess!

[2] HEIF, the container format, can contain JPEG data, but in practice it does not, or only as a supplementary image (previews, pre-renders, thumbnails, etc.)


Below 0.5 bpp, both HEIF/BPG [1] and AVIF perform quite a bit better than JPEG XL; XL shines at 0.8 bpp. At least in initial testing.

[1] https://bellard.org/bpg/


Do you have a link for JPEG XL comparisons? The comparisons on that page are with JPEG XR, which is a different thing.


Unfortunately I couldn't find the exact one; it was from the author's presentation comparing against a few other codecs on common metrics such as PSNR and SSIM.

There is a thread on Doom9 [1] with similar results though.

[1] https://forum.doom9.org/showpost.php?p=1894341&postcount=167


If someone is wondering where the name comes from: https://www.saveur.com/article/Recipes/Basler-Brunsli-Chocol...

Not sure if it is the best recipe - the ones I use are usually written in German.


Brotli, another compression algorithm created by Google, is also named after a Swiss baked good: https://en.wikipedia.org/wiki/Spanisch_Br%C3%B6tli


Zopfli also comes from a Swiss bakery product. Developers must like carbohydrates :)


And let’s not forget Gipfeli and Knusperli ;)

https://github.com/google/gipfeli

https://github.com/google/knusperli



Huh, I always just used good ol’ jpegtran:

  jpegtran -copy none -optimize -progressive -outfile "$image" "$image"
I have a wrapper script around this to allow bulk optimization as well as calculating stats.

Time to switch I guess.
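
Not the parent's actual script, but a minimal sketch of such a bulk pass with before/after stats (GNU coreutils assumed; it writes to a temp file first to avoid clobbering the input mid-write):

    total_before=0; total_after=0
    for image in *.jpg; do
        before=$(stat -c%s "$image")
        jpegtran -copy none -optimize -progressive -outfile "$image.tmp" "$image" \
            && mv "$image.tmp" "$image"
        after=$(stat -c%s "$image")
        total_before=$((total_before + before)); total_after=$((total_after + after))
    done
    echo "saved $((total_before - total_after)) bytes ($total_before -> $total_after)"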


You should use mozjpeg instead. It's a jpegtran fork and drop-in replacement that optimizes the Huffman tables even better.


Does it really optimize the Huffman trees better? I wouldn't think there's more you can do to optimize Huffman trees than make them reflect the actual histogram.

Mozjpeg does try different progressive scan scripts – different scripts lead to different compression behavior. That helps a bit, though I don't like it when it separates the Cb and Cr planes, because that leads to a "flash of weird color" when decoding progressively. See also: https://cloudinary.com/blog/progressive_jpegs_and_green_mart...


Actually, I’ve been using mozjpeg’s version of jpegtran instead of libjpeg’s for god knows how long.


JPEG -> JPEG optimization certainly makes sense, since it doesn't require any decoder to be updated.

JPEG -> [something else] compression (like Brunsli or Lepton) can compress better though, since they can use (better) predictors, entropy coding, context modeling, etc. But they do require either a new decoder (saving both bandwidth and storage), or an on-the-fly conversion back to JPEG (saving only storage; bandwidth wouldn't change).


I like the check list for features... can't wait for

"Nginx transcoding module"


I understand what you’re saying, but when you’re storing millions of images and transferring them frequently, those small reductions in size are very significant when you get your storage and/or transit bill at the end of the month. A lot of the fluff and filler you mentioned is compressed and cached, so you’re not bringing it down with every request, and not in raw form. Even if you are sending a huge amount of uncachable stuff with each request, it doesn’t mean savings wouldn’t be appreciated. It’s funds for a team lunch if nothing else!


Could someone explain what a "repacker" is?

What I know: In Photoshop, when I save an image as JPEG, I can decide the "quality" (Low, Medium, High, etc). The lower it is, the smaller the file size but the image will have (more) artifacts. The resulting image can then be opened in any image viewer including browsers.

Also, I was told to save the "master" copy in a lossless format (e.g. TIFF or PNG) because JPEG is a lossy format (like MP3 vs. WAV).

So how does a "repacker" come to play?


JPEG compression works in two phases:

1. Discrete Cosine Transform plus quantization, discarding insignificant information.

2. Lossless entropy coding of the quantized coefficients into the bitstream.

Step 1 is lossy in practice because it throws away things that seem insignificant. The "quality" control determines what counts as insignificant. Step 2 is lossless, and just tries to make the data coming out of Step 1 take up as little space as possible.

A repacker redoes Step 2 better: it makes the file smaller without reducing the quality, by changing how the data is compressed, not changing which parts are kept.
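
A quick way to convince yourself that a repack really only redoes Step 2: re-optimize a file and compare the decoded pixels. The sketch below uses jpegtran's in-format optimization as a stand-in for the idea (brunsli does the same kind of thing but writes a different container); djpeg is libjpeg's decoder and outputs PPM:

    jpegtran -optimize -outfile repacked.jpg original.jpg   # redo Step 2 only
    djpeg original.jpg > a.ppm
    djpeg repacked.jpg > b.ppm
    cmp a.ppm b.ppm && echo "pixels identical" && ls -l original.jpg repacked.jpg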


> Brunsli allows for a 22% decrease in file size while allowing the original JPEG to be recovered byte-by-byte.

It's a lossless file compression application that's specialized on compressing a specific file format, so it can beat generic compression tools like gzip.
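
You can see the difference for yourself: a generic compressor barely shaves anything off an already entropy-coded JPEG, while a format-aware repacker gets roughly the 22% the README quotes (the cbrunsli tool name is assumed from the repo's build):

    gzip -9 -k photo.jpg        && ls -l photo.jpg photo.jpg.gz   # usually ~1% or less
    cbrunsli photo.jpg photo.br && ls -l photo.jpg photo.br       # ~20% is typical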


> Brunsli has been specified as the lossless JPEG transport layer in the Committee Draft of JPEG XL Image Coding System and is ready to power faster and more economical transfer and storage of photographs.

I thought the lossless part of JPEG XL was done by FUIF; am I misunderstanding something?


Lossless transcoding of JPEG bitstreams is done by Brunsli; there is also lossless storage of pixels, based on tech from FUIF plus an adaptive predictor by Alex Rhatushnyak.


Plus the 'select' predictor from WebP lossless :-)


Plus tons of cross-pollination from the PIK/VarDCT mode, e.g. the (MA)ANS and Brotli entropy coding and the Patches intra/inter block copy mechanism :)



