Despite being lossy, JPEG is one of the more magical forms of image compression in my experience. A JPEG encoder/decoder can be made insanely fast on SIMD-capable hardware (see: libjpeg-turbo). The practical effect of JPEG on a complex scene is almost a miracle when you consider how the alternatives work. The frequency domain, and our perception of it, is truly a wonder.
This has value beyond education. I wish I could use this as an editor to put finishing touches on images so web pages load faster for people with less bandwidth. Photoshop, GIMP, and ImageMagick can't do this: the quality slider is a blunt tool that applies to the whole image. It should be possible to define a quality mask, since things outside the focal area can safely be compressed more. But classic algorithms aren't good at determining what that area is, and JPEG isn't conventionally treated as an editing canvas.
It would be interesting to pair this with something like Mask R-CNN, or another model that does semantic segmentation, and then choose different compression levels for different masks.
From what I remember, you could choose a layer to use as a mask for quality. However, I haven't touched PS in more than ten years, and I can't even find screenshots of versions earlier than ~CS5 from a quick search. Perhaps it was such an obscure feature that it was yanked out.
I may also be confusing this with settings for some filter or such, but I'm fairly sure it was a setting specifically for quality.
I once wrote a JPEG 'recompressor' that could throw away some of the DCT coefficients, so you could dynamically adjust the size of images when streaming them to a web browser with limited bandwidth - just like the slider in this demo.
It was clunky and unreliable (estimating bandwidth quickly is hard!) but I really liked the way that the JPEG format allows you to increase the compression of a file without having to fully decompress and then recompress the image. You literally can just throw away some bits from the file here and there. I wonder if that is possible with other lossy data formats (video or audio)?
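The coefficient-dropping trick can be sketched on a single 8×8 block. This is a toy illustration using SciPy's DCT, not the actual recompressor (which would operate on the entropy-coded bitstream): zero out the high-frequency coefficients, and the inverse transform still reconstructs a close approximation.

```python
import numpy as np
from scipy.fft import dctn, idctn

# A synthetic 8x8 block: a smooth diagonal gradient, values 0..224.
block = np.add.outer(np.arange(8), np.arange(8)) * 16.0

# Forward 2D DCT-II, as applied per 8x8 block in JPEG.
coeffs = dctn(block, norm="ortho")

# "Throw away some bits": zero every coefficient whose frequency
# index sum u+v >= 6, keeping only the low-frequency corner.
u, v = np.indices(coeffs.shape)
truncated = np.where(u + v < 6, coeffs, 0.0)

# The inverse DCT of the truncated coefficients is still close
# to the original block, since most energy sits in low frequencies.
approx = idctn(truncated, norm="ortho")
error = np.mean(np.abs(approx - block))
```

For smooth content like this gradient, the mean error stays small; busy, high-frequency content degrades more visibly, which is exactly the trade-off the bandwidth slider exposes.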
I seem to remember that "bit peeling" was designed into Ogg Vorbis audio from early on: the data stream could be encoded in order of decreasing perceptual importance, so that frames could be truncated at certain points and you'd still get a good lower-bitrate stream. However, I've never read of a Vorbis encoder that actually encoded that way.
I think it's unique to JPEG due to its simplicity. JPEG encodes blocks independently, which makes manipulation of block data easy (also enables progressive rendering).
All later DCT-based formats try to get more compression by predicting blocks from previously decoded ones, which makes all blocks potentially interdependent, so distortions from that kind of quick recompression would propagate and compound.
Often the chroma planes are stored at a smaller resolution and scaled up when decoding, since details in chroma don't matter as much as details in brightness.
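As a rough sketch of that idea (generic 4:2:0-style subsampling, not any particular codec's exact filtering): each chroma plane is stored at half resolution in both dimensions, quartering its sample count, and the decoder scales it back up.

```python
import numpy as np

rng = np.random.default_rng(0)
# A hypothetical 16x16 chroma plane (e.g. Cb after an RGB->YCbCr convert).
chroma = rng.uniform(0, 255, size=(16, 16))

# Encoder side: average each 2x2 neighbourhood into one sample,
# so the stored plane has 1/4 as many values.
small = chroma.reshape(8, 2, 8, 2).mean(axis=(1, 3))

# Decoder side: nearest-neighbour upsample back to full resolution.
restored = np.repeat(np.repeat(small, 2, axis=0), 2, axis=1)
```

The luma plane stays at full resolution, which is why the scheme is barely visible: our eyes resolve brightness detail far better than colour detail.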
Yep -- superb work and it makes it much more concrete.
I was (maybe quite unusually for a CS degree at the time) taught about JPEG compression in 94/95; I've carried a sort of bare-minimum working knowledge of how it works along with me through web development and semi-pro photography, and it can be quite useful knowledge (e.g. when optimising "retina" images for the web).
But I'd have loved an interactive tool like this at the time, because it is not the easiest thing to absorb.
Just to be nitpicky: JPEG is only one form of lossy image compression. Lossless algorithms are completely different. MPEG & co. are based on techniques similar to JPEG's. Apparently WebP (the lossy format that has been gaining some traction in recent years) also uses a DCT (or a Walsh–Hadamard transform) for its blocks, but with some additional "tricks" like block prediction.
Right. The lossless algorithms are mostly related to general insights for efficiently storing common patterns of data, plus some component of figuring out how to express image data in a way that makes it most suitable for that existing "common pattern" compression. This begins with Run Length Encoding, the insight that we can store 8 times Z rather than "ZZZZZZZZ" if we're careful to do so unambiguously.
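The "8 times Z" idea can be sketched in a few lines with Python's `itertools.groupby` (a toy illustration; a real codec would also need an unambiguous byte-level framing for the counts):

```python
from itertools import groupby

def rle_encode(s: str) -> list[tuple[int, str]]:
    # Collapse each run of identical characters into (count, char).
    return [(len(list(group)), char) for char, group in groupby(s)]

def rle_decode(pairs: list[tuple[int, str]]) -> str:
    # Expand each (count, char) pair back into its run.
    return "".join(char * count for count, char in pairs)
```

So `rle_encode("ZZZZZZZZ")` gives `[(8, 'Z')]`, and decoding is an exact inverse.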
In contrast the lossy algorithms are about specifically what humans don't see well or don't care about. For example in vision humans don't see colours with the same accuracy as brightness because of how their vision works. When it comes to moving images, humans don't see as much detail at all when things are moving so you can blur them and humans barely notice. Humans care a lot about the details of human-like faces, and scarcely at all about grass or water. In audio humans can't really hear certain frequency combinations, so the quieter frequencies can sometimes just be omitted.
It'd be interesting to see how well a crowd of humans using tools like this could replace the encoder. I wonder how much smaller JPEGs could get with human-level psychovisual optimization.
This is the key insight (psychoacoustics) that led to the MP3 codec: not just converting a waveform into frequencies, but altering the result in a way that throws away most of the data yet remains indistinguishable to most listeners.
That JPEG can be losslessly (bit-for-bit identically) transcoded to JPEG XL and back to JPEG, while the JPEG XL file is smaller than the original JPEG, is bizarre to me. Does anyone know how that works?
Better entropy coding. TL;DR: JPEG and other "lossy" compression formats really involve two steps: a lossy step and a lossless step (like the DEFLATE used in ZIP). You keep the lossy step's output as-is and improve the lossless step.
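The two-step idea can be illustrated with stand-ins (zlib and LZMA here are just proxies for JPEG's Huffman coding and JPEG XL's more modern entropy coder, not the actual codecs): the lossy data is left untouched, a stronger lossless coder is swapped in, and the roundtrip stays bit-identical.

```python
import zlib
import lzma
import numpy as np

rng = np.random.default_rng(42)
# Fake "quantized DCT coefficients": small values, mostly zeros,
# which is roughly what real JPEG data looks like after quantization.
coeffs = rng.integers(-8, 9, size=64_000).astype(np.int8)
coeffs[rng.random(coeffs.size) < 0.8] = 0
raw = coeffs.tobytes()

old = zlib.compress(raw, level=9)  # stand-in for the original lossless step
new = lzma.compress(raw)           # stand-in for a better entropy coder

# Decompressing recovers the exact same lossy data, bit for bit,
# so image quality is unchanged even though the file shrank.
roundtrip = lzma.decompress(new)
```

The key point is that only the container around the lossy data changes, which is why the transcode can be reversed exactly.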
Love this project! I personally had a really good time understanding the JPEG format and writing a decoder in Python. It is fairly short and succinct. If anyone is interested, I wrote an article about it: https://yasoob.me/posts/understanding-and-writing-jpeg-decod...