Interactively edit individual DCT blocks in any JPEG image in the browser (github.com/omarshehata)
223 points by tambourine_man on March 8, 2022 | 27 comments



Despite being lossy, JPEG is one of the more magical forms of image compression in my experience. A JPEG encoder/decoder can be made insanely fast on SIMD-capable hardware (see: libjpeg-turbo). The practical effect of JPEG on a complex scene is almost a miracle when you consider how the alternatives work. The frequency domain and our perception of it is truly a wonder.


This has value beyond education. I wish I could use this as an editor to put finishing touches on images so web pages load faster for people with less bandwidth. Photoshop, GIMP, and ImageMagick can't do this. The quality slider is a blunt tool that applies to the whole image. It should be possible to define a quality mask, since things outside the viewer's focus can safely be compressed more. But classic algorithms aren't good at determining what that is, plus JPEG isn't conventionally thought of as being the canvas.
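Not a true per-block quality mask, but a sketch of one workaround in Python with Pillow (photo.jpg and mask.png are hypothetical inputs; white in the mask marks regions to keep sharp): pre-blurring the unimportant regions strips their high frequencies, so a single global quality setting spends far fewer bits on them.

    # A minimal sketch, not a real quality mask: blur the regions you
    # don't care about so the encoder has less detail to spend bits on.
    from PIL import Image, ImageFilter

    img = Image.open("photo.jpg").convert("RGB")
    mask = Image.open("mask.png").convert("L").resize(img.size)

    # A heavily blurred copy compresses much better (fewer high frequencies).
    blurred = img.filter(ImageFilter.GaussianBlur(radius=4))

    # Keep the original where the mask is white, the blurred copy elsewhere.
    out = Image.composite(img, blurred, mask)
    out.save("smaller.jpg", quality=75)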


It would be interesting to pair this with something like Mask R-CNN, or another model that does semantic segmentation, and then choose the compression level for each mask separately.


> should be possible to define a quality mask

I vaguely remember that PS's ‘save for web’ dialog actually had that.


I'm looking at both their save for web dialogs presently and I can assure you that it doesn't.


From what I remember, you could choose a layer to use as a mask for quality. However, I haven't touched PS in more than ten years, and I can't even find screenshots of versions earlier than ~CS5 from a quick search. Perhaps it was such an obscure feature that it was yanked out.

I may also be confusing this with settings for some filter or such, but I'm fairly sure it was a setting specifically for the quality of something.


I once wrote a JPEG 'recompressor' that could throw away some of the DCT coefficients, so you could dynamically adjust the size of images when streaming them to a web browser with limited bandwidth - just like the slider in this demo.

It was clunky and unreliable (estimating bandwidth quickly is hard!), but I really liked the way the JPEG format lets you increase the compression of a file without having to fully decompress and then recompress the image. You can literally just throw away some bits from the file here and there. I wonder if that is possible with other lossy data formats (video or audio)?
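A minimal sketch of the coefficient-dropping idea in Python (not the original recompressor), using SciPy's DCT: transform an 8x8 block, keep only the first few coefficients in zigzag order, and reconstruct.

    import numpy as np
    from scipy.fft import dctn, idctn

    def zigzag_indices(n=8):
        # JPEG's zigzag scan: walk the anti-diagonals, alternating direction.
        return sorted(((i, j) for i in range(n) for j in range(n)),
                      key=lambda p: (p[0] + p[1],
                                     p[0] if (p[0] + p[1]) % 2 else p[1]))

    def truncate_block(block, keep=10):
        coeffs = dctn(block, norm="ortho")
        out = np.zeros_like(coeffs)
        for idx in zigzag_indices()[:keep]:
            out[idx] = coeffs[idx]  # everything past `keep` is thrown away
        return idctn(out, norm="ortho")

    block = np.random.rand(8, 8) * 255
    approx = truncate_block(block, keep=10)  # blurrier, cheaper block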


I seem to remember that "bit peeling" was designed into Ogg Vorbis audio from early on: the stream could be encoded in order of decreasing perceptual importance, so data frames could be truncated at certain points and you'd still get a good lower-bitrate stream. However, I've never read of a Vorbis encoder that actually encoded that way.


It was a valiant effort, but ultimately abandoned.


I think it's unique to JPEG due to its simplicity. JPEG encodes blocks independently, which makes manipulation of block data easy (also enables progressive rendering).

All later DCT-based formats chase more compression by predicting blocks from previously decoded ones, which makes blocks interdependent, so distortions from such quick recompression can compound as they propagate through the prediction chain.
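A toy 1-D illustration of that compounding (a DPCM-style predictor chain of my own construction, nothing like a real video codec): requantizing the stored residuals without re-running prediction makes each sample's error depend on all the errors before it.

    import numpy as np

    def quantize(x, step):
        return np.round(x / step) * step

    def encode(x, step):
        # Closed-loop prediction: residual against our own reconstruction.
        res, prev = [], 0.0
        for v in x:
            r = quantize(v - prev, step)
            res.append(r)
            prev += r
        return np.array(res)

    signal = np.random.default_rng(0).uniform(0, 255, 64)
    res1 = encode(signal, 4)
    rec1 = np.cumsum(res1)    # the decoder's reconstruction

    # Naive transcode: requantize the residuals more coarsely without
    # re-running prediction ("just throw away some bits").
    res2 = quantize(res1, 16)
    rec2 = np.cumsum(res2)

    err = np.abs(rec2 - rec1)
    print(err[:4], err[-4:])  # error tends to grow along the chain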


So you're the reason Cache-Control:no-transform exists. Bravo.


Wow, this feels like the most efficient way I could possibly have ever learned how JPEG compression works. Well done!

Now I wonder how color channels are added.


Color comes from two additional "chroma planes", which are also monochrome images that are compressed the same way.

Example of an image split into three planes: https://upload.wikimedia.org/wikipedia/commons/d/d9/Barns_gr...

Often the chroma planes are stored at a smaller resolution and scaled up when decoding, since details in chroma don't matter as much as details in brightness.


You have 2 chroma channels [1] at half resolution (4:2:0) or full resolution (4:4:4):

https://en.wikipedia.org/wiki/YCbCr#/media/File:CCD.png

It's almost always 4:2:0. Then you compress the chroma channels the same way as the luma.

[1]: https://en.wikipedia.org/wiki/Chroma_subsampling#Sampling_sy...
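A minimal sketch of that split plus 4:2:0 downsampling in Python with Pillow (photo.png is a hypothetical input):

    from PIL import Image

    img = Image.open("photo.png").convert("YCbCr")
    y, cb, cr = img.split()

    # Chroma at half width and half height: a quarter of the samples.
    cb_420 = cb.resize((cb.width // 2, cb.height // 2))
    cr_420 = cr.resize((cr.width // 2, cr.height // 2))

    # A decoder scales the chroma back up before converting to RGB.
    cb_up = cb_420.resize(cb.size)
    cr_up = cr_420.resize(cr.size)
    rgb = Image.merge("YCbCr", (y, cb_up, cr_up)).convert("RGB")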


Yep -- superb work and it makes it much more concrete.

I was (maybe quite unusually for a CS degree at the time) taught about JPEG compression in 94/95; I've carried a sort of bare-minimum working knowledge of how it works along with me through web development and semi-pro photography, and it can be quite useful knowledge (e.g. when optimising "retina" images for the web).

But I'd have loved an interactive tool like this at the time, because it is not the easiest thing to absorb.


Yep, this is a great little demo. I love these little bite-sized pieces of instruction. Well done.


I never really understood how image compression works but now I feel a bit closer. Cool project!


Just to be nitpicky: JPEG is only one form of lossy image compression, and lossless algorithms are completely different. MPEG & co. are based on something similar to JPEG. Apparently WebP (the lossy format that has been gaining some traction in recent years) also uses the DCT (or a Walsh–Hadamard transform) for its blocks, but with some additional "tricks" like block prediction.


Right. The lossless algorithms are mostly related to general insights for efficiently storing common patterns of data, plus some component of figuring out how to express image data in a way that makes it most suitable for that existing "common pattern" compression. This begins with Run Length Encoding, the insight that we can store 8 times Z rather than "ZZZZZZZZ" if we're careful to do so unambiguously.
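That "ZZZZZZZZ" insight fits in a few lines; a toy run-length codec in Python, unambiguous because every run is stored as a (count, char) pair:

    from itertools import groupby

    def rle_encode(s):
        return [(len(list(g)), ch) for ch, g in groupby(s)]

    def rle_decode(runs):
        return "".join(ch * n for n, ch in runs)

    assert rle_encode("ZZZZZZZZAB") == [(8, "Z"), (1, "A"), (1, "B")]
    assert rle_decode(rle_encode("ZZZZZZZZAB")) == "ZZZZZZZZAB"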

In contrast, the lossy algorithms are specifically about what humans don't see well or don't care about. For example, humans don't see colours with the same accuracy as brightness, because of how their vision works. When it comes to moving images, humans don't see as much detail when things are moving, so you can blur moving objects and people barely notice. Humans care a lot about the details of human-like faces, and scarcely at all about grass or water. In audio, humans can't really hear certain frequency combinations, so the quieter frequencies can sometimes just be omitted.


It'd be interesting to see how well a crowd of humans using tools like this could replace the encoder. I wonder how much smaller JPEGs could get with human-level psychovisual optimization.


This was jokingly called the “graduate student algorithm” in the context of fractal image compression. http://mcs.csueastbay.edu/~grewe/CS6825/Mat/Compression/Frac...


This is the key insight (psychoacoustics) that led to the MP3 codec: not just converting a waveform into frequencies, but altering the result in a way that throws away most of the data yet remains indistinguishable to most listeners.


Actually, the JPEG quantisation tables were meticulously hand-crafted already.
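For reference, the hand-crafted luminance table from Annex K of the spec, with a sketch of how it's applied; note the step sizes growing toward the high-frequency corner:

    import numpy as np

    # Table K.1 from ITU-T T.81 (the JPEG spec), for the luma plane.
    Q_LUMA = np.array([
        [16, 11, 10, 16,  24,  40,  51,  61],
        [12, 12, 14, 19,  26,  58,  60,  55],
        [14, 13, 16, 24,  40,  57,  69,  56],
        [14, 17, 22, 29,  51,  87,  80,  62],
        [18, 22, 37, 56,  68, 109, 103,  77],
        [24, 35, 55, 64,  81, 104, 113,  92],
        [49, 64, 78, 87, 103, 121, 120, 101],
        [72, 92, 95, 98, 112, 100, 103,  99],
    ])

    def quantize(dct_block):
        # Element-wise divide-and-round; most high-frequency entries
        # become zero, which is where the compression comes from.
        return np.round(dct_block / Q_LUMA).astype(int)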


That a JPEG can be losslessly (bit-for-bit identical) transcoded to JPEG XL and back to JPEG, while the JPEG XL file is smaller than the original JPEG, is bizarre to me. Does anyone know how that works?


Better entropy coding. TL;DR: JPEG and other "lossy" compression formats are really two steps, a lossy step and a lossless step like ZIP. You keep the lossy step as-is and improve the lossless step.
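A toy illustration of that two-step structure, with zlib standing in for "a better entropy coder than 1980s Huffman" (the coefficient data is fake, not a real JPEG stream):

    import zlib
    import numpy as np

    # Fake quantized coefficients: mostly zeros, like real JPEG blocks.
    rng = np.random.default_rng(1)
    coeffs = (rng.standard_normal(64 * 1000) * 2).astype(np.int8)
    coeffs[np.abs(coeffs) < 2] = 0

    raw = coeffs.tobytes()
    print(len(raw), len(zlib.compress(raw, 9)))  # same data, fewer bytes

    # The lossless step is perfectly reversible, so swapping it for a
    # better one never changes the decoded image.
    assert zlib.decompress(zlib.compress(raw, 9)) == raw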


Love this project! I personally had a really good time understanding the JPEG format and writing a decoder in Python. It is fairly short and succinct. If anyone is interested, I wrote an article about it: https://yasoob.me/posts/understanding-and-writing-jpeg-decod...


Very interesting to play with. Now I have to go read the JPEG spec.



