Another inefficiency of JPEG is that each block (8x8 pixels in size) is compressed independently[^]. This means that the correlations between pixels that are adjacent across a block boundary are not used. If I were to take a JPEG image and randomly flip its blocks (mirror them along x and/or y axis), the resulting JPEG would have a very similar filesize, even though it's a much less likely image.
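As a rough pixel-domain illustration of that thought experiment (the real version would mirror the 8x8 DCT coefficient blocks and re-run the entropy coder; the function name and the numpy approach here are just for illustration):

    import numpy as np

    def flip_blocks_randomly(img: np.ndarray, block: int = 8, seed: int = 0) -> np.ndarray:
        """Mirror each block x block tile of a grayscale image along x and/or y at random."""
        rng = np.random.default_rng(seed)
        out = img.copy()
        h, w = img.shape
        # Partial tiles at the right/bottom edges are left untouched.
        for y in range(0, h - h % block, block):
            for x in range(0, w - w % block, block):
                tile = out[y:y + block, x:x + block]
                if rng.integers(2):
                    tile = tile[::-1, :]  # mirror along the y axis
                if rng.integers(2):
                    tile = tile[:, ::-1]  # mirror along the x axis
                out[y:y + block, x:x + block] = tile
        return out

Because each block's coefficients are coded (almost) independently, re-encoding the flipped result costs roughly the same number of bits, even though the flipped image is far less plausible.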
Brunsli and, IIUC, Lepton, make use of these correlations.
[^] the average color of blocks is not compressed strictly independently, but the space used on those is small compared to all the rest of the information about a block
Very interesting. This independence across blocks can presumably be leveraged for faster decoding, though. Surely there must be decoders out there that parallelize over the blocks on multi-core architectures / GPUs?
Do you know how Brunsli & Lepton fare when it comes to parallelizability?
I assume that you mean parallelizability of decoding and not of encoding.
JPEG decoding is poorly parallelizable: the entropy decoding is necessarily serial; only the inverse DCTs can be parallelized.
Sharing data across boundaries need not hamper parallelizability in the simplest sense: imagine a format where we first encode some data for each boundary, and then encode all the blocks, each of which can only be decoded once the data for all four of its boundaries is available.
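A toy sketch of that decode order (everything here is made up; the "boundary data" is just a per-block mean, and the point is only that each block depends on nothing except data decoded up front, so the per-block work can run in parallel):

    from concurrent.futures import ThreadPoolExecutor
    import numpy as np

    def encode(blocks: np.ndarray):
        """blocks has shape (rows, cols, 8, 8); the per-block mean stands in for boundary data."""
        boundary_data = blocks.mean(axis=(2, 3))
        residuals = blocks - boundary_data[:, :, None, None]
        return boundary_data, residuals

    def decode(boundary_data: np.ndarray, residuals: np.ndarray) -> np.ndarray:
        # Phase 1 (serial): the small shared/boundary part of the stream is decoded first.
        # Phase 2 (parallel): each block needs only that shared data plus its own payload.
        rows, cols = boundary_data.shape

        def decode_block(idx: int) -> np.ndarray:
            by, bx = divmod(idx, cols)
            return residuals[by, bx] + boundary_data[by, bx]

        with ThreadPoolExecutor() as pool:
            blocks = list(pool.map(decode_block, range(rows * cols)))
        return np.array(blocks).reshape(rows, cols, 8, 8)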
However, what often matters is the total number of non-cacheable memory roundtrips that each pixel/block of the image takes part in: memory access time is a large cost during decoding. If we assume that a single row of blocks across the whole image is larger than the cache, then any approach similar to the one I described in the previous paragraph adds one extra roundtrip.
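As a rough back-of-the-envelope example (assuming a 4000-pixel-wide 4:4:4 image): one row of 8-pixel-tall blocks is about 4000 x 8 x 3 bytes ≈ 94 KiB of decoded samples, which already exceeds a typical 32-48 KiB L1 data cache, so that row is likely to be evicted between the two passes.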
Another consideration is that a single block is often too small to be a unit of parallelization -- parallelizing entropy decoding usually has additional costs in filesize (e.g. for indices), and any parallelization has startup costs for each task.
The end result is that a reasonably useful approach to parallelization is to split the image into "large blocks" that are large enough to be units of parallelization on their own, and encode _those_ independently.
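A minimal sketch of that split, with zlib standing in as the per-tile codec (the tile size and all names are purely illustrative):

    import zlib
    from concurrent.futures import ThreadPoolExecutor

    TILE_BYTES = 256 * 256 * 3  # one "large block"; think 256x256 RGB

    def encode_tiled(pixels: bytes) -> list[bytes]:
        # Each tile is compressed independently, so no tile needs data from its neighbours.
        return [zlib.compress(pixels[i:i + TILE_BYTES])
                for i in range(0, len(pixels), TILE_BYTES)]

    def decode_tiled(tiles: list[bytes]) -> bytes:
        # Independent tiles make the decode trivially parallel.
        with ThreadPoolExecutor() as pool:
            return b"".join(pool.map(zlib.decompress, tiles))

The filesize cost is the one mentioned above: correlations across tile boundaries are lost, and every tile pays its own entropy-coder startup overhead, which is why the tiles have to be fairly large.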
In JPEG XL, there's both the original sequential Brunsli, and a tiled version of it (using groups of 256x256 pixels) which can be encoded/decoded in parallel. If you have a 4:4:4 JPEG (no chroma subsampling), you can also, instead of Brunsli, use the VarDCT mode of JPEG XL, where all the block sizes happen to be 8x8. Compression is similar to that of Brunsli, and it's even slightly faster (but it doesn't allow you to reconstruct the JPEG _file_ exactly; only the _image_ itself is lossless in that case).
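For example, the amount of decode parallelism available from those 256x256 groups is easy to compute (just ceiling division; the photo size below is arbitrary):

    import math

    def num_groups(width: int, height: int, group_dim: int = 256) -> int:
        """Number of independently codeable group_dim x group_dim groups covering the image."""
        return math.ceil(width / group_dim) * math.ceil(height / group_dim)

    print(num_groups(4032, 3024))  # 16 * 12 = 192 groups for a typical phone photo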
That might prove to be a good measure of image compression 'efficiency'.
Present a user with two images: one decoded from the output of image compressor X, and one decoded from that same output with a single bit flipped.
In an ideal image compression scenario, the decompressed images would not be the same, but a user could not tell which was the 'correct' image, since both would look equally realistic.
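A sketch of that experiment for an arbitrary codec (the `decode` callable and the payload are placeholders for whatever compressor is under test):

    import random

    def flip_one_bit(payload: bytes, seed: int = 0) -> bytes:
        """Return a copy of the compressed payload with one randomly chosen bit flipped."""
        rng = random.Random(seed)
        pos = rng.randrange(len(payload) * 8)
        out = bytearray(payload)
        out[pos // 8] ^= 1 << (pos % 8)
        return bytes(out)

    # Hypothetical usage: decode() is the codec under test.
    # original  = decode(payload)
    # perturbed = decode(flip_one_bit(payload))
    # Show both to the user; for an "ideal" codec they should look equally plausible.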
If a scheme had something like that property and satisfied some simpler conditions, I would wager that it would necessarily be a good compression scheme. However, this property is very much not required of a good compression scheme:
Imagine that a compression scheme used the first bit to indicate whether the encoded image is an image of a cat or not. Flipping that bit would then have a very obvious and significant effect on the decoded image.
If that example seems too unrealistic, imagine a modification of a compression scheme that, before decoding, XORs every non-first bit with the first bit. Flipping the first bit in the modified scheme is then equivalent to flipping a lot of bits in the unmodified scheme, yet the two schemes are equally good at encoding images.
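Concretely, the modification is just this bit-level transform applied on top of the unmodified scheme (it is its own inverse, so the same function serves for both encoding and decoding; treating the MSB of the first byte as "the first bit" is an arbitrary choice):

    def xor_with_first_bit(payload: bytes) -> bytes:
        """XOR every bit except the first one with the value of the first bit."""
        if not payload:
            return payload
        first_bit = payload[0] >> 7                 # MSB of byte 0 is "the first bit"
        if first_bit == 0:
            return payload                          # XORing with 0 changes nothing
        out = bytearray(b ^ 0xFF for b in payload)  # flip every bit...
        out[0] ^= 0x80                              # ...then undo the flip of the first bit itself
        return bytes(out)

Flipping the first bit of the modified codestream therefore corresponds to flipping every bit of the unmodified one, yet both schemes assign exactly the same length to every image.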
Edit: In short, the important property is that equally long encoded images are "equally plausible": it doesn't matter how many bits the codestreams differ by, nor whether the decoded images resemble each other.
Ah, thank you. I misread the GP. It seems that he is saying nearly[^] exactly what I wanted to say in the edit.
[^] the property should hold not only for single-bit changes, but all length-preserving changes -- it's perfectly fine for all single bitflips to e.g. result in invalid codestreams.
Disclaimer: I've worked on Brunsli.