00xxxxxx - copy (x+1)-th last EXPLICITLY ENCODED pixel (i.e. ignoring repeats)
010xxxxx - repeat the last pixel x+1 times
011xxxxx xxxxxxxx - repeat the last pixel x+33 times
10rrggbb - copy the last pixel and adjust RGB by (r-1, g-1, b-1)
110rrrrr ggggbbbb - copy the last pixel and adjust RGB by (r-15, g-7, b-7)
1110rrrr rgggggbb bbbaaaaa
- copy the last pixel and adjust RGBA by (r-15, g-15, b-15, a-15)
1111RGBA [rrrrrrrr] [gggggggg] [bbbbbbbb] [aaaaaaaa]
- copy the last pixel and replace RGBA if each bit flag is set
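For concreteness, here is a rough sketch of what a decoder loop for these packets could look like, taking the descriptions above literally. The read_byte()/emit() helpers, the zero-initialised starting state, and the choice to treat every non-run packet as "explicitly encoded" are my own assumptions, not part of the format; the 64-entry history is simply the most the 6-bit index can address.

#include <stddef.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } px_t;

extern uint8_t read_byte(void);   /* hypothetical: next byte of the stream */
extern void    emit(px_t px);     /* hypothetical: output one decoded pixel */

void decode_packets(size_t n_packets) {
    px_t px = {0, 0, 0, 255};     /* the "last pixel" */
    px_t seen[64] = {{0}};        /* explicitly encoded pixels, ignoring repeats */
    size_t pos = 0;

    for (size_t i = 0; i < n_packets; i++) {
        uint8_t b0 = read_byte();

        if ((b0 & 0xC0) == 0x00) {             /* 00xxxxxx: (x+1)-th last explicit pixel */
            px = seen[(pos - 1 - (b0 & 0x3F)) % 64];
        } else if ((b0 & 0xE0) == 0x40) {      /* 010xxxxx: repeat x+1 times */
            for (int r = (b0 & 0x1F) + 1; r > 0; r--) emit(px);
            continue;                          /* repeats don't enter the history */
        } else if ((b0 & 0xE0) == 0x60) {      /* 011xxxxx xxxxxxxx: repeat x+33 times */
            for (int r = (((b0 & 0x1F) << 8) | read_byte()) + 33; r > 0; r--) emit(px);
            continue;
        } else if ((b0 & 0xC0) == 0x80) {      /* 10rrggbb: 2-bit deltas, bias 1 */
            px.r += ((b0 >> 4) & 0x03) - 1;
            px.g += ((b0 >> 2) & 0x03) - 1;
            px.b += ( b0       & 0x03) - 1;
        } else if ((b0 & 0xE0) == 0xC0) {      /* 110rrrrr ggggbbbb: 5/4/4-bit deltas */
            uint8_t b1 = read_byte();
            px.r += (b0 & 0x1F) - 15;
            px.g += (b1 >> 4)   - 7;
            px.b += (b1 & 0x0F) - 7;
        } else if ((b0 & 0xF0) == 0xE0) {      /* 1110rrrr rgggggbb bbbaaaaa: 5-bit deltas */
            uint8_t b1 = read_byte();
            uint8_t b2 = read_byte();
            px.r += (((b0 & 0x0F) << 1) | (b1 >> 7)) - 15;
            px.g += ((b1 >> 2) & 0x1F)               - 15;
            px.b += (((b1 & 0x03) << 3) | (b2 >> 5)) - 15;
            px.a += ( b2 & 0x1F)                     - 15;
        } else {                               /* 1111RGBA: replace flagged channels */
            if (b0 & 0x08) px.r = read_byte();
            if (b0 & 0x04) px.g = read_byte();
            if (b0 & 0x02) px.b = read_byte();
            if (b0 & 0x01) px.a = read_byte();
        }
        seen[pos++ % 64] = px;                 /* every non-run packet counts as explicit here */
        emit(px);
    }
}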
So it is essentially the PNG filter limited to Sub, but with better delta coding and a larger history window. My guess is that it might not work well if the image has a strong vertical gradient (which would need vertical context), but it's nevertheless pretty impressive that such a simple coding is not all that inefficient.
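For reference, PNG's Sub filter just predicts each byte from the corresponding byte of the pixel to its left and stores the difference modulo 256. A rough sketch of one scanline (real PNG filtering also involves a per-scanline filter-type byte and several other filter choices):

/* PNG Sub filter on one scanline: each byte is predicted by the corresponding
   byte of the pixel to its left (0 for the first pixel); only the residual,
   mod 256, is stored. bpp is bytes per pixel. */
void png_sub_filter(unsigned char *out, const unsigned char *row, int width_px, int bpp) {
    for (int i = 0; i < width_px * bpp; i++) {
        unsigned char left = (i >= bpp) ? row[i - bpp] : 0;
        out[i] = (unsigned char)(row[i] - left);
    }
}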
You might be able to do better compression by traversing the pixels in Hilbert [1] order or Z [2] order, to take into account vertical neighbours as well as horizontal. It might cost time or space though, as you couldn't stream the pixels through it anymore - you'd need them all in memory.
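For what it's worth, the Z-order index is just the bit-interleaving of the two coordinates, so the mapping itself is cheap (a quick sketch; Hilbert order takes a bit more bookkeeping):

#include <stdint.h>

/* Spread the low 16 bits of v so they occupy the even bit positions. */
static uint32_t part1by1(uint32_t v) {
    v &= 0x0000FFFF;
    v = (v | (v << 8)) & 0x00FF00FF;
    v = (v | (v << 4)) & 0x0F0F0F0F;
    v = (v | (v << 2)) & 0x33333333;
    v = (v | (v << 1)) & 0x55555555;
    return v;
}

/* Z-order (Morton) index of pixel (x, y): x in the even bits, y in the odd bits. */
uint32_t morton2d(uint32_t x, uint32_t y) {
    return part1by1(x) | (part1by1(y) << 1);
}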
You could still stream the pixels; you'd just have to stream them in Z order. This may seem pedantic, but it applies to any mismatch between the order in which you store pixels and the order in which the format stores them.
E.g. some file formats may be row-major, others column-major, some block-based (like JPEG) or use a fancy interleaving scheme (like Adam7). Some might store the channels interleaved, others separate the channels. If any of these choices doesn't match the desired output format, it breaks streaming.
The nice thing is that if M >= B^2 (i.e., total memory is large enough to fit a square region of the image, where each row/column of the square fills a full block/page of memory), you can transform from row/column order to Hilbert/Z-order without needing to do more I/Os.
So you can't do such a conversion in a streaming fashion, but there is no need to load all the data into memory either.
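Roughly, the idea is: read the image one square tile at a time, permute the tile to Z-order in memory, and write it to the tile's slot in the output. A sketch, with error handling omitted; the tile size, pixel size, and helper names are my own assumptions, and morton2d() is the bit-interleaving helper sketched earlier:

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define TILE 256   /* power of two; TILE x TILE pixels must fit in memory */
#define PXB  4     /* bytes per pixel */

extern uint32_t morton2d(uint32_t x, uint32_t y);

/* Convert a row-major image of 4-byte pixels to Z-order, one tile at a time.
   Assumes width and height are multiples of TILE. Each tile row is one
   contiguous read, which is where the M >= B^2 condition comes in: a tile
   row spans at least one full block, so little read bandwidth is wasted. */
void rowmajor_to_zorder(FILE *in, FILE *out, uint32_t w, uint32_t h) {
    uint8_t *tile = malloc((size_t)TILE * TILE * PXB);
    uint8_t *zbuf = malloc((size_t)TILE * TILE * PXB);
    for (uint32_t ty = 0; ty < h; ty += TILE)
        for (uint32_t tx = 0; tx < w; tx += TILE) {
            for (uint32_t y = 0; y < TILE; y++) {      /* one contiguous read per tile row */
                fseek(in, (long)(((uint64_t)(ty + y) * w + tx) * PXB), SEEK_SET);
                fread(tile + (size_t)y * TILE * PXB, PXB, TILE, in);
            }
            for (uint32_t y = 0; y < TILE; y++)        /* permute the tile in memory */
                for (uint32_t x = 0; x < TILE; x++)
                    memcpy(zbuf + (size_t)morton2d(x, y) * PXB,
                           tile + ((size_t)y * TILE + x) * PXB, PXB);
            /* An aligned power-of-two tile occupies one contiguous range of the
               global Z-order, starting at the Morton index of the tile coordinates. */
            uint64_t base = (uint64_t)morton2d(tx / TILE, ty / TILE) * TILE * TILE;
            fseek(out, (long)(base * PXB), SEEK_SET);
            fwrite(zbuf, PXB, (size_t)TILE * TILE, out);
        }
    free(tile);
    free(zbuf);
}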
Yeah, those images explain it pretty well. There is one slight difference, though.
Because the Hilbert/Z-order curves are defined recursively, algorithms for converting between the curve and row/column order are "cache-oblivious", in that they don't need to take an explicit block size. You write it once, and it performs well on any system regardless of the CPU cache size and/or page size.
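A cache-oblivious version of that conversion can be written as a plain recursive quadrant split, with no block or cache size appearing anywhere. An in-memory sketch, assuming a power-of-two square region and 4-byte pixels:

#include <stdint.h>
#include <string.h>

#define PXB 4   /* bytes per pixel */

/* Copy an n x n square (top-left corner at (x0, y0)) of a row-major image into
   dst in Z-order by recursing into its four quadrants. The recursion order
   itself produces the Z-order, and no cache or block size appears anywhere,
   which is what makes it cache-oblivious. n must be a power of two; in practice
   you'd stop the recursion at a small base tile rather than a single pixel.
   Usage for a square image: to_zorder(dst, img, width, 0, 0, width). */
void to_zorder(uint8_t *dst, const uint8_t *src, uint32_t stride_px,
               uint32_t x0, uint32_t y0, uint32_t n) {
    if (n == 1) {
        memcpy(dst, src + ((size_t)y0 * stride_px + x0) * PXB, PXB);
        return;
    }
    uint32_t h = n / 2;
    size_t quad = (size_t)h * h * PXB;
    to_zorder(dst + 0 * quad, src, stride_px, x0,     y0,     h);   /* top-left     */
    to_zorder(dst + 1 * quad, src, stride_px, x0 + h, y0,     h);   /* top-right    */
    to_zorder(dst + 2 * quad, src, stride_px, x0,     y0 + h, h);   /* bottom-left  */
    to_zorder(dst + 3 * quad, src, stride_px, x0 + h, y0 + h, h);   /* bottom-right */
}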
I wonder whether transforming the pixel data to/from a different order, before compression and after decompression, would speed things up without introducing too much slowdown of its own?
Iteration order through arrays has a big impact on performance due to cache effects. I'd guess that you'd lose a lot of performance with a complicated iteration scheme.
So I implemented this change and tested it out. It turns out that, in my simple test suite, this actually changes the size of images. Not drastically: often just a 5-10 KB difference (on a 500-600 KB file), but that's still more than I expected for changing r-15 to r-7.
That suggests the DIFF16 delta-color mode is responsible for quite a bit of the savings. Maybe it would be worth experimenting with the exact encoding.
One idea would be to start from a predicted color (calculate the delta between the previous two pixels, apply that, then specify a delta relative to the prediction; see the sketch below). Another would be to encode the delta in YUV or some other color space, and then experiment with the best balance of bits between those channels.
Perhaps it would be better to steal bits from RUN16 instead; I somewhat doubt its usefulness.
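To illustrate the predicted-color idea, here is a sketch of an encoder-side check that extrapolates each channel linearly from the two previous pixels and tests whether the residual still fits DIFF16's 5/4/4-bit budget. The names and structure are made up, and whether the smaller residuals actually pay off would need measuring.

#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } rgba_t;

/* Mod-256 difference mapped into [-128, 127], matching the wraparound
   arithmetic of the delta packets above. */
static int wrap_delta(uint8_t from, uint8_t to) {
    int d = (int)to - (int)from;
    if (d < -128) d += 256;
    if (d >  127) d -= 256;
    return d;
}

/* Predicted-color variant of DIFF16: extrapolate each channel linearly from the
   two previous pixels (prev + (prev - prev2), wrapping), then take the delta
   against that prediction instead of against the previous pixel alone.
   Returns 1 if the residual fits the 5/4/4-bit budget (-15..16 / -7..8 / -7..8). */
int fits_diff16_predicted(rgba_t prev2, rgba_t prev, rgba_t cur,
                          int *dr, int *dg, int *db) {
    uint8_t pr = (uint8_t)(prev.r + (prev.r - prev2.r));
    uint8_t pg = (uint8_t)(prev.g + (prev.g - prev2.g));
    uint8_t pb = (uint8_t)(prev.b + (prev.b - prev2.b));
    *dr = wrap_delta(pr, cur.r);
    *dg = wrap_delta(pg, cur.g);
    *db = wrap_delta(pb, cur.b);
    return *dr >= -15 && *dr <= 16 &&
           *dg >=  -7 && *dg <=  8 &&
           *db >=  -7 && *db <=  8;
}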
I see it as a variant of LZ specialised to images, so its performance should not be too surprising. Running an uncompressed bitmap through a general-purpose compression algorithm tends to yield similar results to PNG.
This is so neat. It reminds me of the recent LZ4 post, which showed that by manually hand-tweaking the encoding you can still get a decent compression boost after all these decades.