You might get better compression by traversing the pixels in Hilbert [1] order or Z [2] order, so that vertical neighbours are taken into account as well as horizontal ones. It might cost time or space, though, as you couldn't stream the pixels through the compressor anymore - you'd need them all in memory.
You could still stream the pixels; you'd just have to stream them in Z order. This may seem pedantic, but it applies to any mismatch between the order in which you store pixels and the order in which the format stores them.
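To make "stream them in Z order" concrete, here's a minimal Python sketch (the names morton_index, demorton, and z_order_pixels are made up for the example; it assumes a square power-of-two image, and Hilbert order is similar but rotates the quadrants):

    def morton_index(x, y, bits):
        """Interleave the bits of x and y: x supplies the even bits,
        y the odd bits, so nearby pixels get nearby indices."""
        z = 0
        for i in range(bits):
            z |= ((x >> i) & 1) << (2 * i)
            z |= ((y >> i) & 1) << (2 * i + 1)
        return z

    def demorton(z):
        """Inverse of morton_index: split the even/odd bits of z
        back into (x, y)."""
        x = y = 0
        i = 0
        while z:
            x |= (z & 1) << i
            z >>= 1
            y |= (z & 1) << i
            z >>= 1
            i += 1
        return x, y

    def z_order_pixels(pixels, size):
        """Yield the pixels of a row-major size x size buffer
        (size a power of two) in Z order."""
        for z in range(size * size):
            x, y = demorton(z)
            yield pixels[y * size + x]

For a 4x4 image stored row-major as 0..15, this yields 0, 1, 4, 5, 2, 3, 6, 7, ... - each group of four is a 2x2 block, which is how vertical neighbours end up close together in the stream.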
E.g. some file formats are row-major, others column-major; some are block-based (like JPEG) or use a fancy interleaving scheme (like Adam7). Some store the channels interleaved, others keep the channels separate. If any of these choices doesn't match the order of the incoming data, it breaks streaming.
The nice thing is that if M >= B^2 (i.e., total memory is large enough to hold a square region of the image, where each row/column of the square fills a full block/page of memory), you can transform from row/column order to Hilbert/Z order without doing any more I/Os than a plain scan of the data.
So you can't do such a conversion in a fully streaming fashion, but there's no need to load all the data into memory either.
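Here's a sketch of that argument in Python, reusing demorton from the sketch above and assuming B and the image side are powers of two (read_block and emit are hypothetical callbacks, not a real API): hold one BxB tile in memory at a time, and visit the tiles themselves in Z order so the output comes out globally Z-ordered.

    def rows_to_z_order(read_block, size, B, emit):
        """Convert a row-major size x size image to Z order using
        only O(B^2) memory. read_block(x, y) returns the B pixels of
        row y starting at column x (i.e. one block); emit(p) consumes
        one pixel."""
        tiles = (size // B) ** 2
        for t in range(tiles):
            tx, ty = demorton(t)  # visit the tiles in Z order too
            # B block reads per tile -> size^2 / B reads in total,
            # the same as a plain scan of the file.
            tile = [read_block(tx * B, ty * B + dy) for dy in range(B)]
            for z in range(B * B):
                x, y = demorton(z)
                emit(tile[y][x])

This works because the Morton index of a pixel is just the Morton index of its tile followed by the Morton index of its position within the tile, so Z order over tiles plus Z order within each tile gives the global Z order.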
Yeah, those images explain it pretty well. There is one slight difference, though.
Because the Hilbert/Z-order curves are defined recursively, algorithms for converting between curve order and row/column order are "cache-oblivious", in that they don't need to take an explicit block size. You write the code once, and it performs well on any system regardless of the CPU cache size and/or page size.
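For example, a recursive reordering takes no block-size parameter at all (again a Python sketch under the same power-of-two assumption):

    def copy_z_order(src, dst, size, x=0, y=0, side=None, z=0):
        """Copy a row-major size x size buffer src into dst in Z
        order. No cache or block size appears anywhere: once the
        recursion reaches a subsquare that fits in some cache level,
        all further work on it stays in that cache, whatever its
        size happens to be."""
        if side is None:
            side = size
        if side == 1:
            dst[z] = src[y * size + x]
            return z + 1
        h = side // 2  # split into quadrants, visit them in Z order
        z = copy_z_order(src, dst, size, x,     y,     h, z)
        z = copy_z_order(src, dst, size, x + h, y,     h, z)
        z = copy_z_order(src, dst, size, x,     y + h, h, z)
        return copy_z_order(src, dst, size, x + h, y + h, h, z)

A production version would presumably cut the recursion off at, say, 8x8 and use a small unrolled kernel, but that's a constant-factor tweak, not a tuning parameter.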
I wonder: if the pixel data were transformed to a different order before compression and back after decompression, would that speed things up without introducing too much slowdown of its own?
Iteration order through arrays has a big impact on performance due to cache effects. I'd guess you'd lose a lot of performance to a complicated iteration scheme.
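You can see this even from Python, where interpreter overhead mutes the effect (in C the gap is far larger). The only difference between these two loops is the iteration order:

    import time

    N = 4096
    a = [0] * (N * N)  # one flat row-major array

    t0 = time.perf_counter()
    s = sum(a[y * N + x] for y in range(N) for x in range(N))  # sequential
    t1 = time.perf_counter()
    s = sum(a[y * N + x] for x in range(N) for y in range(N))  # strided
    t2 = time.perf_counter()

    print(f"row-major:    {t1 - t0:.2f}s")
    print(f"column-major: {t2 - t1:.2f}s")

The column-major loop jumps N elements per step, so nearly every access lands outside the cache line loaded by the previous one.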
[1] https://en.wikipedia.org/wiki/Hilbert_curve
[2] https://en.wikipedia.org/wiki/Z-order_curve