(I've done some VJing in my time, so it's always cool to see related projects--I particularly like the effect on the ping-pong ball in the examples with its bounce path trail. :) )
I found out the other day that zstd supports dictionary compression [0]. It's a pretty neat concept: you can "train" a dictionary on your corpus of text, then use that dictionary to compress subsequent documents. This is really useful if you have a lot of small documents with a similar distribution of characters. You only have to transmit the dictionary over the wire once, and then clients can get great compression ratios after that.
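Here's a minimal sketch of that workflow using the third-party zstandard Python bindings (the paths, dictionary size, and file names are made up for illustration, and training wants a reasonably large set of samples):

    import glob
    import zstandard

    # Collect a corpus of small, similar documents to train on.
    samples = [open(path, "rb").read() for path in glob.glob("corpus/*.txt")]

    # Train a shared dictionary (capped at 100 KiB here) on the samples.
    dict_data = zstandard.train_dictionary(100 * 1024, samples)

    # Ship dict_data.as_bytes() to clients once; afterwards both sides
    # compress and decompress individual documents against it.
    compressor = zstandard.ZstdCompressor(dict_data=dict_data)
    decompressor = zstandard.ZstdDecompressor(dict_data=dict_data)

    blob = compressor.compress(open("corpus/doc_0001.txt", "rb").read())
    assert decompressor.decompress(blob) == open("corpus/doc_0001.txt", "rb").read()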
For example, suppose you have a lot of chess PGNs you'd like to store [1]. If you compress each PGN separately, you won't get nearly as high of a compression ratio as if you stored them all in the same file and compressed that file. But storing games individually is more convenient. Here are the numbers:
I'm thinking about ways to compress at an even higher ratio. Chess PGNs are a great candidate for this task because they have a huge amount of redundant and repeated text.
...though with the slightly unexpected side effect (for Brotli, at least) that your executable may end up containing ~200KB (from memory) of very unexpected[-1] plain text strings, which might lead (and has led[0]) to questions from software end-users asking why your software contains "random"[1] text (including potentially "culturally sensitive" words/phrases related to religion such as "Holy Roman Emperor", "Muslims", "dollars", "emacs"[2] or similar).
(I encountered this aspect while investigating potential size optimization opportunities for the Godot game engine's web/WASM builds--though presumably the Brotli dictionary compresses well if the transfer encoding is... Brotli. :D )
[1] Which, regardless of meaning, certainly bears similarities to the type of "unexpected weird text" commonly/normally associated with spam, malware, LLMs and other entities of ill repute.
[2] The final example may not actually be factual. :)
Lichess is open source and has a very clever custom PGN compression scheme. They analyze the chess position and use various features to estimate how likely each legal move is, then use that distribution to encode the move that was actually played.
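To get a feel for the idea, here's a rough Python sketch (not Lichess's actual scheme, and it assumes the third-party python-chess package): rank the legal moves in each position with some deterministic heuristic, then store only the rank of the move that was played. The ranks are heavily skewed toward small numbers, so they encode compactly.

    import chess

    def ranked_moves(board):
        # Legal moves ordered from "most plausible" to least by a toy heuristic.
        def score(move):
            s = 0
            if board.is_capture(move):
                s += 100
            if board.gives_check(move):
                s += 50
            piece = board.piece_at(move.from_square)
            if piece is not None and piece.piece_type == chess.PAWN:
                s += 10  # pawn moves are common
            return s
        # sorted() is stable, so encoder and decoder get the same ordering.
        return sorted(board.legal_moves, key=score, reverse=True)

    def encode_game(moves):
        # Turn a list of chess.Move into a list of small rank integers.
        board = chess.Board()
        ranks = []
        for move in moves:
            ranks.append(ranked_moves(board).index(move))
            board.push(move)
        return ranks

    def decode_game(ranks):
        # Replay the ranks against the same heuristic to recover the moves.
        board = chess.Board()
        moves = []
        for rank in ranks:
            move = ranked_moves(board)[rank]
            moves.append(move)
            board.push(move)
        return moves

As I understand it, the real implementation uses a much richer move-ordering model plus proper entropy coding of the ranks, which is where most of the gain comes from.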
* The dictionary part takes up a similar amount so maybe 2x that
I think some image viewers didn't like how I made the GIF data, but I didn't care enough to make it perfect; LZW was the interesting part, and that worked.
That's probably an OK writeup but seems focused on someone who's good at learning without doodles and pictures. A visual / kinesthetic variation would be nice, even if it's just ASCII art of the data patterns.
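In the meantime, here's a bare-bones Python sketch of the LZW step itself, over raw bytes, deliberately ignoring the GIF-specific details (variable code widths, clear/end codes, sub-block packing) that the articles linked in this thread cover:

    def lzw_compress(data):
        # Emit integer codes; the table starts seeded with all single bytes.
        table = {bytes([i]): i for i in range(256)}
        next_code = 256
        w = b""
        out = []
        for byte in data:
            wk = w + bytes([byte])
            if wk in table:
                w = wk
            else:
                out.append(table[w])
                table[wk] = next_code
                next_code += 1
                w = bytes([byte])
        if w:
            out.append(table[w])
        return out

    def lzw_decompress(codes):
        # Invert lzw_compress, rebuilding the table as codes arrive.
        if not codes:
            return b""
        table = {i: bytes([i]) for i in range(256)}
        next_code = 256
        w = table[codes[0]]
        out = bytearray(w)
        for code in codes[1:]:
            if code in table:
                entry = table[code]
            elif code == next_code:  # the classic "KwKwK" special case
                entry = w + w[:1]
            else:
                raise ValueError("bad LZW code")
            out += entry
            table[next_code] = w + entry[:1]
            next_code += 1
            w = entry
        return bytes(out)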
* "What's In A GIF": https://www.matthewflickinger.com/lab/whatsinagif/
It's a five-part explanation that includes both great visualisations and bespoke analysis tools[2].
Here's a direct link to the section on LZW compression[0]:
* https://www.matthewflickinger.com/lab/whatsinagif/lzw_image_...
[0] Which somewhat confusingly credits "John Barkaus's LZW and GIF Explained" which actually turns out[1] to be... the text of "LZW and GIF explained----Steve Blackstock" via a news group post by John Barkaus (and is the text at OP's HN post URL).
[1] http://web.archive.org/web/20050217131148/http://www.danbbs....
[2] Including this fantastic example of visualizing individual sections of a multi-part binary file format, the UX/UI of which I think would be a great addition to a "hex viewer" application: https://www.matthewflickinger.com/lab/whatsinagif/gif_explor...