(I've done some VJing in my time, so it's always cool to see related projects--I particularly like the effect on the ping-pong ball in the examples with its bounce path trail. :) )
I found out the other day that zstd supports dictionary compression [0]. It's a pretty neat concept: you can "train" a dictionary on your corpus of text, then use that dictionary to compress subsequent documents. This is really useful if you have a lot of small documents with a similar distribution of characters. You only have to transmit the dictionary over the wire once, and then clients can get great compression ratios after that.
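Here's a minimal sketch of that workflow using the third-party zstandard Python bindings (the paths, dictionary size, and file names are made up for illustration, and training wants a reasonably large set of samples):

    import glob
    import zstandard

    # Collect a corpus of small, similar documents to train on.
    samples = [open(path, "rb").read() for path in glob.glob("corpus/*.txt")]

    # Train a shared dictionary (capped at 100 KiB here) on the samples.
    dict_data = zstandard.train_dictionary(100 * 1024, samples)

    # Ship dict_data.as_bytes() to clients once; afterwards both sides
    # compress and decompress individual documents against it.
    compressor = zstandard.ZstdCompressor(dict_data=dict_data)
    decompressor = zstandard.ZstdDecompressor(dict_data=dict_data)

    blob = compressor.compress(open("corpus/doc_0001.txt", "rb").read())
    assert decompressor.decompress(blob) == open("corpus/doc_0001.txt", "rb").read()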
For example, suppose you have a lot of chess PGNs you'd like to store [1]. If you compress each PGN separately, you won't get nearly as high of a compression ratio as if you stored them all in the same file and compressed that file. But storing games individually is more convenient. Here are the numbers:
I'm thinking about ways to compress at an even higher ratio. Chess PGNs are a great candidate for this task because they have a huge amount of redundant and repeated text.
...though with the slightly unexpected side effect (for Brotli, at least) that your executable may end up containing ~200KB (from memory) of very unexpected[-1] plain text strings, which might lead (and has led[0]) to questions from software end-users asking why your software contains "random"[1] text (including potentially "culturally sensitive" words/phrases related to religion such as "Holy Roman Emperor", "Muslims", "dollars", "emacs"[2] or similar).
(I encountered this aspect while investigating potential size optimization opportunities for the Godot game engine's web/WASM builds--though presumably the Brotli dictionary compresses well if the transfer encoding is... Brotli. :D )
[1] Which, regardless of meaning, certainly bears similarities to the type of "unexpected weird text" commonly/normally associated with spam, malware, LLMs and other entities of ill repute.
[2] The final example may not actually be factual. :)
Lichess is open source and has a very clever custom PGN compression scheme. They analyze the chess position and use various features to estimate how likely each legal move is, then use that distribution to encode the move that was actually played.
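To get a feel for the idea, here's a rough Python sketch (not Lichess's actual scheme, and it assumes the third-party python-chess package): rank the legal moves in each position with some deterministic heuristic, then store only the rank of the move that was played. The ranks are heavily skewed toward small numbers, so they encode compactly.

    import chess

    def ranked_moves(board):
        # Legal moves ordered from "most plausible" to least by a toy heuristic.
        def score(move):
            s = 0
            if board.is_capture(move):
                s += 100
            if board.gives_check(move):
                s += 50
            piece = board.piece_at(move.from_square)
            if piece is not None and piece.piece_type == chess.PAWN:
                s += 10  # pawn moves are common
            return s
        # sorted() is stable, so encoder and decoder get the same ordering.
        return sorted(board.legal_moves, key=score, reverse=True)

    def encode_game(moves):
        # Turn a list of chess.Move into a list of small rank integers.
        board = chess.Board()
        ranks = []
        for move in moves:
            ranks.append(ranked_moves(board).index(move))
            board.push(move)
        return ranks

    def decode_game(ranks):
        # Replay the ranks against the same heuristic to recover the moves.
        board = chess.Board()
        moves = []
        for rank in ranks:
            move = ranked_moves(board)[rank]
            moves.append(move)
            board.push(move)
        return moves

As I understand it, the real implementation uses a much richer move-ordering model plus proper entropy coding of the ranks, which is where most of the gain comes from.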
* The dictionary part takes up a similar amount so maybe 2x that
I think some image viewers didn't like how I made the GIF data, but I didn't care enough to make it perfect; LZW was the interesting part, and that worked.
That's probably an OK writeup but seems focused on someone who's good at learning without doodles and pictures. A visual / kinesthetic variation would be nice, even if it's just ASCII art of the data patterns.
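In the meantime, here's a bare-bones Python sketch of the LZW step itself, over raw bytes, deliberately ignoring the GIF-specific details (variable code widths, clear/end codes, sub-block packing) that the articles linked in this thread cover:

    def lzw_compress(data):
        # Emit integer codes; the table starts seeded with all single bytes.
        table = {bytes([i]): i for i in range(256)}
        next_code = 256
        w = b""
        out = []
        for byte in data:
            wk = w + bytes([byte])
            if wk in table:
                w = wk
            else:
                out.append(table[w])
                table[wk] = next_code
                next_code += 1
                w = bytes([byte])
        if w:
            out.append(table[w])
        return out

    def lzw_decompress(codes):
        # Invert lzw_compress, rebuilding the table as codes arrive.
        if not codes:
            return b""
        table = {i: bytes([i]) for i in range(256)}
        next_code = 256
        w = table[codes[0]]
        out = bytearray(w)
        for code in codes[1:]:
            if code in table:
                entry = table[code]
            elif code == next_code:  # the classic "KwKwK" special case
                entry = w + w[:1]
            else:
                raise ValueError("bad LZW code")
            out += entry
            table[next_code] = w + entry[:1]
            next_code += 1
            w = entry
        return bytes(out)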
* "What's In A GIF": https://www.matthewflickinger.com/lab/whatsinagif/
It's a five-part explanation that includes both great visualisations and bespoke analysis tools[2].
Here's a direct link to the section on LZW compression[0]:
* https://www.matthewflickinger.com/lab/whatsinagif/lzw_image_...
[0] Which somewhat confusingly credits "John Barkaus's LZW and GIF Explained" which actually turns out[1] to be... the text of "LZW and GIF explained----Steve Blackstock" via a news group post by John Barkaus (and is the text at OP's HN post URL).
[1] http://web.archive.org/web/20050217131148/http://www.danbbs....
[2] Including this fantastic example of visualizing individual sections of a multi-part binary file format, the UX/UI of which I think would be a great addition to a "hex viewer" application: https://www.matthewflickinger.com/lab/whatsinagif/gif_explor...