
I think the LZ family are all pretty intuitive --- just replace repeated sequences with a reference to where they occurred before. Even human languages have things like contractions, abbreviations, and acronyms. Thus it is, at least to me, somewhat surprising that LZ was only documented academically several decades after Huffman; perhaps it was thought to be too trivial? LZ can also be thought of as an extension of RLE.

In any case, an LZ decompressor fits in less than 100 bytes of machine instructions, and a simple compressor can be implemented in no more than several hundred, all while providing remarkably high compression for its simplicity. It will easily outperform order-0 static or dynamic Huffman on practical files like English text, and would probably make a good assignment in an undergraduate-level beginning data structures/algorithms/programming course. Yet it seems more popular to give an assignment on Huffman using trees, which is somewhat ironic, since in the real world Huffman is implemented using bit operations and lookup tables, not actual tree data structures.
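To illustrate how little machinery is involved, here's a toy LZ77-style codec in Python. This is a sketch under my own assumptions, not any real on-disk format: the greedy longest-match search and the (offset, length, literal) token layout are simplifications for clarity.

```python
def lz_compress(data: bytes, window: int = 4096, min_match: int = 3):
    """Greedy LZ77-style compressor. Emits (offset, length, None) for a
    back-reference, or (0, 0, literal_byte) for a single literal."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - window), i):
            l = 0
            # Matches may run past i (overlapping copies are legal in LZ77).
            while i + l < len(data) and data[j + l] == data[i + l]:
                l += 1
            if l > best_len:
                best_off, best_len = i - j, l
        if best_len >= min_match:
            out.append((best_off, best_len, None))
            i += best_len
        else:
            out.append((0, 0, data[i]))
            i += 1
    return out

def lz_decompress(tokens) -> bytes:
    buf = bytearray()
    for off, length, lit in tokens:
        if length:
            for _ in range(length):
                buf.append(buf[-off])  # byte-at-a-time copy handles overlap
        else:
            buf.append(lit)
    return bytes(buf)
```

The decompressor really is the trivial half: a loop with one indexed copy and one literal append, which is why it translates to so few machine instructions.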

To give a trivial example, LZ will easily compress ABCDABCDABCDABCD while order-0 Huffman can't do much since each individual symbol has the same frequency.
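To put numbers on that: the order-0 entropy of ABCDABCDABCDABCD is exactly 2 bits/symbol, so no order-0 code (Huffman included) can get below 32 bits, while an LZ coder only needs one copy of ABCD plus a back-reference. A quick check, using zlib's DEFLATE as a stand-in LZ coder (zlib adds a few bytes of header/checksum, which dominates at this toy size):

```python
import zlib
from collections import Counter
from math import log2

data = b"ABCDABCDABCDABCD"

# Order-0 entropy: A, B, C, D each occur 4/16 of the time -> 2 bits/symbol,
# so 32 bits is the floor for any order-0 code over single symbols.
counts = Counter(data)
entropy = -sum(c / len(data) * log2(c / len(data)) for c in counts.values())
assert entropy == 2.0

# An LZ-based coder sees the repetition instead: DEFLATE encodes the tail
# as a single back-reference to the first "ABCD".
compressed = zlib.compress(data, 9)
assert zlib.decompress(compressed) == data
```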




Another LZ fan here.

My guess is that the "late" development of LZ is mainly due to two reasons:

i) At the time, pattern matching algorithms were not very advanced. E.g., the suffix tree was a recent invention, and many of the advances in that area came only in the following years...

ii) Although LZ can appear easier or more intuitive than Huffman, I think it is much less intuitive to prove a good bound on the compression LZ achieves. OTOH, Huffman is built in a way that makes it clear it achieves zeroth-order compression.
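That guarantee is easy to see empirically: the Huffman construction yields an average code length L with H ≤ L < H + 1, where H is the zeroth-order entropy of the symbol frequencies. A small sketch (my own heap-based code-length computation, not any particular library's API):

```python
import heapq
from collections import Counter
from math import log2

def huffman_lengths(freqs):
    """Code length per symbol via the classic Huffman merge: repeatedly
    join the two least-frequent nodes; each merge adds one bit to every
    symbol in the merged subtree."""
    heap = [(f, i, [s]) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in freqs}
    tiebreak = len(heap)  # unique key so tuple comparison never reaches the list
    while len(heap) > 1:
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (f1 + f2, tiebreak, syms1 + syms2))
        tiebreak += 1
    return lengths

text = "abracadabra"
freqs = Counter(text)
n = len(text)
L = huffman_lengths(freqs)
avg = sum(freqs[s] * L[s] for s in freqs) / n          # average bits/symbol
H = -sum(f / n * log2(f / n) for f in freqs.values())  # zeroth-order entropy
# Huffman optimality among prefix codes gives H <= avg < H + 1.
assert H <= avg < H + 1
```

Getting a comparable per-symbol bound for LZ takes a genuinely harder argument (Lempel and Ziv's asymptotic optimality proofs), which supports the point above.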



