
Whenever new compression algs come up I can't help but think they're cheating with their dictionaries by pre-defining the most common words and character sequences, such as HTML tags. I wonder whether, for human language, the ideal is simply to have the best dictionary, perhaps even one that both ends negotiate. If browsers shipped an efficient dictionary of all English words and phrases, the compressed output would seem to be tiny.



In this competition, and in similar competitions, the size of the binary used to decompress is taken into account. If you wanted to use a dictionary, you would need to pay for it in binary size. In this competition, the file must be self-decompressing.

Dictionaries are powerful tools when compressing small data. But once the data is large enough they stop mattering so much. See the dictionary compression section of https://engineering.fb.com/core-data/zstandard/.
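
To make the small-vs-large point concrete, here is a rough sketch using Python's zlib preset-dictionary support rather than zstd. The `PRESET` dictionary and the `row` template are made up for illustration, not taken from the linked post; the idea is just that the same dictionary saves a large fraction on one tiny record and a negligible fraction on thousands of them.

```python
import zlib

# Hypothetical preset dictionary: boilerplate both ends agree on in advance.
PRESET = b'<div class="item"><a href="/item/">item </a></div>\n'


def row(i: int) -> bytes:
    """Build one made-up HTML record; only the number varies."""
    return b'<div class="item"><a href="/item/%d">item %d</a></div>\n' % (i, i)


def deflate(data: bytes, zdict: bytes = b"") -> bytes:
    """Compress with zlib, optionally primed with a preset dictionary."""
    if zdict:
        comp = zlib.compressobj(level=9, zdict=zdict)
    else:
        comp = zlib.compressobj(level=9)
    return comp.compress(data) + comp.flush()


def inflate(blob: bytes, zdict: bytes = b"") -> bytes:
    """Decompress; the same dictionary must be supplied if one was used."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()


def saving(data: bytes) -> float:
    """Fraction of compressed size saved by priming with PRESET."""
    plain = len(deflate(data))
    primed = len(deflate(data, PRESET))
    return (plain - primed) / plain


small = row(7)                                  # a single short record
large = b"".join(row(i) for i in range(5000))   # thousands of records

for name, data in (("small", small), ("large", large)):
    print(name, len(data), len(deflate(data)), len(deflate(data, PRESET)),
          f"{saving(data):.1%}")
```

On the large input the compressor learns the repeated markup from the data itself within the first few records, so the preset dictionary has almost nothing left to contribute.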



