The thing is, a compression scheme typically has three parts: the data, the model (dictionary, Huffman tree, etc.), and the program. The model is usually learned from the data at compression time.
The program itself is general, however, and as far as I know, doesn't count against the total. I.e., the size of the gzip binary isn't part of the total for the purposes of the contest.
So if an LLM is genuinely useful without fine-tuning on the target data, you should make it part of the program.
The data-specific model could be a self-prompt produced from reading and summarizing the data, which would help with the initial context-free predictions...
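To make the split concrete, here is a minimal sketch of the "predictive model + entropy coder" idea: the compressed size is essentially the model's total negative log-probability of the data, and a primed context improves the early, otherwise context-free predictions. Everything here is illustrative, not from the thread: the function name, the Laplace-smoothed byte-frequency model (a stand-in for an LLM's next-token distribution), and the 64-byte prefix (a stand-in for the data-specific "self-prompt").

```python
import math
from collections import Counter

def bits_under_model(data: bytes, context: bytes = b"") -> float:
    """Estimate the compressed size (in bits) of `data` under a simple
    adaptive order-0 byte model, optionally primed with a `context` prefix.

    In the scheme discussed above, the model would be an LLM's next-token
    distribution and `context` the data-specific self-prompt; this stand-in
    just counts byte frequencies so the sketch stays runnable. An arithmetic
    coder can get within a few bits of this total.
    """
    counts = Counter(context)          # prime the model with the context
    seen = len(context)
    total_bits = 0.0
    for b in data:
        # Laplace-smoothed probability of the next byte under the model.
        p = (counts[b] + 1) / (seen + 256)
        total_bits += -math.log2(p)    # ideal code length for this symbol
        counts[b] += 1                 # adaptive update, mirrored by a decoder
        seen += 1
    return total_bits

if __name__ == "__main__":
    payload = b"the quick brown fox jumps over the lazy dog " * 50
    raw = bits_under_model(payload)
    primed = bits_under_model(payload, context=payload[:64])
    print(f"no context:   {raw / 8:.0f} bytes")
    print(f"with context: {primed / 8:.0f} bytes")
```

The point of the split is visible in the two calls: the model (counts, or an LLM) does the predicting, the coder only has to pay for the surprise, and any context you can supply up front cuts the cost of the first symbols.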
{Edit} Ha, looking at the competition, it does indeed include the full size of the gzip binary in the metric. Weird.
> {Edit} Ha, looking at the competition, it does indeed include the full size of the gzip binary in the metric. Weird.
The competition is specific to one corpus of test data; it doesn't include testing on other data. If the size of the program weren't included, you could hard-code all of the test data into it and claim victory with an infinite compression ratio.
Fun fact: the idea of using an AI language model for compression has been around since at least 1988. It was mentioned in passing by Vernor Vinge in his short story The Blabber.