There's the Hutter Prize, which is related to that. I feel like it sort of missed the mark in terms of scale, though: the compression techniques needed to compress a relatively "small" and specific dataset like Wikipedia are completely different from the techniques needed to "compress" the entire internet. It's only with the latter that we're seeing something interesting start to happen. The other constraint the Hutter Prize has is lossless compression, which isn't really conducive to general "learning", since a lossless compressor has to spend bits reproducing every exact byte rather than just the underlying patterns.
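For context on why the prize treats compression and learning as the same thing at all: under arithmetic coding, any predictive model gives you a lossless compressor whose size is roughly the model's cross-entropy on the data, so better prediction means a smaller file. Here's a minimal sketch of that link; the character-level unigram model and the `code_length_bits` helper are just illustrative stand-ins, not anything from the prize itself.

```python
# Minimal sketch of the "compression = prediction" link behind the Hutter Prize.
# Under arithmetic coding, a lossless compressor built on a probability model
# needs about -log2 p(symbol) bits per symbol, so total compressed size is
# roughly the model's cross-entropy on the data.

import math
from collections import Counter

def code_length_bits(text: str, probs: dict) -> float:
    """Ideal lossless code length (in bits) for `text` under model `probs`."""
    return sum(-math.log2(probs[ch]) for ch in text)

data = "the quick brown fox jumps over the lazy dog"
alphabet = set(data)

# Model A: uniform over the characters that appear (knows nothing about English).
uniform = {ch: 1.0 / len(alphabet) for ch in alphabet}

# Model B: empirical character frequencies (a slightly better "learner").
counts = Counter(data)
unigram = {ch: counts[ch] / len(data) for ch in alphabet}

print(f"uniform model: {code_length_bits(data, uniform):.1f} bits")
print(f"unigram model: {code_length_bits(data, unigram):.1f} bits")
# The better predictor yields the shorter lossless encoding.
```

The catch, per the point above, is that the lossless requirement also charges the model for every bit of noise and exact wording, not just the regularities worth learning.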