The author of the article would have been wise to include this information.

I did find "Clustering by Compression" (Cilibrasi and Vitányi), https://arxiv.org/abs/cs/0312044

The emphasis in this paper was not on "efficiency" but on phylogenetic grouping. The authors used the "Universal Declaration of Human Rights", which apparently exists in 52 translations, as the subject text; see Fig. 13 and Section 5.2 for details. Instead of just compressing and normalizing, they developed a normalized compression distance (NCD): concatenate the two texts, compress, subtract the smaller of the two texts' compressed sizes, and divide by the larger, i.e. NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C gives the compressed size.
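
A minimal sketch of that distance, assuming zlib stands in for the compressor C (the paper itself used real compressors such as gzip, bzip2, and PPMZ):

  import zlib

  def C(data: bytes) -> int:
      # Compressed size of the input; zlib at level 9 is this
      # sketch's stand-in for the paper's compressors.
      return len(zlib.compress(data, 9))

  def ncd(x: bytes, y: bytes) -> float:
      # NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))
      cx, cy = C(x), C(y)
      cxy = C(x + y)
      return (cxy - min(cx, cy)) / max(cx, cy)

When two translations share vocabulary and structure, the concatenation compresses to nearly the size of the larger text alone, pushing the distance toward 0; unrelated texts push it toward 1. (One caveat with zlib: its 32 KB window limits how far back matches can reach in long documents.)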

I think that using the same encoding for all the languages, plus some kind of normalization, would wash out of the results any differences that are due only to the encodings.
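
As a hypothetical preprocessing step (nothing like this appears in the paper; the encoding handling here is an assumption), every file could be decoded, Unicode-normalized, and re-encoded so the compressor sees one consistent byte representation:

  import unicodedata

  def canonicalize(raw: bytes, source_encoding: str) -> bytes:
      # Decode from whatever encoding the file arrived in, apply NFC
      # so equivalent characters collapse to one code-point sequence,
      # then re-encode everything uniformly as UTF-8.
      text = raw.decode(source_encoding)
      return unicodedata.normalize("NFC", text).encode("utf-8")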
