Using that comment as a springboard: CRIME is one of The Great crypto bug classe...

AnIrishDuck · on Aug 1, 2013

Are you aware of any research into compression schemes that address these kinds of length-leak attacks?

Intuitively, I'm not even sure such a thing is possible.

Scaevolus · on Aug 1, 2013

It's possible, but it would require the compressor to be aware of locations of 'secret' data in its input.

More details:

The very common compression algorithm used by TLS and gzip is DEFLATE: an LZ77 transformation combined with Huffman coding. LZ77 turns a sequence of text into a sequence of literal instructions (output "abcd") and copy instructions (output 5 bytes from 10 bytes back). Huffman coding lets you encode an alphabet with varying token probabilities using bit strings of different lengths: think Morse code, where a common letter like E is one dot, but an uncommon letter like Z is dash-dash-dot-dot.
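To make the literal/copy split concrete, here is a minimal sketch of a decoder for that kind of instruction stream. The tuple format and the name `lz77_decode` are made up for illustration; real DEFLATE packs these instructions into Huffman-coded bits rather than Python tuples.

```python
def lz77_decode(ops):
    """Replay hypothetical LZ77 instructions.

    Each op is ("lit", data) or ("copy", length, distance).
    """
    out = bytearray()
    for op in ops:
        if op[0] == "lit":
            out += op[1]
        elif op[0] == "copy":
            _, length, distance = op
            if not 0 < distance <= len(out):
                raise ValueError("copy reaches before the start of the output")
            # Copy one byte at a time so a copy may overlap the bytes it produces.
            for _ in range(length):
                out.append(out[-distance])
        else:
            raise ValueError(f"unknown op {op[0]!r}")
    return bytes(out)


# "output 5 bytes from 13 bytes back" re-emits the earlier "hello".
print(lz77_decode([("lit", b"hello world, "), ("copy", 5, 13)]))
```

The copy instruction costs roughly the same no matter what the five bytes are, which is why repeated text compresses well.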

CRIME detects the different ciphertext sizes that result when encoding a secret as literal data versus as a copy instruction.
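A rough way to see the size difference, using Python's `zlib` module. The request layout, the cookie value, and the `compressed_len` helper are all invented for this sketch; a real attack recovers the secret a byte at a time and only sees lengths after encryption, but the principle is the same.

```python
import zlib

SECRET_COOKIE = b"sessionid=7f3a9c21d4e8b6a0"  # made-up secret for the demo


def compressed_len(guess: bytes) -> int:
    """Size an eavesdropper would observe for a request carrying the attacker's guess."""
    request = (
        b"GET /?q=sessionid=" + guess + b" HTTP/1.1\r\n"
        b"Host: example.com\r\n"
        b"Cookie: " + SECRET_COOKIE + b"\r\n\r\n"
    )
    return len(zlib.compress(request, 9))


right = compressed_len(b"7f3a9c21d4e8b6a0")  # matches the cookie value
wrong = compressed_len(b"b52e0d7c19f48a63")  # same length, shares no 3-byte run with it
print(right, wrong)
```

With the matching guess, the whole cookie becomes one copy instruction; with the wrong guess, only `sessionid=` is copied and the rest is sent as literals, so the output is longer.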

There are a few tweaks that could be effective. Assuming the secret data is identified, the compressor can force it to be emitted as literal data: never replaced by a copy instruction, and never used as reference data for any future copy instruction.

This could still leak some data, since a secret with more common letters might be Huffman coded to a shorter bitstring and detected. Luckily, DEFLATE has a way to indicate a "raw" block, which is just the literal data: 8 bytes of secret data would always take 8 + 5 (block header length) bytes to encode, and wouldn't be referenced by any future copy instructions.
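Here is a sketch of what that could look like with Python's `zlib`, assuming the caller already knows which bytes are secret. The helper names are hypothetical. The secret goes into a hand-built stored block, and the data after it is compressed by a fresh compressor whose window is empty, so nothing can point back at the secret. The 5-byte figure holds when the block starts on a byte boundary, which the `Z_FULL_FLUSH` takes care of.

```python
import struct
import zlib


def stored_block(data: bytes, final: bool = False) -> bytes:
    """Byte-aligned DEFLATE stored block: 1 header byte, LEN, ~LEN, then raw data."""
    assert len(data) <= 0xFFFF
    header = bytes([1 if final else 0])  # BFINAL bit, BTYPE=00, rest is padding
    return header + struct.pack("<HH", len(data), len(data) ^ 0xFFFF) + data


def compress_with_secret(before: bytes, secret: bytes, after: bytes) -> bytes:
    """Raw DEFLATE stream in which `secret` is never Huffman coded or copied from."""
    head = zlib.compressobj(level=9, wbits=-15)
    # Z_FULL_FLUSH ends on a byte boundary, which the hand-built block relies on.
    part1 = head.compress(before) + head.flush(zlib.Z_FULL_FLUSH)
    # A fresh compressor starts with an empty window, so it cannot reference the secret.
    tail = zlib.compressobj(level=9, wbits=-15)
    part3 = tail.compress(after) + tail.flush()
    return part1 + stored_block(secret) + part3


hit = compress_with_secret(b"<p>q=AAAAAAAA</p>", b"AAAAAAAA", b"<p>end</p>")
miss = compress_with_secret(b"<p>q=AAAAAAAA</p>", b"ZZZZZZZZ", b"<p>end</p>")
print(len(hit), len(miss))
```

The result is still an ordinary DEFLATE stream that any decoder can read; only the encoder has to cooperate.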

How should a web app indicate to potential upstream compressors that some portion of its output is secret? A special HTML tag? A new HTTP Secret-Ranges header?

AnIrishDuck · on Aug 2, 2013

Sure, you can manually isolate secret information from attacker information. That wasn't really what I was looking for.

I was referring to a general compression scheme that could be applied pre-encryption and not leak info when combined with partial plaintext oracle attacks. I don't think such a thing is possible, but it would be awesome if some really smart researchers could prove me wrong.

Scaevolus · on Aug 3, 2013

Most compressors are adaptive: they adjust their state depending on previous inputs. A static compressor wouldn't leak anything. For HTML, it might have preset dictionaries for common tags and attributes (instead of <div>, output a single token), and have a static Huffman encoding tuned to "average" HTML pages. The compression wouldn't be nearly as good as DEFLATE since it has to compress each chunk in isolation, but it would still beat plaintext.
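A toy version of the preset-dictionary half of that idea, as a sketch only: the token table is invented, there is no Huffman stage, and it assumes ASCII input so bytes 0x80 and up are free to use as tokens.

```python
# Hypothetical fixed table: it never changes based on the data being compressed.
STATIC_TOKENS = {
    b"<div>": b"\x80",
    b"</div>": b"\x81",
    b"<span>": b"\x82",
    b"</span>": b"\x83",
    b' class="': b"\x84",
}
_BY_LENGTH = sorted(STATIC_TOKENS, key=len, reverse=True)


def static_encode(data: bytes) -> bytes:
    """Replace known tags with one-byte tokens; everything else passes through."""
    assert all(b < 0x80 for b in data), "sketch assumes ASCII input"
    out = bytearray()
    i = 0
    while i < len(data):
        for tok in _BY_LENGTH:
            if data.startswith(tok, i):
                out += STATIC_TOKENS[tok]
                i += len(tok)
                break
        else:
            out.append(data[i])
            i += 1
    return bytes(out)


def page(guess: bytes) -> bytes:
    return b"<div><span>" + guess + b"</span></div><div>token=7f3a9c21</div>"


print(len(static_encode(page(b"7f3a9c21"))), len(static_encode(page(b"00000000"))))
```

Because the encoder keeps no state from earlier input, a guess that happens to match the secret costs exactly as many bytes as one that doesn't.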

agl · on Aug 1, 2013

It's certainly possible: https://www.imperialviolet.org/2012/09/21/crime.html

api · on Aug 1, 2013

My thought exactly: Is it now very dangerous to combine compression with encryption? Is there a general mitigation strategy or are the two mutually exclusive?

IvyMike · on Aug 3, 2013

If we had fixed-ratio compression, the attack would be impossible.

You could attempt to simulate that: after compression, add random padding so that it looks like a fixed 2:1 compression ratio.

This is wasteful if the data compressed better than 2:1. And if the data doesn't compress as well as 2:1, I'm not sure what you would do. (Maybe do no compression. But now you've leaked a tiny bit of info...)
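Roughly what I have in mind, as a sketch. The 5-byte framing header and the function names are made up, and the "do no compression" fallback is the leaky case mentioned above.

```python
import os
import struct
import zlib

HEADER = struct.Struct(">BI")  # hypothetical framing: mode byte + payload length


def pad_to_half(data: bytes) -> bytes:
    """Compress, then pad with random bytes so the output looks like exactly 2:1."""
    budget = (len(data) + 1) // 2
    packed = zlib.compress(data, 9)
    if len(packed) <= budget:
        padding = os.urandom(budget - len(packed))
        return HEADER.pack(1, len(packed)) + packed + padding
    # Fallback: store uncompressed. The size now reveals that 2:1 was missed.
    return HEADER.pack(0, len(data)) + data


def unpad(blob: bytes) -> bytes:
    """Strip the framing and padding, then decompress if needed."""
    mode, n = HEADER.unpack_from(blob)
    payload = blob[HEADER.size:HEADER.size + n]
    return zlib.decompress(payload) if mode else payload


a = pad_to_half(b"A" * 1000)
b = pad_to_half(b"ab" * 500)
print(len(a), len(b))
```

As long as the data compresses at least 2:1, the output size depends only on the input size, not on its content.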

snowwrestler · on Aug 2, 2013

Seems to me that if a compression scheme simply introduced a random size adjustment with each compression, it could defeat the subtle size measurements necessary for this attack to work. Might not even have to be very large random adjustments.