CRIME is one of The Great crypto bug classes to be found in the last 10 years or so; it's a side channel leak based not on timing, error handling, or power consumption, but on pure traffic analysis. Seriously excellent work.
Importantly, and I think this is a point Juliano and Thai undersold, the big issue with CRIME wasn't simply that it impacted TLS, but that it implicated an entire cryptographic implementation technique (one that, I'd add, Applied Cryptography recommends). The immediate feeling I got when I learned about CRIME was "this is going to take out a lot of systems"; it's a serious threat anywhere you have chosen plaintext and compression.
It's not surprising that there are other scenarios in TLS where CRIME works, but the big thing to learn from today's talk is that you should go out and look for other places that compress before encrypting.
It's possible, but it would require the compressor to be aware of locations of 'secret' data in its input.
More details:
The compression algorithm used by both TLS and gzip is DEFLATE: an LZ77 transformation combined with Huffman coding. LZ77 turns a sequence of text into a sequence of literal instructions (output "abcd") and copy instructions (output 5 bytes starting 10 bytes back). Huffman coding gives frequent symbols short bit strings and rare symbols longer ones -- think Morse code, where a common letter like E is a single dot, but an uncommon letter like Z is dash-dash-dot-dot.
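To make the literal/copy distinction concrete, here is a toy LZ77 tokenizer and decoder in Python. It's only a sketch, not zlib's algorithm: it does a brute-force longest-match search instead of DEFLATE's hash chains, skips the Huffman stage entirely, and the function names and the ("lit", byte) / ("copy", distance, length) token format are made up for illustration. The 32 KiB window, 3-byte minimum match, and 258-byte maximum match mirror DEFLATE's limits.

```python
def lz77_tokens(data: bytes, window: int = 32768, min_len: int = 3, max_len: int = 258) -> list:
    """Greedy toy LZ77: emit ("lit", byte) or ("copy", distance, length) tokens."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the window for the longest earlier match.
        for j in range(max(0, i - window), i):
            length = 0
            # A match may run past position i (an overlapping copy), as DEFLATE allows.
            while (i + length < len(data) and length < max_len
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            tokens.append(("copy", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lz77_decode(tokens: list) -> bytes:
    """Rebuild the original bytes from the toy token list."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            # Copy one byte at a time so overlapping copies (dist < length) work.
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)


print(lz77_tokens(b"abcdabcd"))
# [('lit', 97), ('lit', 98), ('lit', 99), ('lit', 100), ('copy', 4, 4)]
```

The second "abcd" costs one copy instruction instead of four literals; that asymmetry is what CRIME measures.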
CRIME detects the different ciphertext sizes that result when encoding a secret as literal data versus as a copy instruction.
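A minimal Python sketch of the size difference CRIME measures, using the standard zlib module. The secret cookie value and the compressed_length helper are hypothetical; the sketch assumes the attacker's bytes and the secret end up in the same DEFLATE stream and that encryption preserves length (true of a stream cipher; a block cipher would round up to the block size, hiding small differences).

```python
import zlib

# Hypothetical secret the victim's client sends with every request.
SECRET = b"Cookie: sessionid=7f3a9c"


def compressed_length(attacker_data: bytes) -> int:
    """What an eavesdropper sees: attacker-chosen bytes and the secret share one
    DEFLATE stream, and (assuming a length-preserving cipher) the ciphertext is
    exactly as long as the compressed plaintext."""
    return len(zlib.compress(attacker_data + b"\r\n" + SECRET))


right = compressed_length(b"sessionid=7f3a9c")  # the secret becomes one long copy
wrong = compressed_length(b"sessionid=qwerty")  # "7f3a9c" must stay as literals
print(right, wrong)  # right is smaller: the secret needed fewer literals
```

Both inputs have the same length, so the only thing that differs is how much of the secret could be encoded as a copy. The real attack grows the guess one byte at a time, which takes more care because a single extra literal doesn't always change the output by a whole byte.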
There are a few tweaks that could be effective. Assuming the secret data is identified, it can be encoded specifically as part of a literal: never replaced by a copy instruction, and never used as reference data for any future copy instructions.
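One way to approximate that isolation with stock zlib, sketched in Python: compress the attacker-influenced data and the secret as separate segments of one raw DEFLATE stream, calling flush(zlib.Z_FULL_FLUSH) after each. A full flush discards the compressor's match history, so the compressor won't emit a copy from the secret into the attacker's data, or from later data back into the secret. The compress_isolated name and the two-segment split are assumptions for illustration, and it presumes you already know where the secret is.

```python
import zlib


def compress_isolated(segments: list) -> tuple:
    """Compress byte segments into one raw DEFLATE stream with a full flush
    after each, so copies never cross a segment boundary. Returns the stream
    and the number of output bytes each segment produced."""
    comp = zlib.compressobj(wbits=-15)  # negative wbits: raw DEFLATE, no zlib header
    pieces = []
    for seg in segments:
        # Z_FULL_FLUSH byte-aligns the output and resets the match history.
        pieces.append(comp.compress(seg) + comp.flush(zlib.Z_FULL_FLUSH))
    stream = b"".join(pieces) + comp.flush()  # Z_FINISH writes the final block
    return stream, [len(p) for p in pieces]


secret = b"sessionid=7f3a9c"
for guess in (b"sessionid=7f3a9c", b"sessionid=qwerty"):
    stream, sizes = compress_isolated([guess, secret])
    assert zlib.decompress(stream, -15) == guess + secret
    print(guess, sizes)  # the secret's segment size no longer depends on the guess
```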
This could still leak some data, since a secret made of more common letters might be Huffman-coded to a shorter bit string, and that difference could be detected. Luckily, DEFLATE has a way to indicate a "raw" (stored) block, which is just the literal data: 8 bytes of secret data would always take 8 + 5 (block header length) bytes to encode, and wouldn't be referenced by any future copy instructions.
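A stored block is simple enough to write by hand. The Python sketch below builds one following RFC 1951, section 3.2.4 (a 3-bit header padded out to a byte, then LEN and its ones' complement NLEN as 16-bit little-endian values) and splices it between two zlib-compressed segments. The stored_block helper and the sample request bytes are made up; it assumes the stream is byte-aligned where the block starts, which Z_FULL_FLUSH guarantees.

```python
import struct
import zlib


def stored_block(data: bytes, final: bool = False) -> bytes:
    """Encode data as one DEFLATE stored block (RFC 1951, 3.2.4).
    Assumes the stream is byte-aligned where this block begins."""
    if len(data) > 0xFFFF:
        raise ValueError("a stored block holds at most 65535 bytes")
    # Bit 0 is BFINAL, bits 1-2 are BTYPE=00 (stored); the rest pads to a byte.
    header = b"\x01" if final else b"\x00"
    return header + struct.pack("<HH", len(data), len(data) ^ 0xFFFF) + data


secret = b"7f3a9c42"
print(len(stored_block(secret)))  # 13: 8 data bytes + 5 bytes of block header

# Splice: compressed prefix, stored secret, compressed suffix. The suffix uses a
# fresh compressor with no history, so it never copies from the secret.
before = b"GET /?q=sessionid=7f HTTP/1.1\r\n"
after = b"\r\nAccept: */*\r\n"
head = zlib.compressobj(wbits=-15)
prefix = head.compress(before) + head.flush(zlib.Z_FULL_FLUSH)
tail = zlib.compressobj(wbits=-15)
suffix = tail.compress(after) + tail.flush()
stream = prefix + stored_block(secret) + suffix
assert zlib.decompress(stream, -15) == before + secret + after
```

The secret's contribution to the stream is len(secret) + 5 bytes no matter what it contains or what the attacker sends, so only its length is left to leak.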
How should a web app indicate to potential upstream compressors that some portion of its output is secret? A special HTML tag? A new HTTP Secret-Ranges header?
Sure, you can manually isolate secret information from attacker information. That wasn't really what I was looking for.
I was referring to a general compression scheme that could be applied pre-encryption and not leak info when combined with partial plaintext oracle attacks. I don't think such a thing is possible, but it would be awesome if some really smart researchers could prove me wrong.
Most compressors are adaptive: they adjust their state depending on previous inputs. A static compressor wouldn't leak anything-- for HTML, it might have preset dictionaries for common tags and attributes (instead of <div>, output a single token), and have a static Huffman encoding tuned to "average" HTML pages. The compression wouldn't be nearly as good as DEFLATE since it has to compress each chunk in isolation, but it would still beat plaintext.
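A toy version of the preset-dictionary half of that idea, in Python: a fixed table maps a few common HTML fragments to single bytes in the otherwise unused 0x80-0xFF range, and each chunk is encoded on its own. The token table and function names are invented for illustration, there is no static Huffman stage, and it only accepts ASCII input. Because the table never changes and chunks never reference each other, a chunk's encoded size depends only on that chunk.

```python
# Hypothetical static dictionary; a real one would be tuned on a corpus of pages.
TOKENS = [b"</div>", b"<div>", b"</span>", b"<span>", b' class="', b' href="']
CODES = {tok: bytes([0x80 + i]) for i, tok in enumerate(TOKENS)}


def static_compress(chunk: bytes) -> bytes:
    """Replace known fragments with one-byte codes; no state carries between calls."""
    if any(b >= 0x80 for b in chunk):
        raise ValueError("this sketch only handles ASCII input")
    out = bytearray()
    i = 0
    while i < len(chunk):
        for tok in TOKENS:  # first match in table order wins
            if chunk.startswith(tok, i):
                out += CODES[tok]
                i += len(tok)
                break
        else:
            out.append(chunk[i])  # no token here: copy the byte through
            i += 1
    return bytes(out)


def static_decompress(data: bytes) -> bytes:
    """Expand one-byte codes back into their fragments."""
    out = bytearray()
    for b in data:
        out += TOKENS[b - 0x80] if b >= 0x80 else bytes([b])
    return bytes(out)


print(static_compress(b"<div>hi</div>"))  # b'\x81hi\x80'
```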
My thought exactly: Is it now very dangerous to combine compression with encryption? Is there a general mitigation strategy or are the two mutually exclusive?
If we had fixed-ratio compression, the attack would be impossible.
You could attempt to simulate that: after compression, add random padding so that it looks like a fixed 2:1 compression ratio.
This is wasteful if the data compressed better than 2:1. And if the data doesn't compress as well as 2:1, I'm not sure what you would do. (Maybe do no compression. But now you've leaked a tiny bit of info...)
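A sketch of that padding scheme in Python, assuming a one-byte flag to tell the receiver which case applies (a framing detail not spelled out in the comment). Output length depends only on the plaintext length and the flag, which is exactly the one bit of leakage the parenthetical worries about. The pad_to_ratio/unpad names are made up.

```python
import os
import zlib


def pad_to_ratio(plain: bytes) -> bytes:
    """Frame plain so the output always looks like a fixed 2:1 compression ratio."""
    target = (len(plain) + 1) // 2  # half the input, rounded up
    packed = zlib.compress(plain)
    if len(packed) <= target:
        # Flag 0x01: compressed, then random bytes up to the fixed size.
        return b"\x01" + packed + os.urandom(target - len(packed))
    # Flag 0x00: could not reach 2:1, so send the plaintext uncompressed.
    # Which branch was taken is still visible from the frame length.
    return b"\x00" + plain


def unpad(frame: bytes) -> bytes:
    """Undo pad_to_ratio."""
    if frame[:1] == b"\x01":
        d = zlib.decompressobj()
        plain = d.decompress(frame[1:])  # stops at the end of the zlib stream;
        if not d.eof:                    # the padding is left in d.unused_data
            raise ValueError("truncated compressed frame")
        return plain
    return frame[1:]
```

For example, b"a" * 1000 and b"ab" * 500 compress to very different sizes but both frame to 501 bytes, while 1000 random bytes fall through to the uncompressed branch and frame to 1001.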
Seems to me that if a compression scheme simply introduced a random size adjustment with each compression, it could defeat the subtle size measurements necessary for this attack to work. Might not even have to be very large random adjustments.