
If you really want to understand this you should read the paper. However, I will try my best to summarize my understanding of what is going on. (Disclaimer: I am not a professional cryptographer, but I know something about crypto-complexity).

Many proofs in cryptography start with the following assumption: assume x is a bitstring of length N drawn from a uniformly random distribution of bit strings. This assumption is then used as the basis of the proof of the cryptographic construct.

For instance, in a provably secure stream cipher you start with that assumption, then proceed to show that if weak one-way functions exist then you can construct a secure pseudorandom number generator that is polynomially indistinguishable from a true random source. You then use your assumed random bit string, x, as the key to the pseudorandom number generator to generate a one-time pad to XOR with your message.
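To make that concrete, here is a toy sketch (my own code, not a vetted construction): SHA-256 in counter mode stands in for the provably secure pseudorandom generator, and the key comes from os.urandom as a stand-in for the uniformly random x.

    # Toy sketch of "PRG keystream XOR", not a vetted construction: SHA-256 in
    # counter mode stands in for the provably secure pseudorandom generator.
    import hashlib
    import os

    def keystream(key: bytes, nbytes: int) -> bytes:
        out = b""
        counter = 0
        while len(out) < nbytes:
            out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
            counter += 1
        return out[:nbytes]

    def xor_with_keystream(key: bytes, data: bytes) -> bytes:
        return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))

    key = os.urandom(32)                # the "x drawn uniformly at random"
    ct = xor_with_keystream(key, b"attack at dawn")
    assert xor_with_keystream(key, ct) == b"attack at dawn"  # XOR twice = identity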

Since the secure pseudorandom number generator is indistinguishable from a true random number generator (by a polynomial attacker), the message is secure (from a polynomial attacker). However, this is all predicated on the key being drawn from a uniform distribution. If the key is NOT drawn from a uniform distribution then there may be a potential speed-up available to the attacker.

What the paper shows is that often it is assumed that words drawn from an IID distribution of letters can be treated as if the distribution of words is uniform (given a sufficiently large word size). However, this distribution, while very flat, is not quite uniform. They show how this can be exploited using the "Guesswork" framework to come up with a tighter bound on the time it would take to guess a chosen word. This bound is much better than the traditional bound used.
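Here is a toy illustration of that point (my own numbers, not the paper's, with the bias exaggerated so the effect shows up at small sizes): with a biased two-letter alphabet, the distribution over 8-letter words is spread out but not uniform, and guessing words in order of decreasing probability already beats the uniform-case expectation.

    # Toy illustration (my own numbers, not from the paper): words built from
    # IID letters cluster around probability 2^(-n*H) (AEP), but the
    # distribution over words is not uniform, and guessing in order of
    # decreasing probability beats the uniform-distribution expectation.
    from itertools import product
    from math import log2

    p_a = 0.6                      # biased two-letter alphabet {a, b}
    n = 8                          # word length

    def word_prob(w):
        k = w.count("a")
        return (p_a ** k) * ((1 - p_a) ** (n - k))

    words = ["".join(w) for w in product("ab", repeat=n)]
    probs = sorted((word_prob(w) for w in words), reverse=True)

    uniform = 1 / len(words)
    print("uniform prob per word:", uniform)
    print("most likely word:     ", probs[0])     # noticeably above uniform
    print("least likely word:    ", probs[-1])    # noticeably below uniform

    # Expected number of guesses when guessing in order of decreasing probability,
    # versus the (len+1)/2 expectation if the words really were uniform:
    avg_guesses = sum((i + 1) * p for i, p in enumerate(probs))
    print("avg guesses (best order):", avg_guesses, "vs", (len(words) + 1) / 2)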

What this means is that there needs to be more margin given than previously thought when dealing with keys derived from non-uniform distributions. Such keys include passwords.




The paper has nothing to do with the key being chosen at random (edit: in general; in the linked biometric paper it might have been the case) in the way you are suggesting. Not only are keys assumed to be chosen uniformly at random, in practice they actually are (either via software PRNGs or hardware).

The paper is talking about the distribution of plaintexts. As you say, the IID distribution of letters cannot be treated as if the distribution of words is uniform.

This is not a common assumption in cryptography (typically because we assume the attacker knows the real distribution of the plaintexts, the idea being that they learn no more about the distribution given the ciphertext), and the papers cited that make this assumption are from sub-par conferences. So it would be more accurate to say some people made a mistake that is neither that common nor a core tenet of cryptography.


One of the most fascinating things from my crypto classes was that you can prove an algorithm secure if the keyspace is drawn from all possible keys, including a null key that returns the same ciphertext as plaintext.

If you alter your keyspace to exclude that one null key, the proof no longer works.

I still feel like sitting down and staring at the wall when I think about it.


That reminds me of the Enigma machines in WWII: The thing was set up such that a given letter in plaintext could not end up as that same letter in cyphertext. This was supposedly to add security, but it turned out not only to be a weakness, but one that was critical in breaking the code.

That one makes sense, in a way: If you see an S in cyphertext, you know the plaintext did not have an S in that position. That eliminates 4% of all possible plaintexts, and by extension 4% of all private keys. That's kind of a metric ass-ton of entropy you just lost right there, for every single character.
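Back-of-the-envelope arithmetic for that (my own numbers, just quantifying the 4% figure):

    # Seeing a ciphertext letter rules out 1 of 26 plaintext letters at that
    # position; this quantifies how much information that leaks.
    from math import log2

    eliminated = 1 / 26
    print(f"plaintexts eliminated per position: {eliminated:.1%}")    # ~3.8%

    bits_per_char = log2(26) - log2(25)   # information leaked per character
    print(f"bits leaked per character: {bits_per_char:.3f}")          # ~0.057

    msg_len = 200
    print(f"bits leaked over a {msg_len}-char message: "
          f"{msg_len * bits_per_char:.1f}")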

My guess is, your proof works by a qualitatively similar principle: Restricting the keyspace, no matter how trivially, gives the attacker information. And encryption is all about magnifying information.


I think you meant "could not end up as that same letter"?


Usually that's an indication that the specific mathematical property that you're proving is not what we mean when we say 'secure'.


Yes, but in a good way: generally it's far stronger than our common-sense idea of "secure".


I meant quite the opposite: that if what the GP said is true, then the specific theorem being proved does not share the properties implied by the common-sense idea of "secure," and is therefore insufficient.


That's not necessarily true. Take, for example, the case where the proof shows that the resulting ciphertext would be indistinguishable from a random function, so the adversary would have a 1/2^n chance (with n = key length: 128/192/256, etc.) to decipher the text by guessing the key uniformly at random. He also has another 1/2^n chance to break the system, by us picking the null key (randomly). That would result in a total chance of 1/2^(n-1).

On the other hand, the proof shows that leaving the null key out of the sample space means the adversary would have a 1/2^48 (arbitrary number) chance to distinguish the ciphertext from a random function, which means that it's "insecure" in the context of the proof.

It seems to me that this notion of secure would still be stronger than our common sense idea of "secure".
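Plugging concrete numbers into the argument above (n = 128 is my pick; the 2^-48 figure is just the arbitrary example I used):

    # Including the null key only doubles the adversary's guessing advantage,
    # which is still astronomically small next to a 2^-48 distinguishing bound.
    n = 128
    p_guess_key = 2 ** -n           # adversary guesses the key
    p_null_key  = 2 ** -n           # we happened to pick the null key
    p_total     = p_guess_key + p_null_key
    print(p_total == 2 ** -(n - 1))   # True: still negligible

    p_distinguish = 2 ** -48          # the "insecure" bound from the proof
    print(p_distinguish / p_total)    # how much larger that bound is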


FTA:

> Traditionally, information-theoretic analyses of secure schemes have assumed that the source files are perfectly uniform. In practice, they rarely are, but they’re close enough that it appeared that the standard mathematical analyses still held.

After reading your comment, I'm confused. Are they talking about the entropy of the key or of the plaintext?


> What the paper shows is that often it is assumed that words drawn from an IID distribution of letters can be treated as if the distribution of words is uniform

Is this assumption really that common? It seems like a strange assumption to make, since even biased coin flips with P(Head) = 0.99 would be independent and identically distributed.


Not being any sort of expert, I don't know how important their result is. They cite a couple of papers which use the AEP (Asymptotic Equipartition Property) to treat words drawn from IID distributions of letters as uniform. The paper I looked at was in turn about brute-force attacks. So essentially this paper is arguing that analysis of brute-force attacks must necessarily account for the non-uniform structure of the source distribution.

On the surface it seems like this only really affects things which rely on treating these types of distributions as uniform. It is difficult for me to say how this affects the analysis of primitives when using keys derived from user input. In practice I don't think it makes much difference, as we are already getting quite good at exploiting the distributions from which people seem to draw passwords. However, it may have some important implications for the theoretical side of cryptography.


That doesn't sound identically distributed to me... am I missing something?


It's not that P(heads) = P(tails), but rather that the coins are identical to each other. Each throw of the coin has the same probability distribution and is independent of the other throws of the coin.
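A tiny sketch of what "identically distributed but biased" looks like, using the grandparent's P(Head) = 0.99 example:

    # Every flip has the same distribution and the flips are independent (IID),
    # yet the all-heads sequence is overwhelmingly more likely than it would be
    # under a uniform distribution over sequences.
    import random

    p_head = 0.99
    flips = [random.random() < p_head for _ in range(20)]   # IID Bernoulli(0.99)
    print(flips)

    p_all_heads = p_head ** 20
    p_uniform   = 0.5 ** 20
    print(p_all_heads, "vs uniform", p_uniform)   # ~0.82 vs ~9.5e-7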


> What the paper shows is that often it is assumed that words drawn from an IID distribution of letters can be treated as if the distribution of words is uniform (given a sufficiently large word size). However, this distribution, while very flat, is not quite uniform. They show how this can be exploited using the "Guesswork" framework to come up with a tighter bound on the time it would take to guess a chosen word. This bound is much better than the traditional bound used.

By it being very flat but not uniform, what do you mean? I'm not clear on this point.


I mean that the distribution is very close to uniform. A uniform distribution is drawn as a flat line when you graph it. (In contrast, a normal distribution is a bell curve.) So when I say it is very flat, I mean that there is very little variation in the pdf.


I really don't understand the article nor your explanation. What does it matter if the plaintext isn't random? The CSPRNG will hide any non-randomness.

> However, this is all predicated on the key being drawn from a uniform distribution

So, we're talking about the generation of the key? Are you trying to say that /dev/random isn't secure (i.e. the source when using algos like SSL)?


What does "a polynomial attacker" mean? I think I get the idea, but the term is not very well known by google.


Generally in theoretical cryptography the adversary is a probabilistic Turing machine that can take K steps to break your encryption, where K is asymptotically bounded by a polynomial function of the length of the input it is trying to break.

You will see this called a 'PPT' in the literature, where PPT = Probabilistic Polynomial Time.


Very generally, if your string is n bits, a polynomial attack is one that can break it in O(n^i) steps, where i is some fixed number. A non-polynomial-time attack would be something beyond that, like O(2^n).

Technically this treats O(n^2) and O(n^10000) as the same effort.
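A rough sense of the gap (my own numbers, treating each step as equal cost):

    # Why polynomial effort is treated as "feasible" while exponential is not:
    # compare a cubic-time attack with brute force over an n-bit key.
    for n in (32, 64, 128, 256):
        poly = n ** 3          # e.g. a cubic-time attack
        expo = 2 ** n          # brute force over an n-bit key
        print(f"n={n:4d}  n^3={poly:>12,}  2^n={expo:.3e}")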


I think you can loosely interpret polynomial as "practical." Worse than polynomial typically means exponential, which makes the problem too big to be practical to solve. A polynomial-sized problem is considered practical to solve.



