No. I don't believe there's any disadvantage to this. An MD5 hash is a 128 bit r...

tedunangst · on June 8, 2012

Pedantically, it's 16 random bytes. You aren't going to lose much entropy from an upper/lower/number password unless it's at least 20 characters, or even longer if it's a pass phrase.
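A rough back-of-the-envelope check of that length estimate, as a sketch only: it assumes every character is picked uniformly at random from the 62 upper/lower/digit symbols, which real user-chosen passwords are not. The helper name `chars_to_exceed` is made up for illustration.

```python
import math

def chars_to_exceed(alphabet_size: int, bits: int) -> int:
    """Smallest length at which a uniformly random string over the
    alphabet carries more than `bits` bits of entropy."""
    bits_per_char = math.log2(alphabet_size)
    return math.floor(bits / bits_per_char) + 1

MD5_BITS = 16 * 8        # 16 bytes of digest
ALNUM = 26 + 26 + 10     # upper + lower + digits (assumed alphabet)

print(f"{math.log2(ALNUM):.2f} bits per character")
print(chars_to_exceed(ALNUM, MD5_BITS), "characters to exceed 128 bits")
```

Under those assumptions the crossover lands in the low twenties, in line with the "at least 20 characters" estimate; pass phrases carry fewer bits per character, so they would need to be longer still.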

tptacek · on June 8, 2012

Sorry; C programmer.

stavros · on June 8, 2012

What's your opinion on SuperGenPass and the like? I'm sure it would still get cracked, but I feel that it would take a lot of effort to crack the provider's hash, even if it were md5, and then less effort to crack the SuperGenPass hash, so hopefully nobody would recognise it or bother...

simonbrown · on June 8, 2012

Not entirely on-topic, but SuperGenPass (and probably similar bookmarklets) has a security problem:

http://akibjorklund.com/2009/supergenpass-is-not-that-secure

stavros · on June 8, 2012

Ah, I hadn't considered that... That's a shame...

icebraining · on June 8, 2012

Just wanted to point out that any implementation as a browser extension (as opposed to bookmarklet) is safe from DOM manipulation; searching on Google for "supergenpass extension" returns results for at least Chrome, Firefox and Opera.

stavros · on June 10, 2012

I'm not so sure, the extension still appends DOM elements, I'm sure those are just as susceptible to sniffing...

simonbrown · on June 10, 2012

I haven't looked at the code or even used it that much, but it seems like it only uses content scripts to insert the password into the field, and everything else is dealt with by the popup/background page, which websites don't have access to.

lukeschlather · on June 8, 2012

An MD5 hash is not a random number; it is generated from some text string. It's possible that bcrypt(salt + MD5(text)) opens you up to collision attacks that are not possible with bcrypt(salt + text). It seems unlikely that it would open you up to attacks that are not possible with md5(text) but MD5 is not a random number so I'm not too sure.

harshreality · on June 8, 2012

The best means of colliding MD5 seems to require one collision block plus some extra "birthday" bits, all of which are controllable by the attacker. [1]

The idea is that you have two messages, m1 and m2, or m1 and m1' if you prefer, and you vary bits in both until you get a collision. You need some area of m1 and m2 that doesn't matter for the application, so that you can change those bits and find a collision. Since all bits of m1 are supplied when entering the password, you have no ability to modify it without getting the user to change his/her password.

If you could collide any arbitrary m1 as it's given to you, then attacks like fake certs with signed MD5 hashes could create the fake cert after submitting it to the CA and getting the signed cert back, rather than before.

Also, the collision process requires knowledge of m1 so you can see the intermediate hash states. If you know m1, the password/passphrase, why are you trying to find a new m1' that hashes to the same value rather than using the pass you already know?

An attack of concern for using MD5 as a password hashing step would be a first preimage attack. [2]

[1] https://www.google.com/search?q=md5+collision+block+birthday... (first link at present is http://www.win.tue.nl/hashclash/SingleBlock/ )

[2] http://en.wikipedia.org/wiki/Preimage_attack

joshuahedlund · on June 8, 2012

I'm still trying to learn this stuff, but I do not understand fundamentally how bcrypt(salt + MD5(text)) could be worse than bcrypt(salt + text). What if everyone's plaintext password was already a string of characters identical to some MD5(text)? If bcrypt(salt + MD5(text)) could be bad, then doesn't that mean bcrypt(salt + text) could be bad too?

lmkg · on June 8, 2012

If you compose hash functions, you get the union of possible collisions.

Let's say that "foo" and "bar" are two distinct passwords that have the same MD5 hash. Then bcrypt(md5("foo")) == bcrypt(md5("bar")), regardless of how bcrypt("foo") compares to bcrypt("bar"). By pre-hashing with MD5, you have added possible collisions that weren't there previously, and those collisions remain regardless of how many more hashes you pile on top.
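A toy illustration of that composition argument. Two assumptions are baked in and neither reflects a real deployment: the inner hash is deliberately crippled (only the first byte of MD5 is kept) so that a collision between short strings can be found instantly, and SHA-256 stands in for bcrypt so the sketch needs only the standard library.

```python
import hashlib
from itertools import count

def weak_hash(text: str) -> bytes:
    # Deliberately weak inner hash: keep only the first byte of MD5.
    return hashlib.md5(text.encode()).digest()[:1]

def outer(data: bytes) -> bytes:
    # Stand-in for bcrypt (assumption): any deterministic outer hash works here.
    return hashlib.sha256(data).digest()

def find_collision() -> tuple[str, str]:
    # With a 1-byte hash, a repeat must show up within 257 candidates.
    seen = {}
    for i in count():
        candidate = f"pw{i}"
        h = weak_hash(candidate)
        if h in seen:
            return seen[h], candidate
        seen[h] = candidate

a, b = find_collision()
print(a, b)
print(outer(weak_hash(a)) == outer(weak_hash(b)))    # inner collision survives the outer hash
print(outer(a.encode()) == outer(b.encode()))        # outer hash alone keeps them apart
```

The outer function can only ever see what the inner one hands it, so two inputs that collide inside stay collided outside no matter what is layered on top.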

chc · on June 8, 2012

We're not pre-hashing with MD5. The MD5 was already there. It's the only source text we have. The proper comparison here isn't MD5+bcrypt vs. just bcrypt — it's MD5+bcrypt vs. just MD5. So any collisions that MD5 causes are immaterial — they'd be there either way.

It seems to me that the most obvious problem is that you get two chances at colliding — once with MD5 and once with bcrypt. But bcrypt is not known to be especially vulnerable to collision attacks, so this setup is probably not noticeably worse than MD5 alone. But that's just looking at probabilities — I ain't no fancy crypto expert or nothin', so there might be much more subtle vulnerabilities than the added chance of collision.

lmkg · on June 8, 2012

> It seems to me that the most obvious problem is that you get two chances at colliding

Yeah, that's all I'm saying. I was answering a question about being "fundamentally worse," and fundamentally, there are now two sources of potential collisions instead of one. In theory, that's twice as insecure! However, the practical effect is unlikely to rise above absolute nil anytime soon.

tedunangst · on June 8, 2012

> Let's say that "foo" and "bar" are two distinct passwords that have the same MD5 hash.

As a practical matter, we can basically say that never happens. Certainly not for passwords that are user selected and not designed to collide. And since the hash itself is hidden by bcrypt, the attacker won't know md5("foo") even if they were inclined to find a "bar" with the same hash.

tptacek · on June 8, 2012

Can you cite a single example of a pair of password-length strings that could be entered on a standard keyboard that collide in MD5?

Cryptographic pseudorandom number generators also collide and produce cycles. MD5 hashes of arbitrary ASCII strings are reasonably modeled as random numbers, and the concern you cite is unmeaningful.
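To put a number on "reasonably modeled as random numbers": if 128-bit digests are treated as uniformly random, the chance that any two of n distinct inputs collide is at most (number of pairs) / 2^128. The sketch below evaluates that bound for a billion passwords; the figure of a billion is an arbitrary assumption, and the function name is invented for illustration.

```python
from fractions import Fraction

def collision_probability_bound(n_inputs: int, hash_bits: int) -> Fraction:
    """Union (birthday) bound on the chance that any two of n_inputs
    uniformly random hash_bits-bit values are equal."""
    pairs = n_inputs * (n_inputs - 1) // 2
    return Fraction(pairs, 2 ** hash_bits)

p = collision_probability_bound(10**9, 128)   # assumed: one billion passwords
print(float(p))
```

The result is on the order of 10^-21, which is why accidental collisions between ordinary user-chosen passwords can be ignored in practice.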

pbhjpbhj · on June 9, 2012

That seems like a big ask; these are the closest I could find: http://www2.mat.dtu.dk/people/S.Thomsen/wangmd5/samples.html

An upper limit on string length suddenly makes some sense.

tptacek · on June 10, 2012

Huh? Because some unsuspecting user might choose a password so long and so random that it turns out to be collidable with one other string?

pbhjpbhj · on June 10, 2012

I did say "some" sense. I was thinking of someone choosing a string to collide and entering an account without it being apparent that they got in with anything other than the correct password. Alright, the chances of someone wanting to do that seem slim, but I can see someone saying in a meeting "if we leave the passphrase open at the max length side then people could enter a string with a matching hash".

abscondment · on June 8, 2012

I'm very open to correction, but won't the increased time complexity of bcrypt also mitigate this kind of attack?

pilsetnieks · on June 8, 2012

To nitpick a bit, it's 16 bytes, usually represented by 32 hexadecimal digits.

tptacek · on June 8, 2012

The hexadecimal is just UI; ignore it.

16s · on June 8, 2012

Amen to that. It could be b64 encoded or whatever. Focus on the raw bytes. They are truth.
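A quick way to see that hex and base64 are only presentations of the same 16 bytes, using nothing beyond Python's standard library (the input string is arbitrary):

```python
import base64
import hashlib

digest = hashlib.md5(b"example password").digest()    # the raw bytes
hex_form = digest.hex()                                # 2 hex digits per byte
b64_form = base64.b64encode(digest).decode("ascii")    # 4 chars per 3 bytes, padded

print(len(digest), len(hex_form), len(b64_form))

# Both encodings decode back to exactly the same bytes.
print(bytes.fromhex(hex_form) == digest)
print(base64.b64decode(b64_form) == digest)
```

Whichever form ends up in the database, what gets fed into the next hashing step should be decided deliberately; hex text and raw bytes carry the same entropy but are different inputs.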

cheatercheater · on June 10, 2012

Red light coming up in the back of my head. Wouldn't the MD5->scrypt pipeline expose new attacks that scrypt doesn't have? Maybe there's a higher collision probability or some known-text attack, but I'm really shooting in the dark here.
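For concreteness, here is a minimal sketch of what such an MD5->scrypt pipeline might look like with Python's standard library. It says nothing about whether the construction is safe; it only shows the shape being discussed. The function names, the 16-byte salt, and the cost parameters (n=2**14, r=8, p=1) are assumptions for illustration, not recommendations, and `hashlib.scrypt` requires a Python build linked against OpenSSL 1.1 or later.

```python
import hashlib
import hmac
import os

def wrap_legacy_md5(md5_digest: bytes, salt: bytes) -> bytes:
    # Feed the existing 16-byte MD5 digest to scrypt as if it were the password.
    return hashlib.scrypt(md5_digest, salt=salt, n=2**14, r=8, p=1, dklen=32)

def verify(password: str, salt: bytes, stored: bytes) -> bool:
    # At login, redo both steps: MD5 first, then scrypt over the digest.
    md5_digest = hashlib.md5(password.encode("utf-8")).digest()
    candidate = wrap_legacy_md5(md5_digest, salt)
    return hmac.compare_digest(candidate, stored)

# Upgrading a stored legacy hash without knowing the plaintext:
legacy = hashlib.md5(b"hunter2").digest()   # what the old database held
salt = os.urandom(16)
stored = wrap_legacy_md5(legacy, salt)

print(verify("hunter2", salt, stored))
print(verify("hunter3", salt, stored))
```

The point of the wrapping step is that it can be applied to the existing database in bulk, since it needs only the old digests and a fresh per-user salt.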

tptacek · on June 10, 2012

cheatercheater · on June 11, 2012

That's one person whose "no" I'll accept without further explanations. Thanks for clarifying.