The comment thread on Reddit, which the article references, is definitely worth reading. There's some good discussion of the difficulties involved in upgrading people's hashes to a more secure scheme (e.g. http://www.reddit.com/r/WTF/comments/f96w7/amazon_security_f...)
Is there a good way to update passwords to a new encryption scheme? The article tries to ding Amazon for failing to do this, but I can't think of a way to strengthen the passwords without announcing that there's a flaw in the implementation. Is there a standard way around this?
The standard way I've seen is to do this when the user logs in: if, while checking their password, you notice they're using the old hashing scheme, you use the plaintext password provided by the login process to generate a hash using the new scheme.
For example, let's say you're writing a PHP function that takes in $username and $password (both plaintext). You're "upgrading" from unsalted MD5 to unsalted SHA1 (NOTE: DO NOT DO THIS IN REAL LIFE. READ http://codahale.com/how-to-safely-store-a-password/ AND UNDERSTAND WHY UNSALTED MD5 / SHA1 SHOULD NOT BE USED FOR PASSWORD SECURITY). Your code to upgrade the hashes would look something like this:
if (md5($password) === $cur_user['password']) {
    // Old hash matched: re-hash the verified plaintext with the new scheme.
    $cur_user['password'] = sha1($password);
}
Now, you may have realized that there's a flaw here. Since the server uses the hash for verification, it can't be 100% sure that the plaintext password is actually correct. If there's a collision (two plaintexts hashing to the same value), you could inadvertently store the wrong password in your new hash.
In this case, the problem is that passwords longer than eight characters are silently truncated. So, if my password is something like supersecretpassword, I could also enter supersec and have it accepted (since they hash to the same value). That would be a problem for Amazon, since someone could make a typo in their password and have the wrong value stored in the new hash.
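The truncation collision described above can be demonstrated with a toy legacy scheme in Python. The 8-character cutoff matches the article; `md5` here is just a hypothetical stand-in for old-style DES crypt(), not Amazon's actual implementation:

```python
import hashlib

def legacy_hash(password: str) -> str:
    """Hypothetical legacy scheme: silently truncate to 8 chars, then hash.
    (Stands in for old-style DES crypt(), which truncates similarly.)"""
    return hashlib.md5(password[:8].encode()).hexdigest()

# Any two passwords that share their first eight characters collide:
print(legacy_hash("supersecretpassword") == legacy_hash("supersec"))  # True
```

An upgrade-on-login routine that trusts this check would happily store a new-scheme hash of "supersec" even though the user thinks their password is "supersecretpassword".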
>Now, you may have realized that there's a flaw here. ...
Not really. It's more of a fundamental property of hash functions: it's technically true of any hash function, and collisions are little more than a theoretical possibility unless you're using an incredibly small hash output. Even that would likely be preferable to truncating before hashing: fewer practical collisions, and safer, since reversing the hash is less useful when more collisions actually exist at breakably short password lengths.
You're right. The reason I pointed it out as a flaw is because it has major consequences in this case. It's true for every hash function, but highly unlikely to happen in most cases.
I think he was thinking of the truncation done by crypt(), so that even passwords that are completely different beyond the first 8 chars would match (barring ordinary hash collisions).
I think the point is, the user's password could potentially be set to something other than the original if they happen to type it "close enough". That would cause major confusion going forward.
1. They could force everybody with an old password to reset it. This would announce there was a flaw, but shouldn't be any riskier than having a "forgot password" link.
2. Silently upgrade passwords as users log in. This is risky, because many users might not log in for months.
3. Use their clusters to reverse weak password digests. This would be mostly transparent to customers, but runs a risk of accidentally locking out some customers. For example, if "bar" and "foo" both crypt() to "abcd123", and the user has password "foo", the recover-and-redigest method might change the password to "bar".
4. Wrap old passwords in a new digest -- presumably they have some way to determine which algo is used per-user. They could treat the output of crypt() as the input to a more secure digest, like scrypt, and then just store "crypt-plus-scrypt" in their database as the digest type.
They can't do #4, as they don't have the old password. They'd either end up having to reverse the hash (with the same problems as #3), in which case why keep the crypt() hash? Or they could use the crypt() output as the input to the new hash, which wouldn't change the set of passwords accepted, and therefore wouldn't help at all.
Wrapping the output of crypt() should reduce how many passwords would be compromised if an attacker gained access to the password database. That's the whole point of using digested passwords instead of plaintext.
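Option 4 can be sketched in a few lines of Python. Here `md5` is an illustrative stand-in for the legacy crypt() digest and `hashlib.scrypt` is the outer digest the list above suggests; the parameters are assumptions, not Amazon's scheme:

```python
import hashlib
import hmac
import os

def wrap_legacy(legacy_digest: str, salt: bytes) -> bytes:
    # Treat the legacy digest as the "password" input to a stronger KDF.
    return hashlib.scrypt(legacy_digest.encode(), salt=salt, n=2**14, r=8, p=1)

def verify(password: str, salt: bytes, wrapped: bytes) -> bool:
    # Recompute the legacy digest from the submitted password, then re-wrap it
    # the same way and compare in constant time.
    legacy = hashlib.md5(password.encode()).hexdigest()  # stand-in for crypt()
    return hmac.compare_digest(wrap_legacy(legacy, salt), wrapped)

salt = os.urandom(16)
stored = wrap_legacy(hashlib.md5(b"hunter2").hexdigest(), salt)
print(verify("hunter2", salt, stored))  # True
```

Note this doesn't change which passwords are accepted (the objection above), but a leaked database now only exposes scrypt-wrapped values rather than bare crypt() output.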
If done deliberately, could this be a useful feature, so that users can typo a password (say, Levenshtein distance <= 1 or 2) and still log in? Obviously the major downside is that you would need a longer password to get the same level of security, and it could be difficult to implement, especially since the password should be hashed. Is this feasible? I'm guessing the answer is no, but was wondering what other people thought.
Only if you want to store all the passwords with reversible encryption*, or use some crazy scheme in which you check the hash of every possible string within the given distance of the submitted password.
* If I find out you've been storing passwords in plain text I will hunt you down and slap you. :)
I don't think checking every hash within distance one of the password is going to cost much; worst case you're checking fewer than 1,000 passwords. However, I would much rather have a system that checked 2 passwords, one with and one without caps lock on, e.g. 1Password and 1pASSWORD.
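The caps-lock idea only needs one extra hash comparison. A minimal Python sketch, where `sha256` is just a placeholder for a real slow password hash (bcrypt/scrypt):

```python
import hashlib

def hash_pw(pw: str) -> str:
    return hashlib.sha256(pw.encode()).hexdigest()  # placeholder for a real KDF

def check_with_capslock(candidate: str, stored: str) -> bool:
    # Accept the password as typed, or with the case of every letter flipped
    # (what the user would have produced with caps lock accidentally on).
    return stored in (hash_pw(candidate), hash_pw(candidate.swapcase()))

stored = hash_pw("1Password")
print(check_with_capslock("1pASSWORD", stored))  # True
```

Unlike the edit-distance idea, this doubles the work per login instead of multiplying it by hundreds of candidates.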
"I don't think checking every hash within one of the password is going to cost much, worst case your checking less than 1,000 passwords."
If that's the case, you're doing it wrong (it being password hashing). Because if you can hash 1000 passwords fairly quickly, that's the lower bound of what a dedicated attacker can do. ;)
It might improve the user experience a bit, but your password hashing scheme needs to take much longer if an attacker knows he can rule out lower-case letters.
For example, 8-character alphanumeric passwords:
((26*2) + 10)^8 / (26 + 10)^8 ≈ 77.4
You have 77 times fewer passwords of length 8 (probably worse than that, most people skew more towards letters than numbers), so it should take 77 times longer to test one.
My guess is the average Amazon user isn't going over 8 characters and isn't using multi-case passwords to begin with. While stronger passwords would be ideal, most non-savvy people are still going with "password", "letmein", and "123456", which aren't secure under any hashing scheme.
What exactly does this hurt? It cuts down on customer-support traffic, makes life easier for the user, and (as far as I can imagine at least) doesn't make anyone any more vulnerable. Under a fuzzy comparison scheme, weak passwords are still weak, and strong passwords will still be strong.
A strong password that gets a lot of its strength from length and mixed case will not still be strong under this error.
For example, the password fOoBaRbAzbArBaZFoO would be a pretty strong password. foobarba is much weaker, and this error would make foobarba (and foobarbaz, etc) work.
A password of that sort is both not that hard to remember and probably fine on a post-it note on the monitor of your home computer. There's really not much overlap between the sorts of people who will break passwords and those who will be in a position to read things off your monitor.
(and if they can, some browser password auto-save feature is going to screw you anyway so it doesn't matter)
I couldn't replicate the article's results for the 8 character cutoff, but I did verify my password is case insensitive. For the record, I registered in 1996 or so, and probably haven't changed my password since.