I don't think it's demonstrating anything other than:
1) many developers don't use salts / HMAC
2) MD5 is popular
3) hashed passwords end up on Google
From these 3 points I don't think it follows that MD5 is horrible.
Any hashing function would have the same issues, simply because a hashing function is a mathematical function, so for any X from the domain of definition, H(X) will always have the same value, on every call. Therefore, if a hashing method is popular enough and developers don't salt their hashes, then hashes for common passwords will inevitably end up on Google. However, if you salt your hash calls with your own key, the hashes produced will be different from everybody else's.
Think of (3) as saying without salting, with Google and the internet, the precomputed tables that are used for brute-force attacks are already out there and can be queried with a simple Ruby program.
Perhaps I should have written it as "unsalted MD5" instead of "plain MD5" to avoid confusion. Unsalted MD5, in my opinion, is horrible. MD5 plays it's part in the mess: it's quick to calculate, which means that anybody can churn out huge lookup databases. Missing salts make those databases universally usable.
The salt does not matter. Neither does the specific hash; you'd be just as boned using SHA256. All cryptographic hash functions are designed to be fast.
The vulnerability is "not using a password hash construction", of which the best known are bcrypt and PBKDF2.
Hate to be pedantic, and I'm sure you already know this, but you do realize that PBKDF2 uses SHA in HMAC mode right? There's nothing inherently slow about that, it's the repeated iterations of hashing.
You could slap MD5 into PBKDF2 with a high iteration count and achieve comparable security. The problem is that devs often use a hash function a single time.
Fair enough, I suppose I didn't read your original comment closely enough. I appreciate your essays on the topic, but I fear that too many people have, without a full understanding, drawn the conclusion from them that SHA = bad, bcrypt/pbkdf2 = good, without fully realizing that SHA is an integral part of pbkdf2.
Covering the whole solution space would be an enormous (as compared to just huge) undertaking. But if you limit yourself to the types of passwords people usually use, you can prune it down quite a bit.
The currently available databases are surprisingly large, even if not in the scale of 62^8. Consider a random pick from decrypt.fr: "phytostrote972". It's far from a random string, but not exactly a trivial one either.
This is where password management gets ridiculous, because you'll find a lot of registration forms limit the length of your password input to, say, 12 or 16 characters.
Why? Is it not being hashed? I have a (possibly very wrong) inkling that a longer phrase might increase the chance of collision but even so, so many places enforce a strong password but force you to keep it short.
The worst offender is NVidia. They have half-a-dozen different developer logins for different bits of their site - and they all have different rules = your CUDA one must have a symbol but the parralel Nsight one must not etc
If using your own salt is all you do, you'd still be susceptible to easy statistical attacks once someone compromises your database. But, yes, they probably wouldn't be available on google.
If using MD5 is all you do, you'd still be susceptible to brute force attacks. MD5 is a really fast hash to compute, salting or not.
The solution is to pick a better algorithm and learn how to use it securely. That probably won't happen unless all the ridiculous PHP 'security' tutorials are erased from the history of the internet and only correct methods are shown.
A standard assumption in cryptography, known as Kerckhoff's principle (http://en.wikipedia.org/wiki/Kerckhoffs%27_principle), is: "A cryptosystem should be secure even if everything about the system, except the key, is public knowledge." or "Don't rely on security by obscurity.".
No it is not, because the salt is stored in a plaintext or easily reversible format somewhere, and has to be in order for authentication to work, if you are in a position to grab the hashes, it will also be trivial to grab the salts. The point of the salt is not to add some padding bits to the password. Salts should be publicly knowable without it causing a loss of security.
The one and _only_ use of a salt is to prevent precomputation attacks (rainbow tables).
But isn't the SSH private key also stored in plain text?
I mean, yeah, you could also have a password for that key, but then most people use ssh-agent because typing that key every single time is annoying, which means the password is somewhere in memory. Or you could just install a keylogger on it and wait for the user to login.
If the user's computer is compromised, a hacker could gain access to his SSH credentials. Isn't that still security by obscurity?
And if you gain access to the source-code and to the database, I'm pretty sure you'll end up in a position to modify that source-code anyway, thus finding user passwords by simply logging them somewhere.
Security is a complex topic and relying on the slowness of an algorithm like bcrypt doesn't make me feel any safer, as people can always come up with a faster bcrypt. Instead we should rely on computational complexity, because no matter what we do, unless quantum computers become a realy, there are limits to what we can compute when exponential complexity is involved.
> Instead we should rely on computational complexity, because no matter what we do, unless quantum computers become a realy, there are limits to what we can compute when exponential complexity is involved.
Quantum computing does not help with exponential problems in general.
">But isn't the SSH private key also stored in plain text?
no, passwording the key encrypts it
>I mean, yeah, you could also have a password for that key, but then most people use ssh-agent because typing that key every single time is annoying, which means the password is somewhere in memory. Or you could just install a keylogger on it and wait for the user to login.
ssh-agent does not expose the private key to clients requesting it, thats part of its design, you can however get a login session to whatever hosts it holds keys for.
> Isn't that still security by obscurity?
That is much more involved than a hit and run attack where you download the DB, and much more likely to be detected/detectable before any harm is done, via such things as IDS, or just plain not possible due to how a system is locked down (stuff like ssh gateways that are heavily secured, etc).
>Instead we should rely on computational complexity ... there are limits to what we can compute when exponential complexity is involved.
the point of bcrypt isnt JUST that it is slow, its that each step requires the data from the previous step, many thousands of times over, that makes it impossible to parallelize across many GPU cores or similar arangements, thats a huge part of the weakness of stuff like plain hashing/salting, its trivial to parallelize and to scale up that parellelization till you are generating billions, or even trillions of hashes a second, that approch is totally useless on bcrypt."
It's somewhat optimistic to assume that the adversary knows no passwords - even if you can't create a user, phishing one isn't that hard. That makes recovering the salt much easier.
Well yeah, but then you're proposing a brute force against the salt and the salt can be anything of any length.
So if you're assuming a salt that can contain standard latin letters and digits and is 256 in length, that's 62^256 possible variations.
Do note that I'm not saying here that MD5_HMAC doesn't have flaws, but it definitely doesn't have the same flaws as MD5, cracking it ain't easy and I can't find a reference for an instance in which this was actually done.
Well, I thought it was assumed :) ... since salting a hash has inherent vulnerabilities if you're not doing it right and HMAC is a sane way to do it.
I really don't get this argument that since MD5 is fast to compute, that's the reason it is vulnerable. That's not true, MD5 is vulnerable for other reasons, like collision attacks are possible, preimage vulnerabilities were demonstrated, huge rainbow tables are precalculated and so on and so forth.
However, HMAC_MD5 does not have the same vulnerabilities and increasing the size of the salt increases the time of a brute force attack exponentially. 62^256 is a freakishly big number and it doesn't matter much if you divide the work by 100,000 computation units. But if that makes you feel uncomfortable, you can always increase the salt to 1000 chars.
And I simply don't buy that we have enough computation power in this world to brute force something with 62^1000 computational complexity.
Phishing the password from one user and recovering the salt shouldn't be useful in the first place. The parent example was only meant to show how difficult it is to recover a salt even with multiple examples of its use, not to give a real life example of password hash use. (Which was my point)
That said, I don't know how you would obtain a list of hashed passwords without also getting the associated list of salts (wouldn't they be in the same database?), so it is kind of a moot point. The different salts are intended to prevent against the ability to have a single rainbow table to crack every password in the database.
Instead, you need a table for each salt, which means that you basically have to brute force the entire database. This still doesn't really help if you are using something fast like MD5, as brute force solution will be possible with that algorithm. Which is why you want something reasonably slow.
Having the exact salt in the same database as the user data defeats the purpose of the salt.
Normally you have a global salt, somewhere in your source-code, which you combine with the per-user generated salt. It also doesn't have to be something obvious in the database (like a column named user_salt :)), you could just use something like HMAC_MD5(global_salt, email + username + joined_date) for each user.
Of course, this may seem like security by obscurity, but even in the case of SSH you keep your private key safe and as a business if you have both your database and your source-code compromised, you're fucked anyway.
The purpose of the salt is to defeat time/space tradeoff attacks by inflating the required space to the point of impracticality. ie. 20 bits of salt will increase the size of the rainbow table required a million times.
Mostly, there will be a column conveniently labeled 'salt'. And in e.g. a MySQL database, you can bet the native hashing format has been used.
Ignoring that, if someone got your database, you should assume they got your code. If you care about passwords not being lost, you use bcrypt or something similar.
If I have your hashes, I probably have your salts too. (Unless you use a single salt, in which case, I can easily get your salt if I've already got a username/password for your site.)
I don't believe you understand what the salt is for.
You use a salt so that time/space tradeoff attacks are no longer viable. All you need to do is provide a few bits (say 20) of entropy to make them infeasable.
I see that link referenced a lot and don't think that's a good thing. He's right, but he doesn't explain why we should use bcrypt (or any other adaptive password hashing function). Picking bcrypt without knowing why is just as bad as picking MD5 without knowing why.
No, it's really not, as long as you follow current guidance on counts for iteration. There are people smarter than <x> random person and sometimes (often with crypto) it's better to follow their advice. Anyone can make a system that they themselves cannot break. Don't be that person.
I strongly disagree. If you are the person tasked with implementing password verification, then you really should learn enough about the field to give a rough explanation of why you've used bcrypt (or whatever you use). If you know what MD5 (and other simple hashes)'s deficiency is (and Hale explains it), then you should know how bcrypt, etc. solve that problem. One should follow expert advice but not blindly.
For those who don't want to read the source code, BozoCrack has a simple algorithm. It googles the MD5 hash and hopes the first result page contains the plaintext password. It usually does.
"It usually does" isn't quite accurate here. Common or weak plaintexts might work, but for the vast majority of input you're SOL. Sure "nicetry" comes back, but "nicetry99" produces 0 results and for every "nicetry" there are an infinite number of "nicetry"+i hashes.
Ah, you are technically incorrect in saying there are an infinite number of "nicetry"+i hashes. There are an infinite number of "nicetry"+i passwords, but eventually there will be collisions as the hash set stays a constant size and the password set grows without bound. "Infinite" isn't a term to throw around too lightly.
The point is that it's unnecessary to find the plaintext; all you need is some plaintext that produces the same hash value. It doesn't matter if your actual password is "zipobibrok5x10^8" when "fordprefect" also gets you into the system. (That, of course, only applies to a single system -- or to a cluster of systems all using something like an unsalted MD5. It would matter if you're trying to leverage a password found on a cat fanciers' site to empty someone's bank account.)
I actually think you're missing the point here. While it is true that an infinite number of strings correspond with each md5 hash, the question was about trying to actually find a match. With a suitably large hash, say 256 bits, it becomes physically impossible to even count that high, let alone compute that many test hashes. A problem that is too large to evaluate is effectively infinite.
(Yes, md5 is 128 bits and might be possible if an entire country dedicated itself to the effort. Or an attack on its flaws could be used. But both these points are tangential to themouth's use of infinite.)
With the technique under discussion (using Google to search for the MD5 hash), it doesn't really matter what the computational cost of finding a plaintext for the every possible hash value is -- you're not brute forcing a collision, you're doing a search using someone else's enormous resources. That's always going to be O(1) from your point of view (with a lot of overhead, of course).
In my experience, it is accurate. Google doesn't know "nicetry99", but it does know "octopus99", "octopus98", "octopus97", "hacker99", "hacker98", "hacker97" etc.
Anecdotal evidence might as well be no evidence. I'm not saying that you don't have a unique or clever idea, but it should be judged on its merits. This tool will be great for finding common passwords and variants where a system uses unsalted MD5 hashes; but it relies on known hashes, so for the vast majority of possible input strings that have no reason to be generated or indexed by Google it's ineffective.
We inherited a system, without inheriting the administrator passwords, that we had to work on. It was a spaghetti mess, so creating a new account didn't seem obvious, and I wasn't sure how the passwords were hashed, but they did seem md5-like to me, so I googled one.
Turns out, 90% of them, including most of the admin passwords, were just four numeric characters, like 9678.
When people rush in to defend salted MD5/etc, they aren't actually doing it because they objectively think it's ok. They're doing it because presumably at some point in the past they have done it or allowed it to be done, they're just defending themselves. Unfortunate, since this is meant to be constructive criticism.
I would use bcrypt but it's not available as a encryption option for the realms in Apache Tomcat.
One possibility would be to write an own realm but it's not that easy. Plus additional work is needed to update existing hashes, currently I therefore use salted MD5-hashes.
Integrate your email systems with Google mail or MS mail. You'll quickly find that they do not accept bcrypt. Plain md5 or plain sha1 is all they support (at least that was the case two years ago). When you are forced to inter-operate with the big guys, you'll find not many actual use bcrypt.
Unfortunately, the same sites that are naive enough to use MD5 for cryptographic hashing are also likely the same sites naive enough to use oversimplified regexes that fail to validate all possible inputs.
(If I had a dollar for every time the 'emailaddress+foo@gmail.com' failed to validate....)
Ultimately, data integrity is the same thing as data security. If you cannot trust your data not be detectably corrupt in the face of a malicious collision attack, you do not have data integrity. Collided data can be used to cause a DoS, to overrun buffers, any number of nasty things that arbitrary user data can cause when trusted implicitly.
You are making assumptions that are unwarranted. There are other uses for MD5 besides data integrity and data security. For instance, I generate an MD5 hash of a user's email address and use that hash for Gravatar. While someone who knows quite a bit about MD5 and the other person's email address, can force a collision ... all that would get him would be the other person's avatar image ... which is public anyway. In other words, using MD5 is perfectly fine in situations where collisions don't pose a serious problem.
Microsoft Active Directory Server systems use MD4 in their central distributed credential database (ntds.dit). Yes, that's a 4, not a 5... MD4. This is the strongest storage hash that one can use in Active Directory.
Shock and horror when geeks meet the real-world. Yes, I know.
Your post feels rather optimistic to me. Not only do a lot of people still use md5, I'd argue a sizable number of sites still store passwords in plain text.
Yep. I find it fascinating why plain unsalted md5 hashes are as common as they are. Developers go through the trouble of hashing, but don't go the single necessary step further.
Sure. I guess there's not a whole lot of excuses to avoid bcrypt these days.
Bozo's idea was to show that unsalted MD5 is, for most passwords, as bad as no encryption at all. An attack doesn't get much easier than a lookup table.
To the other posts saying how MD5 is not bad and/or it's stupid to use MD5 without salt (or whatever):
See this script more as a fun little hack rather than a "Formal proof". Thanks aparadja for sharing. That being said, using MD5 without salt is asking for trouble. I mean, I know security is usually just a time vs $ vs quality problem, but it costs almost nothing more to add a salt in front of the password. Why not do it?
Thanks for the enlightenment. I have to agree that once you have access to the db, salt or not doesn't really matter security wise. But at the end, it really boils down to: Do you have a strong password? If yes, even md5 is "somewhat" safe. But to be fair, I don't know how big the rainbow tables are.. and could become.
For instance, I've got lots of password with over 20 characters mixed with upper/lower cases with a few symbols in it. (i.e. Seems hard but: phzb0xis@mynickname.on.hackernews.com is already pretty long and hard to crack but still easy for me to type/remember), btw that is not my password ;) I wonder if I encrypt that with MD5.. Could it be contained in a rainbow table somewhere?
Your search - 822d6c6e12b26dd30161967354aa4302 - did not match any documents.
A question on the practicality of expensive compute time for password hashes:
If somebody got read-level access to your password hashes, it follows (based purely on the assumption that any app with the rights to read the hash will probably have the right to change it when applicable) that one could simply overwrite the password hash with a new one that is already known, gain unauthorized access, and change the hash back to prevent the user from finding out. Unless you really need that password to crack multiple accounts that might be reusing it, it seems unnecessary. (And honestly I wouldn't care to implement these special password hashes just to protect extraneous accounts of my customers which I don't control)
edit I should clarify that while I agree that expensive compute time for password hashes helps prevent the ultimate compromise of a user's password, I find it a much more worrisome prospect that somebody got access to the database in the first place. To me, a person's password strength is almost irrelevant compared to the importance of preventing brute-force attacks on a login API or ensuring the integrity of the password database and db apps.
That being said, for those wishing to implement a bcrypt-type hash:
Strong password hashes don't protect your users from the consequences of security flaws in your own application.
They protect the (majority of) users that re-use passwords on multiple services, and, more importantly, they protect you from the PR shitstorm of having dumps of cracked passwords posted to Pastebin after a compromise.
security is about defense in depth, and doing as much as is feasible to mitigate damage. Proper password hashing wont protect you from a break in on a primary system, but it could very easily prevent a break in on secondary ones.
Also, a note to anyone reading the above post, that is not how bcrypt works and is incredibly insecure.
What is not how bcrypt works? The example that I gave? That comes from the 'crypt' man page explaining how to use blowfish encryption patched into libcrypt?
Because that way, anyone sniffing or monitoring your traffic doesn't even need to crack the systems to steal the hash, and so you'll be saving everyone some time?
Edit:
I'm trying to illustrate the general principle, that you shouldn't take any action thats visible outside your secure perimeter, that depends on knowledge of your password.
What you define as 'outside the perimeter' depends. In the case of your corporate systems, its probably everything outside the corporate network. In the case of your gmail password, its everything outside of [your computer, the SSL connection to google's auth servers, and those servers].
You shouldn't ever leak any information outside that perimeter, that reveals knowledge of your password.
Its generally pretty hard to steal the password hash; if you start revealing what your password hash is to someone doing passive analysis, you compromise a lot.
If its worth thinking about poisoning hashes to protect, then don't try and poison the hashes!
I don't understand. This is pretty much the definition of a dictionary attack. The dictionary is just stored and accessed through Google rather than packaged with the program.
I think this is exactly why this method is so effective. Google's index all of those sites.
The first result when I searched the hash for superman (84d961568a65073a3bcf0eb216b2a576) was a link to a page titled literally
"Google Hash: md5(superman) = 84d961568a65073a3bcf0eb216b2a576", the page is hosted at(http://www.nth-dimension.org.uk/utils/ghash.php), basically someone's gone through the trouble of making a rainbow table that's easily crawlable that makes this method of lookup via Google even easier.
The page has a description:
> Google Hash is a PoC implementation of an hash search engine using Google.
> Unlike other implementations, the aim here is to get Google to store the
> word and associated hash. We do this by putting them into the title where it
> will always be stored by Google's spider....
The next top hit is md5rainbow.com etc. etc.
I would guess that most of the positive results from BozoCrack.rb are thanks these sites.
Yes, as long as it's designed for speed, it's not a good fit for storing passwords. That's why people should use bcrypt or an equivalent. It has both built-in support for salting & a work factor.
Any hashing function would have the same issues, simply because a hashing function is a mathematical function, so for any X from the domain of definition, H(X) will always have the same value, on every call. Therefore, if a hashing method is popular enough and developers don't salt their hashes, then hashes for common passwords will inevitably end up on Google. However, if you salt your hash calls with your own key, the hashes produced will be different from everybody else's.