What do you mean by a password that can't be reasonably brute forced?
EDIT: To clarify, I mean specifically with md5. I'm by no means an expert, just curious because I had considered md5 so broken that this comment caught my attention.
Rumours of MD5's death have been greatly exaggerated.
MD5's weakness is that it's (relatively) easy to produce two strings which have the same hash. However, given an MD5 hash, it's not easy to produce a string which also has that hash.
In principle, one could intentionally construct two passwords which have the same hash. It's hard to see how that could be exploited maliciously - any attacker knows both passwords to begin with. Even then, making colliding strings that would make acceptable passwords hasn't been done yet, AFAIK: the shortest colliding strings found so far are 64 bytes long and contain several unprintable characters.
OTOH, computers are fast enough now that brute-forcing MD5 is practical for short strings with a limited set of characters, which is what passwords tend to be. One should use algorithms like PBKDF2, scrypt, and bcrypt which can increase their complexity as the computation capacity of potential attackers increases. This isn't because of a particular weakness in MD5, though, and one should equally avoid storing passwords as SHA-512 hashes, say.
The thing you definitely shouldn't use MD5 for is digitally signing a file you didn't make, because it's possible that whoever did make it also made another file with the same MD5 hash, for which your signature would also be valid.
On a side note: You can use such crafted strings as a black box testing tool to verify if a site does infact use md5 or other weak algorithms to store the passwords. This can perhaps be used in conjunction with other factors to craft an attack.
As a corrollary this can also be used as a testing tool by anyone for any third party site to determine known vulenrablities in their password storage
A preimage attack for MD5 has complexity of about 2^123. So, even if you get the MD5 hash for a password, it will be exceedingly hard to find a password that has the same hash (assuming the original password is long and random).
If you account for speed increases over the last 10 years and assume the password thief has access to a botnet, then it wouldn't surprise me if they've found collisions for the entire list.
Edit: Nevermind, the link finds two strings that hash to the same thing; it does not find a string that hashes to an existing hash.
The collision generator behind that link does not implement a preimage attack (given a string X, come up with another string Y with the same MD5 hash).
Instead, it implements the much easier collision attack (come up with two strings that have the same MD5 hash).
I thought the whole point of the MD5 vulnerability was that the limit was 2^128 and as such there are more inputs that possible output hashes, meaning more possible input collisions.
All hash functions have collisions. The point is that a good cryptographic hash function makes it very hard to find collisions.
The “preimage attack” on a cryptographic hash function tries to find a message that has a specific hash value. That is, you lock down a hash value (the MD5 hash for a password) and try to find a message that hashes to that value (the original password, or any other input that happens to have the same hash).
The best known preimage attack against MD5 has complexity 2^123. It's better than brute forcing, but still unpractical. Thus, if I come up with a good password that is long and random, you will have a very hard time coming up with a string that has the same MD5 hash value.
The practical attacks against MD5 are collision attacks. A collision attack tries to find two messages with the same hash value. With MD5 in particular, there's a chosen prefix collision attack, where you choose two messages and append to them so that the hashes will match. This was particularly devastating with X.509 signatures and certificates, where the attacker could have the MD5 hash signed by a certificate authority, and then use the same signature with their other message that has the same MD5 hash.
Instead of computing the MD5 of a huge number of passwords looking for a match, you simply store the precomputed password and hash pairs in a database table.
A rainbow table is just a precomputed table of hashes for a lot of passwords. Some tricks are used to make the table smaller, but you can think of it as just a lookup table. Only the passwords that were precomputed and put into the table will be found.
Rainbow tables are usually computed for short passwords (1-10 characters) and limited character set (say, alphanumerics). They are good for finding the bad passwords if you get your hands on a set of MD5 hashed passwords. But they are of no help if you need to reverse a good, long, random password.
Every hash has a finite output length, and therefore a finite number of possible outputs. 2^128 is a very large finite number. It's not that large in the grand scheme of things (there are over 2^260 or so atoms in the universe), and it's definitely better to use a hash with 2^256 outputs now that there exist good 256-bit hashes that are faster than MD5, but 2^128 is still quite a large number. The internets are quoting me about 10 billion hashes per second on a good GPU from a few years ago, which comes out to about one sextillion years to find an input for every possible output. (It divides linearly if you have more GPUs, but that clearly won't help very much.)
What's broken about MD5 is that, due to an algorithmic flaw, it's very easy to generate two inputs of your choice that have a matching output. That's great if you want to do things like spoof an SSL certificate (you generate two certificate signing requests, get one of them signed, apply the signature to the other), but not directly helpful for attacking a password hash where someone else chose the password.
What is conceptually broken is that such an algorithmic flaw exists, and also due to algorithmic flaws it takes a bit under 2^128 tries to find an input for a specific possible output. That worries mathematicians, because it's a sign the hash isn't behaving as randomly (speaking informally) as one would hope, and that people are starting to understand its structure. If that understanding continues, it might be broken more in the future, so you absolutely shouldn't build new systems on MD5 because we expect the research to happen at some point.
But, at least today, it's still true that you can have a password that can't be brute-forced despite the use of MD5. Maybe someone will present a paper tomorrow that disproves that.
All hashing algorithms that I am aware of have more inputs than outputs. By the pigeon hole principle, there will always be collisions. MD5 is weak, but it still isn't trivial to find an input that hashes to the same thing as a high entropy password.
> that hashes to the same thing as a high entropy password.
To be clear, it's not the entropy of the original password that matters, except for the fact that all common low-entropy passwords already have their MD5s stored in public databases. (What hashes to 5f4dcc3b5aa765d61d8327deb882cf99? You can look it up with Google.)
You can come up with two plaintexts that hash to the same thing in MD5. You can't come up with something that hashes to a new MD5 value given to you, aside from finding it in one of those databases.
If it's a password so long and complex it wouldn't be in any rainbow table computable in reasonable time. While MD5 can be computed quickly, there is still a limit to how many you can compute -- and there are an infinite number of possible passwords if they aren't length limited.
Interestingly even if the password has infinite length, an MD5 hash has a fixed finite length. You can think of it as a glorified modulus operator, beyond some point the longer passwords will have hashes that match shorter ones.
True -- but assuming these passwords aren't stored the same (very, very wrong) way on another site, and they're no longer useful on Yahoo, what's important is finding the real password, not just a password that happens to match the given hash.
Rainbow tables are attacks against secure algorithms.
MD5 is recognised as an insecure algorithm: given a known hash, there are multiple possible passwords that would resolve to the same hash, therefore appearing to be the correct password.
With MD5, it's not necessary to compute an infinite number of possible passwords, and it is possible that, given a particular hash, a collision can be found within a reasonable time.
Either a) you don't have a clue about the complexity involved in finding a collision for a specific hash or b) your definition of "reasonable time" is longer than the age of the universe and/or using 100 trillion state of the art GPUs is realistic.
I'm leaning towards option a, you read a blog post once and think you're an expert on cryptography now.
> the complexity involved in finding a collision for a specific hash
If it can be shown that a preimage collision can be computed in less time than an exhaustive search, the algorithm is generally regarded as having a weakness, even if the given "less time" is still a very very long time.
The theoretical complexity of MD5 is 2^128, but a preimage attack was discovered in 2009 which showed that a collision can be found in 2^123.4. [1]
Collision attacks against MD5 have become more practical, there are even frameworks for it [2]. The complexity of 2^123.4 still makes a preimage attack against MD5 computionally unfeasible, but given that it's been shown to be weaker than its theorerical 2^128, it's possible that MD5 has other weaknesses which would allow the complexity to be reduced to a level that is computationally feasible.
To be fair, pretty much every MD5 discussion I've ever seen or been involved in (including with "security expert" former coworkers) has had someone making the same claim.
What you're describing is the same for every having algorithm in existence. All hashes can represent multiple (indeed, infinite) passwords. So they all have collisions. This is because all hashes are fixed-length, and so finite, while the possible inputs are infinite.
This isn't the reason that MD5 is weaker than other algorithms.
the MD5 algorithm can be broken using various techniques like collisions, unsalted I believe means that their database would accept the hashes the third party has. End result is they should have migrated away from MD5 after it was declared unsafe.
1. If your password is very very good (a Diceware password would suffice), then any method of storing passwords that is better than storing them in plaintext will stop someone from brute forcing it.
2. If your password is very bad, then even an excellent password hashing algorithm will not save you.
"Just use bcrypt" is meant to save people who are in the middle.
No, a collision attack would not give you the plaintext from a hash. A first preimage attack would do that, but no computable first (or second) preimage attacks against md5 have been found.
EDIT: To clarify, I mean specifically with md5. I'm by no means an expert, just curious because I had considered md5 so broken that this comment caught my attention.