Reminds me of a famous prisoner-puzzle. Perhaps the solution will be clearer if you have the concepts of the article fresh when looking at it.
----
The prison warden brings a hundred prisoners to an empty room and hands each of them one of a hundred cards labeled from 1 to 100, and says:
"In the next room there are 100 drawers. Into each drawer has been placed one of a hundred cards labeled from 1 to 100."
"One by one we will let you into the next room. Each of you will be given fifty chances to look inside one of those drawers, and then you will continue into the final waiting room."
"If every single one of you is able to find the card in the drawers that matches the one you have been given, you will all be released from prison today."
"One more rule - if any of you, after entering the room with the drawers, communicates any information to the remaining participants, you all fail."
The prisoners talk among themselves, discussing strategies for opening the drawers that might let them all find their number, and be released today.
They know that the chance for one prisoner to find their card, choosing randomly, is 50/100, or 1/2. The problem is that they all need to find their card, and if they all choose randomly then the chance they all succeed is (1/2)^100 - or (7.9 * 10^-29)%
One of the prisoners, having just read about how rainbow tables work, realised there was a strategy that would give them a greater than 30% chance to all succeed.
What was their strategy?
----
It's trivial to find the solution to this problem if you've never seen it before, search for the 100 prisoners problem. I don't actually know if reading about ranbow tables makes it easier to reason the solution, but there seemed a strong similarity so hopefully someone can figure it out!
I feel like I'm missing something fundamental in this paragraph:
> Rainbow tables differ in that they don't use multiple tables with different reduction functions, they only use one table. However in Rainbow Tables a different reduction function is used for each column. This way different tables with different reduction functions aren't needed, because different reduction functions are used within the same table.
What exactly is the structure of the final rainbow table - does it contain a column for each reduction function, or does it still only contain "start text" and "last hash" of each chain? I would loved to have seen a diagram of the table structure here. (From my own reading, I think these "extra columns" are not stored anywhere).
You only store the start text and the last hash for each column. <intermediate> consists of a mix of passwords ("text ") and hashes. To see how they are computed, let us narrow our attention to a single column.
text1
<hash-1> computed using H(text1)
<text-2> computed using R_1(hash-1)
<hash-2> computed using H(text-2)
<text-3> computed using R_2(hash-2)
last-hash computed using H(text-3)
Each column will look like that. I'm using <stuff> to denote things that are not stored. H is the hash function you are creating the rainbow table for. R_1, R_2, etc. are the reduction functions. Each column uses the same hash function and reduction functions but uses a different starting text.
Note that in a rainbow table that consists of k columns, there is no need to recompute hash/reduction functions for each chain. Instead, the attacker computes R_last(target-digest), and checks that against all the endpoints (last hash) of all column. If it matches any endpoint, then that chain likely has the corresponding password. Otherwise, compute R_last(H(R_second_to_last(target-digest))), and compare the result with all endpoints. Rinse and repeat. In the worst case, you have to compute as many hash/reduction functions as there are rows (regardless of the number of columns since all columns use the same hash and reduction functions).
> Instead, the attacker computes R_last(target-digest), and checks that against all the endpoints (last hash) of all column.
Wouldn’t R_last generate text? Why would you compare the text to last hash? Wouldn’t you just compare the target-digest to the last hash directly? (Nice explanation but I got lost here)
Rainbow tables don't make sense for a salted hash. So you're immediately narrowing down to only cases where an unsalted hash is used, like Microsoft's NTLM system or some older web systems that do MD5(password) or similar.
Building a rainbow tables is much more expensive (compute time, storage) than just breaking any individual hash. So unless you break hashes all day every day, you probably need to share that expense somehow, but then you can't customize. Maybe a large group of you want all of the old "NT hash" values, that's easy enough, but agreeing to do 5-7 alphanumerics for MD5() means the person attacking a site with an eight character minimum gets nothing out of it.
So aside from things like NT hash it has fallen out of favour.
Rainbow tables can still make sense for a salted hash if the salt can be predicted. A current real-world example I just recently learned about: SSIDs (network names) are used to salt WPA-based WiFi network password hashes. If the SSID is a default name (examples: NETGEAR, LINKSYS, XFINITY, etc.), then you could use a rainbow table.
The salt is defined to be random. To the extent you shove something that isn't random in as "salt" you don't get the desired consequences.
And I think ISP hand out WiFi routers stopped having daft names like this a decade or more ago. The only AP SSIDs that aren't partly random nonsense I see around here are my own (named "Reformed Distributed Republic") and somebody's FON router, the main ones I've seen elsewhere in my city are "eduroam" and "govroam" which are federated systems and so do not have a PSK - in all four cases you aren't going to help yourself by calculating rainbow tables with those SSIDs.
I've definitely still seen several in-use private WiFi networks using "NETGEAR" (although they usually have 2 numbers appended to the end, so that expands the keyspace by 10^2 = 100 different rainbow tables to crack those hashes).
It looks like this generic, repeated-SSID-naming "vulnerability" might continue into the present day via the use of a default name on hotspot devices, or smartphones serving as hotspots -- notice that on the "top 25 pwned wifi network names in the @pwnagotchi project database"[1], the top 1st and 2nd place respectively go to "AndroidAP" and "iPhone". Looking at my Pixel 3, I notice that the default hotspot name doesn't appear to be randomized (it's "PixNet").
A new GPU is able to calculate ~50,000 million MD5 hashes per second, an MD5 hash is stored typically on a 32 bytes hex string. If you want to store that you'll need more than 1TB per second: https://gist.github.com/Chick3nman/bb22b28ec4ddec0cb5f59df97...
I used MD5 because that's the typical hash you find unsalted on leaks, but if you do the math with others it is almost impossible to find an example where storing beats using a GPU to crack (even an older one) for a couple of hours.
Are there any archaic hashes that are built to be "slow" where this might not apply? It feels like a lot of modern slow hashes have salts built in (e.g. Bcrypt, Scrypt, Argon2) but if one didn't use a salt it would definitely still make sense to use a rainbow table against these hashes.
Is the idea that password hashes should be slow relatively new?
No, the idea that password hashing should be slow is built into the basic understanding of what password hashing is for and should do.
It's just that security wasn't as important (limited web attack surface) or generally understood back in the day (so people were even less likely to ask "is this hash suitable for passwords rather than checksums/indexing/etc?" than they are today), or the slow ones from then were fine -then-, but advances in hardware, the availability of the cloud/GPUs (so massive parallelization without a cost of infrastructure only a nation state could afford), etc, means they're easily compromised today.
The GP is correct, and I made some mistake on my estimation.
The idea that password hashes should be slow is indeed relatively new. Also it's contemporary to the idea that algorithms should have salts builtin, so those features usually go together.
There are actually hash functions designed for password storage (e.g. scrypt, argon2). These functions provide many features in addition to salting. The scrypt whitepaper is pretty easy to read if you want to learn more about this.
For something ancient like original Unix crypt() with 12-bit salt sure, it's just discouragement. A time-space tradeoff like rainbow tables is only going to be ~4096 times worse to calculate and you can amortize that over just 4096 successful attacks.
But this isn't the 1970s suppose you have 32-bit salt, now you need to use the rainbow table in 4 billion attacks to amortize the extra cost. Hey maybe you can attack every adult in the world?
In reality modern hashes often use 128-bit salt. Now you need to do billions of attacks, for each of the billions of people on the planet, just to keep it only billions of billions of times more expensive than brute force per attack. Or to put it more simply: This prevents the use of rainbow tables.
In the last section talking about improvements to reduce storage, couldn't the indexes be completely left out? Just storing the chain-end number in an array and using the array's index as the starting number.
Edit: this article is from 11/12/2006 and the last section from 04/09/2009 according the the website's home page.
----
The prison warden brings a hundred prisoners to an empty room and hands each of them one of a hundred cards labeled from 1 to 100, and says:
"In the next room there are 100 drawers. Into each drawer has been placed one of a hundred cards labeled from 1 to 100."
"One by one we will let you into the next room. Each of you will be given fifty chances to look inside one of those drawers, and then you will continue into the final waiting room."
"If every single one of you is able to find the card in the drawers that matches the one you have been given, you will all be released from prison today."
"One more rule - if any of you, after entering the room with the drawers, communicates any information to the remaining participants, you all fail."
The prisoners talk among themselves, discussing strategies for opening the drawers that might let them all find their number, and be released today.
They know that the chance for one prisoner to find their card, choosing randomly, is 50/100, or 1/2. The problem is that they all need to find their card, and if they all choose randomly then the chance they all succeed is (1/2)^100 - or (7.9 * 10^-29)%
One of the prisoners, having just read about how rainbow tables work, realised there was a strategy that would give them a greater than 30% chance to all succeed.
What was their strategy?
----
It's trivial to find the solution to this problem if you've never seen it before, search for the 100 prisoners problem. I don't actually know if reading about ranbow tables makes it easier to reason the solution, but there seemed a strong similarity so hopefully someone can figure it out!