You're right, authentication rates are low. But Moxie's point of concern with regard to requiring a heavier CPU and memory commitment on the server is still valid, particularly when faced with malicious traffic.
You need to make sure your login system can handle malicious clients and brute-force attempts, whether from individual IPs or against particular accounts, falling back to CAPTCHAs or some other means of resource control when things look bad. This isn't a particularly easy problem to solve, and if you do it incorrectly you open yourself and your users up to a denial of service.
IMHO, some of the new fancy password-based authentication protocols, like AugPAKE, which do hashing client-side, might ease this problem... but they're still stateful (SYN floods anyone?) and therefore still depend upon good DoS protection.
> Moxie's point of concern with regard to requiring a heavier CPU and memory commitment on the server is still valid, particularly when faced with malicious traffic.
How about this: save two hashes, one low cost, one high cost--both must pass for login. Don't run the high-cost one unless the low-cost one passes first.
Note: not a security guy, so I'm probably overlooking something silly.
Edit: the two responses below (thus far) are reasons you should not do this. Duh on my part.
If you have a high-cost and a low-cost hash, someone getting the password database can just crack against the low-cost hash. Even if the low-cost hash is short enough that it frequently returns a false okay (say a nice round number like a 1/256 false-positive rate), you're still reducing the number of "hard" hashes the attacker has to compute by a factor of that false-positive rate.
You can, however, do the high-cost hash on the client, then do a low-cost hash comparison of the result on the server.
So you do a cheap SHA-256 hash comparison on the server, but because the input to that comes out of, say, scrypt, it has 256 bits of entropy (or near it), making cracking infeasible. (scrypt has variable-length output, so you could use other hash sizes.)
This is only done when checking; when setting the password, both the slow and fast hashes are done on the server (to stop someone just hashing 'password123' with SHA256 and sending that).
This is called server relief. There's a discussion of it elsewhere in this comment thread; it is well known, e.g. [1], though it doesn't google particularly well.
The downside, of course, is that someone with disabled or borked JavaScript can't log in to your site.
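For what it's worth, a rough sketch of the server side of this, assuming the client runs scrypt with the salt and cost parameters the server hands out; the parameter values, function names, and the in-memory 'db' here are placeholders, not recommendations:

    import hashlib
    import hmac
    import os

    SCRYPT_PARAMS = dict(n=2**14, r=8, p=1, dklen=32)  # illustrative cost settings

    db = {}  # username -> {"salt": ..., "digest": ...}; stands in for real storage

    def set_password(username, password):
        # On registration the server does BOTH steps itself, so a client can't
        # just send sha256("password123") and skip the slow hash.
        salt = os.urandom(16)
        slow = hashlib.scrypt(password.encode(), salt=salt, **SCRYPT_PARAMS)
        db[username] = {"salt": salt, "digest": hashlib.sha256(slow).digest()}

    def verify_login(username, client_scrypt_output):
        # On login the client has already paid for the expensive scrypt step;
        # the server only pays for one cheap SHA-256 and a comparison.
        record = db.get(username)
        if record is None:
            return False
        candidate = hashlib.sha256(client_scrypt_output).digest()
        return hmac.compare_digest(candidate, record["digest"])

The server still has to hand the per-user salt and cost parameters to the client so it can compute the scrypt step itself.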
That's a neat idea. What do you put in both? I see two options:
1) You copy the whole password, and use the whole password for both. Then an adversary can just attack the weaker, faster one.
2) You split the password; half goes in one and half in the other. Now user passwords have doubled in length---or they haven't, and you have 4-10 character half-passwords instead of 8-20 character passwords.
There are separate concerns of brute-force mitigation that people should look into regardless of their hashing technology. Things like exponential-growth cooldowns per IP/user account (with lots of caveats to prevent cooldown-DDoS) should be used.
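Something like this toy sketch, where the per-key (IP or account) delay doubles with each consecutive failure; the base delay, cap, and in-memory dict are made up, and a real deployment still needs the caveats above:

    import time

    _failures = {}  # key (IP or account) -> (consecutive_failures, last_failure_time)

    def cooldown_remaining(key):
        count, last = _failures.get(key, (0, 0.0))
        if count == 0:
            return 0.0
        delay = min(2 ** (count - 1), 3600)  # 1s, 2s, 4s, ... capped at an hour
        return max(0.0, last + delay - time.time())

    def record_attempt(key, success):
        if success:
            _failures.pop(key, None)
        else:
            count, _ = _failures.get(key, (0, 0.0))
            _failures[key] = (count + 1, time.time())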
>This isn't a particularly easy problem to solve, and if you do it incorrectly you open yourself and your users up to a denial-of-service.
When the choice is between making denial-of-service mitigation harder and making it easier for user passwords to be recovered after a breach, you should definitely choose the harder DoS mitigation. I'm not saying you disagree with that sentiment, but I would have very little sympathy for a company making excuses like that to justify increasing the risk of user passwords being exposed.
Think infrastructure rather than applications like Facebook. What do you think happens if certain state-keeping big authentication servers or routers reboot? We have a customer that has more than one million users on a GGSN (which is a glorified router). If that happens to restart, every single user will immediately and automatically try to reconnect.
The systems involved don't actually tend to handle that many in reality (the ISP in question restricts it to some 20-30k/s towards our system). These systems are starting to get better at keeping state through restarts, but there can still be pretty impressive spikes of authentication traffic if the wrong system is rebooted (or crashes!).
But it's a non-obvious (IMO) attack vector opened simply by switching to scrypt/bcrypt.
In a perfect world, quality web apps have rate limiting built into their auth schemes. But it's important to acknowledge these two algorithms will put a much heavier burden on your CPU.
If you're authenticating with a username and password on every request you're doing something wrong. Once a user is logged in you should be setting a cookie, session, or some other token to identify the user.
Authenticate once, generate a token, and use the token for auth from that point. I can't think of a case where you'd need to send the u/p on each request.
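A bare-bones sketch of that flow; the names are illustrative, and verify_password() stands in for whatever slow bcrypt/scrypt check you already do at login:

    import secrets

    def verify_password(username, password):
        # Placeholder for the expensive bcrypt/scrypt check discussed elsewhere
        # in this thread; wire it up to your real user store.
        raise NotImplementedError

    sessions = {}  # token -> username; a real app persists this with an expiry

    def login(username, password):
        if not verify_password(username, password):  # the expensive check, done once
            return None
        token = secrets.token_urlsafe(32)  # high-entropy session token
        sessions[token] = username
        return token  # handed to the client as a cookie or header value

    def authenticate_request(token):
        # Every later request is a cheap lookup; no password hashing involved.
        return sessions.get(token)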
If you're writing http headers out to your apache logs on your production server, you're doing it _severely_ wrong.
edit: I'm specifically talking about HTTP basic auth with a precomputed "Authorization: Basic base64($username + ':' + $passwd)" header, not a GET of "/foobar?api_key=12345abcd". The latter is obvious in its failures and is not related to HTTP basic auth.
No, of course not. The CS problem behind a password hash (or KDF) is to exploit time and memory to convert a low-entropy secret into something with the strength of a high-entropy secret. But when you're securing individual HTTP requests with session tokens, you simply use high-entropy secrets, and thus don't have the password hash problem.
If users picked 128 bit random numbers as their passwords, any hash function would suffice to store authenticators for them. You'd still want to hash them, because presumably users will re-use the same 128-bit random number on several machines. But it would be infeasible to brute force them, so MD4 would do just fine.
There is no reason to hash secure cookies, because they aren't used across several machines. The compromise of a stored session key implies that every system where that cookie is valuable is also compromised.
A reason to hash cookies/tokens on the server side is so you don't have to reset them if your server database is compromised. A fast cryptographically secure hash with digest size equal to the token size is a good choice.
This also has the benefit of defending against timing attacks, for example, if you don't completely trust your code or the compiler / interpreter to not optimize your "constant time" comparison.
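Concretely, something like this sketch: hash whatever the client presents before comparing, so the bytes being compared aren't attacker-chosen, and keep the constant-time comparison anyway as belt and braces:

    import hashlib
    import hmac

    def token_matches(presented_token, stored_digest):
        # An attacker can't choose the bytes of the digest without already
        # knowing a preimage, so even a sloppy comparison leaks little;
        # compare_digest costs nothing, so use it too.
        candidate = hashlib.sha256(presented_token).digest()
        return hmac.compare_digest(candidate, stored_digest)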
It would be good practice to reset your auth tokens in the event of a database compromise anyway. In most cases that's a low-cost operation. The advantage is that it rescues you from the possibility that the attacker logged auth tokens when they were passing through one of your servers.
I agree, of course there is nothing stopping you from resetting if and when you want to, and on your own schedule. That's not a reason not to run a hash on the input token. I think there's no possible reason not to run 'token = sha256(token)'.
> In most cases that's a low-cost operation
But resetting API tokens is not necessarily low cost, as it's literally pulling the plug on all your persistent connections, and requiring manual intervention to bring them back online. There are many cases where you can't simply pull the plug on all your clients.
Also, as a byproduct it enforces all sorts of best practices, like ensuring the token is just a token, and doesn't have data piggybacking on it.
So what are your cookies/tokens if hashing them makes a difference? Seems like cookies should be CSPRNG values, and it is unclear what hashing one of those gets you.
The tokens are CSPRNG output, e.g. 32 bytes or larger. The preimage is stored on the client as the token. The token value received by the server is sent through a hash or HMAC before being compared against the persisted value in your backend database.
If the backend database is compromised/leaked whatever, the attacker sees the result of the Hash/HMAC, but cannot generate a preimage, so your tokens stored by clients do not need to be reset. Otherwise, a leak of the backend token database grants full access to the attacker.
This is particularly useful for long-lived tokens, like API tokens, but can be used for short-lived sessions as well.
This is not currently considered a must-have / expected best practice, but I believe it will evolve into one, because it's hard to get wrong, and it protects against a reasonably common attack vector. Guaranteed timing attack resistance is a bonus add-on which is valuable in its own right.
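A minimal sketch of what that looks like, with illustrative names, 32-byte CSPRNG tokens, and an in-memory dict standing in for the backend store:

    import hashlib
    import secrets

    token_store = {}  # sha256(token) -> account id; stands in for the backend DB

    def issue_token(account_id):
        raw = secrets.token_bytes(32)             # the preimage; only the client keeps it
        digest = hashlib.sha256(raw).hexdigest()  # the only value the server persists
        token_store[digest] = account_id
        return raw.hex()                          # shown to the client exactly once

    def lookup_token(presented_hex):
        # A leaked token_store yields only digests, with no feasible way back
        # to the raw tokens that clients actually send.
        digest = hashlib.sha256(bytes.fromhex(presented_hex)).hexdigest()
        return token_store.get(digest)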
The backend has been so invasively compromised that attackers have dumped the session token store. The thing that the session tokens protect essentially belongs to the attackers. Even if, as a result of crazy overengineering, they can't read the original "seed" secret the store holds authenticators for, they can simply overwrite them.
There is no point to hashing cookies and session keys. Don't bother. It's not just not a best practice. It's something a smart auditor would file a sev:info bug against.
Writing a new token will make a lot of noise in the system, exposing the breach. Using a stolen token from a new IP, perhaps much less so, although that too should set off warning bells. Appeal to authority aside, I can't see a downside.
What I like most about 'token = sha256(token)' is that it expresses a significant part of the design in a single, obvious line of code: we don't store tokens, we can't provide tokens after the fact, we can't display tokens in our god-mode UI, if you lose your token a new one must be regenerated because we simply don't have it anymore, the token must not encode data beyond being a unique random value, we cannot be socially engineered into sharing your token with an attacker, the token will never be disclosed through a timing side-channel, insider threats cannot access your token, etc.
> Writing a new token will make a lot of noise in the system
Simply hashing the token won't get you the desired "noise" in the system; you need to do something else besides what you have described here. For example, writing a new token, whether produced by an attacker or a legitimate user, will produce the same "noise", no? It's unclear how, when the attacker owns your database and any triggers, you are going to tell whether it is an attacker.
Similarly, the single obvious elegant line of code is not going to deter an attacker. Oh, I imagine that there are attackers that enjoy the beauty of code in a hacked system and put it on pastebin, possibly slowing them down.
And what are the chances if the attacker is that deep into your system, that they aren't reading the tokens before they are hashed?
The downside that I see, and I would flag this scheme, is that it doesn't provide the protection that the developer is led to believe.
An attacker can't change a token without locking out the valid client, that's the "noise" I was speaking of. If my requests suddenly start failing because my token is no longer valid, the attacker has shown their hand and we now have proof of a compromise. There are many attack vectors which could provide read or even read/write access to the token database without memory access or code execution. To name one from last week, how about Slack?
Nothing, literally nothing, can provide protection against the attack vector you're saying this does not help against. Sure, and of course I agree. But obviously we don't give up and do nothing because sometimes exploits grant root. We add layers of defense and expect that some will hold and some will not. Some will act to contain a breach across certain vectors, and some will be run over. Some will simply slow an attacker down, or prevent them from walking away with 100% of your data, so they only get a percentage based on how long they were able to maintain the compromise.
I don't know what you mean that the line is "not going to deter an attacker". Surely you understand the difference between a short-lived attack which can only target persisted data, and a persistent attack with full memory access, and surely your threat model allows for one without the other in many cases? So green boxes in those cases for this technique, and N/A for the other boxes. What am I missing?
There are attacks it does not withstand (the same attacks that make password hashing useless, by the way), and there are attacks that it certainly does withstand. There are also several attack vectors which it eliminates completely, and various follow-on benefits which I listed as well.
On the whole it is a large positive in a simple obvious package with no possible downside (except perhaps auditors flagging it as a bug). I never said it was a magic bullet which protects you from full persistent system compromise. I understand exactly what protections it provides and beneficial policy restrictions it imposes.
I find the resistance to such a simple defense-in-depth perplexing. It not being perfect is not a reason not to do it. Neither is password hashing, but we still do that. I really am shocked that token hashing would be at all controversial. I do it, and I think everyone else should too.
"It doesn't provide the protection that the developer is led to believe." Well this is much more true of bcrypt than it is of token hashing! What percentage of passwords does bcrypt protect, and for how many minutes after the database is popped? It's impossible to answer. What percentage of tokens does token hashing protect? If you can answer a) describe the access level granted by the compromise, and b) describe the time when the compromise began, then I can definitively answer that question.
The answer cannot be, assume the attacker had full root access to all systems. Because we are under attack every single day, and some of those attacks succeed in some limited measure. There are laws on the books today, and new ones coming, which require rapid disclosure of the actual extent of data breaches, so this isn't academic.
Of course the exact extent of a compromise is not always known, but likewise there are cases where the extent of a compromise is precisely known, and in a subset of those cases a hashed token prevents an attacker from impersonating your clients.
I mean, sha256(token) is child's play; to be arguing about it is just odd. I'd like to see homomorphic encryption stop even a memory-sniffing attacker from ever intercepting my tokens. Let's get serious! What, you'd rather just trust TLS? :-/
Facebook has 900M monthly users, 30M per day, and probably most of those are cookied in. Even if not, that's only ~350 authentications/sec...
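(Back-of-the-envelope, assuming logins are spread roughly evenly through the day: 30,000,000 / 86,400 seconds ≈ 347, so call it ~350 authentications per second.)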