Enough with the Salts: Updates on Secure Password Schemes (matasano.com)
339 points by saturdayplace on April 1, 2015 | 221 comments



It's easy to say "use bcrypt" or "use scrypt," but I think it's important to acknowledge the challenges of deploying those functions in large scale high performance environments.

It's straightforward and effective when password hashing happens client-side (like local encryption for a bitcoin wallet), but using scrypt or bcrypt server-side at effective difficulty levels will either destroy your L1/L2 cache or exceed your CPU budget.

In practice, from what I've seen, this means scrypt is never used, and bcrypt is used with a work factor that's tuned low. I think this is just a fundamental asymmetry, and that at the end of the day people use bcrypt so that they can say "we used bcrypt!" and get a nod when their entire user database is compromised, even though it provides very little value the way it's deployed.


> I think this is just a fundamental asymmetry

Actually the Password Hashing Competition has shown this asymmetry can be exploited to the server's advantage: one of the ideas being studied is called server relief.

For example instead of storing the scrypt hash on the server you store the SHA256 hash of the scrypt hash (and the salt alongside), and ask the client to send the scrypt hash to authenticate. This way the per authentication cost to the client is expensive (one scrypt hash) but the cost to the server is cheap (only one SHA256 hash).

Attackers cannot take shortcuts. They cannot send random scrypt hashes to the server while skipping the scrypt computation step, because the space of possible scrypt hashes is just too large to brute force.
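That flow can be sketched in a few lines of Python (a hypothetical sketch using the stdlib's hashlib.scrypt; the cost parameters n, r, p are illustrative, not prescribed by anything above):

```python
import hashlib
import hmac
import os

# Hypothetical sketch of server relief: the client pays the expensive
# scrypt cost; the server stores and verifies only a cheap SHA-256.

def client_hash(password: bytes, salt: bytes) -> bytes:
    # Expensive step, runs on the client.
    return hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1)

def server_store(scrypt_hash: bytes) -> bytes:
    # The server stores SHA256(scrypt_hash), never the scrypt hash itself.
    return hashlib.sha256(scrypt_hash).digest()

def server_verify(scrypt_hash: bytes, stored: bytes) -> bool:
    # Cheap step, runs on the server; constant-time comparison.
    return hmac.compare_digest(hashlib.sha256(scrypt_hash).digest(), stored)

salt = os.urandom(16)  # stored alongside the record, handed to the client
record = server_store(client_hash(b"correct horse", salt))
assert server_verify(client_hash(b"correct horse", salt), record)
assert not server_verify(client_hash(b"wrong guess", salt), record)
```

A stolen `record` is useless for logging in: authenticating requires the scrypt hash itself, and SHA-256 cannot feasibly be inverted to recover it.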

The only downside of server relief is that you need custom code on the client side to do the computation. So it works with JavaScript in browsers, but not with traditional clients like an IMAP or POP3 mail app.

Edit: @__david__: yes, you could SHA256() only the hash part of the scrypt hash, as opposed to the hash & salt parts. It is an implementation detail, not really important. And regardless of how you do it, if the DB gets stolen the SHA256 hashes are useless for authenticating.

Edit: @espadrine: no, breaking a SHA256(scrypt()) hash is no easier than breaking a scrypt() hash. Server relief with SHA256 introduces no weaknesses. It is strictly more secure than standard scrypt hashes.

Edit: @plantbased: I recommend against MD5(scrypt()), because it is likely preimage attacks will be found on MD5 in the near future, so someone stealing your database of MD5(scrypt()) hashes will be capable of computing an arbitrary scrypt hash that authenticates to a given user. But you could certainly do MD5-crypt(scrypt()) with MD5-crypt being the salted & iterated version by PHK.


I see, so basically it's like forcing users to make long, un-dictionary-attackable passwords, without requiring users to remember long, un-dictionary-attackable passwords.


I like that idea. But why do you store the salted SHA256 of the scrypt hash and not the scrypt hash directly?

Edit: Thinking about it more, I'm guessing it is for the case where your DB gets stolen. With the salted hashes, attackers can't just send the direct scrypt hash down to the server to get authenticated.


Correct. If you store the scrypt hash directly, it effectively becomes a plaintext password and a compromise of your database lets the attacker login to any account.


The whole idea of salts is to prevent rainbow table attacks, i.e. attacks that use precomputed hashes.

In this case you'd need to precompute an SHA256 of a dictionary of scrypt hashes which would have been computed from a dictionary of english words.

In practical terms, if you steal the DB and want to reverse passwords, you'd need to scrypt, let's say, 100 million words, then SHA256 them, then compare them to what's in the DB.

The original blog entry (or maybe the post on the same site that linked to it) argues that we should use more compute-intensive hashing algorithms rather than relying on salts. scrypt is definitely more compute intensive.

So if you didn't use a salt and wanted to throw a 100M-word dictionary at this, how long would it take on decent hardware to create the rainbow table of 100M words, which could then be used repeatedly against stolen databases? On a decent GPU you can do around 1.2 billion SHA256 hashes per second. So the SHA256 step is trivial.

ASIC bitcoin hardware can do around 4.5 million scrypt hashes per second: https://bitcoinmagazine.com/13000/new-benchmark-scrypt-minin...

So you could create your rainbow table to attack the non-salted version of this in around 25 seconds.
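Spelling out the arithmetic (the rates are the assumed figures from the benchmarks above, not measurements of my own):

```python
# Back-of-envelope: time to hash a 100M-word dictionary at the rates above.
words = 100_000_000
scrypt_rate = 4_500_000        # scrypt hashes/sec on the cited ASIC
sha256_rate = 1_200_000_000    # SHA-256 hashes/sec on a decent GPU

scrypt_seconds = words / scrypt_rate    # the dominant cost
sha256_seconds = words / sha256_rate    # effectively free by comparison
print(round(scrypt_seconds, 1))   # 22.2
print(round(sha256_seconds, 3))   # 0.083
```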

A salted version of this same system would need the scrypt to be salted because SHA256 is so fast. If you just salted the SHA256 you'd be able to use a precomputed scrypt rainbow table and then you only have to compute the salted SHA256's which as you can tell from my benchmark above is very fast.

So clearly the salt is needed: you compute scrypt(salt + password) on the client, send it to the server, and the server computes SHA256(scryptHash). Note that you don't need to add a salt on the SHA256 side, because you've already defeated a rainbow table attack with the initial salting.

So how fast could you attack a salted system like this, using GPUs for your SHA256 and ASICs for your scrypt, based on the benchmarks above?

You can pretty much ignore the SHA256 step because it's so fast, so you'd get around 4.5 million guesses per second using ASICs doing scrypt.

Compared to 1.2 billion guesses using a GPU for SHA256, that's not too bad.


Cryptocurrency mining ASICs cannot be used for password cracking because they are not designed to hash arbitrary strings, and do not support arbitrary scrypt parameterization. They are designed to take in a block template of some sort, and then internally handle iterating the nonce to minimize load on the communication bus and the host processor.

An interesting thing about GPU password cracking that many people don't realize is that the GPU is responsible for not only hashing candidate passwords, but also generating those candidate passwords. The bus bandwidth isn't enough and the CPU isn't fast enough to keep it fed with candidate passwords otherwise.

You could, of course, build password cracking ASICs that have internal processors for generating candidate passwords, but I'm not convinced that it doesn't make more sense to stick with GPUs or maybe FPGAs there.


You'd be better off with FPGAs, which are available in significantly smaller process sizes (and therefore lower running costs) than ASICs you roll yourself. There's also a certain level of reusability that you don't get with SHA256 etched into silicon.


Not true re GPU bandwidth/dictionaries. See cudahashcat: we're using it on a switched PCIe bus, feeding dictionaries with plenty of bandwidth. You're not limited to mask attacks, which is what you're describing.


Are you sure about that? I'm more familiar with oclhashcat, and it certainly can't keep the GPU busy with a pure dictionary attack. It'll feed the GPU with a dictionary, then apply rules on GPU to generate variants.


This is something I've wanted in browsers since I was about 15 - I've written a proposal on Specifiction for discussion around putting this in the Web Platform: http://discourse.specifiction.org/t/html-server-relief-passw...


The problem with the approach you mention is that client-side bugs can break things in nasty ways. If the client-side code silently gives wrong results, not only will you not know, but, assuming users run it from the start (i.e. when they register), it will actually "work" until the client moves to another machine or updates. That's not a good situation.

You may be able to detect this with testing, but not necessarily. And yes, it'll probably happen sooner or later, given the nature of JS compilers.

I don't know if there's a method that avoids this problem that still has the advantages that you mention.


If you're worried about this, you could do server-side hashing during signup, and client-side hashing during login.


This is brilliant. But since we are often told that composing cryptographic primitives can turn out to be unexpectedly dangerous, can you point me to some articles where this technique is studied in more detail? Preferably by experts such as yourself.


mrb, this is awesome, thanks for sharing. Can you comment on storing the scrypt as an md5 hash and how that would impact security? [Asking because I'm confined to server side systems that only support md5]

IMHO, while MD5 has a fraction of the output space of SHA2, reversing it is still a very hard problem, and intuitively it seems this might provide a huge improvement over salted and stretched (multiple-round) MD5 on the server.

Thanks again.


Since MD5 isn't really vulnerable to a preimage attack, this would probably work fine - 2^128 possibilities are essentially impossible to brute-force.

I'm curious - what kind of infrastructure would constrain you to md5?


The attacker would then intercept that scrypt hash sent from the client and use it to authenticate.


You can't. The scrypt hash should be protected by HTTPS the same way a website password is protected by HTTPS.


> The only downside of server relief is that you need custom code on the client-side

That, and the major reason we do hashes even over TLS: if the server gets cracked and the attacker has access to the stored hashes, breaking a SHA256 is trivial (i.e., brute-forcing b when knowing s = SHA256(b)). Then, the attacker can assume the identity of any individual whose SHA256 s got broken, by sending the bcrypt b directly from a crafted client, without having to find p where b = bcrypt(p).


People have odd ideas about how serious the vulnerabilities in hashes are. These hashes, even old and obsolete ones, are still entirely secure for password-style hashing if the input is random enough, which scrypt output is.

Even MD5 is still (AFAIK) effectively unbreakable in this use case (finding MD5 collisions is possible; finding the input given an MD5 hash is not). According to http://en.wikipedia.org/wiki/MD5#Preimage_vulnerability finding input to generate a given MD5 has a complexity of 2^123.4, i.e. never going to happen. And that's MD5, an obsolete hash!

Sha1, also considered obsolete, has a bad rep because there are scary theoretical collision attacks. This isn't good news, certainly, but for a sense of perspective: nobody, not even with lots of distributed computing power, has ever actually managed to find a collision since that would take around 2^60 attempts. I don't think anyone has ever found even a theoretical weakness in its preimage resistance.

And you're talking about sha2 - there have been no published reports on any attacks, even theoretical, on the full sha2 (they have found weaknesses on reduced versions, suggesting room for improvement - but that's largely of academic interest).

Collision attacks are another matter. If you're using a hash to sign potentially hostile content rather than recognize friendly content, md5 is clearly dangerous, and sha1 may well become so. But that's relevant to things like certificate authorities, not message authentication.


He's not using "broken" in the sense of "arbitrary preimages", is he? He's using "broken" in the sense of "easy to brute-force", right?


I meant it in the brute-force sense, yes. I'm not sure how practical it would be when the input of the SHA-2 is random, however.


Yeah - brute forcing is really only a relevant attack if the number of possibilities is fairly limited (even with a really fast hash). Scrypt output is likely to be effectively random (if salted) - and without salt, you'd need to build an expensive (but potentially feasible) rainbow table of possible scrypt output.


> you'd need to build an expensive (but potentially feasible) rainbow table of possible scrypt output

I don't think that's feasible. "Expensive" is an understatement. To compute the rainbow table you need to do the work of actually calculating the scrypt output for each password, even though you don't have to store it. The space is just too big, therefore the time cost.


It's a question of size - clearly a rainbow table of 1 is feasible. I'm guessing a rainbow table of a million won't be too expensive either, and with a million common passwords, you'd likely crack something.

But yeah, it's not likely to be a major threat. In any case, salting makes it irrelevant.


Is scrypt even implemented without a salt? If not, then it doesn't matter. And if so, why?


Good point - it is implemented with a salt! However, the way that's done depends on the implementation. The underlying KDF neither requires nor generates a salt, so if you're implementing it client side, you'd need to ensure salting works.


If collisions are not too hard to find, isn't the process just:

1: Steal database
2: Find collision
3: Authenticate with the input that hashes to the same value

What stops that from happening?


Re-read that comment, particularly the part about the difference between a collision attack and a preimage attack.

MD5 is (currently) vulnerable to collisions, but not to preimages. So you can find two inputs that have the same hash, but not an input that hashes to some particular value in a database.
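For a sense of scale, here are the generic bounds for an ideal 128-bit hash (a sketch, not MD5-specific cryptanalysis; the practical MD5 collision attacks are far cheaper than even the birthday bound, which is exactly what makes them exploitable, while no comparable shortcut exists for preimages):

```python
# Generic attack costs against an ideal 128-bit hash.
bits = 128
birthday_collision = 2 ** (bits // 2)   # find *some* colliding pair
generic_preimage = 2 ** bits            # invert one *given* digest

print(f"{birthday_collision:.1e}")  # 1.8e+19
print(f"{generic_preimage:.1e}")    # 3.4e+38
```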


@teraflop - Oh, so does that basically mean you can, e.g., generate a bunch of MD5 hashes and some will be the same? But if you have a target, you're basically SOL finding a collision for that? That would make sense if that's what it means.


Yep.

More specifically: if you just hash a bunch of arbitrary strings that aren't specially-constructed for the purpose of colliding, then collisions are basically random, and extremely improbable. But you can fairly easily generate two files, differing only in a small number of bits, with the same MD5 hash by taking advantage of the structure of the algorithm.

Examples here: http://www.mscs.dal.ca/~selinger/md5collision/


Another consequence of that is that even MD5 collisions aren't at all trivial to exploit in general. An attacker can create a collision, but it's a slow process, and the colliding content is pretty constrained. You'd probably need to be very well informed, and invest quite some creativity to find two messages that are "valid" to whatever system is processing those and have sufficiently different meanings to be useful to you.

Clearly doable in specific instances, but it's not going to be an addition to the script-kiddie attack handbook anytime soon.


> These hashes, even old and obsolete ones, are still entirely secure for password like hashing if the password is random enough

Ah, got you, I'm wrong. Thanks!


I thought it was trivial to quickly compute SHAs so you can try lots of passwords.

In the case of sending bcrypt output wouldn't the "password" be long and essentially random and therefore not easy to enumerate?

Or is SHA also reversible?


Well, breaking SHA256 by brute force is only trivial when your input space is small. Presumably scrypt has 256-bit output, so brute-forcing to find the scrypt output would be difficult.


1. Empirically: most of the applications I reviewed that used bcrypt simply used the defaults.

2. Even with a lower work factor, bcrypt is drastically better than salted SHA.

3. Companies with auth loads so high that they really need to tune bcrypt should also be using dedicated auth servers, which (a) improve security in other ways and (b) allow for a lot of flexibility in handling work factors.


> 1. Empirically: most of the applications I reviewed that used bcrypt simply used the defaults.

Isn't that a security risk of its own, though? Wouldn't creating a rainbow table for (non-salted) passwords hashed with the default bcrypt settings then make sense again?


No.


Why?

Also; if you don't feel like writing a useful response feel free not to respond at all.


Because bcrypt automatically generates and adds a salt every time it's invoked. Every bcrypt hash has a different salt. Hashing the same plaintext ten times will produce ten different hashes.
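The same design, a fresh salt per call embedded in the stored string, can be sketched with stdlib primitives (PBKDF2 standing in for bcrypt here; the `salt$hash` storage format is made up for illustration, bcrypt's real format differs):

```python
import hashlib
import hmac
import os

# Not real bcrypt: a stdlib sketch of the same design. A random salt is
# generated on every call and embedded in the stored string, so hashing
# the same password twice yields different strings, yet verification
# still works because the salt can be read back out.

def hash_password(pw: bytes) -> str:
    salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac("sha256", pw, salt, 100_000)
    return salt.hex() + "$" + dk.hex()

def check_password(pw: bytes, stored: str) -> bool:
    salt_hex, dk_hex = stored.split("$")
    dk = hashlib.pbkdf2_hmac("sha256", pw, bytes.fromhex(salt_hex), 100_000)
    return hmac.compare_digest(dk.hex(), dk_hex)

first = hash_password(b"hunter2")
second = hash_password(b"hunter2")
assert first != second                  # different salt each time
assert check_password(b"hunter2", first)
assert check_password(b"hunter2", second)
```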


Ah OK; so the default bcrypt implementation essentially just includes the salt in its output by definition. Got it.

I thought hashes were deterministic by definition so I was wondering where the required randomness came from.


One of the things that make these password hashing threads astonishingly tedious is that somebody who has never so much as googled "bcrypt" always chimes in with "but what about the rainbow tables???".


There's an XKCD for that: https://xkcd.com/1053/

As far as I'm concerned bcrypt took the term for a concept that is generally understood to be deterministic (hashing) and redefined it by adding the randomness in by default.

That was probably a better call than trying to introduce a new term and a new hashing algorithm at the same time; but it's also asking for the exact long-term confusion you're complaining about here.


That XKCD doesn't really apply here since you could have easily asked for an explanation without the snark in the first place. There's also a difference between making fun of someone for not knowing something and choosing not to engage with them.

> That was probably a better call than trying to introduce a new term and a new hashing algorithm at the same time; but it's also asking for the exact long-term confusion you're complaining about here

To be fair, besides a basic search for information on the subject, this was also covered in the linked article...


He didn't ask snarkily, but we sure are beating this subthread to death. Let's all stop here.


Really? I just reread the article and I couldn't find anything about bcrypt containing the salt as part of its output.

P.S. The XKCD wasn't directed at tptacek but at tedunangst, who apparently felt my comment made this thread 'astonishingly tedious.'


What term? The paper that introduced bcrypt was titled "A Future-Adaptable Password Scheme". The first result, Wikipedia, calls it "a key derivation function for passwords".


Because bcrypt uses a salt, so a "rainbow table for non-salted passwords" would be utterly useless.


If they're using dedicated hardware, why not use smartcards to salt the hash.

From the article, even 74kh/s is going to tear through a lot of passwords, but with a secure element in the mix, offline attacks go nowhere.


The purpose of a password hash is to protect the password authenticators in the event the database is compromised. If you define away the possibility of a database compromise, just store them in plaintext.


I'm not trying to define away the possibility of a database compromise, I'm integrating a secure element into the hashing scheme that can compute an HMAC without exposing its key.


Wherever you're storing the key, where it's inaccessible to attackers, just store the password.

I'm not even saying that keyed hashes are a bad idea. I'm saying they're orthogonal to the question being discussed on the thread.


What is authenticating millions of users per second?

Facebook has 900M monthly users, 30M per day and probably most of those are cookied in. Even if not that's only 350 authentications/sec...


You're right, authentication rates are low. But Moxie's point of concern with regard to requiring a heavier CPU and memory commitment on the server is still valid, particularly when faced with malicious traffic.

You need to make sure your login system can handle malicious clients and brute-force attempts from individual IPs or against particular accounts, dropping back to CAPTCHAs or some other means of resource control when things look bad. This isn't a particularly easy problem to solve, and if you do it incorrectly you open yourself and your users up to a denial of service.

IMHO, some of the new fancy password-based authentication protocols, like AugPAKE, which do hashing client-side, might ease this problem... but they're still stateful (SYN floods anyone?) and therefore still depend upon good DoS protection.

Edit: Oh poop, you're djb.


That's not Daniel Bernstein.


I have to admit, when I saw the username, I sorta hoped it was.


> Moxie's point of concern with regard to requiring a heavier CPU and memory commitment on the server is still valid, particularly when faced with malicious traffic.

How about this: Save two hashes, one low cost, one high cost--both must pass for login. Don't run the high cost unless the low cost one works first.

Note: not a security guy, so I'm probably overlooking something silly.

Edit: the two responses below (thus far) are reasons you should not do this. Duh on my part.


If you have a high-cost and a low-cost hash, someone getting the password database can just crack against the low-cost hash. Even if the low-cost hash is short, so that it frequently returns a false okay (say a nice round number like a 1/256 false-positive rate), you're still reducing the number of "hard" hashes the attacker has to compute by a factor of that false-positive rate.


You have just inadvertently created an oracle. This is pretty easy to do when trying something clever; hence the common advice to avoid being clever.


Right, don't do this, for the reasons given.

You can, however, do the high-cost hash on the client, then do a low-cost hash comparison of the result on the server.

So you do a cheap SHA256 hash comparison on the server, but because the input to that comes out of, say scrypt, it has 256 bits of entropy (or near), making cracking infeasible. (scrypt has variable length output, so you could use other hash sizes).

This is only done when checking; when setting the password, both the slow and fast hashes are done on the server (to stop someone just hashing 'password123' with SHA256 and sending that).

This is called server relief. There's a discussion of it elsewhere in this comment thread, it is well known, e.g. [1], though doesn't google particularly well.

The downside, of course, is that someone with a disabled or borked javascript can't login to your site.

[1] https://books.google.co.uk/books?id=uZ-1BwAAQBAJ&pg=PA370&lp...


That's a neat idea. What do you put in both? I see two options:

1) You copy the whole password and use it for both. Then an adversary can just attack the weaker, faster one.

2) You split the password; half goes in one and half in the other. Now user passwords have doubled in length, or they haven't, and you have 4-10 character half-passwords instead of 8-20 character passwords.


There are separate concerns of brute-force mitigation that people should look into regardless of their hashing technology. Things like exponential-growth cooldowns per IP/user account (with lots of caveats to prevent cooldown-DDoS) should be used.

http://stackoverflow.com/questions/549/the-definitive-guide-...
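A toy per-account exponential cooldown along those lines might look like this (purely illustrative; real deployments also need the per-IP tracking and cooldown-DDoS caveats mentioned above):

```python
import time

# Hypothetical sketch: double the lockout window after each failed login
# attempt for an account, capped to avoid absurd waits.

class LoginThrottle:
    def __init__(self, base=1.0, cap=3600.0):
        self.base, self.cap = base, cap
        self.failures = {}  # account -> (failure count, last failure time)

    def allowed(self, account, now=None):
        now = time.monotonic() if now is None else now
        count, last = self.failures.get(account, (0, 0.0))
        wait = min(self.base * (2 ** max(count - 1, 0)), self.cap)
        return count == 0 or now - last >= wait

    def record_failure(self, account, now=None):
        now = time.monotonic() if now is None else now
        count, _ = self.failures.get(account, (0, 0.0))
        self.failures[account] = (count + 1, now)

    def record_success(self, account):
        self.failures.pop(account, None)

t = LoginThrottle()
t.record_failure("alice", now=0.0)
assert not t.allowed("alice", now=0.5)  # inside the 1 s cooldown
assert t.allowed("alice", now=1.5)      # cooldown elapsed
```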


>This isn't a particularly easy problem to solve, and if you do it incorrectly you open yourself and your users up to a denial-of-service.

When there is a choice between making denial-of-service mitigation more difficult or making it easier for user passwords to be discovered after a breach, you should definitely choose the harder DoS mitigation. Not saying you disagree with that sentiment, but I would have very little sympathy for a company making excuses like that to justify increasing the risk of user passwords being exposed.


Think infrastructure rather than applications like Facebook. What do you think happens if certain state-keeping big authentication servers or routers reboot? We have a customer with more than one million users on a GGSN (which is a glorified router). If that happens to restart, every single customer will immediately and automatically try to reconnect.

The systems involved don't actually tend to handle this many in reality (the ISP in question restricts it to some 20-30k/s to our system). These systems are starting to get better at keeping state through restarts, but there can still be pretty impressive spikes of authentication traffic if the wrong system is rebooted (or crashes!).


Not normally, but it does open up a DoS vector that at the very least will prevent legitimate users from logging in.


It's one of thousands of DoS vectors, all of which are addressed the same way: coarse ad-hoc anomaly filters.


But it's a non-obvious (IMO) attack vector opened simply by switching to scrypt/bcrypt.

In a perfect world, quality web apps have rate limiting built into their auth schemes. But it's important to acknowledge these two algorithms will put a much heavier burden on your CPU.


Most DoS vectors are non-obvious, so this seems like a very weak reason to change the way you do password hashing.


Maybe each request needs to be authenticated?


If you're authenticating with a username and password on every request you're doing something wrong. Once a user is logged in you should be setting a cookie, session, or some other token to identify the user.


I believe HTTP basic auth checks the username and password on every request. It's pretty rarely used these days though.


Only for secret admin tools :D


A fair number of APIs offer basic auth, even in 2015.


Hell, the Reddit API uses basic auth in 2015.


Yikes... I guess I've been pretty lucky with the APIs I've had to work with in the past; I can't recall ever coming across this.


Authenticate once, generate a token, and use the token for auth from that point. I can't think of a case where you'd need to send the u/p on each request.


Does that token not become a new attack vector that also needs a certain level of security or encryption?

Or because it expires, the duration means it doesn't matter if it's compromised?


Exactly this.

I see that token used - sometimes in the http request header - but most of the time as a param in the GET request, over plain http.

Does it even matter if my auth was secure? I just need to get hold of some access logs and I can impersonate everyone.


If you're writing HTTP headers out to your Apache logs on your production server, you're doing it _severely_ wrong.

edit: I'm specifically talking about HTTP basic auth with a precomputed "Authorization: Basic base64($username:$passwd)" header, not a GET of "/foobar?api_key=12345abcd". The latter is obvious in its failures and is not related to HTTP basic auth.


Obviously :-) If you're going after basic-auth-headers, you'd probably be sniffing the network.

> Authenticate once, generate a token, and use the token for auth from that point.

If that token is passed back and forth in the http url, it ends up at places where it's easy to find/intercept.

You can use a gazillion bcrypt rounds to store the password: they still send me a link to a page, including their auth-token.


Edit: My intention was to get some input on how to properly use auth-tokens.


Highly unlikely. If you've got an API or site designed around validating passwords on each request you've failed as an engineer.


Oh --- so this bcrypt stuff does not apply to session tokens? Those are okay to leave unencrypted in memory or on disk or something?


No, of course not. The CS problem behind a password hash (or KDF) is to exploit time and memory to convert a low-entropy secret into something with the strength of a high-entropy secret. But when you're securing individual HTTP requests with session tokens, you simply use high-entropy secrets, and thus don't have the password hash problem.


If users picked really good passwords, full sentences with high entropy --- would we not need to encrypt them?

Sorry I think you already explained this, I'm just swimming. I probably should have basic CS understanding to be in this conversation ....


If users picked 128 bit random numbers as their passwords, any hash function would suffice to store authenticators for them. You'd still want to hash them, because presumably users will re-use the same 128-bit random number on several machines. But it would be infeasible to brute force them, so MD4 would do just fine.

There is no reason to hash a secure cookie, because they aren't used across several machines. The compromise of a stored session key implies that every system where that cookie is valuable is also compromised.


A reason to hash cookies/tokens on the server side is so you don't have to reset them if your server database is compromised. A fast cryptographically secure hash with digest size equal to the token size is a good choice.

This also has the benefit of defending against timing attacks, for example, if you don't completely trust your code or the compiler / interpreter to not optimize your "constant time" comparison.


It would be good practice to reset your auth tokens in the event of a database compromise anyway. In most cases that's a low-cost operation. The advantage is that it rescues you from the possibility that the attacker logged auth tokens when they were passing through one of your servers.


I agree, of course there is nothing stopping you from resetting if and when you want to, and on your own schedule. That's not a reason not to run a hash on the input token. I think there's no possible reason not to run 'token = sha256(token)'.

> In most cases that's a low-cost operation

But resetting API tokens is not necessarily low cost, as it's literally pulling the plug on all your persistent connections, and requiring manual intervention to bring them back online. There are many cases where you can't simply pull the plug on all your clients.

Also, as a byproduct it enforces all sorts of best practices, like ensuring the token is just a token, and doesn't have data piggybacking on it.


So what are your cookies/tokens if hashing them makes a difference? Seems like cookies should be CSPRNG values, and it is unclear what hashing one of those gets you.


The tokens are CSPRNG output, e.g. 32 bytes or larger. The preimage is stored on the client as the token. The token value received by the server is sent through a hash or HMAC before being compared against the persisted value in your backend database.

If the backend database is compromised/leaked whatever, the attacker sees the result of the Hash/HMAC, but cannot generate a preimage, so your tokens stored by clients do not need to be reset. Otherwise, a leak of the backend token database grants full access to the attacker.

This is particularly useful for long-lived tokens, like API tokens, but can also be used for short-lived sessions alike.

This is not currently considered a must-have / expected best practice, but I believe it will evolve into one, because it's hard to get wrong, and it protects against a reasonably common attack vector. Guaranteed timing attack resistance is a bonus add-on which is valuable in its own right.
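A minimal sketch of the scheme described above (names and the exact hash choice are illustrative):

```python
import hashlib
import hmac
import secrets

# The client keeps the raw token; the database stores only sha256(token),
# so a leak of the token table alone doesn't hand out usable credentials.

def issue_token():
    raw = secrets.token_urlsafe(32)                 # sent to the client
    stored = hashlib.sha256(raw.encode()).digest()  # persisted server-side
    return raw, stored

def check_token(raw: str, stored: bytes) -> bool:
    # Hash the presented token and compare in constant time; the timing-attack
    # resistance mentioned above comes for free.
    return hmac.compare_digest(hashlib.sha256(raw.encode()).digest(), stored)

raw, stored = issue_token()
assert check_token(raw, stored)
assert not check_token("forged-token", stored)
```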


The backend has been so invasively compromised that attackers have dumped the session token store. The thing that the session tokens protect essentially belongs to the attackers. Even if, as a result of crazy overengineering, they can't read the original "seed" secret the store holds authenticators for, they can simply overwrite them.

There is no point to hashing cookies and session keys. Don't bother. It's not just not a best practice. It's something a smart auditor would file a sev:info bug against.


Writing a new token will make a lot of noise in the system, exposing the breach. Using a stolen token from a new IP, perhaps much less so, although that too should set off warning bells. Appeal to authority aside, I can't see a downside.

What I like most about 'token = sha256(token)' is that it expresses a significant part of the design in a single, obvious line of code; we don't store tokens, we can't provide tokens after the fact, we can't display tokens on our god-mode UI, if you lose your token a new one must be regenerated because we simply don't have it anymore, the token must not encode data beyond being a unique random value, we cannot be socially engineered into sharing your token with an attacker, the token will never be disclosed through a timing side-channel, insider threats cannot access your token, etc.


> Writing a new token will make a lot of noise in the system

Simply doing a hash of the token won't get you the desired "noise" in the system. That is, you need to do something else besides what you have said here. For example, writing a new token, whether produced by an attacker or a legitimate user, will produce the same "noise", no? It's unclear how, when the attacker owns your database and any triggers, you are going to tell if it is an attacker.

Similarly, the single obvious elegant line of code is not going to deter an attacker. Oh, I imagine that there are attackers that enjoy the beauty of code in a hacked system and put it on pastebin, possibly slowing them down.

And what are the chances if the attacker is that deep into your system, that they aren't reading the tokens before they are hashed?

The downside that I see, and I would flag this scheme, is that it doesn't provide the protection that the developer is led to believe.


An attacker can't change a token without locking out the valid client, that's the "noise" I was speaking of. If my requests suddenly start failing because my token is no longer valid, the attacker has shown their hand and we now have proof of a compromise. There are many attack vectors which could provide read or even read/write access to the token database without memory access or code execution. To name one from last week, how about Slack?

Nothing, literally, nothing, can provide protection against the attack vector you're saying this does not help against. Sure, and of course I agree. But obviously we don't give up and do nothing because sometimes exploits grant root. We add layers of defense and expect that some will hold, and some will not. Some will act to contain a breach across certain vectors, and some will be run over. Some will simply slow an attacker down, or prevent them from walking away with 100% of your data, and only get a percentage based on the time they were able to maintain the compromise.

I don't know what you mean that the line is "not going to deter an attacker". Surely you understand the difference between a short-lived attack which can only target persisted data, and a persistent attack with full memory access, and surely your threat model allows for one without the other in many cases? So green boxes in those cases for this technique, and N/A for the other boxes. What am I missing?

There are attacks it does not withstand, the same attacks that make password hashing useless by-the-way, and there are attacks that it certainly does withstand. There are also several attack vectors which it eliminates completely, and various follow-on benefits which I listed as well.

On the whole it is a large positive in a simple obvious package with no possible downside (except perhaps auditors flagging it as a bug). I never said it was a magic bullet which protects you from full persistent system compromise. I understand exactly what protections it provides and beneficial policy restrictions it imposes.

I find the resistance to such a simple defense-in-depth perplexing. It not being perfect is not a reason not to do it. Neither is password hashing, but we still do that. I really am shocked that token hashing would be at all controversial. I do it, and I think everyone else should too.

"It doesn't provide the protection that the developer is led to believe." Well this is much more true of bcrypt than it is of token hashing! What percentage of passwords does bcrypt protect, and for how many minutes after the database is popped? It's impossible to answer. What percentage of tokens does token hashing protect? If you can a) describe the access level granted by the compromise, and b) describe when the compromise began, then I can definitively answer that question.

The answer cannot be, assume the attacker had full root access to all systems. Because we are under attack every single day, and some of those attacks succeed in some limited measure. There are laws on the books today, and new ones coming, which require rapid disclosure of the actual extent of data breaches, so this isn't academic.

Of course the exact extent of a compromise is not always known, but likewise there are cases where the extent of a compromise is precisely known, and in a subset of those cases, a hashed token prevents an attacker from impersonating your clients.

I mean, sha256(token) is child's play; to be arguing about it is just odd. I'd like to see homomorphic encryption stop even a memory-sniffing attacker from ever intercepting my tokens. Let's get serious! What, you'd rather just trust TLS? :-/


This makes sense, but--forgive my ignorance--what site needs to process millions of password authentications per second? Don't most sites hash/compare the password once at login, then issue an ephemeral token to maintain the authenticated state? Is Facebook, for example, hashing my actual password for every HTTP request that my Facebook mobile app makes?


I think if the most efficient way to attack the hashes is to buy the same hardware you use to process logins, then in some sense you've already won. It's where the attacker can buy GPUs or program an FPGA and, dollar for dollar, make orders of magnitude more guesses per second than you do verifications per second that you want to avoid.

To the larger point, I do agree that some of this starts to look like a bit of security theater. If a user picks a laughably weak password and the server is completely compromised, it doesn't much matter what password storage scheme you used; on the other hand, if the user picks a great password, especially if he never reuses it, the consequences of losing that hash are limited.


I think maybe you missed it a bit. If attackers can spend a bunch of money to guess enough passwords to be useful in, say, weeks, then if you make it take 100 times as long, now it's years.

And using scrypt with a decent work factor isn't 100 times harder, it's tens or hundreds of thousands of times harder.

Additionally, I think one of the benefits of scrypt is that it is specifically designed to deter GPU or FPGA attacks because it is memory intensive, not just CPU intensive.
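To make the memory-hardness concrete, here's a sketch using Python's stdlib scrypt binding (the parameter values are common interactive-login choices, not a recommendation):

```python
import hashlib
import os

salt = os.urandom(16)
# Memory use is roughly 128 * r * n bytes: with n=2**14 and r=8 that's
# ~16 MiB per hash, which is what makes cheap GPU/FPGA parallelism hard.
key = hashlib.scrypt(b"hunter2", salt=salt, n=2**14, r=8, p=1, dklen=32)

# Same password + salt + parameters -> same key (needed for verification).
again = hashlib.scrypt(b"hunter2", salt=salt, n=2**14, r=8, p=1, dklen=32)
```

Raising `n` scales both the CPU and the memory cost, which is exactly the knob an attacker with commodity GPUs has the hardest time matching.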


Even though the choice of scrypt might necessitate dedicated authentication servers at "web scale", I don't see any better options on the table.

Hardware is cheaper than trust, or litigation.

(And there's always rate-limiting)


Moxie hit the nail on the head. It's absolutely a fundamental asymmetry. Slow hashes can protect strong passwords, and not much else. And if you use a password manager to generate strong passwords, then you are going to throw that password out anyway.

In the time it takes for your users to give up on the hourglass, assume their login request is hung, and start double-clicking submit, you have now DDoS'd yourself. That's about 750ms I think...

Also, try this sometime -- clear your cookies and try to login to Google, or Twitter, or Facebook. Time how long it takes before you see the first byte of a session cookie. Hint: Not Long Enough.

Password hashing is embarrassingly parallel. The botnets don't cost the attackers anything to hash your password on 100,000 CPUs all at once, after they've decided to target you. Moore's law? Also not helping.

We need something better. I'm working on just that; I'm sure it will have more than its share of naysayers, but I'm giving it my best.


You raise a good point, but one that's probably an issue to a minority of those implementing applications that require an authentication component. Sure, the Twitbooks, GooSofts and FaceLinks of the world that deal with large numbers of concurrent authentications have to think a bit more about this, and that's fine; I would expect that they would be well equipped to do so, and likely also run dedicated authentication servers and carefully considered systems.

I can probably count on one hand the number of times I've seen an application use bcrypt or scrypt and define their work factors. The common case seems to be sticking to the defaults that their chosen library ships with. While these may be somewhat less than ideal, it's universally true that these schemes are stronger than salted passwords fed to a fast cryptographic hash.


>>but using scrypt or bcrypt server-side at effective difficulty levels will either destroy your L1/L2 cache or exceed your CPU budget.

Perhaps this is the real issue: lack of sufficient budget for CPUs that can reliably get the job done is a reflection on the priority organizations place on information security.


For an attacker, a great way to cripple an organization that uses scrypt/bcrypt is simply to attack logins. If each login takes 1 second of work, then it's easy to overwhelm any system using a host of bots on distributed hosting systems. Say an attacker can hit with a mere 1 req/s from 100 hosts (very easy with aws/digitalocean/hetzner/etc); then that's 100 CPU-seconds of work just for logins.

Comparatively, if they hit with 100 req/s for normal front-end data, caches can easily handle that. So really, adding these types of login checks to any publicly available login page ends up creating an on-demand DDOS target.

From the password cracking side, the asymmetry comes from the fact that an organization has to have X amount of server resources to service Y number of requests, but an attacker can utilize the same X amount of server resources to service just one thing: breaking the exact password they are looking for. This is further exacerbated by the fact that for an organization those X server resources generally need to do more than just service logins (like actually serve content or API data).


If someone can't easily figure out how to create per-IP login attempt throttles, I suspect their application has far more gaping holes in it.


a mere 1 req/s from 100 hosts

I specifically set a request number that would be below most thresholds.

That said, a one-time request from 100 hosts would still use 100 CPU-seconds of work. Other than preemptively blocking hosts (such as all of AWS), there is no way that a "per-ip attempt throttle" is going to catch a single request from 100 different hosts.

If you had a server maxed out with 8 Intel E7-8870V2 CPUs (15 cores @ 8 CPUs = 120 cores, $32k server cost just in CPUs), set for a work factor that gave a normal "1 second" cost, then someone just DOS'd you for most of a second. On a more reasonable 8 core server, that DOS would last 12.5 seconds (actual CPU time, not including the fact that they are stacked on top of each other). And on a dual core system that's almost 2 minutes.

If there was any other part of a website that could be DOS'd for so long with so few requests, then most people would also suggest the application had "far more gaping holes in it." Moxie's point above is that the advice presented is to just put bcrypt/scrypt/PBKDF2 in place, and no advice is given at all on how to deal with these issues that come up, or even that they exist at all, and thus systems end up being misconfigured or work factors relaxed to a point where only a false sense of security is gained.


> I specifically set a request number that would be below most thresholds.

No legitimate user will be logging in once per second. If you're specifically throttling logins and don't set it higher, you're so incompetent you shouldn't be writing production code in the first place.

> That said, a one-time request from 100 hosts would still use 100 CPU-seconds of work. Other than preemptively blocking hosts (such as all of AWS), there is no way that a "per-ip attempt throttle" is going to catch a single request from 100 different hosts.

So you've successfully DDoS'd an application for the few seconds that it takes a couple dozen cores to chew through 100 CPU seconds. Um, congratulations? I don't think there's a reward for "most pathetic DDoS attempt in history", but if there were, this should win it.

> And on a dual core system that's almost 2 minutes.

You're very confused. When we say "1 CPU second", we're speaking of a single CPU core. 100 CPU seconds on a dual-core system takes ~50 real seconds. Your hypothetical 120-core server would be theoretically capable of processing nearly 120 logins per second at a 1-second work factor. bcrypt et al. are not multi-threaded.


A site I know allows 5 login attempts per hour. That seems plenty for legitimate purposes. I've never heard anyone complain.


But it doesn't matter if they keep hitting their service with a list of known emails and then sending bogus passwords, 5 times per email per host.


Sorry for late reply. It was 5 attempts per hour per ip. Not per email.


There are botnets with hundreds of thousands of hosts in them. There are proxies and NATs with hundreds or thousands of users behind them. How do you balance those out?


> There are botnets with hundreds of thousands of hosts in them.

Which have a wide array of other mechanisms by which to DDoS your application. If that level of force is being directed at you, you need professional DDoS mitigation assistance. The CPU time cost of your login mechanism is immaterial.

> There are proxies and NATs with hundreds or thousands of users behind them.

You do not need to throttle an IP if it is not the source of an attack. But it remains inevitable that a serious DDoS will sometimes break legitimate users, even with intelligent mitigation strategies. Welcome to the real world, it ain't pretty!


Which have a wide array of other mechanisms by which to DDoS your application.

Why use any of those other mechanisms, which might require a few thousand hosts, when a method exists where a hundred hosts can do just as much damage by getting the server to punch itself in the face.


Because a method does not exist where a hundred hosts can do just as much damage. It is utterly trivial to detect and block anomalous login activity from 100 hosts.


What are the downsides of avoiding this by using scrypt/bcrypt client-side (for a web app)?


People have done it.

Few downsides that immediately come to mind:

- Some people run NoScript or have niche devices that mangle JavaScript (no login for you!)

- The client now has to generate the salt. So you'll want the client to be using a modern browser which supports window.crypto.getRandomValues(); (so IE 11 or newer)

- Low power (either electrically or performance wise) might struggle with the work factor and that might reduce the user experience (the browser might become unresponsive)

- Changing the crypto function is now quite annoying/work intensive. Effectively the client will now need to generate the old hash and the new hash, and send both. You'll need to check the old, check the new, and store the new if either are correct.

- There are some cross-site-scripting and forged request concerns. But those exist for all login pages frankly.

- You'll still need to do SOME work on the server to convert the raw client hash into a storable version (in case of compromise) so they cannot just re-send what is in the database for login (essentially the hash would be a plain text password for all intents and purposes). You could reduce the work factor, but not remove it entirely.

Essentially trusting a browser is a bad bet because too many browsers suck. I'd rather put my trust in a more known quantity, the web server, even if it costs me a few cents in usage, than do the extra dev work, extra testing, and extra tech support required to get browsers to do it.

Unfortunately there are still a lot of IE 6-8, Android 2.xx, and so on users who drag everyone down.


Minor nitpick: why would the client need to generate the salt to do authentication? I thought the salt is generated once when the password is first set, which can be done server-side, and stored along with the salted hash. The salt would then need to be sent from the server to the client and combined with the user-entered password to compute the authenticating hash.


As a minor quibble (or am I missing something?) - the salt doesn't really need lots of entropy and you could send it to the client too; so lack of window.crypto isn't a major factor.

Of course, that doesn't really undermine your conclusion in any way - as you say, enough browsers "suck" that doing this client side may well negatively impact quite a few users.


> As a minor quibble (or am I missing something?) - the salt doesn't really need lots of entropy and you could send it to the client too; so lack of window.crypto isn't a major factor.

The KDF (bcrypt, scrypt) certainly needs a suitable CSPRNG though, which you aren't able to get across all browsers/clients right now.


I'm not sure I quite follow the point you're making. Perhaps I've misunderstood what you're saying, but bcrypt only uses a CSPRNG for generating the salt.

If you generate the salt on the server and pass it to the client, then the client only needs to be able to implement the deterministic part of bcrypt.

Evidence:

The Original OpenBSD implementation http://ftp3.usa.openbsd.org/pub/OpenBSD/src/lib/libc/crypt/b... only generates randomness in bcrypt_initsalt

The mindrot java implementation https://code.google.com/p/jbcrypt/source/browse/trunk/src/ma... only uses [Secure]Random in gensalt

This fact should also be an intuitive deduction - after you've performed and stored the initial bcrypt derivation (with a fresh salt) all future executions need to be deterministic, because they need to produce the exact same output (for comparison purposes). bcrypt wouldn't be much use to you if it generated new hash output each time you ran it (with the same password+salt).


Correct me if I'm wrong (I'm no crypto expert), but doesn't client-side hashing take away the main point of hashing passwords in the first place? If a password database is compromised, the hashes don't even need to be cracked, they can just be sent straight to the server for authentication. The only thing it seems to help with for the client<->server case is password reuse between websites, and that's only if they don't use the same client-side hashing scheme.

Moxie seemed to hint at this problem by mentioning local bitcoin wallets--client-side hashing only makes sense if you're not sending the hash anywhere.


This is covered in a little more detail in other comments [1] but the basic idea is:

- The user generates a password. That password can be assumed to have relatively low randomness (entropy). It's normally going to be 6-12 ascii characters which isn't a particularly large input space. Expecting users to remember something substantially larger is not realistic.

- Because the input space is so small, good password hashing solutions need to have a high work factor, so that is it computationally infeasible to do a dictionary attack on the database.

- If the input space was larger [but everything else stayed the same] then we could reduce the work factor, because the size of the dictionary would increase, automatically making dictionary attacks harder to do.

- That "everything else stayed the same" is important. We still need our hash to be one-way, but it means (in overly simple terms) if we increase our input space by 1000 times, then we can decrease our work factor by the same magnitude.

- If you can increase the input space enough, then eventually you get to the point where a standard hash like SHA2 has a high enough "work factor" (CPU/GPU load)

And the missing piece is that it is argued (and is potentially true, but I haven't done the analysis, so don't take this as a recommendation or endorsement) that scrypt (used appropriately, with random salting) is an effective means of converting that small input space into a larger one.

[1] https://news.ycombinator.com/item?id=9305504


The main problem (barring some fancy and probably patented new techniques) is that whatever gets sent over the wire from the client is a de facto cleartext password. If your system does the hashing client side and then stores that, an attacker with the login database can just submit the pre-hashed values.

To prevent this, you still need some transformation on the server between the login form and the database, which means you're back where you started.


There may be other downsides, but the one that immediately comes to mind is that some of your client's devices may well be extremely resource constrained. The obvious type of device that comes to mind is mobile phones but there's also set top boxes and many other non-full-blown-PC web clients around.

Whatever work factor down-tuning you need to do to conserve server resources may well be dwarfed by the down-tuning you need to do to get bcrypt to run in a reasonable amount of time on a mid-range smartphone.


Is this still true? I see some articles suggesting it from around the time that Android 2.1 or lower was a "mid range" smartphone.

It doesn't seem like it ought to be true. Running bcrypt in Ruby with default settings takes small fractions of a second on a few-years-old Macbook Pro. Even if the smartphone is 100x slower, it should be within the realm of reason to run a process that might take 1-2s once every thirty minutes or so.

Now, I'll buy that if your target device is not a mid-range smartphone, but "the very bottom of the smartphone market," bcrypt might be too expensive.


JS's bcrypt/scrypt performance will be pitiful. If it takes the client 1s in JS to bcrypt a password, a server will be able to do it in a 10th the time with a more efficient implementation. That makes client-side bcrypt have to use an artificially low difficulty factor due to the inefficient implementation language.


It's been said before in previous discussions, but have not seen it here: Just make your own 'password cracking cluster' to make sure you can handle the auth workload with ease.


Examples of sites where this is an issue? Serious question, where does scrypt/bcrypt overhead get to be important enough compared to the rest of the traffic the site has to handle?


This article says salting hashes is only useful against rainbow tables. That does not make sense to me. Say you have a password database with 10,000 users in it. You want to see if any of them used "Passw0rd$". Without salt, you compute the hash once and check it against all the users (perhaps with a bloom filter). With per-user salts, you have to compute the hash for each user.

I think salting offers pretty clear benefits against other attacks if you have more than one user.

Edit: The point I'm trying to make here is that it is N_USERS times better to have a unique salt per user, regardless of the hash. One should use bcrypt/scrypt/pbkdf2 with unique per-user salts.
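A toy demonstration of that cost difference, with SHA-256 standing in for whatever hash is in use (illustrative only):

```python
import hashlib
import os

users = ["alice", "bob", "carol"]
guess = b"Passw0rd$"

# Unsalted: ONE hash computation tests the guess against every user.
unsalted = {u: hashlib.sha256(guess).hexdigest() for u in users}
one_hash = hashlib.sha256(guess).hexdigest()
unsalted_hits = [u for u, h in unsalted.items() if h == one_hash]

# Per-user salts: the guess must be re-hashed once PER USER (N hashes),
# and identical passwords no longer produce identical stored values.
salts = {u: os.urandom(16) for u in users}
salted = {u: hashlib.sha256(salts[u] + guess).hexdigest() for u in users}
salted_hits = [u for u in users
               if hashlib.sha256(salts[u] + guess).hexdigest() == salted[u]]
```

Both searches find the same accounts here, but the unsalted one costs a single hash invocation for the whole database, while the salted one costs one invocation per user, the N_USERS factor described above.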


A better argument for per-user, randomly-chosen-at-password-change-time salts is so that you can't determine that two users (currently or historically from a password history) are using the same password by comparing the hashes.

If you knew that a hash was used many times over in a system, then you'd try cracking that password first to get access to the most accounts.

With salting you don't get to prioritize, or compare to previous information trivially.


I don't think that's a "better argument". I think it's a second, also good, argument that I didn't repeat because it's been mentioned by several other people.

Your argument is about protecting trivial passwords. It's quite common for people to independently come up with the same crappy passwords - the adobe "crossword puzzle" leak was a particularly interesting example of that.

My argument is about protecting moderately good passwords. If you use, for example, PBKDF2 with a fixed salt and have a million users, an attacker can try close to a million times as many passwords vs random salts.

You can still prioritize cracking with salted passwords in many cases. Say you had a dump of hashes from hackernews, and they were bcrypted with random salts. You'd go after the admin accounts first, then the well known accounts with a lot of influence, then everybody else.


That's the exact same argument, phrased differently.


The salt is typically assumed to not be secret. With modern hardware recalculating the hash is not hard for "fast" hashes, regardless of salting. That's why choosing a slow hash is more important for this vector of attack than anything else.

Of course salting is better than no salting, but its orders of magnitude less important than choosing a good hashing mechanism. Luckily, the good hashing mechanisms do the salting for you so fixating on salting really just serves as a proxy for letting us know that you don't understand the attack vector.


Whether the salt is secret or not doesn't matter. If it's known that I salt passwords in the USERS database with the `uname` field, hashes for "workloginpa$$word" and "kasey_junkpa$$word" will look different, even though our passwords are the same. There's no easy way to compare passwords when they're salted, and that's the point to understand.

Yes, having a good hash in the first place is more important, but salting is right up there.


No, it's not, because choosing a proper KDF will make your passwords take years to crack, while choosing a standard digest with a salt will turn the time from fractions of a second to days.

The only thing that matters is whether or not your password database ends up on a message board somewhere, not about how you feel about hashing.


A proper KDF is more important if you are trying to break a specific user's password. It will simply take too long to try any reasonable number of attempts. However, in the event that you have a large database leak, individual salts become just as important. When looking at thousands of users, unsalted hashes allow attackers to start with the most common passwords used and compare them to the entire leaked database in one try.


Are there any proper KDFs that don't include a random salt?


this whole discussion is a red herring. Of course all current KDFs include salting alongside key stretching. That's because no one designing them is writing off rainbow table attacks.


Yes, it is!

Since DDG does it quick, I'll use MD5 as an example, but don't ever use MD5 as your hash!

An attacker already knows a user's plaintext password is "pass123" and no salting has occurred. He can search for "32250170a0dca92d53ec9624f336ca24" in the database and find 5 other users that have the same hash. He now knows that those 5 people have the same password.

Meanwhile:

    Johnpass123: e3445c82086cff25a79dcbbe59b569d1  
    Mattpass123: fd6d563970fd6ead6391a997a5e06d80
Moral: It's not about how hard it is to crack. It's about comparisons of hashed values without cracking necessary.
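The same grouping attack, sketched in Python (the table below is invented; digests are computed inline rather than copied from anywhere):

```python
import hashlib
from collections import Counter

# Hypothetical unsalted credential table: three users, two distinct passwords.
table = {
    "john": hashlib.md5(b"pass123").hexdigest(),
    "matt": hashlib.md5(b"pass123").hexdigest(),
    "sue":  hashlib.md5(b"s3cret!").hexdigest(),
}

# Counting digests reveals shared passwords without cracking anything.
groups = Counter(table.values())
shared = [digest for digest, n in groups.items() if n > 1]
```

With per-user salts, every entry in `table` would be distinct and `shared` would come back empty, which is exactly the comparison the parent comment is making.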


I think you're severely underestimating the importance of salting. I agree that choosing the "correct" hashing algorithm is more important than salting, but not by magnitudes.

Even if you're using a "hard" hashing algorithm, you're significantly reducing the amount of computations an attacker would have to go through by forgoing salts. Especially considering that salting has very, very little overhead.


forgive me, I haven't worked with a good hashing mechanism before.

How can it "salt for you" without you providing the salt?

This question is operating on the presumption of a unique salt per user.


I was skipping some complexity. Something like bcrypt/scrypt is more than just hashing. They are key derivation functions, which look similar to hash functions on the input side.

bcrypt for instance generates a random salt on the input and stores it in the output.

http://stackoverflow.com/questions/6832445/how-can-bcrypt-ha...


A lot of password hashing APIs have two calls - one where you provide the password (and possibly work factor) and it returns a hash that includes a randomly generated salt, and another where you pass in the hash (which includes the salt and work factor if any) and the password which returns true/false.
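That two-call shape is easy to sketch with PBKDF2 from Python's standard library (the `pbkdf2-sha256$...` storage format here is invented for illustration; real libraries define their own):

```python
import hashlib
import hmac
import os

def hash_password(password: bytes, rounds: int = 100_000) -> str:
    # Call 1: returns a self-describing string embedding salt + work factor,
    # so the caller never handles the salt directly.
    salt = os.urandom(16)
    dk = hashlib.pbkdf2_hmac("sha256", password, salt, rounds)
    return f"pbkdf2-sha256${rounds}${salt.hex()}${dk.hex()}"

def check_password(stored: str, password: bytes) -> bool:
    # Call 2: parses the salt and rounds back out, re-derives, and compares
    # in constant time; returns True/False.
    _, rounds, salt_hex, dk_hex = stored.split("$")
    dk = hashlib.pbkdf2_hmac("sha256", password,
                             bytes.fromhex(salt_hex), int(rounds))
    return hmac.compare_digest(dk.hex(), dk_hex)
```

Because the salt rides along inside the stored string, two calls to `hash_password` with the same password produce different outputs, yet both verify, which is the property that makes the per-user-salt mistake hard to commit.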


The attack you are describing would just be a rainbow table. That's exactly the point.


No, brute-force attacks are different from rainbow tables: http://en.wikipedia.org/wiki/Rainbow_table


But that's the point - you're not describing a brute-force attack, you're just describing a rainbow (/lookup) table. Precalculating hashes and then matching hashes from the dataset to them.

It's a different form, but it's pre-calculation and lookup nevertheless.


I don't understand this comment.

Are you somehow interpreting the stolen database as the rainbow table?

Are you saying that, when the attacker calculates the hash, it's analogous to looking that hash up in a rainbow table?

My understanding was ryan-c was describing a database that has a hash(global_salt+password). This concept of the global_salt was introduced specifically to baffle rainbow tables that attacked databases of hash(password)

ryan-c was describing how, if you have a database of salted hashes, the attacker can check every entry in that database against attempt, multiplying his efficacy by the number of people in that database.

To me, these are different threat vectors: rainbow tables attack hash(password), ryan-c describes attacking hash(global_salt+password), and to mitigate ryan-c's attack you must do hash(unique_user_salt+password).

Can you explain to me how ryan-c's attack is analogous to a rainbow table?


The core of the attack that ryan-c is describing, is this:

> Say you have a password database with 10,000 users in it. You want to see if any of them used "Passw0rd$". Without salt, you compute the hash once and check it against all the users (perhaps with a bloom filter).

That is basically the same as a rainbow table attack. A conventional rainbow table attack looks like this:

    for user in users:
        for hash in rainbowTable:
            check(hash == user.hash)

The attack that ryan-c is describing looks like this:

    for hash in rainbowTable:
        for user in users:
            check(hash == user.hash)

They are still fundamentally the same concept.


A single hash per user is not any significant protection. As the article pointed out, the rig could compute 71000 bcrypt hashes per second, so even if you had 1M users, that would take less than 30s/password.


That calculation can't be done without knowing the bcrypt cost used to benchmark the rig and the cost on the unknown password. (I was pondering a similar calculation on another story, and the only thing I found was a brief statement that password crackers report performance for cost=5, just by habit/convention.)


Fair enough, and it seems the same cost was used in this case! http://passwords12.at.ifi.uio.no/Jeremi_Gosney_Password_Crac...


Yes, I was thinking the same thing. I am really mystified by their saying this.


> I see the mantra ‘salt your password hashes!’ repeated over and over, when its only real value is against rainbow tables

This is dangerously incomplete reasoning. Salts' real value is that if multiple password hashes are leaked simultaneously, then each must be cracked individually. In the Adobe leak, password hints were included, which allowed for very easy guessing of passwords, and because there were no salts, even if you had no hint you were cracked if someone else had the same password and an obvious hint.


I was about to make the same post.

If a database is popped and people know John123's password, and his hash is the same as Worklogin456's hash, then they know my password.


I am extremely ignorant about this topic. But I don't think that's how it works. Can someone else weigh in please.


unsalted:

    username | sha256 hash
    john123  | ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f
    joe456   | ef92b778bafe771e89245b89ecbc08a44a4e166c06659911881f383d4473e94f

salted:

    username | salt   | sha256 hash
    john123  | ponies | 98d4c01710b2b775e41b0bf4984f0cec7621ae1dfc4304047b196af46ed589b2
    joe456   | beards | 0f7c74279db5daa1f4de41cc7f6acc94651da754c41675dcde8dcf151d14f5bf

their passwords are all `password123`


Salt has one other important property: it makes grouping users by password more difficult.

Some subset of your users will inevitably use the same terrible password as another user. Or one user will create multiple accounts using the same strong password. These passwords should appear unique in storage.

You still need salt, and that salt must be unique for each password. That's the peril of PBKDF2 password storage: I've seen applications completely botch salting by using one for all users. Bcrypt and scrypt avoid this common error by salting internally.


Computing the hash of a known password is trivial, though: once you know one user used Passw0rd$, it's easy to check every user for that password, even with a salt, because the salt is not hidden. So you're just running n hash computations, where n is the number of users. Admittedly, that's still n checks, but the difficulty pales in comparison to brute forcing passwords, and while it's strictly slower than just asking "is anyone else using exactly this hash?", it's not that much slower in the larger scheme of breaking a set of passwords.

At least that's my understanding... I am not a security expert! :)


Checking every user for a specific password is equivalent to a dictionary attack, using a single word. In fact, commonly found passwords from previous leaks are usually incorporated in such dictionaries.
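That single-word dictionary attack can be sketched as follows (the user table and the salt-prefix scheme are illustrative assumptions):

```python
import hashlib

def digest(salt: str, password: str) -> str:
    return hashlib.sha256((salt + password).encode()).hexdigest()

# Leaked rows: username -> (salt, digest). Salts are stored in the clear.
leaked = {
    "john123": ("ponies", digest("ponies", "Passw0rd$")),
    "joe456":  ("beards", digest("beards", "hunter2")),
}

def check_everyone(candidate: str) -> list:
    """One dictionary word still costs only one hash per user."""
    return [user for user, (salt, d) in leaked.items()
            if digest(salt, candidate) == d]

print(check_everyone("Passw0rd$"))  # -> ['john123']
```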


That's a fair point, but an unsalted hash gives an adversary information before they brute force any password.

In the "user with multiple accounts but one strong password" scenario, for instance, grouping by password would highlight which users are inclined to use the same strong password with multiple accounts, perhaps on other web sites. Multiple accounts would give an adversary more identifiers to use when cross-referencing other leaked databases or the web.


I think this is why one is supposed to use random salts per password. That way, you can't use the successfully-guessed password's hash to assume other passwords with the same hash represent the same clear-text password.


I'd be interested to hear more about the approaches a hacker would take in each case if they came across the hashed passwords and have no knowledge of the salts, etc.

For instance in the first case where hash = md5('deliciously-salty-' + password)

I'm not sure how easy it is for an attacker to reverse the hash if they don't know the salt. (I understand you need to assume the attacker has the salt, but just posing a constrained hypothetical here)


It's equivalent to attacking md5('deliciously-salty-' + password) with a known password, which the attacker can obtain, e.g., by looking at the hash for their own account.

Attacking md5 is not really that hard, nowadays - the article mentions 180 billion tries/second.

Yes, it requires a rig with 25 GPUs, but this is almost the perfect use for the "cloud" - you can rent those for a few bucks just for a specific attack. There are even dedicated services nowadays, with no programming knowledge required - you just submit the hashes, and out come the found passwords.


Hmm I still don't understand, but thanks for replying to my (downvoted) honest question.

If all you have is a hash and know the password you still dont know if it's

md5sum('salt:' + password) or

md5sum(password + '-salt') or

md5sum('salt+' + password + '=salt')

etc

I suppose you'd need to brute force a large key space and look for any results which contain your known password. Does that sound right?


You want to assume that the attacker has the implementation details, but not the associated secret. Assume that the attacker has a stolen copy of the login database with usernames and hashed passwords, and a copy of the login code (pretend the attacker is a rogue developer, or you're using an open-source web framework and anyone can view the login code on GitHub). The only thing he doesn't know is the password to enter on the login page to get a hashed password that matches what's in the database.

So, the attacker knows which of those formats in your comment is being used. With no salts, the attacker can just start running md5 against every character string, and look for matches in your database. Adding salts means he has to do that for every character string appended to the specific salt, so he has way more work to do. But, md5 is still really fast, so that extra work is still feasible.

Adding scrypt/bcrypt/PBKDF2 means he has to use a slow hash function multiple times for each possible password he's trying. That means it takes significantly longer for the attacker to try a potential match, which is the goal.
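The cost gap is easy to see with Python's standard library (the iteration count is illustrative, not a recommendation):

```python
import hashlib
import os
import time

password = b"correct horse battery staple"
salt = os.urandom(16)

start = time.perf_counter()
hashlib.md5(salt + password).digest()  # one fast hash
fast = time.perf_counter() - start

start = time.perf_counter()
hashlib.pbkdf2_hmac("sha256", password, salt, 200_000)  # 200k slow iterations
slow = time.perf_counter() - start

# Every candidate an attacker tries pays the slow cost too.
print(f"md5: {fast:.6f}s  pbkdf2: {slow:.6f}s")
assert slow > fast
```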


That's true, but once you discover the salt pattern used, then you can crack a specific password and look for the resultant hash in the database. The idea is to use a random salt per password to close this attack vector. That way, even if you do guess the password and you have the salt, the resultant hash is only useful for that one password, even if the same clear-text password is used in other hashes.

Think about it: adding a known global salt to each password in the same manner before hashing is effectively the same (in the long run) as not salting. Once the attacker figures out the pattern you used, he can proceed to crack specific passwords, then search for the resultant hashes in the database.
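A small sketch of why a single global salt still leaks password reuse (the salt value here is made up):

```python
import hashlib

GLOBAL_SALT = "somesalt65634734"  # hypothetical site-wide constant

def global_salted(password: str) -> str:
    return hashlib.sha256((GLOBAL_SALT + password).encode()).hexdigest()

# Two users with the same password still collide, so cracking one digest
# cracks every account sharing it.
assert global_salted("password123") == global_salted("password123")

def per_user_salted(salt: str, password: str) -> str:
    return hashlib.sha256((salt + password).encode()).hexdigest()

# Per-user salts break that correlation.
assert per_user_salted("ponies", "password123") != per_user_salted("beards", "password123")
```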


What I don't understand (truly--I'm a curious neophyte here) is why a random number would be better than salting with, say, their username.

If the assumption is that the random number is known, per both guidelines that only the user-given secret can be assumed secret and that the article says it could conceivably be stored plaintext, isn't it effectively equivalent to just another username?


If two websites use this scheme and they both get hacked and a user has an account on both with the same username/password the attacker has to do half as much work. Another reason is an attacker can compute a rainbow table prior to getting access to the hashed passwords. This would reduce the amount of time the owners of the site would have to respond in the event of a breach. Finally, it's really easy and cheap to generate a random salt.


You can, but it prevents you from allowing changing the username (which might be fine, but why tie yourself?) and it provides less entropy - a username + password can be small enough to fit in a rainbow table, while a random salt can have whatever size you want.


Excellent explanation. I'll further this by adding that, in my use of the excellent PBKDF2.NET library, I generate a salt with the same number of bits that the hashing algorithm generates (e.g., 256-bit salt for SHA256). I can't remember where I read that this was a good practice, though. :(
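Generating a salt the same size as the digest is trivial with the standard library (the 32-byte size mirrors SHA-256, per the practice described above):

```python
import secrets

DIGEST_BYTES = 32  # match SHA-256's 256-bit output

salt_a = secrets.token_bytes(DIGEST_BYTES)
salt_b = secrets.token_bytes(DIGEST_BYTES)

assert len(salt_a) == DIGEST_BYTES
assert salt_a != salt_b  # overwhelmingly likely: salts must be unique per user
```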


Why not combine the two approaches? If you have a global salt stored either with your app or on a separate server, you make the attack require access not only to the database, but to your global salt as well. A global salt on its own is clearly inferior, but at least to me, combined with a unique per-user salt it should offer comparable if not better security.


The first two are equivalent to the third, if you assume that the salt can be '' in any of the cases.

You'd just need to generate (0 or more chars) + password + (0 or more chars).


If the salt is a constant, the attacker doesn't need to know it simply on the principle that they don't need access to the constant parts of your site, it's the dynamic functionality that they want access to. Knowing the salt in that case isn't theoretically any more difficult than knowing any other static part of the process.

Consider an alternative: randomly generate salts with a constant password. It's the same amount of information, and it is obvious that it does no good to ask for a password if there is only one password that everyone shares. All you need is the dynamic part, and you have access. The dynamic part is the security and the data, and everything else is infrastructure that can be known simply by probing.

This is just how information processing works. You have a static 'language' and a dynamic 'meaning', and if you don't know the language, you can't possibly get the meaning -- but this means you can't even use the site at all (even with a password.) So obviously the 'language' has to be public so it can be known by the people who wish to communicate with the site legitimately.


"The value of salts is limited to thwarting attacks leveraged pre calculated lookup tables such as rainbow tables."

This is absolute drivel.

Salts prevent checking an attempt against all the hashes in parallel.

That is to say if the attacker has a dump of 1 million salted hashes, they have 1 million times the work to do compared to 1 million unsalted hashes.


I am also suspicious of the claim that a fixed salt+password hash scheme is "just as insecure" as a no-salt password hash scheme. The salt (unknown to the attacker) has to add at least a few orders of magnitude more security just by turning "password123" into "somesalt65634734password123", right?

Just from the added password length alone it has to be "way more" secure, right? Would love some professional crypto opinion on this.


> The salt (unknown to the attacker)

The salt is generally known to the attacker. The point of the salt is that it’s unique for each password hash, making it impossible to run crack attempts on every password at the same time; for a password hash to be checked, the salt has to be retrieved. This means they’re usually stored together and therefore also compromised together; keeping them apart is rarely useful (kind of like peppers, which is what you might be thinking of; they’re per-application rather than per-hash, and their purpose is to be unknown to the attacker, which is equivalent to encrypting things with a per-application key).


If it's fixed, its value is limited in practice to just preventing rainbow tables from being used. If attackers have already exfiltrated a copy of your database, there's a pretty good chance they also have access to "somesalt65634734". So while they certainly won't find a rainbow table with every password prefixed with "somesalt65634734", they also won't bother checking every combination of a 32-character password; they'll just check passwords of 8 characters or less starting with "somesalt65634734". And if the salt is shared across your whole site and your site has N users, they just sped up their password cracking by a factor of N.

If N = 1000000 and it's just SHA("somesalt65634734" + password) then the attacker is just about guaranteed to be rolling in tons of cracked passwords in no time. Whereas with bcrypt or a unique salt per password you don't get that massive speedup by a factor of N.


Strictly speaking... it depends.

Let's assume the hash function being used is SHA2-256.

The input block size there is 512 bits (64 bytes).

So if adding the salt pushes the total length of the hash input past 512 bits then it doubles the runtime.

I do not see any particular advantage there over simply running the hash function twice.


I think in that case he's referring to having a global salt instead of a unique salt for every password, in which case he's right. However, that's kind of a silly statement because the example web framework that he was using doesn't just have some global salt and nothing else.


A global salt is not a salt.

The term people are using for such a secret is pepper.

H(pepper | salt | password) type of thing.


Nice, but I'm also disappointed :-)

My term is vinegar which is IMHO better!


When you can do 60 billion attempts per second, 1 million is a pretty small number.

Edit: That said, they do still provide value, by making password collisions less obvious.


60 billion divided by 1 million is 60 thousand, and 60 thousand is a much smaller number than 60 billion.


Found the "unnamed blog" post giving the "terrible" salting advice: http://blog.codinghorror.com/rainbow-hash-cracking/


The title seems to be misleading. Nowhere in the article does the author explicitly state that salts should not be used, because well, they should be used. His point was that salts are not enough, you need to couple them with a possibly slowest, yet acceptable hash function. And then hope that Moore's law does not breach your passwords.


Yes. But he's also pointing out that tons of the advice given, perhaps by people who are well-known and respected by some, is just wrong.

Iterative hashing with per-user salts has been a known pattern. Practical Cryptography covered it in the first edition. The relevant parts should be derivable for most engineers. There's no reason there should be any confusion or crazy blog posts.


The need for unique passwords brings me to a stupid idea that I've had for a while.

Imagine a service which just stores hashes of every password that it can find from everywhere. Any site that wants can check whether a given password is in use..somewhere..and reject it if it is. Any site that wants can add more passwords to the list.

Internally the service would work from a Bloom filter designed for a fixed false-positive rate. (You can actually layer Bloom filters in front of Bloom filters with parameters that make this possible.) So even if someone compromises that site, there is no direct record of the actual hashes for passwords.

Basically let sites say, "You have to choose a unique password, and do not reuse a password that anyone has used elsewhere!"
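A minimal Bloom-filter membership check for such a service might look like this (the size and hash count are toy values, not tuned for any particular false-positive rate):

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 7):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: bytes):
        # Derive k positions by hashing the item with a counter prefix.
        for i in range(self.k):
            h = hashlib.sha256(i.to_bytes(4, "big") + item).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def probably_contains(self, item: bytes) -> bool:
        # False positives are possible; false negatives are not.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

seen = BloomFilter()
seen.add(b"hunter2")
print(seen.probably_contains(b"hunter2"))  # -> True
```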


This would only work if you could achieve this search with the use of a zero-knowledge proof. Actually disclosing your password to the service would defeat the point.


Yes, this is a weakness. The password would need to be disclosed in the form of a hash of the password, and the hash algorithm has to match what everyone else does.

If the service and your site were both compromised, then it would be possible to match up incoming hashed passwords to users by timestamp. But even so, the fact that no passwords were reused will hopefully make cracking the hashes harder.

So..probably a bad idea. :-(



Compromises are not points in time, they tend to be intervals. A long enough interval could do more harm than good here.


Lastpass has a password-auditing feature that checks your passwords against a list of known-leaked common passwords.


I've been saying "scrypt, bcrypt, PBKDF2, or bust" for years.

Also, there's a large industry trend of, "let's 'pepper' our hashes too!" whereby they have a sitewide salt. This is a mistake. If you want to add a site-specific layer of security to your passwords, encrypting them makes far better sense than adding to the salt.

https://github.com/lizardhq/php_bcrypt_aes

http://blog.ircmaxell.com/2015/03/security-issue-combining-b...


It's not just the scheme that's important, it's also the implementation of it that's important. If I'm doing the smart thing and not rolling my own crypto, can I trust the existing implementation that I've chosen? It's hard to say. Does it have secure defaults or do I need to choose and why do I need to choose, why aren't the defaults secure? I'm not a crypto expert, I'm not qualified to judge and therefore I don't even care (as such) - why aren't the defaults secure?

For example, I'm writing an app that requires a password setup in Python so I was investigating this issue:

1. Passlib - https://pythonhosted.org/passlib/ - "implementations of over 30 password hashing algorithms" - What? I'm not qualified to judge what's secure so I don't even care, I just want something secure!

2. python bcrypt - https://pypi.python.org/pypi/bcrypt/1.0.1 - OK standard bcrypt, supposedly that's secure - "Author: Donald Stufft" - who is Donald Stufft? Is the code good? I don't know C and I'm not qualified to judge!

3. python scrypt - https://pypi.python.org/pypi/scrypt/ - OK it's Colin Percival's code, it's probably good, "Author: Magnus Hallin" - did Magnus Hallin screw it up though? "Burstaholic on Bitbucket provided the necessary changes to make the library build on Windows." Eh. Did "Burstaholic" ruin the crypto? Who knows.

Also, in my case I needed something cross-platform, primarily for Windows. bcrypt worked out of the box. The scrypt library required building on Windows with OpenSSL development headers or whatever and all that - I just said "pass".

Nacl is fantastic and simple to use, it can't get much easier than this: https://libnacl.readthedocs.org/en/latest/topics/public.html

Something similar should exist for passwords - I just want to call compare(password, storedhash) or whatever and be done with it, I'm not qualified to judge the crypto. And it should be cross-platform.
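The compare(password, storedhash) API wished for above can be sketched with nothing but Python's standard library (the PBKDF2 parameters and storage format are illustrative; a vetted library is still preferable):

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # illustrative work factor

def store(password: str) -> str:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return f"{salt.hex()}${digest.hex()}"

def compare(password: str, storedhash: str) -> bool:
    salt_hex, digest_hex = storedhash.split("$")
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(),
                                    bytes.fromhex(salt_hex), ITERATIONS)
    # Constant-time comparison avoids timing side channels.
    return hmac.compare_digest(candidate, bytes.fromhex(digest_hex))

record = store("s3cret")
print(compare("s3cret", record))  # -> True
print(compare("wrong", record))   # -> False
```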


This is why I love the (modern) PHP approach. It's in the standard library, and you don't need to supply an algorithm name or salt or anything. It is also designed so that a future version of PHP can have upgraded algorithms without breaking current code. It is trivially easy to write code that upgrades old hashes when the user logs in.

http://php.net/manual/en/function.password-needs-rehash.php


This is missing something pretty important about salts. They hide the fact that two users have the same password. That's a huge security risk if that correlation is leaked.

Correct me if this is mentioned in the article, but based on several mentions of how salts were effectively useless, I think this was overlooked.


I didn't get the impression the author was saying salts are useless, obviously you still have to use them and bcrypt and scrypt both do. The point is that a lot of programmers naively assume that a salt (or salt+pepper) will make their password storage secure on its own when that isn't really true anymore.


I fail to understand why salting is bad. Seems like he's saying it's not a cure-all. We knew that. But it's at least marginally better than not salting, and it's easy, so why the hell not?


Yeah, salting definitely isn't bad, in fact it's actually essential. What the author is saying is that it's not enough - salting won't save you when your attacker can test 10s or 100s of thousands of hashes a sec. The article could have been phrased better.


What he's saying is that a salt should be a no-brainer. FTA: "Storing passwords as unsalted hashes is a grievous mistake of course, and breaches disclosing poorly stored passwords are still too common."

He's simply saying that a salt isn't enough to keep your passwords secure - you should also be using a slow hashing / memory intensive hashing function.


Nobody's saying salting is bad, the blog is saying that salting is a given, if it's not you don't belong anywhere near passwords.


I've recently found out about Diceware and the importance of a strong password [1][2]. Often users choose a really weak password to begin with. I've read [3] that if you have a strong password (like a 7 word Diceware password) even MD4 or MD5 would suffice :O. I'm really curious if this is true.

[1] http://world.std.com/~reinhold/diceware.html

[2] https://firstlook.org/theintercept/2015/03/26/passphrases-ca...

[3] https://github.com/freedomofpress/securedrop/issues/180#issu...

ps. for those that use bcrypt and want to ensure a constant user experience on their server see https://www.npmjs.com/package/bcrypt-speed


While we're at it, and since we're throwing stuff around as if it's an easy fix: let's stop using passwords. Use keys. Use a trust model on those keys. So many advantages: the server no longer receives the cleartext password (because really, I DON'T CARE if you use triplezscrypt, SINCE YOU GET MY PASSWORD IN CLEAR TEXT during authentication; it's not as if compromises happen only "on the database". They're always at the application level.)

Besides, it makes rotation easier, and it does away with remembering multiple passwords, password managers, and so on.

Oh wait ;) Google and others are actually attempting to make this work with U2F:

https://support.google.com/accounts/answer/6103523?hl=en (yes it works with gmail. Right. Now.)

https://www.yubico.com/applications/fido/


The points made here are covered in Coda Hale's post: http://codahale.com/how-to-safely-store-a-password/

While it recommends using bcrypt, if you augment that with "or scrypt or PBKDF2" the same principles apply.


There is one thing that I think is often omitted. A lot of users use trivial passwords, and they reuse them. These users' passwords will fall essentially no matter what you do, because making the check so slow that a dictionary attack with a couple thousand very common passwords becomes impractical is itself borderline impractical. On the other hand, strong passwords will withstand an attack even if they are hashed with plain MD5 or SHA1 - at least right now; there are already some serious collision attacks.

The user group really benefiting from strong password hashing schemes is the one between the two mentioned extremes. So if you really want to protect all your users and use simple password authentication you have to enforce strong passwords policies which is itself not trivial and will definitely badly annoy some users.


> There is one thing that I think is often omitted. A lot of user use trivial passwords and they reuse them. These users' passwords will fall essentially no matter what you do because making the check so slow that a dictionary attack with a couple thousand very common passwords becomes impractical is itself borderline impractical.

Sure. This is something I touched on very briefly near the end of the post. Password re-use is a huge problem and could easily be a blog post of its own.

> On the other hand strong passwords will withstand an attack even if they are hashed with plain MD5 or SHA1 - at least right now, there are already some serious collision attacks.

For some value of 'strong', sure. If your password is long enough, random enough, and drawn from a broad character set, then yes, an md5() of it may well be beyond the reach of most attackers. But that's not the sort of password most people, even security-conscious people, choose. A preimage attack against MD5 would quickly change this too, and that's a risky gamble.

> The user group really benefiting from strong password hashing schemes is the one between the two mentioned extremes.

The best we can do is make attacker costs a function of the password that the user has chosen. If someone makes their password 'password' there's no degree of magic that's going to help. Mostly, the benefit comes if/when disclosure occurs and there's an active incident response effort.

> So if you really want to protect all your users and use simple password authentication you have to enforce strong passwords policies which is itself not trivial and will definitely badly annoy some users.

It's not a zero sum game. When a breach occurs, the rough points in time are something like:

1) Attacker compromises digests (let's just assume all are obtained in an instant, rather than exfiltrated over time)
2) Attacker recovers 1 password
3) Attacker recovers n passwords
4) Attacker recovers all passwords

Our goal isn't to eliminate these steps except hopefully the extreme of 4. Rather, it's to increase the time between 2 and 4, and make sure 3 is closer to 2 than 4.

On the other side of the coin, there's a few points in time:

a) Victim org. realizes digests have been compromised.
b) Some credentials valid at 1 are made invalid.
c) All credentials valid at 1 are made invalid.

If you accept my previous point that at some extreme there's nothing that can be done to prevent brute force, then you should also see that storing passwords in this fashion isn't meant to be a perfect defense. By minimizing the number of compromised accounts at a given point in time, we can maximize the benefits of our response. We can notify users and force resets, invalidating some set of credentials before they're recovered. We want b and c to be closer in time to 2 than 4, and minimize the value of n when we get there.


Since the author didn't bother to mention any threat models, we can assume that the threat is the most common one, SQL Injection.

You don't need SQL Injection if you can directly read the database file, or can operate as the web server user, therefore if you're doing SQL Injection you don't have those privileges.

If you can HMAC or encrypt passwords (prior to hashing them) in the database with a key only on the web server, then that's an extra level of protection. You can't get that key with SQL Injection alone.

https://blog.mozilla.org/webdev/2012/06/08/lets-talk-about-p...

password_hash(base64_encode(hash_hmac("sha512", $password, $key, true)), PASSWORD_BCRYPT)
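A stdlib-only Python sketch of the same pre-hash idea (PBKDF2 stands in for bcrypt, since Python's standard library ships no bcrypt; the pepper value and parameters are illustrative):

```python
import base64
import hashlib
import hmac
import os

PEPPER = b"server-side-key-never-in-the-db"  # hypothetical web-server-only key

def peppered_hash(password: str, salt: bytes) -> bytes:
    # HMAC the password with a key the database never sees...
    mac = hmac.new(PEPPER, password.encode(), hashlib.sha512).digest()
    # ...base64 it (bcrypt truncates at NUL bytes, hence the base64_encode
    # in the PHP snippet above), then run the slow hash over the result.
    return hashlib.pbkdf2_hmac("sha256", base64.b64encode(mac), salt, 100_000)

salt = os.urandom(16)
stored = peppered_hash("hunter2", salt)
assert peppered_hash("hunter2", salt) == stored
assert peppered_hash("wrong", salt) != stored
```

SQL injection alone yields salt and digest, but without PEPPER the attacker cannot even start guessing.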


> we can assume that the threat is the most common one

No we can't.


How about not reusing passwords and always using a strong password? SHA256 alone is enough when you just do that. And so what if someone got my password? If they already had access to the system, they could have stolen the data they were after anyway. Today's password: .P`éA@O?/^2HNVSé%@ÖY - it contains more than 256 bits of entropy. Based on this, SHA256 collision attacks and cracking AES128 should be trivial by comparison. I personally consider passwords to be unique shared secret blobs, one per service.


Password hashing is crypto, too. Don't roll your own crypto.


I agree with this advice, even though I'm probably the worst offender. I have:

    - Rolled my own password library (linked elsewhere in this thread)
    - Rewritten an existing AES-CBC + HMAC-SHA-256 library to use a different 
      underlying driver (mcrypt is basically a mummy at this point)
    - Did both of these things in PHP, which a lot of people assume
      is inherently incompatible with security
However, everything I've done in this vein has been passed before people who are far smarter than myself, to which I received no complaints. And I constantly seek out criticism from people who understand crypto too. (Aside: Most crypto people are dicks. But they mean well.)


One question: _Why?_

However many people have reviewed your code, far more people have reviewed libraries like bcrypt!


My code uses bcrypt. It's a wrapper around password_hash() and it also passes it through an authenticated encryption library.

When I said I wrote my own crypto code, I don't mean like something that belonged in a PHC entry.


Jeff's and Thomas' posts are totally on the money. We wrote a piece as well, specifically to talk about how encryption and hashing are not the same thing. You'd be surprised how often people screw that up.

https://www.tinfoilsecurity.com/blog/store-data-securely-enc...


If you are using .NET, you will be glad to know that .NET framework has a standard implementation of PBKDF2, one of the recommended choices in the article.

  var hash = new System.Security.Cryptography.Rfc2898DeriveBytes(password,
    salt, workfactor).GetBytes(32);
Using a NIST-approved function that's continuously supported by a large player seems like a good choice.


The implementation iterates HMACSHA1, which is fine, of course, but which raised concerns on at least one audit I went through because, well, why would anybody want to use SHA1 when SHA2 exists?

Wound up having to implement my own version of PBKDF2 that lets you specify the HMAC algorithm.
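For comparison, Python's stdlib PBKDF2 already lets you pick the HMAC algorithm (the values below are illustrative):

```python
import hashlib
import os

password, salt = b"s3cret", os.urandom(16)

# Same KDF, different PRFs: the default output length follows the digest size.
sha1_key = hashlib.pbkdf2_hmac("sha1", password, salt, 100_000)
sha256_key = hashlib.pbkdf2_hmac("sha256", password, salt, 100_000)

assert len(sha1_key) == 20    # SHA-1 output size in bytes
assert len(sha256_key) == 32  # SHA-256 output size in bytes
```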


And, as always: these discussions about salts/bcrypt/scrypt are only relevant to people for whom passwords/security are important, but not critical.

For anybody who has critical security needs, just use an HSM already. Sophos/Utimaco makes a nice one, and it is a nice foundation for password security.


How about using a tested login library instead and consider using OAuth as well (e.g. Google account to login)?

Rolling your own login system, dealing with hash functions and checking passwords yourself is far too low level and dangerous for most apps (as this thread demonstrates).


Urghh... I hate these articles that list bad examples of salt use and then don't explain why exactly they are bad. It reads like the author just trying to prove to the world how smart he is. Maybe he doesn't really know how to explain it?


Does this mean we should drop the salts, simply scrypt and be done with it?

Or should we still include salts in addition to scrypt?

If yes, only one global or many user specific or both?


Use bcry—ah, fuck it.


Why are we still using passwords in 2015?



