Storing Passwords Securely (throwingfire.com)
250 points by pw on June 6, 2012 | hide | past | favorite | 133 comments



I know I won't be able to easily convince anyone of this, but I thought I'd mention...

Colin Percival's "scrypt" password hash is:

1) production-ready (and has been for a long time)

2) superior to bcrypt, as it is designed to be expensive in both CPU and memory (hence, scrypt is "memory-hard", whereas bcrypt is not)

I don't have time to go into further detail. I encourage you to check it out. It's quite simply "the future of password hashing". (Bcrypt will be defeated by natural advances in multi-core hardware; scrypt won't ever be.)

Passwords hashed with scrypt with sufficiently-high strength values (there are 3 tweakable input numbers) are fundamentally impervious to being cracked. I use the word "fundamental" in the literal sense, here; even if you had the resources of a large country, you would not be able to design any hardware (whether it be GPU hardware, custom-designed hardware, or otherwise) which could crack these hashes. Ever. (For sufficiently-small definitions of "ever". At the very least "within your lifetime"; probably far longer.)
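For anyone curious what those three tweakable inputs look like in practice, here is a minimal sketch using Python's hashlib.scrypt (available on recent Pythons built against a recent OpenSSL; the parameter values below are illustrative, not a recommendation):

    import hashlib, hmac, os

    # Illustrative cost parameters: N = CPU/memory cost, r = block size, p = parallelism.
    # Tune these upward until hashing takes as long as your login path can tolerate.
    N, R, P = 2**14, 8, 1

    def scrypt_hash(password):
        salt = os.urandom(16)                      # per-password random salt
        digest = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=64)
        return salt, digest

    def scrypt_verify(password, salt, digest):
        candidate = hashlib.scrypt(password.encode(), salt=salt, n=N, r=R, p=P, dklen=64)
        return hmac.compare_digest(candidate, digest)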


One thing to be careful about is that you don't open yourself up to a trivial DoS. The default scrypt command-line program will churn CPU for a full second when encrypting a file. If you use similar settings, it will not take an attacker many connections to reduce even a large cluster to tears.


The usual concern with scrypt is that it has less research behind it. The bcrypt hash function has gone through over twelve years of attempts without being broken in the cryptographic sense. The scrypt hash has only around three years of the same.

I would expect that within 5-10 years, scrypt will be the normal suggestion. It really is much better on the fundamentals, so much so that some experts recommend it even in spite of its relative newness. For the moment, though, there are arguments to be made either way.


You are going to have a hard time finding a professional cryptographer who will look at scrypt and say that it's risky. These are KDFs, not message authentication codes; they aren't risky constructions.

If you have no other choice but to iterate SHA1 many thousands of times, that's still better than what most apps do, and in the grand scheme of things almost "ok".

scrypt is fine. If you have a library that supports it, go ahead and use it.


> If you have no other choice but to iterate SHA1 many thousands of times, that's still better than what most apps do, and in the grand scheme of things almost "ok".

While I won't recommend people simply iterating a hash many times, I also won't slap people down for it anymore. In the grand scheme of things, there are a billion more likely routes of attack than someone breaking your 20000-rounds-of-SHA1 hashes.


Out of curiosity, what do you recommend people use?


If being able to do things with off-the-shelf tools is important, I'd recommend PBKDF2. It can be implemented in a couple dozen lines of code over one of the SHA2 functions, as opposed to Bcrypt which is a completely novel hash function that isn't in most languages' standard libraries.


Much as I like bcrypt/scrypt, I generally just recommend PBKDF2 with a large number of rounds. It's not perfect, but it's more than good enough.
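For concreteness, a minimal sketch of what "PBKDF2 with a large number of rounds" can look like using Python's standard library (hashlib.pbkdf2_hmac on reasonably recent Pythons; the iteration count is a placeholder you should benchmark and raise):

    import hashlib, hmac, os

    ITERATIONS = 100000  # illustrative; push this as high as your login latency allows

    def pbkdf2_hash(password):
        salt = os.urandom(16)
        dk = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, ITERATIONS)
        return salt, dk

    def pbkdf2_verify(password, salt, dk):
        candidate = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, ITERATIONS)
        return hmac.compare_digest(candidate, dk)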


This is true. In particular, a KDF is in no danger from preimage or collision attacks, which are generally the types that show up.


There are no known good implementations of scrypt in PHP (http://stackoverflow.com/questions/10149554/are-there-any-ph...), so I don't know how "production ready" that is, considering PHP is the most popular platform for the web, as much as I hate the language.


Even opting for bcrypt can be sketchy. Native support only appeared a few years ago; older versions rely on the OS, many of which didn't support Blowfish without a patch. Fine if you control the server.


Since most people who face a choice between bcrypt and scrypt will be the ones who are developing new sites or considering a major upgrade to an old site, I don't think they'll care about compatibility with anything older than PHP 5.2. Even if you don't control the server, it's difficult to find a reputable web host nowadays that still runs anything older than PHP 5.2, and your average cPanel host is most likely to support a wide range of crypto functions.


It's not a very good idea to implement cryptography in a high level language such as PHP. Using a high level language opens a whole new world of pain when it comes to side channel attacks. Timing attacks, cache timing attacks and branch predictor attacks are much easier to protect yourself against if you write your crypto algorithms in C or, preferably, Assembly.


C module?


Writing your crypto with C and then calling it from another language such as PHP or Python is fine.


Fyi, scrypt is in the Debian testing and unstable repos [1], and Ubuntu's universe repo [2]. Not sure about the RPM ecosystem, but I'm sure it's available there too. And there are a bunch of bindings and libs in various languages on github as well [3]. And there is always the source [4].

1. http://packages.debian.org/search?keywords=scrypt

2. http://packages.ubuntu.com/search?keywords=scrypt

3. https://github.com/search?q=scrypt&type=Repositories

4. http://www.tarsnap.com/scrypt.html


FWIW I included scrypt :)

I personally agree that "Use bcrypt." should become "Use scrypt." soon. My main gripe is that there is far less library support for it, at least for now.


One issue that I have with scrypt and bcrypt for java is that there is no "official" support for the algorithms in java.

When looking at jBCrypt for instance, they're only at version 0.3 with no updates since 2010, makes me really nervous of using it.


How many bugfixes do you think a hash function needs? Just use jBCrypt.


If you try to use the spec'ed 32 rounds the code borks. I fixed it in my own version of jBCrypt.


Supply your version to the jbcrypt website, or host it! If you want to put it up for FOSS.


I think I answered my own question, but I'll ask anyway - why does memory intensiveness matter?


For a "worst-case" attacker -- one who can fab his own ASICs -- cost is proportional to the area of silicon used. RAM takes up die area, and it's something commodity systems have plenty of.

Basically it's a matter of "what do we have which they would have to pay extra for".


GPUs (and similar hardware) have a very limited amount of memory per core. If computing a hash requires 100MB of memory, then even a 1GB graphics card can only execute 10 cracks in parallel, not hundreds or thousands.


Or the hundreds of millions.


Processors get faster and faster, so we increase the work factor to compensate and keep it a time-hard problem. This places no real restriction on memory, though, so the amount of memory only has to grow linearly with the amount of time.

Making it memory-hard makes it also memory hard.


> Making it memory-hard makes it also memory hard.

That's a tautology, unless the hyphen means something different.


Looking into what scrypt is now. Thanks for the info.


The article says to give each password its own unique salt and then store the salt (as well as the salted hash) in the database.

This seems like a bad idea to me. If a hacker gets access to the salted passwords, in this case he'll probably figure out how to get access to the salts too.

I figure if the salt is stored in the code (or a config file...) rather than the database itself, at least it's two different hacks to get (1) the salted hashes, and (2) the salt.

Am I misunderstanding?


> Am I misunderstanding?

Aye. There are two points to the salt:

1. Avoid precomputed hash attacks ("rainbow tables"), where the attacker has a big list of hash:password pairs and can just walk the table of (leaked) password hashes to get the cleartext. A global salt is sufficient for that, and where it's stored does not matter (it can be a config file or a config table or whatever).

2. Avoid the attacker being able to brute-force the whole collection at once; here each password needs its own salt: the attacker needs a (salt, hash) pair to be able to brute-force each and every password. They can't just compute a million (salted) hashes and cross-check the whole table; they have to do so for each and every password they want to crack. This requires a unique salt per password/hash, and the salt can just be stored with the hash.

Salts are not secrets, they just exist to make the hashing of a given user's password unique. They are generally returned as part of the hash function's result (alongside the number of rounds, so the result has the shape (cost, salt, hash)), it's understood and expected that the attacker knows them: it does not matter to their purpose.
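To make the "salt stored with the hash" point concrete, here is a minimal sketch of a (cost, salt, hash)-shaped record, using PBKDF2 as a stand-in for whatever KDF you actually use (field separators and names are illustrative):

    import binascii, hashlib, hmac, os

    def make_record(password, iterations=100000):
        salt = os.urandom(16)   # per-password, random, stored in the clear
        digest = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, iterations)
        return "%d$%s$%s" % (iterations,
                             binascii.hexlify(salt).decode(),
                             binascii.hexlify(digest).decode())

    def check_record(password, record):
        cost, salt_hex, digest_hex = record.split('$')
        salt = binascii.unhexlify(salt_hex)
        candidate = hashlib.pbkdf2_hmac('sha256', password.encode(), salt, int(cost))
        return hmac.compare_digest(candidate, binascii.unhexlify(digest_hex))

An attacker who steals the records sees every salt, and that's fine: each password still has to be brute-forced on its own.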


Salts are not secret. They make you run unique calculations per attacked password and do absolutely nothing else.

You might feel like adding a hidden component but it won't noticeably help security and it's not a salt.


Furthermore, it's important that the salt be unique per password. If you had a single common salt in your code, then two users with the same password would produce the same hash.


Every salt example I've run across until now used the same salt for every password in the database. Obviously I can disregard those examples!

Thanks for bringing this up--the concept of salt makes a lot more sense to me now.


Using the same salt for every password in the database protects against an attacker using a pre-computed dictionary ("rainbow tables" are something like this) to attack your stored hash values. Using (and storing) a separate salt for each password protects against that and ALSO protects against the attacker building a dictionary of common passwords with a certain salt and then using that dictionary to crack ALL the passwords in the database at once.

So one-salt-per-user is clearly better, but one-salt-for-all is still better than no-salt-at-all.


Ya, that's silly.

A common and simple password setup is hash(username+password) or hash(private_account_attribute+password).

Also, use bcrypt/scrypt/similar.


> Ya, that's silly.

Not exactly. Having a salt at all mitigates rainbow table attacks, having a unique salt per password also limits brute-force attacks (you can't brute-force the whole table trying out cleartexts with a given salt, you have to brute-force each password individually).

Having a global unique salt is already better than no salt at all.


I have heard of doing both, though I have not seen it yet in production. It gives the benefit of having to get both the code and the database, while also requiring dictionaries to be built per password.


I think the considered opinion of "the people in the know" is that getting the code is just not that hard, and the global secret technique offers more illusory security than real security.


This is only a problem in the sense that once one hash has been cracked, so is the other; however, most authentication is done using some user/email attribute as well.

Usually we fetch user/email, hash password, compare to hash in database - authenticate if match or deny if not.


So, no changing of email without changing the password then?


No, just require that the password is entered to change the user's email address. Then a new hash can be generated and stored.


No, it's not a bad idea, and essentially it's what bcrypt does.

Think about it. If you use the same salt for all passwords then I can easily create a rainbow table consisting of "keyword" + salt hashes and doing so I can crack multiple passwords.

However if there is a different salt for each password then this type of attack becomes more difficult as I have to essentially do the same amount of work for only a single password.

Finally, we need to store the salt in the database because it's needed in order to recreate the hash using the user's password for authentication.


Further, using the same salt for every password opens the door to a class of rainbow-type attacks that precompute the rounds of the hash containing the salt, then compute the remaining rounds for each entry in the dictionary of passwords. When done well, this can cut out a huge pile of the computational effort required to scan a dictionary of passwords. This same concept is also why hash(password+salt) is really weak: you can precompute a lot of the rounds for a lot of different passwords and then go from there with the known salt.

There are deeper attacks too, but the above are just what can be derived from a bit of reading up on how the SHA family of hashes works.

For a closely related problem/solution, reading up on HMAC is quite enlightening.


> If you use the same salt for all passwords then I can easily create a rainbow table consisting of "keyword" + salt hashes and doing so I can crack multiple passwords.

That's not a rainbow table. A rainbow table is a list of precomputed hashes (which you'd just compute once on a cluster you pay for, or buy from a guy who did, and then store/stash).

But yes, you're brute-forcing the whole table at once (you can look for matches after each computation) instead of having to individually brute-force each password. You just wouldn't keep the computed salted hashes around, since there's no need for them once you've tested them against the leaked hashes.


Yeah I should have stuck that in quotes. I was trying to keep the explanation as simple as possible.

You are right of course there is no need to keep hashes about once they have been compared against the db.


Upvoting you for asking the question.

Too often people assume one way or the other (and come up with their own hare-brained password encryption scheme, defending it from all-comers).


You are. The salt is just to stop the use of rainbow tables, which are pre-generated maps between plaintext passwords and their hashes.

Anyway, you want to store a new salt (not just a system-wide 'this is my salt' salt) for each stored password anyway, so you will need to store that data somewhere. You could obfuscate a little by storing the salts elsewhere, but it seems a little extreme.


Yup. The main purpose of salt (aside from increasing entropy by increasing the length and complexity of the hashed value) is that it prevents rainbow table attacks where an attacker pre-computes (or downloads precomputed) hashes for common passwords, dictionary words, and brute-force style variations of same.

A salt unique to the site would require the attacker to create a site-specific rainbow table, but once created it can be used for all passwords. Having a unique salt per password means that the attacker would have to generate unique password tables per user, which for a suitable salt & algo is impractical, even if (as they normally are) the salts are stored with the passwords.


I was always under the impression that the main point of salting a password before hashing (unique or not, stored in cleartext or otherwise) was to prevent the hash from being referenced against a rainbow table of precomputed hashes. Obviously this is in the case of using md5/sha1 for your hashing.

As an example, find the md5 of a dictionary word and google it; its original value is bound to be in the first or second result. Now find the md5 of that word plus a random string (aka a salt)... google it and there shouldn't be any results. And even if there were, due to a collision, that password wouldn't work, because it would be re-salted prior to the comparison happening.


The individual salts are added only to prevent an attack with a list of pre-calculated hashes. You could also store another common salt ("pepper"?) in the code.


This might be useful for you or anyone else with the same question: http://stackoverflow.com/questions/2583203/salt-passwords-an...


> Am I misunderstanding?

Hi Phil -

Yes. Salts are to defeat things like this: http://en.wikipedia.org/wiki/Rainbow_table


For every user the salt should be different, so there is no way it can be kept in a config file.


The SRP note was very nice: http://en.wikipedia.org/wiki/Secure_Remote_Password_protocol

On a more esoteric note: if you are looking to resist quantum algorithm attacks, there are post-quantum algorithms for that[1] (they are computed on normal machines, but the problems behind the cryptosystems are hard to solve even for quantum computers).

[1] http://crypto.stackexchange.com/questions/494/what-is-the-po...


SRP also has tunable work-factor knobs, although they aren't as explicit as the ones in bcrypt, scrypt, or PBKDF2. But I'd strongly recommend against re-implementing SRP, since it's treacherous to get right.


Of the implementations listed in the WP article, which, if any, are suitable for general-purpose use? And what manner of implementation screwups are most likely or most hazardous?

[With the obvious caveat: Advice not for production use, If you have to ask...]


If you're going to use an SRP library, there's an open-source Stanford SRP library (most commercial systems derive from it) which deals with the obvious attacks.


If you're worried about quantum computers when thinking about some website's security, you're either doing it wrong, or living in the future.

Post-quantum is cool alright, but it's not really a thing that implementers need to worry about.


If you're storing passwords as MD5/SHA hashes, how difficult is it to switch over to bcrypt? I've never had to do this, but I would imagine it would be somewhat trivial. With all of the password leaks that have happened over the past few years, I'd imagine a good number of developers are aware that storing passwords as MD5/SHA hashes is somewhat risky, so I can't understand why big websites (LinkedIn) are still doing it.


if your userbase is fairly active, you could migrate passwords the next time they login (and store a flag indicating the password "version"), since you'd have the plaintext version during authentication time.

alternatively, you could bcrypt all existing hashes now, and anytime you authenticate, MD5/SHA-hash the plaintext password before checking it with bcrypt.

legacy code and especially authentication code that has huge exposure (code path hit during every login and potentially every session auth) is difficult/risky to change once deployed. making things "more secure" has always been a hard sell to management... until a disaster like this happens!


In Django, as I recall, you just check for a hashing indicator that's prefixed to the hashed password, and do something like this on a user's log-in:

    if hashed_password.startswith("sha$"):
        hashed_password = bcrypt(hashed_password)
(or `... = "bc$" + bcrypt(hashed_password)`. However it's done.)

Here is the relevant code for django-bcrypt: https://github.com/dwaiter/django-bcrypt/blob/master/django_....

In your case, you could probably do this:

    if not hashed_password.startswith("bc$")\
       and sha(entered_password) == hashed_password:
        hashed_password = "bc$" + bcrypt(entered_password)
You don't have the prefix identifier, but that's okay; you just roll out an equivalent now instead, so you only have to check the start of the hash string and do the conversion, if it hasn't already been performed.

Of course, you have to account for the prefix identifier when validating an entered password against the stored hash.

YMMV.
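For what it's worth, a hedged sketch of that prefix dispatch, using the hypothetical "bc$" marker from above together with the third-party bcrypt package and hashlib (the legacy scheme here is assumed to be unsalted SHA-1 purely for illustration):

    import hashlib, hmac, bcrypt   # bcrypt from the third-party 'bcrypt' package

    def check_and_upgrade(entered_password, stored):
        # returns (is_valid, possibly_upgraded_hash); caller persists the second value
        if stored.startswith("bc$"):
            ok = bcrypt.checkpw(entered_password.encode(), stored[3:].encode())
            return ok, stored
        # legacy record: verify the old way, then re-hash with bcrypt
        legacy = hashlib.sha1(entered_password.encode()).hexdigest()
        if hmac.compare_digest(legacy, stored):
            upgraded = "bc$" + bcrypt.hashpw(entered_password.encode(), bcrypt.gensalt()).decode()
            return True, upgraded
        return False, stored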


On a quick revision, the first code should read

   hashed_password = bcrypt(entered_password)
Not `bcrypt(hashed_password)`.


We did it on Clojars, the Clojure library source, without much trouble:

https://github.com/ato/clojars-web/compare/68872652fc427cc1....

We had a month or two grace period in which anyone logging in would have their password upgraded to bcrypt automatically, then wiped the SHA1s.

https://groups.google.com/forum/#!msg/clojure/Xg1I0rgt85s/Vf...


Could you use e.g. bcrypt(SHA1(x))? I think that should be OK, and avoids the issues with switching over current users (just take the bcrypt of the currently stored passwords).


It is always possible to apply additional hashing on top of the existing MD5/SHA value. First separate out the old salt, then apply bcrypt or scrypt to the old hash, and store both the new salt and the old salt alongside the new hash. Validating passwords will then require two steps: first hash the entered password with the old salt, then apply bcrypt on top.
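A minimal sketch of that wrap-the-old-hash approach, assuming the legacy scheme was MD5(password + old_salt) and using the bcrypt library (which handles the new salt internally); names are illustrative:

    import hashlib, bcrypt

    def upgrade_stored_hash(old_md5_hex):
        # one-off pass over the existing column: wrap each legacy digest in bcrypt
        return bcrypt.hashpw(old_md5_hex.encode(), bcrypt.gensalt())

    def verify(password, old_salt, wrapped):
        # step 1: recreate the legacy digest; step 2: let bcrypt check the wrapper
        legacy = hashlib.md5((password + old_salt).encode()).hexdigest()
        return bcrypt.checkpw(legacy.encode(), wrapped)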


I liked the article and it was a nice little afternoon read, but the whole thing could have been condensed to "use bcrypt for passwords".

I'm not really sure where to stand on this. On one hand, we have PLENTY of security articles stating the same thing (bcrypt, bcrypt, and just in case you've forgotten... bcrypt), which leads to an observed oversaturation of the same subject matter. On the other hand, we have a huge company like LinkedIn that doesn't have the presence of mind to use something other than vanilla SHA-1. Maybe there's just too much laziness/stupidity in the world, requiring a constant barrage of the same security articles every week.


Disagree. The point of this article as I understood it was not to provide short advice, but to provide a comprehensive explanation of the problem and why the 3 listed solutions are good ones.

People often ask "why use bcrypt", and the response is that they should google it. If you look through the first page of google results for [why use bcrypt], though, none have a good discussion of the reasonable alternatives, or when you might want to use one or the other.

Coda Hale's post has a pretty good explanation of why bcrypt is good, but I personally find this to be more in-depth.


I imagine things like this are a result of the account-management part of LinkedIn being built years ago, when we thought plain-jane SHA-1 was OK to use, and then they just never got around to moving to bcrypt.

Ironically, the algorithm to upgrade to bcrypt is simple. Add a flag to the account table indicating whether they've upgraded or not. The next time the user signs in successfully, re-hash their password with bcrypt, toggle the flag, and update the password_hash value in the database.


I agree there are a ton of articles saying "Use bcrypt." After Coda's post (http://codahale.com/how-to-safely-store-a-password/) it's almost become a meme. I don't, however, think that the people who say "Use bcrypt!" tend to explain why they say that.

I think the reason that this happens so often is that regular developers just don't care. But that's because they don't know why they should care. Given a proper explanation (and an attention span longer than "Squirrel!"), any reasonable developers would (at least, should) care.


Indeed. There's a developer bubble around Hacker News and websites like Stack Overflow, where topics like bcrypt have become second nature. Standard developers, i.e. those who program solely as a job, don't inhabit these places.


How about this: Don't store passwords at all.

There are a multitude of sophisticated third-party solutions to authentication. Facebook, Twitter, and Google all offer competent solutions. Don't like those? Use BrowserID.

Integrating any of these is actually quite a bit easier than rolling your own solution. It reduces hack risk, provides a better experience for your customers (what was my password again?), and almost certainly will be more reliable than your website.


OpenID and OAuth really did a lot, but there's just no such thing as "don't use passwords." Fingerprint readers suck. Anything biometric that doesn't suck costs too much, and 99% of people don't have it. A good KDF is not bad in comparison to a centralized authentication server, considering other factors.

Someone, somewhere will be storing user passwords/digests for the foreseeable future. And they will do it incorrectly.


Sure, but the number of those people should become vanishingly small over time.

HN is full of web developers rolling unnecessary username/password solutions. The fact that this is such a hot issue - as opposed to esoterica like TCP frame size - shows that far too many developers are homebrewing solutions rather than outsourcing.


I agree, but "outsourcing" includes using libraries written by people who know what they're doing. (And not using libraries written by people who know what they're doing, but which are the wrong tools for the job.)


This naively assumes that your entire userbase uses those services and would like to associate their Google (et al.) account with your service. This may not always be the case.

Someone has to store the passwords; it would be good if there were a way you could be assured your data at rest was safe.


If you believe your userbase will object to Google/Twitter/FB, use BrowserID.

Personally, I never feel particularly secure when typing passwords into text boxes on random PHP forms. On the other hand, I feel fairly confident that the folks at FB, Google, Twitter, and Mozilla know how to store a password and secure their infrastructure.


On the other other hand, I don't really like the idea of 1) tying all my accounts to Google (what happens when I want to get rid of my Google account? Or my Google account gets locked?) or 2) having Google / FB be aware of other websites I log into.


BrowserID specifically addresses these concerns.


What's the browser support level of BrowserID at this point? How does BrowserID transfer between the different browsers I use? Can I log in using BrowserID on my phone?


It's good. I use it (dual-auth with FB) in a production system (https://www.voo.st/) and it works great from every browser I've tried so far, including iOS and Android.

The biggest UX issue is currently the up-front email roundtrip for new accounts. In the long run this experience will improve considerably when primary identity providers support the protocol directly. In the short run it's still not bad.


I used to hear some controversy with regards to "stretching." The argument back in the day was, "it's partially security through obscurity, but the danger is that there isn't research to prove that a hash of a hash is cryptographically strong."

So is there research that proves that hashing a hash of a hash of a hash (x100000) doesn't result in a smaller range of values than a single hash for SHA algorithms? Is there no such convergence?


Stretching isn't "security through obscurity". It's "security through increasing the attacker's cost by a huge amount while increasing your own cost by a minimal amount".

But don't use stretched SHA1. Use bcrypt or scrypt or PBKDF2, all of which explicitly address this particular concern.


An even better way of securely storing your passwords would be to mix them around in some unique way before they go into your bcrypt hash function, making it impossible to brute-force your leaked password hashes without also having access to the code that did the mixing.


So something like a HMAC digest generated using a pepper stored in the source code/binary or on disk before passing it to bcrypt/scrypt? :)

This only really protects against SQL injection attacks, though/when there is actually a separation between where you store the bcrypt digests and where you store the pepper. (Granted, there are a lot of SQL injection attacks.)
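A sketch of that construction, with the pepper assumed to live in config or a keystore rather than the database (names and storage are illustrative):

    import hmac, hashlib, bcrypt

    PEPPER = b"loaded from config or a keystore, never stored with the hashes"  # illustrative

    def hash_password(password):
        pre = hmac.new(PEPPER, password.encode(), hashlib.sha256).digest()
        return bcrypt.hashpw(pre, bcrypt.gensalt())

    def verify_password(password, stored):
        pre = hmac.new(PEPPER, password.encode(), hashlib.sha256).digest()
        return bcrypt.checkpw(pre, stored)

The HMAC step also conveniently keeps the input to bcrypt a fixed 32 bytes, comfortably under bcrypt's 72-byte limit.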


Exactly. What's most common with these things is that the DB was dumped, which does not imply that the source code was accessed. If the source code was accessed, they normally don't need a DB dump (unless it was read-only).

The first section of the article, IMHO, was not needed with regard to a simple hash. Forums have been hashing their passwords with salts for how long now?


Sure, but I try not to make any assumptions while also not being boring. I think the goto link at the beginning works fairly well.


If you happen to have a web app that stores passwords in clear text or SHA-1 hashed, all is not lost. You can apply further secure hashes to the existing value stored in the DB and update your authentication validator.


Or do the easy thing and perform a Chinese fire drill whenever the user logs in, bumping them to the new hashing scheme.

Or invalidate and send an email.


Interestingly, when reddit upgraded to bcrypt, they refused to do this because so many users don't know their passwords and it would lock lots of people out of their accounts forever (remember, reddit doesn't require email addresses to register).

http://reddit.com/r/changelog/comments/lj0cb/reddit_change_p...


Good on them for knowing their users?


Excellent reading - I spent countless hours researching all this stuff for a project a few years back. After my research I came up with a protocol very similar to SRP, but I find that SRP is a nice POC for both protocols.

One thing I didn't find a solution for is keyloggers and other similar attacks. If you look at securing your service as a whole, you have to acknowledge the risk of keyloggers too. Now with Flame, Stuxnet, and all the other nice things still in the shadows, keyloggers can suddenly become a large-scale risk as well.


I was just testing with www.Leakedin.com for a possible breach of my LinkedIn password. I entered my password and it said it was leaked. But when I entered a random value, like dsfsfgfdsgsd, it said hoorah, not leaked. So the onus is on users who don't want their passwords to be leaked, rather than depending on someone else to keep them safe.


OK, so I get the message. Use bcrypt. Don't worry, that's what I'll do in production.

On the other hand, if it's so hard to roll your own, can somebody point out the security flaws in the given Python function? Seems pretty straightforward to my untrained eye.


Although it's not that much if you're using a long hash, iterating a hash function causes you to lose entropy. The implementation in the article is basically PBKDF1. PBKDF2/HMAC avoids this by using an "outer" and an "inner" layer.


The check of the hash against the input uses ==, which will short-circuit and return quicker as you guess the leading digits, allowing you to figure out what the hash is (i.e. a timing attack).


Try to explain how you would actually conduct that timing attack to see why it isn't one.


If you are going to go through all the effort to do it properly, you might as well use a proper comparison function. If nothing else, it reinforces the knowledge that string comparisons can be part of security (which goes overlooked by many).
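And in recent Pythons there's little excuse not to: the standard library ships a constant-time comparison (hmac.compare_digest), so the "proper" version is a one-liner:

    import hmac

    def safe_equal(a, b):
        # runs in time independent of where the first differing byte is
        return hmac.compare_digest(a, b)

    # e.g. compare the computed digest against the stored one with
    # safe_equal(computed_digest, stored_digest) instead of ==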


I'm not sure I agree that it would matter, but, either way, using a constant-time equality function might have given readers the impression that my code was safe to use. It isn't. That was never the intention. One of my main points was that it's extremely hard to do properly. My (pseudo-code) examples were intended to explain the concepts of salting and stretching. Perhaps it's unfortunate that they're actually valid Python.

People should use proven KDFs for password authentication, not implement their own (including using my SHA-salting/iteration examples.)

Edit: removed "in web apps"


Many timing attacks are viable in web applications. But there aren't timing attacks against password hash comparisons, for obvious reasons.


You're right. That was an unfortunate choice of words.


The == operator is a proper string comparison function in this setting.


Assuming all the steps in the article are followed, yes, you are correct.

I still think any article talking about verifying credentials is obligated to mention that string comparison could be an attack vector.

Like I said, it plants a seed. And, I've seen way too many naive implementations where it is needed (like simple token-based auth systems) to know that this seed needs to be spread a lot more.


No, not assuming that. It has nothing to do with the specific steps. To see why, try to imagine a scenario where there could be an operator== timing attack on a password hash.

It feels sometimes like people hear about the idea of timing attacks and then want to see them everywhere.


> try to imagine a scenario where there could be an operator== timing attack on a password hash

Sure, that's simple. You're leaking information about the password hash to the attacker, which they can use to speed up an attack where they have large offline resources but are limited in their online guessing capacity.

Let's e.g. assume we have an unsalted hash, but the system limits us to one guess per second. The password is hashed, then compared with ==, taking some variable time we'll assume can be measured.

We use an iterated timing attack to discover a prefix of the password hash, then run an offline dictionary or brute force attack using that prefix. Passwords that match the known part of the hash are tried online, potentially revealing more of the hash as we go.

> It feels sometimes like people hear about the idea of timing attacks and then want to see them everywhere.

When it comes to side channel attacks, "vulnerable" is the default state.


Does this sound remotely plausible to you? In all of /usr/share/dict/words, there are four (4) 4-byte prefix collisions of SHA1 hashes - 0.0016% of all the entries. That 4-byte prefix is just 1/5th of the total size of the SHA1 hash.


Why, yes, it's eminently doable. Even a single byte will let you discard 255 out of 256 attempts, and as you kindly point out, 4 bytes would be enough to uniquely identify almost any word in the standard dictionary.

Feel free to peruse some SHA-256 hashes with slightly above six bytes of fixed prefix - several orders of magnitude more difficult to find than a mere 32 bits partial collision: http://blockexplorer.com/

Now you can argue about how likely it is for an attacker to actually bother to find and exploit such a weakness, and how a salt would mitigate the severity and so forth, but the takeaway here is that this attack can be trivially and permanently defeated by using a timing independent comparison function.


The "timing leak" here does not uniquely identify any word in the dictionary; the attacker is on the opposite side of the problem. This attack is implausible.


I'll make one more attempt. I don't think I can explain it any clearer than this:

http://pastebin.com/MYT9kpgZ

Here I model the server as a simple function that takes a password, hashes it, and compares it to a known digest with an '==' substitute. The function returns true or false, but also leaks information about how long the match was, through a simulated timing leak.

This lets me identify the dictionary word that was used as a password using just 23 login attempts, instead of the expected 19300 from a brute force attack.

In effect, you're giving the attacker the ability to perform an almost offline attack, as if they possessed the password hash, by supplying however much of the hash they need.

I don't see how this would not be a security problem, if you acknowledge that testing a HMAC with == is.

Again, this is trivially preventable by using a timing independent comparison.


No, I understand the attack you're proposing. I just don't think it works, or, for that matter, buys you much of anything.

Some things a reader of this thread would want to know, to make sense of it:

* C memcmp (which is what Python uses) is below the known measurement floor for remote timing.

* With (many) repeated trials and statistical filtering, the floor is (IIRC, but just Google [crosby wallach timing]) hundreds of nanoseconds on a LAN, tens of microseconds on a WAN.

* The difference between 1 and 2 bytes of matched SHA1 hash is certainly not a millisecond.

* Even if it was a millisecond, which it isn't, you'd come to that conclusion only after many repeated trials.

* To make this attack appear worthwhile, you introduced an artificial rate limit of 1 attempt per second, but obviously ignored that limit when trying to measure the (very noisy) timing of each hash byte.

* Obviously, you can't time a randomized hash, because you don't know enough to generate proposed password hashes to match against.

* The 4-byte shared prefix you're looking for is probably not present in the set of all valid passwords; or, put differently, you'd have better luck just guessing likely passwords through repeated trials than you would making repeated trials to find a next prefix byte.

I'm not disputing that there's information of some sort "leaked" in a password hash comparison; I'm just disputing that it's valuable in any way to a real world attacker.

A more effective way to try to launch this attack would be to try to dump the set of all users known to the system. This attack, far more straightforward than the one you proposed, also doesn't work because of measurement difficulties, but it at least has value in theory, doesn't depend on arbitrary shifting obstacles for the attacker, and nobody in the history of web app development has ever tried to stop it.


It was a toy example; it was not meant to capture the exact details of an attack.

I didn't ignore the time limit. As long as login attempts are much slower than offline brute-forcing, all that matters is whether we need fewer attempts in total. Since it lets you cut your attempts by some fraction at a corresponding constant cost, it always wins out for a large number of passwords. In the toy example it's about a wash if you assume 1000 samples per 'attempt'.

I won't examine each of your plausibility objections, but I'll note that the case is identical when checking an HMAC. You shouldn't rely on those assumptions if you can avoid it.

Simply put, using a timing independent comparison is best practice both for checking MACs and password hashes, and I think it is wrong to dissuade people from doing so.


Then tell people to use secure_compare for everything, because virtually all web applications have much worse leaks than the time it takes to compare password hashes. The username example is far more potent in a real-world attack.


Thanks for this example.

I wasn't trying to be condescending when I asked how it was significant. I really didn't understand what you meant.

I've added a note to the article.


How does this matter given a proper avalanche effect?


Nm. Got it.


I believe what tptacek is trying to say is that an attack on the == operator is not a timing attack but a brute-force attack. Sure, the function may return quicker, but it doesn't reveal any more information, and it amounts to a brute-force attack on data returned from a hash function.

For other readers: the == operator is comparing a password hash, as opposed to a password itself. One of the properties of a secure hash function is that changes to the input drastically change the output. Thus this attack is one of brute force, and not a timing attack on the == operator.


Changing operator== to secure_compare or some analog doesn't change this construction's resilience against attacks. This is very much unlike the situation with, for example, an HMAC verification.


To the majority of users, including me, that idea is actually novel, and the novelty itself demonstrates quite well the hidden corners that are lurking.

I believe that point was well made, even though, in a technical sense, the exact error was unfounded.


Why does everyone need to be an expert in password generation and storage? Would password-as-a-service be feasible, or even authentication-as-a-service? (And I do not mean FB or Twitter auth.)


Everyone doesn't, and shouldn't. That's why the best practice is not to implement your own hash function, and instead use a third-party library written by an expert.

I'm not sure that password-as-a-service would be worth the overhead involved, but password-as-a-library is functionally equivalent from a developer's perspective and is already the norm. The only question, then, is "which library?" which is what this article attempts to address.


I've had good experience using KeePass: http://keepass.info/

It is open source and supported by all the platforms I use (Windows, OS X, Android, iOS), and the interface is pretty well designed. Once the local database is open, simply doing Ctrl+C on any of the sites copies the password to the clipboard for a very limited time.

This is still a major pain, though, especially since you need to protect the safe with a long password, which is particularly painful to type on mobile devices.


bug in the code.

return getDigest(password, salt) == digest

getDigest returns a tuple


So much for white-boarding it. Fixed, thanks.


great article!


> MD5, SHA-1, SHA-256, SHA-512, et al, are not "password hashes." By all means use them for message authentication and integrity checking, but not for password authentication.

Bullshit. MD5 is just fine, as long as you use the salt.

Here, hack this:

    MD5(password + salt) = "b520542710812f347432232b2a1fba83"

    salt = "MD5 rules"


I mean this in the most polite and professional manner possible, please take five minutes and read http://codahale.com/how-to-safely-store-a-password/ .

It will explain why "salts are useless for preventing dictionary attacks or brute force attacks."

The entire article is excellent - and every colleague who I've ever pointed at it, has come away nodding their head and seeing the light.

The key-takeway (but please, read the entire article) is: "It doesn’t affect how fast an attacker can try a candidate password, given the hash and the salt from your database."

Salts only help you from precomputed dictionary attacks ("Rainbow Tables") - but, if someone is brute forcing you, the value of a salt just disappeared.


So here is a question that I've asked and never received a good answer to: why doesn't the size of the salt matter? Wouldn't the salt be used in every hash computation? Wouldn't a large (megabytes) salt slow down this computation and require more memory to perform it? I'm not advocating the use of a large salt as opposed to specialized functions; I'm just curious as to why it doesn't work. The article doesn't explain that.


It's not that the size doesn't matter; it's just that it's not as significant. It becomes very, very hard to compute rainbow tables after just a few random bytes. No matter how long the salt is, it doesn't do anything to prevent somebody from trying to guess the original input for a given digest using a brute force approach, though. So usually the salt has less than e.g. 256 bits of entropy just because it takes up less space.

Sure, a very large salt might slow down the first iteration a little (but not necessarily subsequent ones, and it wouldn't require more memory, at least with most hash functions), so you're almost always better off just stretching the key--then you save the storage costs too.


I will believe you when you hack that password.


    $ echo -n "Spiderpig1MD5 rules" | md5sum
    b520542710812f347432232b2a1fba83  -
Thus the password here is "Spiderpig1"

MD5 is broken.


MD5 is not fine. MD5 is very, very fast[1], and as such it's possible to simply brute-force passwords given relatively modest computing resources. The space of passwords just isn't that big.

Use bcrypt or scrypt. Don't make up your own crypto.

[1] Crypto benchmarks: http://www.cryptopp.com/benchmarks.html


MD5 is 'fine' for very long passwords. How many of your users have very long passwords? MD5 does nothing to make it hard to blaze through billions of possibilities.


Theoretically and even practically, MD5 has other issues. It is susceptible to collisions:

http://www.mscs.dal.ca/~selinger/md5collision/


HAHA...

Maybe Linkedin should've used this ;-)



