Hacker News new | past | comments | ask | show | jobs | submit login
A possible flaw in open-source bcrypt implementations (rondam.blogspot.com)
124 points by lisper on June 14, 2011 | hide | past | favorite | 58 comments



So, he's right in that there seems to be a flaw with respect to the base64 encoding, but he's wrong about how much. It's one byte of hash that's lost when you add the padding back (it decodes to 184 bits), which is explained by the following code:

        encode_base64((u_int8_t *) encrypted + strlen(encrypted), ciphertext,
            4 * BCRYPT_BLOCKS - 1);
It does indeed look like it's cutting a byte off the ciphertext when it's encoded.

Edit: I think I know what's going on with the key length, too. It just so happens that len(base64.decodestring('x'72)) == 54 -- just below the key size limit. But I don't see where the key is getting base64 decoded in the code. Digging...

Edit #2: The b64d('x'72) == 4 behavior I saw was coincidental. What happens is that subkeys can be up to 72 bytes long but will not affect all bits of ciphertext. This is valid and by design:

    /* Schneier specifies a maximum key length of 56 bytes.
	 * This ensures that every key bit affects every cipher
	 * bit.  However, the subkeys can hold up to 72 bytes.
	 * Warning: For normal blowfish encryption only 56 bytes
	 * of the key affect all cipherbits.
	 */
Edit #3: The hash truncation occurs in the reference implementation as well. This is in disagreement with the paper, and reduces the keyspace by ~4.16% -- that's ungood. Wonder why that was?


Actually, it reduces the keyspace by 99.6%.

As to why, I've never encountered a software implementation that precisely matched its design paper. Usually has a lot to do with publication deadlines. Papers don't change, software does.


Sorry, my math was initially off -- the keyspace is reduced to ~99.6%, or a drop of ~0.4%.


No, the upper byte of key should have 256 possible values. It now has one (all zeroes, essentially). Keyspace is reduced to 1/256 of original, a reduction of 255/256.


Only for that byte -- the overall keyspace is 184 bits instead of 192 bits.


I think his point is that 2 raised to 192 is 256 times larger than 2 raised to 184, meaning that you are getting 1/256 as many possible keys by reducing the key length by 8 bits.


Ahh, that's a good point. Thanks for clarifying.


> len(base64.decodestring('x'72)) == 54

Yeah, I thought about that. But I tested it and that wasn't it:

>>> hashpw('x',s) '$2a$05$YTnGVPqoxIgfmWrHAxHqhegbOHvVmK8y9bvOwNlOUVEruX9NjPBJy'

>>> hashpw(chr(ord('x')+128), s) '$2a$05$YTnGVPqoxIgfmWrHAxHqhesvRgqg0bHauZJ3SrAc2oRwZ3XF46hE2'

> subkeys can be up to 72 bytes long

That makes a lot more sense.


53 characters is certainly an "odd" length for a base64 string, as base64 uses four characters to encode three bytes.


Fixed. Thanks.


I'm the py-bcrypt author. py-bcrypt is a thin wrapper around the original reference implementation of bcrypt from the authors of the paper describing it (Provos and Mazieres). The discrepancy in hash length is totally harmless (adding something like 2^-186 additional likelihood of collision) and was present in the reference implementation. I don't know exactly why the hash is slightly truncated, but I guess David or Niels thought that 60 character hashes were a more manageable length.

The author of this assinine blog post only contacted me a few days ago and obviously couldn't be bothered to wait for a response before proceeding to imply malicious intent for what is clearly a trivial difference between academic paper and practical implementation. I hope he retracts it.


> The discrepancy in hash length is totally harmless (adding something like 2^-186 additional likelihood of collision) and was present in the reference implementation.

Assuming that this was not intentional -- and considering that the paper clearly supports this -- then it is not totally harmless. The additional likelihood of collision is increased by ~4.16% (8 / 192) which could be a Big Deal (TM) given the right circumstances, although it's still infinitesimally small.

But what worries me more is that you say "this assinine [sic] blog post". The author of the post found what could be an issue, even if it's not in your code, and you write it off completely. That is simply not an appropriate way to handle issues with cryptographic software.


It was an asinine blog post, Cody. It strongly implied that the author of py-bcrypt intentionally sabotaged it. And, I disagree with your security analysis. Truncating an otherwise secure 192 bit hash to 185 bits is not dangerous. You are talking about percentages of quantities denominated in atoms-in-the-planet-core.


It was an asinine blog post, Cody. It strongly implied that the author of py-bcrypt intentionally sabotaged it.

The guy found a design bug/implementation discrepancy in a bit of software with high security requirements.

He didn't know how to evaluate the security impact or what to do about it. But he knew enough to know he should take it seriously. This is commonly referred to as "being paranoid" and is a natural reaction given the situation. Commendable even.

How many people before him spotted this and didn't speak up because they (rightfully) figured they'd be ridiculed by the OpenBSD hyenas?

And, I disagree with your security analysis. Truncating an otherwise secure 192 bit hash to 185 bits is not dangerous.

I agree, especially given the relative sizes involved. But until recently there was debate about the safety of truncating hash values in general. Noting that NIST didn't specify any form of truncated SHA-1, people would say there was no guarantee that the "entropy was evenly distributed".

You are talking about percentages of quantities denominated in atoms-in-the-planet-core.

This is a common fallacy. There is no relationship between the cardinality of the possible hash values and the number of atoms in some planetary body.

In many (perhaps most) contexts, the critical security factor is the square root of that number. Shall we then look for another smaller planetary body with which to consider that number?


From where do you get the idea that the "OpenBSD hyenas" would ridicule you for finding a security vulnerability in OpenBSD? The only thing I've ever been ridiculed for by Theo is not finding bugs in OpenBSD.

You can get advice about truncating SHA512 from Furguson and Schneier; that book is 5 years old. Since we appear to agree about the important bit, I'm disinclined to argue about how many atoms are in the core or whatnot. I fully own up to not actually having counted them. :|


From where do you get the idea that the "OpenBSD hyenas" would ridicule you for finding a security vulnerability in OpenBSD?

Seriously? One doesn't have to hang around those lists long to see how they treat those they disagree with.

You can get advice about truncating SHA512 from Furguson and Schneier; that book is 5 years old.

Sure, but that wasn't written in the late 1990s when bcrypt was presented and we certainly can't expect the blogger to have read up on the modern advice. Bcrypt doesn't even use SHA-512, it uses the block cipher Blowfish in a suboptimal custom mode to construct its own 192-bit narrow-pipe hash function. When Microsoft rolled their own password hash functions with DES just a few years earlier (LanMan, NTLM) it was thoroughly pwned by Schneier, Mudge, Wagner, L0pht, etc. But you know this!


I'm not an OpenBSD fan, Marsh. I'm close to the opposite. And I'm calling BS on this. I'm sure the OpenBSD people aren't nice to you, but it's probably not because you found flaws in OpenBSD code. I wasn't being flippant; the only thing Theo ever fucked with me over was not finding enough bugs.

As for the last part of your comment, we appear to be searching for reasons to disagree with each other, when in fact we agree that this finding has no impact on the security of bcrypt. Let's not find artificial reasons to disagree.


Yeah I've been off and on the OpenBSD lists a few times over the years and have said both really dumb stuff and insightful stuff. I'm not so much referring to any specific instance, just saying their reputation for being cranky and unwelcoming is well deserved.

I was wrong about bcrypt being a 192-bit narrow-pipe hash. While some parts of it are, reading the paper again the real work happens in the key expansion function having an internal state size of 33344 bits.


He didn't know how to evaluate the security impact or what to do about it. But he knew enough to know he should take it seriously. This is commonly referred to as "being paranoid" and is a natural reaction given the situation. Commendable even.

No, commendable would have been waiting a reasonable amount of time for the author to respond, and not including suggestions that the bug (that they didn't analyse the impact of) was an intentional backdoor.

... ridiculed by the OpenBSD hyenas?

Cheap, ad hominem, etc.

Noting that NIST didn't specify any form of truncated SHA-1

Not SHA-1, but SHA-224 and SHA-384 are truncated hashes specified by NIST.

In many (perhaps most) contexts, the critical security factor is the square root of that number

Not in this case, unless you have corpora of ~2^80 bcrypt hashes and want to find a collision with one of them.


I agree that implying the creator of py-bcrypt was malicious is asinine (although that was apparently not his intention), but dropping 8 bits off the hash could be dangerous. When you're talking about an attack against a perfect hash function, it would mean nothing due to the large size, but in conjunction with theoretical flaws in bcrypt it could be a big deal.

All that said! Bcrypt is still secure by all means and better off than damn near anything else out there.

Edit: note, such theoretical flaws in bcrypt have not even been proposed. Didn't want to make this seem worse than it is.


The sad thing is, the finding isn't asinine. It's truly interesting (though practically irrelevant, except that it appears to have rendered the ref implementation of bcrypt incompatible with Mazieres' paper).

But instead of spending time talking about the interesting finding, this blogger clubbed their own reputation to death on it. I'd like to presume they were just spectacularly bored, but to steal some of the author's own words, that's "vastly too big a discrepancy to be explainable by a simple inadvertent bug".


Maybe we read different posts, but I didn't get anything like an implication that the author of py-crypt intentionally sabotaged it. What I got was more of a "I'm not taking anything for granted because that would be a good attack vector."

I'll leave the rest of the analysis as to the effect of the problem to your much-more-knowledgable hands.



> It strongly implied that the author of py-bcrypt intentionally sabotaged it.

No it didn't. And I quote:

"Except that the problem is not in the python wrapper."


It strongly implied that the author of py-bcrypt intentionally sabotaged it.

It explicitly says very much the opposite, pointing out that the problem isn't the wrapper. I can't understand how you could read the blog post and come away with such an impression.


For the record, I did not mean to imply that py-bcrypt was malicious. All I intended was that I think it's prudent to understand discrepancies like these because they could some day be an indication of something malicious. I apologize for the misunderstanding.


From your post:

OK, so maybe someone accidentally introduced an off-by-one error into the python wrapper. Except that the problem is not in the python wrapper. You can find bcrypt test vectors on the web, and they are all 60-byte strings.

... and:

That is a very big discrepancy between the actual behavior of the code and the description given in the literature. It's vastly too big a discrepancy to be explainable by a simple inadvertent bug.

.... and:

Now, some people might say I'm being excessively paranoid, but I don't think so. The higher the stakes in the internet security game get, the more incentive there is for attackers to try all kinds of sneaky and nefarious tricks to introduce weaknesses into people's defenses, and one of the easiest ways to do that is to publish some plausible-looking open-source security code that actually has a hidden weakness built in to it [!!!] and hope that nobody notices.

Give me a break. And lest the tl;dr's think I'm cherry picking: this stuff is the bulk of the post.

Ed: fixed ems.


I can see how someone could read that as implying that I thought the author of py-bcrypt was being malicious. All I can say is that wasn't my intent. My intent was that I didn't think it was being overly paranoid to try to understand what was going on, whatever it was.


It's funny how we're looking at the same text and arriving at opposite meanings.

Let me highlight some things from your quotes.

1. "Except that the problem is not in the python wrapper"

- This directly says that the Python author is not being malicious---how can he be since he isn't at fault for the error?

The essayist continues to explain why the Python author is not at fault.

2. "You can find bcrypt test vectors on the web, and they are all 60-byte strings."

- This bug is wide-spread.

3. "That is a very big discrepancy between the actual behavior of the code and the description given in the literature. It's vastly too big a discrepancy to be explainable by a simple inadvertent bug."

- The bug is too large to be an inadverdent bug introduced independently by every author.

4. "Now, some people might say I'm being excessively paranoid, but I don't think so. <snip>"

Here the author is only defending his rationale for looking for bugs in an open source project. He isn't implying that he has now found one that shows the authors are malicious.


Technical question: you want to emphasize the em-ed text or the plain text? I'm not sure, because it seems to me that the text you left unformatted is more relevant in this context that em-ed one.

Edit: thx :).


I guess David or Niels thought that 60 character hashes were a more manageable length.

That's a pretty weird design decision to use a 128 bit salt and then chop bits off the actual hash value to make the result a "more manageable" length.

The 128 bit salt wastes 4 bits in the base64 encoding. The 31 character base64 discards 8 of the 192 bit output bits (31/4*24 = 186).

If they'd just used "only" a 126 bit salt they could base64 encode it in 21 chars with no wasted space. That would allow them to store the full 192 bits in 32 chars with no wasted space.

So they threw away 8 hash output bits in order to save 2 salt bits.


First of all, thank you for py-bcrypt. Thanks to your hard work and generosity, the tiny sliver of the internet that I manage is a more secure place.

Now for the request (sigh, you knew it was coming): at my day job, I have to develop on Windows and it's not feasible to install Visual Studio 2008 Express on my web server. Is there any other practical way to get py-bcrypt running in that environment?


Maybe you could install just the VS 2008 runtime libraries? http://www.microsoft.com/downloads/en/details.aspx?familyid=...


Thanks for the suggestion. I tried this, but the install script still returns the same error: "Unable to find vcvarsall.bat".


Hm, that's actually a compiler error. If you can't get the VS compiler installed easily I think http://blog.eddsn.com/2010/05/unable-to-find-vcvarsall-bat/ is your best bet.


I'd like to help, but I have no idea how to build Python modules for Windows any more. I once figured it out by following some guide that was linked to from python.org, but that broke when more recent versions of Python required more recent compilers.

If you can create a build environment on your workstation, IIRC there is a bdist_win (or similar) setup.py target that will build a .msi package that you can load on your webserver. That's about the limit of my Windows Python knowledge right there :|


Interesting investigation, I would like to see what the response from the py-bcrypt author is (since py-bcrypt is what I use in my applications).


I just started using py-bcrypt for a Flask extension I wrote. I was told there's a couple of other modules out there that support bcrypt but I was trying to find a reason to use them over py-bcrypt as it seems to be the de facto standard. Maybe this is such a reason? iirc cryptacular and bcryptor were the modules that were recommended to me.


I didn't really like cryptacular that much. I don't really know why one character of the hash is being taken off; I remember curiously wondering why that was in there when I was fiddling with the source to get it to compile on Windows.

I also don't know if that would pose as a significant security threat - sure, you would be taking one character off of the number of characters that need to be brute forced, but it is only one. I'm not informed enough to give an accurate opinion.

I do know, though, that jumping to conclusions before a thorough explanation is provided is silly... Hence why I'm not suddenly jumping to the use of cryptacular or others.


I'm guessing it would make brute forcing about 40 times easier.


Which is a statement not unlike talking about making it 40 times easier to travel to the Andromeda galaxy.


Indeed.


Bear in mind that the likelihood here is something like 2^-186


It would make it 256 times easier, reducing the keyspace by ~4.16%.


Reducing the log of the keyspace by 4.16%, which is not a particularly interesting measurement.

Reducing a key by 8 bits reduces the key space by 255/256, regardless of key size.


I don't see how that can be, given that the character that's chopped off can take one of 40 values... Or is it case sensitive? Still, nowhere near 256.


It's not b64 encoding the final byte of the hash, not dropping a b64 character.


Ah, I see. You are correct, then. Also, that would make it 64 times, if I were right, it seems.


I suspect Solar Designer wrote the implementation for John the Ripper with the goal of cracking passwords, not validating them properly. For this purpose, slightly-truncated hashes should work just as well (maybe slightly better).

If Openwall and py-bcrypt are using JtR code for actually validating them, that's a questionable bit of software engineering. JtR may not be doing the same type of input validation that one would want in your authentication code. More evidence for this suspicion is that the input length disparity the blogger Rondam describes.


No, py-bcrypt uses the reference implementation from OpenBSD.


Has anyone looked at the length of the strings on OpenBSD?


Yes, py-bcrypt produces identical hashes (it was intended to be compatible)


Very suspicious indeed.


It'll be troublesome to fix this since any change will invalidate all existing passwords.


Fixed and unfixed versions, no?

I usually include a verbose, independent scheme string next to encrypted db columns, so a) data is self-documenting for future owners, providing fair detail to work around forward breakage/compatibility and b) have multiple methods living in the database during upgrades.


Does "verbose, independent scheme string" mean:

Have an extra column describing the hashing scheme used for the password?

Isn't that what the $2a$12$ is for?


No, that minimum is exactly what will fail in this case.


It isn't worth fixing this - the likelihood of a hash collision is infinitesimal




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: