Google confirms Java and OpenSSL crypto PRNG on Android are broken

tptacek · on Aug 15, 2013

Here again I will posit:

Secure programs should rely on /dev/urandom, to the exclusion of other CSPRNGs, and should specifically eschew userland CSPRNG engines even when they're seeded from /dev/urandom.

This Android bug is another in a line of bugs in userland CSPRNGs, the most famous of which was Debian's OpenSSL CSPRNG bug which gave up everyone's SSH key. CSPRNGs have bugs. When you rely on a userland CSPRNG, you're relying on two single points of failure: the userland CSPRNG and the OS's CSPRNG. Failures of either aren't generally survivable.

There are people who are smarter than me (notably 'cperciva) who disagree with me on this, but this is an idea I got from DJB's NACL design paper; it's not my own crazy idea.

I got to eat dinner with Dan Bernstein the other day, by the way, and it turns out NACL is pronounced "lasagna".

H3g3m0n · on Aug 15, 2013

Except if /dev/urandom is using a hardware based random number generator, then you have to trust that the hardware hasn't received some NSA alterations at some point during the design.

The NSA did design a random number generator that likely had a backdoor in it: https://en.wikipedia.org/wiki/Dual_EC_DRBG#Controversy).

Here's Bruce Schneier talking about it: https://www.schneier.com/blog/archives/2007/12/dual_ec_drbg_...

Also it's in Windows (although it's not used by default but userspace programs could rely on it). https://www.schneier.com/blog/archives/2007/12/dual_ec_drbg_...

It would be possible for the NSA to go to Intel and get them to put in something in their random number generator that would let them to basically break the encryption by massively reducing the keyspace if they have the secret key.

patio11 · on Aug 15, 2013

If you don't trust the hardware, then you've already lost, no matter what algorithmic construction you are using. How are you going to trust your random number generator if 1 + 1 = 2 except when it equals NSA?

(This is one of those times where I regret HN that has twin interests in software security as an engineering science and software security as a political statement.)

H3g3m0n · on Aug 15, 2013

> If you don't trust the hardware, then you've already lost, no matter what algorithmic construction you are using. How are you going to trust your random number generator if 1 + 1 = 2 except when it equals NSA?

You can verify the random number generator. If you know the algorithm and the seed values you can run it on multiple different platforms, or with a pen and paper and verify that the output is as expected and repeatable. If you have large amounts of entropy you are feeding into it, you can log it for testing purposes.

There are also apparently some EC based algorithms that can be used to fix or at lease reduce the impact of a compromised random number generator.

That might not protect against a active attack on your specific system by the NSA (they could send/embedded a magic packet that gives them total control over the CPU for example), might even be possible for it to happen on the NIC controller rather than the CPU if it has access to the system bus. At the least they could flip it into some kind of backdoor random number mode by embedding some extra data in a TLS handshake or whatever. But it should protect against widespread passive surveillance.

tptacek · on Aug 15, 2013

Reducing the impact of a CSPRNG state compromise is a basic design goal for CSPRNGs, and you don't need ECC to do it; the "NSA backdoored" RNG you're referring to is one that no system anyone knows about actually uses, for that reason.

Again: you're making a point that has nothing to do with my comment. If you don't trust RDRAND, don't use it. But you should still be using the OS's CSPRNG. Just make sure your OS isn't using RDRAND. Done and done.

ig1 · on Aug 15, 2013

It's about verifiability, you can verify if a CPU is adding incorrectly, there's no way of verifying if the random source is generating randomness correctly.

Integers form a closed group (in the group theory sense) under multiplication and addition, so if you were concerned about specific calculations being compromised you could verify the answer via alternative calculations and testing for consistency.

Faking consistency is likely impossible without causing a huge amount other calculations (which the CPU will do as part of day-to-day operations) to fail. Making the CPU alter calculations only under very specific circumstances and in an undetectable way would require a huge amount of complexity.

On the other hand we know several fairly trivial ways of faking reversable randomness using standard crypto algos that would be statistically undetectable without taking the hardware apart.

patio11 · on Aug 15, 2013

Tell you what: You write executable code which tests whether, e.g., Ruby's Fixnum addition is actually adding or not, without using introspection on the implementation. (i.e. You're allowed to test any combination of inputs and outputs, but cannot look within the black box.) Feel free to use number theory, statistics, or a web service which wraps Cthulu in JSON.

I will provide a script which monkeypatches Fixnum such that your test script still reports "Yep, it's still adding correctly" and which will predictably cause 1 + 1 to equal the string "LOL I AM THE NSA" under attacker-chosen conditions.

There are many, many ways I could do this as the crafty adversary, but I anticipate my implementation would be roughly seven lines long and trivially implementable in hardware. (At complexity somewhere between a half-adder and the circuits which we had to hand-draw for our finals in hardware design, which -- obviously -- are a drop in the ocean of complexity on a modern CPU.)

ig1 · on Aug 15, 2013

Say you wanted your backdoor to kick-in when I was calculating the fibonacci sequence, will you also detect when I was calculating it via the golden ratio, or via continued fractions, etc. If I can write a test that can show that there's a backdoor because it's not behaving consistently then your attack is significantly weaker then a RNG attack.

You can't just break one calculation under specific circumstances, you have to break any way of verifying it which means compromising pretty much every calculation.

(not to mention you'd have to figure out that it's the fibonacci sequence being calculated and not some other calculation that happens to start with 1+1)

Remember you don't have an interactive attack, you have one chance to build something and ship it out in hardware, and it then has to compromise software that's going to be written in the future.

The beauty of the RNG attack is that it's undetectable, introduces a backdoor into a huge number of systems and it only makes the system vulnerable to the attacker and not to anyone else.

patio11 · on Aug 15, 2013

Super double-secret squirrel sources have said there exists a file "nsa_does_fixnum.rb" which is 1006 bytes long and the SHA512 hash of 896e12642f5720d830572a8b11e4c729da3a20c57402758d22a25c46c3c2e6 bb4b09de9030ad591ecb51b8a9a4d4c119cefbf4d51a72db219d94150fb63a 3623. (Concat them -- I broke it into a few lines to avoid breaking the comment page on low-resolution devices.) Our intrepid squirrel didn't get a good look at what it is doing but claims "It doesn't use any external library and doesn't do anything even remotely tricky." They claim that it secretly modifies any Ruby1.8 interpreter to cause 1 + 1 to equal "LOL THIS IS THE NSA" under some circumstances.

Quick ig1, you are freedom's last, best hope. Write actual computer code which can, by adding numbers together and inspecting their output, determine whether your Ruby interpreter has been compromised by the NSA. You're lucky, since the NSA has already shipped their exploit (or did they?), they can't modify it in response to your detection code. Bad news, though: if your detection code fails and an interpreter which includes the backdoor can, after passing your detection code, still get the wrong answer for 1 + 1, an innocent user fails to find the backdoor and suffers a FatalHitByCruiseMissileError. You don't get to say "OK, so in hindsight, now that I see the backdoor addressing it was pretty darn easy. Mop up the mess and I'm sure to win round two."

acqq · on Aug 15, 2013

Sorry, I don't see what you're trying to argument here, since CPUs aren't Ruby interpreters with libraries. CPUs have much stronger limitations under which they have to operate. Remember the Pentium DIV bug? It was discovered.

I'm writing assembly code, I've spent time also to measure the time of the machine operations etc, so I simply can't imagine the valid argument which ignores exact limitations present in actual hardware.

rst · on Aug 15, 2013

People in this thread seem to be arguing past each other. Patrick is specifically disputing the claim (a few posts up) that you can always tell if the hardware is adding incorrectly. His argument is that if it's deliberately manufactured to add almost correctly, except in certain carefully tailored cicrumstances, the chance of that being detectable by a third party are pretty slim. The Ruby stuff is just a metaphor.

(Google for "Illinois Malicious Processors" for what an attacker might actually do. A few extra transistors here and there can lay you wide open.)

acqq · on Aug 15, 2013

I have referred to

http://en.wikipedia.org/wiki/Pentium_FDIV_bug

I still don't see how adding "almost incorrectly" would have not be detected in practice. In practice you also have to implement "almost incorrectly" in a way that doesn't affect the performance of your CPU. We are talking about processors, not Ruby interpreters.

DanBC · on Aug 15, 2013

The Pentium FDIV bug was inadvertent and blatant, and was found because people were bound to notice the faulty output.

You can't compare that kind of error to a deliberately hidden flaw.

The possible flaws with Lenovo computers haven't, as far as I know, been found anyone other than the spooks. (https://news.ycombinator.com/item?id=6108980)

acqq · on Aug 15, 2013

Those flaws aren't "adding" bugs but remote access "features." Intel has also such technology in the new BIOSes and we ordinary users unfortunately can do less and less about it every time some new technology comes.

jsnell · on Aug 15, 2013

Can you please break that hash on multiple lines or quote it? It breaks the layout of the comment page.

ig1 · on Aug 15, 2013

Your attack relies upon security by obfuscation, hoping that no-one will notice that the behaviour is incorrect (we obviously disagree about the chance that such an attack will be spotted but I think to some degrees that's irrelevant).

The RNG attack has correct behaviour under every statistical and blackbox analysis. The only way you can break it externally is if you have the key.

If you were designing a cryptosystem you'd pick the one where the security lies in the key and not in obfuscation. The same applies when designing an attack you want to keep secret.

cpeterso · on Aug 15, 2013

Linux's /dev/urandom does not trust hardware RNGs any more than other entropy sources. All entropy sources are stirred into the same entropy pool.

ballard · on Aug 15, 2013

Does it use multiple pools a-la Fortuna?

__alexs · on Aug 15, 2013

There's an input pool and 2 output pools but it's not like Fortuna. This is the most up to date analysis of the Linux entropy system I'm aware of http://eprint.iacr.org/2012/251.pdf

gizmo686 · on Aug 15, 2013

No you don't. The only time you need to worry about the NSA having a back door into your random number generator is if you are worried either about the NSA attacking you (in which case you probably don't have a chance anyway), or are worried that some learned of the NSA's back door and is exploiting that.

derefr · on Aug 15, 2013

That reminds me of a question I've been pondering on-and-off for a while: is it possible even in theory to use Sufficiently Advanced Mathematics (e.g. homomorphic encryption) to build a trusted computation environment on top of an untrustworthy foundation?

For example, I want to run a VM on EC2, knowing the NSA's after something I have. Is there some way I could, even theoretically, build my VM, that would protect it from the NSA compelling Amazon's cooperation in ripping the data right from my VM's memory at the hypervisor level?

tptacek · on Aug 15, 2013

Possibly. For instance, look up some papers on White Box Cryptography; these are implementations of things like AES designed to run AES computations on untrusted platforms without revealing their key (the key gets "baked in" to the code that runs the algorithm in such a way that it's theorized to be cryptographically hard to extract the key from the algorithm).

You are getting into the flip side of the DRM coin here; put differently: if there's a way to do what you want, there's also a way for the MPAA to do what it wants on PCs.

Personally, I think there is, at least in the dollar-cost model of security.

bch · on Aug 15, 2013

I'm out of my depth here, but since this conversation seems precise, and is discussing theoretical and practical implementations, does this paper[1] factor in? If so, is it fair to say that at some level, without auditing every aspect of hardware, and every aspect of all software used at build time and runtime, you can't know?

[1] http://cm.bell-labs.com/who/ken/trust.html

ballard · on Aug 15, 2013

It's a good insight. There is distinction between I/O (rendering the video and audio to the host) and computations of a seemingly blackbox. Would also be an interesting avenue for malware to grow into if possible. The problem is there's always been decryption code that has to live somewhere that effectively unlocks the whole thing. Edit: PyPy but zk computing for malware.

homeomorphic · on Aug 15, 2013

Wouldn't this make nice foundations for hardware cryptographic tokens such as CryptoStick?

sweis · on Aug 15, 2013

We're working on a similar problem at PrivateCore: Protecting VM data in-use on outsourced infrastructure.

We're running a high-assurance, remotely attestable hypervisor inside the CPU cache and encrypting all access to main memory. This protects against threats from the physical layer, like cold boot, DMA attacks, NVDIMMs, bus analyzers, etc.

It's not quite what you're talking about in your Amazon and NSA scenario. Amazon doesn't let you bring your own hypervisor to run on bare metal and the NSA can compromise the CPU itself.

However, our approach does give you assurance that someone with physical access can't easily snapshot your VM memory.

derefr · on Aug 15, 2013

> remotely attestable hypervisor

How does this bit work, by the way? What's stopping an altered hypervisor from lying to say it's unaltered? (This is the classic "how do you verify a player on your FPS isn't running a bot instead of a game-client" problem in a nutshell.)

sweis · on Aug 15, 2013

Today we rely on the TPM to measure the state of the system using Intel TXT. These measurements are stored in platform configuration registers (PCRs) on the TPM device.

There are known TPM and LPC bus vulnerabilities. That is why long-term we will move away from that dependency by utilizing upcoming CPU features.

e12e · on Aug 15, 2013

> our approach does give you assurance that someone with physical access can't easily snapshot your VM memory.

But how do you know the VM (or rather the hypervisor for the vm) is running on physical hardware, and not in a hypervisor?

I can't think of a way you could be certain of this remotely? Perhaps you could be on-site for the boot-up, and then rely on the fact that snapshotting is very hard -- but it sounds rather fragile...

Still very interesting project! I've been thinking a bit on "running inside the L/1/2/3 chache"-lately - but I hadn't thought about the particular idea that you could treat RAM as "external" -- assuming you could guarantee that you're always in cache.

sweis · on Aug 15, 2013

You must first remotely attest the hypervisor using TXT before deploying a VM to run on it.

Today, that attestation process relies on a TPM and a signed certificate chain baked in by the TPM manufacturer. This is standard stuff out of the Trusted Computing Group.

One more thing to add, this isn't just a personal side project. We're a company and have a beta product deployed to early adopters.

e12e · on Aug 16, 2013

I realized that you probably did have a product -- unfortunately that doesn't mean that the product works (not trying to imply anything about your product/company here; it's just (as you probably know) there many companies selling security solutions; and very few that seem to be selling security solutions that work...).

Are you aware of:

  "Overcoming TPM by exploiting EFI overflow"
  http://www.youtube.com/watch?v=4bM3Gut1hIk&feature=player_detailpage&t=1655

and:

  "Coreboot/bios malware":
  http://www.youtube.com/watch?v=umBruM-wFUw&feature=player_detailpage&t=2297

It's an interesting use of TPM -- and sounds like a sound approach, assuming there aren't any bugs in the TPM software... which might be too big an assumption.

I don't suppose any of your software is available as open source? Where can I/we learn more?

sweis · on Aug 20, 2013

Hi. We're aware of TPM vulnerabilities. The one you link to is not relevant. However, there are attacks to extract EK private keys, which we know the cost of conducting. It's significantly higher than other low-cost attacks.

We're also aware of vulnerabilities on the the LPC bus. The latter can be addressed with existing TPM 1.2 features -- although they aren't enabled by default.

There are CPU features in the pipeline which may make the TPM unnecessary. We're also working on some new attestation techniques which may help.

We measure and attest the state of the system with TXT. If that works as advertised, you would measure changes to the BIOS, SINIT, opt ROMs, ACM modules, etc.

tptacek · on Aug 16, 2013

I don't know anything about his product but let me just chime in here to say that Steve Weis is the real deal.

ballard · on Aug 15, 2013

Funny you should mention this. I was asked to "suggest solutions" to this in 2008 as a client-facing AWS enterprise consultant in the mobile industry.

The naïve way relies on: "If there isn't a Rootkit or capture mechanism, it's secure if it's obscure." That will buy an iota of time.

The other way is to simply not trust the host and end-to-end encrypt everything important. This is a PITA, but there is no current viable alternative without a significantly rigorous and extensive project (would be in the form of a VM for servers &| something that runs on js like asm.js)

Mostly you need zero knowledge computation. Storage and network are easier probs.

The intractable problem is eventually an answer needs to be presented somewhere. For now, you should minimize exposure as much as possible through all means available.

DanBC · on Aug 15, 2013

> Except if /dev/urandom is using a hardware based random number generator, then you have to trust that the hardware hasn't received some NSA alterations at some point during the design.

That is a valid concern that needs to be part of your risk assessment. "Do I want to protect my secrets from a well funded government agency?"[1]

But there are other risks that need to be thought about too. Some people seem to think that hardware RNGs are better than software. Often they're not, they're lousy.

HWrngs can have subtle failure rates which are hard to detect.

Once you've done all the de-skewing and other checks they can be quite low bandwidth.

I have a bunch of links to reading about HWRNGs here - (https://news.ycombinator.com/item?id=6060636)

And here's a really nice thread (https://news.ycombinator.com/item?id=1453299)

[1] Although if you want to defend yourself against a well funded secret government agency you need to worry about more than a weak RNG.

tptacek · on Aug 15, 2013

None of this comment has anything to do with my comment. If you don't want to use RDRAND, compile a kernel that doesn't.

ballard · on Aug 15, 2013

To avoid hobgoblins wearing Faraday cages without denying the reality of the privacy debate, open source hw can make them quieter. I think it's going to happen because of PC market forces (IBM/Lenovo, Dell) and prior examples (OpenSparc). The startup costs (eda, test equipment) are doable because there are enough people with specialized expertise that could contribute.

If you don't trust the hw or vhost, you're stuffed. Political upwards and technical downwards are the ways to make sure this doesn't happen.

cperciva · on Aug 15, 2013

(notably 'cperciva)

I'm confused, what's the apostrophe eliding? ;-)

tptacek · on Aug 15, 2013

I quote HN usernames. It's a weird me thing that I've done ever since I saw a user named "the". If I did @cperciva that would make intuitive sense to everyone, but then it wouldn't work for people not on Twitter.

cperciva · on Aug 15, 2013

Huh, I never noticed that before. It's a useful notation which I might well adopt. Thanks!

e12e · on Aug 15, 2013

I'm assuming Bobby Tables can't get a nick on HN, then?

(That is, nicks like 'nick, and "' or '1'='1' -- '" aren't allowed?).

Florin_Andrei · on Aug 15, 2013

> but then it wouldn't work for people not on Twitter.

I think the @user idiom is by now understood by anyone not relying exclusively on pigeons to send their missives.

tptacek · on Aug 15, 2013

In places outside Twitter, the @foo idiom addresses a comment to that user, but that's not always what I intend when I reference a user. I'm simply trying to prevent evaluation of a token as English text.

anyfoo · on Aug 16, 2013

I understand where you are coming from, but I think that at this point the interpretation of @foo as an idiom for not addressing but merely mentioning a user has permeated most online cultures.

Those who really do not know (yet) will quickly understand from context.

All in all, I think @foo is less confusing than 'foo, and will spare you questions like in this thread.

rmc · on Aug 16, 2013

The English language is evolving. We've created a vocative tense for English using the "@" symbol. "Et tu, @Brutus?"

616c · on Aug 15, 2013

I believe he speaks with a lisp.

(Please do not throw anything at me.)

TeMPOraL · on Aug 15, 2013

It's a Lisp way of quoting (i.e. preventing from being evaluated) symbols :).

glurgh · on Aug 15, 2013

I think this is fundamentally wrong, in the way 'secure programs should do their own crypto' is wrong. Secure programs should be able to rely on the platform provider's API to provide a CSPRNG. If the platform doesn't, it's the platform's fault and let's all go yell at them. It seems odd to put this at the doorstep of app developers. Android app developers should reasonably expect that SecureRandom does what it says on the tin as should consumers of similar apis in iOS, .net, OS X, Windows, you name it. "I'm a Java app, but really, in fact, I happen to know underneath it all it's Linux thus I will read some random file descriptor". It's a Chewbacca defense, it makes no sense!

tptacek · on Aug 15, 2013

Would that every platform had done what Golang did and simply wrap /dev/random. In the meantime, you have to make a choice: do you trust your language designer's CSPRNG AND the OS CSPRNG on which it (and the rest of the system) doubtless depends, or instead trust only the OS CSPRNG?

glurgh · on Aug 15, 2013

Sure but down that path lies 'did Intel and the NSA backdoor my CPU' tinfoilhattery. If you're an app developer, rather than implementor of original crypto from dry sticks and a bit of flint, you have to trust something and the platform API is a sane place to start. Google, who have specialists on staff, seems to have got this wrong at least twice. It seems unreasonable to expect a person who's simply looking for a nonce to do better or to keep careful track of how to do better because the APIs at hand just don't do what they claim they do.

tptacek · on Aug 15, 2013

This comment appears to be animated by a belief that the NSA backdoors the Linux source code.

glurgh · on Aug 15, 2013

Wait, wat? I thought we were talking about 'What is the reasonable line that cryptography API users should assume is secure'. Which is something sensible people can disagree on.

I just happen to think you're wrong (and the NaCl paper you mention actually provides advice to crypto library writers, not users, for the question at hand). I'm not animated by anything any more than you're in the secret employ of the cartesian product of all possible three letters.

tptacek · on Aug 15, 2013

I'm just not following why you think the NSA makes /dev/random risky. /dev/random is not an interface to hardware random numbers; it's the logical interface to the OS's own CSPRNG, which uses multiple sources of entropy.

glurgh · on Aug 15, 2013

I don't think that at all, I'm sorry a dumb NSA joke confused the issue. I think app writers should be calling SecureRandom (or platform equivalents) and expect secure results. Just like app writers should use TLS, despite its faults. My argument is that reading '/dev/urandom' is akin to opening a raw socket and hand-crafting packet headers because you don't trust TLS.

tptacek · on Aug 16, 2013

I'll stop harping on the backdoor point. What kinds of things do you think developers can do wrong reading from urandom? There are a lot of things you can get wrong using raw sockets instead of TLS; a universe of them.

My real issue isn't that devs use SecureRandom. Use it by default; it's not like we'd doc a bug on you for doing that.

My real issue is that the library developers who expose SecureRandom don't themselves simply pull the random bytes from urandom.

glurgh · on Aug 16, 2013

I'll stop harping on the backdoor point.

I never even made such a point! At all! If anything, I'd like to know how I gave you that impression so I can avoid it in the future.

What kinds of things do you think developers can do wrong reading from urandom?

Well for one thing it's entirely unportable. I think cperciva pointed out some difference on how it behaves on FreeBSD vs Linux. How does it behave on OS X? On Windows? What are the failure modes?

My real issue is that the library developers who expose SecureRandom don't themselves simply pull the random bytes from urandom.

I'm in 74211004% agreement with you there, as is the NaCl paper. Centralize randomness, ensure that it's reviewable and done right. In fact, the NaCl paper suggests:

"A structural deficiency in the /dev/urandom API provided by Linux, BSD, etc. is that using it can fail, for example because the system has no available file descriptors. In this case NaCl waits and tries again. We recommend that operating systems add a reliable urandom(x,xlen) system call."

Again, my stipulation is that this is very much not the job of app developers. The Android bitcoin wallet developers seem to be a case in point - they don't seem to have done anything particularly wrong at all. The platform failed them. So should we be berating platform providers or should we insist app-writers keep close track of various security developments? I think it's very much the former.

tptacek · on Aug 16, 2013

Sorry, I was trying to concede that you weren't making that point, but did it clumsily.

So here is the interesting question as I see it:

What's safer for developers, reading from /dev/urandom, or calling the platform CSPRNG when the platform CSPRNG does something other than reading from /dev/urandom?

I think the answer might be /dev/urandom. A Debian maintainer is not going to accidentally comment the randomness out of the Linux kernel CSPRNG. Nobody is going to miss the logic bug that causes the Linux kernel random number generator to not even try to seed itself.

glurgh · on Aug 16, 2013

What's safer for developers, reading from /dev/urandom, or calling the platform CSPRNG when the platform CSPRNG does something other than reading from /dev/urandom?

I think you're right about this and the (rather fiddly and nerdy) point where we disagree is on where the line of abstraction lies. I don't think you have any business reading /dev/urandom any more than you have using, say, AES-NI

tedunangst · on Aug 16, 2013

Nobody is going to miss the logic bug that causes the Linux kernel random number generator to not even try to seed itself.

Well, NetBSD more or less did that twice in a row earlier this year...

doe88 · on Aug 15, 2013

I think iOS does just that see the function SecRandomCopyBytes in http://opensource.apple.com/source/Security/Security-55179.1... it just wraps /dev/random

mh- · on Aug 15, 2013

I'd be curious to know how he pronounces Google's NaCl. ;)

edit: for those unaware, the NACL 'tptacek is referring to is [0] - which long predates [1].

[0] http://nacl.cr.yp.to/

[1] https://developers.google.com/native-client/

cbhl · on Aug 15, 2013

I always thought it was weird hearing Google devs pronounce it like "na-cull" (like na in "na na na na na na" and "cull" as in killing something) instead of "salt".

xxpor · on Aug 15, 2013

What's the CS part of CSPRNG?

mh- · on Aug 15, 2013

cryptographically secure.

http://en.wikipedia.org/wiki/Cryptographically_secure_pseudo...

tptacek · on Aug 15, 2013

Cryptographically secure.

xxpor · on Aug 15, 2013

Thanks

on Aug 15, 2013

[deleted]

xxpor · on Aug 15, 2013

Think about the man-hours of productivity saved by having the explanation right there ;)

Apes · on Aug 15, 2013

This comment and the fact that it is so highly rated shows just how easy it is to fall into the trap of "I'll just create my own better crypto!" /dev/urandom is insecure by design. It is a non-blocking version of /dev/random and works by first depleting /dev/random, and then falling back to a very low quality RNG once /dev/random is depleted.

/dev/random is the right way to provide hardware RNG in a UNIX environment. However, programs using it can be blocked for extremely long times if there is no dedicated hardware available for providing RNG. I have regularly seen blocking times in the multiple hour range. This is not practical for most programs.

Even relying on /dev/random isn't necessarily safe. I have seen setups where people connect /dev/random to /dev/urandom in an attempt to eliminate these long block times on /dev/random. A quick google search can bring up multiple examples of how to do this. I'll leave it to you to think about out why this is a bad idea, but the result is that you cannot even trust /dev/random.

tptacek · on Aug 15, 2013

Your comment more or less expands on an urban legend. The legend started because of the way the Linux man page for random(4) was written.

Thankfully, Thomas Pornin wrote a definitive debunking of the urban legend here:

http://security.stackexchange.com/questions/3936/is-a-rand-f...

In short: that's not how urandom works. It's not a "fallback to a crappy RNG". Random and urandom are two different interfaces to the same CSPRNG engine. One of them happens to block based on entropy estimates, but that's a design weakness of the Linux CSPRNG, not a great feature of a CSPRNG in general.

While considering how highly rated my comment was, and weighing your own expertise on cryptographic randomness, you might also consider reading the NaCL design paper (Bernstein, Lange, Schwabe) for more details on why you want to use the OS CSPRNG instead of your own. Or, for that matter, consult the NaCL source code to see which they use.

Finally, to your main point: the reason you should use the OS CSPRNG (typically: urandom) is precisely so that you don't end up using some random bit of aspirational app crypto. The OS has to get its CSPRNG right no matter what, so just let it do the work for you.

ballard · on Aug 15, 2013

Host-based security is a whole dead horse discussion, but I still believe Def-in-Dpht means "use everything you can get your hands on; reveal the least as possible."

Correct me if I'm wrong: which OSes have desirable CSPRNG properties like Fortuna (entropy pools esp.). I've looked at OSX, Win8, Free/Net&Open BSD and still haven't found anything satisfying other than class of algo. (Ignore whether closed/open source and IP issues.) I would like to see all mainstream OSes do these well enough.

tptacek · on Aug 15, 2013

OS X does. The kernel CSPRNG in Linux is comparatively primitive.

aliguori · on Aug 15, 2013

I don't think you understand the problem.

The problem is that the PRNG has a weak default entropy source. The same problem existed in the kernel for ages. See http://www.factorable.net

The real advice here ought to be that if you are building an application that generates keys, you should make sure that the system has appropriate entropy before generating the keys. Don't assume the PRNG (whether it's /dev/urandom or OpenSSL) does it for you.

tptacek · on Aug 15, 2013

The problem with this comment is that /dev/random wasn't broken, and the OpenSSL CSPRNG used by Android Java was.

Further, it is harder to imagine a circumstance in which /dev/random could be broken and an app-layer CSPRNG like OpenSSL's could not be broken.

It's a little fuzzy in this case because part of the patch feeds entropy back to /dev/random, but observe (a) that the bulletin doesn't tell developers to re-key if they were depending on /dev/random (as some apps do), despite naming 5 other interfaces that do demand rekeying, and (b) that this particular patch would be an inadequate fix for a broken /dev/random anyways.

It's true that you care about entropy at cold start either way. But your choice of CSPRNGs doesn't resolve that problem for you, except that the OS needs to have secure random regardless and so you're boned whether or not you rely on /dev/random, and hence are better off just relying on /dev/random.

taway2012 · on Aug 15, 2013

Do you mean /dev/random or /dev/urandom? They are different, AFAIK.

tptacek · on Aug 15, 2013

Random and urandom are interfaces to the same kernel CSPRNG. The difference is that reads from random block until an internal counter that estimates the entropy in the CSPRNG crosses a threshold, and reads from urandom don't do that.

In practice, this is a distinction without a meaningful difference. In more modern implementations (like OS X) the CSPRNG works the same way with both interfaces.

You should generally prefer urandom.

There is a wellspring of urban legands about the security difference between these two interfaces because the Linux man page is misleading.

raverbashing · on Aug 15, 2013

And why /dev/urandom instead of /dev/random?

ikawe · on Aug 15, 2013

The entropy needed to generate random numbers is a finite resource. If you don't have enough entropy stored up, /dev/random may block until such a time as it has sufficient entropy.

/dev/urandom reuses existing entropy to produce more pseudo-random numbers. It may be less "random", but will never block.

more at http://en.wikipedia.org/wiki//dev/random

raverbashing · on Aug 15, 2013

And that should be a very good reason for using random instead of urandom

I don't want less entropy thank you. If I don't have enough then let's wait until some more become available

I would use urandom if I need random but I don't care about their quality (even though urandom is good most of the time)

patio11 · on Aug 15, 2013

Since this is as good a time as any: I snidely implied that this was likely to have been due to a bug on the part of the Bitcoin community rather than a design error in the Android platform. That seemed like a reasonable guess at the time, but it was wrong, so: sorry, Bitcoin et al devs.

glurgh · on Aug 15, 2013

It was not a reasonable guess even at the time because it should have taken you all of 2 minutes to verify it is, in fact, not a reasonable guess. I don't think there is some great tragedy or mistake in being wrong about this but if you're going to gracefully admit wrongness, admit to the right kind.

You were wrong then and you could have easily known you were wrong then. This new disclosure doesn't make you righter or wronger in retrospect.

marshray · on Aug 15, 2013

How would you verify this in 2 minutes?

glurgh · on Aug 15, 2013

The programs in question were not manually seeding the RNG, as was patio11's suggestion. The app implementors in this case don't seem to have actually done anything wrong at all.

marshray · on Aug 15, 2013

But it took us days to arrive at this conclusion.

j_s · on Aug 15, 2013

I believe this is the point in the discussion where people willing to do some digging indicated that the problem was most likely not in the apps:

https://news.ycombinator.com/item?id=6195707

It wasn't days, but I can't say for sure that it was minutes... my vote would be for hours! :)

glurgh · on Aug 15, 2013

Once it was clear what library was in use (on the day of the original announcement) it literally took minutes to eliminate that particular one. You're right that it's taken days to learn what the precise problem is - but that's exactly why I'm saying prematurely assigning blame is a mistake!

tptacek · on Aug 15, 2013

It really did take minutes to see that the bug was almost definitely not bitcoinj or the SpongyCastle code explicitly seeding the CSPRNG with crap.

tptacek · on Aug 15, 2013

Yeah, Patrick. Apologize harder! Harder, dammit!

vbuterin · on Aug 15, 2013

I would still say it's a bug from the Bitcoin community to rely on one generator rather than XORing a few just in case.

And I'm saying that as a prominent Bitcoin user and not-so-prominent developer. We undervalue redundancy.

patio11 · on Aug 15, 2013

That would be a very, very bad decision. As one possible implementation flaw, consider what happens where Generator A is /dev/urandom and Generator B is SecureRandom whose platform implementor has decided to implement their own pseudorandom number generation algorithm in cases where they're sure they have enough entropy but, if they aren't, they fallback to /dev/urandom.

That will pass all tests and work fine for all purposes right until hilarity ensues in whatever real world circumstances deplete the entropy pool. Note that after attackers realize that depleting the entropy pool is a victory condition, they can actively work to cause that.

marshray · on Aug 15, 2013

Generator A returns 32 bytes from /dev/urandom. Generator B returns 32 bytes from /dev/urandom. The application XORs them.

What's the problem?

ig1 · on Aug 15, 2013

If you're XOR multiple generators then if an attacker can zero out the other generators (i.e. draining the pool) they can reduce your security to the weakest generators you're using.

If both generators have the same underlying source then they're also likely to share any security vulnerability so XORing them is unlikely to improve the randomness and can introduce potential weaknesses.

marshray · on Aug 15, 2013

I see. OK. Well if you let an attacker zero out your random number generators, you're screwed. You'd better not let that happen.

This 'decrement pool entropy estimate and stop returning numbers' idea is the longest running mistake in crypto.

tptacek · on Aug 15, 2013

ars · on Aug 15, 2013

> they can reduce your security to the weakest generators you're using.

> and can introduce potential weaknesses.

This isn't true. XORing numbers can never decrease entropy. It can only increase it. So your security would stay at the strongest generator, plus some extra entropy from the weaker ones.

patio11 · on Aug 15, 2013

XORing numbers can never decrease entropy. It can only increase it.

Let Generator A(x) equal "For any x, a perfect random number picked by the hand of God Almighty." Let Generator B equal "For any x, A(x+1) unless 'ars is asking in which case A(x)".

If ars chooses Generator A, ars is fine.

If ars chooses Generator B, ars is fine.

If ars chooses Generator C = A(x) ^ B(x), ars will very quickly see that ars' intuitions about math are in some circumstances incorrect. Ars might then think "Phew, good thing they weren't subtly correlated, or it would have looked great right until it didn't."

ars · on Aug 15, 2013

If your second generator has an output that is a function of the first then it's not a cryptographic RNG in the first place and then Generator B is NOT fine.

(Alternatively it could mean that you can only generator a single cryptographic random number, since all the rest are just functions of the previous one, and then again you are not fine. Either way Generator B is not fine.)

And in any case neither A nor B is a generator - only x is. Neither A nor B are adding any entropy whatsoever.

nitrogen · on Aug 16, 2013

If your second generator has an output that is a function of the first then it's not a cryptographic RNG in the first place and then Generator B is NOT fine.

A CSPRNG is a function that takes as input a small set of random bits and generates as output a large set of unpredictable bits. So "Generator B" can be a "cryptographic RNG" and be a function of, or at least correlated with, "Generator A".

marshray · on Aug 15, 2013

Reading from /dev/urandom twice does not yield the same result, or even results subtly correlated under XOR.

marshray · on Aug 15, 2013

XORing numbers does decrease entropy to the extent that there is correlation between the numbers. But you're correct that you can accumulate entropy by taking poorly-random streams and XORing them, as long as they are independently poorly-random.

In cases where they might be non-independent, a simple hash function like even the otherwise broken MD5 would be better at combining them than XOR. But of course, we're now three steps down the path of constructing our own replacement for what the kernel and/or OpenSSL should be doing for us already.

stavros · on Aug 15, 2013

A XOR A = 0

marshray · on Aug 15, 2013

If your /dev/urandom returns the same sequence of 32 bytes twice in a row, you have something truly pathological going on.

stavros · on Aug 15, 2013

In the above example, you're XORing urandom with itself.

marshray · on Aug 15, 2013

Run this a bunch of times:

     dd if=/dev/urandom bs=32 count=1 | xxd

Substitute 'base64' if you don't have 'xxd'.

If you get the same output twice, see a doctor.

DanBC · on Aug 15, 2013

Why XOR urandom with itself?

You don't gain anything, and if your random numbers are correlated with something you're making it worse.

I guess it makes testing easier. Correlations will be easier to see.

marshray · on Aug 15, 2013

People were saying 'urandom XOR urandom' was a pathological failure mode. My point is that it's not going to return the same results twice.

DanBC · on Aug 15, 2013

...but you understand that it doesn't matter that it's not returning the same results twice, and that it's still a fail, right?

tptacek · on Aug 15, 2013

Marsh probably does understand the implications here. What kind of "fail" were you thinking of? That it was pointless? I agree. That it's going to blow up? I less agree.

marshray · on Aug 16, 2013

Thanks 'tptacek.

The point is

     `dd if=/dev/urandom bs=32 count=1` XOR `dd if=/dev/urandom bs=32 count=1`

is not going to cancel out to be zero as some here are implying. No one is claiming that it's a net gain either though.

DanBC · on Aug 16, 2013

As I understand it: You're taking a non-blocking source of randomness, and depleting its entropy pool quicker, which forces it to drop back into 'not very good random mode' quicker.

tptacek · on Aug 16, 2013

Argh. No, that's not what happens. Think about this for a second. It's not like the "entropic" bytes have a color, and the pool gets low on the bytes with that color.

marshray · on Aug 16, 2013

It would be like that if you were looking at it decades ago when key sizes were small and cryptographers didn't have much faith in the one-way-ness of hash functions, or from the perspective of that branch of information theory which seems to permit the existence of arbitrarily powerful computers. A lot of guidance coughman pagescough is still written that way.

In the real world, our computers are limited by physics. It seems that in crypto, the least controversial assumption we can make is that given enough secret bits (i.e., entropy) we can be safe from brute force attacks.

x1024 · on Aug 15, 2013

XORing random streams doesn't have as much value as you think.

raverbashing · on Aug 15, 2013

"Developers who use JCA for key generation, signing or random number generation should update their applications to explicitly initialize the PRNG with entropy from /dev/urandom or /dev/random."

Really? This is what's missing?

They build a whole infrastructure for crypto and forget to do that?

pjmlp · on Aug 15, 2013

The Sun/Oracle JVM does it, as many other JVMs.

Just Dalvik does not.

rsynnott · on Aug 15, 2013

They didn't build a whole infrastructure for crypto; they just implemented Sun's one incorrectly.

dekz · on Aug 15, 2013

> Some SecureRandom Thoughts

I feel like the author of the post isn't giving this the severity that it deserves. The title makes it sound like a thought experiment, here is an example where the original title is (let's face it) shit and the editorial title here is relevant.

healsjnr1 · on Aug 15, 2013

Agreed. The whole thing is a bit blase.

Also, anyone else find it concerning that they advocate presenting /dev/urandom as SHA1-PRNG when it isn't? I don't even see why they did this? They could have wrapped the default SecureRandom and forced it to seed bytes with /dev/urandom.

Also, why are they writing to /dev/urandom??

tptacek · on Aug 15, 2013

Presumably because some Android devices don't boot cold with a lot of entropy, so there's a virtue in helping kickstart it.

taspeotis · on Aug 15, 2013

I'm a layperson when it comes to cryptography. Can you clarify how much entropy a device would need to become cryptographically secure, and the profile of an Android device that couldn't collect enough entropy quickly? E.g. most devices could collect battery %, 3G radio noise levels, wifi noise, readings from the capacitive touch sensors, accelerometer, microphone etc., yes?

marshray · on Aug 15, 2013

> Can you clarify how much entropy a device would need to become cryptographically secure

200 bits.

healsjnr1 · on Aug 15, 2013

There is no magic number needed to make a device "cryptographically secure". It depends entirely on the operations being performed.

In general if you generating Keys, the DRBG or PRNG used in key generation needs to be seeded with entropy equal to the security strength of the key.

For example: If you are generating 128 bit AES keys, the DRBG needs to be seeded with at least 128 bits of entropy. If you generating 2048 bit RSA keys you'll need 112 bits of entropy.

logn · on Aug 15, 2013

1.21 gigawatts!

conformal · on Aug 15, 2013

needless to say, android passed its fips certification with flying colors.

just another secure mobile OS. nothing to see here, move along.

did i mention that you should never store anything of actual import on a phone?

cbhl · on Aug 15, 2013

At what point do you differentiate between a phone and a computer?

throwawaykf02 · on Aug 15, 2013

In this context: At the point where one begins to be much, much more likely to be misplaced than the other.

metafunctor · on Aug 15, 2013

FIPS is about compliance, not about security.

4ad · on Aug 15, 2013

Of course, but OP's point was that compliance is meaningless...

ParadisoShlee · on Aug 15, 2013

Let me get this straight. Somebody spent time to hunt down the original issue with the RNG. Followed by the time required to exploit the issue. Then focused on attacking bitcoin environment in some way. Taking a total of 55 bitcoins.

That seem's like a lot of work for $5000.... Some Blackhat burning 0day for giggles?

It's also very impressive that the BT community caught them so quickly.

m0nastic · on Aug 15, 2013

Part of the philosophical justification for the "full-disclosure movement" is the belief that, much like scientific discoveries, they almost never happen in a vacuum by a lone brilliant researcher.

Which is to say, that it is entirely possible (I'm not going to presuppose how likely) that this has been known about for some time, and used for other nefarious purposes in a manner that wasn't so "easily" detected (the existence of a master record of all Bitcoin transactions makes it much more likely for this to be found than people exploiting it in some random app).

It could be that somewhere right now, in dozens of coffee shops throughout Eastern Europe, South America, and Asia, a group of collective hackers, each unaware of the existence of any others, read this blog post and uttered their linguistic equivalent of "Son of a Bitch".

e12e · on Aug 15, 2013

You forgot the coffee shops in Langley, VA.

tlrobinson · on Aug 15, 2013

If I understand correctly, the attacker didn't even need to know which clients were vulnerable or details of why, they simply scanned the Bitcoin blockchain for vulnerable transactions/addresses and stole any Bitcoins they could. Only then did anyone realize which client(s) were vulnerable and the actual bug.

Though they probably could have stolen more money by waiting until they had collected more private keys, there was a risk that someone else would find the same keys and steal it first.

One somewhat fortunate consequence is there will now be lots of people actively scanning the blockchain for these sort of vulnerabilities thus they should be found extremely quickly, which is great for everyone except the unfortunate first victim. Hopefully clients will specifically test for this as well.

Also note that you can avoid this class of vulnerabilities by not reusing Bitcoin addresses.

zorpner · on Aug 15, 2013

Oh, we've been scanning the blockchain for this for quite some time ;)

It would be great to see it pushed down to the client, though. Easy way to avoid a simple cryptographic tripwire that's likely to recur if the BTC clients continue to rely on library sources of randomness.

marshray · on Aug 15, 2013

(cross post from other article)

I thought OpenSSL's default code already pulled from /dev/[u]random at initialization?

rwg · on Aug 15, 2013

The short answer is yes:

The first call to ssleay_rand_bytes() (which is what RAND_bytes() calls eventually become with the default RAND_METH) will call RAND_poll(), which on Unix-but-not-OpenBSD-or-VOS will try reading ENTROPY_NEEDED (32) bytes from /dev/urandom, /dev/random, and /dev/srandom (in that order). Then, for grins, it throws getpid(), getuid(), and time() into the pool.

The longer answer is maybe:

• RAND_poll() relies on a #define named DEVRANDOM to figure out where to go looking for random devices. If DEVRANDOM is undefined, e_os.h will define it to "/dev/urandom","/dev/random","/dev/srandom". If DEVRANDOM is defined but empty, then the code in rand_unix.c's RAND_poll() ends up skipping seeding from random devices entirely and only tosses getpid(), getuid(), and time() into the pool.

• If something changes OpenSSL's default RAND_METH so it's not RAND_SSLeay(), all bets are off.

Despite staring at all of the code between Android's SecureRandom.java and their OpenSSL tree's crypto/rand directory for three hours, I can't figure out how they screwed this up.

benmmurphy · on Aug 15, 2013

my theory:

trace what can happen if reading from /dev/urandom, etc fails.

i think:

a) if the RAND_seed has already been called it is possible to fail reading from /dev/random and not get a subsequent failure from RAND_bytes because RAND_bytes will believe there is enough entropy in the pool. this can happen even if YOU don't explicitly seed the SecureRandom you are using because the state of all SecureRandom instances are shared :)

b) there is this lols method called throwExceptionIfNecessary in the native crypto code. RAND_bytes checks the return code and if the return code is bad it calls throwExceptionIfNecessary. however, throwExceptionIfNecessary does not always throw an exception. there are some conditions where it won't throw an exception (malloc failure...). however, i'm not sure if these conditions are possible on android openssl.

c) sharing of state between forks. not sure if openssl is initialized in zygote before forking.

marshray · on Aug 15, 2013

Complexity kills.

Thank you for staring at the code for three hours BTW. :-)

rwg · on Aug 15, 2013

As an added fun bonus, it looks like the NativeCrypto Java wrapper around OpenSSL wrongly assumes that RAND_bytes() returns 0 on failure and not-zero on success. In reality, it returns 1 on success, 0 on failure, and -1 on super ultra mega failure.

marshray · on Aug 15, 2013

That evil API has bitten so many projects.

benmmurphy · on Aug 15, 2013

this is probably the problem :)

rwg · on Aug 15, 2013

After staring at convoluted Java code and OpenSSL's code for three hours, I need a stiff drink...

Maxious · on Aug 15, 2013

The developers that used this flawed API thought so too. Mea culpa.

MichaelMoser123 · on Aug 15, 2013

/dev/urandom is usually pseudo random, so as an initialization for SHA1PRNG random number generator it is not good enough. It seems that the suggested fix is not good enough; you need a real source of entropy - which is /dev/random.

The problem with /dev/random is that sometimes there is not enough system entropy (depending on the available hardware sources), so reading from it can block;

On smartphones they might use the radio signals as a source of entropy, get some data from the radio then MD5 the stuff; couldn't they do just that ?

__alexs · on Aug 15, 2013

/dev/urandom and /dev/random follow the same bit generation routine except that /dev/random blocks when the estimated entropy in the input pool falls too low.

The SoCs on phones often have HWRNG support, they just don't always use it, or someone forgets to actually seed their entropy pool with it because they assumed some other bit of code would, which seems to be what happened here.

e12e · on Aug 15, 2013

You mean, take noise that can be readily sampled, by the guys in the black van down the road (or more likely, the guy in the dorm-room next to you) -- and used to reconstruct your random seed?

I'm certainly no radio|crypto expert - but I don't think the "standard" radios on a smart phone are good sources of entropy -- I'm not sure how much raw signal you can actually get out of them (like you can't(?) sample radio noise with your ethernet card, just because it works like a radio).

An AM/FM radio might work (but might still not be a very good idea). There's been some projects using noise from the sound cards in pcs (the idea being sampling electrical noise from the system, not ambient noise, or "actual" sound).

rwg · on Aug 15, 2013

/dev/urandom is usually pseudo random, so as an initialization for SHA1PRNG random number generator it is not good enough.

OpenSSL on most Unix-alike platforms has been seeding its default CSPRNG (which uses SHA-1 internally unless you've built OpenSSL without SHA-1 support) from /dev/urandom for over a decade. I don't understand why has everyone suddenly started framing /dev/urandom as a gigantic crypto bogeyman to be avoided at all costs. It just isn't so.

general_failure · on Aug 15, 2013

I don't understand. Isn't seeding very fundamental? How could they forget this?

marshray · on Aug 15, 2013

QC/QA is all about testing for repeatability. Testing for entropy means proving a negative: that the initial parameters will never repeat across all devices that will ever be booted. Entropy gathering is really really difficult to test within a normal engineering process.

oggy · on Aug 15, 2013

Am I reading the patch correctly in that Android versions below 4.1 are not affected?

syjer · on Aug 15, 2013

there are two fix: applyOpenSSLFix and installLinuxPRNGSecureRandom .

The first is only applied to jelly bean. The prng one is for all the versions.