There are even bigger concerns with crypto on virtualized hardware: side channels. We probably don't even know all the microarchitectural pathways that crypto code can leave footprints on, let alone how to deploy efficient general-purpose crypto code to obscure those footprints.
Could you elaborate on this? My interesting stuff detector is going crazy but I'm kinda out of my depth as to what exactly you're talking about. Are you referring to information leakage from the guest to the host operating system that might allow the host to sniff the inner workings of crypto algorithms running on the guest? Or perhaps guests sniffing other guests through timing attacks and suchlike?
For the benefit of everyone in a hurry: those attacks are very real. If you're running on AWS, assume that the NSA [EDIT: or anyone with deep pockets, or plenty of time on his hands] can break any crypto algorithm you use.
You are still secure from script kiddies, of course, and you've done a fairly good job if this is the easiest way to hack you.
Or "nice paper" hard, really. In fact, I'm pretty sure that I could convince my advisor to let me work on such an attack, and I think I could pull it off in a couple of months. It's hardly novel at this point, though.
So what you're saying is that your fee for side-channeling the private keys from a neighboring domU or dom0 without root dom0 access is $500,000? How long to deliver?
Then who if not Matasano can perform such an attack for $500,000?
Let's say I'm in the market and this particular web site annoys me and it is hosted on EC2. I want to compromise that web site's crypto from a neighboring domU and my budget is $500,000.
You are saying I can contract with someone at that rate and it will get done? How long does that security company take to deliver? With your breadth of security experience and your claim that it's a $500,000 contract job, surely you must know who will write that contract; otherwise you wouldn't have said such a thing, right?
I didn't pull that number out of the air; I gave it a good 30 seconds of thought.
I arrived at it by:
* modding our bill rate up to that of a contractor who specializes in hardware crypto (we do not, but I know the bill rates of several people who do),
* guessing the amount of time it would take me to implement, e.g., Aciicmez (a guess I can make reasonably, because we did BTB timing for virtualized rootkit detection), and
* breaking it up into hours x bill rate.
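To make that arithmetic concrete (the specific rate here is my own placeholder, not a real quote): at, say, $500/hour for a hardware-crypto specialist, $500,000 / $500 per hour = 1,000 billable hours, i.e. roughly half a year of one person's full-time work.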
If you can name 3 people who specialize in adversarial hardware crypto review†, then you know there are at least another 3 who will do grey-area projects of similar sophistication (say, for a company's competitor).
Can you name 3 hardware crypto testing specialist firms? I know there are other people on HN who can. Are you one of them?
† (I can: 83f633acea3a6ca594ea85ae552445369058ded1)
I asked more specific questions in my comment; you aren't answering them. The only important question: why are you so strident about x86 side channels being a non-issue?
Because I've watched chip vendors fail to even figure out how to secure MSIs under their IOMMUs, which makes me question whether just-plain-old-software security is a reasonable expectation under virtualization. You, on the other hand, seem to think it's so solid that the microarchitecture doesn't cache crypto artifacts.
Citation for "real"? All of the papers I have read on the topic are theories and most every known side-channel attack is relatively benign and not specific to virtualization. Do you have evidence of a side-channel attack on Xen or KVM being performed successfully in the real world outside of an academic environment which led to cryptographic compromise?
It's easy to say things are "very real" when we don't know if the NSA can do them. Watch: mind control is very real. Assume the NSA has it.
Going from academic papers to accusing Amazon Web Services of intentionally exposing their customers to NSA side-channel attacks is libelous at best.
VMs are bad because they make it much easier for an attacker to get a process on the same CPU as yours; nothing more, nothing less.
For VM-specific material, read e.g. http://cseweb.ucsd.edu/~hovav/dist/cloudsec.pdf, in particular section 8.4, and note that keystroke timings are usually enough to recover plaintext (passwords are more difficult, but it should still give a good guess). The cache-based covert channel is interesting as well, mostly because it suggests that other cache-based attacks are possible.
Side-channel attacks work just fine outside of academic environments, but the people performing them are testers under NDA (consider Common Criteria for smart cards) or working for various intelligence agencies; they're unlikely to run their mouth on the internet.
> VMs are bad because they make it much easier for an attacker to get a process on the same CPU as yours; nothing more, nothing less.
The paper cites a success rate of less than 25%, and that figure is specific to EC2. With more cores on the host this attack rapidly becomes impossible, as domUs rarely share cores. Which is why I asked for actual evidence of cryptographic compromise in the wild and not yet more papers. You suffer from academia. Being written up in a paper, and demonstrated to enable the same timing attacks that plain network access already enables (with reference to extracting passwords), does not mean "any cryptographic algorithm I use is compromised".
You said "assume the NSA can break your crypto". Since everybody likes to bang on EC2 I would like evidence of how that is accomplished since you used the words "very real". You bring up a good point in that if it is being done it is under NDA and, without realizing it, admitted that you have never heard of it being done.
Which raises the question: how can you say "very real"? Have you ever observed cryptographic compromise via a CPU side channel exacerbated by virtualization, or have you merely read about it?
At any rate, I look forward to your results in a couple of months after you side-channel a neighboring domU and compromise their crypto. Once you're ready I'll give you a couple of my own domUs to demonstrate on free of charge.
To your inevitable followup question: no, I'm not going to talk to you about it.
To your overarching point: if you are on a cloud platform that promises you will never share hardware with any other company – which virtually nobody is – you are still at greater risk simply by being on a nanosecond-timeable switched network with your attackers. But local crypto timing attacks are far more powerful.
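To make "nanosecond-timeable" concrete, here is a deliberately toy sketch of the kind of measurement timing attacks are built on; the secret, the sample count, and the whole setup are invented for illustration, and in practice noise forces far more samples and better statistics. A byte-by-byte comparison that bails out early leaks, through elapsed time, how much of a guess's prefix is correct:

    import time
    import hmac

    SECRET = b"s3kr1tt0ken0123"          # the value the attacker is trying to learn

    def leaky_compare(a, b):
        """Non-constant-time comparison: returns as soon as a byte differs."""
        if len(a) != len(b):
            return False
        for x, y in zip(a, b):
            if x != y:
                return False
        return True

    def time_guess(guess, samples=20000):
        """Median elapsed time for comparing `guess` against SECRET."""
        timings = []
        for _ in range(samples):
            t0 = time.perf_counter_ns()
            leaky_compare(guess, SECRET)
            timings.append(time.perf_counter_ns() - t0)
        timings.sort()
        return timings[len(timings) // 2]

    # A guess sharing a longer prefix with SECRET should take slightly longer.
    print("wrong first byte :", time_guess(b"x" * len(SECRET)))
    print("right first byte :", time_guess(b"s" + b"x" * (len(SECRET) - 1)))

    # The application-level fix is a constant-time comparison:
    print("constant-time    :", hmac.compare_digest(b"x" * len(SECRET), SECRET))

The point is only that the signal exists to be measured; turning it into key recovery is where the real work (and budget) goes.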
Nobody bothers with this stuff because application-layer attacks are so simple that there's little impetus to develop and mainstream the techniques needed to exploit side channels. You're naive indeed if you think that's a gauge of how practical those attacks are.
I wonder where your stridency on this topic comes from. I've read all your comments here --- I mean all of them, on HN period --- and I haven't been able to discern what background you might have in software crypto security. You're here saying something that contradicts virtually every other software crypto person I know, is why I wonder.
You talked earlier about "all the papers you read being theoretical" (I'm paraphrasing). What papers would those be? Because I'm a little familiar with this research (we pirated it gleefully for our virtualized rootkit detection talk several years ago), and, relative to the crypto literature at large, x86 side channel research is striking in how non-theoretical it is; to wit: most crypto papers don't come with exploit testbed how-tos.
So your NDA allows you to acknowledge that a side-channel cryptographic compromise is possible but not give any details? That's a really funny NDA. I call bullshit.
Since I have executed one with my employer, yes, I do.
For example, if you asked me directly whether such an attack was possible, I could not answer you due to my NDA, even though I have personal experience with the matter. You seem really eager to answer that it is, though.
All of the NDAs I have signed have never said anything like "you can't say how, but you can say that we pulled it off". In fact most of the NDAs I've signed have been along the lines of "you don't talk about Fight Club".
Can we deduce that you are willing to violate your NDA to write that you have observed such an attack, or that you never executed an NDA regarding this specific attack? Yes.
In a previous job I worked for a company whose product needed some entropy on startup. It originally read from /dev/random. But then one of our customers reported that the product was hanging on startup, just after installation. It turned out that they had installed it into a freshly built VM (not a cloned one, I guess) and the read from /dev/random was waiting to accumulate enough entropy to return. (We changed it to use /dev/urandom instead, which is not entirely satisfactory, but at least prevents hanging in this situation.)
While this is not exactly the scenario the OP is describing, it's another thing that can go wrong with /dev/random and VMs.
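For anyone who wants to poke at that failure mode directly, here's a minimal sketch (Linux device paths only, nothing from the poster's product): opening /dev/random non-blocking lets you observe the "would block" condition that turns into an indefinite hang on an entropy-starved fresh VM, while /dev/urandom always returns.

    import os

    def read_entropy(path, n=16):
        """Try a non-blocking read so we observe 'would block' instead of hanging."""
        fd = os.open(path, os.O_RDONLY | os.O_NONBLOCK)
        try:
            return os.read(fd, n)
        except BlockingIOError:
            return None  # /dev/random would have blocked: the pool is starved
        finally:
            os.close(fd)

    for dev in ("/dev/random", "/dev/urandom"):
        data = read_entropy(dev)
        if data is None:
            print(dev, "would block (entropy pool starved)")
        else:
            print(dev, "returned", len(data), "bytes")

On a well-seeded machine both reads return immediately; the interesting case is the freshly built VM.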
Servers and VMs without much internal entropy could use a random number server. On boot, they could pull random seed data from a web service like random.org, or generate some by hashing Google News headlines.
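A rough sketch of that idea, with the caveats that the exact URLs are illustrative, that anything fetched over the network should only ever be mix-in material (the service, and anyone on the path, gets to see your "entropy"), and that a plain write to /dev/random mixes bytes into the pool without crediting any entropy:

    import hashlib
    import urllib.request

    # Illustrative seed sources; treat them as mix-in material, never as the
    # sole source of secrets.
    SOURCES = [
        "https://www.random.org/cgi-bin/randbyte?nbytes=64&format=h",
        "https://news.google.com/",   # headlines change constantly; hash the page
    ]

    def fetch_seed_material():
        h = hashlib.sha256()
        for url in SOURCES:
            try:
                h.update(urllib.request.urlopen(url, timeout=10).read())
            except OSError:
                pass  # a dead source simply contributes nothing
        return h.digest()

    # Writing to /dev/random (needs write permission) mixes the bytes into the
    # input pool but does NOT increase the kernel's entropy estimate.
    with open("/dev/random", "wb") as pool:
        pool.write(fetch_seed_material())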
Most servers have few sources of decent entropy. Virtualisation makes this worse. Intel has dropped motherboard RNG support from its chipsets. The suggestion in the thread to use a few cheap VIA boxes, which have an on-CPU RNG, is one idea. There are cheap USB RNGs too, like http://www.entropykey.co.uk/.
I did some research work last semester on crypto inside VMs. One of our initial readings was Yilek's work on attacking VM crypto through VM snapshots http://cseweb.ucsd.edu/~syilek/ndss2010.html
That's an interesting trick but not really representative of the problem. Reusing an RNG's entropy pool wholesale after restarting a snapshot is a mistake, not a design flaw.
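A userspace analogue of what a snapshot restore does to an RNG, just to make the mistake concrete; this is a toy, with Python's non-cryptographic Mersenne Twister standing in for the guest's RNG state:

    import random

    rng = random.Random()            # stand-in for the guest's RNG
    rng.random()                     # ... it has been running for a while

    snapshot = rng.getstate()        # what a VM snapshot effectively captures

    # Two "restores" from the same snapshot:
    vm_a = random.Random(); vm_a.setstate(snapshot)
    vm_b = random.Random(); vm_b.setstate(snapshot)

    nonce_a = [vm_a.randrange(2**32) for _ in range(4)]
    nonce_b = [vm_b.randrange(2**32) for _ in range(4)]

    print(nonce_a)
    print(nonce_b)
    assert nonce_a == nonce_b        # identical "random" output after both restores

That is roughly the shape of the snapshot problem in the paper above, minus all the real-world details.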
Part of the problem is the conflict between transparency and security here. Fixing the wholesale reuse of RNG state would most likely require modifying the guest so that it is aware of being restored from a snapshot and can react appropriately.
However, that might have consequences on what restoring from a snapshot means conceptually.
Yes, in general, writing to /dev/random (given write permissions) is how entropy-gathering daemons and the like work. The data gets added to the input pool and mixed in. However, that doesn't fix the issue of how a snapshot restore works on most hypervisors. Adding an RNG refresh as part of the restore process would be possible, but definitely not trivial, and it could have other consequences if not carefully implemented.
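For the mechanics being described: a plain write to /dev/random only mixes bytes into the pool, while an entropy-gathering daemon that wants the kernel to actually credit entropy uses the RNDADDENTROPY ioctl (root only). A rough sketch, assuming Linux and the rand_pool_info layout from <linux/random.h>:

    import fcntl
    import os
    import struct

    RNDADDENTROPY = 0x40085203   # ioctl number from <linux/random.h>

    def credit_entropy(seed, entropy_bits):
        """Mix `seed` into the kernel pool AND credit `entropy_bits` of entropy.

        Requires root; a plain write() to /dev/random mixes but credits nothing.
        """
        # struct rand_pool_info { int entropy_count; int buf_size; __u32 buf[]; }
        payload = struct.pack("ii", entropy_bits, len(seed)) + seed
        fd = os.open("/dev/random", os.O_WRONLY)
        try:
            fcntl.ioctl(fd, RNDADDENTROPY, payload)
        finally:
            os.close(fd)

    # e.g. credit_entropy(hw_rng_bytes, 512) on a box with a hardware RNG feed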
I'm coming from a background in massively parallel computing and financial services, both of which are heavy on security. Nonetheless, and even though I have been running cryptographically active instances on Amazon and Rackspace for a long time, I had honestly never thought about the RNG source on VMs.
That is my own failure, of course. I wonder, though, whether everyone else knew about the VM RNG issue, or if only I had missed the memo. If the majority of poll respondents have never thought about the problem at all, perhaps I should start an awareness campaign on wikis and forums.
You did not miss the memo. Crypto on virtualized cloud platforms isn't trustworthy. But it's a grade of untrustworthy several steps higher than "exploitable SQL injection", so people don't think about it, talk about it, or take it seriously.
The poll, though, is unnecessary and I flagged it.
You noted that people could run these on their CSPRNGs to check randomness. No, they can't. These are all useful tools indeed; pentesters use them to check cookies and crypto tokens. But they can only tell you whether bits are correlated. It is very easy for an uncorrelated stream of bits to be terribly insecure: they simply have to be seeded from the same source.
It is not a good idea to suggest people test their RNGs with things like "ent". In "ent", Ruby's insecure rand() can appear competitive with /dev/random.
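A quick toy illustration of both points, with a crude monobit count standing in for "ent" and Python's Mersenne Twister standing in for Ruby's rand(): every stream below looks statistically fine, and the two identically-seeded generators (think cloned VM images) are also completely predictable to each other.

    import os
    import random

    def ones_fraction(data):
        """Crude monobit statistic: fraction of 1-bits. ~0.5 looks 'random'."""
        return sum(bin(b).count("1") for b in data) / (8 * len(data))

    urandom_bytes = os.urandom(100_000)
    mt = random.Random()                       # non-cryptographic Mersenne Twister
    mt_bytes = bytes(mt.randrange(256) for _ in range(100_000))

    print("os.urandom ones fraction:", ones_fraction(urandom_bytes))
    print("MT rand()  ones fraction:", ones_fraction(mt_bytes))

    # Two generators seeded from the same source look just as "random"...
    a = random.Random(12345)
    b = random.Random(12345)
    stream_a = bytes(a.randrange(256) for _ in range(100_000))
    stream_b = bytes(b.randrange(256) for _ in range(100_000))
    print("seeded-twice ones fraction:", ones_fraction(stream_a))
    print("identical streams?        :", stream_a == stream_b)   # True: zero security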
There's also the obvious confounding issue with testing an entropy pool by depleting it.
It is very easy for an uncorrelated stream of bits to be terribly insecure: they simply have to be seeded from the same source.
Case in point: The b0rked Debian OpenSSL RNG would certainly pass any statistical tests. But it still turned out to have only something on the order of 32767 unique sequences, i.e., 15 bits of entropy.
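A toy version of why 15 bits is fatal (this is not the actual Debian code path, just the same shape of bug: a generator whose only seed material is something PID-sized): with at most 2^15 possible seeds, an attacker simply tries them all.

    import hashlib

    def keystream(seed, n=16):
        """Toy 'CSPRNG': output determined entirely by a PID-sized seed."""
        return hashlib.sha256(seed.to_bytes(2, "big")).digest()[:n]

    # A victim whose only entropy was its process id:
    victim_pid = 19283
    observed = keystream(victim_pid)

    # The attacker brute-forces the entire 15-bit seed space:
    recovered = next(s for s in range(1, 2**15) if keystream(s) == observed)
    print("recovered seed:", recovered, "==", victim_pid)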
Hence the caveat: "and hope/trust that the entropy you observe is unique, not copied to any other instances".
I suppose it would take a centralized web service to detect correlated unrandomness between servers. I am not personally willing to implement such a service right now.
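For what it's worth, the core of such a service is small; the hard parts are trust, coverage, and not turning it into an oracle for attackers. A minimal sketch (the names and the whole interface are invented): servers submit a hash of a fresh RNG sample, and any collision between different servers flags correlated "randomness".

    import hashlib

    class UnrandomnessDetector:
        def __init__(self):
            self.seen = {}   # fingerprint -> first server that reported it

        def report(self, server_id, rng_sample):
            """Return the other server with the same sample, or None if unique so far."""
            fingerprint = hashlib.sha256(rng_sample).hexdigest()
            if fingerprint in self.seen and self.seen[fingerprint] != server_id:
                return self.seen[fingerprint]      # collision: same "random" bytes elsewhere
            self.seen.setdefault(fingerprint, server_id)
            return None

    detector = UnrandomnessDetector()
    sample = bytes(16)                      # pretend two cloned VMs both emit this
    print(detector.report("vm-a", sample))  # None: first sighting
    print(detector.report("vm-b", sample))  # "vm-a": correlated unrandomness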