Hacker News
RAID0 ephemeral storage on AWS EC2 (gabrielweinberg.com)
51 points by cemerick on May 1, 2011 | 13 comments



I've done thorough testing on random read/write performance of ephemeral vs EBS, and I can tell you that EBS is waaaay better in the case of random IO. I can attest to the accuracy of the random IO numbers in one of the references from the post:

http://victortrac.com/EC2_Ephemeral_Disks_vs_EBS_Volumes

Amazon even says this on their page (also referenced by Gabriel).

Joe Stump's article might lead you to believe that ephemeral and EBS are equal for random IO, but Joe only tested a RAID0 config with two EBS volumes.

In general, even with the best instances under the best circumstances, you won't crack 2K IOPS on ephemeral RAID0 with four drives.

EBS, on the other hand, with 8 volumes configured in RAID0 will exceed 24K random reads/sec and 12K random writes/sec. The reads are so much higher because EBS is mirrored.

The downside with EBS is you can see worse performance when there are noisy neighbors. I've seen performance drop 50-70% for hours at a time on m1.large because of network card contention, but you can avoid this when on larger instances (m1.xlarge or m2.4xlarge do the trick).

Sequential IO is another matter. I haven't thoroughly compared, but I believe in this case things are more even.
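If you want to reproduce these random vs sequential comparisons yourself, something like fio makes a quick benchmark. This is a sketch, not the exact methodology used above — the mount path is a placeholder, and you'd point it at your ephemeral RAID or EBS volume:

```shell
# Random-read benchmark: 4K blocks, direct IO, queue depth 32,
# against a 4 GB test file on the volume under test.
# /mnt/test is a placeholder -- substitute your actual mount point.
fio --name=randread --filename=/mnt/test/fio.dat --size=4G \
    --rw=randread --bs=4k --direct=1 --ioengine=libaio \
    --iodepth=32 --runtime=60 --time_based

# Same parameters for random writes:
fio --name=randwrite --filename=/mnt/test/fio.dat --size=4G \
    --rw=randwrite --bs=4k --direct=1 --ioengine=libaio \
    --iodepth=32 --runtime=60 --time_based

# Swap --rw=read / --rw=write for the sequential comparison.
```

fio reports the achieved IOPS directly, so you can compare the ephemeral RAID0 and EBS RAID0 numbers head to head on the same instance.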

Performance aside, there is something to be said for removing dependencies on a complex system like EBS. It also frees up network bandwidth and provides quite a bit of storage without the $0.10/GB cost of EBS. If IOPS isn't a bottleneck and you plan on replicating, then ephemeral can be a big win.


So I'm kinda confused. The instance goes down, you lose your data. That's the main reason why Amazon built EBS (granted, it has issues of its own).

I'm not quite sure how, in production, this setup really helps you, because you KNOW that you will lose your data.


Some examples where you might not care: HDFS has every block on three DataNodes around the cluster. Cassandra is similar and both will auto heal if a node goes down. Projects like drbd will do block level mirroring of a partition between a set of machines with only one master elected at a time.

Some usage patterns don't care if the data is lost, they just care that the I/O is fast. Consider a group of identical machines serving up read-only data round-robin'd via a load balancer. If one goes down, who cares? Only the load balancer does. A new machine gets provisioned and gets added back to the group. DuckDuckGo probably has a data lifecycle that pushes out a new read-only index set on a regular schedule. If the data is kept in S3 and pushed out to read-only nodes then this is a very good use case for only using the ephemeral drives.
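The read-only-node pattern above can be as simple as a cron'd sync from S3 onto the ephemeral drive. A minimal sketch — the bucket name, paths, and pid file below are all made up for illustration:

```shell
# Pull the latest read-only index from S3 onto local ephemeral storage.
# s3://my-index-bucket/current/ and the local paths are hypothetical.
s3cmd sync --delete-removed s3://my-index-bucket/current/ /mnt/ephemeral/index/

# Tell the serving process to reopen the index files, e.g. via SIGHUP:
kill -HUP "$(cat /var/run/indexserver.pid)"
```

If the node dies, nothing is lost: a fresh instance runs the same sync on boot and rejoins the load balancer pool.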

Servers where writes occur are really the ones that need to be durable. If you're faced with node failure at any moment, the best case is for the application to wait until the write has been fsync'd on multiple machines.

This topic is kind of a rabbit hole. If you're using ephemeral drives and can't stand to lose data since your last backup then you have to use some sort of multiple machine architecture. Once you do that you're at the mercy of the network. A set of postgres servers in master/slave replication will always have the possibility of losing writes since the slave replication is asynchronous. Cassandra and other quorum-write based systems can make it so that if you write to a single node, a quorum of nodes must agree upon the write before returning the "block written" signal to the client.

Really, the CAP theorem[1] explains it all. Read the Paxos Wikipedia article[2] and the Chubby paper[3] if you want to see how to keep a set of nodes in total agreement as to the current state of the system.

[1] http://en.wikipedia.org/wiki/CAP_theorem

[2] http://en.wikipedia.org/wiki/Paxos_(computer_science)

[3] http://labs.google.com/papers/chubby.html


Why would anyone go with RAID 0 for a server? At least do RAID 10.


The article describes techniques for setting up RAID 0 against EC2 local ephemeral storage for performance gains. As there's no way to replace a failed drive, RAID 10 wouldn't provide any benefit.
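The setup the article describes boils down to a few mdadm commands. A sketch assuming two ephemeral devices at /dev/xvdb and /dev/xvdc — the actual device names vary by instance type and AMI:

```shell
# Stripe the two ephemeral devices into a single RAID 0 array.
mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc

# Put a filesystem on the array and mount it.
mkfs.ext4 /dev/md0
mkdir -p /mnt/raid0
mount /dev/md0 /mnt/raid0

# Record the array config so it reassembles on reboot
# (reboots keep ephemeral data; only stop/terminate loses it).
mdadm --detail --scan >> /etc/mdadm.conf
```

With larger instance types you add more ephemeral devices to the stripe; there's no spare to rebuild onto, which is why anything beyond RAID 0 buys you little here.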


Well, you'd have complete data loss on RAID-0 if there's a single drive failure. With RAID-10, you can sustain at least one drive failure without data loss. That way, even though you can't replace a failed drive, you can at least get the data off to a new instance.


The whole instance can disappear at any time, so you can't rely on hardware redundancy. This means you must back up your data off the ephemeral drive in some fashion already, in which case going for RAID-10 doesn't provide much benefit.


Ah, I see. I'm not an EC2 customer, but how often do instances disappear? This seems kind of odd on Amazon's end. What's the reason for instances disappearing?


Hardware failure.


The four drives required for RAID 1+0 are only available on xlarge instances -- overkill for most people.

99% of the motivation for doing this is to make up for the poor performance ephemeral storage offers. Any data on these drives should be replicated to other nodes anyway, or you're doing it wrong.


You could always replace it with an EBS volume.


The whole point of the post is that he moved off of EBS for various performance and stability reasons.


keyword: ephemeral





