"If you added one S3 object every 60 hours starting at the Big Bang, you'd have accumulated almost two trillion of them by now."
That actually sounds underwhelming! IMHO our brains have an easier time thinking "hey, 1 every 60 hours that's not much" compared to figuring out the universe is really incredibly old ;-)
How about comparing to the lifetime of one person? With the US average life expectancy (78 years), you'd have to upload about 800 objects/second for your whole life to get to 2 trillion :)
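For anyone who wants to redo that napkin math, here's a quick check (assuming the announced 2 trillion objects and the 78-year lifespan above):

```python
objects = 2e12                          # announced S3 object count
lifetime_s = 78 * 365.25 * 24 * 3600    # US average life expectancy, in seconds

print(objects / lifetime_s)             # ~812 objects/second, i.e. roughly 800/s
```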
Edit: One S3 object for each fish in the ocean (3 to 4 trillion) could also be a nice future milestone (if also slightly underwhelming :)
Edit2: I also love the eye blink as a unit of time. Each time you blink (average is around 15 times a minute), XXX more objects will have been uploaded to/requested on S3
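A rough way to fill in the XXX, assuming the announced peak of 1.1 million requests/second and the 15 blinks a minute above:

```python
requests_per_second = 1.1e6     # announced peak request rate
blinks_per_minute = 15          # average blink rate from the comment above

seconds_per_blink = 60 / blinks_per_minute        # 4 seconds between blinks
print(requests_per_second * seconds_per_blink)    # ~4.4 million requests per blink
```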
My math is probably wrong... but I believe if you ate a Twinkie for every request, at the end of a year it would take 1,350 Blue Marlin heavy-lift ships to move you across the ocean.
Oh, oh, I like doing these analogies, and we badly need AWS stickers for the office.
A typical grain of beach sand – which must be about the lightest solid thing that’s easy for most people to envision – is 3 mg. That many grains of sand would mass 6e6 kg (6000 metric tons), which in turn is near the upper limit of masses that make sense to most people in everyday terms.
So you could say that if every S3 object were a grain of sand, S3 would be 3× the launch mass of the Space Shuttle, or somewhat more than the gold held in Fort Knox, or as much as the heaviest living thing: http://en.wikipedia.org/wiki/Pando_(tree)
And the 1.1 million requests per second would be about 3.3 kg, or as much as a gallon of milk/water.
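Rough check of both figures, taking the 3 mg grain and the announced numbers at face value:

```python
grain_mg = 3                    # assumed mass of one grain of sand
objects = 2e12
requests_per_second = 1.1e6

print(objects * grain_mg / 1e6)              # 6,000,000 kg = 6,000 metric tons
print(requests_per_second * grain_mg / 1e6)  # ~3.3 kg of sand per second of requests
```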
Imagine each object being a person, and imagine a row of 10 000 people, about the population of a small village, standing side by side, barely able to touch each other's hands, spanning roughly 15-20 km.
Now imagine putting 9 999 people behind every person.
And finally, imagine stacking 9 999 more on each of those 100 million people's heads. That stack reaches roughly 17 km - well above the altitude a normal airliner cruises at.
Then double that number. That's how many objects have been created so far.
---
A bit long, but I think the roughly human-scale numbers, the 9 999s, and the "and then even more" escalation make it almost imaginable.
EDIT: Another one - you could fill the whole of Manhattan with cents and still have money to spare.
(Manhattan is 87.46 km^2 and a one-cent coin covers roughly 285 mm^2, so 2 trillion of them would cover Manhattan several times over.)
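Quick sanity check on both the stacking math and the pennies (the ~285 mm^2 penny area is my assumption, based on a 19.05 mm US cent):

```python
import math

# People analogy: 10,000 in a row, 10,000 deep behind each, 10,000 stacked high
print(10_000 ** 3 * 2)                          # 2,000,000,000,000 objects

# Manhattan in cents
manhattan_km2 = 87.46
cent_area_mm2 = math.pi * (19.05 / 2) ** 2      # ~285 mm^2 per US cent (assumed)
covered_km2 = 2e12 * cent_area_mm2 / 1e12       # mm^2 -> km^2
print(covered_km2 / manhattan_km2)              # ~6.5 Manhattans' worth of pennies
```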
It's so many objects that if you had all their names printed out and spent the rest of your life reading them, you wouldn't have time to get to the end.
In fact, even if you and all your friends spent your lives reading the lists of object names, you wouldn't have time to get to the end.
In fact, even if you and all your friends and all of their friends devoted your lives to reading lists of S3 object names, you wouldn't even make a dent because new objects are arriving faster than you could collectively read their names.
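Putting rough numbers on that, assuming you can read about two object names per second and that objects keep arriving at roughly a trillion per year:

```python
objects = 2e12
names_per_second = 2                       # assumed reading speed
seconds_per_year = 365.25 * 24 * 3600

print(objects / names_per_second / seconds_per_year)   # ~31,700 years for one reader

arrival_rate = 1e12 / seconds_per_year     # ~31,700 new objects per second
print(arrival_rate / names_per_second)     # ~15,800 readers just to keep pace
```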
If each object were a megabyte, you could stick them all on microSD cards packed into a cube a bit taller than an adult.
(One card is 1.1 cm x 1.5 cm x 0.1 cm and holds 32 GB, i.e. 32,000 one-megabyte objects, so 2 trillion objects need about 62.5M cards. A 2.2 m cube (a bit over 7 feet) holds roughly 200 * 146 * 2200 = 64M cards, which even leaves room for some error-correcting codes just in case.)
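Here's that arithmetic in one place, in case anyone wants to swap in different card sizes (the dimensions are the standard 11 x 15 x 1 mm microSD form factor):

```python
objects = 2e12
card_objects = 32_000                      # 32 GB card / 1 MB objects
card_cm3 = 1.1 * 1.5 * 0.1                 # microSD volume in cm^3

cards = objects / card_objects             # ~62.5 million cards
side_cm = (cards * card_cm3) ** (1 / 3)
print(cards, side_cm)                      # ~62.5e6 cards, cube ~218 cm on a side
```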
I like to use dice to think about storage. If a byte is the size of a die (say a 1 cm cube), then a kilobyte is a 10 cm cube, a megabyte is a 1 m crate, a gigabyte is a 10 m house, a terabyte is a 100 m tall building, a petabyte is a 1 km wide Borg cube, etc.
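The rule there is just that every x1000 in bytes is a x10 in edge length; a tiny script makes the progression explicit (assuming 1 byte = a 1 cm die, as above):

```python
units = ["byte", "kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte"]
for power, unit in enumerate(units):
    print(f"1 {unit} ~ a cube {10 ** power} cm on a side")   # x10 per x1000 in volume
```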
If each object were written on a single standard 8 1/2 x 11 inch piece of paper and the sheets were stacked up, the stack would be about 125,000 miles high (about half the distance to the Moon).
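Checking that with a sheet thickness of roughly 0.1 mm (my assumption for ordinary office paper):

```python
objects = 2e12
sheet_mm = 0.1                      # assumed paper thickness
mm_per_mile = 1_609_344
moon_miles = 238_855                # average Earth-Moon distance

stack_miles = objects * sheet_mm / mm_per_mile
print(stack_miles, stack_miles / moon_miles)   # ~124,000 miles, ~0.52 of the way
```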
Note that RRS is "Reduced Redundancy Storage" - an S3 option that offers 99.99% durability, as opposed to the 99.999999999% durability offered by Standard S3.
Considering we don't hear about problems that often, this is quite an impressive feat of engineering. You really don't think about it until we get a hiccup and half the Internet goes down.
To put this into perspective, that's 2.7B new objects per day (assuming 1 trillion new objects averaged over 365 days).
Assuming each object is 100KB (generous estimate, after compression) that would be 270GB per day -- or assuming ten levels of redundancy and striped across three RAID storage devices (per level of redundancy) then 8.1TB per day.
I'm not familiar with their hard disk procurement policies but it wouldn't be difficult to assume they've been purchasing 1TB drives, so 10 new disk drives per day just for keeping ahead of growth. Furthermore let's assume their disk drive failure churn rate is 10% per day so another 1 new disk drive for parts replacement (so 11 disk drives per day).
These are really loose numbers not based on any actual data (or any personal experience at all) but just napkin math, so take it all with a grain of salt.
I'm not convinced that 100KB is a great estimate on file size, but either way you're off by a few zeroes. It's not 270GB per day, it's 270TB. Even if each object were just one byte, that would be 2.7GB. 100KB is one hundred thousand bytes. So it's quite a bit more than eleven drives per day!
After applying the same shoddy math with each object being 100KB -- 270TB with 10 levels of redundancy across 3 RAID drives resulting in 8,100TB per day. This would be 8,100 drives (at 1TB per drive), or 8,910 drives after 10% being dead-on-arrival.
The math is sketchy, so let's cut it down by 10x (10KB per object): 891 drives per day. Keep in mind this is just for S3 and it doesn't account for existing drives failing, growth, or what other services require (e.g. EC2, RDS, CloudWatch, CloudFront, etc.).
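Spelling out that corrected napkin math with the same assumptions (2.7B new objects/day, 100 KB each, 10 copies x 3 RAID drives, 1 TB drives, 10% spares):

```python
new_objects_per_day = 2.7e9      # ~1 trillion / 365, rounded as above
object_kb = 100                  # assumed average object size
overhead = 10 * 3                # 10 levels of redundancy x 3 RAID drives (napkin value)

raw_tb = new_objects_per_day * object_kb / 1e9   # KB -> TB: 270 TB/day
total_tb = raw_tb * overhead                     # 8,100 TB/day
print(raw_tb, total_tb, total_tb * 1.1)          # 8,910 one-TB drives/day incl. 10% spares
```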
Average size is irrelevant. A handful of 1 GB objects dwarfs hundreds of 1-byte objects when computing the average. The overall distribution is what's interesting. There are actually three: GET sizes, PUT sizes, and stored sizes. They are not identical distributions, especially since, as the PUT size distribution has changed over time, it has drifted out of sync with the stored size distribution. Wish I could tell you more; there are some fascinating data points in there but, you know, NDA. Source: former S3 employee.
You could then multiply the average object size by 2 trillion to work out their total data volume under management, which I believe they consider commercially sensitive information, but I can't quite put my finger on why.
Wow, Windows Azure seems to be beating it by a long way. Genuine surprise... I assumed it would be much closer.
9 months ago, they announced they stored twice this amount - 4 trillion objects. A year before that, 1 trillion. Given that previous rate of growth, we can expect they have a lot more than this now.
They also announced peaks of 880,000 requests a second. Whilst Amazon wins here, I'd say it's fair to assume this number has increased in those 9 months.
Bear in mind that Azure storage includes "Blobs, Disks/Drives, Tables, and Queues" however S3 is only blobs (other services like Amazon's DynamoDB would be analogous to table storage). Hence, it's not an apples-to-apples comparison.
An object is a single blob of data. S3 objects vary in size from 1 byte up to 5 Terabytes. They can be uploaded with a single PUT, or with multiple PUTs in series or in parallel (which we call multipart upload). They can be downloaded as a unit, in full (GET) or in part (range GET).
I am surprised that the number of requests per second is this low — especially if this includes PUTs. There must be a pretty huge multiple between CloudFront and S3 that keeps this in check.
Anyone else find this surprisingly low? I'd imagine your typical web service holds a few thousand objects in S3 for images etc., then backups and anything else. Then you have your big players like Netflix, Dropbox, etc. that use the service and store data for tens of millions of customers...
Netflix is unlikely to store customer data on S3. They use it for movie files IIRC, but customer data would go in a proper database of some sort.
Dropbox does use S3, but I think you're probably making the fairly standard human mistake of not realizing just how big a trillion is. Dropbox hit 100M users in November of 2012, so for Dropbox to use up two trillion objects each user would need to have 20,000 of them. Dropbox does deduplication, has lots of inactive/minimal users, etc., so they're probably a percent or two of S3 objects.
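The division behind that 20,000 figure, using the announced 2 trillion objects and the ~100M Dropbox users mentioned above:

```python
s3_objects = 2e12
dropbox_users = 100e6
print(s3_objects / dropbox_users)   # 20,000 objects per user if Dropbox were all of S3
```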
Netflix stores most of their user data in Cassandra (at the AWS conference last year I attended a talk by them, and specifically asked after some of the things they were storing).
(At one point, the stuff I was storing/logging for Cydia actually represented over a percent, maybe it was even over two percent, of all objects in S3; now I'm between 0.1% and 0.5%.)
There is more to S3 than just serving static objects off of the local drive. Maintaining the integrity of the data and ensuring that the data sent out is consistent is not a trivial task. A constantly changing map with 2 trillion keys is a hard problem on its own. Also, serving 1000 1MiB objects per second is not the same as serving 1000 1KiB objects per second so it's hard to say how many resources just the serving portion consumes.
I was assuming that the number of requests referred to read requests, and I guess that the system is designed in a way that makes read requests very cheap, maybe even cheaper than reading a file on a usual filesystem, at least for hot data.
Just because it's huge and complex doesn't mean it's slow and that requests are expensive.
If you ignore the fact that these aren't actually static objects and require a lot more computation to work out where they are and where they need to go.
1.1M RPS is the amount they actually serve, not how much they can serve. Just because your single server can serve more than 1,000 static objects/second (in fact, that number should be much higher), it doesn't mean you need to.
> If you ignore the fact that these aren't actually static objects
It depends what you call a static object.
If the content of your object is stored somewhere and you can just send it without transformations, it's some kind of static object.
Now, looking at a single S3 bucket as a key-value store, with some kind of routing mapping an object's URL to a set of shards each containing the object (something like the toy sketch below), one could argue it's serving static objects.
> require a lot more computation to work out where they are and where they need to go
I hope not. I would bet it's not very far from serving a file from the filesystem. There may be a lot of I/O contention, though.
> in fact, that number should be much higher
Yes, very probably, and that only makes the 1.1M RPS number seem even less impressive - for Amazon.
> 1.1M RPS is the amount they actually serve, not how much they can serve
My whole point was that this number of requests seems low for Amazon, not that they couldn't handle more.
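For what it's worth, here's a purely hypothetical sketch of the kind of key-to-shard routing mentioned above; the shard count, replica count, and hash choice are all made up for illustration:

```python
import hashlib

# Hypothetical only -- not how S3 actually works, just the shape of
# "hash the key, pick a set of shards that hold copies of the object".
SHARDS = ["shard-%04d" % i for i in range(1000)]
REPLICAS = 3

def shards_for(bucket, key, replicas=REPLICAS):
    digest = hashlib.md5(("%s/%s" % (bucket, key)).encode()).hexdigest()
    start = int(digest, 16) % len(SHARDS)
    return [SHARDS[(start + i) % len(SHARDS)] for i in range(replicas)]

print(shards_for("my-bucket", "photos/cat.jpg"))
```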