Can anyone think of any ways to collect data on just how big AWS really is? So f...

sorenbs · on Oct 26, 2011

If we could somehow collect information on the power usage of their data centers, I think that would be the most accurate measure. You might be able to do that by using a thermal camera to determine how much heat is dissipated by their cooling systems.

theatraine · on Oct 26, 2011

If we could generate random IP samples by crawling, and if the IP addresses are assigned sequentially, then we would have a problem analogous to the German tank problem (http://en.wikipedia.org/wiki/German_tank_problem). That would let us compute the minimum-variance unbiased estimate of the total number of IPs from a random sample as: max(IPs observed)*(1+(number of IPs observed)^-1)-1.
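A minimal sketch of that estimator, assuming the addresses really do come from one contiguous, sequentially numbered block with a known first address. Real allocations are spread over many non-contiguous blocks, so this only illustrates the arithmetic. `ip_to_serial`, `german_tank_estimate` and the sample addresses (from a documentation range, not AWS) are hypothetical.

```python
import ipaddress
from typing import Iterable


def ip_to_serial(ip: str, block_start: str) -> int:
    # Hypothetical mapping: treat the first address of an assumed contiguous
    # block as serial number 1, the next address as 2, and so on.
    return int(ipaddress.ip_address(ip)) - int(ipaddress.ip_address(block_start)) + 1


def german_tank_estimate(serials: Iterable[int]) -> float:
    # Minimum-variance unbiased estimator for N when k distinct serials are
    # drawn without replacement from 1..N: m * (1 + 1/k) - 1, m = max serial.
    distinct = set(serials)
    if not distinct:
        raise ValueError("need at least one observed serial")
    k = len(distinct)
    m = max(distinct)
    return m * (1 + 1 / k) - 1


# Made-up observations in a documentation range (not real AWS data).
observed = ["192.0.2.19", "192.0.2.40", "192.0.2.42", "192.0.2.60"]
serials = [ip_to_serial(ip, "192.0.2.1") for ip in observed]
estimate = german_tank_estimate(serials)  # 60 * (1 + 1/4) - 1 = 74.0
```

Deduplicating before counting matters here: the estimator assumes sampling without replacement, so a crawl that hits the same IP twice would otherwise inflate k.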

dredmorbius · on Oct 26, 2011

You can probably simplify that task markedly using the Routeviews Project data.

http://www.routeviews.org/

Zonefiles are downloadable.

If you can get a comprehensive list of AWS/Amazon ASNs, you could also hit up the CIDR Report: http://www.cidr-report.org/as2.0/

The rabbit I haven't been able to pull from a hat yet is getting that list of ASNs without a fair bit of legwork.
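Once you do have the ASN list, sizing the address space is mostly set arithmetic over the announced prefixes. A minimal sketch, assuming the prefixes have already been extracted from a routing table dump for the relevant origin ASNs (that parsing step is not shown). `total_addresses` is a hypothetical helper, and the prefixes below are private-range placeholders, not Amazon's.

```python
import ipaddress
from typing import Iterable


def total_addresses(prefixes: Iterable[str]) -> int:
    # Parse each CIDR prefix; all of them must be the same IP version.
    nets = [ipaddress.ip_network(p) for p in prefixes]
    # Merge overlapping, nested and adjacent prefixes so no address is
    # counted twice (e.g. a /16 plus a more-specific /24 inside it).
    merged = ipaddress.collapse_addresses(nets)
    return sum(net.num_addresses for net in merged)


# Placeholder prefixes standing in for one ASN's announcements.
announced = ["10.0.0.0/16", "10.0.5.0/24", "172.16.0.0/25", "172.16.0.128/25"]
```

Collapsing first matters because networks often announce both an aggregate and more-specific routes. Note that announced address space is an upper bound on addresses in use, not a count of servers.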

jontas · on Oct 26, 2011

Their servers send an indication in the HTTP headers: Server: Apache/2.2.21 (Amazon)

I guess you'd need to crawl the web and look for those headers. Or maybe you could look at IP addresses? It would certainly be difficult to do with any kind of accuracy, but you could probably get some decent estimates if your sample size was large enough.
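A minimal sketch of the tallying step, assuming a crawler has already collected the raw `Server` header value for each host (`None` where the header was missing). As replies below point out, the "(Amazon)" tag only appears with some setups, so this counts a weak signal at best. `server_comment` and `tally_server_comments` are hypothetical helpers; the sample values are the ones quoted in this thread.

```python
import re
from collections import Counter
from typing import Iterable, Optional

# First parenthesised comment in a Server value, e.g. "(Amazon)".
_COMMENT_RE = re.compile(r"\(([^)]*)\)")


def server_comment(server_header: Optional[str]) -> Optional[str]:
    """Return the text inside the first (...) of a Server header, if any."""
    if not server_header:
        return None
    match = _COMMENT_RE.search(server_header)
    return match.group(1).strip() if match else None


def tally_server_comments(headers: Iterable[Optional[str]]) -> Counter:
    """Count comment tags across crawled Server headers; None means no tag."""
    return Counter(server_comment(h) for h in headers)


# Header values quoted in this thread, standing in for crawl results.
crawled = [
    "Apache/2.2.21 (Amazon)",
    "Apache/2.2.17 (Ubuntu)",
    "AmazonS3",
    "Server",
    None,
]
counts = tally_server_comments(crawled)
```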

0x12 · on Oct 26, 2011

Very large numbers of Amazon servers are used for something other than cranking out HTTP pages. Expect a lot of them to be crunching numbers for bioinformatics problems, physics simulations and so on. That's why there is a CUDA-enabled instance.

Rendering web pages is actually one of the worst use cases for Amazon from a bang-for-the-buck perspective, especially when you factor in bandwidth.

samstave · on Oct 26, 2011

Crawl and multiply by 3 (assuming about 30% are directly serving web traffic)

nl · on Oct 26, 2011

Is that 30% based on anything at all?

When people are building 30,000-core compute clusters [1] on EC2, presumably with zero publicly available web servers, I'd be very interested in any methodology that provides reasonable estimates of revenue based on public web servers.

[1] http://arstechnica.com/business/news/2011/09/30000-core-clus...

ceejayoz · on Oct 26, 2011

> Their servers send an indication in the http headers: Server: Apache/2.2.21 (Amazon)

That's likely only the case if you use Amazon's Linux distribution (based on CentOS). My EC2 instances say "Server: Apache/2.2.17 (Ubuntu)".

bdonlan · on Oct 26, 2011

I don't know where you got that header; S3 returns Server: AmazonS3 and http://aws.amazon.com returns Server: Server. I would suspect a lot of their backend services use custom HTTP servers, and maybe a small handful actually use Apache. Moreover, you'll be completely missing backend servers that have no public IP at all...

kjw · on Oct 26, 2011

I haven't used AWS in a while. Is there a pattern to the AWS IP addresses? (e.g. are they using a set of specific blocks? ...that seems too simple). The other interesting data point would be how much AWS resource is consumed by Amazon itself. I understand that they have been moving big pieces of infrastructure onto AWS over the past couple of years.

crb · on Oct 26, 2011

They keep a sticky post in their announcements forum with the current list. The post as of today is https://forums.aws.amazon.com/ann.jspa?annID=1199, and the forum link (in case that didn't work) is https://forums.aws.amazon.com/forum.jspa?forumID=30.
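With that list in hand, you could classify crawled IPs by whether they fall inside a published block. A minimal sketch, assuming the CIDR ranges from the forum post have been copied into a Python list by hand. `build_matcher` is a hypothetical helper, and the blocks below are documentation placeholders, not the real EC2 ranges.

```python
import ipaddress
from typing import Callable, Iterable


def build_matcher(cidrs: Iterable[str]) -> Callable[[str], bool]:
    """Return a function that tests whether an IP falls in any listed block."""
    nets = [ipaddress.ip_network(c) for c in cidrs]

    def contains(ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        # A linear scan is fine for a few dozen published ranges.
        return any(addr in net for net in nets)

    return contains


# Placeholder blocks (documentation ranges), NOT the list from the forum post.
in_published_ranges = build_matcher(["203.0.113.0/24", "198.51.100.0/25"])
```

For example, `in_published_ranges("203.0.113.77")` is true, while `"198.51.100.200"` is not, because a /25 only covers .0 through .127.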

(You probably have to have showdead on to see me because I'm hellbanned. I've emailed pg to try and get this fixed, and had no response. If you happen to see this, please check my comment history to realise that I'm not at all a troll, and consider upvoting me on the off chance it will get my account back into positive karma land. Thanks!)

showerst · on Oct 26, 2011

I don't have showdead on and I see your comment just fine.

jbarham · on Oct 26, 2011

http://alestic.com/2011/08/ec2-max-instances

Somewhat out of date (Aug 2011) but a useful estimate.