AWS Compute Cluster #231 on Top 500

RK · on Nov 15, 2010

Each (cluster compute) instance has two quad-core processors. The machine on the Top 500 list has 7040 cores. So at 8 cores per instance (880 instances in total) and $1.60/hr per instance, that comes to $1,408/hr to run this machine.
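The arithmetic above, as a quick Python sketch. The function name is made up for illustration, and the inputs are just the figures quoted in this comment (7040 cores, 8 cores per instance, $1.60 per instance-hour). `Decimal` is used so the dollar amounts don't pick up float rounding noise.

```python
from decimal import Decimal


def cluster_hourly_cost(total_cores: int, cores_per_instance: int,
                        price_per_hour: Decimal) -> tuple[int, Decimal]:
    """Return (instance count, total cost per hour) for a cluster of identical instances."""
    if total_cores % cores_per_instance:
        raise ValueError("core count is not a whole number of instances")
    instances = total_cores // cores_per_instance
    return instances, instances * price_per_hour


# Figures from the comment above: 7040 cores, 8 cores/instance, $1.60/hr.
instances, hourly = cluster_hourly_cost(7040, 8, Decimal("1.60"))
print(f"{instances} instances -> ${hourly:,.2f}/hr")  # 880 instances -> $1,408.00/hr
```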

No idea how that compares to other machines on the list. You would really need a TCO analysis.

When I run our 250-node EC2 cluster of small instances (around $25/hr), I get nervous that it won't shut down properly. $1,400/hr would be pretty nerve-wracking...

wisty · on Nov 15, 2010

I wonder if cloud clusters will get GPGPU, since that seems to be a factor these days. It's not much use for a general purpose web server though.

maximilian · on Nov 15, 2010

They have compute nodes with GPUs. From their website:

  22 GB of memory
  33.5 EC2 Compute Units (2 x Intel Xeon X5570, quad-core “Nehalem” architecture)
  2 x NVIDIA Tesla “Fermi” M2050 GPUs
  1690 GB of instance storage
  64-bit platform
  I/O Performance: Very High (10 Gigabit Ethernet)
  API name: cg1.4xlarge

So they have new, awesome Fermi GPUs, but the instances cost $0.50 more per hour than the standard HPC compute nodes. That's exciting for anyone with GPU code who wants to run it on a cluster.
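A small sketch of what that premium works out to, assuming the "standard HPC compute node" price is the $1.60/hr RK quoted at the top of the thread and counting the two M2050s listed in the spec above. The constant names are illustrative only.

```python
from decimal import Decimal

CC_PRICE = Decimal("1.60")     # standard cluster compute node, $/hr (RK's figure)
GPU_PREMIUM = Decimal("0.50")  # extra $/hr for the GPU instance (from the comment)
GPUS_PER_INSTANCE = 2          # 2 x NVIDIA Tesla M2050 per cg1.4xlarge (spec above)

cg_price = CC_PRICE + GPU_PREMIUM                      # hourly price of the GPU node
premium_per_gpu_hour = GPU_PREMIUM / GPUS_PER_INSTANCE  # what each GPU adds per hour

print(f"cg1.4xlarge: ${cg_price}/hr; GPU premium: ${premium_per_gpu_hour}/GPU-hour")
# cg1.4xlarge: $2.10/hr; GPU premium: $0.25/GPU-hour
```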

amock · on Nov 15, 2010

Those are available now, but they weren't when the comment was posted.

jread · on Nov 15, 2010

I ran about 100 performance benchmarks on this new cluster compute instance and published the results on our blog. It uses Nehalem X5570 processors with 16 hyper-threaded cores (8 physical), HVM, and a 10 Gbps non-blocking network for the EBS volume. Out of the 134 virtual and bare-metal cloud servers I've benchmarked from EC2, Linode, Rackspace, and other vendors, the EC2 cluster instance is the fastest in almost every category (CPU, disk I/O, encoding, database, and others).

http://cloudharmony.com/b/2010/09/benchmarking-of-ec2s-new-c...

tomn · on Nov 15, 2010

For me, the biggest thing about AWS has been easy access to *ridiculously* powerful machines for not very much money. A few of us decided that we wanted to build a Markov chain from the entirety of Wikipedia, which is quite doable if you have access to a box with 68GB of RAM. That's quite a bit more than a standard student laptop.

I think on the last run we got about 2/3 of the way through; we should be able to do the whole thing if I ever get round to running it again.
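For anyone wondering why RAM is the bottleneck, here is a minimal sketch of a word-level Markov chain builder. It is not the poster's code: the word-level tokenization, the `order` parameter, and the function names are all assumptions for illustration. The whole chain lives in one in-memory dict of counters, which is exactly the structure that grows to tens of gigabytes over a corpus the size of Wikipedia.

```python
import random
import re
from collections import Counter, defaultdict


def build_chain(text: str, order: int = 1) -> dict:
    """Map each `order`-word state (a tuple) to a Counter of the words that follow it."""
    words = re.findall(r"\w+", text.lower())
    chain = defaultdict(Counter)
    for i in range(len(words) - order):
        state = tuple(words[i:i + order])
        chain[state][words[i + order]] += 1
    return dict(chain)


def generate(chain: dict, start: list, length: int, rng=None) -> str:
    """Walk the chain from `start`, sampling each next word by its observed frequency."""
    rng = rng or random.Random()
    state = tuple(start)
    out = list(state)
    for _ in range(length):
        followers = chain.get(state)
        if not followers:  # dead end: this state was never followed by anything
            break
        words, counts = zip(*followers.items())
        out.append(rng.choices(words, weights=counts)[0])
        state = tuple(out[-len(state):])  # slide the window forward by one word
    return " ".join(out)


chain = build_chain("the cat sat on the mat the cat ran")
print(chain[("the",)])                # Counter({'cat': 2, 'mat': 1})
print(generate(chain, ["sat"], 2))    # sat on the
```

A real run over a Wikipedia dump would need to stream the text rather than read it into one string, but the memory cost of the chain itself is the same.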

RK · on Nov 15, 2010

Interesting. I just read the article "Can Cloud Computing Reach the Top500?", which concludes that EC2 won't crack the Top 500. I wonder what changed.

http://www.cs.utexas.edu/users/pauldj/pubs/uchpc09.pdf

amock · on Nov 15, 2010

I think the two big changes are that the nodes have 10 Gb network cards and full bisection bandwidth. They're also HVM instead of PVM, but I don't know how much that matters for this benchmark.

RK · on Nov 15, 2010

Internode bandwidth seemed to be the limiting factor cited in the paper, so that makes sense.

pbh · on Nov 15, 2010

It looks like the paper's authors used extra-large standard and extra-large high-CPU instances rather than the more recent quadruple extra-large "cluster compute" instances. Maybe that's the reason? (EDIT: Though amock seems to have said it better than me.)

jasonjei · on Nov 15, 2010

It would be interesting to see whether Amazon will donate or discount AWS services for non-profits and research institutions (e.g., cancer research) so they can use this node power to make up for processing shortfalls.

RK · on Nov 15, 2010

They do award AWS in Education research grants. We got one related to cancer research, in fact.

rbanffy · on Nov 15, 2010

"One of the fastest supercomputers in the world for $1.60/node hour. Cloud computing changes the economics in a pretty fundamental way."

Good luck getting much computing done at $1.60 an hour... Supercomputing speeds are bound to cost a lot more than that.

But it's still cool to have an account on #231 of the Top500 list. I have to brag about it someplace.

bjg · on Nov 15, 2010

I used to work on Palmetto, currently #114:

http://www.top500.org/site/systems/2938

When I worked on it, we were #64. I told lots of people :)

amock · on Nov 15, 2010

Why wouldn't you be able to get much done on this cluster?

apl · on Nov 15, 2010

You would, but it's slightly misleading to say that two dollars buys you an hour on a supercomputer. It buys you an hour on a tiny fraction of a supercomputer. Once you reach proper cluster size, you'll pay significantly more; maybe even more than on a "normal" machine of similar power.
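To put numbers on "a tiny fraction", here is a sketch that takes the quoted $1.60/node-hour and the 880-node size RK derived at the top of the thread (7040 cores / 8 per node). It ignores storage, data transfer, and any minimum-billing details, so treat it as a lower bound. The helper name is hypothetical.

```python
from decimal import Decimal

NODE_HOUR = Decimal("1.60")  # $/node-hour, from the quote above
FULL_MACHINE_NODES = 880     # 7040 cores / 8 cores per node (RK's figures)


def run_cost(nodes: int, hours: Decimal) -> Decimal:
    """Compute-only cost of renting `nodes` instances for `hours` (storage/transfer ignored)."""
    return nodes * hours * NODE_HOUR


one_node = run_cost(1, Decimal(1))
full_day = run_cost(FULL_MACHINE_NODES, Decimal(24))
print(f"1 node, 1 hour: ${one_node} ({1 / FULL_MACHINE_NODES:.2%} of the machine)")
print(f"all {FULL_MACHINE_NODES} nodes, 24 hours: ${full_day:,.2f}")
# 1 node, 1 hour: $1.60 (0.11% of the machine)
# all 880 nodes, 24 hours: $33,792.00
```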

rbanffy · on Nov 15, 2010

> maybe even more than on a "normal" machine of similar power.

To be fair, it has the added benefit of not having to sell a cluster of PS3s on eBay after your number crunching is done... I think that could be the biggest change cloud computing brings to HPC.

lsc · on Nov 15, 2010

When I price out my services (which usually come out cheaper than EC2), a rough rule of thumb is that the monthly fee should be around 1/4 of the capital cost of the hardware.

This is worth keeping in mind for the rent-vs-buy equation: if a month of rent costs about a quarter of the purchase price, renting and buying break even at around four months. If you only need the hardware for a month or less, you will almost certainly save money by renting. If you keep it for a year, you will nearly always save money by buying. (Of course, it's a little more complex. If you have a lot of hardware knowledge in-house and can sell hardware that was used for a month at nearly full price, it might make sense to buy even for one month's usage. If the opposite is true, e.g. something in your organization prevents you from hiring hardware contractors, or you have budget for MRC (monthly recurring charges) but not for capital costs, renting might make sense for longer. I just use 'four months' as a starting place.)
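A minimal sketch of that break-even rule, assuming buying costs the purchase price minus whatever you get back on resale, and ignoring hosting, power, and admin time. The $10,000 server price is a hypothetical number chosen only to make the ratios easy to read.

```python
def breakeven_months(capital_cost: float, monthly_fee: float,
                     resale_value: float = 0.0) -> float:
    """Months of renting that cost the same as buying outright.

    Buying is modeled as capital_cost - resale_value; hosting, power and
    admin time are deliberately left out.
    """
    if monthly_fee <= 0:
        raise ValueError("monthly_fee must be positive")
    return (capital_cost - resale_value) / monthly_fee


capital = 10_000.0  # hypothetical server price
fee = capital / 4   # rule of thumb: monthly fee ~ 1/4 of capital cost

print(breakeven_months(capital, fee))                        # 4.0
print(breakeven_months(capital, fee, resale_value=9_000.0))  # 0.4
```

The second call is the "sell it afterwards at nearly full price" case: a 90% resale value pulls the break-even point well under one month, which is why buying can win even for a short job.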

This leaves aside the "of reasonable size" caveat. If you need 256 MiB of RAM and a tiny slice of an x86 CPU in a data center, it's nearly always cheaper to rent a virtual server than to own a physical piece of hardware. But for large clusters, you usually need enough RAM/CPU to justify buying and hosting a real server (32 GiB RAM / 8 cores or better).

rbanffy · on Nov 15, 2010

It's a bit more complicated than that: your computing needs may vary over time. One month you may need to do a huge amount of processing on a thousand-node monster, while the next you would be perfectly happy with two Nvidia GPUs for number crunching and visualization. To justify buying a given amount of computing capacity, you need to be sure you will actually use it.