Google Shows How To Scale Apps From Zero To One Million RPS, For $10 (forbes.com/sites/reuvencohen)
126 points by zafirk on Nov 26, 2013 | hide | past | favorite | 68 comments



I only skimmed the gist with the instructions for reproducing the test [1], but in that skimming I could not figure out what the request and response were in this simulation. Were these responses 100 bytes, 1k, 10k, 100k? I guess small responses, but I'm not certain.

I know the goal of the test is to demonstrate how capacity can be scaled upward in very little time and with comparatively little effort. And for that this demonstration is impressive!

That said, the requests per second number is not especially impressive given that a modest single server can easily saturate gigabit Ethernet with 100-byte responses (and it's even easier to do so with larger responses) [2]. I am left wondering, does the $10 cost cover the cost of server instances and bandwidth? If so, that is a very good deal. The bandwidth charges for exceeding the capacity of a gigabit Ethernet connection (1M RPS with even the most trivial requests requires more than 1 gigabit) would be substantial with many hosting providers.

[1] https://gist.github.com/voellm/1370e09f7f394e3be724

[2] http://www.techempower.com/benchmarks/#section=data-r7&hw=i7...


Indeed. On a single i7 machine with a gigabit connection, a number of frameworks are in the 400,000-600,000 rps range when returning super-trivial responses. In comparison, from the original Google blog post:

To demonstrate scaling of the Compute Engine Load Balancing fanout we used 200 n1-standard-1’s Web Server running Apache v2.2.22 on Debian 7.1 Wheezy Images. Users are encouraged to use larger VM types for better single machine backend web serving, however here we demonstrated the scaling of the load balancer to backends and were not concerned with the backends themselves using every cycle to serve responses. Each backend web server received ~5K requests per second, which is an even distribution.

So, to match the peak rps of solid (but not top of the line) dedicated hardware appears to take upwards of ~120 instances of n1-standard-1 (assuming that it scales linearly, of course). Not a trivial number.

That said, I am impressed at how quickly this can scale up. If you have a site that normally runs fine on a couple of instances, but occasionally sees massive spikes in traffic, this could make sense. And from a purely engineering point of view, GCE and EC2 are quite interesting.


Couldn't agree more. As I (poorly) tried to point out in my earlier off-the-cuff comment, requests-per-second is a meaningless measure without the context of the size of those requests. The Forbes repost and the original google marketing article are both vague on details, and appear to conflate requests-per-second with throughput.


Marketing post... Ouch :) One of the things I love about working at Google is that we in engineering are empowered to share our knowledge. For those wanting the details, check out the step-by-step instructions in the Gist (https://gist.github.com/voellm/1370e09f7f394e3be724).

The goal was to measure the speed of scaling and load balancing vs egress. Bigger egress would not change the load balancing decisions.

Anthony F. Voellm Google Cloud Performance Engineering Manager @p3rfguy


I wish there were more posts here on HN where the OP linked items such as yours, but alas, clickbait rules apply.

I think I'm going to play with your script package over the long weekend in our dev cluster - Thanks! And nice work!


I saw you dropped this line in the comments under the article. Guess this would be a better place for it :)

"PS... Cloud Performance is hiring :)"


We really wanted to focus on the scale of operations for this article so we used a full http request / response. The payload of the response was 1 byte.

Anthony F. Voellm Google Cloud Performance Engineering Manager @p3rfguy


The blog post says 1 byte responses excluding HTTP headers. Since it was a standard Apache configuration, the headers would probably be something like 200-300 bytes.
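A back-of-the-envelope check of what those headers imply at this scale (the ~300 bytes per response is an assumed figure, per the estimate above):

```python
# Wire bandwidth needed for 1M responses/sec when each response is
# ~1 byte of body plus roughly 300 bytes of HTTP headers (assumed).
RPS = 1_000_000
BYTES_PER_RESPONSE = 300

bits_per_second = RPS * BYTES_PER_RESPONSE * 8
print(f"~{bits_per_second / 1e9:.1f} Gb/s of egress")  # ~2.4 Gb/s
```

Which is consistent with the point upthread that 1M RPS exceeds a single gigabit link even with trivial responses.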


That's about right. In hindsight, a tcpdump or other trace might have been useful to include.

Anthony F. Voellm Google Cloud Performance Engineering Manager @p3rfguy


$10 for a 10 minute test, okay. A month of 200 US-hosted n1-standard-1 machines would cost 200 * $.115 * 30 (days) * 24 (hours)... over $16,000. That doesn't include load-balancing, bandwidth, or any additional charges there may be. The price shouldn't be mentioned anywhere in the article.
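The arithmetic above, spelled out (using the per-hour price quoted in the comment; instance time only, no load balancing or bandwidth):

```python
# Monthly cost of 200 always-on n1-standard-1 instances at $0.115/hour.
instances = 200
price_per_hour = 0.115
hours_per_month = 30 * 24

monthly = instances * price_per_hour * hours_per_month
print(f"${monthly:,.0f}/month")  # $16,560/month
```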

Demand is elastic of course, and if you really want to scale in a cost-effective manner you also need to do auto-scaling. As far as I can tell (I have no direct experience with GAE), it's much easier on AWS. It would also be interesting to see if you can scale your pool of webservers from 1 -> 200 faster on AWS or GCE.

The article does quote @cloudpundit, who hits on the true point of the exercise: "Relevance of GCE LB load test: With AWS ELB, if you expect a big spike load, must contact AWS support to have ELB pre-warmed to handle load." I would also guess that Amazon is working to improve ELB to behave similarly, especially now that Google's product has fewer restrictions than theirs.


GCE instances are charged in 1 min increments (https://cloud.google.com/pricing/compute-engine), not 1 hour increments like AWS (http://aws.amazon.com/ec2/pricing/).

1 min increment billing enables you to spin up massive web clusters to handle spikes, or Hadoop clusters so big that you can process the entire dataset in a few minutes, while paying less than you would for a smaller AWS cluster that's billed by the hour.


It's definitely finer-grained, but there's still a 10-minute minimum per instance: billing is only per-minute above 10 minutes. So if you run a massively parallel 2-minute Hadoop job, you pay num_instances x 10min. Admittedly that's still a win, since AWS would bill you for 6x as much instance time. But it's not a 30x differential as you might guess from the headline "hourly" vs. "per-minute" pricing.
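A minimal sketch of that billing difference for a short job (billed time only, ignoring rates; the function names are mine):

```python
# Billed minutes for one instance running a short job.
# GCE: per-minute billing with a 10-minute minimum.
# AWS (at the time): instance-hours, rounded up to the next whole hour.
def gce_billed_minutes(runtime_min):
    return max(10, runtime_min)

def aws_billed_minutes(runtime_min):
    return 60 * -(-runtime_min // 60)  # ceiling division to whole hours

job = 2  # the massively parallel 2-minute Hadoop job, per instance
print(gce_billed_minutes(job), aws_billed_minutes(job))  # 10 60
```

So for this job AWS bills 6x the instance time, matching the comment's figure.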


A fair point, but the scaling granularity advantage is a powerful cost win nonetheless.


I don't think $16,000 a month is really that bad if you factor in what type of money a consistent 1 million reqs/sec would probably give you.

1m req/s is:

60 million+ reqs a minute

3.6 billion+ reqs an hour

86 billion+ reqs a day

2.5 trillion+ reqs a month
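Sanity-checking the rate conversions (assuming a 30-day month):

```python
rps = 1_000_000
per_minute = rps * 60
per_hour = rps * 3600
per_day = rps * 86_400
per_month = per_day * 30

print(f"{per_minute:,}/min")   # 60,000,000/min
print(f"{per_hour:,}/hour")    # 3,600,000,000/hour
print(f"{per_day:,}/day")      # 86,400,000,000/day
print(f"{per_month:,}/month")  # 2,592,000,000,000/month
```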

It would be a weird situation to be serving that much traffic but not being able to afford hosting.

It would be cool if someone could estimate the cost of what it would take to sustain 1m reqs/s with dedicated hardware or maybe an unmanaged VPS cluster.


The point here isn't necessarily about the actual elasticity of the instances themselves, but the load balancer in front of them. With ELBs, they can only handle so much traffic so quickly. You literally cannot just point a million requests per second at it, even if you have 200 instances supporting it, and handle them all right off the bat. The ELBs themselves scale with the load, and currently, that scaling is slow during spikes.


I am pretty excited and yet shocked to see what Compute Engine has evolved into. The original Compute Engine was about scaling (as a cluster) in scientific experiments, and yet this news makes me think Compute Engine is becoming the new App Engine but with full control of the VM (plus the amazing autoscaling feature). I always like to control my own VM because I can do much more with a VM than with a sandbox (to me, App Engine is just a sandbox loaded with X framework and X database). I have always wanted to work on Compute Engine :(

This is an interesting marketing strategy. People who wish to launch a VM can choose CE, and people who just want a sandbox quickly can use App Engine? Though I am really skeptical about the future of App Engine if CE becomes cheaper. I am sure that if that happens, Google will do everything it can to migrate things over to CE. This is probably many years down the road...

I still think CE is really good for computations.


App Engine is not just a place to run your code. It has a nice deployment system with the ability to flip between versions instantly. It provides a few great pieces of core infrastructure: the datastore, task queues, and memcache (there are many others, but those are the essentials). The SDK gives you a stand-alone development server for testing locally. If you want to build web apps (or a backend for mobile apps), App Engine is a great way to go.

Compute Engine is for all the other stuff that you can't run on App Engine.

Incidentally, the datastore is now available as a stand-alone service, Google Cloud Datastore: https://developers.google.com/datastore/ This should benefit Compute Engine users.


I like App Engine, and now that there are open-source, API-compatible alternatives, there is no lock-in to worry about either.

The biggest drawback for App Engine is lack of async support. The only ways to scale are: multiple-threads (slow) or multiple instances (costly).


Or you can use Go and get high performance concurrency in a single-threaded instance. The best of both worlds. ;-)


Can you explain this a bit more? Specifically what async type tasks can you not perform?


The main difference, for me, is that with EC2 (and GCE), my team manages the servers. With App Engine, it's Google's Site Reliability Engineers who do it.


Compute Engine is not for scientific experiments, or at least not specifically. Perhaps you are thinking of Exacycle? http://research.google.com/university/exacycle_program.html


I'm not sure about Google's CE, I don't really know what it can do, but Amazon's AWS has had auto-scaling capabilities for some time, by means of auto-scaling groups coupled with their ELB load balancer. We used it and it worked out great.



Having beta tested Google Compute Engine while working at Google, I have only good things to say about it. They've come a long way in an extremely short space of time, and knowing what they have internally, this is just the beginning. They'll far surpass AWS for performance, ease of use and cost.


> They'll far surpass AWS for performance, ease of use and cost.

Most likely. But those aren't the only things developers care about when looking for PaaS.

There is that whole "Platform" aspect. And AWS destroys Google in this respect. It has far more offerings, and more importantly it has a very large ecosystem of companies that will be in the same data center and that you can leverage, e.g. Iron.io.


Don't forget support. I know I can get decent AWS support if I pay Amazon. Google? HAH!


We actually have come a long way in Support and now offer affordable paid support options and some very high end packages for mission critical operations. Check out https://cloud.google.com/support/packages where we have the details.

-Brian Head of Marketing, Google Cloud Platform


I know these are entirely different departments, and likely have little correlation to reality... but Google has a worse track record than Microsoft for supporting some of their most popular products (iGoogle, Reader, free Apps account levels). It's hard for me to trust their services to be around in the long run (this is probably more of an emotional bias here).

I know that there are api compatible systems out there, and one could roll your own (so to speak), it will get very interesting over time. I'm also curious what happens in the application space for docker.io based cloud offerings in the next year.


Thanks for mentioning that. Google will probably do something unique with support as the demand increases. To the other comments above: yes, AWS has a wider range of services available at the moment, but they've been at this publicly since 2006. There are services within Google used by devs, managers, etc. that blow away anything Amazon has. There's a projected timeline for the release of a great many additions to GCE. 2014-2016 will be game changing in the cloud industry.


The only reason I can see for this being this high on the front page [18 points and only one comment (from a Googler)] is the Google employee block vote (apparently tech-savvy isn't a prerequisite in the Google upvote marketing department):

"1 million requests per second is 20 times greater than the throughput in last year's Eurovision Song Contest, which served 125 million users in Europe"

No, r/s != Gb/s.

(Cue the down vote from the google up voters)


Have you worked extensively with ELBs? They need pre-warming in advance of significant traffic changes, they can't handle spikes well because they take many minutes to respond to changes in traffic volume, and they use an ever-changing set of IP addresses for your load balancer nodes.

The GCE load balancer has none of these problems, which makes it a huge advantage over AWS and ELBs.

Disclaimer: I'm an engineer at Heroku. We manage dozens of ELBs for ourselves, and thousands of them for our customers.


>They need pre-warming in advance of significant traffic changes

Why?


Because they consist of a set of EC2 instances, the vertical and horizontal scale of which is determined automatically based on the average traffic profile of each node. Once the traffic has increased enough to warrant a scaling event, it takes minutes for new ELB nodes to come online and go into DNS rotation before they can start serving traffic.


It's obnoxious that AWS hasn't developed an API call or web interface option for the ELBs to pre-warm them yourself, vs having to contact AWS support to get the pre-warming done "manually".


Out of interest, what is a rough idea of the total requests per second on Heroku for everything? All your nodes, or whatever you call them? Dynos?

Is 1,000,000 request per second a stupid, pointless number as no-one ever gets 1,000,000 requests per second or is this some sort of meaningful number?


I can't release any of our traffic numbers, but I can tell you that 1M req/s is an enormous number. For some context, here's a post from Netflix from the end of 2011 where they state that their API received 20,000 req/s at peak: http://techblog.netflix.com/2011/12/making-netflix-api-more-...


Yes, but! The impression I get from this 1 million req/s test is that no actual logic is happening on the backend. E.g. no database queries, no business logic, etc - basically a noop call.

As we saw when running the techempower benchmarks, simply going from the plaintext test to the single database query dropped the best performer from ~600,000 req/s to ~100,000 req/s. Throw in a bit more business logic, another query, and a slightly heavier response, and it is easy to imagine that 1 million req/s now sitting much nearer to 20,000 req/s.
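That drop-off is roughly Little's-law arithmetic: peak throughput is bounded by concurrent workers divided by per-request service time. A toy sketch with invented latencies (the numbers are made up to echo the benchmark trajectory, not measured):

```python
def max_rps(service_time_ms, workers=256):
    """Upper bound on throughput for a given per-request service time."""
    return round(workers * 1000 / service_time_ms)

print(max_rps(0.4))   # trivial 1-byte response -> 640000
print(max_rps(2.5))   # plus one DB query       -> 102400
print(max_rps(12.0))  # plus business logic     -> 21333
```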

My point being that, that 1 million req/s is a very optimistic number when used in such a comparison. Is it still an impressive max throughput? Yes. I just don't want anyone to think that they can now, say, host 50 netflixes on this setup.

Note: I realize you probably weren't meaning to directly compare those two numbers, but it somewhat read that way. I definitely do appreciate the context though - quite interesting to know that the netflix API was peaking at ~20,000 req/s in 2011.


That's not the point of the test; the test is about showing that the load balancer in GCE can handle that many requests per second with a single IP address. Whatever the machines behind it are doing doesn't matter, since the load balancer's job is to handle a ton of traffic. This is practically the only case in which responding with 1 byte makes sense in a test.


I completely get that. I responded to the parent because he introduced the 20,000 req/s number as a comparison point.

The Google test is both a theoretical max throughput (that one wouldn't reach under basically any normal use case) and a test of the load balancer capabilities. The Netflix 20,000 req/s number is, instead, a real use case example.

My point was that one shouldn't directly compare those numbers and say, for example, that this GCE setup has 50x better throughput than Netflix.

I imagine that if Netflix were to stub all of their API calls with noops that returned 1 byte responses, they would be able to handle significantly more than 20,000 req/s. Basically, I don't think we actually disagree here.


I don't work at Google, but when people say something like "cue the downvotes" or "I know I'm going to get downvoted," I proceed to downvote them, because that's an incredibly stupid and meaningless thing to say. In fact, it reeks of arrogant stubbornness ("I don't care about the community, I know I'm right, etc." Knowing you're right is not bad in and of itself; it's being arrogant about it that is).

So I downvoted you.


Thanks, but I was specifically referring to the same voting ring that appeared to have put the article on the front page in the first place. I would hope the rest of the community would recognise a blogspam repost of a marketing puff piece when they see one.


I found it interesting and I upvoted the article. So just because you don't like the company that the article is published by does not mean the article is a marketing puff piece.


Anything that can process 1MM req/sec gets upvotes from me.

"1 million requests per second is 20 times greater than the throughput in last year's Eurovision Song Contest, which served 125 million users in Europe"

"No, r/s != Gb/s"

Not sure what the point you think you are making is here, but the Eurovision site was tested to 50,000 req/sec[1]. 50000 * 20 = 1MM

[1] http://googlecloudplatform.blogspot.com.au/2013/05/how-scalr...


I don't know man, someone's codepen with some pretty CSS is pretty high on the front page with 10 pts and 0 comments.


Throughput is a measure of work over time. So here I was referring to throughput in terms of requests per second, not Gb/s. It was a direct comparison of 1M RPS (Compute Engine Load Balancing) vs 50K RPS (DNS Load Balancing).

Anthony F. Voellm Google Cloud Performance Engineering Manager @p3rfguy


To add a bit of context to Anthony's post here (I'm the guy who wrote the blog post on the Eurovision).

GCE Load Balancing uses a single IP address, you can point your DNS there and forget about it.

What Anthony's post shows is that this IP address will be able to serve upwards of 1 million requests per second.

This matters because you have control over scaling your backend (design properly, add more instances), whereas you don't have full control over scaling your frontend.

Indeed, a problem we had during Eurovision was that mobile providers (it was a mobile app) would cache the IPs of our frontend Nginx servers, so scaling those wasn't as easy.

So this new GCLB essentially "solves" scaling your frontend. That's something I'd care about ;)

Hope this helps shed some light here!


DNS load balancing isn't all bad. Google uses it to host www.google.com for example ;)


I didn't upvote the submission, I didn't downvote you - but if you call people shills when they are not, some of them might. And I find 1M R/s impressive, but you can ignore it if you think it is trivial.


I keep checking to see whether or not I can run my own images yet, but Google Compute Engine so far supports fewer things than even Digital Ocean.

What if I don't want to run your version(s) of CentOS/Debian, guys? Hmm? (I hope somebody from there reads this) :)

Still, it's a very, very interesting offering and something to keep an eye on. I bet nobody here is using it yet because it's still early days, and the thread devolves into such old classics as "you'll never be able to reach a human at Google" and "they changed their pricing with Google App Engine that one time and my app was no longer free to run all of a sudden!". Sigh.

I personally think it would be great if you could spin up the same instances on either cloud and load-balance/fail-over as needed/cheapest. Docker makes it doubly exciting.


Stay tuned - Getting custom image support and custom kernels so you can run things like Docker is just around the corner.

-Brian (@bgoldy)


You can build, create, deploy and use your own images. Check out https://developers.google.com/compute/docs/images

Anthony F. Voellm Google Cloud Performance Engineering Manager @p3rfguy


AWS has a really interesting feature in their auto-scaling groups, coupled with their ELB (elastic load balancer). You can configure an auto-scaling group with a variable number of instances that adds or removes instances automatically, based on the ELB's measured latency, the number of requests coming in, or many other such metrics.

This works out great and results in cost savings, as it can survive spikes, plus during the night the traffic is at most half or even less than the traffic you get during the day. I had a setup that was handling over 30,000 reqs/sec and during the night it kept about 6 h1.medium instances active, while during the day it could go upward to 20 instances, but was usually stable at around 14 instances.
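A minimal sketch of the decision such a policy automates, with invented thresholds and the pool sizes from the setup above (this is illustrative logic only, not the AWS API):

```python
def desired_instances(current, rps_per_instance,
                      target=2000, low_water=0.6, high_water=0.9,
                      min_n=6, max_n=20):
    """Scale on measured requests/sec per instance against a target rate."""
    utilization = rps_per_instance / target
    if utilization > high_water and current < max_n:
        return current + 1  # scale out one step
    if utilization < low_water and current > min_n:
        return current - 1  # scale in one step
    return current

print(desired_instances(14, 2143))  # daytime, ~30k rps over 14 -> 15
print(desired_instances(14, 600))   # night-time lull           -> 13
```

Real auto-scaling groups evaluate CloudWatch metrics over a cooldown window rather than reacting instantly, but the shape of the decision is the same.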

This article mentions ELB, but I don't understand - does Google's Cloud Compute offer something similar? Can one vary the number of instances based on the incoming traffic or other metrics?


Google's load balancer not needing to be pre-warmed was the main point. It's a fair one: having to pre-warm ELBs isn't exactly fun, and the form takes about 10 minutes to fill out. If you're doing TV ads, lots of stuff is last minute. You can pre-warm an ELB yourself by throwing artificial traffic at it, but you always risk taking the ELB down if you ramp up too fast.

Weird things happen to ELBs if you spike them with big bursts of traffic. One of our service providers had a 24-hour backlog of web hooks that they sent out all at once; it caused our ELB to fail all the instances, even as we brought up new ones. Also, under load you can get other sites showing up on your ELB. We had an ASP.NET error page show up on ours. We run Rails.

I wish Amazon would scale up ELB instances based on the number of instances you have behind them, using a ratio calculated from past usage.


Google Compute added an ELB equivalent (the "GCLB"), but it doesn't have autoscaling built-in.

To get autoscaling, you can either roll your own (Google explains how here [0]) or use a multicloud autoscaling solution (disclaimer: I work for such a provider).

[0]: https://cloud.google.com/resources/articles/auto-scaling-on-...


Thanks for the link. I wonder how well it works. Did you try implementing it?


Well, that's what our software does too, so I didn't really have an incentive to do so!

Now, judging from how busy the engineers are here, I'd say this is a bit harder than it seems!


RPS and $ are not interconvertible units, unless that 1M RPS is financed by the interest on a $10 investment.


This is the reason that our site, http://plexisearch.com, uses Google App Engine.

Google has allowed us to run "Code" rather than worry about infrastructure. Because it is an "Engine" we don't have to configure much, and the autoscaling is fantastic.

Google Edge Cache means we get Better than CDN performance on static assets.

There are limitations to App Engine, but because we can also leverage virtual servers, we can create hybrid environments that let us do things App Engine won't, like installing C libraries or running Windows (we aren't doing that, but we could).

We have been very happy with Google App Engine and since we are running millions of pages a day through our Natural Language Engine it has worked out really well for us.

-Brandon Wirtz CTO Stremor.com


AFAICT this article is talking strictly about Google Compute Engine and its GCLB load balancer product, not about App Engine.


How do you think App Engine works? App Engine runs on the same hardware, and the process described in the how-to is nearly identical to what is obfuscated and provided to App Engine users via a GUI.



The gist describing how to reproduce the test yourself: https://gist.github.com/voellm/1370e09f7f394e3be724


The big thing they're crowing about is that you can spin something like this up quickly, on demand, in the G cloud.

Most sites aren't remotely close to this artificial traffic pattern (1 packet request, 1 packet response).

It's kinda cool from an L4 load balancing perspective that it's only one fault tolerant IP address. In terms of L4 LB throughput though, a single box with IPVS will happily do 1M pps.


BTW, the Gartner analyst (Lydia Leong [1]) whose tweet was quoted is worth following. She tells it like it is, with no bullshit (read some of her stuff about the various OpenStack debacles, for example).

[1] https://twitter.com/cloudpundit


The cake is a lie; they don't teach how to scale yourself. They teach how to use THEIR infrastructure to scale. I mean, it's still really useful to know and all, but the headline gives the impression that Google teaches you how to scale your own company's app from zero to 1 million RPS.


This is misleading; $10 is just for the duration of the test. Roughly $58k/mo at $1.33/min.


You do need context... a million things a second is not a great metric. These are small numbers compared to a lot of things, e.g. how many pixels your graphics card is pushing right now.



