// check if we're toobusy() - note, this call is extremely fast, and returns
// state that is calculated asynchronously.
if (toobusy()) res.send(503, "I'm busy right now, sorry.");
Saying it is calculated "asynchronously" seems pretty confusing in this context. Maybe "is cached at a fixed interval"?
Side note: caching at a fixed interval is a great way to save CPU cycles. If you're serving 100 req/s and need the current time to within one second for each request, you could make 1/100th as many Date calls by reading the time once per second instead of once per request.
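A minimal sketch of that idea in Node.js, assuming all you need is a timestamp accurate to about a second. `createCachedClock` is a hypothetical helper, not part of toobusy or any library, and the call counter exists only to make the saving visible.

```javascript
// Hypothetical helper: refresh a cached timestamp once per `refreshMs`
// instead of calling Date.now() on every request.
function createCachedClock(refreshMs) {
  let now = Date.now();
  let dateCalls = 1; // how many times Date.now() has run (illustration only)

  const timer = setInterval(() => {
    now = Date.now();
    dateCalls += 1;
  }, refreshMs);
  timer.unref(); // don't keep the process alive just for the clock

  return {
    now: () => now, // cheap: returns the cached value, accurate to ~refreshMs
    dateCalls: () => dateCalls,
    stop: () => clearInterval(timer),
  };
}

// 100 "requests" handled within the same tick all share one Date.now() call.
const clock = createCachedClock(1000);
const stamps = [];
for (let i = 0; i < 100; i++) {
  stamps.push(clock.now());
}
console.log(clock.dateCalls()); // 1
clock.stop();
```

In practice `Date.now()` is already cheap, so the pattern pays off more for values that are expensive to compute, such as a periodically sampled load figure.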
Why would you do this instead of putting your Node server behind a more battle-tested proxy, like Varnish or HAProxy, and limiting the number of simultaneous connections?
Do both: each layer and service in an application should be able to manage its own load. Helps prevent cascading failures too.
Read up on Netflix's architecture, where they've talked about this a lot. And the book "Release It!: Design and Deploy Production-Ready Software" (Nygard), though Java-oriented in its examples, covers the concepts well.
Our problem is that the number of simultaneous connections we can support isn't static; it varies depending on the type of traffic burst (e.g. new vs. returning users).
But I agree that a higher-level proxy is useful, because this way your overloaded application doesn't even need to deal with the traffic. (We hope to use toobusy to have applications instruct the routing layer to temporarily block or reroute traffic in times of load.)
The reason I asked is that I feel most applications end up needing something at that layer eventually anyway, but not knowing exactly what level of traffic results in too much load is a good reason to do this.
I think this would simplify implementing a load-balancing proxy in some ways. If a server responds with a 503 generated by node-toobusy, the load balancer will know immediately to reroute the request to another server, rather than having to wait for a timeout or some other threshold.
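A sketch of that routing decision with the transport abstracted away. `routeRequest` and the injected `send` function are hypothetical, and a real proxy would make these calls asynchronously; the point is only that a 503 is a signal to move on immediately instead of waiting for a timeout.

```javascript
// Hypothetical sketch: `send(backend, request)` stands in for the proxy's real
// (normally asynchronous) call to a backend and returns { status, body }.
function routeRequest(backends, request, send) {
  for (const backend of backends) {
    const response = send(backend, request);
    if (response.status !== 503) {
      return { backend, response };
    }
    // 503: this backend is shedding load, so move straight on to the next one
  }
  return { backend: null, response: { status: 503, body: 'All backends are busy' } };
}

// Usage with a fake transport: backend "a" is overloaded, "b" is healthy.
const fakeSend = (backend) =>
  backend === 'a' ? { status: 503, body: 'busy' } : { status: 200, body: 'ok from ' + backend };
const result = routeRequest(['a', 'b', 'c'], { path: '/' }, fakeSend);
console.log(result.backend, result.response.status); // b 200
```

Retrying on another server like this is only safe when the 503 is returned before any work is done, which is what an early toobusy-style check does.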
If you only need one server, that's going to be a lot faster than putting anything in front of it.
A single-server setup might be simpler, but it won't be faster. Varnish serving a cached page from memory is going to be faster than 5 asynchronous calls that take 5ms of CPU time (plus filesystem I/O in the case of the database and template given in the example). With Varnish, even with a 1-second TTL (and 1 second of grace), at 200 req/s your first request will take the 5ms hit, but the next 199 in that second will be served from memory.
Now with Varnish serving 199 out of 200 requests from memory, if your backend is still toobusy, by all means serve a 503, and Varnish can cache that too.
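To make the arithmetic concrete, here is a small in-process sketch of the TTL-plus-grace behaviour described above. It is not how Varnish works internally: `createMicroCache` is hypothetical, the clock is faked so the numbers are deterministic, and "grace" is simplified to "serve the stale copy if the backend fails".

```javascript
// In-process approximation of a TTL + grace cache (not Varnish configuration):
// within `ttlMs` the stored copy is served from memory; after that the backend
// is asked again, and if it fails (e.g. it is "toobusy"), a copy younger than
// ttlMs + graceMs is served instead.
function createMicroCache(fetchPage, { ttlMs, graceMs, now = Date.now }) {
  let entry = null; // { value, storedAt }
  const stats = { hits: 0, misses: 0, stale: 0 };

  return {
    stats,
    get() {
      const t = now();
      if (entry && t - entry.storedAt < ttlMs) {
        stats.hits += 1;
        return entry.value; // fresh: served from memory
      }
      try {
        const value = fetchPage(); // the expensive backend work
        entry = { value, storedAt: t };
        stats.misses += 1;
        return value;
      } catch (err) {
        if (entry && t - entry.storedAt < ttlMs + graceMs) {
          stats.stale += 1;
          return entry.value; // backend failed: fall back to the stale copy
        }
        throw err;
      }
    },
  };
}

// 200 requests spread over one second (fake clock, 5 ms apart), 1 s TTL.
let fakeTime = 0;
const cache = createMicroCache(() => '<html>page</html>', {
  ttlMs: 1000,
  graceMs: 1000,
  now: () => fakeTime,
});
for (let i = 0; i < 200; i++) {
  fakeTime = i * 5;
  cache.get();
}
console.log(cache.stats); // { hits: 199, misses: 1, stale: 0 }
```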
I think of caching for apps like clothes for people: while you could survive naked, it's more comfortable with clothes, and you're protected from heat, cold, etc. Of course there is the problem of being over- or underdressed, but you have to wear something.
That's super unnecessary for a single server; you can literally just use a variable outside of your request handler as a cache:
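var cache; // lives outside the request handler, so it persists between requests

module.exports = function(request, response) {
  if (cache) {
    // cache hit: serve straight from memory
    return response.end(cache);
  }
  // cache miss: get my data from wherever (the_data stands in for that lookup)
  cache = the_data;
  return response.end(cache);
};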
Now 199 out of 200 requests are from memory, there are zero extra moving parts, and you're using a cool part of the language instead of a 3rd party tool you have to select and configure.
How do you selectively serve the cached response to some users based on cookies, headers, etc., and expire it after a set amount of time? How would you gather stats about how many cache hits vs. misses you have? How do you serve the cached response when your server is pinned? These are problems 3rd party tools have solved.
You can expire things with setInterval and a counter. You serve a cached response whenever your server has one; or, if you're using it as a fallback, you could combine it with something like toobusy so that you have a functional (if not fresh) 'under load' page.
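A sketch of the combination described above, assuming a single page and a synchronous build step. `createFallbackCache`, `render`, and `isBusy` are hypothetical names; `isBusy` stands in for a load check such as toobusy().

```javascript
// Hypothetical sketch: `render` stands in for the expensive page build and
// `isBusy` for a load check such as toobusy(). A setInterval marks the cached
// page as expired, but the last copy is kept as an "under load" fallback.
function createFallbackCache(render, isBusy, expireMs) {
  let page = null;
  let fresh = false;
  const expire = () => { fresh = false; };
  const timer = setInterval(expire, expireMs);
  timer.unref(); // don't keep the process alive just for expiry

  function handle() {
    if (page !== null && (fresh || isBusy())) {
      return { status: 200, body: page }; // fresh copy, or stale copy under load
    }
    if (isBusy()) {
      return { status: 503, body: "I'm busy right now, sorry." }; // nothing to fall back on
    }
    page = render(); // not busy and nothing fresh: rebuild
    fresh = true;
    return { status: 200, body: page };
  }

  return { handle, expire, stop: () => clearInterval(timer) };
}
```

Keeping the expired copy instead of deleting it is the design choice that makes the fallback possible: under load, a slightly old page beats a 503.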
Where it gets really fun with Node.js is that you can do all of your data work outside of the requests: pull all of your content at startup and refresh it on an interval independently of users, which can eliminate some or all of their trips to the database if you're lucky, it fits in RAM, it's viable, etc.
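A sketch of that startup-plus-interval pattern. `startContentRefresher` and `loadContent` are hypothetical; `loadContent` stands in for whatever database queries build your content, and it is assumed to return a promise.

```javascript
// Hypothetical sketch: `loadContent` stands in for the real database queries.
// Content is loaded once at startup and then refreshed on a timer, so request
// handlers only ever read the in-memory snapshot.
function startContentRefresher(loadContent, intervalMs) {
  const state = { content: null, lastError: null };

  async function refresh() {
    try {
      state.content = await loadContent(); // swap in the whole new snapshot
      state.lastError = null;
    } catch (err) {
      state.lastError = err; // keep serving the previous snapshot
    }
  }

  const ready = refresh(); // initial load at startup
  const timer = setInterval(refresh, intervalMs);
  timer.unref(); // the refresher alone shouldn't keep the process alive

  return { state, ready, stop: () => clearInterval(timer) };
}

// In a request handler, read from memory only, e.g.:
//   response.end(JSON.stringify(refresher.state.content));
```

If a refresh fails, the previous snapshot keeps being served; whether everything fits in RAM remains the real constraint.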
The fewer moving parts that need to cooperate to serve your site, the better; most things don't warrant a deep stack of technology to serve HTML and run CRUD operations.
Very few sites are so dynamic that something can't be cached for at least a second. Even a heavily dynamic site like HN, with people commenting all the time, could at least cache for logged-out users.
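A sketch of that rule: treat a request as cacheable only if it is a GET with no session cookie. The cookie name `session` is an assumption, and this simple parser ignores quoting edge cases.

```javascript
// Sketch: share one cached page among logged-out visitors only. The cookie
// name "session" is an assumption; use whatever your app actually sets.
function hasSessionCookie(cookieHeader, name = 'session') {
  if (!cookieHeader) return false;
  return cookieHeader.split(';').some((pair) => pair.trim().split('=')[0] === name);
}

function isCacheable(req) {
  // Only plain GETs from visitors without a session use the shared cache.
  return req.method === 'GET' && !hasSessionCookie(req.headers.cookie);
}

console.log(isCacheable({ method: 'GET', headers: {} })); // true
console.log(isCacheable({ method: 'GET', headers: { cookie: 'theme=dark; session=abc' } })); // false
```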
I'd prefer that solution too, especially since such proxies can mitigate some types of DoS attacks (e.g. known patterns in the "Referer" header, or from known IP address ranges) and load will generally be much lower if some of the popular content is cacheable.
If you are writing server software, I recommend Michael Nygard's book "Release It!: Design and Deploy Production-Ready Software". Measuring event loop lag sounds like Nygard's "Circuit Breaker" pattern to avoid cascading failures.
The book's examples and text are all Java, but the lessons are applicable anywhere. He offers many scalability patterns (resource pools) and anti-patterns (runaway log files) with interesting stories from his experience debugging real systems. I especially liked his story about debugging a crash in an Oracle DB driver that caused unexpected Java exceptions to be thrown from java.sql.Statement.close(), which quickly blocked a DB connection pool.
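For readers who haven't met it, here is a minimal synchronous sketch of the Circuit Breaker idea in JavaScript. The thresholds, state names, and injectable clock are illustrative assumptions rather than code from the book, and a real breaker would wrap asynchronous calls. The direction differs from toobusy: toobusy protects a server from its callers, while a breaker protects a caller from a failing dependency.

```javascript
// Minimal synchronous Circuit Breaker sketch (thresholds, state names and the
// injectable clock are illustrative assumptions, not code from the book).
class CircuitBreaker {
  constructor(fn, { failureThreshold = 5, resetTimeoutMs = 10000, now = Date.now } = {}) {
    this.fn = fn;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
    this.state = 'closed'; // closed: calls pass through
    this.failures = 0;
    this.openedAt = 0;
  }

  call(...args) {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('circuit open: failing fast'); // don't touch the sick dependency
      }
      this.state = 'half-open'; // timeout elapsed: allow one trial call
    }
    try {
      const result = this.fn(...args);
      this.state = 'closed'; // success closes the circuit again
      this.failures = 0;
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open'; // trip the breaker
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}
```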
Apache's MaxRequestsPerChild directive is not at all about limiting load; it says after serving N requests (or N connections in a keep-alive setting), the worker kills itself and another may be spawned in its place (subject to spare server/thread config). This mostly helps keep slow memory (or other resource) leaks in check by starting fresh every so often.
Did you mean to link to MaxClients[1]? MaxClients sets the maximum number of simultaneous connections; any additional connections will be queued by the OS socket API (subject to listen backlog, etc.).
I think waiting to accept sockets that you can't handle is a better solution than either accepting a socket to return an error message or (much worse) accepting a socket that overloads your system. Unfortunately, it can be hard to set MaxClients to a value that is neither so big that you get reduced throughput nor so small that you don't use all your resources. (One thing that does help you get to the right number for MaxClients is to set MinSpareServers to the same value as MaxClients: you avoid the situation where MaxClients is too big and you start swapping during high load, but you don't notice because things are fine with a small number of servers.)
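Node has no MaxClients, but the same "run N at a time, queue the rest, refuse beyond that" behaviour can be sketched in-process. `createLimiter`, `maxActive`, and `maxQueued` are hypothetical names, and unlike Apache's listen backlog, this queues work after the socket has already been accepted.

```javascript
// Hypothetical sketch: at most `maxActive` jobs run at once, up to `maxQueued`
// more wait in FIFO order, and anything beyond that is refused immediately.
function createLimiter(maxActive, maxQueued) {
  let active = 0;
  const queue = [];

  function runNext() {
    if (active >= maxActive || queue.length === 0) return;
    const { job, resolve, reject } = queue.shift();
    active += 1;
    Promise.resolve()
      .then(job) // job may return a value or a promise
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        runNext(); // a slot freed up: start the oldest queued job
      });
  }

  return function submit(job) {
    if (active >= maxActive && queue.length >= maxQueued) {
      return Promise.reject(new Error('queue full')); // analogous to a full backlog
    }
    return new Promise((resolve, reject) => {
      queue.push({ job, resolve, reject });
      runNext();
    });
  };
}
```

Choosing `maxActive` has the same difficulty described above: too high and the process thrashes, too low and resources sit idle.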
This is a neat little technique, and the fact that the developers did this puts Node.js higher in my stack of "technologies to consider".
A harder problem is solving the same "melt-out" problem for the entire gamut of servers usually found in large serving stacks (various web servers, relational and NoSQL DB servers, etc.). The users are usually not the developers of these pieces of technology, and there's never enough dev bandwidth available to actually implement self-tuning techniques like the one mentioned here.
For such situations I once came up with a "little" trick to limit the damage in sudden overload situation. It allows the admins of said servers to tune the threadpool and request queue lengths in an intelligent way (with the help of historical performance data that most shops should have). I mentioned it in this discussion thread here:
http://www.linkedin.com/groups/Whats-generally-used-methodol...
Forgive my ignorance, but I thought the whole point of Node.js was not to meltdown.
Are there some recommended resources where I could learn more about out-of-the-box scaling vs. the tactics and strategies that support growth? I love learning how this problem is tackled anywhere. :)
There's only so much load any piece of software running on a given and finished set of resources can handle properly... Once limits are reached, things will misbehave.
I agree completely; I have lots of experience in complex hosting.
All software has its limits (relative to the hardware it runs on), so there is a point at which you need to start tweaking it, and a question of in what direction. I was hoping to read more about that.
I've been looking for a solution for this, every time I had used 'nodemon' it couldn't handle the load, even if it was quite insignificant... perhaps I had misconfigured something, but I'm pleased to find this, good timing!
Instead of returning a 503 page, it could attach a ticket (via a cookie) and say: "Your request will be handled in 55 seconds, please be patient." The time estimate would be based on statistics from the last 10 minutes or the last hour.
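A sketch of how such an estimate could be computed, assuming requests wait in a queue and you know the caller's position in it. `createWaitEstimator` is hypothetical; it divides the queue position by the throughput observed over a sliding window (10 minutes in the usage below).

```javascript
// Hypothetical sketch: estimate wait time from throughput measured over a
// sliding window of recent completions.
function createWaitEstimator(windowMs, now = Date.now) {
  const completions = []; // timestamps of recently finished requests

  function prune() {
    const cutoff = now() - windowMs;
    while (completions.length > 0 && completions[0] <= cutoff) completions.shift();
  }

  return {
    recordCompletion() {
      completions.push(now());
      prune();
    },
    // Estimated seconds until a request at `queuePosition` is served.
    estimateWaitSeconds(queuePosition) {
      prune();
      if (completions.length === 0) return Infinity; // no recent data to go on
      const perSecond = completions.length / (windowMs / 1000);
      return Math.ceil(queuePosition / perSecond);
    },
  };
}

// Usage with a fake clock: 1,200 completions over 10 minutes is 2 requests/s,
// so the 110th request in line should expect about 55 seconds.
let fakeNow = 0;
const estimator = createWaitEstimator(10 * 60 * 1000, () => fakeNow);
for (let i = 0; i < 1200; i++) {
  fakeNow = i * 500;
  estimator.recordCompletion();
}
console.log(estimator.estimateWaitSeconds(110)); // 55
```

Dividing by the full window underestimates throughput right after startup, so a real version would want a warm-up rule.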
The problem is not the lag before the request is resolved; it's the lag before the user finds out that the request will not be resolved. Underlying this is the fact that the server simply does not have enough power to resolve all requests, so some must fail.
The trick of using event loop lag is pretty neat, but the general strategy is IMHO a must have for any service. By aborting early and somewhat gracefully, the user knows ASAP, receives a controlled message that suggests she should not quickly retry, and the server can avoid trying to do part of the work for the request that is not going to be completed.
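A sketch of that "abort early and somewhat gracefully" idea as a wrapper around a plain Node request handler. `withLoadShedding` is hypothetical, `isBusy` stands in for a check such as toobusy(), and the 30-second value is an arbitrary example for the standard HTTP `Retry-After` header, which is how a 503 can tell the client not to retry too soon.

```javascript
// Hypothetical wrapper: `isBusy` stands in for a check such as toobusy(), and
// 30 seconds is an arbitrary example value for the Retry-After hint.
function withLoadShedding(isBusy, handler, retryAfterSeconds = 30) {
  return function (req, res) {
    if (isBusy()) {
      // Abort before doing any real work, with a controlled message.
      res.writeHead(503, {
        'Content-Type': 'text/plain',
        'Retry-After': String(retryAfterSeconds), // seconds until a retry makes sense
      });
      res.end("We're overloaded right now; please try again in a little while.\n");
      return;
    }
    handler(req, res);
  };
}

// Usage with Node's built-in http module (commented out so the sketch doesn't bind a port):
// const http = require('http');
// http.createServer(withLoadShedding(() => false, (req, res) => res.end('ok'))).listen(8080);
```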
This is pretty cool. It seems like 10 seconds is still a pretty long time to wait for a response when connections over the max limit are being dropped. Is there a reason for this or some way to tune it?
Using event loop lag is pretty clever. I think I first learned about event loop lag the hard way, when trying to do DOM animations that relied on setTimeout intervals without checking the current time.
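For the curious, here is a plain-JavaScript sketch of measuring event loop lag using exactly that kind of timer drift. It is not toobusy's implementation: `createLagMonitor`, its defaults, and the smoothing factor are assumptions. Recent Node versions also ship `perf_hooks.monitorEventLoopDelay()` for the measurement itself.

```javascript
// Sketch of event-loop-lag measurement (not toobusy's actual code): a timer is
// scheduled every `intervalMs`; if it fires late, the extra delay is time the
// loop spent busy with other work. Defaults here are arbitrary assumptions.
function createLagMonitor({ intervalMs = 500, maxLagMs = 70, smoothing = 0.3 } = {}) {
  let lagMs = 0; // smoothed lag estimate
  let expected = Date.now() + intervalMs;
  let timer;

  function check() {
    const now = Date.now();
    const sample = Math.max(0, now - expected); // how late this tick fired
    lagMs = smoothing * sample + (1 - smoothing) * lagMs; // exponential moving average
    expected = now + intervalMs;
    schedule();
  }

  function schedule() {
    timer = setTimeout(check, intervalMs);
    timer.unref(); // monitoring alone shouldn't keep the process alive
  }

  schedule();

  return {
    lag: () => lagMs,
    tooBusy: () => lagMs > maxLagMs, // cheap: just reads the cached estimate
    stop: () => clearTimeout(timer),
  };
}
```

The smoothing keeps one slow tick from flipping the server into rejection mode, at the cost of reacting a little later to a genuine overload.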
It'd be interesting to experiment with pairing this with a proof-of-work system (e.g. Hashcash) to smooth the transition between toobusy and !toobusy. With an additional signalling mechanism, it could also help spread the surge across peers in a cluster when one server begins to struggle.
Though you'd need a custom client, so maybe for appcache'd webapps or mobile apps, but not typical webapps. Or browser vendor participation (not serious (ok, half serious)).
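A toy version of the proof-of-work half of that idea, for illustration only: it uses SHA-256 from Node's built-in `crypto` module rather than Hashcash's actual stamp format, and the `challenge:nonce` input is made up. Solving costs the client about 16^difficulty hashes on average while verifying costs the server one, so raising `difficulty` as lag grows would be one way to ramp up gradually instead of flipping straight to 503s.

```javascript
const crypto = require('crypto');

// Hashcash-style toy (SHA-256, made-up "challenge:nonce" format): find a nonce
// whose hash starts with `difficulty` zero hex digits.
function hashHex(challenge, nonce) {
  return crypto.createHash('sha256').update(`${challenge}:${nonce}`).digest('hex');
}

// Client side: brute-force search, ~16^difficulty attempts on average.
function solve(challenge, difficulty) {
  const prefix = '0'.repeat(difficulty);
  for (let nonce = 0; ; nonce++) {
    if (hashHex(challenge, nonce).startsWith(prefix)) return nonce;
  }
}

// Server side: a single hash to check the client's work.
function verify(challenge, nonce, difficulty) {
  return hashHex(challenge, nonce).startsWith('0'.repeat(difficulty));
}

const nonce = solve('req-123', 3);
console.log(verify('req-123', nonce, 3)); // true
```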
well it's still pretty bad ass and rock star that you can measure server load by looking at "event loop lag", that's actually really neat... how do you do that with threads?
Interestingly, this is one piece of a broader strategy for building scalable web applications, addressed in Matt Welsh's thesis (SEDA):
http://www.eecs.harvard.edu/~mdw/proj/seda/
If you haven't taken a look at the ideas in SEDA, it's definitely worth the time. Most modern web apps incorporate at least some pieces of it to great effect.