Ruby - Handling 1 Million Concurrent Connections (github.com/slivu)
133 points by slivuz on Feb 20, 2013 | 58 comments



I have done this before using Java 7 async NIO. The performance would drop like a brick when the received data actually needed some sort of processing. How does this implementation hold up when you need to perform an O(n) operation on received data? Experiment with different sizes of n to see how it fares.
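
For reference, a minimal sketch of what that experiment might look like against the EventMachine-based setup discussed here (the handler name and the busy-work loop are made up for illustration, not taken from the post):

  require 'eventmachine'

  # Hypothetical handler: replies after doing O(n) work per message.
  # Vary WORK_SIZE to see how throughput degrades as processing grows.
  WORK_SIZE = 10_000

  module BusyEchoHandler
    def receive_data(data)
      checksum = 0
      WORK_SIZE.times { |i| checksum = (checksum + data.bytesize + i) % 2**32 }
      send_data("#{checksum}\n")
    end
  end

  EM.run do
    EM.start_server('0.0.0.0', 9000, BusyEchoHandler)
  end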


Exactly. Establishing 1 million connections is pretty easy in any language that supports async IO, even scripting languages. The article is super-vague about the direction of the traffic, the nature and size of the messages, how much of the application is actually written in Ruby (vs. just being a glorified wrapper around Redis and other systems written in C/C++), and the amount of work actually being done by the app. 179 requests per second is hardly something to brag about when you've completely saturated 8 CPU cores.


Ruby is a wrapper around C.


"The performance would drop like a brick when the received data actually needed some sort of processing" - priceless!

This reminds me of another post from a guy who measured and graphed JVM performance in adding (or multiplying, can't remember) numbers. And he concluded that, well, the JVM is fast at doing dumb arithmetic in a loop.


I have to ask, is there even a practical purpose for this? Is there even some remote screwball application that requires one machine to handle 1,000,000 connections? One of the things I like about coming to HN is that the items on the front page are often actionable pieces of advice or clever and interesting hacks. I don't feel that "$LANG can do $LARGE_NUMBER of things" fits the bill. For example:

  C - Handling 1 Million Concurrent Connections 
  Java - Handling 1 Million Concurrent Connections 
  Javascript - Handling 1 Million Concurrent Connections 
  Go - Handling 1 Million Concurrent Connections
If it were 10^10 connections, then we'd be talking about some clever hacks to get that to work.


WhatsApp had over 2 million users connected to their (Erlang) server last year.

http://blog.whatsapp.com/index.php/2012/01/1-million-is-so-2...

Here is the same thing in Erlang for reference (from a few years ago; I would be interested to see if there is a more efficient way now):

http://www.metabrew.com/article/a-million-user-comet-applica...


Thank you so much for that last link especially. Interesting stuff :-)


Start with Ruby as a backend for online games...

continue with Ruby as a backend for audio/video chats...

consider Ruby for streaming podcasts...

etc., etc.


What happens when one of the fully loaded 1-million-connection nodes goes bang? That's potentially a million users getting a poor experience.

Re-establishing a million connections at once is going to be hard on the network - the million were built up over a period of time, yet now they're being re-established Big Bang style.


For any given user, the probability of the one machine with everyone on it going bang is similar to the probability of failure of the particular server they were connected to in a horizontally scaled scenario. However, the cost of redundancy may be higher if it means replicating 100% of the main system; on the other hand, a big system may be designed for high uptime.


Would the probability not be less in this case? In general, fewer moving parts = less chance of outage. E.g. if a device is rated for 300,000 hours MTBF and you have 2 of them, their individual MTBF remains the same, but your chance of experiencing an outage in either one has doubled because you have 2 of them.

It's more the impact side of the risk equation I'm thinking of than the probability.

EDIT: typo


Depends on whether you look at it from the ops point of view or the end-user point of view. You expressed concern about 1 million customers simultaneously having a bad experience. For a given end user, if the hardware is equally reliable, the odds of something happening are the same whether they are sharing with 1 million others or 1 hundred thousand (or even have the server to themselves). On the ops side there is more to go wrong and failures will be more frequent, but each failure affects fewer end users.

The positive in the one-big-machine scenario is that you can make a strong effort to keep it reliable. The advantage in the lots-of-machines scenario is that there is a better chance you have well-tested failover solutions.

It is the combination of impact and risk that I am discussing.


I don't fully get the point of this benchmark. Connection handling is the operating system's business, so you should get similar numbers with any framework that runs on the same OS and correctly uses epoll (or an equivalent system call).


Tangent: back when I paid careful attention, all the epoll frameworks were level-triggered, not edge-triggered.


Let's make it a bit more like the real world. The test case could be like this:

A lookup. Get any reasonably big, publicly available, simple data set and import it into any persistent storage (a sorted file is OK), let a client perform a lookup for a row (preferably UTF-8 text), then render a simple HTML table in response. As stupid as MVC.

or

Perform any simple lookup, but only for an authenticated user (against some passwd-like file, to make it easy).

Then we could see how cool any JVM stuff or over-engineered OO Ruby frameworks really are, with all those shiny graphs and smooth curves.


Good job...

You can also open 1 million connections using one Linux box: a) increase the local port range to 1024-65535, b) set up 17 IP addresses, c) open 58,824 connections from each IP address.
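
Roughly, that setup looks like the sketch below (the interface name and addresses are placeholders, and the client then has to bind each batch of outgoing connections to a different local address; 17 x 58,824 gives a little over 1,000,000):

  # widen the ephemeral port range the kernel may hand out
  sysctl -w net.ipv4.ip_local_port_range="1024 65535"

  # add extra local IP addresses, e.g. as aliases on eth0
  ip addr add 10.0.0.101/24 dev eth0
  ip addr add 10.0.0.102/24 dev eth0
  # ...repeat for the remaining addresses

  # then open roughly 58,824 outgoing connections bound to each local IP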


What would be the point of that? The bottleneck here is not the number of ports. TCP can handle concurrency over a single listening port just fine.

The issue here is concurrency on the service software. If you have to launch a million instances to listen on a million different ports, you are doing it wrong.


A TCP connection is uniquely identified by a {local IP, remote IP, local port, remote port} tuple.

So if you are on 192.168.1.1 and want to connect to a specific port on 192.168.1.2, there aren't enough free port numbers to get 1 million connections. Hence the extending of the "ephemeral port range" (the local port numbers the kernel is allowed to assign) and the addition of more local IPs.


Except there's no assumption here that the 1 million connections are between just two computers. Clients are spread over 50 different EC2 instances (each of which has a unique address). The host does not need more ports in this scenario, and the clients are each using 20,000 ports (possible without altering port allocation).


This was a reply to tmartiro's thread, where he says "You can also open 1 million connections using one Linux box" and describes how you could do it with a single instance rather than 50 separate ones; adlpz asked "what would be the point of that".


Well, of course, you can only have 65,535 established connections sending data concurrently at a given point in time. But what I meant is that this is not the bottleneck at all; the software handling all these requests is, so tmartiro's comment was either pointless or sarcastic.

Note: This scenario is only valid for two computers talking to each other. As gilgoomesh said, if you have multiple clients you have virtually unlimited valid connection tuples (src addr, dst addr, src port, dst port).


Sure, I tried, but the connections came in slowly... I wanted them to come like a tornado.


tl;dr OP threw 3 libs together, wrote a few scripts, and made a few measurements. Look, it can handle 1 million connections.

I like what he has done, but I don't know why it made the HN front page. I'm guessing it's because it has Ruby, 1 million, and concurrent in the title.

Not bashing the author, it's good work in its own right. I'm trying to understand how stuff like this makes the HN front page. Where's the value of this article? That Ruby can handle 1 million connections? Am I missing something (even if it's obvious)?

I think someone mentioned better real-world test cases. I agree that would be a place to start. Perhaps he could send more meaningful data to clients. Maybe market data for some stocks or something. '\0' is not very useful after all!


There are a lot of Ruby developers on HN, and many of them are interested in how it can scale.

Benchmarking experiments of this nature are actually remarkable because they are so rarely done. Most scaling and performance principles still arise from conjecture. It's not trivial to set up a test like this.

Also, _this_ is the place to start in preparation for a real world test. The next iteration, maybe a more realistic test, is only a fork away.


Your forking argument won me over. Very good point.


I'm not a genius Ruby hacker by any stretch, but if EventMachine is like any other "async IO framework" in other scripting languages, then it is built on libev or libevent... so Ruby isn't really handling the connections, it's handling the callbacks. Pedantic, but important to note.


> Pedantic, but important to note.

Pedantic, but "pedantic" actually idiomatically means "not important to note".


In what culture?

In my dictionary it's "of or like a pedant"; a pedant being someone who is overly concerned with minor details.

Still... noted!


If the detail is minor, it's not important.


EventMachine does not build upon libev or libevent. It implements all that stuff by itself.


The practical purpose of this is to show that Ruby is perfectly capable of scaling to fit the needs of most companies. It kills the myth that Ruby is not capable of doing serious infrastructure work.

That is EXACTLY THE POINT of this. We use Ruby for a ton of real work, and we are sick of seeing it get pummeled unfairly for the misconception that it is a slow language and that you are guaranteed to have scaling problems if you use it. Productivity and performance are not necessarily tradeoffs; with Ruby and some good planning, you can get both.


Is "179 requests per second" really an example of scaling?


> unfairly for the misconception that it [Ruby] is a slow language

Ruby is a slow language, certainly considerably slower than C/C++/C#/Scala/Java/etc. The question is whether the performance gap is big enough in your particular use case for it to matter.


If that is the purpose, it fails miserably. I don't know of anyone who was looking to build a business on handling a million idle TCP connections. Ruby is a slow language; that is not a misconception. Having a million idle connections doesn't make a language fast.


LOL. A couple hundred requests per second? Is that something to brag about?

What the fuck is the point of opening a million connections if it takes you 1.55 hours to process one request from each connection?


Hmm, strange math.

179 means that while the app is holding and communicating with 1 million persistent connections, it is still able to process 100+ standard requests per second. Plenty enough to accept new clients to your online game, audio/video chat, podcast, etc.

This graph shows how many requests per second the app can process depending on the number of established persistent connections:

https://raw.github.com/slivu/1mc2/master/results/requests-pe...


I don't think your original post is very clear at all. I get the same number as the post you are criticizing: 1,000,000 / 179 / 60 / 60 = 1.55 hours.

Are there other messages being processed too? Which direction are the messages going? How big are they, what do they contain, and how is the data processed? Also, if you have 1 million clients, how do you handle all of them arriving at around the same time? How long did it take your system to get up to 1 million connections? And how sure are you that each micro instance is sending and receiving messages at exactly the expected rate and is not getting overloaded? What server type are you using for your central server?


"pretty enough to accept new clients to your online game, audio/video chat, podcast etc."

If 'pretty enough' means only allowing 179 new connections to your server per second, that's great. When you have a large event, everyone gets on your site at the same time, not to mention times like getting to work, lunch break, evening rush, etc. You ever hear of the slashdot effect? That's more than 200 requests per second.

Ignoring the poor connection time, you could only have 179 users actively using your app every second. Out of a million. %0.000179 of your user base. Talk about really shitty user engagement.

I'm not even going to talk about the incredibly bad idea it is to host a million connections using one server. The idea that "most websites" only do "about 100 requests per second" is laughable. Sure, the average may be 100, over a month, but that's nothing compared to peak times. Try tens of thousands per second. A high-traffic site might do something on the order of thousands of database writes per second. Which, when all those connections come in, will kill your database servers, which backs up your frontends, which is why you have to have fast forward-facing pre-loaded cache. But I digress.

Focus on scaling your application to actually handle traffic before you obsess over concurrent connections.


I am excited about the ec2-fleet tool. How expensive are the tests in your case?


A micro instance costs 2 cents per hour, so $1 per hour for 50 instances.


What's the name of the program with the CPU/memory graphs in the upper-right corner, above htop, on slides 13-171?


It is GNOME's System Monitor.


For JRuby streaming I have used TorqueBox (JBoss) successfully.


The author asserts that JRuby would consume more than 15GB of memory without providing any justification. Perhaps it would be better to actually try it rather than just making sweeping generalizations.


Here's a case study showing over 500k connections to a Java instance in 2.5GB of memory using NIO. JRuby implements Ruby's IO directly atop NIO. So yeah... that 15GB assertion is nonsense.

http://urbanairship.com/blog/2010/08/24/c500k-in-action-at-u...


You're right, sorry for the pointless blaming, my bad, I should have checked it first. Updating the original post.


Doesn't Ruby have a limit of 1024 open file descriptors? If some method in Ruby's standard library calls 'select' internally, with the 1024 limit, what happens?


It will segfault. That's why EventMachine uses epoll on Linux and kqueue on BSD.
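
For what it's worth, the relevant knobs in EventMachine look roughly like the sketch below (the table size and handler are illustrative; the OS-level ulimit -n also has to be raised to match):

  require 'eventmachine'

  # Illustrative: enlarge EM's descriptor table and prefer epoll/kqueue
  # over select before the reactor starts.
  EM.set_descriptor_table_size(1_000_000)
  EM.epoll
  EM.kqueue if EM.kqueue?

  module IdleHandler
    def receive_data(data)
      # no-op: connections just sit there
    end
  end

  EM.run do
    EM.start_server('0.0.0.0', 9000, IdleHandler)
  end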


I/O is handled through EventMachine, not the normal Ruby I/O calls, and can therefore scale arbitrarily.


It is a little worrying that if some (perhaps inexperienced) developer arbitrarily calls a method that invokes Ruby's select, the process will crash mysteriously.


It won't crash. On many platforms, Ruby uses special select() hacks to extend the number of file descriptors select() can handle. On OS X it can apparently handle 10556 file descriptors. If you go over that, Ruby apparently simulates an EMFILE error.


very cool experiment


Dumb question: I thought the GC tunings, e.g. RUBY_HEAP_MIN_SLOTS, were only available in REE rather than MRI?


Starting with 1.9.3 they are also available in MRI, though I'm not sure about 2.0.0, as its GC has been heavily refactored/optimized.

> Starting with Ruby 1.9.3, the GC in mainstream ruby can also be tuned

http://www.web-l.nl/posts/15-tuning-ruby-s-garbage-collector...
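
In practice that just means setting environment variables when launching the process, something along these lines (the values and the server.rb name are illustrative, not the ones from the post):

  RUBY_HEAP_MIN_SLOTS=800000 \
  RUBY_FREE_MIN=100000 \
  RUBY_GC_MALLOC_LIMIT=79000000 \
  ruby server.rb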


Would love to see the updated test once Ruby 2.0 is out in a few days' time.


Brilliant - thanks, hadn't found that info.


It would be nice to compare the different Ruby implementations with this test.


interesting experiment!



