Ruby - Handling 1 Million Concurrent Connections (github.com/slivu)
133 points by slivuz on Feb 20, 2013 | 58 comments



I have done this before using Java 7 async NIO. The performance would drop like a brick when the received data actually needed some sort of processing. How does this implementation hold up when you need to perform an O(n) operation on received data? Experiment with different sizes of n to see how it fares.
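
For reference, a minimal sketch of what that experiment might look like against the EventMachine-based setup discussed here (the handler name and the busy-work loop are made up for illustration, not taken from the post):

  require 'eventmachine'

  # Hypothetical handler: replies after doing O(n) work per message.
  # Vary WORK_SIZE to see how throughput degrades as processing grows.
  WORK_SIZE = 10_000

  module BusyEchoHandler
    def receive_data(data)
      checksum = 0
      WORK_SIZE.times { |i| checksum = (checksum + data.bytesize + i) % 2**32 }
      send_data("#{checksum}\n")
    end
  end

  EM.run do
    EM.start_server('0.0.0.0', 9000, BusyEchoHandler)
  end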


Exactly. Establishing 1 million connections is pretty easy in any language that supports async IO, even scripting languages. The article is super-vague about the direction of the traffic, the nature and size of the messages, how much of the application is actually written in Ruby (vs. just being a glorified wrapper around Redis and other systems written in C/C++), and the amount of work actually being done by the app. 179 requests per second is hardly something to brag about when you've completely saturated 8 CPU cores.


Ruby is a wrapper around C.


"The performance would drop like a brick when the received data actually needed some sort of processing" - priceless!

This reminds me of another post from a guy who measured and graphed JVM performance in adding (or multiplying, can't remember) numbers. And he concluded that, well, the JVM is fast at doing dumb arithmetic in a loop.


I have to ask, is there even a practical purpose for this? Is there even some remote screwball application that requires one machine to handle 1,000,000 connections? One of the things I like about coming to HN is that the items on the front page are often actionable pieces of advice or clever and interesting hacks. I don't feel that "$LANG can do $LARGE_NUMBER of things" fits the bill. For example:

  C - Handling 1 Million Concurrent Connections 
  Java - Handling 1 Million Concurrent Connections 
  Javascript - Handling 1 Million Concurrent Connections 
  Go - Handling 1 Million Concurrent Connections
If it were 10^10 connections, then we'd be talking about some clever hacks to get that to work.


WhatsApp had over 2 million users connected to their (Erlang) server last year.

http://blog.whatsapp.com/index.php/2012/01/1-million-is-so-2...

Here is the same thing in Erlang for reference (from a few years ago; I would be interested to see if there is a more efficient way now):

http://www.metabrew.com/article/a-million-user-comet-applica...


Thank you so much for that last link especially. Interesting stuff :-)


Start with Ruby as a backend for online games...

continue with Ruby as a backend for audio/video chats...

consider Ruby for streaming podcasts...

etc., etc.


What happens when one of the fully loaded 1-million-connection nodes goes bang? That's potentially a million users getting a poor experience.

Re-establishing a million connections at once is going to be hard on the network - the million were built up over a period of time, yet now they're being re-established Big Bang style.


For any given user, the probability of the one machine with everyone on it going bang is similar to the probability of failure of the particular server they were connected to in a horizontally scaled scenario. However, the cost of redundancy may be higher if it means replicating 100% of the main system; on the other hand, a big system may be designed for high uptime.


Would the probability not be less in this case? In general, fewer moving parts = less chance of outage. E.g. if a device is rated for 300,000 hours MTBF and you have 2 of them, their individual MTBF remains the same, but your chance of experiencing an outage in either one has doubled because you have 2 of them.

It's more the impact side of the risk equation I'm thinking of than the probability.

EDIT: typo


Depends on whether you look at it from the ops point of view or the end-user point of view. You expressed concern about 1 million customers simultaneously having a bad experience. For a given end user, if the hardware is equally reliable, the odds of something happening are the same whether they are sharing with 1 million others or 1 hundred thousand (or even have the server to themselves). On the ops side there is more to go wrong and failures will be more frequent, but each failure affects fewer end users.

The positive in the one-big-machine scenario is that you can make a strong effort to keep it reliable. The advantage in the lots-of-machines scenario is that there is a better chance you have well-tested failover solutions.

It is the combination of impact and risk that I am discussing.


I don't fully get the point of this benchmark. Connection handling is the operating system's business, so you should get similar numbers with any framework that runs on the same OS and correctly uses epoll (or an equivalent system call).


Tangent: back when I paid careful attention, all the epoll frameworks were level-triggered, not edge-triggered.


Let's make it a bit more like the real world. The test case could be like this:

A lookup. Get any reasonably big, publicly available, simple data set and import it into any persistent storage (a sorted file is OK), let a client perform a lookup for a row (preferably UTF-8 text), then render a simple HTML table in response. As stupid as MVC.

or

Perform any simple lookup, but only for an authenticated user (against some passwd-like file, to make it easy).

Then we could see how cool any JVM stuff or over-engineered OO Ruby frameworks really are, with all those shiny graphs and smooth curves.


Good job...

You can also open 1 million connections using one Linux box: a) increase the local port range to 1024-65535, b) set up 17 IP addresses, c) open 58,824 connections from each IP address.
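
Roughly, that setup looks like the sketch below (the interface name and addresses are placeholders, and the client then has to bind each batch of outgoing connections to a different local address; 17 x 58,824 gives a little over 1,000,000):

  # widen the ephemeral port range the kernel may hand out
  sysctl -w net.ipv4.ip_local_port_range="1024 65535"

  # add extra local IP addresses, e.g. as aliases on eth0
  ip addr add 10.0.0.101/24 dev eth0
  ip addr add 10.0.0.102/24 dev eth0
  # ...repeat for the remaining addresses

  # then open roughly 58,824 outgoing connections bound to each local IP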


What would be the point of that? The bottleneck here is not the number of ports. TCP can handle concurrency over a single listening port just fine.

The issue here is concurrency on the service software. If you have to launch a million instances to listen on a million different ports, you are doing it wrong.


A TCP connection is uniquely identified by a {local IP, remote IP, local port, remote port} tuple.

So if you are on 192.168.1.1 and want to connect to a specific port on 192.168.1.2, there aren't enough free port numbers to get 1 million connections. Hence the extending of the "ephemeral port range" (the local port numbers the kernel is allowed to assign) and the addition of more local IPs.


Except there's no assumption here that the 1 million connections are between just two computers. Clients are spread over 50 different EC2 instances (each of which has a unique address). The host does not need more ports in this scenario, and the clients are each using 20,000 ports (possible without altering port allocation).


This was a reply to tmartiro's thread, where he says "You can also open 1 million connections using one Linux box" and describes how you could do it with a single instance rather than 50 separate ones; adlpz asked "what would be the point of that".


Well, of course, you can only have 65,535 established connections sending data concurrently at a given point in time. But what I meant is that this is not the bottleneck at all; the software handling all these requests is, so tmartiro's comment was either pointless or sarcastic.

Note: This scenario is only valid for two computers talking to each other. As gilgoomesh said, if you have multiple clients you have virtually unlimited valid connection tuples (src addr, dst addr, src port, dst port).


Sure, I tried, but the connections came in slowly... I wanted them to come like a tornado.


tl;dr OP threw 3 libs together, wrote a few scripts, and made a few measurements. Look, it can handle 1 million connections.

I like what he has done, but I don't know why it made the HN front page. I'm guessing it's because it has Ruby, 1 million, and concurrent in the title.

Not bashing the author, it's good work in its own right. I'm trying to understand how stuff like this makes the HN front page. Where's the value of this article? That Ruby can handle 1 million connections? Am I missing something (even if it's obvious)?

I think someone mentioned better real-world test cases. I agree that would be a place to start. Perhaps he could send more meaningful data to clients. Maybe market data for some stocks or something. '\0' is not very useful after all!


There are a lot of Ruby developers on HN, and many of them are interested in how it can scale.

Benchmarking experiments of this nature are actually remarkable because they are so rarely done. Most scaling and performance principles still arise from conjecture. It's not trivial to set up a test like this.

Also, _this_ is the place to start in preparation for a real world test. The next iteration, maybe a more realistic test, is only a fork away.


Your forking argument won me over. Very good point.


I'm not a genius Ruby hacker by any stretch, but if EventMachine is like any other "async IO framework" in other scripting languages, then it is built on libev or libevent... so Ruby isn't really handling the connections, it's handling the callbacks. Pedantic, but important to note.


> Pedantic, but important to note.

Pedantic, but "pedantic" actually idiomatically means "not important to note".


In what culture?

In my dictionary it's "of or like a pedant"; a pedant being someone who is overly concerned with minor details.

Still... noted!


If the detail is minor, it's not important.


EventMachine does not build upon libev or libevent. It implements all that stuff by itself.


The practical purpose of this is to show that Ruby is perfectly capable of scaling to fit the needs of most companies. It kills the myth that Ruby is not capable of doing serious infrastructure work.

That is EXACTLY THE POINT of this. We use Ruby for a ton of real work, and we are sick of seeing it get pummeled unfairly for the misconception that it is a slow language and that you are guaranteed to have scaling problems if you use it. Productivity and performance are not necessarily tradeoffs; with Ruby and some good planning, you can get both.


Is "179 requests per second" really an example of scaling?


> unfairly for the misconception that it [Ruby] is a slow language

Ruby is a slow language, certainly considerably slower than C/C++/C#/Scala/Java/etc. The question is whether the performance gap is big enough in your particular use case for it to matter.


If that is the purpose, it fails miserably. I don't know of anyone who was looking to build a business on handling a million idle TCP connections. Ruby is a slow language; that is not a misconception. Having a million idle connections doesn't make a language fast.


LOL. A couple hundred requests per second? Is that something to brag about?

What the fuck is the point of opening a million connections if it takes you 1.55 hours to process one request from each connection?


Hmm, strange math.

179 means that while the app is holding and communicating with 1 million persistent connections, it is still able to process 100+ standard requests per second. Plenty enough to accept new clients to your online game, audio/video chat, podcast, etc.

This graph shows how many requests per second the app can process depending on the number of established persistent connections:

https://raw.github.com/slivu/1mc2/master/results/requests-pe...


I don't think your original post is very clear at all. I get the same number as the post you are criticizing: 1,000,000 / 179 / 60 / 60 = 1.55 hours.

Are there other messages being processed too? Which direction are the messages going? How big are they, what do they contain, and how is the data processed? Also, if you have 1 million clients, how do you handle all of them arriving at around the same time? How long did it take your system to get up to 1 million connections? And how sure are you that each micro instance is sending and receiving messages at exactly the expected rate and is not getting overloaded? What server type are you using for your central server?


"pretty enough to accept new clients to your online game, audio/video chat, podcast etc."

If 'pretty enough' means only allowing 179 new connections to your server per second, that's great. When you have a large event, everyone gets on your site at the same time, not to mention times like getting to work, lunch break, evening rush, etc. You ever hear of the slashdot effect? That's more than 200 requests per second.

Ignoring the poor connection time, you could only have 179 users actively using your app every second. Out of a million. %0.000179 of your user base. Talk about really shitty user engagement.

I'm not even going to talk about the incredibly bad idea it is to host a million connections using one server. The idea that "most websites" only do "about 100 requests per second" is laughable. Sure, the average may be 100, over a month, but that's nothing compared to peak times. Try tens of thousands per second. A high-traffic site might do something on the order of thousands of database writes per second. Which, when all those connections come in, will kill your database servers, which backs up your frontends, which is why you have to have fast forward-facing pre-loaded cache. But I digress.

Focus on scaling your application to actually handle traffic before you obsess over concurrent connections.


I am excited about the ec2-fleet tool. How expensive are the tests in your case?


A micro instance costs 2 cents per hour, so $1 per hour for 50 instances.


What's the name of the program with the CPU/memory graphs in the upper-right corner, above htop, on slides 13-171?


It is GNOME's System Monitor.


For JRuby streaming I have used TorqueBox (JBoss) successfully.


The author asserts that JRuby would consume more than 15GB of memory without providing any justification. Perhaps it would be better to actually try it rather than just making sweeping generalizations.


Here's a case study showing over 500k connections to a Java instance in 2.5GB of memory using NIO. JRuby implements Ruby's IO directly atop NIO. So yeah... that 15GB assertion is nonsense.

http://urbanairship.com/blog/2010/08/24/c500k-in-action-at-u...


You're right, sorry for the pointless blaming, my bad, I should have checked it first. Updating the original post.


Doesn't Ruby have a limit of 1024 open file descriptors? If some method in Ruby's standard library calls 'select' internally, with the 1024 limit, what happens?


It will segfault. That's why EventMachine uses epoll on Linux and kqueue on BSD.
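
For what it's worth, the relevant knobs in EventMachine look roughly like the sketch below (the table size and handler are illustrative; the OS-level ulimit -n also has to be raised to match):

  require 'eventmachine'

  # Illustrative: enlarge EM's descriptor table and prefer epoll/kqueue
  # over select before the reactor starts.
  EM.set_descriptor_table_size(1_000_000)
  EM.epoll
  EM.kqueue if EM.kqueue?

  module IdleHandler
    def receive_data(data)
      # no-op: connections just sit there
    end
  end

  EM.run do
    EM.start_server('0.0.0.0', 9000, IdleHandler)
  end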


I/O is handled through EventMachine, not the normal Ruby I/O calls, and can therefore scale arbitrarily.


It is a little worrying that if some (perhaps inexperienced) developer arbitrarily calls a method that invokes Ruby's select, the process will crash mysteriously.


It won't crash. On many platforms, Ruby uses special select() hacks to extend the number of file descriptors select() can handle. On OS X it can apparently handle 10556 file descriptors. If you go over that, Ruby apparently simulates an EMFILE error.


very cool experiment


Dumb question: I thought the GC tunings, e.g. RUBY_HEAP_MIN_SLOTS, were only available in REE rather than MRI?


Starting with 1.9.3 they are also available in MRI, though I'm not sure about 2.0.0, as its GC has been heavily refactored/optimized.

> Starting with Ruby 1.9.3, the GC in mainstream ruby can also be tuned

http://www.web-l.nl/posts/15-tuning-ruby-s-garbage-collector...
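
In practice that just means setting environment variables when launching the process, something along these lines (the values and the server.rb name are illustrative, not the ones from the post):

  RUBY_HEAP_MIN_SLOTS=800000 \
  RUBY_FREE_MIN=100000 \
  RUBY_GC_MALLOC_LIMIT=79000000 \
  ruby server.rb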


Would love to see the updated test once Ruby 2.0 is out in a few days' time.


Brilliant - thanks, hadn't found that info.


It would be nice to compare the different Ruby implementations with this test.


interesting experiment!



