How we’ve made Raptor fast (rubyraptor.org)
178 points by triskweline on Nov 10, 2014 | 71 comments



Just some glaring inconsistencies that I found:

1.

> It’s less work for the user. You don’t have to setup Nginx. If you’re not familiar with Nginx, then using Raptor means you’ll have one tool less to worry about.

>For example, our builtin HTTP server doesn’t handle static file serving at all, nor gzip compression.

Sounds like I would need nginx (or another frontend server) anyway?

2. > By default, Raptor uses the multi-process blocking I/O model, just like Unicorn.

> When we said that Raptor’s builtin HTTP server is evented, we were not telling the entire truth. It is actually hybrid multithreaded and evented.

So, which is it? I assume the default is multi-process + events, while a paid version offers multithreaded + events? If so, isn't Unicorn's model of multi-process + blocking I/O pretty good as well, because the OS becomes the load balancer in that case?

Overall it seems they wrote a very fast web server. Kudos to that! But I don't think the web server was ever the problem for Rack/Ruby apps? Still on the fence with this one until more details emerge. :-)


The first point makes sense for me given that all my static files are served gzipped out of cloudflare/cloudfront on another domain.


And the Rails asset pipeline can pre-compress the files, so you don't waste time compressing them for each request individually.


Does it seem weird to anyone else that they're doing all this marketing for a project that's supposedly going to be open source? Why all the suspense, why not just release it? Or, if it's not ready yet, why not just finish it and then start promoting it?


Halfway through the document they mention their plan to add paid-for features: "Raptor also optionally allows multithreading in a future paid version."


Yep, that's when I stopped reading and closed the article. Why hide the fact that you are building paid software? Perhaps it was naive to assume that it was open source, but that piece really surprised me.


What problem do you have with open source software that has a paid premium version? For example, the Sidekiq gem is open source but they also have a paid Sidekiq Pro. Would you avoid Sidekiq too?


They have a full drip with email capture around this product as they are looking to sell a portion of it. (I think it was a multithreaded variant.) I do agree that I wish the product was ready but I can't fault them for "following the playbook".


As someone working in an enterprise environment, I've sort of lost interest in this breed of Rack server now that I've gotten used to having SSO and LDAP authorization available via Apache modules, to name a few features. Apache allows me to accommodate all sorts of requirements like setting up vhosts that require authentication except on the LAN, or vhosts which allow members of certain groups to access an internal app via reverse proxying.

I don't mean to be negative; other posters have that angle covered. But I would comment that this ongoing proliferation in prefork backends is hardly disruptive to organizations who have already made significant commitments to Ruby web apps. Our Apache/Passenger servers aren't going away anytime soon.


This is Phusion Passenger, +1. As was pointed out in a thread several months ago, the DNS resolves to the same place. The writing style is similar and the feature set is far too mature for a 1.0 product.


How would one use this on Heroku? It doesn't support static file transfers allegedly... and per Heroku they require your app server to serve them by default.

https://github.com/heroku/rails_12factor#rails-4-serve-stati...

Any ideas?


You use a rack middleware to handle static files, which is how Rails handles it by default. So, unless I'm completely mistaken (which I may well be), this should work just fine?


Wouldn't that sit behind the app server? (Equally mistaken potential. :)


Using a CDN (Cloudfront, Fastly add-on, ...) is a common choice, as it allows delivering assets from a closer datacenter to the client and removes load from the app server.


Yeah but getting assets to said cdn is annoyingish without having app server origin, I'd argue.


I find the setup pretty simple and convenient if you configure the CDN to fetch the assets from the app (in which case the app only needs to serve each asset once per release).


It's trivial with Rails, at least.


"You will need 5000 processes (1 client per process). A reasonably large Rails app can consume 250 MB per process, so you’ll need 1.2 TB of RAM."

Quibble: most multi-process web servers use fork() for child processes, which means they can share identical memory pages.
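A rough back-of-the-envelope sketch of that quibble, in Python. The 5000 processes and 250 MB figures come from the quoted article; the `shared_fraction` parameter and the 0.5 value tried below are purely hypothetical, since the share of pages that stays untouched after fork() depends on the app and the runtime.

```python
def total_memory_mb(processes, per_process_mb, shared_fraction):
    """Estimate total RAM when a fraction of each process's pages is shared.

    shared_fraction is a hypothetical knob: 0.0 means nothing is shared
    (the article's assumption), 1.0 means every page is shared via fork().
    """
    shared_mb = per_process_mb * shared_fraction         # counted once
    private_mb = per_process_mb * (1 - shared_fraction)  # counted per process
    return shared_mb + processes * private_mb


# The article's scenario: no sharing at all.
no_sharing = total_memory_mb(5000, 250, 0.0)

# Hypothetical scenario: half of each process's pages stay shared.
half_shared = total_memory_mb(5000, 250, 0.5)

print(f"no sharing:  {no_sharing:,.0f} MB")
print(f"half shared: {half_shared:,.0f} MB")
```

With no sharing this gives 1,250,000 MB, which is about 1.19 when divided by 1024 twice and is presumably where the article's "1.2 TB" comes from. How much real sharing survives in a long-running Ruby process is exactly what the replies below argue about.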


Designing application servers without deeply understanding copy-on-write[1] semantics for virtual memory paging just seems foolish.

I'll chalk this one up to the PR/marketing person probably not taking an OS course.

Still it would be nice if they really did go back and read a little W. Richard Stevens[2].

[1] - http://en.wikipedia.org/wiki/Copy-on-write
[2] - http://en.wikipedia.org/wiki/W._Richard_Stevens


> I'll chalk this one up to the PR/marketing person

This entire web server has been marketing hype since day one. I imagine they are trying to build a pro product and support company out of this.

It's a web server with event loops and some fancy memory allocation. Shouldn't Node.js have taught us all by now the perils of event loops and insanely tweaked HTTP parsers? Sure, it looks great for "Hello World" benchmarks but falls right on its face as soon as you have an app of significant size spending real time on CPU.


I'm wondering just how well these app servers perform with a real-world Rails app behind them? My understanding was that Unicorn deliberately does not try for maximum performance simply because you'll lose most of that as soon as you add Rails, and then the app server is not the bottleneck. Amdahl's Law applies - you can't get more than a 1% speedup by optimizing a component that consumes 1% of the total CPU time.
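To put numbers on the Amdahl's Law point, here is a minimal Python sketch. The 1% share is the figure used above and the 4x factor is borrowed from the "4x faster" claim quoted elsewhere in this thread; the function name `amdahl_speedup` is made up for illustration, and neither number is a measurement of any real app.

```python
def amdahl_speedup(fraction, component_speedup):
    """Overall speedup when `fraction` of the total time gets
    `component_speedup` times faster and the rest is unchanged."""
    return 1.0 / ((1.0 - fraction) + fraction / component_speedup)


# App server is 1% of the request time and becomes 4x faster.
print(amdahl_speedup(0.01, 4))

# Upper bound: the app server costs nothing at all.
print(amdahl_speedup(0.01, float("inf")))
```

Even in the limiting case the overall speedup is 1 / 0.99, roughly 1.01x, which is the "no more than about 1%" bound.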

I also wonder how their hybrid evented/threading/process model works in the presence of a GIL (which, last I checked, Ruby still has) and in the presence of blocking socket calls (which, last I checked, both the MySQL and PostgreSQL APIs used).


It sounds like the real benefit is not performance but the simplicity of having the slow-client spoonfeeding built in rather than requiring an external Web server.


I agree that would be a benefit, but

a.) they could achieve that a lot simpler by bundling nginx, Unicorn, Rails, and a pre-vetted set of config files and shell scripts to bring the whole thing together and

b.) that's the value proposition of PaaS offerings like Heroku. Heroku is pretty damn simple already - just git push your code - and you'd outgrow it around the same time as you'd outgrow the bundled slow-client spoonfeeding, so what's the value proposition of this?


I would really like to hear their answer to this question.


In my experience -- as the Phusion Passenger author, and as one of the original developers behind the copy-on-write feature in Ruby -- a moderately large Rails app can use 250 MB per process even with copy-on-write.


Now that they have fixed the garbage collector. However, how much of that 250MB is built up after the fork, and how much is static? I actually don't have that answer.


In the case of Rails, all of your Gems, Rails and application code are loaded pre-fork.


I strongly believe that this is the next version of Phusion Passenger.


The writing style is very similar.


Their architecture looks really mature to me - a lot of features are the same as Passenger's


Spoiler: Insane amounts of low-level optimization.


Actually, everything was pretty simple: Use libev since it's faster than libevent, use Node's http parser because it's fast. Then allow each thread in the pool to run its own event loop. This pretty much sums up my university's Internet Programming course. There were a few hairy bits about tcmalloc, but they did a great job of explaining how they took advantage of object pooling and region-based memory management. Great post guys, I can't wait to give your source a read :)
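For anyone who hasn't seen the "one event loop per thread" shape before, here is a minimal sketch in Python. It uses asyncio instead of libev, and `asyncio.sleep` stands in for non-blocking socket I/O, so it only illustrates the structure and is not Raptor's code. The function names (`run_pool`, `worker`, `handle_client`) are made up for the example.

```python
import asyncio
import threading


async def handle_client(thread_index, client_id):
    # Placeholder for non-blocking I/O on one client connection.
    await asyncio.sleep(0.01)
    return f"thread {thread_index} served client {client_id}"


async def handle_clients(thread_index, clients):
    # One loop multiplexes many clients without blocking on any of them.
    tasks = (handle_client(thread_index, c) for c in range(clients))
    return await asyncio.gather(*tasks)


def worker(results, thread_index, clients):
    # Each thread owns a private event loop; loops share no state.
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    try:
        results[thread_index] = loop.run_until_complete(
            handle_clients(thread_index, clients)
        )
    finally:
        loop.close()


def run_pool(num_threads=2, clients_per_thread=3):
    results = [None] * num_threads
    threads = [
        threading.Thread(target=worker, args=(results, i, clients_per_thread))
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for per_thread in run_pool():
        print(per_thread)
```

Because no loop ever touches another loop's clients, the per-client bookkeeping needs no locking in this sketch.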


Any idea why they didn't use libuv instead?


I haven't seen any benchmarks comparing the two, but Node first used libev, and then the team created libuv out of a need to support Windows, because libev is unix only.

Actually, it's all in this about page: http://nikhilm.github.io/uvbook/introduction.html


I would have loved to see some data that showed their bump pointer regions and thread local stuff performing significantly better than tcmalloc or jemalloc, both of which do thread-local caching that avoids locks for the vast majority of allocations. Additionally, what they came up with sounds like talloc[1], which has been in production for years with samba.
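For readers unfamiliar with the term, a bump-pointer region looks roughly like the toy below. This is a Python sketch of the general technique only: the `Region` class is hypothetical, it is not Raptor's or talloc's implementation, and Python is obviously not where you would do this for speed.

```python
class Region:
    """Toy bump-pointer region: allocate by advancing an offset,
    free everything at once by resetting it."""

    def __init__(self, capacity):
        self._buffer = bytearray(capacity)
        self._offset = 0  # the "bump pointer"

    @property
    def used(self):
        return self._offset

    def alloc(self, size):
        # Allocation is a bounds check plus an addition: no free list.
        if size < 0 or self._offset + size > len(self._buffer):
            raise MemoryError("region exhausted")
        start = self._offset
        self._offset += size
        return memoryview(self._buffer)[start:start + size]

    def reset(self):
        # Individual allocations are never freed; the whole region is
        # recycled in one step, e.g. once a request has been handled.
        self._offset = 0


region = Region(64)
header = region.alloc(16)
header[:3] = b"GET"      # write into the region's backing buffer
body = region.alloc(32)
print(region.used)       # bytes handed out so far
region.reset()           # everything released together
```

If each thread owns its own region, no lock is needed, and that is the part the benchmark data would have to beat tcmalloc's or jemalloc's thread-local caches on.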

1 - http://talloc.samba.org/talloc/doc/html/index.html


Congratulations on Raptor, I'll definitely give it a whirl. Regarding static asset serving, I'm fairly certain serving them through the application server is often not the way to go anyway.


Raptor seems pretty interesting, but personally I don't like its marketing approach, and I'm not the only one.

On Twitter some Ruby heroes say: "Raptor is 4x faster than existing ruby web servers for hello world applications" :)

Strong proclamations in favour of an open source project are a little strange when the code is not yet released.

However, I hope that all the graphs on the home page are real, for the happiness of Ruby programmers.



The section "Hybrid evented/multithreaded: one event loop per thread" suggests that the whole model is basically SEDA [1]. I'm surprised the article does not directly reference the project/paper.

[1] http://www.eecs.harvard.edu/~mdw/papers/seda-sosp01.pdf


The "hybrid" IO architecture is historically known as AMPED (asynchronous multi-process event driven): https://www.usenix.org/legacy/event/usenix99/full_papers/pai...


Puma says it runs best when using Rubinius or JRuby, but from my limited understanding, not everything can run on those implementations.

Are there any giveaways in the blog that wouldn't allow Raptor to run on Rubinius or JRuby?


That's basically because Puma uses threads, and threads are way better on Rubinius and JRuby, due to lack of a GIL. It's not due to some kind of incompatibility.


Well, if part of it is written in C++ it most likely won't run easily on either?


Does anyone know if the 60.000 in the chart means 60 or 60,000?


It's from Phusion[1], which is headquartered in the Netherlands. Non-English Europe conventionally uses . as a digit separator, so 60.000 would be sixty thousand.

[1]:
  % ping rubyraptor.org
  PING rubyraptor.org (97.107.130.55) 56(84) bytes of data.
  64 bytes from shell.phusion.nl (97.107.130.55): icmp_seq=1 ttl=50 time=98.3 ms


If it's the Passenger guys then 60.000 is how you write 60,000 in the Netherlands ;)


I was thinking the same thing. 60 reqs/sec is not very fast. I suppose it depends on what is being tested. "Hello World" is pretty useless to base a benchmark on, as DB and other I/O is not involved. Who knows what that number means without the source code of what was being benchmarked.


Not bad work. Seems somewhat futile though, since the speed will probably be massively slowed down by the actual Ruby application code, database accesses, etc.


If it means you can squeeze more requests/sec out of the free tier of Heroku, I'm all for it!


I am all for a faster Ruby application server. If Raptor can stand behind its claims on November 25th, that will be the best birthday present I could get.


Slightly OT but does uwsgi feature much in the ruby world? /from a curious python guy


Ruby uses a specification called Rack.

http://rack.github.io


uWSGI is an implementation of the Python WSGI spec that also supports Rack: http://uwsgi-docs.readthedocs.org/en/latest/Ruby.html


I haven't heard about it myself. Sounds like it could be helpful if you're trying to fit some Ruby services into a mostly Python shop.

According to that page, it doesn't support the most recent Ruby versions. That might not be accurate, though.


Ruby support in uWSGI is really solid, and it is used more than you might think (for example, the biggest Italian Ruby on Rails site runs on it, and you will find a bunch of interesting blog posts about it combined with Ruby). The main issue here is that we never pushed it in the community like we did with Python and Perl. Frankly, I do not know why; maybe it has been a terrible mistake (taking into account that whoever tried it with Ruby has been really excited).


What does uWSGI support bring to the table?


I hope better subdomain support


I try to keep up with most things ruby and I have never heard it discussed in a ruby context


Does it support SPDY?


What are these for, as opposed to just using a web server like Apache or nginx?


You generally use Unicorn etc. behind something like nginx. The last time I did that I used nginx to handle thousands of concurrent connections that were forwarded to nCPUs Unicorn instances. Nginx is very good at handling lots of connections, Unicorn is good at handling a Rack app.


To be a bit more explicit, using a separate httpd and application server allows a division of labor between the resource-bound task of handling the request and building the response, and the network-bound task of dribbling bytes back to the original requestor.

Nginx (and the general class of highly concurrent servers) is good at handling lots of connections largely because it tries to minimize the resources (memory, process scheduler time, etc) required to manage each connection as it slowly feeds the result down the wire.

The application server generally wants an instance per CPU so that it can hurry up and crank through a memory-, cpu-, or database-hungry calculation in as few microseconds as possible, hand the resulting data back to the webserver and proceed to put the memory, DB, and CPU to the task of processing the next request.

This is in contrast to the (simplified here) old-school CGI way, where, say, ancient Apache would receive a request, then fork off a copy of PHP or Perl for each one, letting the app get blocked writing to the stdio pipe to Apache, and then Apache writing to the requesting socket, all the while maintaining a full OS process for each request in play.


This is an "application server". When you are doing anything that isn't PHP you basically need a process behind your web server (Apache/Nginx) that runs your actual ruby/python/java application code and speaks HTTP.


Even in PHP you want that for decent performance: php-fpm is similar in concept to a Ruby/Python application server, albeit slightly different due to PHP being a web language first and foremost, and its interesting execution method. Still, you run a process and connect nginx to it :)


Even for PHP you do, it's just delivered as an apache module. (Passenger can be run as an apache module too, fwiw).

Although to be fair, the PHP model doesn't require a _persistent_ process between requests (I think?). But most other platforms do.


It isn't required for Ruby either, but loading a Rails app is slow so it's better to persist it between requests.


Unless you also need high density multi-tenancy in which case you probably want to use mpm-itk for security..


Because you need something between your Ruby code and nginx/Apache.


Does this mean that the majority of the performance improvements over Puma actually come from the fact that they are using the considerably less battle-tested HTTP parser of PicoHTTPParser over the Mongrel one?

Of course, this may be mitigated by the fact that any reasonable production environment will have a web server layer over the app server(s) anyway, for load balancing, fail-over and exploit detection/prevention.


They actually aren't using PicoHTTPParser.


Whoops! My reading skills need some work.



