For a little perspective, the remaining 20ms still leaves enough time for, oh, 55 random SSD IOs, with time to spare to zlib a megabyte of text. Or about 48 million cycles on a modern 2.4GHz CPU. Just what does this app do exactly?
Aah.. Django.
Seriously, 20ms isn't good even for Python. That's not something to be proud of; that's a reason to rip up whatever overelaborate mess you've constructed and figure out just how you're blowing so much time.
As a reminder, a 200 byte HTTP request can be parsed in around 600 instructions or less (about 250 nanoseconds at 2.4GHz).
The real strength of uwsgi is its emperor mode [1].
On my servers I have one upstart job whose role is simply to start the emperor daemon and make sure /var/run/uwsgi permissions are correct for writing sock files.
Each django project deployed on the server has a uwsgi.ini file in its etc folder. This ini file holds the config for that app only: it defines where the socket file is located, where the virtualenv is located, and where the project.wsgi file is located.
It's simple because you can host multiple websites on a single server, but your uwsgi configuration is separated into 'system' and 'project', so your 'project'-specific config can live in git and your 'system' config never has to change.
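To make that concrete, a per-project uwsgi.ini might look roughly like this (all paths and the module name are invented for the example):

    [uwsgi]
    ; socket nginx will talk to
    socket = /var/run/uwsgi/myproject.sock
    chmod-socket = 660

    ; project-specific paths
    chdir = /srv/myproject
    virtualenv = /srv/myproject/env
    module = myproject.wsgi

    master = true
    processes = 4

The emperor just watches a directory of these files and spawns or reloads a vassal per app as files appear or change.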
This is pretty light on the details. I also use Nginx + uWSGI in front of an API that shakes up a lot of network IO behind it, and it has had a few headaches. On FreeBSD it ignores the die-on-term option, making it a pain to use in conjunction with service managers that rely on SIGTERM like daemontools. It's good that in your example you include the need-app option, as it will happily start up without being able to load anything otherwise. I'm curious what the intended default use case is, of mostly just wanting the server up - oh, and if you manage to load my app that would be ok I guess but don't sweat it! :P I disagree with a few choices in how it load balances, and tuning the backlog has proven futile with my setup. But yeah, it's lightyears ahead of mod_wsgi.
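For anyone else fighting the same thing, the two options in question are just ini lines (whether they behave on FreeBSD apparently varies, per the above):

    [uwsgi]
    ; exit on SIGTERM rather than reloading (the option I saw ignored)
    die-on-term = true
    ; refuse to start if the app fails to load
    need-app = true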
If you're a control freak who bases engineering decisions primarily on flexibility (and you're bound to python/WSGI), why not use Mongrel 2 with a simple python backend instead? You can escape having to shoehorn your code into a gevent/gunicorn/uwsgi/whatever container if doing so causes headaches. Here's a teaching example of how to play with mongrel 2 from python (feel free to change it into something that would work for you): https://github.com/tizr/belasitsa/blob/master/belasitsa.py
We hit a few configuration issues while we learned which options actually performed as described on FreeBSD. uWSGI loads python modules BEFORE forking workers, so if you use multiprocessing or anything else that creates processes at import time, it will happen once in the master, and all post-fork children are left sharing the same pool unless you instantiate it on the first request or take a similar post-fork approach. These are just things to be aware of, not necessarily black marks against its usefulness, assuming you're willing to accept a certain cost of learning the features (there are a lot). I'm still using it, but I'm not really convinced I'm better off than if we had picked gunicorn or similar initially and skipped some headaches.
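One workaround sketch, assuming you run under uWSGI, which ships a postfork hook in its uwsgidecorators module (the lazy-apps option, which loads the app after forking instead of before, is another route):

    # myapp.py -- build per-worker resources after the fork, not at import
    from multiprocessing import Pool
    from uwsgidecorators import postfork

    pool = None  # deliberately not created at import time

    @postfork
    def build_pool():
        # runs once inside each worker after uWSGI forks
        global pool
        pool = Pool(4)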
Both gunicorn and uwsgi are good stuff. Though a new contender is Phusion Passenger (https://www.phusionpassenger.com/), which was originally designed for Ruby but recently gained official support for Python. With the next release (4.0.6), Python support will be raised from beta to production-ready status. Like uwsgi, it performs the bulk of the process management for you. It goes a step further than uwsgi by integrating directly into Nginx, so that the web server is the only thing you have to deal with.
A couple of years ago there was an article comparing the various wsgi servers. The one that seemed most performant and lowest-memory at the time was gevent, so I chose it for the application I was writing, to run on an ec2 micro behind nginx. It was very easy to use too; I just wrote a 10-line python script and that was it.
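The script was along these lines (reconstructed sketch; the app import is hypothetical):

    # tiny gevent WSGI server, parked behind nginx
    from gevent import monkey
    monkey.patch_all()  # make blocking stdlib calls cooperative

    from gevent.pywsgi import WSGIServer
    from myapp import app  # your existing WSGI callable

    WSGIServer(('127.0.0.1', 8000), app).serve_forever()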
I've never seen any articles about it since, so I wonder if others know something I don't. Can anyone compare gevent's webserver vs. uwsgi, gunicorn, etc behind nginx?
gevent is a very simple webserver that won't handle many of the things the "real" servers do.
One of the main things I'm talking about in this post is load balancing, which gevent (and uwsgi, and gunicorn, etc) don't implement very well. Nginx does a pretty good job at this, in addition to many other things (like buffering).
You can definitely get a lot of performance out of a coroutine, async, etc framework, but at the end of the day it's not going to be the entry point to a production application.
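The division of labour looks roughly like this on the nginx side (socket paths invented):

    upstream app_servers {
        # nginx balances across the backends and buffers slow clients
        server unix:/var/run/uwsgi/app1.sock;
        server unix:/var/run/uwsgi/app2.sock;
    }

    server {
        listen 80;
        location / {
            include uwsgi_params;
            uwsgi_pass app_servers;
        }
    }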
I've been running Nginx + uWSGI for a long time but recently switched over to Nginx + Gunicorn for Gevent support, and so far it seems to be working well.
Gevent's monkey patching approach allowed me to convert a traditional blocking web app (built on Flask/SQLAlchemy) into an evented model without having to rewrite the entire app around an event-driven approach (à la Twisted or Tornado). All you have to do to get Gevent working with Gunicorn is pass it a command line argument to use the Gevent worker instead of the default synchronous worker.
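Concretely, it's one flag (myapp:app stands in for your own module and application):

    gunicorn --worker-class gevent --workers 4 myapp:app

The gevent worker takes care of the monkey patching as it boots, so in my case the app code itself didn't need to change.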
Gunicorn + Gevent also makes it quite easy to integrate WebSockets into your existing web app, which is something uWSGI doesn't handle very well yet.
All this is to say that though uWSGI is a very solid piece of software, Gunicorn is definitely a reasonable alternative to uWSGI depending on your use case.
gunicorn + meinheld is even better than gevent in my experience. meinheld is written in C and offers a monkey patch option, and it's a little bit faster than gevent. I'm using it with flask + sqlalchemy + redis with great success.
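It's the same one-flag switch, assuming meinheld's gunicorn worker class (meinheld.gmeinheld.MeinheldWorker, if I remember the path right):

    gunicorn --worker-class meinheld.gmeinheld.MeinheldWorker myapp:app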
I don't get why this is getting so many points. It consists of a single graph and a bunch of hand-waving. Python doesn't have to be like the ruby community; it doesn't have to have a one-size-fits-all solution.
We're happily using tornado in production. Between the heroku router and tornado's built-in support for multi-processing and asynchronous processing, I don't have to worry about nginx + uwsgi + whatever framework. I just run tornado. And we're immune from the heroku random routing debacle [1], something that neither rails nor uwsgi can say.
Tornado certainly does. We convert our most I/O-intensive bits to asynchronous code, and the server is no longer blocked. Random routing doesn't affect us because there are no long pauses in, e.g., making a third-party API request.
I'm not using uwsgi as a router, and the speed of the requests isn't the issue.
If for example, you randomly routed to a bunch of tornado servers that were overloaded/slow serving requests, you'd hit the same issue as that post.
Async/coroutines don't solve the problem; better routing does.
In this case, we use HAProxy. Ideally you would say "I know node X can handle 100 concurrent requests", and within HAProxy you could weight the node or set the maximum concurrent requests to that node, and it would distribute based on that.
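In haproxy.cfg terms that's roughly (addresses and limits invented for the example):

    backend app_nodes
        balance leastconn
        # cap each node at what we know it can handle
        server node1 10.0.0.1:8000 maxconn 100 weight 10
        server node2 10.0.0.2:8000 maxconn 50 weight 5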
The point is that the probability of the servers getting overloaded is greatly diminished, because they're not blocked for extended periods of time. An external API request can take several seconds. In that time, a framework that isn't async-capable is going to be blocked and unable to work on more requests in the dyno queue. Tornado (and node, etc.) can continue happily along.
On top of that, tornado has multi-processing support built-in, which adds intelligent routing at the dyno-level.
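That built-in multi-processing is about four lines (make_app is a stand-in for your own tornado.web.Application factory):

    import tornado.httpserver
    import tornado.ioloop

    from myapp import make_app  # hypothetical Application factory

    server = tornado.httpserver.HTTPServer(make_app())
    server.bind(8888)
    server.start(0)  # fork one worker process per CPU core
    tornado.ioloop.IOLoop.instance().start()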
HAProxy is another solution, although it can't be set up on heroku, so that's irrelevant. Also, it doesn't change the fact that your workers are blocked while executing long-running API requests. uwsgi won't save you either.
The linked doc literally says as much, toward the bottom: "So the only solution is for Heroku to return to routing requests intelligently. They claim that this is hard for them to scale, and that it complicates things for more “modern” concurrent apps like those built with Node.js and Tornado. But Rails is and always has been Heroku’s bread and butter, and Rails isn’t multi-threaded.
In fact a routing layer designed for non-blocking, evented, realtime app servers like Node and its ilk — a routing layer that assumes every dyno in a pool is as capable of serving a request as any other — is about as bad as it gets for Rails, where almost the opposite is true: the available dynos are perfectly snappy and the others, until they become available, are useless. The unfortunate conclusion being that Heroku is not appropriate for any Rails app that’s more than a toy."
Uhm...am I misunderstanding something? It sounds to me like the author is saying we should be spending time tuning UWSGI's plethora of configuration options rather than using gunicorn. I much prefer components which "just work" for a healthy range of typical use cases.
Don't get me wrong, I think it's cool he made it work.. but seriously?
There was absolutely no tuning other than adjusting buffer sizes (which only matters because it accepts large POST packets).
The options that a standard application will use are almost identical between the two. UWSGI just provides numerous lower-level options that I (and it sounds like you) probably don't care about.
If you want performance, tune. If you don't, leave the defaults. How can more options be bad when the default is as good or better than standard Apache + WSGI?
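For reference, the buffer tweak mentioned above is itself a single ini line (the value here is arbitrary; the default is 4096):

    [uwsgi]
    ; enlarge the internal uwsgi packet buffer
    buffer-size = 65535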
uwsgi is great, but only for users that want to do a PhD in performance tuning:
'There is no magic rule for setting the number of processes or threads. It is application and system dependent. Do not think using simple math like 2*cpucores will be enough. You need to experiment with various setups and constantly monitor your app. uwsgitop could be a great tool to find the best value.' (from the uwsgi docs)
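The uwsgitop it mentions reads from uWSGI's stats socket, which you enable with one more option (address and port arbitrary):

    [uwsgi]
    ; expose live per-worker stats
    stats = 127.0.0.1:9191

Then `uwsgitop 127.0.0.1:9191` gives you a top-style view of worker load while you experiment.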
Should I be using Nginx + UWSGI with Go apps? I know someone wrote a seemingly comprehensive uwsgi package for Go, but IIRC it kinda got shredded a bit in the mailing list.
uWSGI != WSGI. The name is misleading. To quote the uWSGI docs: "The “WSGI” part in the name is a tribute to the namesake Python standard, as it has been the first developed plugin for the project."
So, as far as I can tell, you have failed to demonstrate that Twisted Web with PyPy is not a viable alternative, which is disappointing.
Also, uWSGI's core team doesn't have any faith in their own code; they use Apache instead of dogfooding. (And they're moving all hosting and development to sites like Github.) Twisted, at least, is confident enough in the quality of their code to self-host.
We use Apache as a frontend because we are a hosting company and customers want .htaccess (sad but true). But behind the Apache frontend (read: frontend) there is ALWAYS uWSGI (even for php). Unbit.it uses uWSGI for everything (even for load balancing MX servers ;) and I am proud we are the first (only...) company giving php hosting on uWSGI by default. Finally, we moved to github because customers/contributors wanted it; I was perfectly fine with self-hosted mercurial, but customers are more important than my taste :)
PyPy uses too much memory and isn't that much faster for many things.
(Sorry PyPy guys!)
I don't have numbers/charts to pull up for this right now, but I was testing it out to see if I could push more cpu on a server, and the memory cost was insane for the minimal test case.
Twisted also is not a web server; it's a framework. Same with Tornado, and honestly the same with gevent etc.
How much memory usage is "insane"? Are you sure the memory cost grows with the size of the application? It sounds like you tested a minimal test case, saw high memory usage, then didn't bother to test a large application. Maybe the memory usage is constant.
I have absolutely no idea how HN replies work, but this is in response to the PyPy memory usage.
I'm optimizing for my application. If you want to use PyPy, use it. I couldn't care less.
If I was already having a hard time maxing CPU due to memory constraints, how does switching to something that uses a very large amount of memory (relative to the CPU gains) make any sense at all? Do you really think "more complex code" is going to use less memory?
If you want benchmarks, go find some, or go create some. It has absolutely nothing to do with the gains I'm pointing out by simple webserver choices and minor tuning.