Rachel presumably wrote her server in a reasonable language like C++ (though I don't see a link to her source), but when I wrote httpdito⁰ ¹ ² I wrote it in assembly, and it can handle 2048 concurrent connections on similarly outdated hardware despite spawning an OS process per connection, more than one concurrent connection per byte of executable†. (It could handle more, but I had to set a limit somewhere.) It just serves files from the filesystem. It of course doesn't use epoll, but maybe it should — instead of Rachel's 50k requests per second, it can only handle about 20k or 30k on my old laptop. IIRC I wrote it in one night.
It might sound like I'm trying to steal her thunder, but mostly what I'm trying to say is she is right. Listen to her. Here is further evidence that she is right.
As I wrote in https://gitlab.com/kragen/derctuo/blob/master/vector-vm.md, single-threaded nonvectorized C wastes on the order of 97% of your computer's computational power, and typical interpreted languages like Python waste about 99.9% of it. There's a huge amount of potential that's going untapped.
I feel like with modern technologies like LuaJIT, LevelDB, ØMQ, FlatBuffers, ISPC, seL4, and of course modern Linux, we ought to be able to do a lot of things that we couldn't even imagine doing in 2005, because they would have been far too inefficient. But our imaginations are still too limited, and industry is not doing a very good job of imagining things.
† It's actually bloated up to 2060 bytes now because I added PDF and CSS content-types to it, but you can git clone the .git subdirectory and check out the older versions that were under 2000 bytes.
> I feel like with modern technologies like ... we ought to be able to do a lot of things that we couldn't even imagine doing in 2005...
As a self-taught programmer I would say that what all these less efficient but easier to learn technologies have done is enable people like me, who evidently are not geniuses like yourself, to write software. Should programming always be an ivory tower thing?
It does take geniuses to design and operate the incredibly complex cloud-native distributed systems we all insist on building now.
It’s fair to point out how far you can get just programming one computer using traditional and well understood concepts like sockets and threads. And how weird it is that we live in a world where Kubernetes is mainstream and fun but threads are esoteric.
I'm not a genius. I started programming in BASIC. I haven't had a lot of schooling: I haven't had a computer science class since I was 12, and I didn't finish high school. I just didn't stop learning.
Adding unnecessary complexity doesn't always make things easier to learn.
Very well put. It's also a matter of use cases. These people seemingly implement servers for big companies. I just make shitty websites. We aren't their target audience yet it comes off as 'this is what everyone should be doing'.
I think part of the point of this is that this approach isn't particularly complicated (writing in assembly is unnecessary, but everything else described in both the article and the comment above is basically the simplest way to make a webserver).
> single-threaded nonvectorized C wastes on the order of 97% of your computer's computational power
Can you elaborate on what this means exactly? For example, is there some reasonable C code that runs 33 times slower than some other ideal code? In what sense are we wasting 97% of our computer's computational power?
Roughly that 3000× is 18× from multithreading, 3× from SIMD instructions, 15× from tuning access patterns for locality of reference, and 3× for turning on compiler optimization options. This is a really great slide deck!
I was assuming "single-threaded nonvectorized C" already had compiler optimization turned on and locality of reference taken into account. As the slide deck notes, you can get some vectorization out of your compiler — but usually it requires thinking like a FORTRAN programmer.
So I think in this case reasonable C code runs about 54× slower than Leiserson's final code. However, you could probably get a bigger speedup in this particular case with GPGPU. Other cases may be more difficult to get a GPU speedup, but get a bigger SIMD speedup. So I think my 97% is generally in the ballpark.
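Spelled out, since there are a few different numbers floating around here (my own back-of-the-envelope arithmetic, not taken from the slide deck):

```
# Rough sanity check of the factors above.
threads, simd, locality, compiler = 18, 3, 15, 3
print(threads * simd * locality * compiler)  # 2430, i.e. "on the order of 3000x"
print(threads * simd)                        # 54x: reasonable C (already tuned for locality,
                                             # compiled with -O) vs. the fully optimized code
print(1 - 1 / (8 * 4))                       # 0.969: ~97% wasted with 8 cores x 4 SIMD lanes idle
```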
A big problem is that we can't apply this level of human effort to optimizing every subroutine. We need better languages.
This is great! Which of these do you think could be extended to general-purpose programming without the HPC expert? Taichi and DAPP seem to be aimed at that goal, but you seem to be implying they don't reach it yet?
You can use them without the HPC expert; Halide, for example, has a good autotuner and has been used by Google and Adobe to create image filters for mobile devices.
I'm still kind of a newb myself but from what I understand these are special CPU instructions that allow you to execute the same instruction in parallel against multiple data points. This allows you to eke out a lot more performance. It's how simdjson[1] is able to outperform all other C++ json parsers.
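You can see the same idea at the Python level with numpy, which dispatches to C loops that compilers can map onto SIMD instructions. This is just an illustration of "same instruction, many data points", not how simdjson itself works:

```
import numpy as np

a = np.arange(1_000_000, dtype=np.float32)
b = np.arange(1_000_000, dtype=np.float32)

# One element at a time: every add pays interpreter overhead.
slow = [x + y for x, y in zip(a, b)]

# One call into numpy's C loop, where the same add is applied across
# many elements at once (and can compile down to SIMD instructions).
fast = a + b
```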
8 cores times 4 SIMD lanes is a 32× speedup; that's where "97%" comes from, as explained in the note I linked to.
It's pretty variable: some things we haven't figured out how to speed up with SIMD, sometimes we have a GPU, sometimes we can get 8 or 16 SIMD lanes out of SSE3 or AVX128 or 32 of them out of AVX256, sometimes you only have four cores, sometimes make -j is enough parallelism to win you back the 8× factor from the cores (though not SIMD and GPGPU). But I think 97% is a good ballpark estimate in general.
Just to clarify, I was only estimating a speedup of 4× from vectorization, while the other 8× comes from multithreading.
Fifteen years ago we thought regular expression matching and compilers were unlikely to benefit from vectorization, but now we have Hyperscan and Co-dfns, so they did. So I think it's likely that we will figure out ways to do a wider variety of computations in vectorized ways, now that the rewards are so great.
As an example, if you are checking a boolean flag (1 bit) on an object, and it ends up being a cache miss (and x86_64 cache line size is 64 bytes), then your computer just went through all the expense of pulling in 512 bits from RAM yet it only used 1 of them. You are achieving 0.2% of the machine's possible throughput.
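Here's that arithmetic spelled out, plus one hypothetical mitigation (packing the flags densely so a single cache line carries 512 of them; this is my own sketch, not something from the comment above):

```
import numpy as np

# 1 useful bit out of a 64-byte cache line.
print(1 / (64 * 8))            # 0.00195..., i.e. ~0.2% of the fetched bits

# Hypothetical mitigation: store the flags densely instead of one per object.
flags = np.zeros(1_000_000, dtype=bool)   # 1 byte per flag
packed = np.packbits(flags)               # 1 bit per flag; a cache line now holds 512 flags
```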
I imagine if you could make the most of the vector instruction set in your code (where it can operate on a vector of data at once instead of one element at a time), you'd get a huge performance boost for "free". GP seems to be working on a vm that lets you do that (a lot of it was flying over my head though, need some coffee).
Despite all my rants here about C, on my travel netbook I switched to XFCE as I could not stand the performance impact of all JavaScript and Python based extensions on GNOME.
I would expect so, but did you mean 500k or something? "50k rps on a beefy machine" sounds like about the same as, or maybe even a bit slower than, 20k–30k on this 2011 laptop, which was how fast httpdito was last time I measured it.
Went back and looked at it. A webserver written in asm for the lols is okay, but my point was that you probably want a proven, battle-ready web server (along the lines of nginx or apache) if running something in production. So, 50k rps on a vanilla, well-used/well-maintained server > 50k rps on an experiment (and don't get this wrong, it's pretty impressive for what it is).
Yeah, I definitely wouldn't advise anyone to run httpdito in production. It's so lacking in observability that it doesn't even log hits, it doesn't do timeouts, and its MIME types are configured by writing in assembly language. But it shows that some surprising things are possible. And it can be handy to serve up some static pages from your laptop to your phone or whatever.
Whether intended or not, there's an undercurrent of "you're all so dumb for using Python" (or Ruby, or PHP, or other similarly performant language) here. I want to surface that and question it a bit.
It's totally reasonable for a company to choose the Python/Gunicorn option if they already have a bunch of people who know Python and they don't need to serve tons of requests per second.
Even if they do need to serve tons of requests per second, it's totally reasonable for them to still choose Python/Gunicorn if the cost of the additional servers is less than the cost of having to support multiple languages. Or if they get a lot of value from libraries that are unique to the Python ecosystem. Or if they care more about quickly iterating on features than driving down server costs.
I agree that there's a point where it stops making sense, and there are plenty of engineers who don't recognize when they're past that point because they keep doubling down on sunk costs and things they're familiar with. But let's not be too quick to assume people are in that camp when we don't know all the tradeoffs they're facing.
I don't really get what the point of this post is. Is it really a dig at python? Python can handle thousands of connections in a single thread no problem with basic enough stuff.
Is what the author did supposed to be impressive? Is it supposed to make python look bad?
I don't get it. Seems like run of the mill stuff. Python might struggle at the same level of concurrency (was it like 15k?) but you can still do 10k connections easy enough iirc.
I made a post a while back about maintaining 64k connections in a Ruby process: https://www.wjwh.eu/posts/2018-10-29-double-hijack.html . It only stopped at 64k because I could not be bothered to rig up multiple IPs, so it eventually ran out of ports.
Just having a lot of connections that do trivial stuff is not very difficult. It becomes way more interesting when all of those connections need to access shared data structures and whatnot.
Which I read as "I have no experience whatsoever with modern python, and async is something that catches water".
Async is typically less prone to error and complexity than threaded code, and also typically faster/lighter for io (in python). I can't see a reason for this not to be the case in other languages too.
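For what it's worth, the basic shape in stdlib asyncio is small. A minimal sketch (the fetch() here is a made-up stand-in that sleeps instead of hitting the network):

```
import asyncio

async def fetch(name: str) -> str:
    # Stand-in for a network call; awaiting asyncio.sleep yields to the
    # event loop the same way awaiting a socket read would.
    await asyncio.sleep(0.1)
    return f"{name}: done"

async def main() -> None:
    # All three "requests" overlap on one thread, so this takes ~0.1s, not ~0.3s.
    results = await asyncio.gather(fetch("a"), fetch("b"), fetch("c"))
    print(results)

asyncio.run(main())
```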
In terms of error-proneness, I would say the hierarchy is this:
Actors < CSP ~ Async << Threads
In terms of "getting started" difficulty, I would put the order like this:
Async < Actors ~ CSP < Threads
In terms of first-order maintainability in large projects I would put the order like this:
Actors >> CSP ~ Threads > Async
Async is on its face easy to grok, and saves you a ton of problems with locking and the like, and you can run it on a single-threaded system with the right type of dispatcher. However, small amounts of complexity rapidly devolve into towers of async calls that cannot be untangled from a spaghetti pile, and there aren't really satisfying ways of figuring out how to unwind error-handling in async, except for the basic case of "I truly don't care if this async fails".
I will say though, it's spectacularly easy to write messy and garbage code in all four of these concurrency models (I certainly have). My preference for actors comes from four opinions:
- some systems (if you want to be flippant, microservices architecture) use actors to encapsulate failure domains, which is really fantastic, and truly the #1 reason to use actors
- 99.8% of the time there's no need to write mutexes, and good actor systems basically won't deadlock unless you really try hard. (also true for async systems btw)
- gives you an organizational framework to write well-designed and well-engineered systems.
- with a small amount of discipline "not having a spaghetti ball" scales with complexity (I find it takes a lot of discipline to not have a spaghetti ball with async in the more complex cases)
The async stack traces I've come across in Python are extremely simple to grok. I've never had something I couldn't figure out. (Using aiohttp, aiopg, asyncpg).
I can't even say the same for synchronous Django. Sometimes it's just the quality of the tool you're using, not the higher-level concept it implements.
You can trivially implement async on top of actors (like Elixir's Task module does), and generally most async in elixir should go through that model. But if you're in python-land you really don't know what you're missing. In elixir, my tests autopartition the state so that each test exists in its own "universe in the multiverse", so I can run concurrent, async integration tests with stubs, database transactions, even http requests that exit the vm (through chromedriver) and come back, finding their own correct partition of the test mockset and database state, and I have a few custom extensions to the multiverse system like process registries and global pubsub channels. We're talking hundreds of highly concurrent integration tests that run to completion in seconds.
It seems like doubling down is the standard thing to do. Here's a video about how Instagram bugs engineers to do fewer string manipulations in Python instead of using a faster language https://youtu.be/hnpzNAPiC0E
Django and Python somehow handled websites for entire newspapers on far wimpier hardware just fine--which makes me wonder if "cloud" (ie. non-deterministic memory and I/O accesses due to sharing with other tenants) isn't the problem rather than Python.
You need a market. You need paying customers. You need cashflow. You need features. You need a business plan.
You don't need scaling. Ever. To first, second and third order approximations.
Your company has a higher probability of bankruptcy than needing scaling.
Newspapers run everything behind a CDN, so Django isn’t a bottleneck. The things they do that can’t just be CDN buffered (like ad targeting and paywalls and comments) end up being supplied by outside vendors. I think Django fits well for a news org, but it’s good to understand why it fits well.
I just got roped into a project with the promise of Django, but the project lead ended up deciding to go with Laravel, so I've been looking into it for a few days.
It's a resounding no from me. If you're stuck with PHP, sure, this beats the WordPress style of...well...I can't really find words to describe how bad it is..., but it's still miles behind anything else. It's full of strings and other things that you just have to memorize, and the IDE integration, even with specialised extensions, is still worse than other languages with standard IDEs.
How much of the old "A fractal of bad design" post would you say still applies? That one post showcased so many footguns right at the language level that it spooked me away from it forever.
A lot of those still apply, but these days you can ignore the bad parts and only use the new good parts, just like how you would use javascript. Still, I only use php occasionally, but I wish it weren't as eager in treating strings as numbers in many cases.
If you have an hour to spare, this (https://www.youtube.com/watch?v=wCZ5TJCBWMg) is a great talk by Rasmus Lerdorf, the creator of PHP, on the design of the language, the reasons it ended up the way it did, and the ways it is evolving for the better.
> Even if they do need to serve tons of requests per second, it's totally reasonable for them to still choose Python/Gunicorn if the cost of the additional servers is less than the cost of having to support multiple languages.
How hard is it to get up to speed on any other tech stack? ASP.NET Core is extremely fast and the learning curve is close to none, for example.
If someone was able to wrap his head around backend development with Python I'm pretty sure they have the mental fortitude to onboard a tech stack that doesn't suffer from major performance problems.
That's because I'm more productive with Python (been using it for ten years to implement many backend services), have never hit any of its performance limitations yet (I mostly use it to develop b2b apps with medium traffic at most, and always leverage a distributed task queue for anything that might excessively block the webserver process), and also didn't want to maintain a fleet of windows servers (at least in the past, when .net was windows only).
Now you have to figure out how to run a CI on a build server. Deployment on your production systems, be it containers or even not. Monitoring, alerting, profiling, tuning under load. You have to support a new database connection library with new shenanigans. In general, you need to integrate the new language into the existing ecosystem. The latter may even be impossible depending on the stack and solution chosen. Just look up those weird Java only caching servers.
All of that is possible, sure. Due to company acquisitions and specialized teams in some areas, we're kinda running the full bingo card of languages.
But there's little denying: We overall spent less time handling language runtimes and language-specific monitoring back when we were java+mysql and that's it.
> Now you have to figure out how to run a CI on a build server. Deployment on your production systems, be it containers or even not. Monitoring, alerting, profiling, tuning under load. You have to support a new database connection library with new shenanigans.
Most if not all of those items are either trivial or non-issues.
In fact, I would argue that deploying a Python app is a far more convoluted process than getting an ASP.NET Core app up and running.
With Docker, the problem simply disappears.
Getting it to build and test on a CICD pipeline is as hard as typing $ dotnet build, or $ dotnet test.
> All of that is possible, sure.
Not only it is possible, it's laughably easy.
We are supposed to avoid our problems, not perpetuate and aggravate them forever because we are too lazy to look for ways to make our lives easier.
Learning curve close to none?
I like that many times there's "the way" of doing things, but when I have to do something slightly outside the recommended way, I feel that the effort required outweighs all the benefits.
“The way” of doing things depends on the language, and for example for me Python is very hard. It’s not my main stack but I use it regularly, and after years I still can’t find "the way" of doing things by myself; I feel like I end up on stackoverflow way too often because of anxiety about not being idiomatic. It’s like there is an enormous amount of meta-knowledge very specific to the language. That’s not something I experience with other stacks.
I'll find out myself. Just got handed a codebase that is a mix of Python/Gunicorn/node when this sprint started off. I've seen the word gunicorn in the repo... but still don't know what it does yet. First time for this old back end Java/C++/XQuery programmer, so how hard can it be?
(On the plus side, seems my Jetbrain kit includes PyCharm, so I've even got an IDE!)
Gunicorn is just a WSGI server, basically used to spawn a pool of webserver processes for your python backend. A python webserver is more optimal when used in multiprocess configuration (as opposed to a multithreaded configuration, which python sorely sucks at), and gunicorn will do that for you automatically, routing each http request to an available worker process in the pool.
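If it helps to see it concretely, here's a minimal sketch (the module name hello.py and the callable name app are my own invention; gunicorn just needs any WSGI callable):

```
# hello.py -- a minimal WSGI app; gunicorn imports this as "hello:app".
def app(environ, start_response):
    body = b"hello\n"
    start_response("200 OK", [("Content-Type", "text/plain"),
                              ("Content-Length", str(len(body)))])
    return [body]
```

Then "gunicorn -w 4 hello:app" starts a master process that forks four workers and spreads requests across them; "-k gevent" swaps in green-thread workers instead, as the sibling comment describes.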
> Gunicorn is just a WSGI server, basically used to spawn a pool of webserver processes for your python backend.
Or a pool of asynchronous "green threads" using any of a number of libraries (gevent is the one I'm most familiar with). The thing to avoid is mixing the two (multiple processes and green threads).
> A python webserver is more optimal when used in multiprocess configuration
For CPU bound worker tasks, yes, this is true. For I/O bound applications, not so much; asynchronous I/O can handle the same I/O load with far fewer resources (particularly as forking Python processes uses a lot more memory, because so much of the memory in the Python interpreter is dynamic, so you don't get much benefit from what in a compiled language would be shared read-only code that doesn't need to be copied for every fork).
> (as opposed to a multithreaded configuration, which python sorely sucks at)
Yes, the limitation of the GIL is one of Python's worst warts. (It looks like there are finally efforts to remove it, but it's taken a long, long time.)
Ah yes, I haven't used gevent for years and now associate gunicorn with wsgi. I have a far easier time with nodejs and golang when I need to do async parts on the backend (usually for websocket stuff), then use some message queue/passing system with celery or zeromq to communicate back and forth with the python backend.
You don't need to master the new stack to get up to speed. For that investment to pay off you only need to be almost as good as you were with your old stack, which isn't hard if you weren't all that proficient to begin with.
This is where my head always goes as someone who does mostly Asp.Net Core. Why is it always between something like C++ or something like Python? Nowadays with middleware and endpoint routing, asp.net core can be almost as simple as flask (even if that’s not idiomatic or what the docs show you).
I’m most comfortable in .net core but I recently learned Django because I kept hearing how productive it was from HN and all the startups around me use it, but I’m starting to feel like I chose to move the wrong way. I will say I like how Postgres is the default for python, and the library situation is much better over there as well.
I've tried working with Django and it's always seemed much too heavyweight to me. Flask is my preferred web framework for Python; much easier for me to use and feels like it's helping me where needed and getting out of my way where needed, not weighing me down.
Yeah, I actually wrote a prototype in Django and flask and just felt like I was reinventing the wheel too much with flask, e.g. marshmallow = serializers from DRF, sqlalchemy = django orm, etc. I wrote password hashing with bcrypt for user logins and then realized I should salt the passwords, and just went back to django since it already has stuff for everything I was doing.
I actually don't mind the heavy framework thing because .net core mvc is the same deal. The developer experience in python was just way worse for me. Autogenerating swagger docs didn't seem possible without manually adding annotations to all my code (for flask at least), the battle-tested python libraries for web dev aren't async, mypy is obviously way worse than an actual type system, model validation (the thing forms/serializers handle) was more tedious, and vscode was worse than visual studio.
Yeah I've heard that before, but from my (admittedly brief) time it seemed like you could just use the utilities you want and ignore the stuff that forces you into a CMS'y box. Feels like if you really wanted to you could just use a form/serializer to validate input data and to return a domain object, pass the domain object to a service layer, and then use the django orm (or honestly anything if you really want to stray from the django way), to handle database interactions. I understand flask might be nicer because you get to feel like you dont have a bunch of wasted code bloat in your deployable, but there are some pluggable systems that come with django that are nice (user system, pluggable authn/authz, caching framework, etc.) I didn't really mind the bloat because a lean deployable isn't that important to me and it all felt sluggish to me compared to .net core anyway.
Yes you can definitely pick and choose the parts of Django you want. Sometimes they have interdependencies but you don't need to use it all. Or for example, use Jinja instead of its templating system
Yeah Django feels terrible to me compared to .net core. I already wrote a similar answer to someone else so I will just paste it here
"The developer experience in python was just way worse for me. Autogenerating swagger docs didn't seem possible without manually adding annotations to all my code (for flask at least), the battle-tested python libraries like django and sqlalchemy aren't built for async, mypy is obviously way worse than an actual type system and having static types really helps with understanding a large new code base, model validation (the thing forms/serializers handle) was more tedious, vscode was worse than visual studio (code navigation was very hard to do in python. I could jump 1 layer into library code, but when I tried to jump deeper vscode couldn't find anything.)"
I've seen people say you should only choose rails or django for a startup, but .net core mvc provides the same batteries included approach and is built with the modern web ecosystem in mind. It also runs on linux and easily integrates with postgres so its just as cheap now as well. I think people that write off C# haven't really worked with it in its present form. I didn't find python significantly more terse than C# (other than having to define properties on types, which I already stated I prefer), just less feature rich.
Or Azure Functions if you want to go the serverless route. Coupled with Visual Studio, you can build high quality, large scale apps at about the same effort of blinking a LED on Arduino.
Are we using the same Azure Functions? I spend more time messing with functions.json and determining binding types and waiting five minutes for error messages to come in than I ever did when I simply provisioned an app service and deployed code.
> I tried to learn F# a few months ago and the experience was horrible.
Switching to a programming language based on an entirely different programming paradigm is not comparable to switching to a language based on the exact same programming paradigm to develop the exact same application using the exact same design patterns.
Of course it’s gonna be hard if you are trying to learn a language plus a framework at the same time, especially given that F# is treated as a second class citizen by Microsoft. An easier path is to break that learning into more manageable tasks: on one hand getting familiar with F# (which is already some work, as it has both a functional side and a CLR one; you mostly need the OOP part for ASP.net), and on the other ASP.net, for which the golden road is C#.
Otherwise the doc is quite good for ASP.net Core, and it’s rare I get stuck on a problem for too long with it.
I can't speak from recent experience, but on point 1) when I was teaching myself to code, I focused on .NET due to its prevalence in my local market. I took part in the .NET user group (called DUG, natch), attended the meetups etc.
And this was at the time when MS would announce a new blessed way to do things on a reasonably frequent basis. When I started, the blessed way to access data was DAOs, then it was ADO.NET (note, those could be around the wrong way, I have trouble figuring it out now in hindsight), then it was Linq2Sql, then that was deprecated for Entity Framework (which, I'll give credit, they seem to have stuck with, even if it does feel like a half-cribbed NHibernate).
I was frantically trying to learn the blessed thing, because the MS shops I was familiar with only used the blessed thing - the server was IIS, the database was SQL Server, the language was C# (I was the only member of the DUG who coded in F#, and few others knew of it), and you used the blessed patterns and the blessed frameworks.
Incidentally, this is why FOSS has had such a hard time in .NET, as soon as MS releases something that is reasonably feature complete, a lot of single-vendor minded companies switch to it.
And I met a lot of developers in their early 40s who were quietly terrified of getting left behind on the MS technology treadmill, and trying just as frantically to learn the new blessed thing as I was.
And then I got hired by a Java shop, and faced a paradigm where shit code cough java.util.Calendar, java.util.Date cough stuck around for yonks because it was good enough and replacing it had to be done very thoughtfully and gently.
Point #2 isn't super relevant in the age of IDEs, but I agree wholeheartedly that F# deserves a lot more love than it gets.
ASP.NET Core 2.1 was released in 2018 and will be supported until late 2021.
ASP.NET Core 3.1 was released a few months ago and there is no end of support in sight. Moreover, the changes between 2.1 and 3.1 were not that many. I've migrated a whole ASP.NET Core 2.1 web service to 3.1 in less than 1 hour.
> 2) C# is a very verbose language, that requires a lot of typing.
Nonsense. The only added verbosity to C# when compared with Python are the type declarations, which arguably are a problem plaguing Python. The first class support for events and async programming and properties in C# more than make up for it.
> 3) F#, the best language in .NET, is largely ignored by the .NET community.
> ASP.NET Core 2.1 was released in 2018 and will be supported until late 2021.
Which is way too unstable, especially for the kind of corporate environment c# has typically been used in. Getting those places to upgrade to stable supported versions of the framework has always been a battle even when backwards compatibility was great, if they have to deal with breaking changes every few years they will never upgrade.
This is why so many companies stick with their ancient COBOL systems, most modern alternatives don't offer the stability they need.
ASP.NET Core 2.1 is the LTS release of ASP.NET Core 2, which was released in 2017. I fail to see how a first class framework with a LTS that was released years ago can be described with a straight face as "way too unstable".
> Getting those places to upgrade to stable supported versions of the framework has always been a battle
ASP.NET Core 2 is stable since at least 2 or 3 years ago, depending on how you decide to count.
> This is why so many companies stick with their ancient COBOL systems, most modern alternatives don't offer the stability they need.
This assertion is simply wrong on so many levels. Don't mistake "why waste money maintaining working software" lines of reasoning for a sign of respect for stability.
More importantly, it's disingenuous to even think of the technical debt that keeps cobol on the map as relevant to the world of web services.
> I fail to see how a first class framework with a LTS that was released years ago can be described with a straight face as "way too unstable".
I fail to see how you can call 2 years of support an LTS with a straight face, it's taking the piss out of the term. The LTS of the OS I'm likely to run it on is supported for 8 years. 2 years isn't even enough time to finish many projects on the same LTS it started on.
At work we've got 30 year old c/c++ code bases that still run, they'll probably run for another 20 at least, we've got 20 year old python code that still runs (for now) and we've got 20 year old c# projects that still run. That last one will never get rewritten in .net core, in part because they've pissed away the stability the framework had. It would be crazy to use tools with 2 years of support for any of those projects.
> Don't mistake "why waste money maintaining working software" lines of reasoning for a sign of respect for stability.
Why should they waste money maintaining working software when there are stable options available? What does upgrading to asp.net core get them? Why should tens of thousands of companies waste money modifying working software just because someone on the core team thought the existing API was inelegant or too hard to maintain compatibility?
The strengths of any programming language can also be their weaknesses, so there isn't really a "best" language.
Notice a language can't be everything. They're either too verbose, too terse/cryptic, or trade ease of development for lack of control/performance/efficiency.
IMO, C# strikes a good middle ground on such matters...
It does not take that much time to get up to speed on another tech stack, but it does take some time, and servers are really cheap. Even if the learning curve is close to none, it's just cheaper to rent a dozen machines than have an engineering team spend a day or two looking into a new ecosystem.
> How hard is it to get up to speed on any other tech stack?
If I find myself debugging python tools, I usually just add debug statements to figure out WTF it is trying to do, and reimplement it in bash. It invariably is less than 10% as many lines of code, and also more debuggable / readable than the original.
Granted, most of the python scripts I see these days are build processes or cluster coordinators.
Given how badly bash goes wrong if spaces in filenames sneak in, or if you want to do error handling, at several places I've had a policy of rewriting scripts from bash into python. It has the advantage that it's usually a better cross-platform solution than running bash on Windows.
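Concretely, the filename problem mostly disappears in Python because arguments never get re-parsed by a shell. A small sketch (the "build logs" directory and the gzip invocation are just made-up examples):

```
import subprocess
from pathlib import Path

# Filenames with spaces (or newlines) are safe here: each path is handed
# to the child process as a single argv entry, no shell quoting involved.
for path in Path("build logs").glob("*.log"):
    subprocess.run(["gzip", "--keep", str(path)], check=True)
```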
> It's totally reasonable for a company to choose the Python/Gunicorn option
Yes, if done properly. The issues with the particular Python/Gunicorn setup the author described in an earlier article (linked to in this one) were not so much Python/Gunicorn issues as "not understanding how to properly use Python/Gunicorn" issues, or more generally "not understanding how the tool you are trying to use actually works" issues. (I actually shudder to think what such a group would have done trying to program the same application in C.)
Missing an order of magnitude? Twitter was claiming to be still running Rails in the mid-2010s, when they apparently had 200+ million active users. The service was finally sunsetted during the big Scala rewrite.
They abandoned their message queue (Starling), written in Ruby, early on. That abandonment was often misattributed to them abandoning Rails, I guess because of the shared usage of Ruby. The confusion was compounded back when someone at Twitter (it may have been Odeo at the time?) posted a message board rant about how Rails does not scale because ActiveRecord did not allow connecting to multiple databases out of the box. That rant seems to be the origin of the "Rails does not scale" mantra that swept the internet for a time. But, humorously, someone replied with a solution a few minutes later.
But how many instances did they have running that app, and at what cost? Did they have to build a ridiculous amount of caching in? And wasn't that the time period where they were incredibly unreliable to the point that their "fail whale" server error page became a running gag?
Twitter's architecture was a bad joke. Many-to-many communication in a scalable manner has been a solved problem for decades: you federate. You assign users to buckets, you assign buckets to servers. You route messages like you'd route e-mail. Been there, done that. The federation does not need to be outwardly visible.
Twitter's problem was a too centralized architecture, not Rails.
And I say that as someone who at the time hated Rails and who still hates Rails. I find it bloated and over-complicated. It may even have led them to make bad architectural choices because of how it was structured.
But they still did make bad architectural choices, and they fixed those choices at the same time they moved off Rails.
Not to mention that python is plenty fast compared to the time it takes to write stuff to the network. Of course heavyweight frameworks like django don't help the equation, but writing fast network code in python isn't exactly hard either.
This is something a lot of people don't get with most higher level languages.
My first commercial use of Ruby was in 2005. Not web facing, but messaging middleware. As in a pub-sub type passing of messages between various endpoints.
We had a C version. It was about 7k lines to support the bare minimum we needed. As an experiment to teach myself Ruby I wrote a Ruby implementation. With the usual caveats (it's often easy to make a rewrite better in all kinds of ways, including size), it was ~700 lines, far easier to read, and supported far more functionality, so I put it in production.
Was it slower? As usual that depends what you mean by "slower". It consumed 10x more CPU, but it also did much more work (e.g. supporting more flexible routing of messages etc.). The throughput, however was the same, and 10x more CPU means that maxing out the network connection took 10% of a single core instead of 1% of a single core.
For some types of tasks CPU is the most important thing, but for a lot of tasks you'll be IO limited. And a lot of tasks that people think are CPU limited are really down to poor IO handling (excessive context switches caused by lots of small reads is a common one).
And other people don't get that it is possible to have high level languages with almost C-like performance.
You don't need to give up on JIT and AOT compilation to use high level languages, and this is where current tooling for Ruby and Python ends up losing.
Fair enough, I've been playing with D a lot lately, which is basically "what if python was a C dialect instead". It's an incredibly simple language to learn but it's no less easy to write in than python. For most things I still reach for python though, probably because I've grown comfortable with duck typing.
Side note when starting with D: make sure to install dub. It's the package manager and basically eliminates makefiles from the compilation process. Just "dub init" and "dub run" and you're off to the races.
> python is plenty fast compared to the time it takes to write stuff to the network
With the proliferation of microservices, I find this increasingly not true. Sure, python definitely is plenty fast when you need to send something to a user many miles away. But with microservices, writing to the network might mean writing to a machine in the same data center, or even the same host in a different container. That's as fast as a few memory copies and a few context switches.
>python is plenty fast compared to the time it takes to write stuff to the network.
Give us some numbers.
Inside the data centre you have 10G, 40G, 100G ethernet connections. I know for a fact that you will struggle to soak a 10G connection using a single thread so I know you can't do this in Python without multiple processes using SO_REUSEPORT.
So use multiple processes with SO_REUSEPORT then. Or find yourself a wsgi server that does, because it's not exactly an unsolved problem.
That said by far most programs don't need to worry about saturating a 10G connection. I'm not writing a file server in python, I'll leave that to nginx or S3. I'm writing business logic in python which tends to be bottlenecked by a database in any case.
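For reference, the socket-level piece is tiny. A minimal sketch (Linux-only, port 8000 and the make_listener name are arbitrary choices of mine):

```
import socket

def make_listener(port: int = 8000) -> socket.socket:
    # Each worker process calls this and gets its own listening socket bound
    # to the same port; the kernel then spreads incoming connections across
    # the workers (requires Linux 3.9+ for SO_REUSEPORT).
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind(("0.0.0.0", port))
    sock.listen(128)
    return sock
```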
Python is great for plumbing together other functionality, which it turns out means most backends you'd be writing anyways. Python is less great at handling a large quantity of data, though most of the time you can get away with handing the data handling to some library (e.g. numpy or libuv or any one of thousands of libraries).
Worst case you can easily plumb in some C-calling-convention code into python. With FFI it's a matter of copying the header definition and you're off to the races. That way you can still write the bulk of the program in python, delegating the bulk data wrangling to C or D or rust or go or whatever you prefer.
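As a sketch of how little plumbing that FFI call takes (assuming a Linux box where the math library is libm.so.6; cffi also has an API mode that compiles a real extension module):

```
from cffi import FFI   # pip install cffi

ffi = FFI()
# Paste the declaration straight out of the C header:
ffi.cdef("double cos(double x);")
libm = ffi.dlopen("libm.so.6")   # ABI mode: no compiler needed at runtime

print(libm.cos(0.0))   # 1.0
```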
The only reason to use Python for anything more than a few hundred lines worth of utility is if you're working on a codebase that's already in Python, and even then it's debatable.
There simply isn't an excuse for using Python for any infrastructure. It does nothing particularly well - or even right - other than very purpose-specific scripting. It can tie things together well enough. And your codebase becomes a liability rather than an asset.
I always say this opinion when it comes to Python discussion in HN and I always get downvoted but hey, "all it takes for evil to triumph...".
So you always get downvoted to hell whenever you post this and you've decided everyone else is the problem? Can you even understand how you sound? I cringed with sympathetic embarrassment just from reading this. Seriously, rethink your life choices man.
This coming from someone who has literally never written a line of python in his life.
"Rethink your life choices" — because they have opinions that are unpopular and they express them anyway? I might not agree with those opinions, but I fail to see how discouraging them from expressing them is in any way good.
People expressing unpopular opinions and defending them with evidence is how we find out when the popular opinion is wrong, and how, when our discourse is functioning properly, we can gradually change the popular opinion to be less wrong. You, and the people downvoting the comment, are throwing a monkey wrench in those works.
I love Python and use it most of the time when I need to get things done quickly, but its runtime efficiency cost is becoming increasingly unappealing for three reasons: the end of Moore's Law, the concurrent rise of manycore and SIMD, and the steep rise in Python's footgun count. And now there are better alternatives.
— ⁂ —
In 2000 or 2005 Python was a simple, consistent, practical language with a policy of strict error handling that was very useful for producing reliable code: "Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it." Its runtime cost was significant but bearable, about a factor of 20–40: if you wrote your code in C instead of Python, it would run 20 to 40 times faster, and that was all the machine could do.
In 2020 Python is an overcomplicated, inconsistent, slow, unreliable language with a persistent schism resulting from the core developers' poor choice to make backwards-incompatible changes to simplify the language. It has metaclasses, superclass method resolution order linearization to enable mixins (two different ones, in Python 2), a lazy sequence construct that has been gradually Frankensteined into a general coroutine construct (with an additional lazy sequence construct added on top), two different incompatible language constructs to compensate for the lack of block arguments or full-fledged lambdas (decorators and context managers — I'm excluding generators here since they're more powerful than block arguments), and on and on. The reference documentation for "import" alone is 20 pages, and that's in Python 3, the simplified version of Python.
Python's performance cost has not increased in absolute terms — in fact, it's even improved a bit — but it's increasingly painful. In 2000 we could rest easy knowing that whatever we wrote in Python would be sped up by Moore's Law and Dennard scaling, roughly a doubling in speed every 18 months, so in three years it would be four times as fast, and in three more years it would be 16 times as fast. That, together with a little judicious implementation of inner loops in C, was a small price to pay for getting things done sooner and not having to open core files in a debugger.
But then Dennard scaling slammed into a wall around 2006 and Moore's Law sank into a swamp around 2016. Meanwhile, manycore meant that without multithreading, or at least multiprocessing, your program suffered an additional order of magnitude slowdown. Even US$40 hand computers now feature quad-core CPUs. Today, the gap between what the machine can do in absolute terms and what it can do when saddled with Python is a gap of 1000 or 10,000, not 20. If you can cope with the limitations of PyPy (it supports Numpy now! Since 2017) then you can get up to the speed of single-threaded C, which is about 3% of what your computer is capable of. But it's not going to get faster just because hardware progressed: computers will maybe be twice as fast in five years, at best, and maybe not. If it's too slow today, it'll probably be too slow then too.
But that's not the worst part. Python's completely botched Unicode handling introduces bugs into most Python programs that handle strings from the outside world, latent bugs that only surface once those strings contain non-ASCII characters — similar to the situation with bash scripts and filenames containing spaces, although that can be detected by purely local analysis (missing doublequotes around a $var, red alert!). Plan 9 had already demonstrated one correct way to handle the situation (the one used in Golang and Rust) and Markus Kuhn's UTF-8B proposed another, one which was eventually partially implemented in Python as PEP 383 ("surrogateescape") but turned off by default. I've had bugs in on-orbit satellite control software that I couldn't track down because Python generated a UnicodeDecodeError when it tried to log the stack trace.
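For anyone who hasn't run into it, the PEP 383 surrogateescape trick looks like this (a small illustration; the bytes are made up):

```
raw = b"caf\xe9 log entry"   # Latin-1 bytes, not valid UTF-8

# Strict decoding (the default) raises UnicodeDecodeError:
#   raw.decode("utf-8")

# PEP 383: smuggle the bad byte through as a lone surrogate instead of dying...
text = raw.decode("utf-8", errors="surrogateescape")   # 'caf\udce9 log entry'

# ...and get the exact original bytes back on the way out.
assert text.encode("utf-8", errors="surrogateescape") == raw
```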
— ⁂ —
At the same time, other alternatives got a lot better. Java grew into a mildly reasonable language, and Kotlin and Clojure are outstanding ones. Haskell, defying everyone's expectations, became practical. Microsoft started trying to embrace and extend free software, so now we have F# on Mono, which is almost OCaml — almost as convenient and concise as Python, but enormously less bug-prone. Mike Pall, a superhuman intelligence from the future, wrote LuaJIT, which gives you performance on par with C in a language as friendly as Python — not modern overcomplicated Python, old Computer Programming For Everybody Python. 100 million people, including little kids, program computer games in Roblox using Lua. (It's bug-prone as hell, though. Lua has a footgun for each toe, as Sean Palmer says.)
Even C++ has been tamed somewhat. And of course we have Rust and Golang. Golang is only a little bit uglier to program in than Python, and both of these new systems-programming languages make it a lot easier to take advantage of manycore, though not SIMD.
Switching from Python to Golang is as easy as switching from Perl to Python, but with a lot more benefits. And that's a big reason why a lot of the important infrastructure software written over the last decade has been written in Golang.
On the horizon, we have things like arcfide's Co-dfns APL compiler, Matt Pharr's ISPC, and GLSL to show us how massively parallel programming, including SIMD, can become accessible to mere mortals. They aren't yet practical options as alternatives to Python (except that GLSL is practical for its original purpose, of making nice graphics), but they might be pointing the way to something that is.
— ⁂ —
So I think it's eminently defensible that Python should now be consigned to "a few hundred lines worth of utility". Python is great for scripting TensorFlow, and it's a far superior substitute for MATLAB. But writing infrastructure in Python in 2020 is like writing infrastructure in Perl in 2005.
...which I was also doing. I think I probably owe an apology to a lot of folks at Aruba Networks who are maintaining that code today.
I primarily do all sorts of systems programming, but the few web backends I did were C# on Azure Functions and it was spectacular. I’m a very big proponent of Microsoft’s tooling and language development.
The difference in productivity between C# and Python is such that they might as well have been developed by different civilizations.
(I have occasionally used Python since about 2003 and C# since 2018)
> The only reason to use Python for anything more than a few hundred lines worth of utility is if you're working on a codebase that's already in Python, and even then it's debatable.
One important reason for using Python is that it almost always forces people and companies to release the source code.
I will take a shitty Python script over shitty C/Rust/Go/Java code any day for that reason ALONE.
With source code, I can fix your shitty program (and all programs are shitty--even mine). If it's compiled, that path is blocked.
I think going back to basics would be a really good idea for a lot of people in software. It seems like modern developers are more disconnected than ever from the reality of the hardware situation sitting right next to them. I believe there was a post on the front page today detailing a certain 22ms Hello World execution...
> First of all, it does not take "that much machine" to serve a fair number of clients. Ever since I wrote about the whole Python/Gunicorn/Gevent mess a couple of months back, people have been asking me "if not that, then what". It got me thinking about alternatives, and finally I just started writing code.
Another day, another questionable premise for a blog post. No, @rachelbythebay, the question is not "if not that, then what", and the answer is not reinventing the wheel. The question is just "what ___" -- what do you plan on doing, what does your software need to do, what does it need to support? Use the right tool for the right job. If you have a language you're proficient in and with an ecosystem that supports you developing something rapidly, it's borderline malpractice not to start there. When you need to optimize, optimize then. Maybe that means you carve out a subcomponent into a new service, and you choose a language purpose built for speedily doing what you need. Maybe it means a lot of things, but it doesn't mean you throwing out the baby with the bathwater and setting out to recreate the baby, the bathtub, and the bathwater from scratch to answer the question of why your tub is overflowing.
I wish this blog post was about solving real engineering problems instead of writing code to provide mediocre answers to poor questions.
> If you have a language you're proficient in and with an ecosystem that supports you developing something rapidly, it's borderline malpractice not to start there.
I think the point of this post is that most new programmers do not know things can be faster than their monstrous JS blob. The solution to slow requests is more servers instead of fixing the code.
> I think the point of this post is that most new programmers do not know things can be faster
Related to this: programmers who know an ORM or two but never learn SQL proper.
At my job I refactored a giant, slow, memory hungry reporting task into a single sql query. It used to take 10s of minutes to collate 1000s of datapoints. Now it takes 10s of milliseconds to collate a factor of 10 more data, never mind after I added some indexes to speed the thing up.
Knowing about the layer below the abstraction you're working at can be rather useful at times.
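A toy version of that kind of collapse, using sqlite3 from the stdlib (the table and column names are invented for the example):

```
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 10.0), ("north", 5.0), ("south", 7.5)])

# ORM-ish approach: pull every row into Python and aggregate in a loop.
totals = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    totals[region] = totals.get(region, 0.0) + amount

# The refactor: let the database do the collation in one query
# (and an index on region makes it faster still).
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"))
print(totals)   # {'north': 15.0, 'south': 7.5}
```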
What's telling here is that you refactored the individual task into a single query, and nothing outside of that. That's fantastic! You scoped your work into a small unit and derived a significant impact. You maximized your impact to effort ratio.
Notably, you did NOT decide to write your own dialect of SQL to "scratch an itch". Knowing the layer below the abstraction you're working at can be useful at times, but only in relation to understanding the context of how all the layers fit together and making effective, pragmatic decisions. Pointless rewrites are anything but. Good on you for avoiding that impulse and doing the right thing.
It's fascinating that you see the misguided impulse being prematurely adding more servers rather than fixing the code. I would agree with that, but I would also see the premise of this blog post as a similar misguided impulse -- rather than fixing the code, writing a completely new part of infrastructure from scratch in a new language because the old one wasn't good enough. Unless you've exhausted reasonable attempts to optimize existing code inside the language the original code was written in, I believe that following this impulse would be (as I alluded to in my original comment) borderline malpractice.
It’s not aimed at introductory programmers. It’s aimed at more experienced programmers to suggest to them that they can show new programmers what’s possible.
People don't need to explicitly audience-tag every single thing they write. I don't know if "entitled" is the right word, but it's really an annoying demand. Most times, if you can understand something, or are willing to Google the jargon, then you're part of the audience, and that's plenty of precision.
I can Google medical terms until I pass out, but that doesn't make me the intended audience for a medical research paper.
Besides, I'm not suggesting (or "demanding") that all articles explicitly state their target audience. Just that this one does a poor job of indicating it.
So I think it's a poor argument when the parent comment (to which I was responding) assumes an exclusive target audience: "not introductory programmers... more experienced programmers".
You could argue that one was supposed to assume that from context. However, I didn't make that assumption whilst reading, so I especially wouldn't expect "introductory programmers" to pick up on that either.
You can probably understand most medical research papers after reading a few dozen medical Wikipedia articles and taking an introductory stats class. Math and physics research is a little more difficult.
What part of "I think some of us who have been doing this a while have been doing a terrible job of showing what's possible to those folks who are somewhat newer." didn't you understand?
Compare this to the Enterprise platform I am dealing with on a current project, which has 4 x ec2 nodes with 8 CPUs and 32 GB RAM.
I can DoS it with a single java client running 50 threads. If I use 100 the p95 shoots up to 30 - 40 seconds.
But the kicker is that no one (other than me) really cares. 50 concurrent threads is probably around the peak load it will get in prod, and various people involved think why bother trying to fix it?
Oh man, don’t remind me. We have a bunch of GraphQL proxies in ECS that somehow cannot handle more than 5 connections each, so naturally the solution is to just spin up 19 more of them to get to 100 concurrent connections...
Honestly, I think that's what makes working at a large "web-scale" company so attractive to me. When you're running at that sort of scale you can't afford to be as apathetic about performance so there's a lot more engineering effort put into efficiency, because it makes financial sense to do so. OTOH, in a lot of enterprise type companies, you can be nothing more than a "feature monkey".
The only endpoint to ever do anywhere near as little work as this example would be the heartbeat. Most of the cost comes from a combination of a bunch of things other than simply fetching some data from a local device:
- Web-anything is fetched off of a DB nowadays. That's another crapton of latency, because a) it's just easier to understand its characteristics if it runs on a separate system and b) most companies IME have either no DBAs at all or the DBAs have no time to look in depth at every system being built. So the DB resources are vastly underutilized, and the DB itself is badly understood.
- Cache invalidation is still the hardest problem, and every caching framework I've looked at seems to gloss over that part. Just cache all the things and hope people retry enough times to get the latest update. I would love someday to work on a system where things are aggressively cached at every level and invalidated at every level and with perfect granularity.
- Building for web scale from the beginning is premature optimization for the vast majority of companies. In the vanishingly unlikely scenario that the company actually grows 100-fold or more it makes sense to start investing heavily in performance. Of course, this also has the knock-on effect that the vast majority of software developers never get anywhere near a web scale system. OTOH it creates jobs for millions of developers, some of whom might end up building at scale someday.
Another elephant in the room is that building anything at web scale is just not something anybody straight into the workforce is anywhere near qualified for. We desperately need more focused learning (mentoring, pairing, etc.) across the board to bring everybody up to speed faster.
The typical response to these types of posts is "oh, your /toy/ server doesn't account for x, y, z in my use case, like ddos, network issues, etc." But how many people actually handle those cases in their production application? I can say that for the majority of the applications I've written at large companies handling significant traffic, API compatibility was far higher on the priority list than the cases people often bring up.
IMO she's right. I wish we hadn't messed with wrapping gunicorn and gevent around some of our services. It certainly would've made my life easier and the services faster.
What WSGI server do people recommend for python? I've been using gunicorn but this made me think of alternatives. A quick google search found this benchmark [0]; is bjoern really that much quicker? It seems all the other WSGI servers are roughly equivalent.
I've been using gunicorn quite happily without the gevent stuff that Rachel ran into issues with. Running 2-3 workers per core is enough for my application to make full use of the CPU. It is a "waste" of memory, but memory is so cheap and plentiful I've never had an issue with it.
bjoern is fast because it has a minimal feature set. No threads, no multiprocessing, no nothing. If it works for you, great, but it's never satisfied my requirements.
I'd avoid uWSGI: its performance is good, but it's so complicated, with so many features, that I never felt confident using it.
Never used waitress except in development, but people seem to have had success in production.
Do you have a performance issue that you have tracked back to gunicorn? If not, then continue using gunicorn. There are almost certainly more valuable things to spend your time on than replacing something that is already working.
I don't know about mod_wsgi; although there seems to be a ton of documentation, it's way harder to install than a virtual environment, Python, and Django. I love doing backend stuff in Python and Jinja, but I fucking hate setting up WSGI on Apache. It really makes me love PHP again.
Is there any reason you need mod_wsgi and Apache? gunicorn behind nginx is an ideal setup for me (nginx handles static files and reverse proxying while gunicorn handles the Python backend). I wrote a Dockerfile that combines all of those in a single package; I've been using it with very few tweaks for many of my smaller projects over the years, and it makes deployment very easy.
I've been using uWSGI behind NGINX (using uwsgi_pass) for years. It might not be the absolute fastest option around but it's faster than a lot of the other options, rock solid and extremely configurable.
Some fun extra features that uWSGI provides:
- Daemon management: It can manage other daemons, for example Celery, for you, so you can put your whole environment into a single uWSGI.conf.
- A cron-like interface for generating events on a schedule.
- Emperor - hosting of multiple apps. This is extremely configurable: you can have the server look at the filesystem for config files for the simplest setup, but it also supports AMQP, a Postgres database, MongoDB, a shell command, or a bunch of other things. It has a bunch of interesting isolation options, like running each app in its own Linux namespace. It also has (optional) socket activation, so apps won't be started until the first request.
- Auto-scaling
- Soon, multiple event-loop/IO subsystems to choose from, including asyncio
I've been meaning to have a look at nginx-unit though.
The thing to note is that waitress is not built to be the fastest server around, or anything along those lines.
Its primary use case is that it is pure Python, doesn't rely on any specific libraries or compilers to run/build, and is a threaded WSGI implementation, so it uses Python threads to run a WSGI app.
It works well for what it needs to do, and hopefully it is fairly robust. I've personally run waitress directly facing the internet, but will readily admit that in most cases running it behind a load balancer is a good idea, especially since it doesn't support SSL out of the box (yet, I should say; it's on my roadmap).
It won't win any speed contests, but it holds its own.
I use uWSGI at work; we've found a lot of bugs in the more interesting features of the uwsgi module, and we've had to fix some signal and memory-leak issues (I think we're still trying to get them upstreamed). That said, it's reasonably fast, and the WSGI interface in general is pretty pleasant to work with.
Always uWSGI. It's the hidden gem of web serving. It's even useful for things other than Python, and it has the operational features you would expect (such as graceful restarts). The more exotic features are a bit less tested, however, and it shows.
The general attitude here reminds me a bit of the following post from the architect of the Varnish proxy. I think the attitude comes down to the fact that modern kernels, and in general the foundations of network programming, are pretty strong. We should trust them more.
I would argue you can say the same about the foundations of RDBMSes as well. People build similarly elaborate caches around those (for example, Rails has a "Russian doll" cache layer built in), not realizing how much time has gone into developing the well-tuned caches within the database system itself, which simply needs to be allocated sufficiently large RAM.
Honest question: why go through the hassle of multiplexing all the waiting in a single thread, only to dispatch to a thread per client anyway? Simply using blocking I/O for the clients in those threads should be much simpler, right?
If you get stuck in read(), you can't do neat things like waking up when it's time to kick a client for being idle, doing other housekeeping, or cleanly shutting down the whole thing in a timely fashion. When I ^C the server, it sends the same wake condvar-poke but it twiddles the flags so the worker shuts down instead.
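Roughly, the worker side of that looks something like the sketch below. This is illustrative only, not the real code; the struct and flag names are invented here.

    #include <pthread.h>
    #include <stdbool.h>

    /* One of these per client connection; the listener thread pokes it. */
    struct client {
        pthread_mutex_t lock;
        pthread_cond_t  wake;
        bool readable;   /* listener saw the fd become readable      */
        bool kick;       /* idle too long: worker should drop client */
        bool shutdown;   /* ^C: worker should exit                   */
        int  fd;
    };

    /* Worker blocks here instead of in read(). */
    static void wait_for_work(struct client *c) {
        pthread_mutex_lock(&c->lock);
        while (!c->readable && !c->kick && !c->shutdown)
            pthread_cond_wait(&c->wake, &c->lock);
        pthread_mutex_unlock(&c->lock);
        /* caller checks the flags: read(), close(), or clean exit */
    }

    /* Listener does the same poke for every case, just a different flag. */
    static void poke(struct client *c, bool *flag) {
        pthread_mutex_lock(&c->lock);
        *flag = true;
        pthread_cond_signal(&c->wake);
        pthread_mutex_unlock(&c->lock);
    }

So kicking an idle client is poke(c, &c->kick), and shutdown is the same poke with a different flag twiddled, which is the part that can't happen while a worker is stuck inside read().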
Yes, using epoll with nonblocking I/O is better than blocking I/O on each worker thread. But that basically means that you are doing asynchronous programming--i.e., the exact same thing that the wonky Python/Gunicorn stack you described is doing! You're just doing it with better attention to important details.
Here, to me, is the key item:
The "listener" thread owns all of the file descriptors (listeners and clients both), and manages a single epoll set to watch over them.
This is exactly what any async server does: it centralizes all the file descriptor management and handling in one place, and only uses workers (whether they are threads or "green threads" or whatever) to read from/write to fd's that are marked as ready in the epoll set.
For your case, unless I'm misreading something, what the workers are doing in between the read/write is CPU intensive (or at least it's CPU work and not I/O work, even though it's not very "intensive" CPU work), so actual OS threads are a better choice for the workers since you can't rely on cooperative scheduling.
If what the workers were doing was I/O work (for example, sending a request to a remote database and waiting for a response), "green threads" would work fine (since their only real purpose would be to organize the I/O--the actual fd's are going to be managed by the central server that manages all the fd's and checks which ones are ready for read/write). And one definitely should not try to run "green threads" for the same server in multiple O/S threads (or worse still, multiple OS processes). For an I/O bound server, one shouldn't need to anyway.
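For concreteness, the central loop being described has roughly this shape. It's a bare sketch under my own assumptions, not Rachel's code; hand_off_to_worker() is an invented placeholder for however you queue the ready fd to its worker (for example, the condvar poke sketched earlier in the thread).

    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>

    /* Placeholder: tell the worker that owns fd that it may now read. */
    static void hand_off_to_worker(int fd) { (void)fd; }

    static void run_listener(int listen_fd) {
        int ep = epoll_create1(0);
        if (ep < 0) { perror("epoll_create1"); exit(1); }

        struct epoll_event ev = { .events = EPOLLIN, .data.fd = listen_fd };
        epoll_ctl(ep, EPOLL_CTL_ADD, listen_fd, &ev);

        for (;;) {
            struct epoll_event ready[64];
            int n = epoll_wait(ep, ready, 64, -1);
            for (int i = 0; i < n; i++) {
                int fd = ready[i].data.fd;
                if (fd == listen_fd) {
                    /* New connection: register it one-shot, so it stays quiet
                       while a worker owns it and gets re-armed later with
                       EPOLL_CTL_MOD. */
                    int client = accept(listen_fd, NULL, NULL);
                    struct epoll_event cev = {
                        .events  = EPOLLIN | EPOLLONESHOT,
                        .data.fd = client
                    };
                    epoll_ctl(ep, EPOLL_CTL_ADD, client, &cev);
                } else {
                    /* Readable client: don't read() here, just wake its worker. */
                    hand_off_to_worker(fd);
                }
            }
        }
    }

The workers never touch the epoll set; they only read or write the fd they've been handed, which is exactly the centralization described above.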
> If you get stuck in read(), you can't do neat things like waking up when it's time to kick a client for being idle
Totally possible with another thread acting as a watchdog timer and sending a signal, which causes the read to return with EINTR; the caller can then check a flag to decide whether it should retry or abort. And that's for file I/O. For socket I/O you can just set the socket to non-blocking.
File I/O is usually not interruptible with signals.
An alternative to putting the watchdog timer in another thread is to use alarm(2), the kernel's built-in watchdog timer; the default behavior for SIGALRM is probably adequate. This might be easier than non-blocking I/O.
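For what it's worth, the EINTR variant of this is only a few lines. A hedged sketch (the helper name is mine), assuming you install a handler without SA_RESTART so the read actually gets interrupted rather than silently restarted:

    #include <errno.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static void on_alarm(int sig) { (void)sig; /* exists only to interrupt the syscall */ }

    /* read() that gives up after `seconds`, using alarm(2). */
    static ssize_t read_with_timeout(int fd, void *buf, size_t len, unsigned seconds) {
        struct sigaction sa;
        memset(&sa, 0, sizeof sa);
        sa.sa_handler = on_alarm;          /* note: no SA_RESTART */
        sigemptyset(&sa.sa_mask);
        sigaction(SIGALRM, &sa, NULL);

        alarm(seconds);                    /* arm the kernel watchdog    */
        ssize_t n = read(fd, buf, len);    /* may fail with errno==EINTR */
        alarm(0);                          /* disarm                     */

        if (n < 0 && errno == EINTR)
            fprintf(stderr, "read timed out after %u seconds\n", seconds);
        return n;
    }

With no handler at all, the default SIGALRM disposition just kills the process, which is the blunter watchdog behavior. And as the sibling comment notes, this mainly helps for sockets and pipes; a read stuck in uninterruptible disk I/O won't come back early.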
Good point. It would still be possible with the threads blocked in read(), but that would require signals and more logic in the threads, so a central multiplexing and coordinating thread seems like a cleaner solution.
I think the answer is in the previous article about Python linked in this article: to show that you can serve more requests with less resources if you avoid the Python/Gunicorn way and do it her way instead.
When was this time? In BSD, which introduced select(2), most network servers ran from inetd. I got a stern talking-to from the computer security folks for running a process in my .cshrc that would repeatedly finger someone at another university, where their fingerd ran from inetd, because I was making their shared VAX run unacceptably slowly. Early versions of httpd included instructions on how to run it from inetd, along with a note that you would probably regret it.
Are you thinking of, like, CICS systems from the 1970s connected to SNA or something? I mean they didn't have select(2) but they did serve many clients in a single process.
IRC servers were indeed written using select(2), because interaction between the clients is the whole point of IRC; that gets a great deal more difficult if you spawn off a separate process for each client! I think ICB/FNet predates IRC slightly and was also written using select(2). But IRC wasn't written until 1988, at which point select(2) and inetd were already about ten years old, and things like (most) FTP servers continued to run one process per concurrent client throughout the 1990s. (Walnut Creek CDROM wrote their own high-performance event-driven FTP server, IIRC.) Typically on the machine where you were running the IRC server you would also be running about a dozen or two other servers from inetd, all written with the one-process-per-client model.
If there was a time when all network servers were written using select(2) or similar event-driven APIs, it wasn't 1988 or later.
No, it isn't; it's doing the same thing a select(2) server would do, except it's using epoll to avoid scaling issues when you have a lot of file descriptors in the polling set. The only difference is that the workers are doing something that requires CPU, not I/O, so OS threads are being used for them (a single-threaded server would be fine if the workers were just doing more I/O, like sending a request to a remote database and waiting for a response). But the worker threads are not doing any I/O management at all; they read from or write to an fd only when the central server that is calling epoll tells them to. In the listen/accept/fork model, by contrast, the central server forgets about an fd once it has passed it to a handler process, and the handler process uses blocking I/O.
> Ever since I wrote about the whole Python/Gunicorn/Gevent mess a couple of months back, people have been asking me "if not that, then what". It got me thinking about alternatives, and finally I just started writing code.
I actually want to know: then what? As a web developer who usually reaches for Django or Flask with Gunicorn because I just don't know any better, is there a better stack that doesn't face these problems? Or is this a 'call to action' for somebody to build a better web server that follows this advice?
Every time I read one of her posts, I think, wow, she must either work for literally dirt cheap or her code must be expected to run on several thousand machines. Most projects just don't reach the scale or steady state where developer time is cheaper than machine time. It's fun trying to squeeze the last drop of blood from a stone, but it rarely makes economic sense.
As we've seen in the cloud costs thread, going from Python on AWS to Rust on dedicated hardware can push your bills from $30,000/month down to $300/month, which more than pays for the dev salaries (especially if you're not paying the excessive and unnecessary salaries of the Bay Area).
How many hours would you expect that rewrite to take when done by Python developers learning Rust as they do it?
And what's the opportunity cost of the new features they can't create because they're rewriting existing apps?
If you're arguing that Python has more runtime overhead than Rust, I don't disagree.
But there's a reason people invented higher level languages than C. Rust is a far nicer systems language than C, but is it faster to develop in than Python or Kotlin? It really depends.
I think the argument of the article is to pick the right language in the beginning so you save both developer time and runtime and don't have to rewrite in a better language later. For example, Python is a lot slower than threaded C, but quicker in terms of developer time and is memory-safe. If you pick one of the newer JVM languages though, you can get far closer to C than Python in performance once the JVM is running and still keep most of the expressiveness.
The reason I'm mentioning Rust is that you can still build the fun part of your application in Python, and easily use the FFI to build any performance-critical part in Rust, C, or C++.
That said, with Rust or C++ you can, depending on the situation, be just as effective as with Python or even more so, and you gain an enormous performance benefit (which in turn saves you money, which in turn means you can hire even more developers).
It is not clear that Rust can be as fast to develop in as Python or Kotlin, for the typical run of coders--people using Rust now are mostly well above average--but it is abundantly clear that, supported by good libraries, modern C++ can. Faster would be a tall order, but is not necessary.
Put in the same effort, and get 10x-1000x faster code or lower resource needs. Why not?
I've gone down this rabbit hole recently. I was using Apache because it comes with a lot of nice-to-haves besides HTTP handling, like authorization, session, and cookie modules, but I wanted to see if C and CGI were viable for modern web development, and the results were fantastic: response times of 1 ms while hitting an SQLite database on every request. I didn't go much further with the performance testing, but it could easily handle more than most of the enterprise crap I build ever needs.
C and CGI are a great simple combo. When you get down to it, 99% of web development is shuffling data from SQL to HTML and vice versa. It's so stupidly simple that the main code doesn't even hit the rough edges of C; there are almost no allocations, for instance, so no memory management to worry about, and everything that could possibly leak memory or cause security issues is in a handful of utility functions.
No MVC, no DTOs, no view models, no service layers, no templating, no client-side JavaScript, just reading from an SQL connection and printf'ing HTML to stdout. Without all the useless complications I ended up with far less code than the typical equivalent in a high-level language with a framework. I'm now fairly convinced that most of the projects I've worked on would have been better off this way.
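To make the shape concrete, a toy CGI program in this style might look like the following. This is a sketch, not the commenter's code; the SQL part is elided.

    #include <stdio.h>
    #include <stdlib.h>

    int main(void) {
        /* The web server execs this once per request and passes the request
           details through environment variables. */
        const char *q = getenv("QUERY_STRING");

        /* CGI response: headers, a blank line, then the body. */
        printf("Content-Type: text/html\r\n\r\n");
        printf("<!DOCTYPE html>\n<html><body>\n");
        printf("<h1>Hello from C + CGI</h1>\n");
        /* In real code you'd HTML-escape anything user-supplied. */
        printf("<p>Query string: %s</p>\n", q ? q : "(none)");
        /* ...open your SQL connection here and printf a row of HTML per
           result row... */
        printf("</body></html>\n");
        return 0;
    }

Any server that speaks CGI (Apache with mod_cgi, for instance) runs it; there's no framework and no resident process, just a binary that reads its environment and writes to stdout.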
As a fellow Python user, honestly I think you're fine. Rachel's issue with gunicorn stems from the gevent stuff, which you probably don't need and I would avoid unless you do.
I'm a bit mystified by Rachel's use case: a green thread hogging the process for so long that another request times out on the client? That means requests are doing large amounts of processing and latency requirements are super tight, which sounds like a very specialized use case.
If you're shy about managing your own memory and want to stay in a higher-level language, you could use the Quarkus or Micronaut frameworks, which are modern Java frameworks on the JVM. Kotlin is another JVM language that interoperates with Java and adds a lot more syntactic sugar; I don't have experience with Kotlin, so YMMV. The introduction of lambdas and the choice of modern post-EE Java frameworks really take a lot of the grind out of programming in Java. With the upcoming data classes it's going to get even better.
Actually, now that I'm re-reading this, it's a disaster. She claims to write this to help newcomers, yet everything is so cryptic.
`it kicks off a "serviceworker" thread to handle it.`
It doesn't even tell me how I should do this. I don't know what 'serviceworker' code looks like. I don't know what 'kicking off' means.
This post reads like it was made for maybe 3 or 4 people in the world who truly 'get it' and if you aren't in that elite club you're a terrible engineer, apparently. There's not even a lick of example code to get an idea of what is going on. Engineering is a big word that shouldn't be used in this blog post.
The problem with this kind of "benchmark" is that it doesn't measure anything relevant for realistic situations, where thousands of connections will behave in arbitrary, unexpected ways, like just stalling, and where real web applications have abysmal worst-case behaviour due to I/O, cron jobs running in the background, network hiccups, etc. You don't provision your systems with only best-case situations in mind. But sure, if you want to serve useless random numbers without ever even hitting the disk, a Raspberry Pi is able to saturate its network connection.
The C10k problem was challenging around the turn of the century. I suppose it's now not. I wonder how much CPU would be saved using an event-based architecture.
C10K was solved by switching to an event system like epoll or kqueue. This decreased the big O complexity of kernel<->user information flow so that you don't pay more for listening on more sockets.
C10M seems to be solved by colocating the network stack and app stack data structures by running the driver and the app in the same context. This can be achieved via DPDK-like schemes or pushing more into the kernel like Netflix's work to push TLS into kernel sockets that you can sendfile to.
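As a concrete taste of that "push more into the kernel" direction, sendfile(2) already copies a file to a socket without the bytes ever visiting userspace, and with kernel TLS configured on the socket (the Netflix work mentioned above) the same call works for encrypted connections. A hedged sketch, with error handling abbreviated and the helper name mine:

    #include <fcntl.h>
    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <sys/types.h>
    #include <unistd.h>

    /* Send the whole file at `path` down an already-connected socket. */
    static ssize_t send_whole_file(int sock_fd, const char *path) {
        int file_fd = open(path, O_RDONLY);
        if (file_fd < 0)
            return -1;

        struct stat st;
        if (fstat(file_fd, &st) < 0) { close(file_fd); return -1; }

        off_t offset = 0;
        /* The kernel moves the data; no read()/write() copy loop in userspace. */
        ssize_t sent = sendfile(sock_fd, file_fd, &offset, st.st_size);

        close(file_fd);
        return sent;   /* may be a short count; real code would loop */
    }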
> By the early 2010s millions of connections on a single commodity 1U server became possible: over 2 million connections (WhatsApp, 24 cores, using Erlang on FreeBSD), 10–12 million connections (MigratoryData, 12 cores, using Java on Linux).
Even in 2007 when I was starting to cut my teeth on larger web traffic there was a lot of discussion around serving 10k concurrent connections. I remember being blown away by a graph showing high throughput at 70k concurrent by a YAWS server, and started following Erlang as a result.
Oh, wow, I vaguely remember running across eDonkey at one point. I don't think I ever realized it could handle that kind of load. I was in a position to mostly stick with Apache for various non technical reason basically for long enough that eventually Apache got to the point that it could handle the traffic I needed to deal with especially with a CDN in front of it.
You must know a lot in order to write a slow performing program. If you just do a simple program like in the article, it will be fast, even if you do stupid things like spawning a new thread and file descriptor for each connection.
Now in order to make it slow you must know enough stuff to introduce complexity to the program. To make a slow program you probably have to learn about frameworks, php, micro services, cloud databases, etc.
I know pcwalton has been banging the "just use threads" drum for a while now. Rust used to use a green thread approach, but if I recall correctly, they ripped all that out for just native threads with the idea that in most cases it's fast and efficient enough.
I remember when C10K was the big challenge, but even the naive approach of spawning a thread per connection now can handle that.
I've written a "just use threads" approach to a web server in Rust, and if you're just juggling IO and not doing any real work like most CRUD webapps, just about every other async-capable language scaled more gracefully and used less resources.
And tomorrow there will be an article describing an exploit in a web service written in C/C++.
Does she even see the size of the batteries included in a framework like Django? Here is some brand new information. Everyone knows Python/Django is slow and inefficient (same for rails) but they still use it for those batteries.
It is easy to write some web service in a low level language that just crunches some numbers for your benchmark. It quickly gets complicated when you start thinking about accounts, security, passwords, databases, orders, carts etc.
The premise is that people don't understand their system, or software, and need to be told how it works. So you have an admittedly ignorant person telling other ignorant persons that there are possibilities out there. That's... not really useful.
What is useful is to spend 2 days researching massively concurrent network applications, and find the ones that were already written, and see how they evolved over time. For the most part, it's just understanding your operating system and network protocols. Once you learn how they all work, the answers come quickly. This problem has been solved many times.
But for the most part, none of you need to know how this stuff works. You can write the crappiest network server in the world, and there are still so many other components you can wrap around it to make it scale that you never even need to get close to network optimization. So you may have to run 10 instances of your app; who cares? We have virtually unlimited everything these days. The answer to a poorly performing app is "just throw more cloud at it".
Also, I feel the need to remind everyone that single app instance handling 100K+ concurrent connections is a terrible fucking idea. What happens when the app crashes? What happens when the hardware dies? What happens when you need to, like, upgrade/restart the app? Several million SYN packets in 100ms, and default kernel TIME_WAIT settings, do not make great bedfellows.
Stuff like this works great until someone tries to DDoS you, or finds some buffer overflow that takes over the box. I don't like the new world either, but it was built this way for a reason.
She said it has no real purpose yet. That's key. No doubt she knows what she is writing about, and she also picked a favorable language. But the gunicorn folks have also been doing this server thing forever now; they probably have their share of stories.
Someone should try this in Erlang/Elixir. I'm gonna bet you it could handle hundreds of thousands of connections on a beefy machine (and millions of connections with optimizations).
Why, though? In real life, if you need to handle millions of users per second, I bet you're already part of FAANG, at which point you simply open offices in each country and deploy local servers.
Doesn't the WhatsApp story contradict what you're saying? They handled millions of users per second, weren't part of FAANG, and this helped them get acquired by FAANG at a high valuation for being a very nimble team with a nimble architecture.
How does that contradict me? I see it as exactly the opposite: they became part of FAANG exactly because they managed to handle those users. And let's be real, they handled millions of users per second only after becoming part of Facebook. Like it or not, FB is the no. 1 social network.
They got acquired in 2014. They hit 400M users in 2013. By then it was already THE messaging app for several European countries.
I know the number of users doesn't equal users per second. But we shouldn't pretend they weren't able to handle high traffic before Facebook acquired them.
400M users globally in 2013 and millions per second are not the same thing. Simple math says 4×10^8 / 24 / 60 / 60 ≈ 4.7k users per second (assuming each user shows up roughly once a day), which is about three orders of magnitude lower. Even allowing that users are not spread evenly over those 24 meridians, the Earth being mostly water, it still doesn't get anywhere near millions per second.
For how much time? Lab results and real-life results are different. Love it or hate it, currently there is only one company that can deal with millions of users per second, and that's Google. No one else does it: not Amazon, not Facebook including their WhatsApp (scaling my calculation above, the number only goes from ~5k to ~50k even if you say WhatsApp has 4 billion daily users, which I doubt it has), not Microsoft.
Each time the World Cup is on, Twitter goes down. Same for plenty of big names when they launch a hyped service (Blizzard, for example, is another one). Scaling up from the lab to real life, dealing 24/7 with those millions per second, is an entirely different beast.
Yeah, no. A lot of people can do this. I don't know why you think Google is special, but it's not. Maybe you can share your reasoning so that I can understand your angle.
Yeah. To be fair, those are WebSockets, which are sort of the same thing but not really (and performance optimization can be done more easily, since the server [mostly] decides when to push stuff).
Unfortunately this approach precludes GIL languages like Python. If you're able to use a language/runtime that is amenable to multithreading, then using a thread per connection works fine for most use cases (and it's probably easier than using whatever async/await interface your language has).
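For reference, the whole thread-per-connection shape fits in a screenful of C/pthreads. A minimal sketch; the port number, buffer size, and canned response are arbitrary, and error handling is mostly omitted:

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static void *handle_client(void *arg) {
        int fd = (int)(intptr_t)arg;
        char buf[4096];
        /* Plain blocking read: this thread has nothing else to do anyway. */
        if (read(fd, buf, sizeof buf) > 0) {
            const char *resp = "HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok";
            write(fd, resp, strlen(resp));
        }
        close(fd);
        return NULL;
    }

    int main(void) {
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr = { .sin_family = AF_INET,
                                    .sin_port   = htons(8080),
                                    .sin_addr.s_addr = htonl(INADDR_ANY) };
        bind(lfd, (struct sockaddr *)&addr, sizeof addr);
        listen(lfd, 128);

        for (;;) {
            int cfd = accept(lfd, NULL, NULL);
            if (cfd < 0)
                continue;
            pthread_t t;
            pthread_create(&t, NULL, handle_client, (void *)(intptr_t)cfd);
            pthread_detach(t);   /* fire and forget; no join needed */
        }
    }

Build with cc -pthread. The GIL caveat above is exactly why the same shape in CPython only buys concurrency for the I/O, not for the CPU work in each handler.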
> probably easier than using whatever async/await interface your language has
Is it? I haven't used threads directly in a while, but I remember dealing with synchronization issues, problems that just don't exist in single-threaded Node with async/await.
I find the async Promise or Task to be a more useful abstraction than the thread. Although you do need threads, or a task dispatcher with a pool of threads, if you need to run CPU-intensive stuff.
> If you’re just handling requests there isn’t much for shared state
No, but if your single listener/server thread is managing epoll for all your file descriptors, you do have to have a way of synchronizing the worker threads with it, so they know when and when not to read from/write to their fd's. I assume Rachel is using some kind of semaphore or other threading synchronization mechanism for this.
It's true that the GIL prevents the specific pattern you're describing, but many people have been able to use multiple threads and processes with Python itself, and also with things like gunicorn or uWSGI.
But this isn’t engineering. This is the IT equivalent of building a bridge and driving successively larger trucks over it. In real engineering fields, you can do predictive analyses based on prior empiricism. There’s none of that in our fields until you’re talking about very small systems where, for example, the stack consumption can be determined in advance and the scheduler can give you guarantees about worst-case performance.
And this is to the detriment of all of us. That’s why you’ll never hear me call myself an engineer.
If “real engineering” worked the way you described, we would test planes by filling them full of passengers and flying them around the world.
We don't. We build wind tunnel models, we taxi them around at higher and higher speeds, we put them in machines that wiggle the wings at high loads. The first flight is a little hop and then right back down. Months later there might be a big ceremony with VIPs where the new plane takes a lap around the airport as its "first flight".
And there are mistakes along the way, giant ones that add years to the schedule and tiny ones that engineers argue over even telling their boss about.
Sure there’s planning and experience, just like she knew to use epoll and not select, and that Linux can handle thousands of threads per process. But there’s no magic to it, just lots and lots of human attention and testing along the way.
She developed a prototype based on a hunch and a whim. Great work, no doubt, but engineering isn't based on intuition and some experience. Real engineering has a goal or specification in mind, and then proves through modeling, analysis, and _finally_ testing that it meets those specs.
Whipping something up and then seeing what it's capable of doesn't qualify.
"Whipping something up and then seeing what it's capable of" in simulation or prototype is also known as "modeling and analysis". Prototypes and testing are definitely a valid, even central, part of engineering.
The gist of it was:
Other engineering disciplines use the techniques you've mentioned because of the costs, both time and money, associated with getting it wrong.
Software engineering lends itself to different methods of development and construction, as the costs associated with getting it wrong or making changes after the fact are much lower. (For most applications, anyways).
As such, (this definition would be another sticking point) these less rigid methods should still be considered engineering, with engineering being a balancing of resources with outcomes, not fixation on mathematical models.
We don't really know what is actually happening. IT is dramatically changing everything. It's not all bad, but it isn't all good either. I hate to give examples, since it is really vague what goes in and what comes out, and people tend to mistake an example for full coverage, but... for example, we have no idea what drives suicide rates.
> for example, we have no idea what drives suicide rates.
This is a good point but I wanted to make it a bit more clear.
Across a population we have a good idea of the risk factors and of the things that increase rates of suicide: deprivation, abuse, substance misuse[1], previous self harm.
The bit we have no idea about is how to apply these to an individual person to see if they're high or low risk of suicide.
There are a load of different tools that input lots of different information and put out a risk rating, and none of them are as good as just asking the person "what do you think your risk is?"
What is stress/load testing, then? Most large companies do perform something like this...
Most large companies do some stress/load testing first for any new system. Then they release to a few percent of the users (less than 10%) and gradually roll out to the rest of the user base.
I worked at Spotify when Tidal was launched. They had failed to do proper capacity testing, and the service failed under the load in the first week, and it showed. But most mature large companies tend to be really good at this.
It is remarkable how many tech companies have managed to stay mostly up with very few outages, given this whole pandemic situation, where everybody is online.
I'd say working on large-scale deployments is proper engineering... creating a simple webpage, maybe not. Deploying that webpage to millions of users, though, is.
Also, don't forget that bridges have been made since the dawn of time, while tech and the internet are very very young in human terms.
> It is remarkable how many tech companies have managed to stay mostly up with very few outages, given this whole pandemic situation, where everybody is online.
I must admit that it was very gratifying watching all our engineering decisions pay off when our system started seeing record high traffic every day and scaled effortlessly and with no outages - the traffic that came with lockdown is, even on our slower days, twice our planned for and tested for high water mark.
But it took us about 4 years of consistent effort in changing our organisational mindset all the way through, devs, testers, product managers, c-suite members, to get here. 4 years ago, we would've been waking everyone up and going without sleep for a couple of days trying to get something back online, then fixing the next system down the line that failed because of the load, then the next one, and then writing long post-mortems for our business team.
I think the analogy to mechanical engineering is something like this:
A seasoned engineer notices that everyone is only building suspension bridges all of a sudden.
They point out that you can span a stream with some bricks or rocks and a bit of mortar, and are ridiculed.
Next, they build a highway overpass out of concrete pylons, and stress test it to 10x the necessary engineering load, and point out it cost 10% as much as a typical contemporary suspension bridge. That’s this article.
Or vice versa: they notice that everyone is only building concrete bridges all of a sudden, disregarding the traditional and much cheaper approach of hanging boards under some rope handrails; upon being ridiculed, they build a suspension bridge across a river for 10% as much as a concrete bridge.
When you hear "Engineer", you often think of calculus, statistics, formal testing, requirements gathering, documentation, repeatable results, etc... along with a fundamental understanding of the problem space and possible solutions.
I think this is akin to NASA working on the Apollo program vs. someone in their garage attempting to build a go-cart for the first time.
When you just slap things together and see if they work - are you really engineering? Can you exactly repeat the process and achieve exactly the same result every time?
I think we often cross "research and development" with "engineering". Exploring a problem space and tinkering with concepts isn't engineering. Taking what you've learned, planning out and executing a solution to a precise set of requirements, and being able to repeat your steps and achieve those results again and again - is engineering.
>When you just slap things together and see if they work - are you really engineering?
Why not? That's basically what testing is, which was one of the attributes you attributed to "Engineer": formal testing.
>I think we often cross "research and development" with "engineering"
My general take:
Scientists primarily focus on learning and proving new knowledge & ideas. (i.e. they research)
Engineers focus in using proven knowledge and applying it to design and create things or solve problems. When things are not perfectly certain, they can prototype and do tests similar to how scientists do experiments (e.g. aerodynamics in wind tunnels). (i.e. development)
Good points, but usually when we talk about testing in the realm of engineering, it's a means of verifying something is within the bounds it was designed for... not just to see what happens.
We don't run test suites on our software to see what it does. We run test suites to validate it operates as it is supposed to.
I think the way you described testing is more in line with tinkering and research rather than engineering. It's experimentation, not testing.
When the outcome is unknown and unreliably unpredictable, it's research (tinkering). When it's predictable and has a known, repeatable outcome, it's engineering.
>Good points, but usually when we talk about testing in the realm of engineering, it's a means of verifying something is within the bounds it was designed for... not just to see what happens.
Yeah that's fair.
I had originally skimmed the article, but after re-reading it I see that the author apparently admits they didn't put any careful thought into what they made. No real goal; they just slapped stuff together, so to speak. Which, I'd agree, doesn't quite sit as engineering to me...
When saying "this is engineering" the article refers to all the engineering effort at the OS level that went into making this possible.
As the title says, this is dumb code. But it makes use of years of engineering effort to deliver a result which can get you very far without thinking about the low level.