I feel this benchmark compares apples to oranges in some cases.
For example, for node, the author puts a million promises into the runtime event loop and uses `Promise.all` to wait for them all.
This is very different from, say, the Go version where the author creates a million goroutines and puts `waitgroup.Done` as a defer call.
While this might be the idiomatic way of doing concurrency in the respective languages, it does not account for how goroutines are fundamentally different from promises, or for how the runtimes do things differently. For JS, there's a single event loop. Even counting the JS execution thread, the event loop thread and whatever else the runtime uses for async I/O, the execution model is fundamentally different from Go's. Go (unless you set `GOMAXPROCS`) spawns an OS thread for every hardware thread (logical CPU) your machine has, and then uses a userspace scheduler to distribute goroutines across those threads. It may spawn more OS threads to make up for threads blocked in syscalls, although I don't think the runtime will spawn extra threads in this case.
It also depends on what the "concurrent tasks" are (I know, concurrency != parallelism). Tasks such as reading a file or making a network call are better suited to something like promises, but CPU-bound tasks are better handled by goroutines or Node's worker_threads. It would be interesting to see how memory usage changes when doing async I/O vs CPU-bound tasks concurrently in different languages.
Actually, I think this benchmark did the right thing, and I wish more benchmarks would do it. I'm much less interested in what the differences between compilers are than in what the actual output will be if I ask a professional Go or Node.js dev to solve the same task. (TBF, it would've been better if the benchmarked task were something useful, e.g. handling an HTTP request.)
Go heavily encourages a certain kind of programming; JavaScript heavily encourages a different kind; and the article does a great job at showing what the consequences are.
But you wouldn't call a million tasks with `Promise.all` in Node, right? That's just not a thing that one does.
Instead, there's usually going to be some queue outside the VM that will leave you with _some_ sort of chunking and otherwise working in smaller, more manageable bits (that might, incidentally, be shaped in ways that the VM can handle in interesting ways).
It's definitely true to say that the "idiomatic" way of handling things is worth going into, but if part of your synthetic benchmark involves doing something quite out of the ordinary, it feels suspicious.
I generally agree that a "real" benchmark here would be nice. It would be interesting if someone could come up with the "minimum viable non-trivial business logic" that people could use for these benchmarks (perhaps coupled with automation tooling to run the benchmarks).
You could, and the tasks would run concurrently. Node is single-threaded, so unless you used one of the I/O calls backed by a thread pool, they would all execute sequentially.
But if I have 1 million tasks which spend 10% of their time on CPU-bound code, intermixed with other I/O-bound code, and I just want throughput and I'm too lazy to use a proper task queue, then why not?
Yeah, right? I mean, I don't have a dog in this race; I just wish we could get into "normal" repros without having to wonder if some magic is kicking in.
The fundamental problem is that there are two kinds of sleep function: one that actually sleeps, and another that is really a timer that just calls a callback after the desired interval. A Promise is just syntactic sugar on top of the second type. Go can certainly call another function after a desired interval using `Timer`.
I think a better comparison would be wasting CPU for 10 seconds instead of sleeping.
No professional Go programmer would spawn 1M goroutines unless they're sure they have the memory for it (and even then, only if benchmarks indicate it, which is unlikely). Goroutines have a minimum stack overhead of between 2 KiB and 8 KiB, depending on the platform. You'd use a work-stealing approach with a reasonable number of goroutines instead. How many are reasonable needs to be tested, because it depends on how long each goroutine spends waiting for I/O or sleeping.
But I can go further than that: no professional programmer should run 1M concurrent tasks on an ordinary CPU, no matter the language, because it makes no sense when the CPU has several orders of magnitude fewer cores. The tasks are not going to run in parallel anyway.
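A minimal sketch of what a CPU-bound variant might look like in Go, assuming a fixed amount of work per task rather than a wall-clock deadline (a spin-until-deadline loop would let tasks that sat in the run queue finish instantly once their deadline had passed). `burnCPU`, `runCPUBound` and the iteration counts are illustrative, not taken from the benchmark.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// burnCPU performs a fixed amount of integer work (xorshift64 steps) and
// returns the result so the compiler cannot drop the loop as dead code.
func burnCPU(iters int) uint64 {
	x := uint64(88172645463325252) // arbitrary non-zero seed
	for i := 0; i < iters; i++ {
		x ^= x << 13
		x ^= x >> 7
		x ^= x << 17
	}
	return x
}

// runCPUBound starts one goroutine per task, each doing iters steps of work,
// and waits for all of them. The sum only keeps the work observable.
func runCPUBound(tasks, iters int) uint64 {
	var wg sync.WaitGroup
	results := make([]uint64, tasks)
	for i := 0; i < tasks; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			results[i] = burnCPU(iters) // each goroutine writes its own slot
		}(i)
	}
	wg.Wait()
	var sum uint64
	for _, r := range results {
		sum += r
	}
	return sum
}

func main() {
	start := time.Now()
	sum := runCPUBound(1000, 1_000_000)
	fmt.Println("checksum:", sum, "elapsed:", time.Since(start))
}
```

With CPU-bound tasks, wall time scales with the total work divided by the number of cores, so the number of tasks stops being the interesting variable.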
The basis for running 1 million concurrent tasks is to support 1 million active concurrent user connections. They don't need to run in parallel if async is used. As shown, Rust and C# do well. How would you support it in Go?
The servers I use have limits far below 1M active connections; realistically speaking, about 60k simultaneously active connections. So I can't really answer that question. However, it's easy to find answers to it online [1]. Go doesn't force you to spawn goroutines when you don't really need them. As I said, the correct way in Go is to use worker pools, whose size depends on measured performance, because it is tied to how much I/O each goroutine performs and how long it waits on average.
In what world is the cost of 2.5 GB of RAM per 1 million connections an issue? Are you telling me there's a service in this world handling, for example, 100 million active connections, and they can't afford 250 GB of RAM? It's not the 90s anymore.
And we're talking about a naive microbenchmark. If you were actually building a service like that in Go (millions of active connections) and you were very concerned about memory usage, you wouldn't be naive enough to use a goroutine for every connection. Instead, you would use something like gnet or another solution based directly on epoll events, combined with a worker pool.
Go is a friendly language and it is well liked, but it is inappropriate when the absolute best performance saves more money than the additional developer salaries needed to write the code in a higher-performance language. An example is deep learning or other big numerical work, where you'd be wasting expensive hardware if you used Go. The one million concurrent users, however, may well be fine with Go.
> Go heavily encourages a certain kind of programming;
True, but it really doesn't encourage you to run 1M goroutines with the default memory settings. Though it's probably fair to run Go wastefully when you're comparing it to `Promise.all`.
Of course! That's why the article is telling you that some languages (C#, Rust) are better at it than others (Go, Java). Doesn't mean that Go and Java are bad languages! Just that they aren't good to do this thing.
The article is telling us that you can run really inefficient code. Goroutines should be run with worker pools and a buffered channel and it's silly to not do that and then compare it to things like an optimized Rust crate like Tokio.
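Here is a minimal sketch of the worker-pool-plus-buffered-channel pattern this comment describes, assuming jobs are identified by integers and results are collected into a slice. The pool size of 8 and the buffer sizes are arbitrary; `workerPool` is a hypothetical helper, not a standard-library API.

```go
package main

import (
	"fmt"
	"sync"
)

// workerPool starts a fixed number of workers that pull job IDs from a
// buffered channel and send (id, result) pairs back on another channel.
func workerPool(jobs, workers int, process func(int) int) []int {
	jobCh := make(chan int, workers*2) // small buffer keeps workers fed
	resCh := make(chan [2]int, workers*2)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobCh { // ends when jobCh is closed and drained
				resCh <- [2]int{j, process(j)}
			}
		}()
	}
	// Producer: enqueue every job, then close so the workers can exit.
	go func() {
		for j := 0; j < jobs; j++ {
			jobCh <- j
		}
		close(jobCh)
	}()
	// Close resCh once all workers have finished.
	go func() {
		wg.Wait()
		close(resCh)
	}()
	results := make([]int, jobs)
	for r := range resCh {
		results[r[0]] = r[1]
	}
	return results
}

func main() {
	res := workerPool(1_000_000, 8, func(j int) int { return j * 2 })
	fmt.Println(len(res), res[0], res[999_999]) // prints: 1000000 0 1999998
}
```

Unlike the one-goroutine-per-task version, only `workers` + 2 helper goroutines exist at any time, so stack memory stays flat as the job count grows.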
Well... I'm actually not sure what ideomatic means (English isn't my first language), but it's the standard way of doing it. You'll even find it as step 2 and 3 here: https://go.dev/tour/concurrency/1
> or the best way you can imagine
I would do a lot more to tune it if I were in a position where I knew it would run that many "tasks". I think what many non-Go programmers might run into here is that Go doesn't come with any sort of "magic". Instead, it comes with a highly opinionated way of doing things. Compare that to C#, which comes with a highly optimized CLR and a bunch of really excellent libraries that Microsoft continuously optimizes, and you're going to end up with an article like this. The async libraries keep track of which tasks are running (though `Promise.all` obviously also ties up a huge amount of memory it doesn't have to), while the Go example runs 1 million goroutines at once.
You'll also notice that there is no benchmark for execution time. With Go you might actually want to pay with memory, though I'd argue that you'd almost never want to run 1 million Goroutines at once.
Though to be fair to this specific author, it looks like they copied the previous benchmarks and then ran it as-is.
The post was edited, previously it just said roughly this part: "step 2 and 3 here: https://go.dev/tour/concurrency/1". Which - as far as I can tell - does not mention worker pools...
You're right. It is using channels and buffers, but you're right.
It's not part of the actual documentation either, at least not exactly: https://go.dev/doc/effective_go#concurrency You will achieve much the same if you follow it, but my answer should have been yes and no as far as being the "standard" Go way.
Idiomatic is the word the parent was looking for. The base word is idiom.
It was probably the intent of the parent to mean 'making use of the particular features of the language that are not necessarily common to other languages'.
I'm not a programmer, but you appear to give good examples.
I hope I'm not teaching you to suck eggs... {That's an idiom, meaning teaching someone something they're already expert in. Like teaching your Grandma to suck eggs - which weirdly means blowing out the insides of a raw egg. That's done when using the egg to paint; which is a traditional Easter craft.}
I actually did find "idiomatic" when I looked it up, but I honestly still didn't quite grasp it from the Cambridge Dictionary. Thanks for explaining it in a way I understand.
In programming, “idiomatic” refers to a programming language’s “best practices” and “style guide”. Obviously programming languages can solve problems in many different ways, but they often develop a “correct way” that matches their design or the personality of their influential community members. Following this advice is idiomatic. Next time your coworker has their style wrong, you can say “I don’t think this is idiomatic” :D
As far as practicality goes I actually agree with you: if I knew I were trying to do something to the order of 1,000,000 tasks in Go I would probably use a worker pool for this exact reason. I have done this pattern in Go. It is certainly not unidiomatic.
However, it also isn't the obvious way to do 1,000,000 things concurrently in Go. The obvious way to do 1,000,000 things concurrently in Go is to do a for loop and launch a Goroutine for each thing. It is the native unit of task. It is very tightly tied to how I/O works in Go.
If you are trying to do something like a web server, then the calculus changes a lot. In Go, due to the way I/O works, you really can't do much but have a goroutine or two per connection. However, on the other hand, the overhead that goroutines imply starts to look a lot smaller once you put real workloads on each of the millions of tasks.
This benchmark really does tell you something about the performance and overhead of the Go programming language, but it won't necessarily translate to production workloads the way that it seems like it will. In real workloads where the tasks themselves are usually a lot heavier than the constant cost per task, I actually suspect other issues with Go are likely to crop up first (especially in performance critical contexts, latency.) So realistically, it would probably be a bad idea to extrapolate from a benchmark this synthetic to try to determine anything about real world workloads.
Ultimately though, for whatever purpose a synthetic benchmark like this does serve, I think they did the correct thing. I guess I just wonder exactly what the point of it is. Like, the optimized Rust example uses around 0.12 KiB per task. That's extremely cool, but where in the real world are you going to find tasks where the actual state doesn't completely eclipse that metric? Meanwhile, Go is using around 2.64 KiB per task. 22x larger than Rust as it may be, it's still not very much. I think for most real-world cases, you would struggle to find many tasks where the working set per task is actually that small. Of course, if you do, then I'd reckon optimized async Rust will be a true barn-burner at the task, and in a lot of those cases where every byte and millisecond counts, Go does often lose. There are many examples.[1]
In many cases Go is far from optimal: channels, goroutines, the regex engine, various codec implementations in the standard library, etc. are all far from the most optimal implementation you could imagine. However, I feel like they usually do a good job of making the performance more than sufficient for a wide range of real-world tasks. They have made some tradeoffs that a lot of us find very practical and sensible, and that makes Go feel like a language you can usually depend on. I think this is especially true in a world where you can already run huge websites on Python + Django and other stacks that are much less efficient in memory and CPU usage than Go.
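To sanity-check a per-task figure like the 2.64 KiB above on your own machine, one rough approach is to park many goroutines and compare `runtime.MemStats` before and while they are alive. This measures the Go runtime's own accounting (`StackInuse`, `Sys`), not the maximum RSS that `/usr/bin/time -v` reports, so the numbers are only loosely comparable with the benchmark; `perGoroutineBytes` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// perGoroutineBytes parks n goroutines and returns the average growth per
// goroutine in StackInuse (goroutine stacks) and in Sys (all memory the
// runtime has obtained from the OS). These are rough estimates.
func perGoroutineBytes(n int) (stack, sys float64) {
	var before, during runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)

	release := make(chan struct{})
	var started, finished sync.WaitGroup
	started.Add(n)
	finished.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			started.Done()
			<-release // stay blocked so the goroutine and its stack remain alive
			finished.Done()
		}()
	}
	started.Wait()
	runtime.ReadMemStats(&during)
	close(release)
	finished.Wait()

	// Convert to int64 first so a shrinking counter cannot wrap around.
	stack = float64(int64(during.StackInuse)-int64(before.StackInuse)) / float64(n)
	sys = float64(int64(during.Sys)-int64(before.Sys)) / float64(n)
	return stack, sys
}

func main() {
	stack, sys := perGoroutineBytes(100_000)
	fmt.Printf("~%.0f B of stack, ~%.0f B total per parked goroutine\n", stack, sys)
}
```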
I'll tell you what this benchmark tells me really though: C# is seriously impressive.
I agree with everything you said and I think you contributed a lot to what I said making things much more clear.
> I'll tell you what this benchmark tells me really though: C# is seriously impressive.
The C# team has done some really great work in recent years. I personally hate working with it and its "magic", but it's certainly in a very good place as far as trusting the CLR to "just work".
Hilariously I also found the Python benchmark to be rather impressive. I was expecting much worse. Not knowing Python well enough, however, makes it hard to really "trust" the benchmark. A talented Python team might be capable of reducing memory usage as much as following every step of the Go concurrency tour would for Go.
Userspace scheduling of Goroutines, virtual stack and non-deterministic pointer type allocation in Go are as much magic if not more, the syntactic sugar of C# is there to get the language out of your way and usually comes at no cost :)
If you do not like the aesthetics of C# and find Elixir or the OCaml family tolerable, perhaps try F#? If you use task CEs there, you end up with roughly the same performance profile and get access to a huge ecosystem, making it one of the few FP languages that can be used in production with minimal risk.
> Userspace scheduling of Goroutines, virtual stack and non-deterministic pointer type allocation in Go are as much magic if not more, the syntactic sugar of C# is there to get the language out of your way and usually comes at no cost :)
I don't think C# does it at no cost. I think its "attachment" to Clean Code makes most C# code bases horrible messes after a while. I know this is a preference thing and that many people will disagree, but I've seen C# code bases that were so complicated to work with that they were actively hindering the development team's ability to meet the business needs. You don't have to write C# that way, but that's what happens in almost every company where I live.
> If you do not like the aesthetics of C# and find Elixir or the OCaml family tolerable, perhaps try F#? If you use task CEs there, you end up with roughly the same performance profile and get access to a huge ecosystem, making it one of the few FP languages that can be used in production with minimal risk.
I mean, I don't think I'll ever have to work within the dotnet ecosystem. The way things are going in the green energy and finance sectors, which is where my career has taken me, I'll mostly get to work with Python (with C/Zig) or Go and possibly Java. C# and dotnet are almost exclusively used at stagnant small-to-medium-sized companies and in the consultancy business servicing those companies. This is not because of C# or dotnet but because of the developer landscape. Java is big in "older" organisations because it's what was taught in universities and because it was always good. Go is replacing C#/Java in a lot of newer companies because there are a lot of success stories around it and a lot of the Java developers are retiring. Python is growing really big because a lot of non-SWE engineers and accountant types use it, as well as because of how it's used in ML/AI/data warehousing. PHP is big in the web-shop industry, and so on. C# mainly made its way into businesses that ran a lot of Windows servers. Since organisations rarely change tech stacks in the more "boring" parts of the world, that's not likely to change much.
I don't think dotnet or C# are bad. I write some PowerShell for Azure automation to help IT operations from time to time, but I really don't like working with C# (or Java). I would personally like to work with Rust or more Zig at some point, but it's not like anyone is adopting Rust around here, and while Zig can be used in place of C for some things, it's not really "production ready" for most things.
As far as I know there is no way to do Promise-like async in Go; you HAVE to create a goroutine for each concurrent async task. If this is really the case, then I believe the submission is valid.
But I do think that spawning a goroutine just to do a non-blocking task and get its return is kinda wasteful.
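For what it's worth, a Promise-like handle is easy to build in Go, but it still costs a goroutine per call, which is the point being made here. A minimal sketch using generics (Go 1.18+); `async` is a hypothetical helper, not part of the standard library.

```go
package main

import (
	"fmt"
	"time"
)

// async runs f on a new goroutine and returns a channel that receives f's
// result exactly once, roughly like a Promise you can later "await".
func async[T any](f func() T) <-chan T {
	ch := make(chan T, 1) // buffered so the goroutine can exit even if nobody reads
	go func() { ch <- f() }()
	return ch
}

func main() {
	p := async(func() int {
		time.Sleep(10 * time.Millisecond) // stand-in for a non-blocking I/O call
		return 42
	})
	// ... other work could happen here ...
	fmt.Println(<-p) // prints 42, like `await p`
}
```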
You could in theory create your own event loop and then get the exact same behaviour as Promises in Go, but you probably shouldn't. Goroutines are the way to do this in Go, and it wouldn't be useful to benchmark code that would never be written in real life.
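I guess what you can do in Go that would be very similar to the Rust implementation would be this (and could be helpful even in real life, if all you need is a whole lot of timers):

```go
package main

import "time"

// test2 creates count timers up front and then waits on each in turn; the
// runtime tracks the timers, so no goroutine per task is needed.
func test2(count int) {
	timers := make([]*time.Timer, count)
	for idx := range timers {
		timers[idx] = time.NewTimer(10 * time.Second)
	}
	for idx := range timers {
		<-timers[idx].C
	}
}

func main() { test2(1_000_000) }
```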
This yields a maximum resident set size of 263552 kbytes according to `/usr/bin/time -v`.
I'm not sure if I missed it, but I don't see the benchmark specify how the memory was measured, so I assumed `time -v`.
The requirement is to run 1 million concurrent tasks.
Of course each language will have a different way of achieving this task each of which will have their unique pros/cons. That's why we have these different languages to begin with.
The accounting here is weird though; Go isn’t using that RAM, it’s expecting the application to. The reason that doesn’t happen is that it’s a microbenchmark that produces no useful work.
The way the results are presented a reader may think the Go memory usage sounds equivalent to the others - boilerplate, ticket-to-play - and then the Go usage sounds super high.
But they are not the same; that memory is in anticipation of a real-world program using it.
Isn’t that kind of dumb when none of the other languages do this? Apparently allocating memory is really fast? Maybe we should change the test to load 1MB of data in every task?
Most of those languages (except for Java virtual threads) use stackless coroutines. Go uses stackful coroutines, which allocate some memory upfront for each goroutine to use.
Then it is fair to compare the memory usage of a stackful coroutine to a stackless one, as they are the idiomatic way to perform async tasks in each language.
I mean, this is subjective, but as long as it’s clear that one number is “this is the memory the runtime itself consumes to solve this problem” and the other number is “this is the runtime memory use and it includes pre-allocated stack space that a real application would then use”, sure.
Point being: someone reading this to choose which runtime will fit their use case needs to be careful not to assume the numbers measure the same thing. For some real-world use cases, the pre-allocated stack will perform better than runtimes that do heap allocations instead.
Of course, as any microbenchmark, the bare results are useless. The numbers can be interesting only if you take the time to understand the implications.
Maybe. But in that case you will need to do something for each of those users, and which languages are good at that might look quite different from this benchmark.
Also, for Java, Virtual Threads are a very new feature (Java 21 IIRC or somewhere around there). OS threads have been around for decades. As a heavy JVM user it would have been nice to actually see those both broken out to compare as well!
Interesting. uname -a reports x86_64, and lscpu also reports x86_64, although perhaps that's just the kernel being patched to lie about the architecture.
Chrome and Firefox on iOS are just skinned Safari with their own sync and features. Apple does not allow third-party browser engines, and you absolutely have to use the one built into the system (WebKit).
Yep, you need to specify the maximum amount of memory up front. It's defined in "WebAssembly memory pages", each of which is 64 KiB. You need to specify an initial and a maximum number of pages. The WebAssembly module can call `memory.grow()` to grow it by one or more pages until it reaches the maximum, though you can't "un-grow" or decrease the amount of allocated memory.