Fintech has mostly determined that 1 thread can get the job done. See LMAX disruptor and related ideas.
What problems exist that generate events or commands faster than 500 million per second? This is potentially the upper bar for 1 thread if you are clever enough.
Latency is the real thing you want to get away from. Adding more than one CPU into the mix screws up the hottest possible path by ~2 orders of magnitude. God forbid you have to wait on the GPU or network. If you have to talk to those targets, it had better be worth the trip.
> What problems exist that generate events or commands faster than 500 million per second?
AAA games, Google search, weather simulation, etc.? I mean, it depends on what level of granularity you’re talking about, but many problems have a great deal going on under the hood and need to be multi-threaded.
I would add a qualifier of "serializable" to those events or commands. This is the crux of why fintech goes this path. Every order affects subsequent orders and you must deal with everything in the exact sequence received.
The cases you noted are great examples of things that do justify going across the PCIe bus or to another datacenter.
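To make that serializability point concrete, here's a toy sketch (all names hypothetical, not a real matching engine): each order mutates the book state that every later order sees, so swapping two orders changes the fills.

```java
// Toy illustration of why order events must be applied in the exact sequence received.
import java.util.ArrayDeque;
import java.util.Queue;

class SequentialMatcher {
    // Resting quantity at a single price level, mutated by every incoming order.
    private long restingQty = 0;

    // Apply one order; the result depends on all orders applied before it.
    long apply(long qty, boolean isBuy) {
        if (isBuy) {
            long filled = Math.min(qty, restingQty);
            restingQty -= filled;      // earlier buys consume liquidity later buys can't have
            return filled;
        }
        restingQty += qty;             // a sell adds liquidity for subsequent buys
        return 0;
    }

    public static void main(String[] args) {
        SequentialMatcher book = new SequentialMatcher();
        Queue<long[]> feed = new ArrayDeque<>();
        feed.add(new long[]{100, 0});  // sell 100
        feed.add(new long[]{60, 1});   // buy 60 -> fills 60
        feed.add(new long[]{60, 1});   // buy 60 -> fills only 40; swap the two buys and the fills differ
        while (!feed.isEmpty()) {
            long[] o = feed.poll();
            System.out.println("filled " + book.apply(o[0], o[1] == 1));
        }
    }
}
```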
That’s half of it; the other half is that each of those events is extremely simple, so the amount of computation is viable on a single thread.
If individual threads were dramatically slower, the architecture would get unpleasant by necessity. Consider the abomination that is out-of-order execution on a modern CPU.
Sorry, I should have been more specific. I meant DAWs and computer music environments, not simple audio players. Modern DAWs are heavily multi-threaded.
I am curious about that. 500 million events per second sounds high, even for games. That many calculations? Sure. I take "events" to mean user generated, though, and that sounds high.
Same for searches. The difficulty there is the size of the search space, not the rate of searches coming in. Right?
Google search isn't a good example. AAA games are a great example when you think about graphics. However, most of that is trivially parallelizable, thus "all you need to do" is assign vertices/pixels to different threads (in quotation marks as that's of course not trivial by itself, but a different kind of engineering problem).
However, once you get into simulations you have billions of elements (or multiple orders of magnitude more) interacting with each other. When you simulate a wave, every element depends on its neighbors, and the finer the granularity the more accurate your simulation (in theory at least).
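For a concrete feel of that neighbor dependence, here is a toy 1-D wave-equation step (the grid size and stability factor are made up): every cell's next value reads its two neighbors, which is why such domains are usually decomposed across threads or nodes with some exchange at the seams.

```java
// Toy 1-D wave stencil: next[i] depends on curr[i-1], curr[i], curr[i+1] and prev[i].
class WaveStencil {
    public static void main(String[] args) {
        int n = 16;
        double c = 0.5;                        // Courant-like factor, assumed < 1 for stability
        double[] prev = new double[n], curr = new double[n], next = new double[n];
        curr[n / 2] = 1.0;                     // initial bump in the middle

        for (int step = 0; step < 10; step++) {
            for (int i = 1; i < n - 1; i++) {
                // Each cell reads both of its neighbors from the previous time step.
                next[i] = 2 * curr[i] - prev[i]
                        + c * c * (curr[i - 1] - 2 * curr[i] + curr[i + 1]);
            }
            double[] tmp = prev; prev = curr; curr = next; next = tmp;  // rotate buffers
        }
        System.out.println(java.util.Arrays.toString(curr));
    }
}
```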
Thinking of graphics, though, I would assume most of that is on the GPU side. Simulations do make sense, but I see games like Factorio still focused on single thread first, and then looking for natural parallel segments.
That is all to say that millions of events still feels like a lot. I am not shocked to know it can and does happen.
There are no good solutions for something like Factorio. There are solutions that work, but they aren't worth the trouble. My personal recommendation is to split the world into independent chunks, i.e. parallelize disconnected subgraphs. A big interconnected Factorio map is a nightmare scenario because there is hardly anywhere you can neatly split things up. Just one conveyor belt across the cut and you lose.
So the game would have to be programmed so that conveyor belts and train tracks can be placed at region boundaries, with a hidden buffer to teleport things between regions. Then you need an algorithm to divide the graph that both minimizes the imbalance in the number of nodes per subgraph and minimizes the edges between subgraphs.
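A rough sketch of that hidden-buffer idea (all types here are invented, nothing to do with Factorio's actual internals): each chunk ticks independently, and only the boundary queue needs any coordination between them.

```java
// Two chunks tick independently; items crossing the boundary go through a handoff queue.
import java.util.ArrayDeque;
import java.util.Deque;

class ChunkedBelts {
    static class Chunk {
        final Deque<Integer> belt = new ArrayDeque<>();
        // Advance the belt one tick; anything falling off the end goes to the handoff queue.
        void tick(Deque<Integer> boundaryOut) {
            if (!belt.isEmpty()) boundaryOut.add(belt.pollFirst());
        }
    }

    public static void main(String[] args) {
        Chunk west = new Chunk(), east = new Chunk();
        Deque<Integer> boundary = new ArrayDeque<>();    // hidden buffer between regions
        west.belt.add(1); west.belt.add(2);

        for (int tick = 0; tick < 3; tick++) {
            // Each chunk's tick could run on its own thread; only the boundary needs coordination.
            west.tick(boundary);
            east.tick(new ArrayDeque<>());               // east dumps off the map edge in this toy
            // After both chunks finish, drain the boundary into the neighboring chunk.
            while (!boundary.isEmpty()) east.belt.addLast(boundary.pollFirst());
            System.out.println("tick " + tick + " east belt: " + east.belt);
        }
    }
}
```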
Just dreaming over here, but if someone had the opportunity to rebuild a Factorio from the ground up, I bet they could design something massively scalable. Something based on cellular automata, like how water flow works in Minecraft: current cell state = f(prev cell state, neighboring cell states).
It would take some careful work to ensure that items didn't get duplicated or lost at junctions, and a back-pressure system for conveyor belt queues. Electrical signals would be transmitted at some speed limit.
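A minimal sketch of that update rule (the flow rule and belt capacity are invented, just to show the shape): because each cell's next state reads only the previous grid, rows can be updated independently, e.g. with a parallel stream.

```java
// Cellular-automaton-style belt update: next state is a pure function of the previous grid.
import java.util.stream.IntStream;

class CellularBelt {
    public static void main(String[] args) {
        int n = 8;
        int[][] prev = new int[n][n];
        int[][] next = new int[n][n];
        prev[4][4] = 3;                               // a few items somewhere on the grid

        // Rows are independent because every cell reads only prev; boundary cells are left fixed.
        IntStream.range(1, n - 1).parallel().forEach(y -> {
            for (int x = 1; x < n - 1; x++) {
                // Stand-in rule: items flow east, capped at a belt capacity of 4 (back-pressure).
                int incoming = Math.min(prev[y][x - 1], 4 - prev[y][x]);
                int outgoing = Math.min(prev[y][x], 4 - prev[y][x + 1]);
                next[y][x] = prev[y][x] + incoming - outgoing;
            }
        });
        System.out.println(java.util.Arrays.deepToString(next));
    }
}
```

Because a cell's incoming amount is computed with the same formula as its west neighbor's outgoing amount, interior cells neither duplicate nor lose items; the junction and boundary cases are exactly the "careful work" mentioned above.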
This is a wrong view of the problem. Often your application has to be distributed for reasons other than speed: there are only so many PCIe devices you can connect to a single CPU, only so many CPU sockets you can put on a single PCB, and so on.
In large systems, parallel/concurrent applications are the baseline. If you have to replicate your data as it's being generated into a geographically separate location, there's no way you can do it in a single thread...
As far as I know, the LMAX disruptor is a kind of queue/buffer used to send data from one thread/task to another.
Typically, some of the tasks run on different cores. The LMAX disruptor is designed so that there is no huge delay due to cache coherency: it is slow to sync the cache of one core with the cache of another when both cores write to the same address in RAM, so the disruptor is arranged so that each memory location is (mostly) written to by at most one thread/core.
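A minimal single-producer/single-consumer sketch in that spirit (this is not the actual LMAX Disruptor API, just the single-writer idea): slots are written by the producer only, and the two threads coordinate through sequence counters, so each cache line mostly stays owned by one core.

```java
// Tiny SPSC ring buffer: producer is the only writer of slots, threads sync via two sequences.
import java.util.concurrent.atomic.AtomicLong;

class TinyRing {
    private final long[] slots = new long[1024];            // power-of-two size for cheap masking
    private final AtomicLong published = new AtomicLong(-1); // last slot the producer finished
    private final AtomicLong consumed = new AtomicLong(-1);  // last slot the consumer finished

    void publish(long value) {
        long seq = published.get() + 1;
        while (seq - consumed.get() > slots.length) Thread.onSpinWait(); // wait for free space
        slots[(int) (seq & (slots.length - 1))] = value;     // only the producer writes slots
        published.set(seq);                                  // volatile write makes it visible
    }

    long take() {
        long seq = consumed.get() + 1;
        while (published.get() < seq) Thread.onSpinWait();   // wait for data
        long value = slots[(int) (seq & (slots.length - 1))];
        consumed.set(seq);
        return value;
    }

    public static void main(String[] args) throws InterruptedException {
        TinyRing ring = new TinyRing();
        Thread consumer = new Thread(() -> {
            for (int i = 0; i < 5; i++) System.out.println("got " + ring.take());
        });
        consumer.start();
        for (int i = 0; i < 5; i++) ring.publish(i);
        consumer.join();
    }
}
```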
How is the LMAX disruptor relevant for programs with 1 core?
> How is the LMAX disruptor relevant for programs with 1 core?
It is not relevant outside the problem area of needing to communicate between threads. The #1 case I use it for is MPSC where I have something like an AspNetCore/TCP frontend and a custom database / event processor / rules engine that it needs to talk to.
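The shape of that MPSC hand-off, sketched here with a plain blocking queue rather than the Disruptor (all names are placeholders): many handler threads enqueue commands, and one engine thread owns all the state and applies them in the order it dequeues them.

```java
// MPSC hand-off: many producers, one consumer that is the sole owner of the engine state.
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class MpscFrontend {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> commands = new ArrayBlockingQueue<>(1024);

        // Single consumer: the only thread that touches the engine's state.
        Thread engine = new Thread(() -> {
            long applied = 0;
            while (true) {
                try {
                    String cmd = commands.take();
                    applied++;                       // apply cmd to in-memory state here
                    System.out.println("applied " + cmd + " (" + applied + ")");
                } catch (InterruptedException e) { return; }
            }
        });
        engine.start();

        // Many producers: stand-ins for request-handler threads on the frontend.
        for (int p = 0; p < 4; p++) {
            final int id = p;
            new Thread(() -> {
                try { commands.put("order-from-handler-" + id); }
                catch (InterruptedException ignored) {}
            }).start();
        }

        Thread.sleep(200);
        engine.interrupt();
    }
}
```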