Number of connections, message size, and frequency of messages sent are the three main parameters for measuring performance, and for some frameworks, like Engine.IO, the number of connections seems to have the biggest impact (https://medium.com/node-js-javascript/b63bfca0539). It would be good to see these benchmarks with a much higher number of connections, since non-blocking IO is usually why people choose the Node.js platform in the first place.
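For concreteness, here is a minimal load-generator sketch that exposes those three knobs (assuming the "ws" npm package; the URL and the numbers are placeholders, not the article's actual setup):

    // Hypothetical load generator; the three knobs discussed above.
    const WebSocket = require('ws');

    const CONNECTIONS = 10000;   // scale this up to stress non-blocking IO
    const MSG_SIZE = 128;        // bytes per message
    const MSGS_PER_SEC = 10;     // per connection

    const payload = 'x'.repeat(MSG_SIZE);
    for (let i = 0; i < CONNECTIONS; i++) {
      const ws = new WebSocket('ws://localhost:8000/');
      ws.on('open', () => {
        setInterval(() => ws.send(payload), 1000 / MSGS_PER_SEC);
      });
      ws.on('error', () => {}); // ignore connect failures in this sketch
    }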
Is 100K mps on 8 cores considered high for node/websockets microbenchmarking of the socket path?
That doesn't seem like much from past experience writing high-throughput messaging code, and all this is doing is spitting out length-framed messages to a socket.
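By length-framed I mean something like the following sketch (the 4-byte prefix is my assumption; the benchmarked code may frame differently):

    // Length-prefix framing: a 4-byte big-endian length header
    // followed by the payload, written straight to a socket.
    function writeFramed(socket, payload) {
      const header = Buffer.alloc(4);
      header.writeUInt32BE(payload.length, 0);
      socket.write(Buffer.concat([header, payload]));
    }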
We do not (yet) measure the performance of WebSocket in our project (the TechEmpower Framework Benchmarks), but our "Plaintext" test is a rough analogue of a ping-pong test over WebSocket. Our Plaintext test uses HTTP pipelining on a keep-alive connection. However, in our case, each request sends a couple hundred bytes of HTTP request headers and receives about the same in response headers prior to a "Hello world" payload.
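Pipelining in this sense just means writing several requests back-to-back on one connection before reading any response; a rough sketch over a raw socket (host, port, and pipeline depth are placeholders):

    // HTTP pipelining on a keep-alive connection: many requests are
    // written before any response arrives, amortizing the round trip.
    const net = require('net');

    const req = 'GET /plaintext HTTP/1.1\r\nHost: localhost\r\n\r\n';
    const sock = net.connect(8080, 'localhost', () => {
      sock.write(req.repeat(16)); // pipeline depth of 16
    });
    sock.on('data', (chunk) => { /* parse the pipelined responses here */ });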
We see approximately 600,000 of these HTTP "messages" per second on an i7-2600K workstation (8 HT cores) [1] from top performers such as Netty and Undertow, and those top performers are network-limited by our gigabit Ethernet.
We are presently using Undertow for a WebSocket project, and its performance there has been very good.
That's what I was thinking, too. That's merely around 13,000 messages per second per core (100,000 / 8 ≈ 12,500). Rates like that weren't all that impressive on low-end server hardware a decade ago, so I'd hope they're even less impressive today on more modern hardware (or even VMs).
That's raw network (TCP/UDP) packets, not WebSocket frames/messages. Plus that product is a solution to a hardware problem, not a software problem. It doesn't relate at all.
I'd be interested in seeing how difficult it would be to modify this to use multiple communication servers (Redis, Cassandra, etc.) so that it can be scaled across multiple instances; a rough Redis pub/sub sketch follows below.
Would also be interested in seeing how many connections it could handle at, say, 5-10 messages per second each.
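For the Redis case, a minimal sketch of cross-instance fan-out via pub/sub (assuming the classic "redis" npm client API; the channel name and the localClients set are hypothetical):

    // Each server instance relays locally received messages through
    // Redis so clients connected to other instances also see them.
    const redis = require('redis');
    const sub = redis.createClient();
    const pub = redis.createClient();

    const localClients = new Set(); // hypothetical: filled as WebSocket clients connect

    sub.subscribe('broadcast');
    sub.on('message', (channel, message) => {
      // deliver to this instance's local WebSocket clients
      localClients.forEach((ws) => ws.send(message));
    });

    function broadcast(message) {
      pub.publish('broadcast', message); // reaches every instance
    }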
Why does each worker need a separate store process? It seems that on an 8-core machine the max worker count can only be 3 (1 master, 3 workers, 3 stores). If workers had in-memory stores, or at least connected to a shared Redis server, performance should increase with 4 more workers.
They don't. You can have fewer stores than workers. In the benchmark, we could in fact have done with very few stores because they are not really used. I'm sure you could fiddle with the worker, load balancer, and store counts to get better performance (it depends on the system's requirements).
A single Node process can't use more than one core, so they spawn n processes for n cores. With a multithreaded runtime, this wouldn't be required.
Yes, exactly. My suggestion was for the application to fork (CPUs - 1) workers, i.e. 1 master and 7 workers instead of 1 master, 3 workers, and 3 stores, and have the workers manage their own key-value stores. Apparently each worker doesn't need a store; see the author's comment (https://news.ycombinator.com/item?id=7713561), so it looks doable.
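A minimal sketch of that topology, assuming Node's cluster module (the port and the store shape are placeholders):

    // 1 master plus (CPUs - 1) workers, each worker holding its own
    // in-memory store instead of a separate store process.
    const cluster = require('cluster');
    const os = require('os');

    if (cluster.isMaster) {
      for (let i = 0; i < os.cpus().length - 1; i++) {
        cluster.fork(); // e.g. 7 workers on an 8-core box
      }
    } else {
      const store = new Map(); // per-worker in-memory key-value store
      require('http').createServer((req, res) => {
        // ...serve requests using `store`...
        res.end('ok');
      }).listen(8000);
    }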
"The test was only set to reach up to 100 concurrent connections (each sending 1000 messages per second) - Total of 100K messages per second."
So they had only 100 concurrent connections.