Is a single CPU core able to process a 4k/50fps video stream? Or is there no need for any processing other than encapsulating it into packets to send to the network card?
Yes: assuming 4:2:0, that's just under 5 cycles per byte at 3 GHz, which is enough for simple processing (about 2.5 cycles per byte for 4:4:4). But they're using the other cores for processing and just one for network handling.
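Rough arithmetic behind those figures (a quick sketch, assuming 3840x2160 at 8 bits per sample and ignoring blanking and packet overhead):

    #include <stdio.h>

    int main(void) {
        const double pixels = 3840.0 * 2160.0, fps = 50.0, clock_hz = 3e9;

        /* 4:2:0 carries 1.5 bytes per pixel, 4:4:4 carries 3 (8-bit samples). */
        double rate_420 = pixels * 1.5 * fps;   /* ~622 MB/s  */
        double rate_444 = pixels * 3.0 * fps;   /* ~1.24 GB/s */

        printf("4:2:0: %.0f MB/s -> %.2f cycles/byte\n", rate_420 / 1e6, clock_hz / rate_420);
        printf("4:4:4: %.0f MB/s -> %.2f cycles/byte\n", rate_444 / 1e6, clock_hz / rate_444);
        return 0;
    }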
No. He says he can only spare one core for networking, so he used network cards over a 1-to-1 physical cable, something that would have been much easier with a dedicated PCIe card.
Other than that, they're just bypassing TCP, ARP, Ethernet, etc. Basically one PC has a kernel driver that says "every one written to this file descriptor raises the voltage on this cable", and the other PC has a driver that says "every voltage rise on this cable writes a one to this file". Then they add some rudimentary sync logic for the timings, maybe just a known initial handshake that both sides expect, like a modem handshake.
I wonder if there's already a well-known project or Linux kernel driver for this dumbed-down network-as-fast-interface approach, or if they're writing their own. The article is really thin on detail.
They already have a dedicated PCIe card -- it's the standard 10GigE NIC they're using.
What they're doing is bypassing the kernel overhead of header parsing, demultiplexing and copying to user space. Instead, the network card's ring buffers are mapped directly into the user processes' address space.
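In very rough terms, the userspace side of such a mapping looks something like this (a hypothetical sketch, not their actual code: the device node name and descriptor layout are made up, and real kernel-bypass frameworks hide this behind their own APIs):

    /* Hypothetical sketch: map a NIC RX descriptor ring into userspace and
     * busy-poll it, instead of read()ing packets through the kernel.
     * "/dev/fastnic0" and struct rx_desc are illustrative, not a real driver. */
    #include <fcntl.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/mman.h>

    #define RING_SLOTS 512

    struct rx_desc {                 /* made-up descriptor layout */
        volatile uint32_t ready;     /* set by the NIC when a packet has landed */
        uint32_t len;
        uint8_t  data[2048];
    };

    static void process_packet(const uint8_t *pkt, uint32_t len) {
        (void)pkt;                   /* stand-in for the real per-packet work */
        fprintf(stderr, "got %u bytes\n", len);
    }

    int main(void) {
        int fd = open("/dev/fastnic0", O_RDWR);
        if (fd < 0) { perror("open"); return 1; }

        struct rx_desc *ring = mmap(NULL, RING_SLOTS * sizeof *ring,
                                    PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (ring == MAP_FAILED) { perror("mmap"); return 1; }

        /* Poll the ring directly: no per-packet syscalls, no copies through
         * kernel socket buffers. */
        for (unsigned slot = 0; ; slot = (slot + 1) % RING_SLOTS) {
            while (!ring[slot].ready)
                ;                            /* spin until the NIC fills the slot */
            process_packet(ring[slot].data, ring[slot].len);
            ring[slot].ready = 0;            /* hand the slot back to the NIC */
        }
    }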
The application is almost certainly still talking TCP/IP (or maybe UDP), and there are no changes at the physical layer at all. It isn't a case of a file descriptor being hooked up to generate voltages on a cable at all -- in fact, the overhead of read()/write() calls on a file descriptor is one of the things they're trying specifically to avoid!
http://dpdk.org/ is an open-source library to implement this kind of thing, mostly aimed at Intel NICs.
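To give a flavour: DPDK's receive path is a poll-mode loop that burst-reads packets straight out of the NIC's ring into mbufs, with no interrupts or syscalls per packet. Something like the sketch below (heavily abbreviated; the mempool creation and port/queue setup a real application needs are omitted, and it assumes port 0 has already been configured and started):

    #include <rte_eal.h>
    #include <rte_ethdev.h>
    #include <rte_mbuf.h>

    #define BURST 32

    int main(int argc, char **argv) {
        if (rte_eal_init(argc, argv) < 0)
            return 1;                       /* EAL init failed */

        const uint16_t port = 0;            /* assumes port 0 is configured and started */
        struct rte_mbuf *bufs[BURST];

        for (;;) {
            /* Pull up to BURST packets directly from the NIC's RX ring. */
            uint16_t n = rte_eth_rx_burst(port, 0, bufs, BURST);
            for (uint16_t i = 0; i < n; i++) {
                /* Pointer straight into the DMA'd packet data. */
                char *pkt = rte_pktmbuf_mtod(bufs[i], char *);
                (void)pkt;                  /* ...parse/process the frame here... */
                rte_pktmbuf_free(bufs[i]);
            }
        }
    }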
From the description, they use "proper" packets. Probably not TCP, but UDP or their own protocol would still let them use a lot of existing tech instead of special-purpose devices, which I guess is the reason they don't use "dedicated PCIe cards" with special cabling (if I understand correctly what you mean).
What do you mean? There's still a network card at both ends of a connection, which handles physical layer tasks like voltages or timings. A kernel can't perform those tasks.
Clearly not, which is exactly why kernel bypass is such a huge red flag: it's only possible (rather, useful) if you aren't doing anything sensible with the data anyway, or the kernel overhead would be tiny compared to the processing.
Use the right tool for the job and don't funnel network data through your instruction pipeline. When we realized that for memory, we called it "DMA", and when graphics was scaling up, we created the GPU.
The networks used by supercomputers have had kernel bypass with DMA for more than a decade... and they do a lot of processing on the data, too. Check out InfiniBand and Intel's Omni-Path for modern examples. Or the Cray T3E (1995), which had an excellent network with a user-level DMA engine that only did 8- or 64-byte transfers.
Are you kidding me? How could not bypassing the kernel POSSIBLY be better here? It's infinitely harder for kernel devs to optimize these scenarios than for userland devs.