I'd be surprised if any of this mattered for them, since their workload (at least the "copy movie files from disk to network" part of it) is embarrassingly parallel.
Unless they're really squeezed on power or rack space budget, I would imagine they'd do just fine being a generation back from the bleeding edge.
They're already doing 400Gbps per box [1], and they have an 800Gbps network card waiting to be tested. With PCIe 6.0 they could do 1.6Tbps, assuming they somehow figure out how to overcome the CPU and memory bottlenecks. Sapphire Rapids with HBM would give 2TB/s :)
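For a rough sense of where those per-box numbers come from, here's some back-of-envelope per-slot math (approximate per-lane rates, ignoring protocol overhead; the doubling per generation is the point, not the exact figures):

    #include <stdio.h>

    int main(void) {
        /* Approximate usable throughput per PCIe lane, in Gbit/s:
         * gen3 ~8, gen4 ~16, gen5 ~32, gen6 ~64 (each generation doubles). */
        const char  *gen[]  = { "PCIe 3.0", "PCIe 4.0", "PCIe 5.0", "PCIe 6.0" };
        const double lane[] = { 8.0, 16.0, 32.0, 64.0 };

        for (int i = 0; i < 4; i++)
            printf("%s x16 slot: ~%4.0f Gbit/s\n", gen[i], lane[i] * 16);

        /* So ~1.6 Tbit/s of NIC bandwidth is roughly two PCIe 6.0 x16 slots'
         * worth -- feasible on the bus, long before the RAM/CPU limits are
         * solved. */
        return 0;
    }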
>Unless they're really squeezed on power or rack space budget
I think this is the case for their Open Connect appliances (or whatever they call them). They want to maximize throughput on a single device so they don't have to colocate as much equipment.
The headline number is network I/O, but IIRC the bottleneck is really in and around system RAM. Since most of the content isn't in cache, and neither the disks nor the network cards have sufficient buffering, you have to have the disk write to system RAM and indicate readiness, then have the network card read from system RAM. NUMA bandwidth and latency can be an issue there too.
But I don't think there was room on the PCIe lanes to do something like putting a GPU in to use as a DMA buffer to get more RAM bandwidth.
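For context, the path being described is basically what sendfile(2) gives you: the CPU never copies the payload, but the data still has to be written into RAM by the disk and read back out by the NIC. A minimal sketch with the Linux signature (FreeBSD's sendfile(2), which Netflix actually uses, takes different arguments; the connected socket is assumed to be set up elsewhere):

    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Stream a file to an already-connected TCP socket with sendfile().
     * The kernel hands page-cache pages straight to the NIC: no userspace
     * copy, but RAM is still written once by the disk and read once by
     * the network card. */
    ssize_t serve_file(int sock, const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        struct stat st;
        if (fstat(fd, &st) < 0) {
            close(fd);
            return -1;
        }

        off_t off = 0;
        while (off < st.st_size) {
            ssize_t n = sendfile(sock, fd, &off, (size_t)(st.st_size - off));
            if (n <= 0)
                break;          /* error or peer closed the socket */
        }

        close(fd);
        return off;             /* bytes actually handed to the kernel */
    }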
I think the next Epyc generation should have PCIe 5 and DDR5, both of which should help.
If an NVMe drive supports the Controller Memory Buffer (CMB) feature, an RNIC can do a peer-to-peer transfer.
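If you want to check whether a given drive even advertises a CMB, the CMBLOC/CMBSZ controller registers (offsets 0x38/0x3C in the NVMe register map) are readable from BAR0. A rough Linux sketch: the PCI address is a placeholder, it needs root, and on NVMe 1.4+ controllers the CMB also has to be enabled via CMBMSC before these registers report anything useful:

    #include <stdio.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Placeholder PCI address of an NVMe controller. */
        const char *bar0 = "/sys/bus/pci/devices/0000:01:00.0/resource0";

        int fd = open(bar0, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* Map the first page of the controller register space. */
        void *regs = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
        if (regs == MAP_FAILED) { perror("mmap"); return 1; }

        uint32_t cmbloc = *(volatile uint32_t *)((uint8_t *)regs + 0x38);
        uint32_t cmbsz  = *(volatile uint32_t *)((uint8_t *)regs + 0x3c);

        printf("CMBLOC=0x%08x CMBSZ=0x%08x -> CMB %s\n",
               cmbloc, cmbsz, cmbsz ? "advertised" : "not advertised");

        munmap(regs, 4096);
        close(fd);
        return 0;
    }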
From what I recall of the Netflix storage node that was linked from HN a few months back, the current generation has 4 x 100 Gb Mellanox Ethernet ports (ConnectX-6, PCIe Gen 4) and somewhere around 20 to 30 PCIe Gen 3 NVMe drives.
Assuming they can figure out how to do peer-to-peer transfers, scaling up by a factor of 4 doesn't seem implausible.
There are a lot of disks in a Netflix content appliance. The latest slide says 18 drives, each with PCIe 3.0 x4. There are 4 PCIe 4.0 x16 NICs, and that accounts for all your PCIe lanes. You could get PCIe 4.0 drives, but they're not the bottleneck at the moment.
But RAM needs at least twice the bandwidth of your network, because you can't have the NIC read from the disk directly: the disk has to DMA to RAM and the NIC has to DMA from RAM, and (normal system) RAM isn't dual-ported, so reads and writes contend. If you need to do TLS in software, you touch RAM 4 times (the disk's DMA write into RAM, a CPU read, a CPU write, and the NIC's read), so RAM bandwidth is an even bigger bottleneck.
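Putting rough numbers on that (back-of-envelope, using 400 Gb/s as the target and the lane counts from above; the 128-lane figure assumes a single-socket EPYC):

    #include <stdio.h>

    int main(void)
    {
        /* PCIe lane budget: 18 drives at x4 plus 4 NICs at x16. */
        int lanes = 18 * 4 + 4 * 16;
        printf("lanes consumed: %d (vs ~128 on a single-socket EPYC)\n", lanes);

        /* RAM traffic per byte served:
         *   2x with sendfile + NIC TLS offload (disk DMA in, NIC DMA out)
         *   4x with TLS done in software (disk DMA in, CPU read, CPU write,
         *      NIC read) */
        double net_GBps = 400.0 / 8.0;   /* 400 Gbit/s = 50 GB/s of payload */
        printf("RAM traffic, offloaded TLS: ~%.0f GB/s\n", 2 * net_GBps);
        printf("RAM traffic, software TLS:  ~%.0f GB/s\n", 4 * net_GBps);
        return 0;
    }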
I thought that with things like DirectStorage (the equivalent of GPUDirect) and SPDK this wasn't the case any more (no more CPU intervention, since every device has its own programmable DMA engine?)