I'd be surprised if any of this mattered for them, since their workload (at least the "copy movie files from disk to network" part of it) is embarrassingly parallel.
Unless they're really squeezed on power or rack space budget, I would imagine they'd do just fine being a generation back from the bleeding edge.
They're already doing 400Gbps per box [1], and they have an 800Gbps network card waiting to be tested. With PCIe 6.0 they could do 1.6Tbps, assuming they somehow figure out how to overcome the CPU and memory bottlenecks. Sapphire Rapids with HBM would give 2TB/s :)
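For a rough sense of where those per-box numbers come from, here's some back-of-envelope per-slot math (approximate per-lane rates, ignoring protocol overhead; the doubling per generation is the point, not the exact figures):

    #include <stdio.h>

    int main(void) {
        /* Approximate usable throughput per PCIe lane, in Gbit/s:
         * gen3 ~8, gen4 ~16, gen5 ~32, gen6 ~64 (each generation doubles). */
        const char  *gen[]  = { "PCIe 3.0", "PCIe 4.0", "PCIe 5.0", "PCIe 6.0" };
        const double lane[] = { 8.0, 16.0, 32.0, 64.0 };

        for (int i = 0; i < 4; i++)
            printf("%s x16 slot: ~%4.0f Gbit/s\n", gen[i], lane[i] * 16);

        /* So ~1.6 Tbit/s of NIC bandwidth is roughly two PCIe 6.0 x16 slots'
         * worth -- feasible on the bus, long before the RAM/CPU limits are
         * solved. */
        return 0;
    }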
>Unless they're really squeezed on power or rack space budget
I think this is the case for their Open Connect appliances (or whatever they call them). They want to maximize throughput on a single device so they don't have to colocate as much equipment.
The headline number is network I/O, but IIRC the bottleneck is really in and around system RAM. Since most of the content isn't in cache, and neither the disks nor the network cards have sufficient buffering, you have to have the disk write to system RAM and indicate readiness, then have the network card read from system RAM. NUMA bandwidth and latency can be an issue there too.
But I don't think there was room on the PCIe lanes to do something like putting a GPU in to use as a DMA buffer to get more RAM bandwidth.
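For context, the path being described is basically what sendfile(2) gives you: the CPU never copies the payload, but the data still has to be written into RAM by the disk and read back out by the NIC. A minimal sketch with the Linux signature (FreeBSD's sendfile(2), which Netflix actually uses, takes different arguments; the connected socket is assumed to be set up elsewhere):

    #include <sys/sendfile.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Stream a file to an already-connected TCP socket with sendfile().
     * The kernel hands page-cache pages straight to the NIC: no userspace
     * copy, but RAM is still written once by the disk and read once by
     * the network card. */
    ssize_t serve_file(int sock, const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1;

        struct stat st;
        if (fstat(fd, &st) < 0) {
            close(fd);
            return -1;
        }

        off_t off = 0;
        while (off < st.st_size) {
            ssize_t n = sendfile(sock, fd, &off, (size_t)(st.st_size - off));
            if (n <= 0)
                break;          /* error or peer closed the socket */
        }

        close(fd);
        return off;             /* bytes actually handed to the kernel */
    }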
I think the next Epyc generation should have PCIe 5 and DDR5, both of which should help.
If an NVMe drive supports the Controller Memory Buffer (CMB) feature, an RNIC can do a peer-to-peer transfer.
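If you want to check whether a given drive even advertises a CMB, the CMBLOC/CMBSZ controller registers (offsets 0x38/0x3C in the NVMe register map) are readable from BAR0. A rough Linux sketch: the PCI address is a placeholder, it needs root, and on NVMe 1.4+ controllers the CMB also has to be enabled via CMBMSC before these registers report anything useful:

    #include <stdio.h>
    #include <stdint.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/mman.h>

    int main(void)
    {
        /* Placeholder PCI address of an NVMe controller. */
        const char *bar0 = "/sys/bus/pci/devices/0000:01:00.0/resource0";

        int fd = open(bar0, O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        /* Map the first page of the controller register space. */
        void *regs = mmap(NULL, 4096, PROT_READ, MAP_SHARED, fd, 0);
        if (regs == MAP_FAILED) { perror("mmap"); return 1; }

        uint32_t cmbloc = *(volatile uint32_t *)((uint8_t *)regs + 0x38);
        uint32_t cmbsz  = *(volatile uint32_t *)((uint8_t *)regs + 0x3c);

        printf("CMBLOC=0x%08x CMBSZ=0x%08x -> CMB %s\n",
               cmbloc, cmbsz, cmbsz ? "advertised" : "not advertised");

        munmap(regs, 4096);
        close(fd);
        return 0;
    }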
From what I recall of the Netflix storage node that was linked from HN a few months back, the current generation has 4 x 100 Gb Mellanox Ethernet ports (ConnectX-6, PCIe Gen 4) and somewhere around 20 to 30 PCIe Gen 3 NVMe drives.
Assuming they can figure out how to do peer-to-peer transfers, scaling up by a factor of 4 doesn't seem implausible.
There are a lot of disks in a Netflix content appliance. The latest slide says 18 drives, each with PCIe 3.0 x4. There are 4 PCIe 4.0 x16 NICs, and that accounts for all your PCIe lanes. You could get PCIe 4.0 drives, but they're not the bottleneck at the moment.
But RAM needs at least twice the bandwidth of your network, because you can't have the NIC read from the disk directly: the disk has to DMA to RAM and the NIC has to DMA from RAM, and (normal system) RAM isn't dual-ported, so reads and writes contend. If you need to do TLS in software, you touch RAM 4 times (the disk's DMA write into RAM, a CPU read, a CPU write, and the NIC's read), so RAM bandwidth is an even bigger bottleneck.
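Putting rough numbers on that (back-of-envelope, using 400 Gb/s as the target and the lane counts from above; the 128-lane figure assumes a single-socket EPYC):

    #include <stdio.h>

    int main(void)
    {
        /* PCIe lane budget: 18 drives at x4 plus 4 NICs at x16. */
        int lanes = 18 * 4 + 4 * 16;
        printf("lanes consumed: %d (vs ~128 on a single-socket EPYC)\n", lanes);

        /* RAM traffic per byte served:
         *   2x with sendfile + NIC TLS offload (disk DMA in, NIC DMA out)
         *   4x with TLS done in software (disk DMA in, CPU read, CPU write,
         *      NIC read) */
        double net_GBps = 400.0 / 8.0;   /* 400 Gbit/s = 50 GB/s of payload */
        printf("RAM traffic, offloaded TLS: ~%.0f GB/s\n", 2 * net_GBps);
        printf("RAM traffic, software TLS:  ~%.0f GB/s\n", 4 * net_GBps);
        return 0;
    }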
I thought that with things like DirectStorage (the equivalent of GPUDirect) and SPDK this wasn't the case any more (no more CPU intervention, since every device has its own programmable DMA engine?)