
Kernels and hardware have been optimized for TCP. QUIC will catch up eventually.



AFAIK, kTLS and hardware TLS offload don't solve the latency problem anyway. They only handle the AEAD and record encapsulation/decapsulation in the critical path, where maximizing throughput is the concern. Control messages are not handled, so session establishment (ClientHello, cipher negotiation, key exchange, etc.) is still done entirely in userspace, and the handshake is where the latency issues arise.
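
To make that split concrete, here is a minimal sketch of the Linux kTLS TX setup, assuming the handshake has already completed in userspace (e.g. in OpenSSL) and produced TLS 1.2 AES-128-GCM session secrets; the key/iv/salt/rec_seq parameters and the function name are illustrative, not taken from any particular library:

    /* Handshake (ClientHello, cipher/key negotiation) stays in userspace;
       only record framing + AEAD after this point moves into the kernel
       (or onto the NIC, if it supports TLS offload). */
    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <linux/tls.h>

    int enable_ktls_tx(int sock,
                       const unsigned char *key,      /* 16 bytes */
                       const unsigned char *iv,       /*  8 bytes */
                       const unsigned char *salt,     /*  4 bytes */
                       const unsigned char *rec_seq)  /*  8 bytes */
    {
        struct tls12_crypto_info_aes_gcm_128 ci;

        memset(&ci, 0, sizeof(ci));
        ci.info.version     = TLS_1_2_VERSION;
        ci.info.cipher_type = TLS_CIPHER_AES_GCM_128;
        memcpy(ci.key,     key,     TLS_CIPHER_AES_GCM_128_KEY_SIZE);
        memcpy(ci.iv,      iv,      TLS_CIPHER_AES_GCM_128_IV_SIZE);
        memcpy(ci.salt,    salt,    TLS_CIPHER_AES_GCM_128_SALT_SIZE);
        memcpy(ci.rec_seq, rec_seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);

        /* Attach the "tls" ULP, then hand the TX keys to the kernel;
           every send()/sendfile() after this emits TLS records. */
        if (setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls")) < 0)
            return -1;
        if (setsockopt(sock, SOL_TLS, TLS_TX, &ci, sizeof(ci)) < 0)
            return -1;
        return 0;
    }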


Oh yeah, QUIC is an improvement in terms of latency and even throughput over TCP/TLS. The claim that "QUIC is horribly inefficient" centers on CPU utilization and the cost of delivering each byte. That's where hardware offload shines, but it doesn't exist for QUIC yet.


The only reason QUIC delivers throughput improvements over TCP is that it typically ships with BBR-style congestion control, so occasional packet loss doesn't kneecap the connection. If you use the BBR (or RACK) TCP stack on FreeBSD, you'll see the same throughput improvement over the older loss-based TCP congestion control.
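
For reference, picking the congestion control per-socket is a one-liner; this sketch uses the Linux TCP_CONGESTION socket option (FreeBSD instead selects the RACK/BBR TCP stacks via the net.inet.tcp.functions_default sysctl or a per-socket option, but the effect is the same) and assumes a BBR module is actually available on the box:

    #include <string.h>
    #include <sys/socket.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>

    /* Ask the kernel to run BBR on this connection instead of the
       default loss-based algorithm (e.g. cubic). */
    int use_bbr(int sock)
    {
        const char cc[] = "bbr";
        return setsockopt(sock, IPPROTO_TCP, TCP_CONGESTION, cc, strlen(cc));
    }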

The same server that will do 375Gb/s at close to 50% idle will maybe deliver 70-90Gb/s of QUIC with the CPU maxed.

TLS+TCP delivers that performance with a lot of optimizations that just don't exist for QUIC:

- inline TLS offload (Mellanox CX6DX)

- async sendfile -- note, the above two mean that the kernel never even maps the data of a file being sent to a client into memory, much less copies it to/from userspace the way QUIC has to. This cuts memory bandwidth to roughly 1/4 of what it would be with a traditional read/encrypt/write-to-socket server (see the sketch of the two data paths below).

- TCP segmentation offload

- TCP & IP checksum offload

- TCP large receive offload

Some of these optimizations are slowly becoming available for QUIC, but until all of them are, QUIC will cost 2x-4x as much to serve as TCP for a CDN workload.
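
To illustrate where the memory-bandwidth saving in the sendfile point above comes from, here is a rough sketch of the two data paths. It uses the Linux sendfile(2) signature (FreeBSD's async sendfile takes different arguments, but the idea is the same) and assumes the socket in path B already has kTLS/inline TLS offload enabled:

    #include <sys/sendfile.h>
    #include <unistd.h>
    #include <openssl/ssl.h>

    /* Path A (traditional): read() copies file data into userspace,
       SSL_write() encrypts it there and copies it back into the
       socket -- several trips over the memory bus per byte served.
       This is roughly what a QUIC server has to do today. */
    void serve_copy(SSL *ssl, int file_fd)
    {
        char buf[16384];
        ssize_t n;
        while ((n = read(file_fd, buf, sizeof(buf))) > 0)
            SSL_write(ssl, buf, (int)n);
    }

    /* Path B (kTLS/NIC TLS + sendfile): the payload stays in the
       kernel page cache; with inline NIC offload it may only ever be
       touched by the DMA engine, which is where the roughly 4x
       reduction in memory bandwidth comes from. */
    void serve_sendfile(int tls_sock, int file_fd, size_t len)
    {
        off_t off = 0;
        while ((size_t)off < len)
            if (sendfile(tls_sock, file_fd, &off, len - (size_t)off) <= 0)
                break;
    }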



