
Google can't necessarily upstream everything because of social problems in the kernel process. For example, their datacenter TCP improvements have never been accepted by the gatekeeper of the net subsystem, which was a significant motivation to develop QUIC.



I'm not sure where you heard this. Their DCTCP extensions have never even been posted to a public list as of today. Pretty much all of the core TCP developers for the (upstream) kernel's networking subsystem are employed by Google, and they are doing an excellent job. That said, I would love to see their extensions integrated into the upstream tcp_dctcp module.


Is Facebook running DCTCP in production these days?


They did get BBR into the kernel, though, and many moons ago BQL too, which was a prerequisite.


Isn't DCTCP generalized by TCP Prague and L4S? If those get the IETF stamp of approval and the potential patent issues around L4S get sorted out, I'd guess they would be implemented in the upstream Linux kernel pretty quickly.


Social problems, a.k.a. Linux must work for everyone and not just Google.


Reinventing TCP over UDP is sort of silly; I hope they have a better reason than "they wouldn't upstream our changes", lol.


I think the inability to upstream changes into Windows and (ironically) old versions of Android is a bigger motivation for using UDP.


Isn't it a pretty good reason? gRPC is terrible in a datacenter context without Google's internal TCP fixes that Linux won't adopt (and which have been advocated for in numerous conference papers since at least 2009). If they are steadfast cavemen, what other workaround exists?


Apparently Microsoft is considering gRPC as a future replacement for WCF, so that might change. https://news.ycombinator.com/item?id=21055487


The standard workaround is to send short messages using UDP and long ones using TCP.
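A minimal sketch of that split in Python; the 1200-byte cutoff (roughly one MTU-safe datagram) and the single send entry point are assumptions for illustration, not anything from a real RPC stack:

    import socket

    UDP_MAX = 1200  # assumed cutoff: keep UDP messages inside one MTU-safe datagram

    def send_message(host: str, port: int, payload: bytes) -> None:
        """Short messages go out as one UDP datagram; long ones use a TCP stream."""
        if len(payload) <= UDP_MAX:
            with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
                s.sendto(payload, (host, port))
        else:
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                s.connect((host, port))
                s.sendall(payload)

Of course the receiver then has to listen on both sockets and handle loss and reordering of the UDP-sized messages at the application layer, which is exactly where schemes like this get messy.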


What parts of gRPC are fixed by using it over QUIC vs. TCP (presuming intra-DC traffic and equally long-lived flows)?


Latency caused by packet loss. TCP needs microsecond timestamps and the ability to tune RTOmin down to 1ms before it is suitable for use in a datacenter. With the mainline kernel TCP stack you are looking at a penalty of at least 20ms whenever a packet is dropped.
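A back-of-the-envelope sketch of why the floor dominates, using the RFC 6298 RTO formula; the RTT figures are assumptions for a typical intra-DC path, and the 20ms case is the effective floor the parent describes (the classic kernel RTO_min is 200ms):

    # RFC 6298: RTO = SRTT + max(G, 4 * RTTVAR), clamped below by RTO_min.

    def rto(srtt: float, rttvar: float, rto_min: float, granularity: float = 1e-6) -> float:
        return max(rto_min, srtt + max(granularity, 4 * rttvar))

    srtt, rttvar = 100e-6, 20e-6          # assumed: ~100us smoothed RTT, ~20us variance

    for floor in (0.200, 0.020, 0.001):   # classic 200ms floor, ~20ms, tuned 1ms
        print(f"rto_min={floor * 1000:.0f}ms -> rto={rto(srtt, rttvar, floor) * 1000:.2f}ms")

With a ~100us RTT, the computed RTO is dwarfed by the floor in every case, so a single drop stalls the flow for hundreds or thousands of RTTs unless you can tune the floor down to ~1ms.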


TCP over UDP seems rather silly to me, but congestion control and segmentation in userland are pretty useful, especially since Google and its partners have built an ecosystem in which kernel updates on deployed devices don't happen.
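For a concrete sense of what "congestion control in userland" buys you: a userspace QUIC stack can host the controller inside the application binary and swap it per-connection. A toy sketch, with a hypothetical interface not taken from any real QUIC library (slow start omitted for brevity):

    from abc import ABC, abstractmethod

    class CongestionController(ABC):
        """Hypothetical per-connection hook a userspace QUIC stack might expose."""

        @abstractmethod
        def on_ack(self, bytes_acked: int, rtt: float) -> None: ...

        @abstractmethod
        def on_loss(self, bytes_lost: int) -> None: ...

        @abstractmethod
        def cwnd(self) -> int: ...

    class Reno(CongestionController):
        def __init__(self, mss: int = 1200):
            self.mss = mss
            self._cwnd = 10 * mss        # common initial window of 10 packets

        def on_ack(self, bytes_acked, rtt):
            # Additive increase: ~one MSS per cwnd's worth of acked bytes.
            self._cwnd += self.mss * bytes_acked // self._cwnd

        def on_loss(self, bytes_lost):
            # Multiplicative decrease, never below two segments.
            self._cwnd = max(2 * self.mss, self._cwnd // 2)

        def cwnd(self):
            return self._cwnd

Shipping that in the application means a congestion-control fix rides the normal app-update channel instead of waiting on an OEM kernel update that may never come.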



