I’ve been thinking/slowly building a service that hosts Kubernetes control plane...

solatic · 2024-08-25T07:02:28

Kubernetes is really built on the assumption that workers are in the same LAN as the control plane. Long latency between the control plane and workers affects workload reliability; heartbeats need to be configured with longer timeouts, for example. Pod-to-pod communication, where pods run in regions on opposite sides of the globe, supposedly on the same pod network CIDR, is also going to be flaky. There's a long history of projects attempting to take LAN-local designs and make them resistant to regional failure by superimposing a LAN on top of a WAN, and it never works as intended. Furthermore, various service meshes already offer ways to direct and shape traffic between clusters (i.e., between regions) once a service's architecture evolves to truly support multiple regions.
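To make the heartbeat point concrete, here is a rough sketch of the arithmetic. The two knobs it models do exist (the kubelet's `nodeStatusUpdateFrequency` and the controller manager's `--node-monitor-grace-period`), but the function name, the simplified timing model, and the numbers below are my own assumptions for illustration, not recommended settings.

```python
import math


def tolerated_missed_heartbeats(
    grace_period_s: float,
    heartbeat_interval_s: float,
    worst_case_extra_delay_s: float = 0.0,
) -> int:
    """Estimate how many consecutive heartbeats can be lost before the
    grace period runs out.

    Simplified model (an assumption, not Kubernetes' actual logic):
    heartbeats are sent every `heartbeat_interval_s`, and the k-th one
    after the last successful update must arrive strictly within the
    grace period, even if it is delayed by `worst_case_extra_delay_s`
    (WAN jitter, retries, and so on).
    """
    if heartbeat_interval_s <= 0:
        raise ValueError("heartbeat_interval_s must be positive")

    budget = grace_period_s - worst_case_extra_delay_s
    if budget <= 0:
        return 0

    # Largest k such that k * interval < budget.
    last_on_time = math.ceil(budget / heartbeat_interval_s) - 1
    # Every heartbeat before that one may be lost.
    return max(0, last_on_time - 1)


# Illustrative numbers only: a 40 s grace period with 10 s heartbeats.
print(tolerated_missed_heartbeats(40, 10))      # LAN-like, no extra delay
print(tolerated_missed_heartbeats(40, 10, 15))  # WAN-like, 15 s of extra delay
```

In this toy model, extra WAN delay eats directly into the number of heartbeats you can afford to lose, which is why the grace period usually has to grow along with the latency.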

You run the risk of building something that seems to "work" but falls apart for non-obvious, head-scratching reasons for many or most users.

icy · 2024-08-25T08:48:11

> Kubernetes is really built on the assumption that workers are in the same LAN as the control plane. Long latency between the control plane and workers affects workload reliability; heartbeats need to be configured with longer timeouts, for example.

Yep, and that's why I'm designing this to provision control planes as close to the user's workloads as possible (under ~100 ms of latency, which is perfectly acceptable for reliability; the likes of Scaleway do something similar).
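Roughly, the placement decision could look like the sketch below. This is not the actual implementation; the function name, the region names, and the use of a median over round-trip samples are all assumptions made up for illustration. Only the ~100 ms budget comes from the description above.

```python
from statistics import median
from typing import Dict, List, Optional


def pick_region(
    rtt_samples_ms: Dict[str, List[float]],
    budget_ms: float = 100.0,
) -> Optional[str]:
    """Return the candidate region with the lowest median RTT to the
    user's workers, or None if no region fits within the latency budget.

    `rtt_samples_ms` maps a (hypothetical) region name to round-trip
    times measured from the workers, in milliseconds.
    """
    # Skip regions with no measurements rather than guessing.
    medians = {
        region: median(samples)
        for region, samples in rtt_samples_ms.items()
        if samples
    }
    eligible = {r: m for r, m in medians.items() if m < budget_ms}
    if not eligible:
        return None
    return min(eligible, key=eligible.get)


# Hypothetical measurements from a user's worker nodes.
samples = {
    "fr-par": [18.0, 22.0, 20.0],
    "us-east": [95.0, 110.0, 120.0],
    "ap-south": [],
}
print(pick_region(samples))
```

Returning `None` instead of falling back to the least-bad region is a deliberate choice in this sketch: it keeps the latency constraint explicit rather than silently provisioning a control plane that is too far away.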

> Pod-to-pod communication, where pods run in regions on opposite sides of the globe, supposedly on the same pod network CIDR, is also going to be flaky

Certainly, but this is not something that I, as the control plane provider, care about; it's a design decision the user has to account for, and I consider this freedom of choice a nice one to have.

I suppose the main selling point here (and the few folks I've spoken to seem to agree) is the flexibility a decoupled control plane offers: being able to migrate your bare-metal setup to "managed" K8s, running a homelab cluster without dealing with the ops (upgrades, cert rotations, etc.), or simply being able to use VMs from cheaper cloud providers.

But yeah, it's definitely a hard problem, though one that's solvable within the right constraints.