If it's happening so rarely that killing is a viable solution, then there's no reason to troubleshoot it to begin with. If it's happening often enough to warrant troubleshooting, then your concerns are addressed.
Here's a real-life example. We have a KVM server with its storage on Ceph. KVM doesn't seem to play well with Ceph, especially when MD is involved: if a VM is powered off instead of shut down cleanly, something corrupts the MD metadata, and when the VM is turned on again, one MD replica can be missing. This happens infrequently, and I've never been in a situation where two replicas died at the same time (which would prevent the VM from booting), but it's obviously possible.
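To make the failure mode concrete: a degraded mirror is visible in /proc/mdstat as something like `[2/1] [U_]` instead of `[2/2] [UU]`. This isn't part of our tooling, just a minimal sketch (assuming standard /proc/mdstat output) of how you might detect the condition before deciding whether to rebuild or dig deeper:

```python
import re

def degraded_md_arrays(mdstat_path="/proc/mdstat"):
    """Return names of MD arrays running with fewer members than configured.

    A healthy two-way mirror reports '[2/2] [UU]'; after the kind of unclean
    VM power-off described above, a degraded one reports '[2/1] [U_]'.
    """
    degraded = []
    current = None
    with open(mdstat_path) as f:
        for line in f:
            # Array header lines look like: 'md0 : active raid1 sdb1[1] sda1[0]'
            m = re.match(r"^(md\d+)\s*:", line)
            if m:
                current = m.group(1)
                continue
            # Status lines look like: '... blocks super 1.2 [2/1] [U_]'
            m = re.search(r"\[(\d+)/(\d+)\]\s*\[([U_]+)\]", line)
            if m and current:
                want, have = int(m.group(1)), int(m.group(2))
                if have < want or "_" in m.group(3):
                    degraded.append(current)
                current = None
    return degraded

if __name__ == "__main__":
    bad = degraded_md_arrays()
    if bad:
        print("Degraded arrays, investigate before replacing anything:", ", ".join(bad))
    else:
        print("All MD arrays have their full complement of members.")
```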
So... more generally, the idea of just replacing VMs is rather naive when it comes to storage. Replacement incurs penalties, such as RAID rebuilds. A RAID array doesn't have its promised resiliency during a rebuild, and rebuilds are costly in general: they move a lot of data and put a lot of wear on the hardware. Worse yet, if the same problem that triggered the rebuild strikes again mid-rebuild, the whole system is a write-off.
In other words, if you want your system to be reliable, it's a bad idea to fix problems without diagnosing them first. In extreme cases, this can start a domino effect where the replacement compounds the problem, and, if you're running on rented hardware, it can also be very financially damaging: there have been stories of systems that couldn't cope with load spawning more and more servers to try to mitigate the problem, when the problem was, e.g., a bad configuration that got copied to every newly spawned server.
That might work in some scenarios. If you're a "newer" company where each application is deployed onto its own nodes, you can do this.
But consider the case of older companies, where it was more common to deploy several systems, often complex ones, onto the same node. Rebooting that node also causes outages for systems x, y and z. Maybe some of them are interdependent? You have to weigh the consequences and risks carefully before rebooting in any situation.
> it was more common to deploy several systems, often complex ones, onto the same node.
Yeah, we do this? It doesn’t pose an issue though. Cordon the node (stop new workloads being scheduled onto it), drain it to evict all current workloads (these either have replicas or can be moved to another node; if we don’t have a suitable node, K8s spins one up automatically), and then remove the node. Most workloads have spare replicas, and “singleton” workloads have configs ensuring the cluster must always have 1 replica available, so it waits for the new one to come up before killing the old. Most machines deploy and join the cluster in a couple of minutes, and most of our containers take only 1 or 2 seconds to deploy and start serving on a machine, so rolling a node is a really low-impact process.
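For anyone unfamiliar with the flow, here's a rough sketch of the cordon/drain/remove sequence as a thin wrapper around kubectl. The node name, timeout and flags are illustrative, not our actual tooling; it assumes kubectl is already configured for the target cluster:

```python
import subprocess

def roll_node(node: str) -> None:
    """Rolling replacement of a single node, roughly as described above."""
    # 1. Cordon: no new pods get scheduled onto the node.
    subprocess.run(["kubectl", "cordon", node], check=True)

    # 2. Drain: evict the current workloads. PodDisruptionBudgets make the
    #    eviction wait until a replacement replica is ready elsewhere, which
    #    is how "singleton" workloads keep one copy serving throughout.
    subprocess.run(
        ["kubectl", "drain", node,
         "--ignore-daemonsets",
         "--delete-emptydir-data",
         "--timeout=10m"],
        check=True,
    )

    # 3. Remove the node object; whatever provisions machines (e.g. a
    #    cluster autoscaler) brings up a fresh replacement.
    subprocess.run(["kubectl", "delete", "node", node], check=True)

if __name__ == "__main__":
    roll_node("worker-12")  # hypothetical node name
```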