I forgot they said that (I read this when it was posted a couple days ago), thanks for noting that.

But I’d still encourage even a 1s “clean” shutdown. You don’t need to wait for any of this fancy cleanup, but it’s really nice to finish your writes.
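
A minimal Go sketch of what I mean, with flushPendingWrites as a hypothetical stand-in for whatever buffered state your service holds:

    package main

    import (
        "context"
        "log"
        "os"
        "os/signal"
        "syscall"
        "time"
    )

    // flushPendingWrites is hypothetical: fsync open files, commit the
    // WAL, whatever "finish your writes" means for your service.
    func flushPendingWrites(ctx context.Context) error {
        return nil
    }

    func main() {
        sigs := make(chan os.Signal, 1)
        signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)

        <-sigs // the power-off warning arrived

        // Hard 1s budget, so the flush can never block the shutdown itself.
        ctx, cancel := context.WithTimeout(context.Background(), time.Second)
        defer cancel()

        if err := flushPendingWrites(ctx); err != nil {
            log.Printf("flush failed: %v", err)
        }
        os.Exit(0)
    }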

Fun story: for Preemptible VMs we started (in Alpha / EAP) with no soft shutdown at all, just an immediate power off, to see whether people could handle it. Turns out that if your box is running apt-get upgrade or anything like that at the time, you can easily corrupt your boot disk. So we went back and forth between “a few seconds” (5, 15) and “about 30, which is how long a GCE instance takes to boot”. That’s how we ended up with 30: we wouldn’t more than double regular instance creation times at the tail. Nowadays we boot to ssh in 15 seconds!
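
If you want to start that soft shutdown as early as you can, you can also block on the metadata server’s preempted flag instead of waiting for the OS shutdown signal. A rough Go sketch (a hanging GET against the documented instance/preempted key; error handling trimmed):

    package main

    import (
        "io"
        "log"
        "net/http"
    )

    func main() {
        // Hangs until the value changes; returns "TRUE" on preemption.
        url := "http://metadata.google.internal/computeMetadata/v1/instance/preempted?wait_for_change=true"
        req, err := http.NewRequest("GET", url, nil)
        if err != nil {
            log.Fatal(err)
        }
        req.Header.Set("Metadata-Flavor", "Google") // required by the metadata server

        resp, err := http.DefaultClient.Do(req)
        if err != nil {
            log.Fatal(err)
        }
        defer resp.Body.Close()

        body, _ := io.ReadAll(resp.Body)
        if string(body) == "TRUE" {
            log.Print("preemption notice received; start the clean shutdown")
        }
    }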

If you don’t reuse your state, none of this matters. But I’d guess that even just getting to the point of RST’ing the connections is valuable (so that clients know to take action, rather than waiting a while).
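
In Go, for instance, RST-on-close is just SO_LINGER with a zero timeout. A small sketch (abortConn is a made-up helper name):

    package main

    import (
        "log"
        "net"
    )

    // abortConn resets a TCP connection: SetLinger(0) makes Close()
    // discard unsent data and send a RST instead of the normal FIN
    // handshake, so the peer errors out immediately.
    func abortConn(conn net.Conn) {
        if tcp, ok := conn.(*net.TCPConn); ok {
            if err := tcp.SetLinger(0); err != nil {
                log.Printf("SetLinger: %v", err)
            }
        }
        conn.Close()
    }

    func main() {
        // Demo: dial somewhere, then immediately reset the connection.
        conn, err := net.Dial("tcp", "example.com:80")
        if err != nil {
            log.Fatal(err)
        }
        abortConn(conn)
    }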


We had shutdown hooks on our preemptible VMs, but we often had cases (at least weekly) where it looked like they failed to run (the node failed to unregister from the cluster). Any explanation?

Do you mean on GKE or directly on GCE? It sounds like you mean GKE (“failed to unregister from cluster”).

We’re looking to fix up the GKE graceful node shutdown, because it’s currently “racy” and doesn’t actually respect the grace period properly (system pods / processes can be shut down without waiting for user pods, causing you to lose logging or, say, the kubelet).

Yes, GKE containers, sorry about the confusion... sometimes it looks like a node has disappeared without doing much shutdown work.