TFA is a bit too light on details. Boxes due, it's a fact of life. I don't really follow what "didn't come back online" is supposed to mean. Nodes aren't 100% durable. A lot depends on your particular configuration.
In any case, there's no world where all VM failures trigger automatic reboots. Expecting that to be the case just makes no sense. Automatically failing over should be handled at another layer, for which there a many possibilities.
Manually restoring nodes sounds like a "pets, not cattle" problem.
Long ago, we used to run into this on AWS all the time before we started automatically aging them out.
In any case, there's no world where all VM failures trigger automatic reboots. Expecting that to be the case just makes no sense. Automatically failing over should be handled at another layer, for which there a many possibilities.
Manually restoring nodes sounds like a "pets, not cattle" problem.
Long ago, we used to run into this on AWS all the time before we started automatically aging them out.