I don’t know why you’re getting downvotes. What you’re saying sounds true to me, and I work in the core of EC2.

If reliability still looks questionable to you, I am guessing you’re using newer instance types, or you have a fleet of instances large enough that you see a steady rate of failures every year.

Our failure rate on the commonly used instance types is fairly low. We have several types of failures, and in some bad failure cases live migration isn’t possible and your instance won’t even be restarted.

AWS already asks people to expect failures and plan around them with multi-AZ deployments.

If you want stability, sign an NDA with AWS and ask for fleet-wide reliability metrics for the various instance types. The variance is surprisingly large.