Persistent problems between Azure VMs and virtual disks causing unexpected reboo...

jiggawatts · on Jan 25, 2023

Were you using Standard HDD disks? They have a really poor SLA, and are only usable for things like stateless VM Scale Sets or otherwise redundant services.

We had to switch everything to SSD to get reliability comparable to on-prem VMware.

wrldos · on Jan 25, 2023

No, entirely SSD. The problems suddenly stopped after a couple of weeks.

pwarner · on Jan 25, 2023

That sounds like what I've seen on Azure: weird mystery problems that we can see but they can't, often on the network side. One time we were pretty sure they had a bad interface in a LAG group. Massive packet loss between hosts, but only on certain ephemeral source ports, about 1/8 of them. Support couldn't find any issues even after a few days.
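To illustrate why a single bad LAG member could show up as loss on roughly 1/8 of source ports, here is a minimal sketch. Everything in it is an assumption for illustration: the 8-link LAG size (picked only to match the "about 1/8" figure), the faulty link index, the example IP addresses, the IANA ephemeral port range, and especially the toy XOR hash in `pick_link`. Real switches use vendor-specific hash functions, and nothing here confirms what the actual cause was.

```python
import ipaddress

NUM_LINKS = 8  # assumed LAG size, chosen only to match the "about 1/8" figure
BAD_LINK = 3   # hypothetical faulty member link
EPHEMERAL_PORTS = range(49152, 65536)  # IANA range; many OSes use a different one


def pick_link(src_ip, dst_ip, src_port, dst_port, num_links=NUM_LINKS):
    """Toy XOR hash over the flow tuple; real switch hashes differ by vendor."""
    key = int(ipaddress.ip_address(src_ip)) ^ int(ipaddress.ip_address(dst_ip))
    key ^= src_port ^ dst_port
    return key % num_links


def affected_source_ports(src_ip, dst_ip, dst_port, ports=EPHEMERAL_PORTS):
    """Return the source ports whose flows hash onto the faulty link."""
    return [p for p in ports if pick_link(src_ip, dst_ip, p, dst_port) == BAD_LINK]


# Hypothetical pair of hosts talking to a fixed destination port.
bad = affected_source_ports("10.0.0.4", "10.0.0.5", 443)
fraction = len(bad) / len(EPHEMERAL_PORTS)
print(f"{len(bad)} of {len(EPHEMERAL_PORTS)} source ports affected ({fraction:.3f})")
```

Because each flow is pinned to one member link by its hash, a connection either works fine or loses packets consistently depending on its source port, which would match the symptom of loss on only certain ephemeral ports.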

This was circa 2018, but AWS was so much more stable at that time. OK, US-E-1 AWS had issues from time to time, but they acknowledged them and fixed them.

wrldos · on Jan 25, 2023

Yes, their inability to see any problems was a constant issue.

Our AWS reps are all over stuff when it goes down. I regularly get to talk to actual real product managers and engineers via our enterprise support if anything goes wrong.