If latency spikes affect overall performance, that points to a design problem in CephFS (the global FS journal) rather than to a cloud problem.
However, perhaps they shouldn't try to run Ceph in the first place.
Azure has rather powerful blob storage (block, page, and append blobs) that supports high-performance applications. You could use that directly, and it will likely be cheaper and work better than Ceph on bare metal.
As other commenters suggest, to take advantage of cloud infrastructure you need to design with its constraints in mind, rather than trying to shoehorn in familiar technologies.
Bare metal can be better and cheaper, but it requires even more skill and experience, and a relatively large scale.
Wouldn't that make GitLab dependent on Azure-specific services? That's quite a risk to take when you're already unsure whether you want to stay with MS hosting services.
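One common way to limit that kind of dependency is to have the application talk to a small storage interface rather than to a vendor SDK. A rough sketch in Python, where `BlobStore`, `InMemoryStore`, and `save_repo_archive` are all made-up names for illustration and not anything GitLab or Azure actually provides:

```python
from typing import Dict, Protocol


class BlobStore(Protocol):
    """Hypothetical minimal interface the application codes against."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend; a real one would wrap a cloud SDK or a filesystem."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._blobs[key] = data

    def get(self, key: str) -> bytes:
        return self._blobs[key]


def save_repo_archive(store: BlobStore, repo: str, archive: bytes) -> str:
    """Application code depends only on BlobStore, not on any vendor API."""
    key = f"repos/{repo}.tar"
    store.put(key, archive)
    return key
```

Swapping providers then means writing one new backend class instead of touching every call site, though it also means giving up provider-specific features that don't fit the interface.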