On-premise, being miserable having to wait months to get a new server, with poor automation, poor observability, and worse outages?
To another major cloud provider with similar pricing and outages?
Cloud helped mostly with automation and scaling, but if your system is that critical, you should consider a good CDN as a load balancer, and multi-cloud (or at least multi-region) for actual robustness.
AWS and GCP both have ~100% uptime in every region for VMs this month. Meanwhile the majority of Azure regions have had various outages in the same period: https://cloudharmony.com/status-of-compute
Almost certainly due to Azure's broken policy where we have critical change advisories that block deployments for huge periods of time towards the end of the year, because of Black Friday and then the holidays. Every team has basically been unable to deploy since the week before Thanksgiving, when a surprise CCOA was pushed out by leadership at the behest of a certain big customer... then there was the World Cup and the winter holidays. Nobody could really deploy anything from a week before Thanksgiving until a week after New Year's... almost two months' worth of batched changes, and every team YOLO-pressing the deploy button as soon as they could in January.
And now layoffs, so everyone is super unmotivated! Excellent stuff from Microsoft senior leadership right now.
Where I worked, the internal approval processes and controls over cloud resources were as lengthy as those for on-premise hardware. So that may be the case for small companies, but I don't think there is much of a difference in those large bureaucracies.
Shows that all these availability zones and regions don't really help if an outage can knock out a whole cloud provider. And that's not specific to Microsoft. The only way to really ensure uptime is to use two providers. Sadly, that's basically only affordable with on-prem/colocation, where traffic is cheap.
It's mostly Azure, though, that is so badly designed that it has suffered multiple global outages. In general, Azure's availability, security (it's the only major cloud provider with not one but multiple cross-tenant security exploits), and usability are pretty terrible, so it shouldn't be used for anything but demonstrating how it should not be done.
GCP had a similar thing once, where a BGP update knocked out their Asian regions.
AWS has never had a global outage. (And no, that time S3 in us-east-1 was down wasn't a global outage; the only customer code/workloads impacted were those interacting with S3 without specifying a region, which relied on us-east-1 to determine it and so stopped working.)
Do you have a link to an article about that? My google-fu is weak, and this sounds interesting: that should not happen to DNS, at all, and from the outside Route53 looks quite well managed. So what the heck did they do?
Someday someone will write a book about how AD, AAD, etc. exert the control they do at MS and go as unchecked as they do (or did at the time). AD's inability to execute made Azure a significantly less pleasant platform until they finally fixed accounts a couple of years back to properly do OAuth 2.0 with ARM.
Maybe the book is just "AD brings in the money" but wow, they sure bring it down as well. Global outages like that always stink of AD.
I interpreted this comment as more of a jab at how inefficient Outlook and Teams themselves are as applications.
I don't know if it's the right interpretation to have, but I kind of agreed with it, considering the huge issues I've had with Teams (curiously, some of them only affect Linux users, which is weird considering I only use Teams' web page). Not saying I could do better, though!
That wasn't my point. But tools like Teams kill more productivity than they enable, at least in my experience. If anything, I was more productive yesterday, because I got disturbed less.
> I don't know how many of you sign in to multiple tenants in the console, but it generally involves buying a new computer.
This made me laugh out loud. I'm working in a multi-tenant, multi-subscription environment with Azure AD just now. MS force you to use 2FA and I picked the wrong 2FA app.
Now it's completely and utterly comical trying to work out which generated 2FA auth code I need to key in when auth'ing in Visual Studio because there are absolutely no visual cues as to which subscription it's trying to authenticate to. You can't tell VS that "I'm only interested in auth'ing to this particular subscription". Now it prompts me for almost every subscription we use and it's a whack-a-mole experience. They really need to fix the UI/UX in VS for this.
Of course when it comes to mandatory password change time I have to go through this pain all over again.
I’m setting up a system with multiple AAD B2C tenants, so I get the joy of switching back and forth between the primary tenant and the B2C tenants frequently (at least until I can finish automating enough of the B2C provisioning bits).
I don't yet have enough context to fully evaluate it against Cognito. It may end up being nice to have B2C as a first-class AAD tenant, but until I get far enough along to realize those benefits, there will be a lot more cursing under my breath about the need for another layer of identity and the lack of control-plane access through Azure Resource Manager APIs/tooling.
I have multiple Chrome profiles for this. However, despite switching from one subscription to another to access each different AAD tenant across multiple Chrome profiles, it seems that Azure "remembers" the subscription you last accessed, across profiles. It's as if the last subscription you accessed is tagged to your Azure user server-side rather than being a blob of client-side state. This is deeply annoying as well, especially when your sessions expire...
Firefox containers are the solution to this headache for almost every multi-tenanted service. I used to have Firefox installed only for those tasks when I was working in consultancy.
Firefox containers aren't a patch on Chrome profiles (which I did mention I was using). I'd switch to Firefox in a New York minute if they fixed the profile management UX (about:profiles).
YES. I have the dubious honor of needing to use at least 4 different Teams tenants over the course of a week and it is enough to make me want to pitch my computer into the sea. App, browser, private browser - doesn't seem to matter. When I try to sign in, Microsoft will pick one of the tenants seemingly at random, regardless of what URL I use, and try to sign me in - of course, since there is usually no visual cue as to which tenant I'm looking at, I just put in a password and pray.
I use it, but since the Azure portal uses the URI fragment, it still requires constructing the correct URL in the correct container. One mistaken URL will obliterate the container, and restoring it requires deleting windowsazure.com, microsoftonline.com, portal.azure.com, and another one that I can't remember right now.
You'd really have to try to make it so screwy.
It's kind of a shame. Like most things, Azure was better when it was smaller. I loved the first version of Functions.
Microsoft has a lesser-known Azure Gov Cloud, disconnected from the public cloud, specifically because of this. It comes with dedicated staff holding vetted secret clearances for access to those systems.
At this point most probably, yes. Especially as more and more government entities/agencies are moving to the cloud, many of them to Azure (because of MS). I live in Eastern Europe, but I suspect that this migration is happening all around Europe and North America.
There are 365.2425 days per year, so a six-hour outage (6/24 = 0.25 days) is just about that 0.2425-day surplus, which suggests that they remain able to declare 365 days of uptime when considering this specific outage only.
"We are investigating reports of issues with Actions. This looks related to Azure networking issue which is impacting multiple regions. We are seeing improvements and will continue to monitor this."
Every Azure product I've had to use has been lousy in every possible way. Azure DevOps at my last employer was a nightmare and nobody in the company liked it, not even the managers who decided on it.
I've been learning/using Azure DevOps for the past four months and find it "quite good". I've previously used Jira, although not in great detail.
I'm making the effort to learn it in increasing detail as it's the company-wide chosen system. I'm interested to know what made / makes it a nightmare for anyone else.
I have some Azure services that are not able to consistently make outbound HTTP requests to my heartbeat monitoring service so I'm getting alert after alert this morning. This is just the nudge I needed, and I'll be moving the whole thing to Linode later this afternoon.
Wouldn't it be quite simple to set up an unofficial status page that just pings some relevant services and shows when they have a disastrous outage, at least?
Because I think it's clear that their status page is useless and "manual".
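For what it's worth, a minimal version of that is maybe twenty lines of Python. A sketch; the endpoint URLs and the "below HTTP 500 means up" heuristic are my own picks, not anything official:

    # Unofficial status check: probe a few public endpoints and report up/down.
    # Endpoints here are illustrative; swap in whatever services you depend on.
    import requests

    ENDPOINTS = {
        "Azure portal": "https://portal.azure.com/",
        "Azure AD login": "https://login.microsoftonline.com/",
        "Azure status": "https://status.azure.com/status",
    }

    def is_up(url: str) -> bool:
        try:
            # Anything below HTTP 500 counts as "up"; crude, but enough to
            # catch the disastrous-outage case described above.
            return requests.get(url, timeout=10).status_code < 500
        except requests.RequestException:
            return False

    for name, url in ENDPOINTS.items():
        print(f"{name}: {'UP' if is_up(url) else 'DOWN'}")

Run it from cron and publish the output somewhere, and you have exactly the unofficial status page described, with the usual caveat that it only sees what's reachable from your own vantage point.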
> We've identified a potential networking issue and are reviewing telemetry to determine the next troubleshooting steps. You can find additional information on our status page at https://msft.it/6011eAYPc or on SHD under MO502273.
If you implement an allowlisting proxy, the number of required domains for M365 / Azure is something like 120 [1]. Google basically requires three: tunnel.cloudproxy.app, *.google.com and *.googleapis.com. Amazon requires *.aws.amazon.com, *.amazonaws.com, *.awsstatic.com, *.api.aws and *.aws.dev.
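For context on what "implement an allowlisting proxy" means in practice, here's a toy Python version of the wildcard matching such a proxy does. The entries are the ones from the comment above; whether "*.google.com" should also match the bare "google.com" is a policy choice I've made explicit, not something any particular proxy mandates:

    # Toy allowlist matcher with "*." wildcard suffix entries.
    ALLOWLIST = [
        "tunnel.cloudproxy.app",
        "*.google.com",
        "*.googleapis.com",
        "*.amazonaws.com",
    ]

    def allowed(host: str) -> bool:
        host = host.lower().rstrip(".")
        for entry in ALLOWLIST:
            if entry.startswith("*."):
                bare = entry[2:]  # "*.google.com" -> "google.com"
                # Match both subdomains and (by choice) the bare domain.
                if host == bare or host.endswith("." + bare):
                    return True
            elif host == entry:
                return True
        return False

    print(allowed("storage.googleapis.com"))  # True
    print(allowed("msft.it"))                 # False, unless explicitly added

Four entries are easy to audit at a glance; the point of the comment is that ~120 Microsoft entries are not.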
My point is that MS uses a lot of unrelated domains that are very different from the main brand; even the one above looks dodgy (msft[.]it).
From your list, microsoftonline-p[.]com is an official domain, but it looks like a typosquat.
I think it's quite far from "great domain planning".
Also a personal favorite of mine: http://microsft.com (not entirely sure if it's just to prevent typosquatting or if this is actually used in some products)
I don't know whether it's a typo but https://support.microsoft.com/en-us/topic/contact-us-91f63b4... lists "EOC: criskgro@microsft.com (For CEE and MEA)" under the Microsoft Credit Services. It feels like a typo, but who knows. If they don't have anything in place to catch this type of error, it's probably a good idea to register every domain someone could accidentally type.
And I imagine they add new domains with updates all the time.
At home I was trying, in a foolproof way, to avoid random reboots from updates in a Windows VM that ran long processing tasks. I determined the only reasonable course of action was to remove all internet access. Stamping out the massive, ever-changing list of domains (and hard-coded IP addresses?) would just be too much work that I knew I would never keep up with.
A whitelist might work.
I mused that you could have a constantly updating Windows machine and monitor all of its connections, adding them to a block list on an external firewall, but in addition to being complex to set up, I bet it wouldn't even catch everything.
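The polling version of that idea is only a few lines with psutil (a cross-platform process/network library; this is a sketch, and as said above, polling will miss short-lived connections between samples):

    # Watch a machine's outbound connections and accumulate remote endpoints,
    # e.g. to seed a blocklist on an external firewall. Ctrl-C to dump the list.
    import time
    import psutil

    seen = set()
    try:
        while True:
            for conn in psutil.net_connections(kind="inet"):
                if conn.raddr:  # connection has a remote endpoint
                    seen.add((conn.raddr.ip, conn.raddr.port))
            time.sleep(5)
    except KeyboardInterrupt:
        for ip, port in sorted(seen):
            print(f"{ip}:{port}")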
Yet people continue to defend Microsoft's telemetry practices. The OS won't let you opt out without it fighting you and they'll even fight you for blocking it on the network.
The .it ccTLD is especially bad. Almost all of the generated SEO spam links to malicious ad networks I get on search pages are .it domains, all written in machine English, not Italian. Thanks for reminding me; I've discovered that -site:.it works in search queries to filter it out.
I think what the OP is saying is: if you have multiple random domains, how would people know which ones are legit (or not)? Say I have mixxxrosoft.com; how would you know whether this is one of MS's official domains?
Auth via Microsoft ID is degraded, our platform is blipping (cache retries, message retries due to packet loss), access to the Azure portal is degraded and the Azure status page isn't loading consistently.
> The issue is causing impact in waves, peaking approximately every 30 minutes.
Does anyone have any general ideas on what kind of outage manifests itself like this? Devices retrying to authenticate every 30 minutes and finding the service is down perhaps?
Can sometimes be scaling/monitoring loops, i.e. the cluster comes up, provides some limited service, gets overloaded and drops below a required performance metric, gets killed by the monitoring/scaling system, repeat...
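A toy model of that loop, with every number invented, shows how it produces regular waves; with these particular constants the period comes out to the ~30 minutes mentioned above:

    # Restart -> overload -> health-kill cycle, simulated in 10-minute steps.
    demand = 100
    capacity = 0

    for minute in range(0, 121, 10):
        if capacity == 0:
            capacity = 80                 # cluster restarts, almost enough
            event = "cluster up (degraded)"
        elif demand > capacity:
            capacity -= 20                # overloaded nodes fall over
            event = "shedding capacity"
            if capacity <= 40:
                capacity = 0              # health checks kill what's left
                event = "killed by monitoring"
        print(f"t={minute:3d}m capacity={capacity:3d} {event}")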
Cloud is the new power grid. When it goes down, we lose power to everything. Will we learn from the grid and decentralise some of the compute and cloud services?
Yes, here too: storage, DB, APIs. It's not permanent but still persisting. It can be monitored at the Azure status page as well: https://status.azure.com/status
Anyone else remember the bad Windows Defender virus signature they put out on Friday the 13th a couple weeks ago? Microsoft is not having a good start to their year.
When anything on that page turns not-green, there are news stories about it. Not positive ones. So exec approval is needed, because the decision to flip something on that page is ultimately the decision to cause stories negative to MS to be published. The exec has to weigh whether pissing off the customers (by failing to acknowledge reality) is worth the bad press and SLA fallout.
It has nothing to do with press. This is negative press already, and journalists can use this to write their stories without waiting for the official light to go from green to yellow.
It's about contractual obligations and SLAs. Things are not officially down in most agreements until MSFT acknowledges they're down. Refunds issued because your blob storage failed to meet 99.9999% uptime for your largest customers are directly tied to these statuses.
I think it's an important enough page that it can't be automated. It needs a manual approval from a human, for the very basics, like even if the status reporting system is operating correctly, because of various downstream effects.
Then they should have a "?" status that can be triggered by automated systems, acknowledging that there looks to be an issue while they manually investigate.
If it's a false positive, they just resolve it without it affecting the SLA; and if it's a real problem, then we customers wouldn't have to debug our own stacks for 2 hours before Microsoft informs us that they are the problem.
EDIT:
Wonder how many man-years of extra debugging work their non-working status page has caused their customers.
Which means if one were to require monitoring and status pages to be connected, one of two things happens (for each monitored component):
(1) The monitoring system would be altered to ignore tests that return false positives (at the expense of missing the alert when it represents an outage).
(2) Fixing the monitoring. It wasn't working for the sysadmins/operators anyway, since it had so many false positives that their "mental model" was essentially based on (1).
At least, where I've forced the issue of doing just this, that's exactly what happened. At the end of the day, especially since SLAs took a hit and that affected bonus payouts, monitoring got a lot better, as did overall team function: once we truly realized how bad things were, we stopped doing workarounds and started fixing problems at a more fundamental level, which led to SLAs that were both accurate and excellent.
It helped bring attention to a hidden problem, which resulted in time being allocated to fix tests that dropped constant false positives and to evaluate each for whether it should exist in the first place.
Which impacts economics because some customers surely got deals guaranteeing some amount of credits based on up/downtime as reported by the status page.
And so updates to the status page become political and locked behind senior management approvals... just like at AWS.
Someone has to "approve" the status pages showing what's actually happening? From a customer perspective, it seems far worse to have status pages fail to reflect actual outages than to have them accidentally report an outage when there isn't one because no one really cares about what the status page says if they're not having issues. It's hard to see how the goal here could be anything other than trying to add plausible deniability for what would otherwise be obvious deception.
> it seems far worse to have status pages fail to reflect actual outages than to have them accidentally report an outage when there isn't one
That's not the goal.
> It's hard to see how the goal here could be anything other than trying to add plausible deniability for what would otherwise be obvious deception
That's the goal. The "status page" is considered the source of truth for most of the big contracts. If status-page=OK, then your contract with them isn't violated. So changing the status page is a big deal, with real financial implications. The status page isn't a view into the SREs' tickets; it's a declaration that the service isn't being provided.
Don't know why this was downvoted. We've definitely been able to provide proof of an outage when the status page showed otherwise and get a refund in the form of server credits by contacting them directly, for all three big vendors: AWS, Azure, and GCP.
Hidden in the SLA details are typically hints on how you can become more resilient in the cloud. So it pays to read the SLA details and really deeply understand what they are telling you.
Exec approval for showing major outages on status dashboard is pretty much standard practice across large companies. The main differentiator is whether it’s approved within five minutes or two hours.
> it seems far worse to have status pages fail to reflect actual outages than to have them accidentally report an outage when there isn't one because no one really cares about what the status page says if they're not having issues.
I disagree. What if you're having issues and the status page is incorrectly reporting an incident? It would be easy to waste a load of time waiting for the status page to sort itself out, only to find out you've still got an issue.
As others noted, the so-called "status" pages of big service providers don't serve to reflect reality but to shape it. For actual status you need to consult independent monitoring services.
But shouldn't the individual service dots automatically turn a color other than green? I mean, it's an automated service status page, right? I understand that the human-written message at the top can take some time.
No, it's not automated. I'm sure the underlying tech is automated, but once companies grow beyond a certain size, it needs a human to say "show this status change to the world" because there are lots of things depending on it (e.g. SLAs, but also bonuses, I assume), so they don't want a potential bug in the status system to influence that.
It's weird how slow they are with manual sign-off though.
No, but if MSFT's own status page shows downtime more than 0.01% of the time (about 53 minutes a year), MSFT will struggle to argue they haven't breached their SLA, so there are financial consequences for the company.
But I don't want the page connected to their bonuses or SLAs; I just want to know whether they are having any issues anywhere. And I need to know within a minute of my own service not working, so I'm not chasing the wrong thing. This can't be an unreasonable thing to ask for?
I agree. I'm already annoyed at Hetzner and their 5-minute lag in reporting network outages: I regularly notice them, investigate, check the status page, and only after a few minutes see them update it and say "it's us".
If you work with Microsoft, you might as well spend a few bucks extra and have an external monitoring system monitor Microsoft's systems so you get real-time third-party confirmation when your monitoring alerts you of issues concerning your system. It's the price you pay for scale, I guess. More money involved = more lawyers involved = more accountants involved = more MBAs involved = more corporate bullshit.
Azure is the most developer hostile cloud environment. I have zero sympathy for people being affected by this because if you voluntarily use Azure then this is what you deserve. Sorry for being so miserable, but Azure has given me soooo much grief over the last 10 years that I'm just completely done with this shitshow of a platform.
Our work switched from Slack to Teams after an acquisition, and I can confidently say that Teams is just complete garbage compared to Slack.
- The interface is laggy
- Scrolling back in long messages is buggy, it often skips around and loses its place
- No built in "whiteboarding" tools in screen sharing
- Teams will often keep ringing on my phone for up to a minute after I picked up a call on my laptop
- Sometimes I can't click reactions on messages. I click the emoji and nothing happens
Overall, it's just poorly made software. It feels like something that was made by a couple of interns in their spare time, not a keystone product from a multi-billion dollar company
Also for several weeks recently my phone was getting messages several minutes before they showed up on my laptop, and 3 or 4 of my coworkers (all remote and in various parts of the US) confirmed they were having the same issue.
Nothing is forcing companies to sign an Azure contract with Microsoft; they could go with AWS or GCP instead. Perhaps Microsoft is just doing something right. I haven't used Azure myself, though. I'd be curious to know what's good or bad about it compared to GCP and AWS.
For a development team, here's an example of something good about Azure: Microsoft gives us dev accounts with monthly Azure credits (e.g. $100) and you cannot spend more when those credits run out because there is no credit card etc. behind that account to charge the excess.
Azure, just like other cloud services (I've used AWS, but as I understand it GCP is the same), doesn't believe in timely billing. You can and will receive charges against an account for services that were turned off yesterday, the day before, even last week, as billing gradually catches up to reality. This means that there is no way to actually cap a budget. If you decide "once this costs $100 I'm turning it off", you are not capping your expense at $100; after you turn it off, charges keep arriving. I've seen them a week later, and I wouldn't be surprised if it can be longer. Should they do that? Well, even if they shouldn't, good luck making them stop.
But with the "free" Azure credits that have no money behind them, when it drops dead Microsoft eats all the residual charges that will be discovered days or weeks later, because there is no other party for them to bill.
I work for a University, and I suspect that if you paid full price for these services it would make no economic sense: a $100 Azure credit that cost $100 is a bad deal. But the University gets an enormous discount, for obvious reasons, and if the other cloud vendors don't want to offer actual hard-capped billing, it does feel like they deserve the consequences.
Sure, it's obvious why they do this. Unlike drug dealers (who don't actually give school kids free crack, that makes no economic sense) it does make sense for Microsoft to ensure every kid who knows how to do rudimentary word processing knows Word, etc.
Nobody is under any illusion that Microsoft just really likes universities for some reason. But on the other hand, we did need lots of this stuff and it's very cheap, budgets are tight and it's not as though hand-rolling even more stuff would be cheaper - we do hand roll some things where it makes sense.
For example, periodically senior people say "Why do we spend $$$$ on a supercomputer? Surely we could rent one from the cloud?" and we (well, not me, different group same department) go OK, we will cost that for you. And they get Azure, Google, etc. to quote them for what they need a supercomputer to do, and then they present this, "The Cloud providers can do that for $$$$$". Ah, that's more money. No thanks, we will continue to run our own supercomputer.
It's not even close. Cloud supercomputer is great if you need the supercomputer for six weeks to do a special project and then you're done with it, the Cloud provider saves you a lot of money. But the University needs supercomputers all the time, so the numbers do not work.
GCP gives me an invoice every first of the month, automatically.
It also offers budget caps, but indeed, those are more a warning than a hard shutdown. That's annoying. Same at Microsoft, by the way, except for that developer credit acting as a failsafe.
Google gives 100k in free credits to universities and startups, by the way (and even to individual departments if you are a big university). You just have to apply and let them bring in trainers, and you have to actually use a percentage, otherwise they take it away the next year.
What's the deal with the MS-DOS-era limitations for key vault and storage account names? FFS, a name has to be globally unique AND within 3-24 characters consisting of lowercase letters, numbers and dashes, and storage accounts can't even use the dash. Hello? I thought DNS labels have allowed 63 characters for decades.
It sounds to me like some legacy Windows 2000 spaghettini fettuccine is powering some parts of Azure.
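For reference, the rules as I understand them (worth double-checking against the current docs): storage account names are 3-24 characters of lowercase letters and digits only; key vault names are 3-24 characters of letters, digits, and hyphens, starting with a letter. A quick client-side check:

    # Client-side validation of the Azure naming rules described above.
    # Global uniqueness still has to be checked against the service itself,
    # and Key Vault has extra rules (e.g. no consecutive hyphens) that this
    # regex doesn't capture.
    import re

    STORAGE_ACCOUNT = re.compile(r"^[a-z0-9]{3,24}$")
    KEY_VAULT = re.compile(r"^[a-zA-Z][a-zA-Z0-9-]{2,23}$")

    for name in ("myprodlogs01", "my-prod-logs", "kv-prod-01"):
        print(name,
              "| storage ok:", bool(STORAGE_ACCOUNT.match(name)),
              "| vault ok:", bool(KEY_VAULT.match(name)))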
> I work for a University, I suspect that if you paid full price for these services it makes no economic sense, a $100 Azure credit that cost $100 is a bad deal
For cloud to make economic sense, you need to treat it very differently from traditional infrastructure. For example, simply shutting down our dev environment outside of business hours means we're not paying for the compute the majority of the time.
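As a sketch of what that looks like with the Azure Python SDK (the subscription and resource-group values are placeholders; schedule it however you like, e.g. cron or an Automation runbook, with a mirror-image begin_start() script for the morning):

    # Deallocate every VM in a dev resource group after hours.
    # Deallocated (not just stopped) VMs stop accruing compute charges.
    from azure.identity import DefaultAzureCredential
    from azure.mgmt.compute import ComputeManagementClient

    SUBSCRIPTION_ID = "00000000-0000-0000-0000-000000000000"  # placeholder
    RESOURCE_GROUP = "rg-dev"                                 # placeholder

    client = ComputeManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

    for vm in client.virtual_machines.list(RESOURCE_GROUP):
        print(f"Deallocating {vm.name}...")
        client.virtual_machines.begin_deallocate(RESOURCE_GROUP, vm.name).wait()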
This is why I absolutely avoid using Azure, AWS or GCP for my own side projects. On the company account, sure, it's your money. But I'm not going to risk my savings because I misconfigured a lambda or something.
> I'd be curious to know what's good or bad about it compared to GCP and AWS.
Documentation lies, support lies, metrics lie, bugs everywhere, and when something breaks the status page is always all green and support tries to convince you it's your fault anyway. They're only here to prevent you from enforcing the SLA. The distrust is pervasive.
I stopped suspecting my code, if something breaks outside of a planned maintenance it is _always_ Azure.
My latest support ticket: Azure App Service internal DNS server broke and there is no way to bypass it short of hardcoding IPs in /etc/hosts. Support told me that if I wanted App Service to work reliably I had to implement their DNS server myself.
To rephrase, my PaaS provider told me to spend time and money to implement the very platform I was paying them for, and it just so happened to be absolutely impossible because of an unannounced BC break a few months prior (which is another lengthy and frustrating story).
This morning I had a VM cut out of the network and 10% of my App Service traffic just disappeared. No explanation, no incident report, nothing.
These days I'm working with AWS, and it just works. If something isn't working you know it's your fault and that the answer is in the documentation. I'm not spending days on workarounds, I'm actually implementing as planned. I have no words to describe the relief I'm feeling.
I know Azure generally sucks... If you think you cannot go lower, you should try Oracle Cloud. That is a total piece of dung of a Cloud Service.
I tried it a couple of years ago. After finishing the trial, I removed all instances and disks, supposedly completely blanking the account. And also supposedly deleted the account.
To this day, I still keep receiving some kind of invoice for about $2 USD that they say I owe. And when I login into the "oracle cloud account" nothing works because my account seems to be half-deleted. (like I get error screens when accessing several of their piece of shit panels).
To make things worse, I suddenly started receiving emails from their sales team in Portuguese. I guess my last name sounds kind of Portuguese, so someone said "yeah, you write to him".
And while using their system I was not really impressed. Their cost structure was weirder than AWS (and that's saying something) and to mount a volume in an instance you had to do some funky commands.
I would NEVER trust business technology to that sort of system.
I've used GCP and was billed something like 10% of a minimum wage for setting up GCP's demo with around 7 very simple microservices (I don't remember exactly). I did it 4 times, and each deployment ran for maybe 5 minutes before the project was killed.
Shit is expensive as hell
For the same money I could rent some weak Linux box for a year.
What you set up there should cost something like $300/month to run. It's very transparent pricing; it's just bad that the tutorial doesn't mention that.
You do realize what you set up in that tutorial, right? A Kubernetes cluster with 11 full-scale microservices, dimensioned so they can serve an average medium-size business. For a hobby this is huuuuuuugely overdimensioned.
If you were to do the same on Azure, it would cost more. And if you are comparing it to a cheap Linux box, what the hell are you using Kubernetes clusters for then?
I have to wonder what you were doing, I've been continuously hosting my own projects there for years and with the free tier they cost pennies per month to run.
You pay for the resources you use above the free-tier limits. My bill for this month so far is 30 cents, because I deployed frequently and my Docker artifact storage size (with several years' worth of deployments) crept above the limit. Then I added a periodic job to clear out unused Docker images older than one year, and I'm running for free again.
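A sketch of that cleanup job, shelling out to gcloud. The repo path is a placeholder, and the JSON field names I use (createTime, package, version) should be verified against your own `gcloud artifacts docker images list --format=json` output before you let it delete anything:

    # Delete Artifact Registry docker images older than a year.
    import json
    import subprocess
    from datetime import datetime, timedelta, timezone

    REPO = "us-docker.pkg.dev/my-project/my-repo"   # placeholder
    CUTOFF = datetime.now(timezone.utc) - timedelta(days=365)

    listing = subprocess.run(
        ["gcloud", "artifacts", "docker", "images", "list", REPO,
         "--format=json"],
        check=True, capture_output=True, text=True).stdout

    for image in json.loads(listing):
        created = datetime.fromisoformat(
            image["createTime"].replace("Z", "+00:00"))
        if created < CUTOFF:
            ref = f'{image["package"]}@{image["version"]}'
            subprocess.run(["gcloud", "artifacts", "docker", "images",
                            "delete", ref, "--quiet"], check=True)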
This is one of the single most comprehensively intense demo projects I've ever seen. I did a multi day AWS Data Lab for work once and it wasn't this comprehensive.
I quite like Microsoft/Azure from a development perspective. If you're running .NET, Application Insights alone is nearly enough to put it above the competition. I appreciate how it integrates with AZD/Teams and the platform as a whole felt much more cohesive than AWS.
The monthly $60-$100 developer credit was fantastic as well. It avoided the usual fighting for approval/budget to test things out.
Yeah, I'm currently missing it very much running .NET on AWS. It's insane how much it gives you for "free". CloudWatch feels like weak tea in comparison.
We moved from AWS to Azure for other reasons, but in doing so we moved from X-Ray to AppInsights, and the difference was amazing. We're big App Insights fans.
From what I can understand choosing Azure is almost always a top-down decision, especially when it comes to government entities/agencies (I live in Europe). MS has a hell of a sales network.
It's usually a cost decision, and AWS doesn't really care about anything smaller than, say, the US government enough to even attempt to engage in competitive bidding, so if a company/organization puts out an RFP, MS usually finds a way to look cheaper than AWS.
Add to that that AWS doesn't really engage in the normal business-to-business sales process but simply gives you a price list and tells you "that's what it costs" pretty much straight up, and it's no surprise that a lot of traditional enterprises with huge existing Microsoft bills end up with the vendor they know, understand and think they can control.
It's not that there is anything really wrong with AWS: their support is good and their products work. But it's a messy platform where you really need to pay attention, and might even engage a consultant to fully understand what you're paying for and how optimization decisions affect your ROI, since everything in AWS is priced individually, whereas Azure does a bit more bundling into packages.
Many companies use Active Directory. The new kid on the block is Azure Active Directory (AAD), the cloud evolution of self-hosted Windows Server AD.
Since many companies rely on it, especially for role-based access to internal resources, you can't avoid it as a developer/employee.
You're right, but that's not what they meant (and it's not AAD's trajectory). Microsoft has been adding more and more device management, policies, software rollout, etc. to AAD to bring it up to equal standing with AD and then, eventually, allow most deployments to use just AAD instead of maintaining some bulky hybrid on-prem & cloud AD setup.
The people buying these things obviously have no idea about that.
Migrating to Okta or something else neutral would cost the same, but hey, that's a different name.
Not even remotely close. Okta for an enterprise is big dollars. Most shops already have O365, so the AAD premium-tier licensing is already paid for. AAD and Okta Workforce are almost at feature parity.
Microsoft has just figured out how to sell Azure: scare compliance teams into believing AWS and GCP are horrible (especially in the EU and in banking), use their Office monopoly to give huge discounts if you buy as a bundle, and be awesome on comparison charts. They check all the boxes on the list of services they offer. For an exec, it doesn't matter how well those services are executed; that's a developer problem that a system integrator will solve.
> Sorry for being so miserable, but Azure has given me soooo much grief over the last 10 years that I'm just completely done with this shitshow of a platform.
And yet it continues to rake in billions and grow 20-40% year over year (even if that is slowing).
They know how important developers are, that's why they bought Github. But developer experience is indeed one of the minor points of consideration when choosing a cloud vendor for large enterprise. Customer service, billing and integration into existing infrastructure is much more important.
Persistent problems between Azure VMs and virtual disks causing unexpected reboots. Complete outages. And don't even get me started on ACI (for Windows). It doesn't even work.
In 7 years we had one AWS AZ outage and we didn't even notice because our monitoring platform in there couldn't reach the network (learned something!). But nothing broke. Even the us-east-1 outages didn't affect us.
Were you using Standard HDD disks? They have a really poor SLA, and are only usable for things like stateless VM Scale Sets or otherwise redundant services.
We had to switch everything to SSD to get reliability comparable to on-prem VMware.
That sounds like what I've seen on Azure: mystery problems that we see but they don't, often on the network side. One time we were pretty sure they had a bad interface in a LAG group: massive packet loss between hosts, but only on certain ephemeral source ports, about 1/8 of them... Support couldn't find any issues even after a few days.
This was circa 2018, but AWS was so much more stable at that time. OK, us-east-1 had issues from time to time, but they acked them and fixed them.
Yes, their inability to see any problems was a constant problem.
Our AWS reps are all over stuff when it goes down. I regularly get to talk to actual real product managers and engineers via our enterprise support if anything goes wrong.
I'm glad I work for a company that uses Office 365 instead of the Google equivalent or others. I really like the Office products; for all their faults, they allow me to work more productively than the alternatives. So I don't know why I'm supposed to have less in my head just because I can work well with Excel and Outlook.
It doesn't really matter, at least if you're in the EU.
While Google, Amazon and others were busy complaining about GDPR, Microsoft was busy working on being compliant, with the result that today they're pretty much the only legal/compliant solution in most of the EU.
The more regulated the industry (health, finance, etc), the more you can be certain that it's running on Azure if it's EU based and running in the cloud.
The irony is the amount of money they have thrown at "Dev Advocates" who don't do a goddamn thing to advocate for how developers use their platform. Frankly, that's because the folks who care burn out. I still remember the time a higher-up railroaded me and lied repeatedly to a VP about the design of a product as I desperately tried to save them from 5+ years of having to educate users on two different ways to do [basic ops]. Those basic cloud objects of course have major differences in functionality and ecosystem viability depending on what you choose, but this isn't really explained up front either; you discover it by building a solution for months and then finding out you have to backtrack and start the Azure integration over. Maybe again.
All to say, I agree wholeheartedly with every word.
Windows ME did in fact work mostly fine here too, lol. Relatively speaking for Windows 9x performance, of course. I only used it for a year, not because I couldn't stand it but because such was the pace of major Windows updates back then.
Windows Vista was honestly worse for me, not due to bugs but because it was two years ahead of the hardware curve, and GPU vendors seemingly twiddled their thumbs during the betas; once WDDM¹ went live, they panicked and rolled out alpha-quality work. So many driver crashes, compounded by the heavy RAM requirements... Other than that, and with a less overbearing UAC, I could squint and see an OS similar to what Windows 7 became: hardware had caught up, drivers were mature, and on top of that Microsoft optimized its performance.
In hindsight, WDDM should've been rolled out well in advance as an update to Windows XP, letting developers focus on a single thing rather than on new-OS compatibility on top of deep changes like UAC.
Windows ME was one of my favorite versions of Windows, not being ironic about it either. Its infamy has more to do with how Joe Average uses computers in general.
As for Vista, while I did not use it in its day I can tell its problems were far more to do with crapass hardware manufacturers and their crapass drivers. Vista with access to 7's drivers and hardware runs just fine.
This place I work at has actively fought against using Azure, but we use it because it's advantageous to the business (or it's perceived to be).
We have actively pushed for AWS or even GCP, but it's futile when it doesn't align with the business. I'd imagine a lot of developers are facing the same company issues.