Microsoft Azure Outage (twitter.com/msft365status)
300 points by maxaigner on Jan 25, 2023 | 234 comments



It's good that Microsoft saved money via layoffs so that it balances out when customers leave Azure. Very forward thinking company.


Leave to go where?

On-premise and being miserable having to wait months to get a new server with poor automation, observability and worse outages? To another major cloud provider with similar pricing and outages?

Cloud helped mostly with automation and scaling but if your system is that critical, you should consider a good CDN as load balancer and multi-cloud (or at least multi-region) for actual robustness.


AWS and GCP both have ~100% uptime in every region for VMs this month. Meanwhile the majority of Azure regions have had various outages in the same period: https://cloudharmony.com/status-of-compute


Almost certainly due to Azure's broken policy where we have critical change advisories that block deployments for huge periods of time towards the end of the year because of Black Friday and then the holidays. Every team has basically been unable to deploy since the week before Thanksgiving, when a surprise CCOA was pushed out by leadership at the behest of a certain big customer... then there was the World Cup and the winter holidays. Nobody could really deploy anything from a week before Thanksgiving until a week after New Year's... almost two months' worth of batched changes and every team YOLO button pressing as soon as they could in January.

And now layoffs so everyone is super unmotivated! Excellent stuff going on right now from Microsoft senior leadership.


Interesting. Can I see this longer than a month?


Right after an outage of course it will show like that.

After an AWS outage it would also look unfavourable for AWS, right?


Wow I didn't expect the difference to be so obvious.


It is weird that this answer was downvoted. I agree. What a great page!


Where I worked, the internal approval processes and controls over cloud resources were as lengthy as those for on-premise hardware. So that may be the case for small companies, but I don't think there is much of a difference in those large bureaucracies.


Shows that all these availability zones and regions don't really help if an outage can knock out a whole cloud provider. And that's not specific to Microsoft. The only way to really ensure uptime is to use two providers. Sadly, that's basically only possible with on-prem/colocation where traffic is cheap.


It's mostly Azure though that is badly designed to such an extent that multiple times there have been global outages. In general Azure availability, security (the only major cloud provider with not one but multiple cross-tenant security exploits) and usability are pretty terrible so it shouldn't be used for anything but saying "this is how it should not be done".

GCP had a similar thing once, where a BGP update knocked out their Asian regions.

AWS have never had a global outage. (And no, that time S3 in us-east-1 was down wasn't a global outage, the only customer code/workloads that were impacted was code interacting with S3 that didn't specify the region and had to rely on us-east-1 to determine it, and it didn't work anymore)


To be fair, AWS once had a global Route53 outage, which was effectively a global outage for anyone using AWS for DNS.


That outage was limited to Route 53 DNS record editing and not DNS lookups.


Do you have a link to an article about that? My google-fu is weak, and this sounds interesting - that should not happen to DNS - at all - and from the outside Route53 looks quite well managed. So what the heck did they do?


It was back in 2019.

https://twitter.com/AWSSupport/status/1186735657387003904

I forget the details. I do remember half of our internal tools not working at the time due to DNS issues, though. Good times.


Someday someone will write a book about how AD, AAD, etc, exert the control they do at MS and go as unchecked (or at the time) as they do. AD's inability to execute made Azure a significantly less pleasant platform until they finally fixed accounts a couple of years back to properly do OAuth 2.0 with ARM.

Maybe the book is just "AD brings in the money" but wow, they sure bring it down as well. Global outages like that always stink of AD.


There have been several cloudfront outages that have effectively been semi global outages


I'm hearing from four different friends from four different companies in Germany that they can't really work right now.


If they were relying on Outlook and Teams to be productive, they probably couldn't really work before either.


What a naive comment. As if the only truly important jobs exist in engineering and require nothing but git and a book on C.


I interpreted this comment as more of a jab at how inefficient Outlook and Teams themselves are as applications.

I don't know if it's the right interpretation to have, but I kind of agreed with it, considering huge issues I had with teams (curiously some of them are only there for linux users, weird when considering the fact that I only use teams' web page) - not saying I could do better though!


Yeah, what BS. Everyone knows that if you have a book on C, you can always quickly implement git yourself.


I'm unsure what being in engineering has to do with using outlook and teams?


That wasn't my point. But tools like Teams kill more productivity than they enable, at least in my experience. If anything, I was more productive yesterday, because I got disturbed less.


This makes you wonder if some centralization patterns, i.e. Azure AD, are not a national security problem?


Azure AD is a nightmare. I don't know how many of you sign in to multiple tenants in the console, but it generally involves buying a new computer.


> I don't know how many of you sign in to multiple tenants in the console, but it generally involves buying a new computer.

This made me laugh out loud. I'm working in a multi-tenant, multi-subscription environment with Azure AD just now. MS force you to use 2FA and I picked the wrong 2FA app.

Now it's completely and utterly comical trying to work out which generated 2FA auth code I need to key in when auth'ing in Visual Studio because there are absolutely no visual cues as to which subscription it's trying to authenticate to. You can't tell VS that "I'm only interested in auth'ing to this particular subscription". Now it prompts me for almost every subscription we use and it's a whack-a-mole experience. They really need to fix the UI/UX in VS for this.

Of course when it comes to mandatory password change time I have to go through this pain all over again.


I’m setting up a system with multiple AAD B2C tenants, so I get the joy of switching back and forth between the primary tenant and the B2C tenants frequently (at least until I can finish automating enough of the B2C provisioning bits).

I don’t yet have enough context to fully evaluate against cognito. It may end up being nice to have B2C as a first class AAD tenant, but until I get far enough along to realize those benefits, there will be a lot more cursing under my breath about the need for another layer of identity and the lack of control plane access through azure resource manager APIs/tooling.


I have multiple chrome profiles for this. However, despite switching from one subscription to another to access each different AAD tenant across multiple chrome profiles, it seems that Azure "remembers" the subscription you last accessed, across profiles. It's as if the last subscription you accessed is tagged to your Azure user server side rather than being a blob of client side state. This is deeply annoying as well, especially when your sessions expire...


Firefox containers is the solution to this headache for almost every multi tenanted service. I used to have it installed only for those tasks when I was working in consultancy.


Firefox containers aren't a patch on Chrome profiles (which I did mention I was using). I'd switch to Firefox in a New York minute if they fixed the profile management UX (about:profiles).


YES. I have the dubious honor of needing to use at least 4 different Teams tenants over the course of a week and it is enough to make me want to pitch my computer into the sea. App, browser, private browser - doesn't seem to matter. When I try to sign in, Microsoft will pick one of the tenants seemingly at random, regardless of what URL I use, and try to sign me in - of course, since there is usually no visual cue as to which tenant I'm looking at, I just put in a password and pray.


Use browser profiles, choose a different profile picture for each, then use one profile per tenant. Done.


Firefox Multi-Account Containers extension. I couldn't live without it.


I use it, but since the azure portal uses the uri fragment, it still requires constructing the correct url in the correct container. One mistaken url will obliterate the container, and restoring it requires deleting windowsazure.com, microsoftonline.com, portal.azure.com, and another one that I can't remember right now.

You'd really have to try to make it so screwy.

It's kind of a shame. Like most things, Azure was better when it was smaller. I loved the first version of functions.


It involves a lot of private browsing sessions which is actually MS's recommendation!

What a PITA.


Nah I just set up a 2nd browser profile, and they both stay signed in. It’s a breeze.


Or Firefox containers?


Almost no one, probably.


Critical infrastructure cannot be reliant on a cloud (or internet availability, if possible). In most EU countries that's a law.


Do you have any good resource summarizing these laws?


No, sorry. If you are adventurous enough, look at Latvia with Google translate and search for "critical infrastructure" at https://likumi.lv


Microsoft has a lesser known Azure Gov Cloud specifically because of this that is disconnected from public Cloud. This includes dedicated staff with vetted secret clearances for access to those systems.

https://azure.microsoft.com/en-us/explore/global-infrastruct...


Can anyone make use of this?


No this is specifically for official government business


At this point most probably, yes. Especially as more and more government entities/agencies are moving to the cloud, many of them to Azure (because of MS). I live in Eastern Europe, but I suspect that this migration is happening all around Europe and North America.


In the azure portal, it shows a "Routine Unplanned outage" - ??


Well points for honesty, at least. :)


I guess that's the 0.0001% of outage for an advertised 99.9999% uptime


At least they have a sense of humor


Does this mean they need to rebrand, because it's not up 365 days of the year? Maybe rebrand it to Microsoft 364.5?


There are 365.2425 days per year, so a six-hour outage (0.25 days) just about uses up the extra 0.2425 days, which suggests that they remain able to declare 365 when considering this specific outage only.


I think the joke always went that they should rename it Microsoft 360.


Not sure if it's directly related, but GitHub is also experiencing issues: https://www.githubstatus.com/


"We are investigating reports of issues with Actions. This looks related to Azure networking issue which is impacting multiple regions. We are seeing improvements and will continue to monitor this."


GH has been a Microsoft company since 2018...


Good to see GH is eating the dog food


At work, we all got kicked out of a teams meeting an hour back and sending/receiving e-mails on Outlook seems to be slow.

Location: Chennai, India


This is going to be the most productive day ever


Every Azure product I've had to use has been lousy in every possible way. Azure DevOps at my last employer was a nightmare and nobody in the company liked it, not even the managers who decided on it.


I've been learning / using DevOps for the past four months and find it "quite good", and have previously used Jira, although not in great detail.

I'm making the effort to learn it in increasing detail as it's the company-wide chosen system. I'm interested to know what made / makes it a nightmare for anyone else.

(And I'm no fan of Microsoft as a whole)


I use Azure DevOps daily and honestly have no issues, it works well. What didn't work for you?


I have some Azure services that are not able to consistently make outbound HTTP requests to my heartbeat monitoring service so I'm getting alert after alert this morning. This is just the nudge I needed, and I'll be moving the whole thing to Linode later this afternoon.


Wouldn't it be quite simple to set up an unofficial status page that just pings some relevant services and if they have a disastrous outage at least, it shows it?

Because I think it's clear that their status page is useless and "manual".
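
A minimal sketch of what such an unofficial checker could look like, in Python with only the standard library. The endpoint list, the function names and the rule that any 5xx answer counts as "down" are all assumptions made for illustration, not anything Microsoft publishes:

```python
import urllib.error
import urllib.request

# Hypothetical endpoints to probe; swap in whatever services matter to you.
ENDPOINTS = {
    "Azure portal": "https://portal.azure.com/",
    "Azure DevOps": "https://dev.azure.com/",
}


def http_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if the request failed outright."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, timeout, connection refused, ...


def classify(status):
    """Map a status code to a coarse state (assumed rule: only 5xx means 'down')."""
    if status is None:
        return "unreachable"
    if status >= 500:
        return "down"
    return "up"  # includes 4xx such as 401: the service is at least responding


def poll(endpoints, fetch=http_status):
    """Probe every endpoint once and return {name: state}."""
    return {name: classify(fetch(url)) for name, url in endpoints.items()}
```

Running `poll(ENDPOINTS)` every minute from a machine outside Azure and publishing the result would catch the disastrous cases at least. The `fetch` parameter only exists so the logic can be tried without touching the network.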


It comes and goes. Teams and Azure DevOps sometimes work perfectly for a few minutes, then respond with all 503s for a few minutes.


> We've identified a potential networking issue and are reviewing telemetry to determine the next troubleshooting steps. You can find additional information on our status page at https://msft.it/6011eAYPc or on SHD under MO502273.


I'm so surprised by MS's strategy of using random domains and TLDs; this certainly doesn't make phishing avoidance easy.


If you implement an allowlisting proxy, the number of required domains for M365 / Azure is something like 120 [1]. Google basically requires three, tunnel.cloudproxy.app, *.google.com and *.googleapis.com. Amazon requires *.aws.amazon.com, *.amazonaws.com, *.awsstatic.com, *.api.aws and *.aws.dev.

Microsoft has some great domain planning.

[1] https://learn.microsoft.com/en-us/microsoft-365/enterprise/u...
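
To illustrate why a short wildcard list is so much easier to run than 120 entries, here is a rough sketch of the hostname check an allowlisting proxy might perform. The list is the Google one quoted above; the function name and the use of shell-style matching are assumptions, and a real proxy will have its own rule syntax:

```python
from fnmatch import fnmatchcase

# The three Google patterns quoted in the comment above.
GOOGLE_ALLOWLIST = ["tunnel.cloudproxy.app", "*.google.com", "*.googleapis.com"]


def is_allowed(hostname, patterns):
    """Return True if hostname matches any allowlist pattern (hypothetical helper)."""
    host = hostname.lower().rstrip(".")  # normalise case and a trailing root dot
    return any(fnmatchcase(host, pattern) for pattern in patterns)
```

With this matching, `*.google.com` covers subdomains at any depth but neither the bare `google.com` nor lookalikes such as `evilgoogle.com`, since the pattern requires the literal `.google.com` suffix.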


My point is MS uses a lot of unrelated domains that are very different from the main brand, even the one above looks dodgy (msft[.]it) From your list, microsoftonline-p[.]com is an official domain, but it looks like a typosquat. I think it's quite far from "great domain planning".


> I think it's quite far from "great domain planning".

The poster saying they have 120 of them would imply that was sarcasm


They appear to be being sarcastic. I don't think anyone would be seriously saying 120 is better than 3 or 6 domains.


Luckily Microsoft also provides a service for that: Safelinks https://learn.microsoft.com/en-us/microsoft-365/security/off...

Also a personal favorite of mine: http://microsft.com (not entirely sure if it's just to prevent typosquatting or if this is actually used in some products)


I don't know whether it's a typo but https://support.microsoft.com/en-us/topic/contact-us-91f63b4... lists "EOC: criskgro@microsft.com (For CEE and MEA)" under the Microsoft Credit Services. It feels like a typo, but who knows. If they don't have anything in place to catch this type of error, it's probably a good idea to register every domain someone could accidentally type.


There's no MX record on the domain so it seems to be a typo


microsft.com was used specifically for telemetry to bypass web proxy blocks for *.microsoft.com put in by administrators of secure networks.

I know this because I was one of those admins trying to plug the leaks.

Windows 10 + Office uses 200+ domains just for Microsoft stuff, of which something like 120 are for telemetry.


And I imagine they add new domains with updates all the time.

At home I was trying to avoid random reboots from updates in a foolproof way in a Windows VM that ran long processing tasks. I determined the only reasonable course of action was to remove all internet access. Stamping out the massive list of changing domains (and hard-coded IP addresses?) would just be too much work that I know I would never keep up with.

A white list might work.

I mused that you could have a constantly updating Windows machine and monitor all of its connections, adding them to a block list on an external firewall, but in addition to being complex to set up, I bet it wouldn't even catch everything.


Yet people continue to defend Microsoft's telemetry practices. The OS won't let you opt out without it fighting you and they'll even fight you for blocking it on the network.

Windows is spyware.


.it ccTLD is especially bad. Almost all of the generated SEO spam links to malicious ad networks I get on search pages are .it domains, all written in machine English, not Italian. Thanks for the reminder; I just discovered that -site:.it works in search queries to filter them out.


Makes sense to use a different domain if everything is down because it could also affect DNS for the main domain.


I think what the OP is saying is, if you have multiple random domains, how would people know which ones are legit (or not)? Say I have mixxxrosoft.com, how would you know this is one of MS' official domains?


It is often very difficult to test networking changes in production. For example, firewall rules. What sort of tools do people use for this?


Does the Internet Archive use Azure? archive.org is throwing 503s


Two weeks ago they were affected by the Elasticsearch outage[1], too.

[1] https://news.ycombinator.com/item?id=34337518


DuckDuckGo is also affected (blank search results).


https://downdetector.dk/ indicates several MS products and services are having problems. Here is the status from MS on Twitter: https://twitter.com/MSFT365Status/status/1618149579341369345 Edit: Added this link which apparently is the new status page and seems to be updated: https://status.office365.com/


Auth via Microsoft ID is degraded, our platform is blipping (cache retries, message retries due to packet loss), access to the Azure portal is degraded and the Azure status page isn't loading consistently.


Nothing is working for me, Oceania/Australia.

Including O365, Azure, Azure Devops.


Ah - so that's why GitHub Actions are unreliable right now.


Glad it wasn't just me. I was waiting over 10 minutes for a hosted runner.


Such a late 2010s / 2020s problem :-(


Office359 strikes again


You mean Office364


Everyone deserves a break between Christmas and New Years, even the folks at MS! /s


0<Office<365


> The issue is causing impact in waves, peaking approximately every 30 minutes.

Does anyone have any general ideas on what kind of outage manifests itself like this? Devices retrying to authenticate every 30 minutes and finding the service is down perhaps?


Can sometimes be scaling/monitoring loops. i.e. cluster comes up, provides some limited service, gets overloaded and drops below required performance metric, gets killed by monitoring/scaling system, repeat...
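
A toy simulation shows how that kind of loop produces waves at a fixed interval. Every number here (boot time, backlog growth, capacity) is invented for illustration and has nothing to do with the actual incident:

```python
def simulate(ticks, boot_time=3, backlog_per_tick=10, capacity=25):
    """Return one state per tick: 'booting', 'serving' or 'killed'."""
    timeline = []
    booting_left = boot_time
    backlog = 0
    for _ in range(ticks):
        if booting_left > 0:
            # Cluster is still coming up and serves nothing.
            booting_left -= 1
            timeline.append("booting")
            continue
        backlog += backlog_per_tick  # queued retries pile onto the fresh cluster
        if backlog > capacity:
            # Health metric breached: the monitoring system kills and restarts it.
            timeline.append("killed")
            booting_left = boot_time
            backlog = 0  # toy simplification: the queue is dropped on restart
        else:
            timeline.append("serving")
    return timeline
```

With these defaults the cluster boots for three ticks, serves for two, gets killed on the sixth, and then repeats, so an outside observer sees impact peaking at a regular period even though nothing is on a timer.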


Many games that use Azure PlayFab are down as well due to this. Both PlayFab services and PlayFab MPS game-server hosting are currently broken.

https://status.playfab.com/


ThousandEyes public outage map shows the scale of the Office365 outage: https://www.thousandeyes.com/outages/


DuckDuckGo.com - no search results showing up at all... are they on Azure?


Yes.

    $ dig +short duckduckgo.com | xargs whois | grep Organization
    Organization:   Microsoft Corporation (MSFT)


They get their search results from Bing


Minecraft, Azure, Office 365, etc... MS cloud services have issues


Cloud is the new power grid. When it goes down, we lose power to everything. Will we learn from the grid and decentralise some of the compute and cloud services?


Seeing problems with Azure DevOps in Western Europe here, can't open most pages/log in. Teams and Office appear to be working fine.


Teams and Outlook not working fine here.


Same here in Germany. Even microsoft.com times out at the moment.


Reported issues with Teams, Microsoft 365, etc


Teams is working now for me. However, all my notification preferences got reset!


Had a few dropped calls in Teams over here this morning (South Africa), otherwise our devops stuff is currently fine.


Anyone having problems with Azure too?


yes here. storage, db, apis - it's not permanent but still persisting. It can be monitored at the azure status page as well https://status.azure.com/status


LinkedIn seems to be struggling as well. Lots of latency, page loads are taking 10-20 seconds for me.


I did notice chatgpt was down earlier, but it could have been caused by heavy usage


duckduckgo is not showing any results right now... are they on Azure, too?


Most likely, because duckduckgo partially depends on Bing


russia? shut down a service and halt the productivity of most companies in the west...because most companies moved to azure ad and teams.


> russia?

Oh please. Azure is plenty capable of taking themselves offline on their own.


I'm not sure Russia is as capable as you've all spent the last few decades making out....


Anyone else remember the bad Windows Defender virus signature they put out on Friday the 13th a couple weeks ago? Microsoft is not having a good start to their year.


Did we finally exhaust IPv4? /jk


Hope the Leopard tanks aren’t running azure…


What's the point of having a status page if it doesn't indicate the issues? https://status.azure.com/en-us/status

Azure, Teams, Outlook are almost down from Greece and Germany, and their status page shows that everything is fine :-)


When anything on that page turns not-green, there are news stories about it. Not positive ones. So exec approval is needed, because the decision to flip something on that page is ultimately the decision to cause stories negative to MS to be published. The exec has to weigh whether pissing off the customers (by failing to acknowledge reality) is worth the bad press and SLA fallout.


It has nothing to do with press. This is negative press already, and journalists can use this to write their stories without waiting for the official light to go from green to yellow.

It's about contractual obligations and SLAs. Things are not officially down in most agreements until MSFT acknowledges they're down. Refunds issued to your largest customers because your blob storage failed to meet 99.9999% uptime are directly tied to these statuses.


I'm not going out of my way to be hyperbolic or anything here, but that sounds suspiciously like "fraud" to me.


I don't think they're committing fraud.

I think it's an important enough page that it can't be automated. It needs a manual approval from a human, for the very basics, like even if the status reporting system is operating correctly, because of various downstream effects.


Which means it's not a status page any more. Defeating the supposed purpose.


"SLA refund page"?


The point is PR. Never trust a status page if it's not directly connected to the monitoring system.


They never attach it to the monitoring because monitoring systems usually generate a lot of false positives which affect their published SLA.


Then they should have a "?" status that can be triggered by automated systems that acknowledge that it looks to be an issue but that they are manually investigating.

If it's a false positive they just resolve it without it affecting SLA and if it's a real problem then us customers wouldn't have to debug our own stack for 2 hours before Microsoft informs us that they are the problem.

EDIT: Wonder how many man-years of extra debugging work their non-working status page has caused the customers.
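
A rough sketch of the "?" idea as a tiny state machine. The class, method and field names are hypothetical; the one rule it encodes is that automation may only raise the "?" flag, while a human decides whether it becomes an outage that counts against the SLA:

```python
from enum import Enum


class Status(Enum):
    OK = "green"
    INVESTIGATING = "?"
    OUTAGE = "red"


class Component:
    """One dot on a status page (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.status = Status.OK
        self.sla_downtime_counted = False

    def monitor_alert(self):
        """Automated monitoring may only raise the '?' flag, never declare an outage."""
        if self.status is Status.OK:
            self.status = Status.INVESTIGATING

    def human_review(self, real_problem):
        """A person resolves the '?': confirm the outage or dismiss a false positive."""
        if self.status is not Status.INVESTIGATING:
            return
        if real_problem:
            self.status = Status.OUTAGE
            self.sla_downtime_counted = True
        else:
            self.status = Status.OK
```

A false positive goes green, then "?", then green again without ever touching the SLA flag, while customers still get an early hint that the provider is looking at something.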


They never attach it to the monitoring because monitoring systems usually generate a lot of correct positives which affect their published SLA.

Works equally well. See the point?


Which means if one were to require monitoring and status pages to be connected, one of two things happens (for each monitored component):

(1) The monitoring system would be altered to ignore tests that return false positives (at the expense of missing the alert when it represents an outage).

(2) Fixing the monitoring. It wasn't working for the sysadmins/operators, anyway, since it had so many false positives that their "mental model" was essentially based on (1), anyway.

At least, where I've forced the issue of doing just this, that's exactly what happened. At the end of the day, especially since SLAs took a hit and that affected bonus payouts, monitoring got a lot better -- as did overall team function when we truly realized how bad things were -- we stopped doing workarounds and started fixing problems at a more fundamental level which led to SLAs that were both accurate and excellent.

It helped bring attention to a hidden problem which resulted in time being allocated to fix tests that dropped constant false-positives and to evaluate each for whether or not it should exist in the first place.


Which impacts economics because some customers surely got deals guaranteeing some amount of credits based on up/downtime as reported by the status page.

And so updates to the status page become political and locked behind senior management approvals.. like AWS.


Yeah, that's why SLA reports never include <30m downtimes, convenient truth bending.


It's updated now - updates for service outages at this level generally need sign-off from someone higher up the chain


Someone has to "approve" the status pages showing what's actually happening? From a customer perspective, it seems far worse to have status pages fail to reflect actual outages than to have them accidentally report an outage when there isn't one because no one really cares about what the status page says if they're not having issues. It's hard to see how the goal here could be anything other than trying to add plausible deniability for what would otherwise be obvious deception.


> it seems far worse to have status pages fail to reflect actual outages than to have them accidentally report an outage when there isn't one

That's not the goal.

> It's hard to see how the goal here could be anything other than trying to add plausible deniability for what would otherwise be obvious deception

That's the goal. The "status page" is considered the source of truth for most of the big contracts. If status-page=OK then your contract with them isn't violated. So changing the status page is a big deal, with real financial implications. The status page isn't a view into the SRE's tickets, it's a declaration that the service isn't being provided.


Utter rubbish. Major contracts have account managers and it all gets hashed out 1-1.


Don't know why this was downvoted. We've definitely been able to provide proof of an outage when the status page showed otherwise and get a refund in the form of server credits by contacting them directly. For all 3 big vendors, AWS, Azure, GCP


Agree here as well. It's usually not that hard to prove, based on the many, many metrics Azure resources emit, that their SLA was breached.

What might be happening is that there is fine print you have to read and be in compliance with in order to be eligible for the SLA.

For example, look at all the conditions which have to be met for a breach of VM SLA in Azure:

https://azure.microsoft.com/en-us/support/legal/sla/virtual-...

Hidden in the SLA details are typically hints on how you can become more resilient in the cloud. So it pays to read the SLA details and really deeply understand what they are telling you.


Exec approval for showing major outages on status dashboard is pretty much standard practice across large companies. The main differentiator is whether it’s approved within five minutes or two hours.


> it seems far worse to have status pages fail to reflect actual outages than to have them accidentally report an outage when there isn't one because no one really cares about what the status page says if they're not having issues.

I disagree. What if you're having issues and the status page is incorrectly reporting an incident? It would be easy to waste a load of time waiting for the status page to sort itself out, only to find out you've still got an issue.


You can't approve a fact.


As others noted, the so-called "status" pages of big service providers don't serve to reflect reality but to shape it. For actual status you need to consult independent monitoring services.


well.... if that fact can be delayed by just a tiny bit... that's enough


But shouldn't the individual service dots be automatically turning another color than green? I mean it's an automated service status page, right? Whether there is a human message at the top and that can take some time I understand.


No, it's not automated. I'm sure the underlying tech is automated, but once companies grow beyond a certain size, it needs a human to say "show this status change to the world" because there are lots of things depending on it (e.g. SLAs, but also bonuses, I assume), so they don't want a potential bug in the status system to influence that.

It's weird how slow they are with manual sign-off though.


I haven't seen any SLA deal that says the status page must show 99.9% uptime...


No, but if msft’s own status page shows downtime more than 0.1% of the time, msft will struggle to argue they haven’t breached their SLA, so there are financial consequences for the company.
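
For a sense of scale, the arithmetic behind an uptime percentage can be sketched as follows. The function name and the 30-day month are assumptions; real SLAs define their own measurement windows and exclusions:

```python
def downtime_budget_minutes(uptime_percent, days=30):
    """Minutes of downtime allowed in the period before the uptime target is missed."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - uptime_percent) / 100
```

A 99.9% target leaves 0.1% of the period, roughly 43 minutes in a 30-day month, while a 99.9999% target leaves under three seconds. That is why a status page marked red for even half an hour matters so much contractually.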


But I don’t want the page connected to their bonuses or SLAs; I just want to know whether they are having any issues anywhere. And I need to know within a minute of my own service not working so I’m not chasing the wrong thing. This can’t be an unreasonable thing to ask for?


I agree. I'm already annoyed at Hetzner with their 5 minute lag in reporting network outages where I'm regularly noticing them, investigating, checking status and then only after a few minutes see them updating and saying "it's us".

If you work with Microsoft, you might as well spend a few bucks extra and have an external monitoring system monitor Microsoft's systems so you get real-time third-party confirmation when your monitoring alerts you of issues concerning your system. It's the price you pay for scale, I guess. More money involved = more lawyers involved = more accountants involved = more MBAs involved = more corporate bullshit.


An automated one would be red 100% of the time.


Maybe they couldn't update the status page due to the network outage.

I'm joking, but...


or, they could be automated and transparent


Then some group scrapes the uptime of their competitors page and reports that "competitor is x times more reliable than transparent co"


That's what happens when you don't have an independent party that keeps tabs on this.


We've concluded that status pages are a complete joke.


status.office.com had been down for 15 mins, but it's back up now...


We have been here before...HN is the only status page that matters.


It's a public relations page.


Azure is the most developer hostile cloud environment. I have zero sympathy for people being affected by this because if you voluntarily use Azure then this is what you deserve. Sorry for being so miserable, but Azure has given me soooo much grief over the last 10 years that I'm just completely done with this shitshow of a platform.


> I have zero sympathy for people being affected by this because if you voluntarily use Azure then this is what you deserve

I guess many developers do not use Azure voluntarily but are forced to by their companies (or customers).


We're migrating on Teams because of that kind of reasoning.

It's utter shit of a service. Even worse if you need to write integrations for it


What will you use instead? What don't you like about Teams (or writing integrations)?


Our work switched from Slack to Teams after an acquisition, and I can confidently say that Teams is just complete garbage compared to Slack.

- The interface is laggy

- Scrolling back in long messages is buggy, it often skips around and loses its place

- No built in "whiteboarding" tools in screen sharing

- Teams will often keep ringing on my phone for up to a minute after I picked up a call on my laptop

- Sometimes I can't click reactions on messages. I click the emoji and nothing happens

Overall, it's just poorly made software. It feels like something that was made by a couple of interns in their spare time, not a keystone product from a multi-billion dollar company


Also for several weeks recently my phone was getting messages several minutes before they showed up on my laptop, and 3 or 4 of my coworkers (all remote and in various parts of the US) confirmed they were having the same issue.


Thanks (both of you). (Then I understand better)


And the companies are forced to due to huge contracts with Microsoft


Nothing is forcing companies to sign an Azure contract with Microsoft; they could go with AWS or GCP instead. Perhaps Microsoft is just doing something right. But I haven't used Azure myself. I'd be curious to know what's good or bad about it compared to GCP and AWS.


For a development team, here's an example of something good about Azure: Microsoft gives us dev accounts with monthly Azure credits (e.g. $100) and you cannot spend more when those credits run out because there is no credit card etc. behind that account to charge the excess.

Azure, just like other cloud services (I've used AWS but as I understand it GCP is the same), doesn't believe in timely billing. You can and will receive charges against an account for services that were turned off yesterday, the day before, even last week, as billing gradually catches up to reality. This means that there is no way to actually cap a budget. If you decide "Once this costs $100 I'm turning it off" you are not capping your expense at $100: after you turn it off, charges keep arriving. I've seen them arrive a week later, and I wouldn't be surprised if it can be longer. Should they do that? Well, even if they shouldn't, good luck making them stop.

But with the "free" Azure credits that have no money behind them, when it drops dead Microsoft eats all the residual charges that will be discovered days or weeks later, because there is no other party for them to bill.

I work for a University, I suspect that if you paid full price for these services it makes no economic sense, a $100 Azure credit that cost $100 is a bad deal, but the University gets an enormous discount, for obvious reasons, and if the other cloud vendors don't want to offer actual billing it does feel like they deserve the consequences.


> For a development team, here's an example of something good about Azure: Microsoft gives us dev accounts with monthly Azure credits (e.g. $100)

The first analogy I thought of was the stories about drug dealers giving away free samples to schoolchildren to hook them before asking for money.


Sure, it's obvious why they do this. Unlike drug dealers (who don't actually give school kids free crack, that makes no economic sense) it does make sense for Microsoft to ensure every kid who knows how to do rudimentary word processing knows Word, etc.

Nobody is under any illusion that Microsoft just really likes universities for some reason. But on the other hand, we did need lots of this stuff and it's very cheap, budgets are tight and it's not as though hand-rolling even more stuff would be cheaper - we do hand roll some things where it makes sense.

For example, periodically senior people say "Why do we spend $$$$ on a supercomputer? Surely we could rent one from the cloud?" and we (well, not me, different group same department) go OK, we will cost that for you. And they get Azure, Google, etc. to quote them for what they need a supercomputer to do, and then they present this, "The Cloud providers can do that for $$$$$". Ah, that's more money. No thanks, we will continue to run our own supercomputer.

It's not even close. Cloud supercomputer is great if you need the supercomputer for six weeks to do a special project and then you're done with it, the Cloud provider saves you a lot of money. But the University needs supercomputers all the time, so the numbers do not work.


GCP gives me an invoice every first of the month, automatically.

It also offers budget caps, but indeed, those are more of a warning and not a hard shutdown. That's annoying. Same at Microsoft, by the way, except for that developer credit as a failsafe.

Google gives 100k free credits to universities and startups, by the way (and even to individual departments if you are a big university). You just have to apply and let them bring in trainers, and you have to actually use a percentage, otherwise they take it away the next year.


What's the deal with the MS-DOS-era limitations on key vault and storage account names? FFS, it has to be unique AND within 3-24 characters consisting of lowercase letters, numbers and dashes. Storage accounts can’t use the dash. Hello? I thought current-century DNS labels were limited to 63 characters.

It sounds to me like some legacy Windows 2000 spaghettini fettuccini is powering some parts of Azure.
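
For anyone who wants to catch this before a deployment fails, here is a minimal sketch of the rules exactly as described above. The function names are made up for illustration, and the patterns only encode what this comment says (3-24 characters, lowercase letters and digits, dashes allowed for key vaults only). The real service may enforce more than that, and global uniqueness can't be checked locally at all.

```python
import re

# Patterns encode the rules as stated in the comment above (an assumption,
# not an official spec): 3-24 chars, lowercase letters and digits,
# dashes allowed for key vaults but not for storage accounts.
KEY_VAULT_NAME = re.compile(r"[a-z0-9-]{3,24}")
STORAGE_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")


def is_valid_key_vault_name(name: str) -> bool:
    """Hypothetical helper: True if the name fits the key vault rule above."""
    return KEY_VAULT_NAME.fullmatch(name) is not None


def is_valid_storage_account_name(name: str) -> bool:
    """Hypothetical helper: True if the name fits the storage account rule above."""
    return STORAGE_ACCOUNT_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["my-team-vault", "myteamstorage01", "my-team-storage"]:
        print(
            candidate,
            is_valid_key_vault_name(candidate),
            is_valid_storage_account_name(candidate),
        )
```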


> I work for a University, I suspect that if you paid full price for these services it makes no economic sense, a $100 Azure credit that cost $100 is a bad deal

For Cloud to make economic sense, you need to treat it very differently from traditional infrastructure. For example, simply shutting down our Dev environment outside of business hours means we're not paying for the compute the majority of the time.
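
To put a rough number on "the majority of the time", here is a back-of-the-envelope sketch. The schedule (10 hours a day, 5 days a week) is a hypothetical example, and it assumes compute is billed only while running; storage and similar charges typically continue while the environment is off.

```python
HOURS_PER_WEEK = 7 * 24  # 168 hours in a week


def fraction_of_week_running(hours_per_day: float, days_per_week: int) -> float:
    """Fraction of the week a scheduled environment is actually up."""
    return (hours_per_day * days_per_week) / HOURS_PER_WEEK


# Hypothetical schedule: up 10 hours a day, Monday to Friday.
running = fraction_of_week_running(hours_per_day=10, days_per_week=5)
print(f"running {running:.0%} of the week, off {1 - running:.0%}")
# prints "running 30% of the week, off 70%"
```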


This is why I absolutely avoid using Azure, AWS or GCP for my own side projects. On the company account, sure, it's your money. But I'm not going to risk my savings because I misconfigured a lambda or something.


> I'd be curious to know what's good or bad about it compared to GCP and AWS.

Documentation lies, support lies, metrics lie, bugs everywhere, and when something breaks the status page is always all green and support tries to convince you it's your fault anyway. They're only here to prevent you from enforcing the SLA. The distrust is pervasive. I stopped suspecting my code, if something breaks outside of a planned maintenance it is _always_ Azure.

My latest support ticket: Azure App Service internal DNS server broke and there is no way to bypass it short of hardcoding IPs in /etc/hosts. Support told me that if I wanted App Service to work reliably I had to implement their DNS server myself. To rephrase, my PaaS provider told me to spend time and money to implement the very platform I was paying them for, and it just so happened to be absolutely impossible because of an unannounced BC break a few months prior (which is another lengthy and frustrating story).

This morning I had a VM cut out of the network and 10% of my App Service traffic just disappeared. No explanation, no incident report, nothing.

These days I'm working with AWS, and it just works. If something isn't working you know it's your fault and that the answer is in the documentation. I'm not spending days on workarounds, I'm actually implementing as planned. I have no words to describe the relief I'm feeling.


On the ground, the chatter I've heard from cloud customers and techies who have worked for various cloud companies:

- If you need scale, you pick AWS or Azure (GCP doesn't have the same scale, and is catching up)

- If you are a retailer, you don't pick AWS, because you're a competitor and they'll use whatever nasty (but legal) tricks to eat your lunch money

- Windows stack workloads seem to run better on the AWS virtualization stack

- Linux stack workloads seem to run better on the Azure virtualization stack

- GCP has great integrations/automation/api, AWS is pretty good too

- AWS has great support

- GCP has terrible support

- Azure is somewhere in between the two above in terms of support

It depends what is important to you.

Bonus chatter: Oracle Exadata is an unmatched force to be reckoned with, but OCI as a whole doesn't have their shit together.


Literally the whole reason my last org got into Azure.

Lots of MSSQL and PowerBI licenses, lots of other Windows env features. Great deals to bundle those in w/ Azure deployments.

Great pricing too -- for the first 3 years. But at 4 years...


Governments - local, state and federal, pretty much are captive Microsoft customers, and are eye-balls deep in Microsoft 365 + Azure services.


I know Azure generally sucks... If you think you cannot go lower, you should try Oracle Cloud. That is a total piece of dung of a Cloud Service.

I tried it a couple of years ago. After finishing the trial, I removed all instances and disks, supposedly completely blanking the account. And also supposedly deleted the account.

To this day, I still keep receiving some kind of invoice for about $2 USD that they say I owe. And when I log in to the "oracle cloud account" nothing works because my account seems to be half-deleted (I get error screens when accessing several of their piece of shit panels).

To make things worse, I suddenly started receiving emails in Portuguese from some of their sales team. I guess my last name sounds kind of Portuguese, so someone said, yeah, you write to him.

And while using their system I was not really impressed. Their cost structure was weirder than AWS (and that's saying something) and to mount a volume in an instance you had to do some funky commands.

I would NEVER trust business technology to that sort of system.


I've used GCP and I've been billed something like 10% of minimum wage for setting up GCP's demo with around 7 very simple microservices (I don't remember exactly) 4 times. Each time it ran for about 5 minutes after being deployed and then the project was killed

Shit is expensive as hell

For the same money I could rent some weak linux box for a year

Or something decent for a month

Edit 10ms

https://github.com/GoogleCloudPlatform/microservices-demo


what you show there should cost like 300/month to run. It's very transparent pricing; it's just bad that the tutorial doesn't mention that.

You do realize what you setup in that tutorial right? A kubernetes cluster with 11 full scale microservices that are dimensioned so they can serve the average medium size business. For only a hobby this is huuuuuuugely overdimensioned.

If you were to do the same on azure, it would cost more. If you are comparing it to a cheap linux box, what the hell are you using kubernetes clusters for then?


>what you show there should cost like 300/month to run

I've run it 4 times, multiplied by 5 minutes, plus the time needed for it to wake up

All I'm saying is that it is expensive for such a small usage


> You do realize what you setup in that tutorial right?

Sure, buyer beware but is it reasonable that a clearly marked demo project is set up with services to that level of resourcing?

Nobody is going to take a demo like that and start running a business off it tomorrow.


I have to wonder what you were doing, I've been continuously hosting my own projects there for years and with the free tier they cost pennies per month to run.


I've linked the GCP demo repo that I've been messing with


"and with the free tier they cost pennies per month"

Is it free or not?


You pay for the resources you use above the free tier limits. My bill for this month so far is 30 cents because I deployed frequently and my docker artifact storage size (with several years' worth of deployments) crept above the limit. Then I added a periodic job to clear out unused docker images older than one year and I'm running for free again.
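
The selection logic for a cleanup job like that can be sketched as below. This is only an illustration: the `Image` record and `images_to_delete` are hypothetical names, and actually listing and deleting images depends on whichever registry tooling you use, which is left out here.

```python
from datetime import datetime, timedelta, timezone
from typing import List, NamedTuple


class Image(NamedTuple):
    """Hypothetical record for one stored image."""
    tag: str
    pushed_at: datetime
    in_use: bool


def images_to_delete(
    images: List[Image],
    now: datetime,
    max_age: timedelta = timedelta(days=365),
) -> List[Image]:
    """Return images that are unused and were pushed before the cutoff."""
    cutoff = now - max_age
    return [img for img in images if not img.in_use and img.pushed_at < cutoff]


if __name__ == "__main__":
    now = datetime(2023, 1, 25, tzinfo=timezone.utc)
    images = [
        Image("app:2020-06", datetime(2020, 6, 1, tzinfo=timezone.utc), False),
        Image("app:2021-01", datetime(2021, 1, 1, tzinfo=timezone.utc), True),
        Image("app:2022-12", datetime(2022, 12, 1, tzinfo=timezone.utc), False),
    ]
    for img in images_to_delete(images, now):
        print("would delete", img.tag)
```

Old images that are still deployed are kept on purpose, so the job never deletes something that is currently serving traffic.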


Dude, you blindly ran some random code on a metered cloud service.

If somebody gives you keys to a Ferrari, don't blame the manufacturer when you drive it off a cliff at 120 miles/hour...


It's not random code, it is GCP's demo code. And I'm just saying that it is expensive for such a small usage

The Ferrari analogy would be something like being billed 100 USD for a 1-minute ride


This is one of the single most comprehensively intense demo projects I've ever seen. I did a multi day AWS Data Lab for work once and it wasn't this comprehensive.


I quite like Microsoft/Azure from a development perspective. If you're running .NET, Application Insights alone is nearly enough to put it above the competition. I appreciate how it integrates with AZD/Teams and the platform as a whole felt much more cohesive than AWS.

The monthly $60-$100 developer credit was fantastic as well. It avoided the usual fighting for approval/budget to test things out.


Application insights is amazing - I didn't realise how amazing until I had to try and achieve the same thing in the JavaScript/Node ecosystem.


Yeah, I'm currently missing it very much running .NET on AWS. It's insane how much it gives you for "free". CloudWatch feels like weak tea in comparison.


We moved from AWS to Azure for other reasons, but in doing so we moved from X-Ray to AppInsights, and the difference was amazing. We're big App Insights fans.


From what I can understand choosing Azure is almost always a top-down decision, especially when it comes to government entities/agencies (I live in Europe). MS has a hell of a sales network.


It's usually a cost decision. AWS doesn't really care enough about anything smaller than, say, the US government to even attempt to engage in competitive bidding proposals, so if a company/organization puts out an RFP, MS usually finds a way to look cheaper than AWS.

Add to that that AWS doesn't really engage in the normal business-to-business sales process but simply gives you a price list and tells you "that's what it costs" pretty much straight up, and it's no surprise a lot of traditional enterprises with huge existing Microsoft bills end up with the vendor they know, understand and think they can control.

It's not that there is anything really wrong with AWS. Their support is good and their products work, but it's a messy platform where you really need to pay attention and might even need to engage a consultant to fully understand what you're paying for and how optimization decisions are affecting your ROI, as everything is priced individually in AWS, whereas Azure does a bit more bundling into packages.


As I wrote above, Azure has a much better compliance story, especially in smaller countries.


Many companies use Active Directory. The new kid on the block is Azure Active Directory (AAD), which is the evolution of the self-hosted Windows Servers.

Since many companies rely on it, especially for role-based access to internal resources, you can't avoid it as a developer/employee.


Azure Active Directory is not Active Directory but on Azure.


You're right, but that's not what they meant (and it's not AAD's trajectory). Microsoft's been adding more and more device management, policies, software rollout, etc. to AAD to bring it up into equal standing with AD and then, eventually, allow most deployments to use just AAD, instead of holding some bulky AD setup of on-prem & cloud.


the people buying these things obviously have no idea about that. Migrating to Okta or something else neutral would cost the same, but hey, that's a different name


not even remotely close. okta for an enterprise is big dollars. most shops already have o365, so the AAD premium tier licensing is already paid for. aad and okta workforce are almost at feature parity.


That's just not true. If you know what you're doing and are using most of Microsoft's stack (.NET, etc.) it's often quite a breeze.


as if developers have a choice.

Microsoft has just found out how to sell Azure: scare compliance teams that AWS and GCP are horrible, especially in the EU and banking. Use their Office monopoly to give huge discounts if you buy as a bundle, and be awesome on comparison charts. They check all the boxes of services they offer. For an exec, it doesn't count how well those services are executed; that's a developer problem that a system integrator will solve.


> Sorry for being so miserable, but Azure has given me soooo much grief over the last 10 years that I'm just completely done with this shitshow of a platform.

And yet it continues to rake in billions + grow 20-40% year over year (even if it is slowing)


Because it's not the developers that choose the platform and Microsoft knows that.


They know how important developers are, that's why they bought GitHub. But developer experience is indeed one of the minor points of consideration when choosing a cloud vendor for a large enterprise. Customer service, billing and integration into existing infrastructure are much more important.


>They know how important developers are,

Developers developers developers!


I’d choose azure tbh - have used it because of work but am happy to stick with it.


Migrating stuff off it now to AWS (not my stuff). Couldn't agree more. Total shit show.


Curious to hear about the specifics...


Persistent problems between Azure VMs and virtual disks causing unexpected reboots. Complete outages. And don't even get me started on ACI (for Windows). It doesn't even work.

In 7 years we had one AWS AZ outage and we didn't even notice because our monitoring platform in there couldn't reach the network (learned something!). But nothing broke. Even the us-east-1 outages didn't affect us.


Were you using Standard HDD disks? They have a really poor SLA, and are only usable for things like stateless VM Scale Sets or otherwise redundant services.

We had to switch everything to SSD to get reliability comparable to on-prem VMware.


No, entirely SSD. The problems suddenly stopped after a couple of weeks.


That sounds like what I've seen on Azure. Mystery weird problems we see, but they don't. Often on the network side. One time we were pretty sure they had a bad interface in a LAG group. Massive packet loss between hosts, but only on certain ephemeral source ports, about 1/8 of them.... Support couldn't find any issues even after a few days.

This was circa 2018 but AWS was so much more stable at that time. OK, US-E-1 AWS had issues from time to time, but they acked them and fixed them.
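
For anyone wondering why a single bad LAG member would show up as loss on a fixed subset of source ports, here is a toy simulation. Everything in it is an assumption for illustration: 8 member links, one of them bad, and SHA-256 standing in for whatever (much simpler) flow hash the real switches use. The point is only that a flow's address/port tuple deterministically picks one member, so the same ports fail every time.

```python
import hashlib

NUM_LINKS = 8   # assumed number of member links in the bundle
BAD_LINK = 3    # assumed index of the faulty member


def pick_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> int:
    """Map a flow to one member link by hashing its address/port tuple."""
    key = f"{src_ip}:{src_port}->{dst_ip}:{dst_port}".encode()
    digest = hashlib.sha256(key).digest()
    return digest[0] % NUM_LINKS


def affected_ports(src_ip: str, dst_ip: str, dst_port: int, ports: range) -> list:
    """Source ports whose flows would be hashed onto the bad link."""
    return [p for p in ports if pick_link(src_ip, dst_ip, p, dst_port) == BAD_LINK]


if __name__ == "__main__":
    ephemeral = range(32768, 61000)  # a common Linux ephemeral port range
    bad = affected_ports("10.0.0.1", "10.0.0.2", 443, ephemeral)
    print(f"{len(bad) / len(ephemeral):.1%} of source ports hit the bad link")
```

With 8 members the affected share lands close to 1/8, and retrying from the same source port hits the same link again, which fits the "only certain ports" symptom.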


Yes the lack of them being able to see any problems was a constant problem.

Our AWS reps are all over stuff when it goes down. I regularly get to talk to actual real product managers and engineers via our enterprise support if anything goes wrong.


What about the people who work for companies that use Azure and were not involved in the decision?


They all have little hair left on their heads and I feel somewhat sorry for them.


I welcome your sympathy because Azure constantly makes my life hell.

So many half baked features and legitimate bugs in their platform that they either don’t fix or take years to fix.


This is why we call it the Triangle of Sadness:

              Azure
             /     \
        Azure       \
        DevOps----- Teams


I'm glad I work for a company that uses Office 365 instead of the equivalent from Google or others. I really like Office products; for all their faults, they allow me to work more productively than the alternatives. So I don't know why I'd have less hair on my head just because I can work well with Excel and Outlook.


You can say that about 50% of the tools used by IT people.


It doesn't really matter, at least if you're in the EU.

While Google, Amazon and others were busy complaining about GDPR, Microsoft was busy working on being compliant, with the result that today they're pretty much the only legal/compliant solution in most of the EU.

The more regulated the industry (health, finance, etc), the more you can be certain that it's running on Azure if it's EU based and running in the cloud.


The irony is the amount of money they have thrown at "Dev Advocates" who don't do a god damn thing to advocate for how developers use their platform. Frankly that's because folks that care burn out. I still remember the time a high-up rail-roaded me and lied repeatedly to a VP about the design of a product as I desperately tried to save them from the 5+ years of having to educate users on two different ways to do [basic ops]. Those basic cloud objects of course have major differences in functionality and ecosystem viability depending on what you choose, but this isn't really explained up front either, you find out by building a solution for months and then finding out you have to backtrack and start over the Azure integration. Maybe again.

All to say, I agree wholeheartedly with every word.


Works for me (not right now, but in general).


Of course, there's always someone who will say that -.-

Did Windows ME and Windows Vista also work really great for you?


Windows ME did in fact work mostly fine here too, lol. Relatively speaking for Windows 9x performance, of course. I only used it for a year, not because I couldn't stand it but because such was the pace of major Windows updates back then.

Windows Vista was honestly worse for me, not due to bugs but for being two years ahead of the hardware curve, and GPU vendors seemingly twiddling their thumbs during the betas; once WDDM¹ went live, they panicked and rolled out alpha-quality work. So many driver crashes compounded with the heavy RAM requirements... Other than that, and with less of a UAC nazi, I could see an OS that was similar to what Windows 7 became if I squinted. By then hardware had caught up, drivers were mature, and on top of that Microsoft had optimized its performance.

In hindsight, WDDM should've been an update to Windows XP that could be rolled out well in advance and let developers focus on a single thing rather than new OS compatibility on top, and deep changes like UAC.

¹ It was necessary work though: https://en.wikipedia.org/wiki/Windows_Display_Driver_Model


Windows ME was one of my favorite versions of Windows, not being ironic about it either. Its infamy has more to do with how Joe Average uses computers in general.

As for Vista, while I did not use it in its day I can tell its problems were far more to do with crapass hardware manufacturers and their crapass drivers. Vista with access to 7's drivers and hardware runs just fine.


This place I work at has actively fought against using Azure, but we use them because it's advantageous to the business. (or it's perceived to be).

We have actively pushed for AWS or even GCP but it's futile when it doesn't align with business. I'd imagine a lot of developers are facing the same company issues.

Azure is a chore compared to AWS.


One word: compliance.


Office 365 is also affected. And I find Outlook to be one of the better alternatives for business email.



