Microsoft Azure Outage

spoils19 · on Jan 25, 2023

It's good that Microsoft saved money via layoffs so that it balances out when customers leave Azure. Very forward thinking company.

cryptonym · on Jan 25, 2023

Leave to go where?

On-premise and being miserable having to wait months to get a new server with poor automation, observability and worse outages? To another major cloud provider with similar pricing and outages?

Cloud helped mostly with automation and scaling but if your system is that critical, you should consider a good CDN as load balancer and multi-cloud (or at least multi-region) for actual robustness.

jakewins · on Jan 25, 2023

AWS and GCP both have ~100% uptime in every region for VMs this month. Meanwhile the majority of Azure regions have had various outages in the same period: https://cloudharmony.com/status-of-compute

azfubar · on Jan 25, 2023

Almost certainly due to Azure's broken policy where we have critical change advisories that block deployments for huge periods of time towards the end of the year because of Black Friday and then the holidays. Every team has basically been unable to deploy since the week before Thanksgiving, when a surprise CCOA was pushed out by leadership at the behest of a certain big customer... then there was the World Cup and the winter holidays. Nobody could really deploy anything from a week before Thanksgiving until a week after New Year's... almost two months' worth of batched changes and every team pressing the YOLO button as soon as they could in January.

And now layoffs so everyone is super unmotivated! Excellent stuff going on right now from Microsoft senior leadership.

fomine3 · on Jan 26, 2023

Interesting. Can I see this longer than a month?

heinternets · on Jan 26, 2023

Right after an outage of course it will show like that.

After an AWS outage it would also look unfavourable for AWS, right?

barbazoo · on Jan 25, 2023

Wow I didn't expect the difference to be so obvious.

throwaway2037 · on Jan 27, 2023

It is weird that this answer was downvoted. I agree. What a great page!

cm2187 · on Jan 25, 2023

Where I worked, the internal approval processes and controls over cloud resources were as lengthy as those for on-premise hardware. So that may be the case for small companies, but I don't think there is much of a difference in those large bureaucracies.

dx034 · on Jan 25, 2023

Shows that all these availability zones and regions don't really help if an outage can knock out a whole cloud provider. And that's not specific to Microsoft. The only way to really ensure uptime is to use two providers. Sadly, that's basically only possible with on-prem/colocation where traffic is cheap.

sofixa · on Jan 25, 2023

It's mostly Azure though that is badly designed to such an extent that multiple times there have been global outages. In general Azure availability, security (the only major cloud provider with not one but multiple cross-tenant security exploits) and usability are pretty terrible so it shouldn't be used for anything but saying "this is how it should not be done".

GCP had a similar thing once, where a BGP update knocked out their Asian regions.

AWS have never had a global outage. (And no, that time S3 in us-east-1 was down wasn't a global outage, the only customer code/workloads that were impacted was code interacting with S3 that didn't specify the region and had to rely on us-east-1 to determine it, and it didn't work anymore)

Andys · on Jan 25, 2023

To be fair, AWS once had a global Route53 outage, which was effectively a global outage for anyone using AWS for DNS.

snorkel · on Jan 25, 2023

That outage was limited to Route 53 DNS record editing and not DNS lookups.

eurg · on Jan 25, 2023

Do you have a link to an article about that? My google-fu is weak, and this sounds interesting - that should not happen to DNS - at all - and from the outside Route53 looks quite well managed. So what the heck did they do?

codalan · on Jan 25, 2023

It was back in 2019.

https://twitter.com/AWSSupport/status/1186735657387003904

I forget the details. I do remember half of our internal tools not working at the time due to DNS issues, though. Good times.

wereallterrrist · on Jan 25, 2023

Someday someone will write a book about how AD, AAD, etc, exert the control they do at MS and go as unchecked (or at the time) as they do. AD's inability to execute made Azure a significantly less pleasant platform until they finally fixed accounts a couple of years back to properly do OAuth 2.0 with ARM.

Maybe the book is just "AD brings in the money" but wow, they sure bring it down as well. Global outages like that always stink of AD.

throwawaythekey · on Jan 26, 2023

There have been several cloudfront outages that have effectively been semi global outages

nosebear · on Jan 25, 2023

I'm hearing from four different friends from four different companies in Germany that they can't really work right now.

steve1977 · on Jan 25, 2023

If they were relying on Outlook and Teams to be productive, they probably couldn't really work before either.

hnarn · on Jan 25, 2023

What a naive comment. As if the only truly important jobs exist in engineering and require nothing but git and a book on C.

OscarDC · on Jan 25, 2023

I interpreted this comment as more of a jab at how inefficient Outlook and Teams themselves are as applications.

I don't know if it's the right interpretation to have, but I kind of agreed with it, considering huge issues I had with teams (curiously some of them are only there for linux users, weird when considering the fact that I only use teams' web page) - not saying I could do better though!

choeger · on Jan 25, 2023

Yeah, what BS. Everyone knows that if you have a book on C, you can always quickly implement git yourself.

vikramkr · on Jan 25, 2023

I'm unsure what being in engineering has to do with using outlook and teams?

steve1977 · on Jan 26, 2023

That wasn't my point. But tools like Teams kill more productivity than they enable, at least in my experience. If anything, I was more productive yesterday, because I got disturbed less.

dsign · on Jan 25, 2023

This makes you wonder if some centralization patterns, i.e. Azure AD, are not a national security problem?

ibejoeb · on Jan 25, 2023

Azure AD is a nightmare. I don't know how many of you sign in to multiple tenants in the console, but it generally involves buying a new computer.

teh_klev · on Jan 25, 2023

> I don't know how many of you sign in to multiple tenants in the console, but it generally involves buying a new computer.

This made me laugh out loud. I'm working in a multi-tenant, multi-subscription environment with Azure AD just now. MS force you to use 2FA and I picked the wrong 2FA app.

Now it's completely and utterly comical trying to work out which generated 2FA auth code I need to key in when auth'ing in Visual Studio because there are absolutely no visual cues as to which subscription it's trying to authenticate to. You can't tell VS that "I'm only interested in auth'ing to this particular subscription". Now it prompts me for almost every subscription we use and it's a whack-a-mole experience. They really need to fix the UI/UX in VS for this.

Of course when it comes to mandatory password change time I have to go through this pain all over again.

cjcampbell · on Jan 25, 2023

I’m setting up a system with multiple AAD B2C tenants, so I get the joy of switching back and forth between the primary tenant and the B2C tenants frequently (at least until I can finish automating enough of the B2C provisioning bits).

I don’t yet have enough context to fully evaluate against cognito. It may end up being nice to have B2C as a first class AAD tenant, but until I get far enough along to realize those benefits, there will be a lot more cursing under my breath about the need for another layer of identity and the lack of control plane access through azure resource manager APIs/tooling.

teh_klev · on Jan 25, 2023

I have multiple chrome profiles for this. However, despite switching from one subscription to another to access each different AAD tenant across multiple chrome profiles, it seems that Azure "remembers" the subscription you last accessed, across profiles. It's as if the last subscription you accessed is tagged to your Azure user server side rather being a blob of client side state. This is deeply annoying as well, especially when your sessions expire...

turbokatsu20 · on Jan 26, 2023

Firefox containers is the solution to this headache for almost every multi tenanted service. I used to have it installed only for those tasks when I was working in consultancy.

teh_klev · on Jan 27, 2023

Firefox containers aren't a patch on Chrome profiles (which I did mention I was using). I'd switch to Firefox in a New York minute if they fixed the profile management UX (about:profiles).

Moissanite · on Jan 25, 2023

YES. I have the dubious honor of needing to use at least 4 different Teams tenants over the course of a week and it is enough to make me want to pitch my computer into the sea. App, browser, private browser - doesn't seem to matter. When I try to sign in, Microsoft will pick one of the tenants seemingly at random, regardless of what URL I use, and try to sign me in - of course, since there is usually no visual cue as to which tenant I'm looking at, I just put in a password and pray.

Godel_unicode · on Jan 25, 2023

Use browser profiles, choose a different profile picture for each, then use one profile per tenant. Done.

herio · on Jan 25, 2023

Firefox Multi-Account Containers extension. I couldn't live without it.

ibejoeb · on Jan 25, 2023

I use it, but since the Azure portal uses the URI fragment, it still requires constructing the correct URL in the correct container. One mistaken URL will obliterate the container, and restoring it requires deleting windowsazure.com, microsoftonline.com, portal.azure.com, and another one that I can't remember right now.

You'd really have to try to make it so screwy.

It's kind of a shame. Like most things, Azure was better when it was smaller. I loved the first version of Functions.

gerdesj · on Jan 25, 2023

It involves a lot of private browsing sessions which is actually MS's recommendation!

What a PITA.

throwaheyy · on Jan 25, 2023

Nah I just set up a 2nd browser profile, and they both stay signed in. It’s a breeze.

magicalhippo · on Jan 25, 2023

Or Firefox containers?

alar44 · on Jan 25, 2023

Almost no one, probably.

laacz · on Jan 25, 2023

Critical infrastructure cannot be reliant on a cloud (or internet availability, if possible). In most EU countries that's a law.

deusex_ · on Jan 25, 2023

Do you have any good resource summarizing these laws?

laacz · on Jan 25, 2023

No, sorry. If you are adventurous enough, look at Latvia with Google Translate and search for "critical infrastructure" at https://likumi.lv

Art9681 · on Jan 25, 2023

Microsoft has a lesser known Azure Gov Cloud specifically because of this that is disconnected from public Cloud. This includes dedicated staff with vetted secret clearances for access to those systems.

https://azure.microsoft.com/en-us/explore/global-infrastruct...

emptysongglass · on Jan 26, 2023

Can anyone make use of this?

Art9681 · on Jan 26, 2023

No this is specifically for official government business

paganel · on Jan 25, 2023

At this point most probably, yes. Especially as more and more government entities/agencies are moving to the cloud, many of them to Azure (because of MS). I live in Eastern Europe, but I suspect that this migration is happening all around Europe and North America.

ruffrey · on Jan 25, 2023

In the azure portal, it shows a "Routine Unplanned outage" - ??

rossdavidh · on Jan 25, 2023

Well points for honesty, at least. :)

ugh123 · on Jan 25, 2023

I guess that's the 0.0001% of outage for an advertised 99.9999% uptime

funnymony · on Jan 25, 2023

At least they have a sense of humor

idk1 · on Jan 25, 2023

Does this mean they need to rebrand, because it's not up 365 days of the year? Maybe rebrand it to Microsoft 364.5?

altairprime · on Jan 25, 2023

There are 365.2425 days per year, so a six-hour outage is just about 0.2425 days, which suggests that they remain able to declare 365 when considering this specific outage only.

ericpauley · on Jan 25, 2023

I think the joke always went that they should rename it Microsoft 360.

hobofan · on Jan 25, 2023

Not sure if it's directly related, but GitHub is also experiencing issues: https://www.githubstatus.com/

marvinblum · on Jan 25, 2023

"We are investigating reports of issues with Actions. This looks related to Azure networking issue which is impacting multiple regions. We are seeing improvements and will continue to monitor this."

ricc · on Jan 25, 2023

GH has been a Microsoft company since 2018...

quickthrower2 · on Jan 25, 2023

Good to see GH is eating the dog food

kgdinesh · on Jan 25, 2023

At work, we all got kicked out of a teams meeting an hour back and sending/receiving e-mails on Outlook seems to be slow.

Location: Chennai, India

midasz · on Jan 25, 2023

This is going to be the most productive day ever

sli · on Jan 25, 2023

Every Azure product I've had to use has been lousy in every possible way. Azure DevOps at my last employer was a nightmare and nobody in the company liked it, not even the managers who decided on it.

BLKNSLVR · on Jan 25, 2023

I've been learning / using DevOps for the past four months and find it "quite good", and have previously used Jira, although not in great detail.

I'm making the effort to learn it in increasing detail as it's the company-wide chosen system. I'm interested to know what made / makes it a nightmare for anyone else.

(And I'm no fan of Microsoft as a whole)

telcal · on Jan 26, 2023

I use Azure DevOps daily and honestly have no issues, it works well. What didn't work for you?

reset-password · on Jan 25, 2023

I have some Azure services that are not able to consistently make outbound HTTP requests to my heartbeat monitoring service so I'm getting alert after alert this morning. This is just the nudge I needed, and I'll be moving the whole thing to Linode later this afternoon.

alkonaut · on Jan 25, 2023

Wouldn't it be quite simple to set up an unofficial status page that just pings some relevant services and if they have a disastrous outage at least, it shows it?

Because I think it's clear that their status page is useless and "manual".
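
A minimal sketch of such a pinger, assuming Python and only the standard library. The names `http_probe` and `check_all` are made up for illustration, and treating any response below HTTP 500 as "up" is an assumption rather than an established rule:

```python
from typing import Callable, Dict
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with an HTTP status below 500."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            return resp.status < 500
    except HTTPError as err:
        # urlopen raises for 4xx/5xx; a 4xx still means the service answered.
        return err.code < 500
    except (URLError, OSError):
        # DNS failure, refused connection, timeout, and similar.
        return False


def check_all(targets: Dict[str, str],
              probe: Callable[[str], bool] = http_probe) -> Dict[str, str]:
    """Probe every named URL and report "up" or "down" for each."""
    return {name: ("up" if probe(url) else "down")
            for name, url in targets.items()}
```

Calling something like `check_all({"portal": "https://portal.azure.com"})` from a machine hosted outside the provider being watched would give a crude signal. A single probe can easily miss an intermittent failure, so a real page would probably want repeated probes over time.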

alkonaut · on Jan 25, 2023

It comes and goes. Teams and Azure DevOps sometimes work perfectly for a few minutes, then respond with all 503s for a few minutes.

saikatsg · on Jan 25, 2023

> We've identified a potential networking issue and are reviewing telemetry to determine the next troubleshooting steps. You can find additional information on our status page at https://msft.it/6011eAYPc or on SHD under MO502273.

ricardobayes · on Jan 25, 2023

I'm so surprised by MS's strategy of using random domains and TLDs; this certainly doesn't make phishing avoidance easy.

noinsight · on Jan 25, 2023

If you implement an allowlisting proxy, the number of required domains for M365 / Azure is something like 120 [1]. Google basically requires three, tunnel.cloudproxy.app, *.google.com and *.googleapis.com. Amazon requires *.aws.amazon.com, *.amazonaws.com, *.awsstatic.com, *.api.aws and *.aws.dev.

Microsoft has some great domain planning.

[1] https://learn.microsoft.com/en-us/microsoft-365/enterprise/u...
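
For illustration, a minimal sketch of how wildcard allowlist entries like the ones above could be matched, assuming Python's `fnmatch` semantics. The `ALLOWLIST` constant and `is_allowed` helper are hypothetical names, and real proxies may interpret `*` differently (here it also matches across dots, so `a.b.google.com` is allowed):

```python
from fnmatch import fnmatchcase
from typing import Sequence

# The three Google entries quoted above; any real deployment would load its own list.
ALLOWLIST = ("tunnel.cloudproxy.app", "*.google.com", "*.googleapis.com")


def is_allowed(host: str, patterns: Sequence[str] = ALLOWLIST) -> bool:
    """Return True if the hostname matches at least one allowlist pattern."""
    # Normalise case and drop a trailing dot from fully qualified names.
    host = host.lower().rstrip(".")
    return any(fnmatchcase(host, pattern) for pattern in patterns)
```

Note that `*.google.com` does not match the bare `google.com` under these semantics, which is one reason such lists tend to grow.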

ricardobayes · on Jan 25, 2023

My point is MS uses a lot of unrelated domains that are very different from the main brand, even the one above looks dodgy (msft[.]it) From your list, microsoftonline-p[.]com is an official domain, but it looks like a typosquat. I think it's quite far from "great domain planning".

adql · on Jan 25, 2023

> I think it's quite far from "great domain planning".

The poster saying they have 120 of them would imply that it was sarcasm

robertlagrant · on Jan 25, 2023

They appear to be being sarcastic. I don't think anyone would be seriously saying 120 is better than 3 or 6 domains.

tenplusfive · on Jan 25, 2023

Luckily Microsoft also provides a service for that: Safelinks https://learn.microsoft.com/en-us/microsoft-365/security/off...

Also a personal favorite of mine: http://microsft.com (not entirely sure if it's just to prevent typosquatting or if this is actually used in some products)

luckylion · on Jan 25, 2023

I don't know whether it's a typo but https://support.microsoft.com/en-us/topic/contact-us-91f63b4... lists "EOC: criskgro@microsft.com (For CEE and MEA)" under the Microsoft Credit Services. It feels like a typo, but who knows. If they don't have anything in place to catch this type of error, it's probably a good idea to register every domain someone could accidentally type.

fomine3 · on Jan 26, 2023

There's no MX record on the domain, so it seems to be a typo

jiggawatts · on Jan 25, 2023

microsft.com was used specifically for telemetry to bypass web proxy blocks for *.microsoft.com put in by administrators of secure networks.

I know this because I was one of those admins trying to plug the leaks.

Windows 10 + Office uses 200+ domains just for Microsoft stuff, of which something like 120 are for telemetry.

ridgered4 · on Jan 25, 2023

And I imagine they add new domains with updates all the time.

At home I was trying to avoid random reboots from updates in a foolproof way in a Windows VM that ran long processing tasks. I determined the only reasonable course of action was to remove all internet access. Stamping out the massive list of changing domains (and hard-coded IP addresses?) would just be too much work that I know I would never keep up with.

A white list might work.

I mused that you could have a constantly updating Windows machine and monitor all of its connections, adding them to a block list on an external firewall, but in addition to being complex to set up, I bet it wouldn't even catch everything.

zerohp · on Jan 25, 2023

Yet people continue to defend Microsoft's telemetry practices. The OS won't let you opt out without it fighting you and they'll even fight you for blocking it on the network.

Windows is spyware.

joecool1029 · on Jan 25, 2023

.it ccTLD is especially bad. Almost all of the generated SEO spam links to malicious ad networks I get on search pages are usually .it domains, all written in machine english, not italian. Thanks for reminding me and discovering -site:.it works in search queries to filter it out.

Tepix · on Jan 25, 2023

Makes sense to use a different domain if everything is down because it could also affect DNS for the main domain.

wiradikusuma · on Jan 25, 2023

I think what the OP is saying is, if you have multiple random domains, how would people know which ones are legit (or not)? Say I have mixxxrosoft.com, how would you know this is one of MS's official domains?

latchkey · on Jan 25, 2023

It is often very difficult to test networking changes in production. For example, firewall rules. What sort of tools do people use for this?

ChickeNES · on Jan 25, 2023

Does the Internet Archive use Azure? archive.org is throwing 503s

voytec · on Jan 25, 2023

Two weeks ago they were affected by the Elasticsearch outage[1], too.

[1] https://news.ycombinator.com/item?id=34337518

braymundo · on Jan 25, 2023

DuckDuckGo is also affected (blank search results).

ochrist · on Jan 25, 2023

https://downdetector.dk/ indicates several MS products and services are having problems. Here is the status from MS on Twitter: https://twitter.com/MSFT365Status/status/1618149579341369345 Edit: Added this link which apparently is the new status page and seems to be updated: https://status.office365.com/

danjc · on Jan 25, 2023

Auth via Microsoft ID is degraded, our platform is blipping (cache retries, message retries due to packet loss), access to the Azure portal is degraded and the Azure status page isn't loading consistently.

LilBytes · on Jan 25, 2023

Nothing is working for me, Oceania/Australia.

Including O365, Azure, Azure Devops.

kornish · on Jan 25, 2023

Ah - so that's why GitHub Actions are unreliable right now.

Benjamin_Dobell · on Jan 25, 2023

Glad it wasn't just me. I was waiting over 10 minutes for a hosted runner.

quickthrower2 · on Jan 25, 2023

Such a late 2010s / 2020s problem :-(

adql · on Jan 25, 2023

Office359 strikes again

Yuioup · on Jan 25, 2023

You mean Office364

DoctorDabadedoo · on Jan 25, 2023

Everyone deserves a break between Christmas and New Years, even the folks at MS! /s

wrldos · on Jan 25, 2023

0<Office<365

cube00 · on Jan 25, 2023

> The issue is causing impact in waves, peaking approximately every 30 minutes.

Does anyone have any general ideas on what kind of outage manifests itself like this? Devices retrying to authenticate every 30 minutes and finding the service is down perhaps?

urbandw311er · on Jan 25, 2023

Can sometimes be scaling/monitoring loops. i.e. cluster comes up, provides some limited service, gets overloaded and drops below required performance metric, gets killed by monitoring/scaling system, repeat...
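
A toy simulation of that restart loop, as a sketch only. The `simulate` function and its `warmup` and `survive` parameters are invented for illustration, with fixed timers standing in for real load and health checks:

```python
def simulate(ticks: int, warmup: int = 3, survive: int = 2) -> str:
    """Return a timeline where '.' is a tick spent restarting and '#' a tick spent serving."""
    timeline = []
    state, timer = "warming", warmup
    for _ in range(ticks):
        if state == "warming":
            timeline.append(".")
            timer -= 1
            if timer == 0:
                # The cluster is up and starts taking traffic.
                state, timer = "serving", survive
        else:
            timeline.append("#")
            timer -= 1
            if timer == 0:
                # Overloaded: the monitor kills the cluster and it restarts.
                state, timer = "warming", warmup
    return "".join(timeline)
```

With the defaults, `simulate(10)` yields a repeating pattern with a period of `warmup + survive` ticks, which is the kind of regular wave the quoted status update describes.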

osivertsson · on Jan 25, 2023

Many games that use Azure PlayFab are down as well due to this. Both PlayFab services and PlayFab MPS game-server hosting are currently broken.

https://status.playfab.com/

kemals · on Jan 25, 2023

ThousandEyes public outage map shows the scale of the Office365 outage: https://www.thousandeyes.com/outages/

hansamann · on Jan 25, 2023

DuckDuckGo.com - no search results showing up at all... are they on Azure?

pred_ · on Jan 25, 2023

Yes.

    $ dig +short duckduckgo.com | xargs whois | grep Organization
    Organization:   Microsoft Corporation (MSFT)

atom058 · on Jan 25, 2023

They get their search results from Bing

jupiterblues- · on Jan 25, 2023

Minecraft, Azure, Office 365, etc... MS cloud services have issues

asim · on Jan 25, 2023

Cloud is the new power grid. When it goes down, we lose power to everything. Will we learn from the grid and decentralise some of the compute and cloud services?

neversaydie · on Jan 25, 2023

Seeing problems with Azure DevOps in Western Europe here, can't open most pages/log in. Teams and Office appear to be working fine.

brodo · on Jan 25, 2023

Teams and Outlook not working fine here.

ntp85 · on Jan 25, 2023

Same here in Germany. Even microsoft.com times out at the moment.

maxaigner · on Jan 25, 2023

Reported issues with Teams, Microsoft 365, etc

saikatsg · on Jan 25, 2023

Teams is working now for me. However, all my notification preferences got reset!

skc · on Jan 25, 2023

Had a few dropped calls in Teams over here this morning (South Africa), otherwise our devops stuff is currently fine.

klaude · on Jan 25, 2023

Anyone having problems with Azure too?

ensocode · on Jan 25, 2023

Yes, here. Storage, DB, APIs - it's not permanent but still persisting. It can be monitored at the Azure status page as well: https://status.azure.com/status

oars · on Jan 25, 2023

LinkedIn seems to be struggling as well. Lots of latency, page loads are taking 10-20 seconds for me.

stephencoyner · on Jan 25, 2023

I did notice ChatGPT was down earlier, but it could have been caused by heavy usage

hansamann · on Jan 25, 2023

duckduckgo is not showing any results right now... are they on Azure, too?

zidad · on Jan 25, 2023

Most likely, because duckduckgo partially depends on Bing

markuman123 · on Jan 25, 2023

russia? shut down a service and halt the productivity of most companies in the west...because most companies moved to azure ad and teams.

deathanatos · on Jan 25, 2023

> russia?

Oh please. Azure is plenty capable of taking themselves offline on their own.

swarnie · on Jan 25, 2023

I'm not sure Russia is as capable as you've all spent the last few decades making out....

generalizations · on Jan 25, 2023

Anyone else remember the bad Windows Defender virus signature they put out on Friday the 13th a couple weeks ago? Microsoft is not having a good start to their year.

quickthrower2 · on Jan 25, 2023

Did we finally exhaust IPv4? /jk

mensetmanusman · on Jan 25, 2023

Hope the Leopard tanks aren’t running azure…

NKosmatos · on Jan 25, 2023

What's the point of having a status page if it doesn't indicate the issues? https://status.azure.com/en-us/status

Azure, Teams, Outlook are almost down from Greece and Germany, and their status page shows that everything is fine :-)

VyseofArcadia · on Jan 25, 2023

When anything on that page turns not-green, there are news stories about it. Not positive ones. So exec approval is needed, because the decision to flip something on that page is ultimately the decision to cause stories negative to MS to be published. The exec has to weigh whether pissing off the customers (by failing to acknowledge reality) is worth the bad press and SLA fallout.

ctvo · on Jan 25, 2023

It has nothing to do with press. This is negative press already, and journalists can use this to write their stories without waiting for the official light to go from green to yellow.

It's about contractual obligations and SLAs. Things are not officially down in most agreements until MSFT acknowledges they're down. Refunds issued to your largest customers because your blob storage failed to meet 99.9999% uptime are directly tied to these statuses.
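
To put such percentages in perspective, here is a small arithmetic sketch. The function names are hypothetical, the 30-day billing period is an assumption, and the uptime figures are the ones quoted in this thread, not actual contract terms:

```python
SECONDS_PER_DAY = 24 * 60 * 60


def downtime_budget_seconds(uptime_percent: float, period_days: float = 30.0) -> float:
    """Seconds of downtime allowed in the period before the uptime target is missed."""
    period_seconds = period_days * SECONDS_PER_DAY
    return period_seconds * (100.0 - uptime_percent) / 100.0


def achieved_uptime_percent(outage_seconds: float, period_days: float = 30.0) -> float:
    """Uptime percentage for the period, given the total outage time."""
    period_seconds = period_days * SECONDS_PER_DAY
    return 100.0 * (1.0 - outage_seconds / period_seconds)
```

Under these assumptions, 99.9% allows about 43 minutes per 30 days and 99.9999% allows under three seconds, so a single multi-hour outage would miss either target by a wide margin.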

Enginerrrd · on Jan 25, 2023

I'm not going out of my way to be hyperbolic or anything here, but that sounds suspiciously like "fraud" to me.

ctvo · on Jan 25, 2023

I don't think they're committing fraud.

I think it's an important enough page that it can't be automated. It needs a manual approval from a human, for the very basics, like even if the status reporting system is operating correctly, because of various downstream effects.

PenguinCoder · on Jan 25, 2023

Which means it's not a status page any more. Defeating the supposed purpose.

cutemonster · on Jan 25, 2023

"SLA refund page"?

maushu · on Jan 25, 2023

The point is PR. Never trust a status page if it's not directly connected to the monitoring system.

wrldos · on Jan 25, 2023

They never attach it to the monitoring because monitoring systems usually generate a lot of false positives which affect their published SLA.

polack · on Jan 25, 2023

Then they should have a "?" status that can be triggered by automated systems that acknowledge that it looks to be an issue but that they are manually investigating.

If it's a false positive they just resolve it without it affecting SLA and if it's a real problem then us customers wouldn't have to debug our own stack for 2 hours before Microsoft informs us that they are the problem.

EDIT: Wonder how many man-years of extra debugging work their non-working status page has caused the customers.
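
A minimal sketch of that "?" idea as a small state machine. The `Status` and `StatusDot` names are invented for illustration, and this is not how any real status page is known to work:

```python
from enum import Enum


class Status(Enum):
    OK = "ok"
    INVESTIGATING = "?"
    OUTAGE = "outage"


class StatusDot:
    """One service indicator: automation may raise '?', only a human confirms an outage."""

    def __init__(self) -> None:
        self.status = Status.OK

    def on_monitor_alert(self) -> None:
        # Automated monitoring can only move OK to INVESTIGATING.
        if self.status is Status.OK:
            self.status = Status.INVESTIGATING

    def on_human_decision(self, confirmed: bool) -> None:
        # A person either confirms the outage or dismisses a false positive.
        if self.status is Status.INVESTIGATING:
            self.status = Status.OUTAGE if confirmed else Status.OK

    def resolve(self) -> None:
        # Called once the incident is over.
        self.status = Status.OK
```

Only the `OUTAGE` state would need to count towards any SLA accounting, so a dismissed false positive costs nothing while customers still get an early hint.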

jhoechtl · on Jan 25, 2023

They never attach it to the monitoring because monitoring systems usually generate a lot of correct positives which affect their published SLA.

Works equally well. See the point?

mdip · on Jan 25, 2023

Which means if one were to require monitoring and status pages to be connected, one of two things happen (for each monitored component):

(1) The monitoring system would be altered to ignore tests that return false positives (at the expense of missing the alert when it represents an outage).

(2) Fixing the monitoring. It wasn't working for the sysadmins/operators, anyway, since it had so many false positives that their "mental model" was essentially based on (1), anyway.

At least, where I've forced the issue of doing just this, that's exactly what happened. At the end of the day, especially since SLAs took a hit and that affected bonus payouts, monitoring got a lot better -- as did overall team function when we truly realized how bad things were -- we stopped doing workarounds and started fixing problems at a more fundamental level which led to SLAs that were both accurate and excellent.

It helped bring attention to a hidden problem which resulted in time being allocated to fix tests that dropped constant false-positives and to evaluate each for whether or not it should exist in the first place.

steveBK123 · on Jan 25, 2023

Which impacts economics because some customers surely got deals guaranteeing some amount of credits based on up/downtime as reported by the status page.

And so updates to the status page become political and locked behind senior management approvals.. like AWS.

mirekrusin · on Jan 25, 2023

Yeah, that's why SLA reports never include <30m downtimes, convenient truth bending.

edf13 · on Jan 25, 2023

It's updated now - updates for service outages at this level generally need sign-off from someone higher up the chain

saghm · on Jan 25, 2023

Someone has to "approve" the status pages showing what's actually happening? From a customer perspective, it seems far worse to have status pages fail to reflect actual outages than to have them accidentally report an outage when there isn't one because no one really cares about what the status page says if they're not having issues. It's hard to see how the goal here could be anything other than trying to add plausible deniability for what would otherwise be obvious deception.

vineyardmike · on Jan 25, 2023

> it seems far worse to have status pages fail to reflect actual outages than to have them accidentally report an outage when there isn't one

That's not the goal.

> It's hard to see how the goal here could be anything other than trying to add plausible deniability for what would otherwise be obvious deception

That's the goal. The "status page" is considered the source of truth for most of the big contracts. If status-page=OK then your contract with them isn't violated. So changing the status page is a big deal, with real financial implications. The status page isn't a view into the SRE's tickets, it's a declaration that the service isn't being provided.

mattclarkdotnet · on Jan 25, 2023

Utter rubbish. Major contracts have account managers and it all gets hashed out 1-1.

squeaky-clean · on Jan 25, 2023

Don't know why this was downvoted. We've definitely been able to provide proof of an outage when the status page showed otherwise and get a refund in the form of server credits by contacting them directly. For all 3 big vendors, AWS, Azure, GCP

RajT88 · on Jan 25, 2023

Agree here as well. It's usually not that hard to prove, based on the many, many metrics Azure resources emit, that their SLA was breached.

What might be happening is that there is fine print you have to read and be in compliance with in order to be eligible for the SLA.

For example, look at all the conditions which have to be met for a breach of VM SLA in Azure:

https://azure.microsoft.com/en-us/support/legal/sla/virtual-...

Hidden in the SLA details are typically hints on how you can become more resilient in the cloud. So it pays to read the SLA details and really deeply understand what they are telling you.

oefrha · on Jan 25, 2023

Exec approval for showing major outages on status dashboard is pretty much standard practice across large companies. The main differentiator is whether it’s approved within five minutes or two hours.

remus · on Jan 25, 2023

> it seems far worse to have status pages fail to reflect actual outages than to have them accidentally report an outage when there isn't one because no one really cares about what the status page says if they're not having issues.

I disagree. What if you're having issues and the status page is incorrectly reporting an incident? It would be easy to waste a load of time waiting for the status page to sort itself out, only to find out you've still got an issue.

UK-AL · on Jan 25, 2023

You can't approve a fact.

hdjjhhvvhga · on Jan 25, 2023

As others noted, the so-called "status" pages of big service providers don't serve to reflect reality but to shape it. For actual status you need to consult independent monitoring services.

2Gkashmiri · on Jan 25, 2023

well.... if that fact can be delayed by just a tiny bit... that's enough

alkonaut · on Jan 25, 2023

But shouldn't the individual service dots automatically turn a color other than green? I mean it's an automated service status page, right? I understand that a human-written message at the top can take some time.

luckylion · on Jan 25, 2023

No, it's not automated. I'm sure the underlying tech is automated, but once companies grow beyond a certain size, it needs a human to say "show this status change to the world" because there are lots of things depending on it (e.g. SLAs, but also bonuses, I assume), so they don't want a potential bug in the status system to influence that.

It's weird how slow they are with manual sign-off though.

adql · on Jan 25, 2023

I haven't seen any SLA deal that says the status page must show 99.9% uptime...

cm2187 · on Jan 25, 2023

No but if msft’s own status page shows downtime more than 0.01% of the time msft will struggle to argue they haven’t breached their SLA, so financial consequences to the company.
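To put numbers on the percentages in this subthread, here is a rough sketch of how much downtime a given uptime target allows. It assumes a 30-day month and treats the target as a plain percentage of wall-clock time; real SLAs define eligible downtime more narrowly, so the function name and the 30-day window are illustrative assumptions, not anything taken from Azure's terms.

```python
def allowed_downtime_minutes(uptime_percent: float, days: int = 30) -> float:
    """Minutes of downtime permitted over `days` days at the given uptime target.

    Assumption: the target applies to simple wall-clock time, which is
    looser than how a real SLA counts downtime.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    for target in (99.9, 99.95, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% uptime allows {minutes:.2f} minutes of downtime per 30 days")
```

Under those assumptions, 99.9% works out to roughly 43 minutes a month and 99.99% to roughly 4 minutes, which shows how little publicly acknowledged downtime it takes to cross the line.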

alkonaut · on Jan 25, 2023

But I don’t want the page connected to their bonuses or SLAs. I just want to know whether they are having any issues anywhere. And I need to know within a minute of my own service not working so I’m not chasing the wrong thing. This can’t be an unreasonable thing to ask for?

luckylion · on Jan 25, 2023

I agree. I'm already annoyed at Hetzner with their 5 minute lag in reporting network outages where I'm regularly noticing them, investigating, checking status and then only after a few minutes see them updating and saying "it's us".

If you work with Microsoft, you might as well spend a few bucks extra and have an external monitoring system monitor Microsoft's systems so you get real-time third-party confirmation when your monitoring alerts you of issues concerning your system. It's the price you pay for scale, I guess. More money involved = more lawyers involved = more accountants involved = more MBAs involved = more corporate bullshit.
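A minimal sketch of the "external monitoring" idea above, assuming you run it from a machine outside the provider you are checking. The URL, retry count and delay are hypothetical placeholders; a real setup would more likely use a dedicated third-party monitoring service.

```python
import http.client
import time
import urllib.request
from typing import Callable


def http_probe(url: str, timeout: float = 5.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # URLError, HTTPError (4xx/5xx) and timeouts are all OSError subclasses.
        return False


def confirm_outage(
    url: str,
    attempts: int = 3,
    delay_seconds: float = 10.0,
    probe: Callable[[str], bool] = http_probe,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Report an outage only if every one of `attempts` probes fails.

    Retrying avoids raising an alarm over a single dropped request.
    """
    for attempt in range(attempts):
        if probe(url):
            return False
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return True
```

Calling `confirm_outage("https://example.invalid/health")` (a placeholder address) from a box on a different network gives a second opinion that does not depend on the vendor's own status page.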

copperroof · on Jan 25, 2023

An automated one would be red 100% of the time.

steve1977 · on Jan 25, 2023

Maybe they couldn't update the status page due to the network outage.

I'm joking, but...

867-5309 · on Jan 25, 2023

or, they could be automated and transparent

nikau · on Jan 25, 2023

Then some group scrapes the uptime from their competitors' pages and reports that "competitor is x times more reliable than transparent co"

Yuioup · on Jan 25, 2023

That's what happens when you don't have an independent party that keeps tabs on this.

SkyPuncher · on Jan 25, 2023

We've concluded that status pages are a complete joke.

berkut · on Jan 25, 2023

status.office.com had been down for 15 mins, but it's back up now...

belter · on Jan 25, 2023

We have been here before...HN is the only status page that matters.

funnymony · on Jan 25, 2023

It's a public relations page.

dustedcodes · on Jan 25, 2023

Azure is the most developer hostile cloud environment. I have zero sympathy for people being affected by this because if you voluntarily use Azure then this is what you deserve. Sorry for being so miserable, but Azure has given me soooo much grief over the last 10 years that I'm just completely done with this shitshow of a platform.

zufallsheld · on Jan 25, 2023

> I have zero sympathy for people being affected by this because if you voluntarily use Azure then this is what you deserve

I guess many developers do not use Azure voluntarily but are forced to by their companies (or customers).

adql · on Jan 25, 2023

We're migrating to Teams because of that kind of reasoning.

It's utter shit of a service. Even worse if you need to write integrations for it

cutemonster · on Jan 25, 2023

What will you use instead? What don't you like about Teams (or writing integrations)?

dekerta · on Jan 25, 2023

Our work switched from Slack to Teams after an acquisition, and I can confidently say that Teams is just complete garbage compared to Slack.

- The interface is laggy

- Scrolling back in long messages is buggy, it often skips around and loses its place

- No built in "whiteboarding" tools in screen sharing

- Teams will often keep ringing on my phone for up to a minute after I picked up a call on my laptop

- Sometimes I can't click reactions on messages. I click the emoji and nothing happens

Overall, it's just poorly made software. It feels like something that was made by a couple of interns in their spare time, not a keystone product from a multi-billion dollar company

HideousKojima · on Jan 25, 2023

Also for several weeks recently my phone was getting messages several minutes before they showed up on my laptop, and 3 or 4 of my coworkers (all remote and in various parts of the US) confirmed they were having the same issue.

cutemonster · on Jan 26, 2023

Thanks (both of you). (Then I understand better)

antihero · on Jan 25, 2023

And the companies are forced to due to huge contracts with Microsoft

ngrilly · on Jan 25, 2023

Nothing is forcing companies to sign an Azure contract with Microsoft; they could go with AWS or GCP instead. Perhaps Microsoft is just doing something right. But I haven't used Azure myself. I'd be curious to know what's good or bad about it compared to GCP and AWS.

tialaramex · on Jan 25, 2023

For a development team, here's an example of something good about Azure: Microsoft gives us dev accounts with monthly Azure credits (e.g. $100) and you cannot spend more when those credits run out because there is no credit card etc. behind that account to charge the excess.

Azure just like other cloud services (I've used AWS but as I understand it GCP is the same) doesn't believe in timely billing. You can and will receive charges against an account for services that were turned off yesterday, the day before, even last week, as gradually billing catches up to reality. This means that there is no way to actually cap a budget. If you decide "Once this costs $100 I'm turning it off" you are not capping your expense at $100, after you turn it off charges keep arriving, I've seen a week later and I wouldn't be surprised if it can be longer. Should they do that? Well, even if they shouldn't, good luck making them stop.

But with the "free" Azure credits that have no money behind them, when it drops dead Microsoft eats all the residual charges that will be discovered days or weeks later, because there is no other party for them to bill.

I work for a University, I suspect that if you paid full price for these services it makes no economic sense, a $100 Azure credit that cost $100 is a bad deal, but the University gets an enormous discount, for obvious reasons, and if the other cloud vendors don't want to offer actual billing it does feel like they deserve the consequences.

voytec · on Jan 25, 2023

> For a development team, here's an example of something good about Azure: Microsoft gives us dev accounts with monthly Azure credits (e.g. $100)

The first analogy I thought of was the stories about drug dealers giving away free samples to schoolchildren to get them hooked before asking for money.

tialaramex · on Jan 25, 2023

Sure, it's obvious why they do this. Unlike drug dealers (who don't actually give school kids free crack, that makes no economic sense) it does make sense for Microsoft to ensure every kid who knows how to do rudimentary word processing knows Word, etc.

Nobody is under any illusion that Microsoft just really likes universities for some reason. But on the other hand, we did need lots of this stuff and it's very cheap, budgets are tight and it's not as though hand-rolling even more stuff would be cheaper - we do hand roll some things where it makes sense.

For example, periodically senior people say "Why do we spend $$$$ on a supercomputer? Surely we could rent one from the cloud?" and we (well, not me, different group same department) go OK, we will cost that for you. And they get Azure, Google, etc. to quote them for what they need a supercomputer to do, and then they present this, "The Cloud providers can do that for $$$$$". Ah, that's more money. No thanks, we will continue to run our own supercomputer.

It's not even close. Cloud supercomputer is great if you need the supercomputer for six weeks to do a special project and then you're done with it, the Cloud provider saves you a lot of money. But the University needs supercomputers all the time, so the numbers do not work.

dachryn · on Jan 25, 2023

GCP gives me an invoice every first of the month, automatically.

It also offers budget caps, but indeed, those are more a warning and not a hard shutdown. That's annoying. Same at Microsoft, by the way, except for that developer credit acting as a failsafe.

Google gives 100k free credits to universities and startups by the way (and even to individual departments if you are a big university). You just have to apply and let them bring in trainers and you have to actually use a percentage, otherwise they take it away the next year.

irusensei · on Jan 27, 2023

What's the deal with the MS-DOS-era limitations for Key Vault and storage account names? FFS, a name has to be unique AND 3-24 characters long, consisting of lowercase letters, numbers and dashes. Storage accounts can’t use the dash. Hello? I thought current-century DNS labels could be up to 63 characters.

It sounds to me like some legacy Windows 2000 spaghettini fettuccini is powering some parts of Azure.
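For anyone who wants to catch these naming errors before a deployment fails, here is a small sketch that encodes the rules exactly as described in the comment above. It has not been checked against Azure's documentation, which may impose further constraints (and global uniqueness can only be checked against Azure itself), so treat the patterns and function names as assumptions.

```python
import re

# Rules as described in the comment above, not verified against Azure's docs:
# both name types are 3-24 characters; storage accounts allow lowercase
# letters and digits only, while Key Vault names also allow dashes.
STORAGE_ACCOUNT_NAME = re.compile(r"[a-z0-9]{3,24}")
KEY_VAULT_NAME = re.compile(r"[a-z0-9-]{3,24}")


def is_valid_storage_account_name(name: str) -> bool:
    """True if `name` fits the storage account pattern sketched above."""
    return STORAGE_ACCOUNT_NAME.fullmatch(name) is not None


def is_valid_key_vault_name(name: str) -> bool:
    """True if `name` fits the Key Vault pattern sketched above."""
    return KEY_VAULT_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("mystorage01", "my-storage", "ab", "a" * 25):
        print(
            candidate,
            is_valid_storage_account_name(candidate),
            is_valid_key_vault_name(candidate),
        )
```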

InsomniacL · on Jan 25, 2023

> I work for a University, I suspect that if you paid full price for these services it makes no economic sense, a $100 Azure credit that cost $100 is a bad deal

For Cloud to make economic sense, you need to treat it very differently from traditional infrastructure. For example, simply shutting down our Dev environment outside of business hours means we're not paying for the compute the majority of the time.
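A back-of-the-envelope sketch of that saving, assuming "business hours" means 10 hours a day, 5 days a week, and that compute is billed purely per running hour. Both the schedule and the hourly rate are made-up inputs, and storage or other always-on charges are ignored.

```python
HOURS_PER_WEEK = 7 * 24  # 168


def running_fraction(hours_per_day: float, days_per_week: int) -> float:
    """Share of the week during which the environment is switched on."""
    return hours_per_day * days_per_week / HOURS_PER_WEEK


def weekly_compute_cost(
    hourly_rate: float, hours_per_day: float = 24.0, days_per_week: int = 7
) -> float:
    """Weekly compute cost at a flat hourly rate for the given schedule."""
    return hourly_rate * hours_per_day * days_per_week


if __name__ == "__main__":
    rate = 1.0  # hypothetical cost per hour for the whole dev environment
    always_on = weekly_compute_cost(rate)
    office_hours = weekly_compute_cost(rate, hours_per_day=10, days_per_week=5)
    print(f"always on:    {always_on:.2f} per week")
    print(f"office hours: {office_hours:.2f} per week")
    print(f"running {running_fraction(10, 5):.0%} of the time")
```

With that assumed schedule the environment runs 50 of 168 hours, about 30% of the week, which matches the point about not paying for compute the majority of the time.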

danjac · on Jan 25, 2023

This is why I absolutely avoid using Azure, AWS or GCP for my own side projects. On the company account, sure, it's your money. But I'm not going to risk my savings because I misconfigured a lambda or something.

l-p · on Jan 25, 2023

> I'd be curious to know what's good or bad about it compared to GCP and AWS.

Documentation lies, support lies, metrics lie, bugs everywhere, and when something breaks the status page is always all green and support tries to convince you it's your fault anyway. They're only here to prevent you from enforcing the SLA. The distrust is pervasive. I stopped suspecting my code, if something breaks outside of a planned maintenance it is _always_ Azure.

My latest support ticket: Azure App Service internal DNS server broke and there is no way to bypass it short of hardcoding IPs in /etc/hosts. Support told me that if I wanted App Service to work reliably I had to implement their DNS server myself. To rephrase, my PaaS provider told me to spend time and money to implement the very platform I was paying them for, and it just so happened to be absolutely impossible because of an unannounced BC break a few months prior (which is another lengthy and frustrating story).

This morning I had a VM cut out of the network and 10% of my App Service traffic just disappeared. No explanation, no incident report, nothing.

These days I'm working with AWS, and it just works. If something isn't working you know it's your fault and that the answer is in the documentation. I'm not spending days on workarounds, I'm actually implementing as planned. I have no words to describe the relief I'm feeling.

RajT88 · on Jan 25, 2023

On the ground, the chatter I've heard from cloud customers and techies who have worked for various cloud companies:

- If you need scale, you pick AWS or Azure (GCP doesn't have the same scale, and is catching up)

- If you are a retailer, you don't pick AWS, because you're a competitor and they'll use whatever nasty (but legal) tricks to eat your lunch money

- Windows stack workloads seem to run better on the AWS virtualization stack

- Linux stack workloads seem to run better on the Azure virtualization stack

- GCP has great integrations/automation/api, AWS is pretty good too

- AWS has great support

- GCP has terrible support

- Azure is somewhere in between the two above in terms of support

It depends what is important to you.

Bonus chatter: Oracle Exadata is an unmatched force to be reckoned with, but OCI as a whole doesn't have their shit together.

red-iron-pine · on Jan 25, 2023

Literally the whole reason my last org got into Azure.

Lots of MSSQL and PowerBI licenses, lots of other Windows env features. Great deals to bundle those in w/ Azure deployments.

Great pricing too -- for the first 3 years. But at 4 years...

Alupis · on Jan 25, 2023

Governments (local, state and federal) are pretty much captive Microsoft customers, and are eyeballs deep in Microsoft 365 + Azure services.

xtracto · on Jan 25, 2023

I know Azure generally sucks... If you think you cannot go lower, you should try Oracle Cloud. That is a total piece of dung of a Cloud Service.

I tried it a couple of years ago. After finishing the trial, I removed all instances and disks, supposedly completely blanking the account. And also supposedly deleted the account.

To this day, I still keep receiving some kind of invoice for about $2 USD that they say I owe. And when I log in to the "oracle cloud account" nothing works because my account seems to be half-deleted (I get error screens when accessing several of their piece of shit panels).

To make things worse, suddenly I started receiving emails from some of their sales team in Portuguese. I guess my last name sounds kind of Portuguese, so someone said, yeah, you write to him.

And while using their system I was not really impressed. Their cost structure was weirder than AWS (and that's saying something) and to mount a volume in an instance you had to do some funky commands.

I would NEVER trust business technology to that sort of system.

hardware2win · on Jan 25, 2023

I've used GCP and I was billed something like 10% of minimum wage for setting up GCP's demo, with around 7 very simple microservices (I don't remember exactly), 4 times. Each time it ran for about 5 minutes after being deployed and then the project was killed.

Shit is expensive as hell

For the same money I could rent some weak linux box for a year

Or something decent for a month

Edit 10ms

https://github.com/GoogleCloudPlatform/microservices-demo

dachryn · on Jan 25, 2023

what you show there should cost like 300/month to run. It's very transparent pricing, it's just bad that the tutorial doesn't mention that.

You do realize what you setup in that tutorial right? A kubernetes cluster with 11 full scale microservices that are dimensioned so they can serve the average medium size business. For only a hobby this is huuuuuuugely overdimensioned.

If you were to do the same on azure, it would cost more. If you are comparing it to a cheap linux box, what the hell are you using kubernetes clusters for then?

hardware2win · on Jan 25, 2023

>what you show there should cost like 300/month to run

I've run it 4 times, multiplied by 5 minutes, plus the time needed for it to wake up.

All I'm saying is that it is expensive for such a small usage.

cube00 · on Jan 25, 2023

> You do realize what you setup in that tutorial right?

Sure, buyer beware but is it reasonable that a clearly marked demo project is set up with services to that level of resourcing?

Nobody is going to take a demo like that and start running a business off it tomorrow.

hobo_mark · on Jan 25, 2023

I have to wonder what you were doing, I've been continuously hosting my own projects there for years and with the free tier they cost pennies per month to run.

hardware2win · on Jan 25, 2023

I've linked the GCP demo repo that I've been messing with.

gerdesj · on Jan 25, 2023

"and with the free tier they cost pennies per month"

Is it free or not?

hobo_mark · on Jan 25, 2023

You pay for the resources you use above the free tier limits. My bill for this month so far is 30 cents because I deployed frequently and my docker artifact storage size (with several years' worth of deployments) crept above the limit. Then I added a periodic job to clear out unused docker images older than one year and I'm running for free again.

Yuioup · on Jan 25, 2023

Dude, you blindly ran some random code on a metered cloud service.

If somebody gives you keys to a Ferrari, don't blame the manufacturer when you drive it off a cliff at 120 miles/hour...

hardware2win · on Jan 25, 2023

It's not random code, it is GCP's demo code. And I'm just saying that it is expensive for such a small usage.

The Ferrari analogy would be something like being billed 100 USD for a 1-minute ride.

ZeroCool2u · on Jan 25, 2023

This is one of the single most comprehensively intense demo projects I've ever seen. I did a multi day AWS Data Lab for work once and it wasn't this comprehensive.

Jochim · on Jan 25, 2023

I quite like Microsoft/Azure from a development perspective. If you're running .NET, Application Insights alone is nearly enough to put it above the competition. I appreciate how it integrates with AZD/Teams and the platform as a whole felt much more cohesive than AWS.

The monthly $60-$100 developer credit was fantastic as well. It avoided the usual fighting for approval/budget to test things out.

bennyelv · on Jan 25, 2023

Application insights is amazing - I didn't realise how amazing until I had to try and achieve the same thing in the JavaScript/Node ecosystem.

Jochim · on Jan 25, 2023

Yeah, I'm currently missing it very much running .NET on AWS. It's insane how much it gives you for "free". CloudWatch feels like weak tea in comparison.

CWuestefeld · on Jan 25, 2023

We moved from AWS to Azure for other reasons, but in doing so we moved from X-Ray to AppInsights, and the difference was amazing. We're big App Insights fans.

paganel · on Jan 25, 2023

From what I can understand choosing Azure is almost always a top-down decision, especially when it comes to government entities/agencies (I live in Europe). MS has a hell of a sales network.

Stranger43 · on Jan 25, 2023

It's usually a cost decision, and AWS doesn't really care about anything smaller than, say, the US government enough to even attempt to engage in competitive bidding proposals. So if a company/organization puts out an RFP, MS usually finds a way to look cheaper than AWS.

Add to that that AWS doesn't really engage in the normal business-to-business sales process but simply gives you a price list and tells you "that's what it costs" pretty much straight up, and it's no surprise a lot of traditional enterprises with huge existing Microsoft bills end up with the vendor they know, understand and think they can control.

It's not that there is anything really wrong with AWS: their support is good and their products work. But it's a messy platform where you really need to pay attention, and you might even need to engage a consultant to fully understand what you're paying for and how optimization decisions are affecting your ROI, since everything is priced individually in AWS whereas Azure does a bit more bundling into packages.

shp0ngle · on Jan 25, 2023

As I wrote above, Azure has a much better compliance story, especially in smaller countries.