Multiple airlines disrupted due to Microsoft Azure outage (nytimes.com)
371 points by panarky 6 months ago | 121 comments



This rollercoaster is not over yet. There's a CrowdStrike issue causing Windows machines/servers to brick globally, and this industry is heavily Windows-dependent. It may or may not be related to the Azure issue, but it's suspicious to me.

https://www.reddit.com/r/crowdstrike/comments/1e6vmkf/bsod_e...


HN discussion here - needs to be on the front page - here in NZ ATMs, supermarkets, satellite TV channels are all down

https://news.ycombinator.com/item?id=41002195


Almost certainly Azure using Crowdstrike on Windows in one way or another.

Not surprising that AWS and GCP don't seem to be hit as they wouldn't run anything on Windows, unlike Azure, who I'm sure are forced to do so under MS' infamous interdepartmental structure.


Ugh.

Though I can imagine there's an Azure market for a "Citrix server" kinda thing in the cloud

(or maybe it's SaaS - Solitaire as a Service)


Europe will wake up to flood of problems as well. This needs to be at the top of HN. We are experiencing multiple issues here in EU.

This issue feels extremely widespread.

Maybe we don't hear ppl complaining because we're offline? :)


Berlin airport is down


Just in time for the beginning of school holidays in Berlin. Today, they expected 86,000 travelers at BER alone (https://www.airliners.de/flughafen-ber-erwartet-rekordzahl-p...)


A few others as well, it seems.


Ryanair is unable to check in passengers online. You can check in at the airport.


I had to physically stand in a queue for about 8 hours for a Ryanair customer support desk in an airport when the airport runway was closed by 1-2cm of snow.

I forget the exact timing and can't be bothered to look up my notes, but it was something like 11pm to 7am at the origin airport for a flight that was supposed to have landed at the destination around 8pm, as we were also stuck on the runway for an hour or so and even getting that far had been delayed.

The replacement flight the next day was also cancelled even though the airport was open.

I ended up taking a ferry and a train, and that was still simultaneously faster than the next available Ryanair replacement flight and cheaper than any other provider on short notice. Fortunately I had an understanding boss who didn't mind me arriving 4 days later than expected, and also a place to crash for free while working out the best route home.


It's already noon in central Europe and yes, everything's fucked. Except for the Linux-powered companies.


Maybe they have rolled back the update and Windows boxes in Europe are no longer pulling it?


They did around 8:30 CEST (6:30 GMT) as I understand it. Some of our servers managed to unbork themselves after a number of boot loops, but not all.


I have never heard of crowdstrike. Is that some kind of antivirus? How is that related to PCs not booting? And why does it affect so many PCs if I've never heard of it? I'm so confused


It's enterprise anti-malware that [in addition to other bits] has a client component installed on all PCs in the corporate network. An update to that client component (called an "endpoint") is causing those Windows machines to BSOD.

It's unlikely you'd have heard of it unless you've worked at a large enterprise that runs primarily Microsoft IT.

Crowdstrike does have Mac/Linux "endpoints" also (IIRC) but I'm unsure if they're affected as well.


> Crowdstrike does have Mac/Linux "endpoints" also (IIRC) but I'm unsure if they're affected as well.

The problem seems to be in a device driver installed by Crowdstrike - so I'm guessing whatever the bug is, it's specific to their Windows product.


Windows complains about some page fault or something in a file named csagent.sys. On my machine this file hasn't changed in several days, but the issue only happened this morning, like for everyone else.

This looks suspiciously like a case of "let's download random crap from the web and run it in kernel space. What could possibly go wrong?"


I've never seen a non-Windows machine tbh. But our IT just sent out an update that we don't use Crowdstrike. Strange that I never heard of it if it's so widespread. But thanks


You'll see this software more in highly regulated areas. Think Government, finance, travel. It exists mainly to check a compliance box.

The Windows claim is a little misleading. We used Linux where I last encountered this. I expect Windows is where problems are manifesting this time; BSOD and kernel panics with this aren't new!

CrowdStrike seemingly came out of nowhere but has existed for a while... I think it's suspicious.

Have we not learned from SolarWinds and company? The vendors become part of your posture. Consolidating far too much


>Crowdstrike does have Mac/Linux "endpoints" also (IIRC) but I'm unsure if they're affected as well.

We have this crap running on our computers, and only Windows boxes seem affected.

On Linux this isn't running in kernel mode (our kernels are too up-to-date) and we don't seem to have any issue there.

Haven't heard anything about macs though.


MacOS seems to be fine (or I was too late to get an update)


MacOS does not allow kernel extensions anymore luckily


My company MacBook with the falcon client does not seem to be affected by this.


The problem is seemingly specifically in the Windows driver, you're unlikely to see an issue if you're not running Windows.


MacOS does not allow kernel extensions anymore so these kinds of crashes cannot happen. The falcon client on Mac hooks into another layer


They make security software that is really popular in various industries.


They make malware that steals funds from corporations (willingly!) so these corps can tick a security checkbox for some certification investors have been told is paramount; it's just disguised as security software.


So in other words, they make "security" software just like any other security software company.


Baffling name. Sounds more apt for a DDoS service.


On the wireless they are reporting a bad Crowdstrike update and a major Azure failover in central USA as separate events. Are they the same or different?


A whole lot of people are running Crowdstrike in the cloud and on local PCs. A Crowdstrike update last night caused a Windows kernel panic. Azure/Crowdstrike personnel have rolled back the update in the cloud. Local IT people will have to revert it on local machines manually.


They seem either unrelated, or the Azure one was caused by CrowdStrike.


Too many black swans the same day, I'd guess Azure is running Crowdstrike software.


Azure having problems is not a black swan event.


Fair point.


Yeah, it's far from just airlines affected.

https://www.abc.net.au/news/2024-07-19/global-it-outage-crow...


True, got some insights into why this happened https://medium.com/@confusedcyberwarrior/when-security-becom..., but how did they not have an update process that includes testing or QA?


Companies using Windoze for anything touching customer business should get sued by their customers.


Better described as a worldwide IT outage https://www.bbc.com/news/live/cnk4jdwp49et

Sky News UK is off the air. Some UK train companies are having an IT outage. Berlin, Melbourne airports flights disrupted...


Waitrose tills and card machines aren't working


OK now the outage is getting real. How will Tarquin get his cinnamon and gooseberry yoghurt?


It's 2024. Using Azure should be a firing offense....

https://www.lastweekinaws.com/blog/azures_vulnerabilities_ar...


Obviously you've never worked in an enterprise purchasing department or participated in business meetings with Microsoft salespeople.


I see no relationship between understanding that Azure has a poor security posture and participating in business meetings with salespeople.


Then you haven't met salespeople with high enough budget yet.


Yeah, you're completely out of touch with the industry. As horrendous as it is, Windows on Azure (and the entire ecosystem it comes with) is an easy sell for the dinosaur IT leaders that haven't seen a Linux server in their life. You'd be surprised how common that is.


The dawn of the hybrid model: on premise will be back soon.

The impact is vast: imagine being blocked by a hostile administration in this way. Disaster recovery of this magnitude is like a global pandemic.

AI won’t save us. Network topology and admins will have their comeback.


Unless attacks take down infrastructure regularly, we won't go back to a decentralized model. The internet itself was created decentralized to withstand a war.


No, the ‘internet’ was created to allow US defence researchers to access shared computers more easily. It adopted the packet switched model that had been developed theoretically to support command and control applications, but which was never actually implemented by the extant C2 providers (in a dentist’s waiting room now so don’t have links)


I dunno, I have been around and I have never seen on-prem infra that was more reliable than your average cloud.

The only difference is that when on-prem goes down you can shout to your infra engineers, when cloud goes down you shout at your enterprise cloud representative. The first is more effective, but even with that it still doesn't achieve the same reliability and disaster-recovery of your average cloud provider.


It also has the property of being less correlated with other failures, which may or may not be an advantage.


With small events cloud engineers scale better, with large events local engineers scale better.


I have never seen an on-premise environment down as much as Azure is.


It’s a general political trend too: de-globalization. Everyone sold the idea of globalisation and off-premise ethereal globalized cloud services. Both good when they work, but a total disaster when they don’t.

Decentralization is the way.


There’s a reason that I admire the federal system of the US:

For all the US’s problems, devolving critical functions to layers of differing granularity has proven surprisingly robust to many faults.

I suspect we’ll see economic equivalents, where critical functions are spread around at various scales. We don’t need to be either totally globalized or totally domestic.


Do you admire the federated system of Germany too? It famously has many departments operating semi-autonomously, causing immense friction in adopting any change. Do you admire the federated system of the EU, where countries can run unchecked for years and it is very hard to fix issues in any specific member country?

I'm not criticizing any country or union here, nor praising them. I'm merely highlighting that maybe federation on its own is not the main cause of the USA's success, and there are other, more important factors at play.


When you centralize updates you get the outage which is the subject of discussion.

Or the Four Pests campaign.


Except that anything local will be more expensive than anything globally scaled, and the same people complaining about globalization "suddenly" don't want to pay more for the same.

It's not like alternatives to MS services haven't existed for decades. There are smaller and more people-friendly hosting providers, email services, file shares, office software, etc. The problem is that the people complaining about MS services don't want to use them or pay for them.


“More expensive” - and herein lies the root of most problems - the world is ruined in the name of two things: profit and cost cutting.


Yup. Resiliency and redundancy cost a little bit more money.


I wish. Only (mostly small) tech savvy companies might maybe make that move at some point. Herd mentality and short term convenience have already won that battle. AI will only add to that since for 99.9% of people LLMs and cloud are synonyms.


I love it! Just keep putting all your eggs in one basket, because at the end of the day, it's not your fault, but Azure's.


I've had large customers seriously see that as a big appeal, when we were selling them some IT. They loved being able to blame someone else!


The corporate training always says you are responsible for your chosen cloud vendor's problems, but in reality everyone assumes they can successfully blame the vendor.


"No-one ever got fired for choosing Microsoft".


Well, the older version is about IBM, and it would probably even be true (today) if we were talking about mainframes, because they are one hell of a stable basket ;)


Although many got locked out for choosing them.


Antivirus company causing exponentially more harm in hours than it's prevented during its entire existence.



A side issue, but since we’re on the subject of global tech being generally fscked: I’m currently on holiday in Italy and just discovered the entire archive.ph domain is blocked by the government, apparently due to kiddie porn. Shrug emoji…


I'm in italy and I can access archive.ph

Perhaps just the ISP you're roaming with blocks the site?


There was a law passed that allows ISPs to block sites that host copyrighted content illegally. It's not just Italy, also the Netherlands and many other countries. You can still access it with a vpn or tor. Tor browser on the phone works fine for mobile carriers that block the sites.


Yes that seems to be the case - the blocking page is headed with "Ministero dell'Interno - Dipartimento della Pubblica Sicurezza - Direzione Centrale per la Polizia Scientifica e la Sicurezza Cibernetica - Servizio Polizia Postale e Sicurezza Cibernetica" along with official looking government logos but text underneath (in Italian and English) talks about a collaboration between govt agencies and ISPs. From mobile cell service it was not blocked.

The text does however mention it's a specific measure against child pornography, not re-hosting copyright content.


You don't even need a VPN. Just replace ISP's DNS servers with, say, Cloudflare's

    1.1.1.1
    1.0.0.1
or maybe OpenDNS's.

[Edit: fixed wrong fallback address.]
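
For the curious, here is a rough Python sketch of asking a specific resolver (Cloudflare's 1.1.1.1, as above) for an A record directly, bypassing whatever DNS the ISP hands out. It is standard-library only and deliberately minimal: UDP only, IPv4 only, no retries. The function names (`build_query`, `first_a_record`, `resolve_via`) are made up for illustration.

```python
import socket
import struct


def build_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack(">HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.strip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack(">HH", 1, 1)  # QTYPE=A, QCLASS=IN


def _skip_name(packet, offset):
    """Return the offset just past the (possibly compressed) name at offset."""
    while True:
        length = packet[offset]
        if length & 0xC0 == 0xC0:  # compression pointer: 2 bytes, ends the name
            return offset + 2
        if length == 0:            # root label: ends the name
            return offset + 1
        offset += 1 + length


def first_a_record(packet):
    """Return the first IPv4 address found in a DNS response, or None."""
    qdcount, ancount = struct.unpack(">HH", packet[4:8])
    offset = 12  # fixed-size header
    for _ in range(qdcount):
        offset = _skip_name(packet, offset) + 4  # skip QTYPE and QCLASS
    for _ in range(ancount):
        offset = _skip_name(packet, offset)
        rtype, _rclass, _ttl, rdlength = struct.unpack(
            ">HHIH", packet[offset:offset + 10]
        )
        offset += 10
        if rtype == 1 and rdlength == 4:  # A record with a 4-byte address
            return socket.inet_ntoa(packet[offset:offset + 4])
        offset += rdlength  # e.g. a CNAME; skip it and keep looking
    return None


def resolve_via(server, hostname, timeout=3.0):
    """Send one UDP query to `server` and return the first A record, if any."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(hostname), (server, 53))
        response, _addr = sock.recvfrom(512)
    return first_a_record(response)
```

Comparing `resolve_via("1.1.1.1", "archive.ph")` with `socket.gethostbyname("archive.ph")` would hint at whether the configured resolver is answering differently from a public one.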


what does archive.ph have to do with that? Isn't it just a way to get around paywall?


It is haters' secret weapon to take archive.today down:

https://x.com/archiveis/status/1809081807452696711


That’s why it’s important to demonopolize the desktop OS market. Microsoft should have been broken up a long time ago.


How would that change anything about Windows market share?


You don't think the bundling of windows with pcs has anything to do with their market share?



nobody is forcing these companies to depend on MS products and services


Many in regulated industries actually are forced to use crapware like Crowdstrike, to tick a box on the security checklist.


if they'd used crowdstrike on linux they wouldn't be hit with BSOD bootloops now


I wonder if these airlines were really affected by that Azure problem or if they were affected by the CrowdStrike issue and were just mixing things up.


Don't buy tills that run Windows. You must be crazy.


No flights at Germany's Berlin BER airport as well


I bet BER staff is screaming at the passengers right now "you wanted this, it's your own fault".


Oh, you know them as well.


BBC live coverage, no paywall. https://www.bbc.com/news/live/cnk4jdwp49et


Waiting in a queue at an airport in Palma, Mallorca, Spain right now and the check in staff are currently flipping through printed sheets of paper to check us all in. It's going to be a very long wait.


The article is pay walled. Seems like this would be the fault of the airlines though. There is a reason to be distributed between different geographic areas.


But if the Azure outage is due to Windows machines crashing because of the currently ongoing CrowdStrike crash/reboot loop issue, then such servers might end up being down in all regions. Looks like there might be some advanced lessons to be learned about blast radius here...


Azure VM host machines are running CrowdStrike...??


Maybe because Windows Defender Advanced Threat Protection is an enormous resource hog that scans every byte of memory and storage accessed by the Hypervisor and performs a quadratic time computation on the data? I am just guessing because my “fastest” Windows laptop CPU money could buy feels like a hot smelting furnace and a sloth at the same time when I use VMWare Workstation. What the &$@* is it scanning the VMWare guests for?


Oh, Crowdstrike is also a massive resource hog.


More likely crash looping of so many VMs overloading some system with insufficient back pressure, possibly combined with unfortunate cluster management scheduler behavior at this scale of crash looping (e.g. too eager to retry scheduling instances, maybe even on new hosts which causes more infrastructure load).
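
To illustrate the "too eager to retry" point, here is a sketch of capped exponential backoff with jitter for restarting a crash-looping instance. The function name and defaults are assumptions for illustration only; nothing here reflects how Azure's scheduler actually works.

```python
import random
from typing import Optional


def restart_delay(attempt: int, base: float = 1.0, cap: float = 300.0,
                  rng: Optional[random.Random] = None) -> float:
    """Seconds to wait before restart number `attempt` (0-based).

    Uses "full jitter": a random delay between 0 and a capped exponential
    bound, so many instances that crashed together do not retry in lockstep.
    """
    rng = rng or random.Random()
    # Clamp the exponent so huge attempt counts cannot overflow a float.
    bound = min(cap, base * (2 ** min(attempt, 30)))
    return rng.uniform(0.0, bound)
```

A scheduler that waits `restart_delay(n)` before the n-th restart spreads its retries out over time instead of hammering shared infrastructure with every crashed instance at once.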


Probably not, but maybe they rely on some Active Directory server that is running it?


VM storage is probably on Windows Server, plus AD. I'd bet out of band management is all in the impact zone too. Might be back to someone pushing physical switches and hooking up a KVM.


And this explains why the (Linux) HPC that I'm using is having so many troubles today, and keeps complaining that I don't exist.


IF they're running windows, they'll be running it.


It's more coincidental than likely, since these are managed services that were down, while the crowdstrike issue is closer to company deployments.

But never say impossible, it's best to wait and see what the actual problem is rather than throwing shade so early.


Update. Throw shade. They've confirmed it. I was wrong.


How is this update released at all? Something that affects literally every machine does not get caught in testing? From an 80B company, no less.


Berlin airport is down. I guess it’s related


To those who remember the Y2K bug scare, Microsoft delivered it: https://www.youtube.com/watch?v=WhF7dQl4Ico


Does Microsoft use canary deployments? or was it deployed on everything?


Sounds like it's more of a problem caused by Crowdstrike. I'm sure azure fell over when millions of servers all freaked out simultaneously.


I guess a canary rollout of the Crowdstrike patch would have contained the impact of this.
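
A minimal sketch of what a staged (canary) rollout could look like, in Python. Everything here is hypothetical: `staged_rollout`, the `deploy` and `healthy` callbacks, and the stage fractions are illustrative assumptions, not Crowdstrike's or Microsoft's actual process.

```python
from typing import Callable, List, Sequence


def staged_rollout(
    hosts: Sequence[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    stages: Sequence[float] = (0.01, 0.10, 0.50, 1.0),
    max_failure_rate: float = 0.0,
) -> List[str]:
    """Deploy to growing fractions of the fleet; stop at the first bad stage.

    Returns the hosts that received the update. The callbacks stand in for
    whatever deploy and health-check tooling actually exists.
    """
    updated: List[str] = []
    for fraction in stages:
        # Number of hosts that should be updated once this stage completes
        # (at least one, so the first stage is always a real canary).
        target = max(1, int(len(hosts) * fraction))
        batch = hosts[len(updated):target]
        for host in batch:
            deploy(host)
            updated.append(host)
        failures = sum(1 for host in batch if not healthy(host))
        if batch and failures / len(batch) > max_failure_rate:
            # Halt here: the remaining hosts never receive the bad update.
            break
    return updated
```

With 200 hosts and an update that breaks every machine, this stops after the first stage of 2 hosts instead of taking down all 200.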


Good lesson in rollout and heterogenous software stacks.


ALB airport down. Check in systems are in BSOD ‘Recovery Mode’.


Interesting that I happened to transit 3 different Asian airports today and had zero issues. I haven't seen anything at all related to the outage over here.


Hmm, maybe they've migrated their systems? I know SK had a plan to migrate govt machines to Linux.


This is why I sometimes think on prem hosting is better.


So, anti-malware software turns out to be malware? How odd. SecOps out on anti-open source training, I presume?


Real life operations of hospitals were affected (eg scheduled surgeries not taking place).


I'm so old that I was like "Do that many people really play Crowdstrike?" then I realised that's Counter Strike, then I looked up Counter Strike and it came out 24 years ago.

yells at cloud


And yes, they do! Counterstrike 2 is out now, to a mixed reception.


Which is actually Counter Strike 4


Heh, can't wait for Google to release Angular 3.


Well, this is Valve; they absolutely refuse to use release numbers above 2.

They could at least have stylized it as CS 2²



