crazytony's comments

One other compounding problem is that Delta's headquarters and main traffic patterns are on the east coast. Crowdstrike affected all the airlines at roughly the same time. That gave Delta roughly one to two fewer hours to respond before it hit its morning peak flights.

As someone else pointed out, they probably weren't ready by the time they needed their systems for the morning rush so they went to their business continuity strategy (manual). This has a throughput and recovery time penalty and obviously it compounds the longer they are in that mode.

I think what we're finding with the Southwest meltdown and now the Delta meltdown is that the big airlines just don't have the manpower or scheduling slack to accommodate going into business continuity. I do think this should be investigated. Hopefully financial penalties incentivize action but time will tell.


They prioritized stock buybacks over investing in a robust IT operation.


As well they should!

Which one profits the CEO more? Stock buy-backs or robust IT? Robust IT is only good for the company in the long term; however, with stock buy-backs or other skimping on IT, if a disaster like this happens, the CEO just takes his golden parachute and leaves, but if no disaster happens, he gets a huge bonus to buy another private yacht.


> big airlines just don't have the manpower or scheduling slack to accommodate going into business continuity

Do small airlines have it? And how much more are you willing to pay in ticket prices to have this ability?


> Hopefully financial penalties incentivize action

Delta already took a huge financial hit for this.


Funny you should mention WN. Delta's meltdown is the exact same scenario as Southwest's. Crew scheduling is messed up, they don't have a way of tracking where employees are, whether the employee is legal to work, etc., and so the operation grinds to a halt.


To clarify: that's Southwest's meltdown last year, which was all about the difficulties of crew scheduling and the knock-on effects of same.


Have spent all my afternoon and all evening on a bridge trying to support flailing systems. Was supposed to be on a plane in 5 hours to start my vacation. Guaranteed it's not gonna happen.

Hearing that 911 and other safety-critical systems are going down, I hope that the worst that comes out of this is a couple delayed flights and a couple missed bank payments.


Good news (um, as in better than the bad news today) is the plane won't be taking off anyway, so you're golden.


This rollercoaster is not over yet. There's a Crowdstrike issue causing Windows machines/servers to brick globally, and this industry is heavily Windows-dependent. It may or may not be related to the Azure issue but it's suspicious to me.

https://www.reddit.com/r/crowdstrike/comments/1e6vmkf/bsod_e...


HN discussion here - needs to be on the front page - here in NZ ATMs, supermarkets, satellite TV channels are all down

https://news.ycombinator.com/item?id=41002195


Almost certainly Azure using Crowdstrike on Windows in one way or another.

Not surprising that AWS and GCP don't seem to be hit as they wouldn't run anything on Windows, unlike Azure, who I'm sure are forced to do so under MS' infamous interdepartmental structure.


Ugh.

Though I can imagine there's an Azure market for "Citrix server" kinda thing in the cloud

(or maybe it's SaaS - Solitaire as a Service)


Europe will wake up to a flood of problems as well. This needs to be at the top of HN. We are experiencing multiple issues here in the EU.

This issue feels extremely widespread.

Maybe we don't hear ppl complaining because we're offline? :)


Berlin airport is down


Just in time for the beginning of school holidays in Berlin. Today, they expected 86,000 travelers at BER alone (https://www.airliners.de/flughafen-ber-erwartet-rekordzahl-p...)


A few others as well, it seems.


Ryanair is unable to check in passengers online. You can check in at the airport.


I had to physically stand in a queue for about 8 hours for a Ryanair customer support desk in an airport when the airport runway was closed by 1-2cm of snow.

I forget the exact timing and can't be bothered to look up my notes, but it was something like 11pm to 7am at the origin airport for a flight that was supposed to have landed at the destination around 8pm, as we were also stuck on the runway for an hour or so and even getting that far had been delayed.

The replacement flight the next day was also cancelled even though the airport was open.

I ended up taking a ferry and a train, and that was still simultaneously faster than the next available Ryanair replacement flight and cheaper than any other provider on short notice. Fortunately I had an understanding boss who didn't mind me arriving 4 days later than expected, and also a place to crash for free while working out the best route home.


It's already noon in central Europe and yes, everything's fucked. Except for the Linux-powered companies.


Maybe they have rolled back the update and Windows boxes in Europe are no longer pulling it?


They did around 8:30 CEST (6:30 GMT) as I understand it. Some of our servers managed to unbork themselves after a number of boot loops, but not all.


I have never heard of crowdstrike. Is that some kind of antivirus? How is that related to PCs not booting? And why does it affect so many PCs if I've never heard of it? I'm so confused


It's enterprise anti-malware that [in addition to other bits] has a client component installed on all PCs in the corporate network. An update to that client component (called an "endpoint") is causing those Windows machines to BSOD.

It's unlikely you'd have heard of it unless you've worked at a large enterprise that runs primarily Microsoft IT.

Crowdstrike does have Mac/Linux "endpoints" also (IIRC) but I'm unsure if they're affected as well.


> Crowdstrike does have Mac/Linux "endpoints" also (IIRC) but I'm unsure if they're affected as well.

The problem seems to be in a device driver installed by Crowdstrike, so I'm guessing whatever the bug is, it's specific to their Windows product.


Windows complains about some page fault or something in a file named csagent.sys. On my machine this file hasn't changed in several days, but the issue only happened this morning, like for everyone else.

This looks suspiciously like a case of "let's download random crap from the web and run it in kernel space. What could possibly go wrong?"


I've never seen a non-Windows machine tbh. But our IT just sent out an update that we don't use Crowdstrike. Strange that I never heard of it if it's so widespread. But thanks


You'll see this software more in highly regulated areas. Think Government, finance, travel. It exists mainly to check a compliance box.

The Windows claim is a little misleading. We used Linux where I last encountered this. I expect Windows is where problems are manifesting this time; BSOD and kernel panics with this aren't new!

CrowdStrike seemingly came out of nowhere but has existed for a while... I think it's suspicious.

Have we not learned from SolarWinds and company? The vendors become part of your posture. We're consolidating far too much.


> Crowdstrike does have Mac/Linux "endpoints" also (IIRC) but I'm unsure if they're affected as well.

We have this crap running on our computers, and only Windows boxes seem affected.

On Linux this isn't running in kernel mode (our kernels are too up-to-date) and we don't seem to have any issue there.

Haven't heard anything about macs though.


MacOS seems to be fine (or I was too late to get an update)


MacOS does not allow kernel extensions anymore, luckily.


My company MacBook with the falcon client does not seem to be affected by this.


The problem is seemingly specifically in the Windows driver, you're unlikely to see an issue if you're not running Windows.


MacOS does not allow kernel extensions anymore, so these kinds of crashes cannot happen. The Falcon client on Mac hooks into another layer.


They make security software that is really popular in various industries.


They make malware that steals funds from corporations (willingly!) so these corps can tick a security checkbox for some certification investors have been told is paramount; it's just disguised as security software.


So in other words, they make "security" software just like any other security software company.


Baffling name. Sounds more apt for a DDoS service.


On the wireless they are reporting a bad Crowdstrike update and a major Azure failover in central USA as separate events. Are they the same or different?


A whole lot of people are running Crowdstrike in the cloud and on local PCs. A Crowdstrike update last night caused a Windows kernel panic. Azure/Crowdstrike personnel have rolled back the update in the cloud. Local IT people will have to revert it on local machines manually.


They seem either unrelated, or the Azure one was caused by CrowdStrike.


Too many black swans the same day, I'd guess Azure is running Crowdstrike software.


Azure having problems is not a black swan event.


Fair point.


Yeah, it's far from just airlines affected.

https://www.abc.net.au/news/2024-07-19/global-it-outage-crow...


True. Got some insight into why this happened: https://medium.com/@confusedcyberwarrior/when-security-becom..., but how did they not have an update process with testing or QA?


Companies using Windoze for anything touching customer business should get sued by their customers.


Looking at flightradar24, the only real upset I see is the turn over the southern Sierra Nevada range which is also ~55 minutes before arrival into OAK.

Wonder if mountain waves played any part in this incident? It's the only thing I can think of that could possibly generate a dutch roll without continuing flight stability issues. (They could have also turned on/off something like the yaw damper in the cockpit, but that's not specifically called out in the report.)

I can't imagine the standby PCU causing this incident unless there was some kind of electrical or pneumatic issue causing it to engage.

But 737 + PCU issues (yes, I know it's been redesigned) is never a good day.

Hopefully we'll get an NTSB report on this one.


This is extremely disingenuous. A dutch roll if identified and corrected is not structurally dangerous to an aircraft but it typically signifies something wrong with the control surfaces which is a larger issue. In the majority of the cases, it's the yaw damper that's a problem (my suspicion in the 737 case).

JAL 123 crashed because hydraulic pressure on all 4 hydraulic lines, 90% of the vertical stabilizer, and 100% of the rudder were lost in an explosive decompression when the aft pressure bulkhead failed. Dutch rolls ensued because of the loss of lateral-directional control.

The KC-135 crashed because the rudder power control unit was faulty and the pilots failed to identify the problem. They then used alternating rudder inputs to recover, which caused the vertical stabilizer to exceed its structural limits and separate (along with the rest of the tail).

The Air Transat flight (961) had the entire rudder separate from the aircraft most likely due to stress fractures. This caused the aircraft to have extremely limited lateral directional control which caused the dutch rolls.


Radar, planning software and routing have all gotten much, much better in the last decade.

I believe even as late as 2007 or 2008, ATC was limited in the deviations it could grant due to the track system still in use in certain areas. Since the ARTCCs/TRACONs were updated, ATC in the US has had way more capability and capacity to re-route traffic around storms. I forget when the last ARTCC refresh/rebuild happened.


I'm really at a loss on this news. All the employees at airlines in the US I know of have this drilled into them on a regular basis and it's just taken for granted that you report incidents when they happen (even when someone falls: report it!) and the incident will get investigated.

It just confounds me (but explains a lot) that the manufacturer of the aircraft the airlines operate does not share a similar safety culture given that they are in a similar ecosystem (airlines report issues to the manufacturer and the FAA/NTSB all the time)


Alignment of incentives. Airlines have fewer, smaller conflicts of alignment. Boeing is in a hurry to cash in on huge demand for single-aisle passenger planes before A320/1 or a stretch A220 fill that demand. It gets worse: The longer expansion of 737MAX production is delayed, the less demand. It doesn't just expire, it declines over time. Every sale delayed is also maintenance income delayed.

On top of that, Spirit Aerosystems was spun off so Boeing could demand higher production and lower prices, and fragment their assembly line workforce.

In this environment, when management has been hostile to their workers' unions, how are workers going to feel safe raising a red flag over "minor" production issues? You can't train for correct behavior when the incentives are so far out of alignment.


yes. It wasn't about cost at all. It's the same reason there are no cameras on the flight deck.


and the manufacturer can be held liable for lost revenue so even if the AC is grounded, you can get some $$$

