
Not sure what questions Microsoft have to answer. A third-party vendor shipped defective software.

I guess the only question they could answer is why they don't provide a framework like Apple do with Endpoint Security for third-party vendors to use.




Because an essential enterprise security application was /able/ to bring down an entire OS like this. The issue is that Microsoft doesn't provide an interface that gives an application the functionality it requires while running in user space.

Linux has eBPF which can provide most of the capability that Crowdstrike needs, by using an "in-kernel verifier which performs static code analysis and rejects programs which crash, hang or otherwise interfere with the kernel negatively". If MS had this functionality, it is likely this incident would not have happened.

That said, from personal experience on Linux it's been an extremely long time since a bad kernel module has rendered a system entirely FUBAR'd.

(To Microsoft's credit, they have begun copying the eBPF methodology to Windows, but it is still in its infancy: https://github.com/Microsoft/ebpf-for-windows/ ).
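
To make the "verify before you run" idea concrete, here is a toy sketch in Python. It is not how the Linux verifier is implemented; the instruction names, MEM_SIZE and the rules are all made up for illustration. The point is only that a loader can statically reject code that might hang or touch memory it shouldn't, before any of it executes in a privileged context.

```python
# Toy static checker, loosely inspired by the idea behind the eBPF verifier.
# Everything here (opcodes, MEM_SIZE, the rules) is hypothetical.
MEM_SIZE = 16  # assumed size of the sandboxed scratch memory, in slots


def verify(program):
    """Return (True, "ok") if the program passes, else (False, reason)."""
    if not program:
        return False, "empty program"
    if program[-1][0] != "exit":
        return False, "last instruction must be exit"
    for pc, insn in enumerate(program):
        op, *args = insn
        if op in ("load", "store"):
            (addr,) = args
            # Reject any access outside the sandboxed memory.
            if not 0 <= addr < MEM_SIZE:
                return False, f"insn {pc}: address {addr} out of bounds"
        elif op == "jump":
            (target,) = args
            # Forward-only jumps mean the program counter always increases,
            # so every path must reach the final exit.
            if target <= pc:
                return False, f"insn {pc}: backward jump could loop forever"
            if target >= len(program):
                return False, f"insn {pc}: jump target past end of program"
        elif op != "exit":
            return False, f"insn {pc}: unknown opcode {op!r}"
    return True, "ok"
```

Under these made-up rules, verify([("load", 0), ("exit",)]) passes, while a program containing ("jump", 0) or ("load", 99) is refused at load time. The real verifier tracks far more (register state, pointer types, bounded loops), and as with any checker, a bug in the verifier or kernel itself is still possible.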


It's possible for a badly written eBPF policy to prevent any application from starting up, AIUI, so that's more or less the same situation, isn't it?


Crowdstrike brought Linux machines down earlier this year, in April. There are several posts in this thread about it.


> Linux has eBPF which can provide most of the capability that Crowdstrike needs, by using an "in-kernel verifier which performs static code analysis and rejects programs which crash, hang or otherwise interfere with the kernel negatively". If MS had this functionality, it is likely this incident would not have happened.

It didn't stop Linux machines from going down, so it is clearly not as easy as you put it. The reality is that writing software is hard, yet devs often trivialise it to their own detriment.


The issue I am raising is /design/, not /development/. The current model of an unconstrained, unforgiving, highly privileged execution space is a bad design. That is what eBPF tries to address.


It didn't make a difference though. Linux still went down, so clearly the design is not enough.


It is a different issue[0]. The Linux issue from April was a Linux kernel bug[1] that CS Falcon happened to trigger. The design to use eBPF is sound, but the implementation on the kernel side had a bug.

Also, CS Falcon didn't support RHEL 9.4 (only up to 9.3), so for this specific bug you highlighted, CS should not be held accountable for regression testing, because it was a platform they did not support.

With Windows, the current design is poor because it offers no way to run this kind of code safely. Most recently, it appears MS is blaming the EU for forcing them to create an interface for services such as CS to run[2]. Rather than lean into the problem and create a good design, they didn't create security boundaries, which puts the entire system at risk.

Bugs happen, and Linux will continue to harden and become more resilient, but unless MS focuses on secure design in this area, things like this will continue to happen (same as they have with AV before).

  [0] https://access.redhat.com/solutions/7068083
  [1] https://access.redhat.com/errata/RHSA-2024:3306
  [2] https://www.forbes.com/sites/davidphelan/2024/07/22/crowdstrike-outage-microsoft-blames-eu-while-macs-remain-immune/


Might have been editorialised by OP, or Sky changed the title. It is currently:

"Serious questions to answer after what could be the biggest IT outage in history"


Assuming Sky since the URL slug shows "Microsoft"


>Not sure what questions Microsoft have to answer.

The only thing I could think of is if it was a driver update, the driver has to be "WHQL" signed. WHQL stands for "Windows Hardware Quality Lab" -- what quality are they ensuring? (spoiler alert from my time at Microsoft: it's not terribly robust :p )

It's not realistic for Microsoft to test drivers in a manner that represents real-world usage, but perhaps they need to start doing some basic "it works with whatever integrated agent/etc is required" testing as a requirement for signing a driver.

If it was a user-mode update? Yeah no real fault on Microsoft here.


From what I heard, Crowdstrike just updated their DB file, which means the bug was already there, waiting for someone to trigger it with a "low risk" quick rollout.


So kind of like the xz exploit, carefully placed and lying in wait.

I only hope this was a good guy move by someone to knock a placed chess piece off the board.


You're confusing the Crowdstrike issue with Azure being down. Microsoft is ultimately responsible for anything regarding Azure, even if it was a vendor that did something wrong, because they choose their vendors.


The article is about the CrowdStrike incident, not the Azure configuration issue.



