The syscalls involved are reachable from within a lot of sandboxes, so the worst (or best, depending on your point of view) case scenario is a pretty universal privesc. There are a lot of steps to get there, though. I'm not super familiar with the mbuf subsystem specifically, but I'm going to guess mbufs live in their own allocator zone. That means you're practically guaranteed to overwrite an adjacent m_hdr structure. Those contain pointers that form a linked list, and at first glance I don't see linked-list hardening or zone checks in the MBUF macros. One could envision turning this bug into a kASLR leak as well as a kernel r/w primitive, and while that isn't the silver bullet it used to be on XNU (because of a whole host of hardening Apple put in), it's still pretty powerful.
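To make that concrete, here's a rough sketch of why clobbering the neighboring mbuf header is so attractive. The layout below is a from-memory paraphrase of the classic BSD m_hdr, not a verbatim copy of XNU's bsd/sys/mbuf.h, so treat the exact fields and types as illustrative:

```c
#include <sys/types.h>      /* caddr_t */

struct mbuf;                /* forward declaration is enough for the sketch */

/* Illustrative only: a simplified BSD-style mbuf header, paraphrased from
 * memory rather than copied from XNU source. */
struct m_hdr {
    struct mbuf *mh_next;     /* next mbuf in this chain  -> linked-list pointer  */
    struct mbuf *mh_nextpkt;  /* next chain in the queue  -> another list pointer */
    caddr_t      mh_data;     /* where the payload lives  -> corrupt this and the
                               * kernel's copy routines touch an address you chose */
    int32_t      mh_len;      /* payload length           -> corrupt this and they
                               * touch as many bytes as you like */
    int16_t      mh_type;
    int16_t      mh_flags;
};

/*
 * If mbufs really do sit in their own zone, an overflow off the end of one
 * mbuf's data lands in the next mbuf's m_hdr. Corrupt mh_data/mh_len and
 * ordinary mbuf copies become a read/write primitive; corrupt
 * mh_next/mh_nextpkt and every chain walk dereferences a pointer you
 * supplied, which is also where a kASLR leak could come from.
 */
```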
Posting the hash to twitter as a proof that "something" exists reveals no actual information, so it's not considered making the exploit "public" in any meaningful way.
From the blog's timeline, the bug has been visible in code diffs since ~April, but it was only called out as a CVE 10 days ago, so I'd consider this one hot off the presses.
There is a bigger chance of a toddler smashing a keyboard finding a bug than of GPT-5 finding one. LLMs can't understand intent, so they effectively work like `grep` with little to no understanding of context, and most of the time they will false-flag good code.
There are already a lot of tools for finding bugs, like fuzzers, but I am sure that LLMs won't become one of them.
they don't need to understand intent, they just need to find exploits. they don't even need to do it by reading code alone - give them a vm running the code and let them throw excrement at it until something sticks!
which links to a Register Article[0], which links to a paper[1]:
"In this work, we show that LLM agents can autonomously exploit one-day
vulnerabilities in real-world systems. To show this, we collected a dataset of
15 one-day vulnerabilities that include ones categorized as critical severity
in the CVE description. When given the CVE description, GPT-4 is capable
of exploiting 87% of these vulnerabilities compared to 0% for every other
model we test (GPT-3.5, open-source LLMs) and open-source vulnerability
scanners (ZAP and Metasploit). Fortunately, our GPT-4 agent requires the
CVE description for high performance: without the description, GPT-4 can
exploit only 7% of the vulnerabilities."[1]
Yeah, that works for web vulns where the vuln description is practically the exploit anyway. I could write a Perl script that parses out variable names and writes SQL injections for it.
For comparison, in the native world a program is considered vulnerable when someone finds an arbitrary write primitive (even without a leak), a use-after-free, or even a double free. There is a huge gap between those and actually having a working RCE exploit. Most CVEs in this space are assigned without a working exploit ever being written.
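For readers outside that world, the kind of bug that clears the "vulnerable" bar can be as small as this (entirely hypothetical code, just to illustrate use-after-free and double free; nothing here is from the CVE under discussion):

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical example of the bug classes mentioned above, not real code
 * from any project. */
void handle_packet(const char *pkt, size_t len) {
    char *buf = malloc(64);
    if (buf == NULL)
        return;

    if (len > 64)
        free(buf);            /* bug: buf is freed but not NULLed... */

    memcpy(buf, pkt, len < 64 ? len : 64);  /* ...so this is a use-after-free */
    free(buf);                              /* ...and this is a double free */
}
```

Getting from something like that to a reliable exploit on a mitigated target is the gap being described here.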
GPT-5, maybe not. But somebody somewhere is building something that can do that. And if they can't do it _now_ they have a plan that tells them what's missing. TLDR; it's coming, soon.
and lots of people are spending lots of time and money on AI coding assistants... which is more or less the knowledge base you need.
If they could use that structural training to answer queries like "Is there any code path where some_dangerous_func() is called without its return value being checked"...
You can do this today by querying the AST output by a compiler. Regardless, the parent comment was talking about exploits, not vulnerabilities/bugs. Vulns are a dime a dozen compared to even PoC exploits, let alone shippable exploits.
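As a sketch of what "querying the AST" can look like (assumptions: libclang is installed, the file to scan is passed as argv[1], and some_dangerous_func is the hypothetical function from the comment above), something like this flags calls whose result is discarded:

```c
#include <stdio.h>
#include <string.h>
#include <clang-c/Index.h>   /* build with: cc scan.c -lclang */

/* Hypothetical target name, taken from the comment above. */
static const char *TARGET = "some_dangerous_func";

/* Flag CALL_EXPRs sitting directly inside a compound statement, i.e. the
 * return value is discarded. (A (void) cast or similar would need extra
 * handling; this is only a sketch.) */
static enum CXChildVisitResult visit(CXCursor cursor, CXCursor parent,
                                     CXClientData data) {
    (void)data;
    if (clang_getCursorKind(cursor) == CXCursor_CallExpr &&
        clang_getCursorKind(parent) == CXCursor_CompoundStmt) {
        CXString name = clang_getCursorSpelling(cursor);
        if (strcmp(clang_getCString(name), TARGET) == 0) {
            CXFile file;
            unsigned line;
            clang_getSpellingLocation(clang_getCursorLocation(cursor),
                                      &file, &line, NULL, NULL);
            CXString fname = clang_getFileName(file);
            printf("%s:%u: result of %s ignored\n",
                   clang_getCString(fname), line, TARGET);
            clang_disposeString(fname);
        }
        clang_disposeString(name);
    }
    return CXChildVisit_Recurse;
}

int main(int argc, char **argv) {
    if (argc < 2)
        return 1;
    CXIndex idx = clang_createIndex(0, 0);
    CXTranslationUnit tu = clang_parseTranslationUnit(
        idx, argv[1], NULL, 0, NULL, 0, CXTranslationUnit_None);
    if (tu == NULL)
        return 1;
    clang_visitChildren(clang_getTranslationUnitCursor(tu), visit, NULL);
    clang_disposeTranslationUnit(tu);
    clang_disposeIndex(idx);
    return 0;
}
```

In practice you'd probably reach for clang-query or CodeQL rather than hand-rolling a visitor, but the point stands: finding the pattern is the easy part.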
You're either being sarcastic or wildly underestimating how hard it is to write an exploit. I haven't written about exploit dev publicly for a _long_ time, but I invite you to read https://fail0verflow.com/blog/2014/hubcap-chromecast-root-pt... for what I consider to be a pretty trivial exploit of a very "squishy" (industry term) target.
XNU isn't the hardest target to pop but it is far from the easiest.
There's nobody more confident in the world than an HN poster writing about a topic they have no experience with.
There is a huge gap (in the binary exploitation world) between identifying a problematic code pattern and having a workable bug (a reproduction), and an even larger one between a reproducible crash and a working exploit (because we're not in the 90s anymore and compiler/hardware mitigations are effectively always enabled). Current LLMs can cross neither gap, and are not even close to bridging the second one.