More

djmdjm · 2025-08-12T00:35:26 1754958926

FIPS certification is given to an entire "cryptographic module" that includes hardware and software. "FIPS compliant OpenSSH" is therefore a misnomer, you have to certify OpenSSH running on a particular OS on particular hardware.

FIPS compliance does require use of specific algorithms. ML-KEM is NIST approved and AFAIK NIST is on record saying that hybrid KEMs are fine. My understanding is therefore that it would be possible for mlkem768x25519-sha256 (supported by OpenSSH) to be certified.

caveat: IANAFA (I am not a FIPS auditor)

thayne · 2025-08-12T04:12:54 1754971974

> you have to certify OpenSSH running on a particular OS on particular hardware

Right, but if you use the certified version of OpenSSH, it will only allow you to use certain algorithms.

> ML-KEM is NIST approved and AFAIK NIST is on record saying that hybrid KEMs are fine. My understanding is therefore that it would be possible for mlkem768x25519-sha256 (supported by OpenSSH) to be certifie

ML-KEM is allowed, and SHA-256 is allowed. But AFAIK, x25519 is not, although finding a definitive list is a lot more difficult for 140-3 than it was for 140-3, so I'm not positive. So I don't think (but IANAFA as well) mlkem768x25519-sha256 would be allowed, although I would expect a hybrid that used ECDSA instead of x25519 would probably be ok. But again, IANAFA, and would be happy if I was wrong.

djmdjm · 2025-08-12T05:59:36 1754978376

My understanding is that a hybrid using x25519 as the classical KEM is fine on the basis that the security of the construction rests (for the purposes of approval) on ML-KEM and can't be made worse by the other part of the hybrid algorithm.

I don't have a definitive reference for this though.

throw0101a · 2025-08-12T12:06:52 1755000412

> * ML-KEM is NIST approved and AFAIK NIST is on record saying that hybrid KEMs are fine.*

See perhaps §3.2, PQC-Classical Hybrid Protocols from interim report "Transition to Post-Quantum Cryptography Standards" (draft):

* https://nvlpubs.nist.gov/nistpubs/ir/2024/NIST.IR.8547.ipd.p...

No algorithm explicitly mentioned, but the general idea/technique discussed.

djmdjm · 2025-08-12T00:24:41 1754958281

Those are completely disjoint threats.

A captured SSH session should never be able to decrypted by an adversary regardless of whether it uses passwords or keys, or how weak the password is.

djmdjm · 2025-08-12T00:19:16 1754957956

>In light of the recent hilarious paper around the current state of quantum cryptography

I assumed that paper was intended as a joke. If it's supposed to be serious criticism of the concept of quantum computing then it's pretty off-base, akin to complaining that transistors couldn't calculate Pi in 1951.

> how big is the need for the current pace of post quantum crypto adoption?

It comes down to:

1) do you believe that no cryptographically-relevant quantum computer will be realised within your lifespan

2) how much you value the data that are trusting to conventional cryptography

If you believe that no QC will arrive in a timeframe you care about or you don't care about currently-private data then you'd be justified in thinking PQC is a waste of time.

OTOH if you're a maintainer of a cryptographic application, then IMO you don't have the luxury of ignoring (2) on behalf of your users, irrespective of (1).

djmdjm · 2025-08-12T00:07:41 1754957261

> - development time to switch things over

This is a one time cost, and generally the implementations we're switching to are better quality than the classical algorithms they replace. For instance, the implementation of ML-KEM we use in OpenSSH comes from Cryspen's libcrux[1], which is formally-verified and quite fast.

[1] https://github.com/cryspen/libcrux

> - more computation, and thus more energy, because PQC algorithms aren't as efficient as classical ones

ML-KEM is very fast. In OpenSSH it's much faster than classic DH at the same security level and only slightly slower than ECDH/X25519.

> - more bandwidth, because PQC algorithms require larger keys

For key agreement, it's barely noticeable. ML-KEM public keys are slightly over 1Kb. Again this is larger than ECDH but comparable to classic DH.

PQ signatures are larger, e.g. a ML-DSA signature is about 3Kb but again this only happens once or twice per SSH connection and is totally lost in the noise.

djmdjm · 2025-08-11T23:46:41 1754956001

They don't endorse hybrid constructions but they also don't ban them. From the same document:

> However, product availability and interoperability requirements may lead to adopting hybrid solutions.

djmdjm · 2025-08-11T23:45:21 1754955921

Yeah, key agreement in the context of SSH is quite forgiving of timing side channels as SSH uses ephemeral keys. There's no prospect of repeatedly re-doing the key agreement to gather more statistics on the counterparty's timing.

djmdjm · on July 2, 2024

You can help by testing/reviewing https://github.com/djmdjm/openssh-wip/pull/29/commits/659cbc...

nh2 · on July 11, 2024

Awesome!

djmdjm · on July 1, 2024

No, it's a fix. It completely removes the signal race as well as introducing a mitigation for similar future bugs

djmdjm · on July 1, 2024

Ubuntu isn't affected _by this exploit_

jgalt212 · on July 1, 2024

as opposed to the other exploits not being discussed.

djmdjm · on July 1, 2024

Theo de Raadt made an, I think, cogent observation about this bug and how to prevent similar ones: no signal handler should call any function that isn't a signal-safe syscall. The rationale is that, over time, it's too way easy for any transitive call (where it's not always clear that it can be reached in signal context) to pick up some call that isn't async signal safe.

ralferoo · on July 1, 2024

I'm kind of surprised advocating calling any syscall other than signal to add the handler back again. It's been a long time since I looked at example code, but back in the mid 90s, everything I saw (and so informed my habits) just set a flag, listened to the signal again if it was something like SIGUSR1 and then you'd pick up the flag on the next iteration of your main loop. Maybe that's also because I think of a signal like an interrupt, and something you want to get done as soon as possible to not cause any stalls to the main program.

I notice that nowadays signalfd() looks like a much better solution to the signal problem, but I've never tried using it. I think I'll give it a go in my next project.

qhwudbebd · on July 1, 2024

In practice when I tried it, I wasn't sold on signalfd's benefits over the 90s style self-pipe, which is reliably portable too. Either way, being able to handle signals in a poll loop is much nicer than trying to do any real work in an async context.

formerly_proven · on July 1, 2024

This isn't the case for OpenSSH but because a lot of environments (essentially all managed runtimes) actually do this transparently for you when you register a signal "handler" it might be that less people are aware that actual signal handlers require a ton of care. On the other hand "you can't even call strcmp in a signal handler or you'll randomly corrupt program state" used to be a favorite among practicing C lawyers.

lilyball · on July 1, 2024

Why can't you call strcmp? I think a general practice of "only call functions that are explicitly blessed as async-signal-safe" is a good idea, which means not calling strcmp as it hasn't been blessed, but surely it doesn't touch any global (or per-thread) state so how can it corrupt program state?

Update: according to https://man7.org/linux/man-pages/man7/signal-safety.7.html strcmp() actually is async-signal-safe as of POSIX.1-2008 TC2.

tedunangst · on July 2, 2024

That's the point. They weren't added until TC2 in 2016.

lilyball · on July 2, 2024

Right, it wasn't promised to be safe until then. That doesn't mean it was definitively unsafe before, just that you couldn't rely on it being safe. My question is how would a function like strcmp() actually end up being unsafe in practice given the trivial nature of what it does.

formerly_proven · on July 2, 2024

> but surely it doesn't touch any global (or per-thread) state so how can it corrupt program state?

On x86 and some other (mostly extinct) architectures that have string instructions, the string functions are usually best implemented using those (you might get a generation where there's a faster way and then microcode catches back up). And specifically (not just?) on x86 there was/is some confusion about who should or would restore some of the flags that control what these do. So you could end up with e.g. a memcpy or some other string instruction being interrupted by a signal handler and then it would continue doing what it did, but in the opposite direction, giving you wrong results or even resulting in buffer overflows (imagine interrupting a 1 MB memcpy that just started and then resuming it in the opposite direction).

SomeoneFromCA · on July 2, 2024

Make no sense to me. OS restores all the registers incl. flags after leaving the signal handler. Besides, your scenario is not related to the handler _itself_ calling memcpy; it is about interrupting the main code. And it never ever destroys flags.

lmm · on July 2, 2024

> surely it doesn't touch any global (or per-thread) state

Not necessarily. An implementation might choose to e.g. use some kind of cache similar to what the JVM does with interned strings, and then a function like strcmp() might behave badly if it happened to run while that cache was halfway through being rebuilt.

lilyball · on July 2, 2024

A function like strcmp() cannot assume that if it sees the same pointer multiple times that this pointer contains the same data, so there's no opportunity for doing any sort of caching of results. The JVM has a lot more flexibility here in that it's working with objects, not raw pointers to arbitrary memory.

lmm · on July 3, 2024

> A function like strcmp() cannot assume that if it sees the same pointer multiple times that this pointer contains the same data

For arbitrary pointers no. But it could special-case e.g. string constants in the source code and/or pointers returned by some intern function (which is also how the JVM does it - for arbitrary strings, even though they're objects, it's always possible that the object has been GCed and another string allocated at the same location).

SomeoneFromCA · on July 2, 2024

Contrived example, never seen in reality.

jcul · on July 2, 2024

You could be surprised.

For example, recently I wanted to call `gettid()` in a signal handler. Which I guessed was just a simple wrapper around the syscall.

However, it seems this can cache the thread ID in thread local storage (can't remember exact details).

I switched to making a syscall instead.

SomeoneFromCA · on July 3, 2024

Well, this is possible for sure, but not for primitive functions, such as strcmp. It just does not happen in practice.

jcul · on July 4, 2024

For functions like strcmp, I think they must be signal safe, to be POSIX compliant.

https://man7.org/linux/man-pages/man7/signal-safety.7.html

If it's on this list I generally trust it is safe.

I guess my point is, that if it's not, even a simple function may appear safe, but could do surprising things.

lmm · on July 2, 2024

The JVM does it in reality, I can't see why a C runtime wouldn't.

SomeoneFromCA · on July 3, 2024

It in theory may, but no one does.

fanf2 · on July 1, 2024

Exactly, yes :-) Signal handlers have so many hazards it's vital to keep them as simple as possible.

rwmj · on July 1, 2024

A rule I try to follow: either set a global variable or write to a self pipe (using the write syscall), and handle the signal in the main loop.

cesarb · on July 1, 2024

> either set a global variable

IIRC, the rule is also that said global variable must have the type "volatile sig_atomic_t".

lokar · on July 2, 2024

You should read the sigsev handler at google, it’s great, doing all kinds of things. Of course it’s going to crash at some point anyway….

growse · on July 1, 2024

I'm not overly familiar with the language and tooling ecosystem, but how trivial is this to detect on a static analysis?

kccqzy · on July 2, 2024

Quite easy.