Secure transport protocols all do some kind of handshake to set up a session, agree on keys, etc. Modern secure transport protocols (everything since SSL2) authenticate the handshake, so a MITM can't just edit the messages and, like, stick both sides on the NULL cipher (don't have a NULL cipher, though).
Ever since Kocher and SSL 3.0, the gold standard for handshake authentication has been to keep a transcript of all the handshake messages, and just hash them to fingerprint the entire handshake. You can look at Noise for a streamlined, slick version of the same concept.
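To make the transcript idea concrete, here's a minimal sketch in Python. The message names and the length-prefix framing are made up for illustration and don't correspond to any real protocol's wire format; the point is only that every message, in order, feeds the hash.

```python
import hashlib

def transcript_hash(messages):
    """Hash every handshake message, in order, with a length prefix on each."""
    h = hashlib.sha256()
    for msg in messages:
        h.update(len(msg).to_bytes(4, "big"))  # prefix keeps message boundaries unambiguous
        h.update(msg)
    return h.digest()

# Hypothetical handshake as the client sent/received it.
client_view = [b"client_hello", b"server_hello", b"key_share"]
# Same handshake after a MITM slipped in one extra no-op message.
server_view = [b"client_hello", b"noop", b"server_hello", b"key_share"]

transcripts_match = transcript_hash(client_view) == transcript_hash(server_view)
print(transcripts_match)
```

Because the injected message changes the hash, the two sides end up with different fingerprints and the handshake fails, no matter how harmless the extra message looks.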
SSH does something else: it looks at the handshake as a vehicle for setting up a DH-style key exchange; that's all it's for, everything else happens inside the secure transport that key exchange provides. So instead of doing a transcript hash, SSH picks out the handshake values that will end up being inputs to the key exchange, and hashes those.
The problem is: SSH also does implicit sequence numbers; receivers keep track of how many messages they've received, senders keep track of how many they've sent. Not only that, but SSH has (for reasons passing understanding) a NOP message (`IGNORE`). `IGNORE` carries no data used to do key generation, so it has no impact on the handshake authentication --- but it does impact sequence numbers.
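A rough sketch of that gap, again in Python. The packet labels and the set of "hashed" types here are simplified stand-ins I picked for illustration (the real SSH exchange hash covers more than this, such as the version strings and host key); what matters is that some packets count toward the sequence number without ever touching the hash.

```python
import hashlib

# Hypothetical labels for the packets whose contents feed the key exchange hash.
KEX_TYPES = {"KEXINIT", "KEX_DH_INIT", "KEX_DH_REPLY"}

def selective_hash(packets):
    """Hash only the packets that are inputs to the key exchange."""
    h = hashlib.sha256()
    for kind, body in packets:
        if kind in KEX_TYPES:
            h.update(len(body).to_bytes(4, "big"))
            h.update(body)
    return h.digest()

def implicit_seq(packets):
    """Every packet received bumps the implicit sequence number, hashed or not."""
    return len(packets)

clean = [("KEXINIT", b"algos"), ("KEX_DH_INIT", b"e"), ("KEX_DH_REPLY", b"f")]
tampered = clean[:1] + [("IGNORE", b"")] + clean[1:]  # MITM injects one IGNORE

same_hash = selective_hash(clean) == selective_hash(tampered)
seq_shift = implicit_seq(tampered) - implicit_seq(clean)
print(same_hash, seq_shift)
```

The handshake authentication is identical in both runs, but the receiver's counter has moved by one, and nothing in the hash notices.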
Result: MITM attackers can set sequence numbers to arbitrary values (by injecting `IGNORE`s in the handshake), and then edit out subsequent messages (by just not sending them). If you're using ChaPoly (and often if you're using CBC), the protocol will sync up and keep going. You can use this to, for instance, snipe out extension messages (for things like keystroke timing mitigation) from the beginning of an SSH session.
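Here's a toy simulation of the inject-then-drop trick. It is not real SSH: I'm using an HMAC over the sequence number plus payload as a stand-in for the AEAD, the key is hardcoded, and the message names and ordering are simplified assumptions. It only shows why the counters line up again after the attacker deletes a message.

```python
import hashlib
import hmac

KEY = b"\x01" * 32  # hypothetical session key standing in for the negotiated keys

def tag_for(seq, payload):
    """Authenticate the payload together with the implicit sequence number."""
    return hmac.new(KEY, seq.to_bytes(4, "big") + payload, hashlib.sha256).digest()

class Receiver:
    def __init__(self):
        self.recv_seq = 0
        self.accepted = []

    def recv_plain(self, payload):
        # Pre-encryption handshake packets are counted but not authenticated here.
        self.recv_seq += 1

    def recv_sealed(self, payload, tag):
        if not hmac.compare_digest(tag, tag_for(self.recv_seq, payload)):
            raise ValueError("MAC failure at seq %d" % self.recv_seq)
        self.recv_seq += 1
        self.accepted.append(payload)

def run(inject_ignore):
    handshake = [b"KEXINIT", b"KEX_REPLY", b"NEWKEYS"]
    send_seq = len(handshake)  # sender's counter keeps running past the handshake
    sealed = []
    for payload in [b"EXT_INFO", b"SERVICE_ACCEPT"]:
        sealed.append((payload, tag_for(send_seq, payload)))
        send_seq += 1

    rx = Receiver()
    for payload in handshake:
        if inject_ignore and payload == b"NEWKEYS":
            rx.recv_plain(b"IGNORE")  # MITM bumps the receiver's counter by one
        rx.recv_plain(payload)
    # MITM drops the first protected message (EXT_INFO) and forwards the rest.
    for payload, tag in sealed[1:]:
        rx.recv_sealed(payload, tag)
    return rx.accepted

print(run(inject_ignore=True))
```

With the injected `IGNORE`, the receiver accepts the session and never sees `EXT_INFO`. Calling `run(inject_ignore=False)` raises a MAC failure instead, since dropping a message without first shifting the counter leaves the two sides out of step.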
This is a pretty obvious problem! It's absolutely not something you can just accept from a secure transport protocol. And you could look at SSH and SSL3 and see "SSH is doing something really different and less sophisticated than SSL3". But it took until 2023 for someone to do the legwork to figure out how broken it was.
I think that's a really good question. The way this worked out is worth studying in detail. What was the process by which the AES-GCM cipher suites for SSH were developed? What was the process by which the ChaCha20-Poly1305 cipher suites were developed? How did the difference in processes lead to the difference in results? Will anybody change their process based on these results?
Super, super interesting. What a cool bit of research, and as you said in your other comment, an interesting bit of living history as well.
As far as mitigations and Noise, I've been tunneling all my SSH connections through WireGuard or Nebula anyway, primarily because they're such easy, reliable ways to reach hosts behind NAT in a secure fashion. There is certainly overhead in putting SSH through something else, but all the tunnels are fat and fast enough that for just console control it's been fine, and I haven't had to use mosh (does mosh have the same issue?). Even through Starlink it's never a problem. But one does wonder a bit anyway about all the really old protocols at this point. It just feels like there have been a lot of fundamental shifts in thinking around security (simplicity of implementations, not having lots of buttons and switches and flexibility, etc.) such that newer designs are less likely to have hidden bugbears. There is more scrutiny not just on day 1 but through the whole process of design.
Not that SSH isn't still important to fix, but I wonder if just tunneling everything is a decent default at this point. I use VPNs for everything management-related that isn't air-gapped, internally as well as externally. Maybe that's overkill or foolish doubling up? But it's convenient, performant, and bypasses a lot of complexity in other layers.