We need better support for SSH host certificates (mjg59.dreamwidth.org)
325 points by mattrighetti on March 27, 2023 | 156 comments



Yes! SSH certificates are awesome, both for host- and client-verification.

Avoiding Trust on First Use is potentially a big benefit, but the workflow improvements for developers, and especially for non-technical people, are a huge win too.

At work, we switched to Step CA [1] about 2 years ago. The workflow for our developers looks like:

  1. `ssh client-hosts-01`

  2. Browser window opens prompting for AzureAD login

  3. SSH connection is accepted
It really is that simple, and is extremely secure. During those 3 steps, we've verified the host key (and not just TOFU'd it!), verified the user identity, and verified that the user should have access to this server.

In the background, we're using `@cert-authority` for host cert verification. A list of "allowed principals" is embedded in the user's cert, which is checked against the host's authorized_principals [2] file, so we have total control over who can access which hosts (we're doing this through Azure security groups, so it's all managed at our Azure portal). The generated user cert lasts for 24 hours, so we have some protection against stolen laptops. And finally, the keys are stored in `ssh-agent`, so they work seamlessly with any app that supports `ssh-agent` (either the new Windows named pipe style, or "pageant" style via winssh-pageant [3]) - for us, that means VSCode, DBeaver, and GitLab all work nicely.
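For anyone curious, here's roughly what the moving parts look like in plain OpenSSH - a minimal sketch, with made-up hostnames, paths, and principal names (Step CA automates issuing and renewing the actual certs):

  # Client ~/.ssh/known_hosts: trust any host cert signed by this CA
  @cert-authority *.example.internal ssh-ed25519 AAAAC3...hostCAkey...

  # Server /etc/ssh/sshd_config: present our host cert, trust the user CA,
  # and map each login account to a per-account list of allowed principals
  HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub
  TrustedUserCAKeys /etc/ssh/user_ca.pub
  AuthorizedPrincipalsFile /etc/ssh/auth_principals/%u

  # /etc/ssh/auth_principals/deploy: principals (e.g. mapped from Azure
  # groups) allowed to log in as the "deploy" account, one per line
  client-hosts-admins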

My personal wishlist addition for GitHub: Support for `@cert-authority` as an alternative to SSH/GPG keys. That would effectively allow us to delegate access control to our own CA, independent of GitHub.

[1] https://smallstep.com/docs/step-ca

[2] https://man.openbsd.org/sshd_config#AuthorizedPrincipalsFile

[3] https://github.com/ndbeals/winssh-pageant


GitHub does have support for SSH CAs, but it's an Enterprise feature: https://docs.github.com/en/enterprise-cloud@latest/organizat...


That's very interesting, thank you for linking!


At work, we switched to Step CA [1] about 2 years ago. The workflow for our developers looks like:

  1. `ssh client-hosts-01`

  2. Browser window opens prompting for AzureAD login

  3. SSH connection is accepted

How is that simple, compared to `ssh -i .ssh/my-cert.rsa someone@destination --> connection is accepted, here's your prompt` ?


The former is discoverable: it doesn't require developers to have ANY knowledge of command switches (no matter how basic), nor to follow a set of out-of-band instructions; the "how to" is included within the workflow.


ssh-add (once per session) gives users back that incredible convenience. If you wanted to rotate certs, you’d have to add each new one, of course.


The server could display that info when a user tries to log in via interactive authentication.


It's the exact same command as a regular SSH prompt, and it generates and uses the cert. That seems very simple.

Your command is disingenuous in that it only works if the certificate has already been issued to you. If you were to include issuance, your command would very much stop being simple.


If I'm reading it right, then there's a not-insignificant amount of setup necessary for the proposed approach anyway; generating and sharing a public key is much easier, even for the customer/client.


This is simple because it doesn’t require you to take any specific actions to make new/different hosts accessible.

If you deactivate someone in AD, poof, all their access is magically gone, instead of having to remove their public key from every server.


What if you're ssh-ing from a headless client, like a raspberry pi or a VPS?


Then it doesn’t work but their developers are ssh-ing from their work laptops so it doesn’t matter. Something doesn’t have to be a solution for all use cases to be a good solution.


That is also the flow for Tailscale SSH

https://tailscale.com/kb/1193/tailscale-ssh/


If you are in the terminal and don't have access to a browser?


Not the OP but if anyone doesn’t have access to a browser in my org then I can safely say they’re not accessing from a company laptop and thus should be denied access.


You really never ssh from one remote server to another?


Not GP, but:

I do; however, when I do this, I make sure the certificate is signed with permit-agent-forwarding and require people to just forward their ssh agent from their laptops.

This also discourages people from leaving their SSH private key on a server just for ssh-ing into other servers from cron, instead of using a proper machine key.


Agent forwarding has its own security issues, you're exposing all your credentials to the remote.

It's better to configure jump hosts in your local ssh config.
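For example, a minimal ~/.ssh/config sketch (host names are placeholders):

  # ~/.ssh/config: hop through the bastion transparently; auth happens
  # locally for each connection, no keys or agent needed on the jump host
  Host jump
      HostName jump.example.com
      User alice

  Host prod-*
      ProxyJump jump

  # equivalent one-off form: ssh -J jump prod-web-01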


There's SSH agent restriction now.

[1] https://www.openssh.com/agent-restrict.html
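The gist of that page is destination constraints on keys held by the agent; roughly (a sketch, hostnames are placeholders, and it needs OpenSSH 8.9+ along the whole chain):

  # Only allow this key to be used via jump.example.com, and from there
  # only onward to internal.example.com
  ssh-add -h jump.example.com \
          -h "jump.example.com>internal.example.com" \
          ~/.ssh/id_ed25519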


In general for systems like this, you can open the browser link from a different host.

For example, if I've SSHed from my laptop to Host A to Host B to Host C then need to authenticate a CLI program I'm running on Host C, the program can show a link in the terminal which I can open on my laptop.


Having to interact with the browser every time I need to ssh to a machine would be extremely painful.

If key forwarding works, that might be workable.

I'm extremely wary of non-standard ssh login processes as they tend to break basic scripting and tooling.


These tools usually cache your identity, so you might only need to go through a browser once a day.


I suppose this could be solved by using the first server as an SSH jump host -- see SSH(1) for the -J flag. Useful e.g. when the target server requires public key authentication and you don't want to copy the key to the jump host. Not sure it would work in this scenario though.


SSHing from one remote server to another won’t be possible in a lot of environments due to network segmentation. For example, it shouldn’t be possible to hop from one host to another via SSH in a prod network supporting a SaaS service. Network access controls in that type of environment should limit network access to only what’s needed for the services to run.


I've seen the exact opposite configuration where it's not possible to avoid SSHing from one remote server to another due to network segmentation, as on the network level it's impossible to access any production system directly via SSH but only through a jumphost, which obviously does not have a browser installed.


You don't need the jumphost to do the auth for the target host. You use -J and the auth happens locally and is proxied through.


I can count on 1 hand the number of reasons I might need to do that and on each occasion there’s usually a better approach.

To be clear, I'm not suggesting the GP's approach is "optimal". But if you've gone to the trouble of setting that up, then you should have already solved the problems of data sharing (mitigating the need for rsync), network segregation, and secure access (negating the need for jump boxes), etc.

SSH is a fantastic tool but mature enterprise systems should have more robust solutions in place (and with more detailed audit logs than an rsync connection would produce) by the time you’re looking at using AD as your server auth.


The CA CLI tool we use supports a few auth methods, including a passphrase-like one. It likely could be set up with TOTP or a hardware token also. We only use OAuth because it's convenient and secure-enough for our use case.


Never. I’ve been at this company for 8 years and owned literally thousands of hosts and we have a policy of no agent forwarding. I’ve always wondered when I would be limited by it but it simply hasn’t come up. It’s a huge security problem, so I’m quite happy with this.


Not sure why you'd get downvoted for this comment. This is likely very applicable for many orgs that have operator workstation standards -- they're some kind of Windows/macOS/Linux box with defined/enforced endpoint protection measures, and they all have a browser. Any device I can imagine ssh'ing from that doesn't have a browser is definitely out of policy.


Because both of you narrowed your vision of the scenario to what you do daily. It is a common use case to ssh from a jump server and use ssh-based CLI tools for debugging. The issue stems from Windows users who are coupled to GUIs; that behavior pattern increases IT and DevOps costs unnecessarily.

An alternative example: our org solves the issue with TOTP, required every 8 hours for any operation, from ssh/git CLI-based actions (prompted at the terminal) to SSO integrations. Decoupling security from unrelated programs: simple and elegant.


The -J parameter to ssh will transparently use a jump server and doesn't require the ssh key to be on the third-party server. I can't speak for tooling on step-ca, but my employer's in-house tooling works similarly and loads the short-lived signed cert into your ssh-agent, so once you do the initial auth you can do whatever SSH things you need.


There are better ways to access remote servers than using a jump box. If you've gone to the lengths to tie SSH auth into a web-based SSO, then you should have at least set up your other infra to manage network access already (since that's a far bigger security concern).

Plus, as others have said, you can jump through SSH sessions with the client-side ssh command (i.e. without having to manually invoke ssh on the jump box).


As pointed out, whether or not you go through a jump host isn’t relevant. We all go through jump hosts as well.

Besides, neither me nor GP is saying this needs to be a universal pattern. We are saying that it’s a viable pattern for a lot of orgs.



With e.g. Azure's CLI (az) you can specify a flag like "--use-device-code", which shows a copy-pastable URL that you can then just visit in the browser (even on a different device).


This is a bit off topic, but does anyone know how the mechanism that triggers the web page prompt from an ssh connection actually works? Is it some kind of alternate ssh authentication method (like password/publickey) or something entirely out-of-band coming directly from the VPN app intercepting the connection?

Ever since I saw it in action with Tailscale I've always wondered how it actually works, and I guess if anyone would know they'd be on HN


OOB: ".. during the SSH protocol’s authentication phase, the Tailscale SSH server already knows who the remote party is and takes over, not requiring the SSH client to provide further proof (using the SSH authentication type none)." https://tailscale.com/kb/1193/tailscale-ssh/#how-does-it-wor...


Smallstep uses ProxyCommand [0]. Not sure how Tailscale does it.

0: https://smallstep.com/docs/ssh/how-it-works


> we've verified the host key (and not just TOFU'd it!),

How.

Specifically, what I cannot determine from their docs is how the VM obtains a host key/cert signed by the CA. How does the CA know the VM is who the VM says it is? (I.e., the bootstrap problem.)

(I assume that you also need your clients to trust the CA … and that has its own issues, but those are mostly human-space ones, to me. In theory, you can hand a dev a laptop pre-initialized with it.)


StepCA supports quite a few authentication methods, including an "admin provisioner" (basically a passphrase that can be pasted into the CLI tools' stdin).

Because each of our servers is bespoke, we can use the admin provisioner when the server is first being set up (and actually, Ansible handles this part).

I don't have experience with it, but StepCA also has Kubernetes support, and I imagine the control plane could authenticate the pod when a cert needs to be issued or renewed.
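For context, the CA-side operation is the same thing you could do by hand with stock OpenSSH; a rough sketch (key paths, identity, principals, and lifetime below are made up):

  # On the CA: sign the server's existing host key (-h = host certificate),
  # valid for the listed principals for a limited period
  ssh-keygen -s host_ca_key -h \
      -I client-hosts-01 \
      -n client-hosts-01,client-hosts-01.example.internal \
      -V +52w \
      ssh_host_ed25519_key.pub

  # On the server: advertise the resulting cert in sshd_config
  HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub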


I can't say in the general sense, but with GCP you can retrieve the EKpub of a VM's TPM via the control plane, and then use https://github.com/google/go-attestation to verify that an application key is associated with that TPM and hence that VM


I like this solution, thanks for sharing. Just need to swap it with my own OIDC compliant federated authentication server.


One thing I've never understood about SSH certificates for client identification - it looks like it causes the requirement that _at some point_ ssh private keys and the certificate private key need to both be in the same place? And if this is the case, then doesn't that imply that you need to have a service where users upload their private key?

Which would mean you have one single point of attack/DOS/failure that needs to be kept utterly secure at all costs?


You give your public key (typically into ~/.ssh/authorized_keys) and then prove you have access to the matching private key as the essential part of the challenge. You always keep the private key.


I thought the way it worked was that the certificate, signed with the CA's private key, only contains the user's public key, and the ssh server, after checking that the certificate is valid, verifies that the client holds the private key corresponding to the public key in the certificate.
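That matches how OpenSSH user certs behave: the CA signs only the user's public key, and the client still proves possession of the private key during authentication. A rough sketch with stock ssh-keygen (names and lifetime are placeholders):

  # CA signs the user's *public* key into a short-lived cert with principals
  ssh-keygen -s user_ca_key -I alice@example.com \
      -n alice,client-hosts-admins -V +24h id_ed25519.pub

  # Inspect the resulting cert: it contains the public key, principals,
  # validity window and options, but never the user's private key
  ssh-keygen -L -f id_ed25519-cert.pub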


Also: agent forwarding. The private key stays on your local machine, and you can forward your agent through ssh so you can hop onward from your next destination.


Vault also supports both client and server ssh certificates [1]. I use terraform and vault to sign server certificates at creation time.

[1] https://developer.hashicorp.com/vault/docs/secrets/ssh/signe...
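For anyone curious what that looks like outside Terraform, the CLI equivalent is roughly the following; treat it as a sketch, since the mount path ("ssh-host-signer") and role name ("hostrole") are whatever you configured in your own Vault:

  # Sign the machine's host key with Vault's SSH CA (cert_type=host)
  vault write -field=signed_key ssh-host-signer/sign/hostrole \
      cert_type=host \
      public_key=@/etc/ssh/ssh_host_ed25519_key.pub \
      > /etc/ssh/ssh_host_ed25519_key-cert.pub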


Requiring the use of a browser, though, limits the usefulness a bit.


Now try to automate that.


How do you get the browser to open? Does it work on all operating systems and ssh clients, such as Android's JuiceSSH?


Since the blog author didn't mention it, or doesn't know about it, fun fact: you can store host key fingerprints in DNS (make sure to use DNSSEC too, of course!)

https://en.wikipedia.org/wiki/SSHFP_record

Of course this has its own challenges, but if you automate your DNS, this can be neat!
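OpenSSH has the tooling built in; roughly (a sketch, hostname and key path are placeholders):

  # On the host: emit SSHFP records for a host key, ready to paste into the zone
  ssh-keygen -r host.example.com -f /etc/ssh/ssh_host_ed25519_key.pub

  # On the client: let ssh check those records (needs a validating resolver)
  ssh -o VerifyHostKeyDNS=yes host.example.com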

Cheers!


I’d been wanting to use this for a while, but support is lacking for most DNS providers. Maybe it should go the way of SPF and reuse TXT instead.

CloudFlare added support in 2018: https://blog.cloudflare.com/additional-record-types-availabl...

AWS still doesn’t support it: http://web.archive.org/web/20210429183447/https://forums.aws...

Namecheap doesn’t: https://www.namecheap.com/support/knowledgebase/article.aspx...

GCP doesn’t: https://cloud.google.com/dns/docs/records

Azure doesn’t: https://learn.microsoft.com/en-us/azure/dns/dns-zones-record...


It specifically needed to be its own record as the record requires DNSSEC to verify that the returned SSH key is trusted. Using TXT would be ridiculously insecure as it cannot force the DNSSEC verification step that SSHFP as a unique record gives.

Definitely annoying, but does ensure that the returned SSH key is always correct and not from a forged record.


> Using TXT would be ridiculously insecure as it cannot force the DNSSEC verification step that SSHFP as a unique record gives.

I don't believe that that is true. DNSSEC RRSIG records are created over the entire result set. So even if there are numerous records returned, you should still be able to verify the signature. Also, there is nothing stopping you from also returning multiple SSHFP records in a single query.

However, SPF does have a design flaw (amongst many others) in that the record is placed under the domain root, which is often already polluted with other records. This is why other standards that use TXT (DMARC, DKIM, BIMI, MTA-STS, TLSRPT, etc.) use a specific label prefix, or a selector. But this is not because of DNSSEC.
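Easy enough to check against any signed zone (a sketch; assumes a DNSSEC-validating resolver and a host that actually publishes these records):

  # Both answers carry RRSIGs over the whole RRset, whatever the type;
  # with a validating resolver you'll see the "ad" flag set in the reply
  dig +dnssec SSHFP host.example.com
  dig +dnssec TXT host.example.com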


What are you talking about? DNSSEC can sign TXT records perfectly well, just as any other RR type. Of course, it’s much cleaner, design-wise, to have its own RR type, and any resolvers which cannot tolerate unknown types are seriously obsolete and should be replaced.


Can you clarify the issue? Are you saying that DNSSEC doesn't allow verification of TXT records, but does support SSHFP records? I'm not seeing a reference to that online.


This is such a nice and pragmatic solution. And yet for some reason people would rather prefer the CA madness we know in the HTTPS ecosystem to be established in the SSH world. I understand that DANE was ahead of its time which is why we settled with CAs for SSL certificates but can we please take a different route this time for SSH?


As far as I can tell, DNS-based authentication puts all the trust eggs in the DNS root basket, and then again, in the TLD basket. This seems incredibly brittle and fraught with peril. Am I missing something?


The usual counter-argument for https, is that the eggs are already in that basket since anyone with control of dns can just get a new certificate.

For ssh that doesn't really apply as clearly though.


Getting a cert requires someone to either compromise the primary DNS server for the domain, or to compromise DNS in multiple independent locations to serve consistent false answers to the probes. It's true that much of the TLS ecosystem is somewhat bound to DNS being trustworthy, but not to the same extent that SSHFP is.


> or to compromise DNS in multiple independent locations to serve consistent false answers to the probes

Do all CAs implement multi-perspective validation these days? Let's Encrypt implemented that only in 2020, and they believed they were the first ones:

https://letsencrypt.org/2020/02/19/multi-perspective-validat...


That also shows up on Certificate Transparency logs.


People used HTTPS before there were Certificate Transparency logs, so there's no reason why those couldn't be run for DNSSEC too.

https://datatracker.ietf.org/doc/html/draft-zhang-trans-ct-d...

https://www.huque.com/2014/07/30/dnssec-key-trans.html

https://datatracker.ietf.org/doc/html/draft-ietf-dnsop-deleg...


The issue with DANE is that now the DNS becomes a new point of failure, which means higher attention to details, more scrutiny, heavier and more secure processes for the root signing ceremonies that require more activity. It recentralizes everything and makes "failing" easier.


The CA system is just multiple points of failure instead of one. One is better.


No, because if one CA is broken only the certificates it authored are vulnerable. It takes more to break the entire system.


A CA can sign for any site. I'd say a single CA compromise breaks the entire system until it is revoked.

DNS root key compromise breaks the entire system until it is replaced.

Not seeing a huge difference.


There are multiple "root" CAs. CA compromises happen all the time, and it has always been dealt with because the fact that multiple "roots" exists means each one has to be kept in check. It's also possible for you (or more realistically for a company) to have their own CA, with their own certificates for all the sites you need. Not being unique applies pressure on each member to behave correctly because it is possible to get rid of them.

It is not possible to revoke the DNS root, and there are no widely deployed alternatives. The incentive to do the right thing isn't as hard: it's just "good" guys doing the right stuff. If something wrong happens, where will you go ? Nowhere else.


What could possibly happen to the root zone? What is the ”something wrong” which you say could happen? I want to see specifics. The CA system is frequently defended by describing its transparency, and how any bad CA will be discovered thereby. I want to know how a compromised root zone could be used in an actual attack, and how this specific attack is easier than attacking a CA quickly enough before discovery.


So? If an authoritative DNS server operator is broken, only the domains below that operator (in the DNS hierarchy) could be impacted.


DNSSEC relies on a hierarchy of trust, with a single root zone at the top. If the root zone has issues, the whole system breaks down.

In contrast, there is no unique "root CA" that can fail.


If one CA fails, it can still issue certificates for all domains. You can’t avoid any bad CA by using a good CA; any other bad CA can still issue certs for your domains. The CA system as a whole can only ever be as good as its worst CA.


> For ssh that doesn't really apply as clearly though.

Do you think that DNS-based proof of ownership is not something that CAs would use for SSH certification?


How do you authenticate the certificate issuing? Public CAs tend to use DNS (either host or dns txt records) to authenticate.


Right now, probably domain validation. But with the TLS infrastructure, if domain validation ends up in disrepute, or if a CA screws around, we can revoke our trust on a subset without needing to replace the system wholesale.


I'm going to skip the long part, but what you're basically requesting is that everyone must trust the countries operating their ccTLDs. Does that sound reasonable? I don't think so.


The long part is well stated here: https://www.youtube.com/watch?v=UawS3_iuHoA


Don't we already implicitly have that sort of trust at the moment with TLS certs, considering that proof of ownership via DNS is quite common? Actually, any domain-based validation, i.e., also HTTP-01, is going to be flawed if you don't trust the registries.


Kind-of, but not really. Multipath validation exists, so technically you don't have to trust most (if not all) of DNS during issuance. Even if that goes wrong, we have Certificate Transparency to detect misissuance; this doesn't exist with DNSSEC.


Trusting the country that operates the ccTLD of your website is a much better situation than having to trust all the countries that have CAs operate in them.

A malicious CA in one country can issue a fraudulent certificate for a site in another country, whereas the people operating .ru can't affect the records for example.us so the blast radius is limited by design.

Moreover, no one is required to use a ccTLD, and there are hundreds of gTLDs to choose from, or you could even run one yourself if necessary.


> A malicious CA in one country can issue a fraudulent certificate for a site in another country, whereas the people operating .ru can't affect the records for example.us so the blast radius is limited by design.

Sure, and they'll be quickly mistrusted. You can't really revoke DNSSEC trust of a ccTLD operator.

> Moreover, no one is required to use a ccTLD, and there are hundreds of gTLDs to choose from, or you could even run one yourself if necessary.

This is bypassing a dangerous design, at best.


> Sure, and they'll be quickly mistrusted. You can't really revoke DNSSEC trust of a ccTLD operator.

But you don't have to, because the blast radius is so much smaller, and the incentives are aligned better. The reason why CAs require such extreme punishment for misbehaviour is that one bad CA can break the trust for every site on the web.

If a country decided to invalidate the security of (predominantly) its own citizens' websites then that wouldn't harm anyone who used any of the other ccTLDs in the world (not to mention the hundreds of gTLDs).

Also, I think you are over-estimating the ease with which a CA can be "quickly mistrusted". What is the record for how quickly a CA has been taken out of browsers' certificate stores, measured from the time of their first misissuance?

And I would argue that revoking CA trust to Let's Encrypt / IdenTrust would be much more disruptive than revoking a single ccTLD operator, since that would mean breaking most sites on the web. So DNSSEC is actually better in terms of the "too big to fail" problem.

> This is bypassing a dangerous design, at best.

But that's my point; DNSSEC lets you bypass the danger of a rogue issuer, by swapping to an alternate domain in the worst case, whereas with CAs you have to hope that the rogue issuer doesn't decide to target you, and wait for the bureaucratic and software update processes to remove that CA from all your users' browsers.

There are definitely limitations to the DNSSEC system as currently deployed, just as there were with the web PKI system before browsers started to patch all the holes in that, but I don't know why my position on this technical question is so controversial. Nevertheless, I really appreciate you taking the time to offer intelligent counter-arguments in your comment, thank you.


> But you don't have to, because the blast radius is so much smaller, and the incentives are aligned better.

An entire country's best case is a small blast radius? A small CA going rogue would have a much smaller one, when we're talking about best cases. Worst case is massive either way (say LetsEncrypt or .com). People also buy a lot of domains ignoring the fact that they're ccTLDs. The mere implication that people should choose their domains considering this fact is terrible.

> The reason why CAs require such extreme punishment for misbehaviour is that one bad CA can break the trust for every site on the web.

They can, but it'll be discovered really quick, especially with CAA violations. This can't be said about DNSSEC, any key compromise and abuse is difficult if not impossible to detect. Imagine that but with DANE, indefinite MITM, scary.

> DNSSEC lets you bypass the danger of a rogue issuer, by swapping to an alternate domain in the worst case, whereas with CAs you have to hope that the rogue issuer doesn't decide to target you

That's an insane bypass though. "Just cut your arm off, then it won't hurt." Change your email, figure out how to patch millions of devices out in the wild, so many problems.

A rogue issuer is much less hassle short- and long-term to deal with. Most browsers ship CRLite or similar and can revoke the root quickly. You can resume operation with a new CA rather fast.

DNSSEC is a nice complement to WebPKI and vice versa, but for our all sake, it can't be the only source of trust.


It's interesting that all of your examples reference ccTLDs and gTLDs, and not like, ".com" and ".org".


The blast radius of .COM is ~most of the western Internet.


A comment mentions it[0] (I'll repeat here for ease)

> SSHFP:

> https://www.rfc-editor.org/rfc/rfc4255

>> Re SSHFP:

>> Regarding DNS as a database for keys... Please stop this madness.

>> DNS isn't a database.

>> It's not a configuration store.

>> It was meant and should be used only for name resolution.

[0] https://mjg59.dreamwidth.org/65874.html?thread=2106450#cmt21...


>> DNS isn't a database.

>> It's not a configuration store.

It's quite literally both of those things.

There may be practical reasons to question storing high-value keys in DNS. But not being a database of configuration info isn't one of them.


MX records don't fit into the "only for name resolution" mantra either but no one has advocated that we stop using those have they? That also ignores all of the other record types that aren't just for name resolution too.


Including DKIM which is literally keys in DNS and widely used.


I don’t believe in the forbidden fruit ideology. If it’s stupid and works well, it isn’t stupid.

People have been using DNS for all kinds of configuration for decades now, with great success. So what, other than a pedantic and grumpy programmer’s perspective, should keep us from using DNS for configuration records?


1. DNS is insecure... DNSSEC, despite all the effort that has gone into it, still does not work in most TLDs and registrars.

2. Caching. People cache DNS aggressively, often more aggressively than the TTL allows.

You don't want to save the fingerprint in a stale, insecure database like that.


> You don't want to save the fingerprint in a stale, insecure database like that.

I'm trying to figure out how having a stale fingerprint would be an automatic bad thing.

Let's assume you have a server with a fingerprint stored in DNS. Something happens and the server's certificate/key needs to change. So now you push out a new key fingerprint to DNS.

The failure mode for an out-of-date fingerprint would be to not trust the new server's key. In this case, the default failure mode is to fail safely. The client could then have a few new options, like querying the authoritative DNS server or prompt the user.

You can argue that you wouldn't want to have stale DNS caching in an automated system, but in a user-interactive mode, it's not the worst thing.

And for an automated system, the system should be robust enough to manage a bad fingerprint (fail safely again), or be in control of the entire infrastructure, including your DNS cache.

Or am I missing something?


I think cache poisoning is what the person was trying to get at.

But it's kind of a strawman, because nobody is suggesting putting unauthenticated keys in DNS with no DNSSEC. The suggestion is either using DNSSEC, or having some sort of CA system.


> DNSSEC, ... still does not work in most TLDs and registrars.

I'd be interested to know where you get your data from. By my count, there are 142 ccTLDs that support it and 106 that don't, out of 248 ccTLDs.[0]

That's already more than half, but if you include the gTLDs then the number of TLDs signed in the DNS root goes up to 92% according to the best data I can find.[1]

[0] https://www.statdns.com/cctlds/

[1] http://rick.eng.br/dnssecstat/


DNS is only as stale as the cache that you trust. If the cache is not adhering to DNS standards then it is probably not one you should be trusting. Nothing is stopping a DNS client contacting the authoritative nameservers without a cache, or using your own cache that implements the standard correctly.


This is pushing the problem out to an edge that can't really deal with it, i.e. everyone outside a controlled environment. Most people don't know anything about how their DNS cache behaves, and the server-side (where ideally the security posture should be controlled from as much as possible) can't have any influence over it.

If you start using other ways of using DNS, then you might as well not use DNS and instead develop something more suited to the purpose.


I'm reminded of Theo de Raadt's answer when asked in a /. AMA about firewalls on a floppy.

> I must say that I am not a fan of these floppy-based routers. Essentially, you are taking one of the most unreliable pieces of storage known to man, and trying to build security infrastructure on it. That's madness.

DNS is not the foundation upon which you want to build your secure infrastructure.


I don’t know anything about the work you do, but of all distributed databases I wrangled with over the years, DNS wasn’t among them :-)

Besides, we’re not talking about secure infrastructure per se, but distributing public metadata for IP addresses. That sounds like the prime thing DNS has been invented for to me.


It's the classic 'perfect is the enemy of good enough' derailer.


SSH's key management in general doesn't depend on DNS being trustworthy, and I'm not sure that solving it by asserting that DNS is trustworthy is a great thing.


FWIW, many people regretted using DNSSEC. So take it with a grain of salt.


When we looked into it at work, one major problem we had with SSH certificates is that a cert can sign a key, but that's it. You can't have a cert that signs a cert that signs a key. So, the one and only cert that can sign keys still ends up having to be out there in a relatively live position, signing new keys as new machines come up and such. We really wanted to be able to create a cert, put it into super-cold storage, then sign a cert or possibly a number of other certs that could be used to sign keys. Then if one of those was compromised we could handle that particular cert, but the root cert could still be trusted by everything without breaking everything.

We ended up assessing the current host certificate situation as bringing some benefit, but it was difficult to assert that the benefits were commensurate with the costs, especially considering the many other quirks, and we judged it as distinctly possible that if we did try to roll them out we'd find some other "quirk" was actually a significant stopper.

It's not the same scale, but it's a very similar problem to what would happen if TLS could only sign at one degree of remove like that. If you occasionally take the time to poke through the padlock icon in your browser, you'll find the bare minimum trust chain you'll ever find is three; a root cert in your browser trust store, some signing cert, then the cert for the domain. I've seen some with more layers than that, but 3 is fairly common. Those 3 would be 2 in the SSH context since the last one would be the key rather than a cert. Can you imagine how much rougher it would be on the web if you could only have two levels, if a root cert could sign your site but that was it?


I like the `StrictHostKeyChecking accept-new` setting.

Accept new host keys without prompting the usual `Are you sure you want to continue connecting (yes/no/[fingerprint])?` (great when scripting on 100+ servers). Still reject known-hosts mismatches (so shitty Wifi can't inject their ads). Everything can be set up again by just removing `~/.ssh/known_hosts` (instead of that messy GitHub blog post about curling and sedding output into `~/.ssh/known_hosts`).
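In ~/.ssh/config that looks like (a sketch; the host pattern is a placeholder):

  # Auto-accept keys for hosts we've never seen, still hard-fail on mismatch
  Host *.example.internal
      StrictHostKeyChecking accept-new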


Ew, no.

The point that mjg59 makes, just not super explicitly and using many more words, is that the confirmation step wouldn't be necessary if only all of our tooling were just a little bit better. Instead of `ssh $ip_address` and getting that prompt, you'd do `ssh $hostname`, where the hostname can be generated and contain numbers, and then the same mechanism that checks SSL certificates would also check whether the host's key actually matches what the rest of the system (the configured CAs) says it should be. If it does, great! No need to ask the user "Are you sure you want to continue connecting (yes/no/[fingerprint])", with the fingerprint being the one thing to check out of band, which few people are really fastidious enough to do.

Competent Security/IT departments are able to get this to work going from corp managed endpoints (laptops) going to corp managed servers. It's just that the rest of the world hasn't caught up yet.


> and then the same mechanism that checks SSL certificates then also checks if the host's key actually matches what the rest of the system (the configured CAs) is saying it should be.

That's not what mjg59 is suggesting, to establish a trust chain to system-shipped CAs. In fact that wouldn't be so easy anyway, because SSH certificates aren't X.509 compatible at all but use their completely homegrown specification. They explicitly say "please do not do this" about this idea.

mjg59 suggests to make the TOFU approach on the client trust certificates if a certificate is provided. That's a way narrower suggestion:

> OpenSSH has no way to do TOFU for CAs, just the keys themselves. This means there's no way to do a git clone ssh://git@github.com/whatever and get a prompt asking you to trust Github's CA. Instead, you need to add a @cert-authority github.com (key) line to your known_hosts file by hand, and since approximately nobody's going to do that there's only marginal benefit in going to the effort to implement this infrastructure. The most important thing we can do to improve the security of the SSH ecosystem is to make it easier to use certificates, and that means improving the behaviour of the clients.


To be clear, I'm not suggesting trusting the certificate itself - I'm suggesting trusting the signing CA from the certificate, and so also trusting any future certificates for the same host signed by the same CA


I stand corrected. Thanks for pointing out the difference, I was using the terms "CA" and "certificate" interchangeably because in X.509 world CAs also have certificates of their own but it seems that OpenSSH only deals with CA keys, that is, certificates are always leaf certificates, and CAs consist of their keys only.

> "Chained" certificates, where the signature key type is a certificate type itself are NOT supported.

https://cvsweb.openbsd.org/src/usr.bin/ssh/PROTOCOL.certkeys...


SSH certificates (host and client) were the best thing I could have ever implemented at my job, it has significantly improved a _ton_ of internal pain points.


<sigh>, I really wish that the kerberos/gssapi folks had packaged everything up nicely for the web people to use. Between keytabs, derived credentials, and cross realm trust it really did solve nearly every authentication situation that people have.

I pine for a world in which we had made it more approachable for the people who just wanted to quickly build an application without learning about the underlying infrastructure.

I've been dealing with authn/authz for a long time and kerberos is still one of the best protocols in existence.


I've been thinking the same thing... all of this has been done before, and will be done again. With LDAP and Kerberos, many of these workflows were possible decades ago. But having servers connected to a centralized auth infrastructure wasn't popular (probably due to automated setups). And if you wanted TLS, you might even be working with an in-house CA with LDAPS (that's how I did it).

Now we're swinging back to recognizing the benefits to some level of centralization in authentication.

From a historical point of view, this all seems very familiar.


"I've long ago made up a corollary to Greenspun's tenth rule; any sufficiently complex or mature access regime will re-implement half of kerberos, poorly." -- cduzz, https://news.ycombinator.com/item?id=30798057


Definitely <sigh>, particularly as Kerberos provides general single sign-on, rather than the uniform sign-on (just the same credentials for different services) that's typically labelled "SSO". SSH GSSAPI seems pretty simple, assuming you can populate the keytab. Is Kerberos setup really worse than alternatives for Apache (all I know)? However, the SPNEGO mechanism itself is cocked up in some way I've forgotten. One problem is people not wanting to expose the KDC to the WAN -- specifically Active Directory, as I recall Microsoft says. MIT have been doing that for ever and, for instance, Fedora uses FreeIPA over https.


Seems like a pretty neat way to solve this: have ssh clients check the TLS cert of the host and then fetch the public key from a well-known sub-path.

But the hard part is getting everybody to support this. We need to start somewhere. Maybe GitHub can use this bad publicity and turn it into something good.


> Seems like a pretty way to solve this by having ssh clients check the TLS of the host

If you're doing TLS then you might as well do HTTPS. GitHub already supports HTTPS and way more features for it than SSH, and HTTPS works over more networks than SSH does. Continuing to use SSH is literally just being obsessed with a backwards old protocol for nostalgia reasons.


>Continuing to use SSH is literally just being obsessed with a backwards old protocol for nostalgia reasons.

This is a good point actually. It's kinda funny how even Microsoft's own GUI IDE uses an underlying ssh protocol with ssh keys that the end user doesn't even need to see or know about.

Now that we're mostly using ssh to push source code or connect to remote systems, why not just use TLS instead? Aside from its general availability, it feels inevitable.


If TLS/HTTPS supported self-signed client certificates I could sympathize a bit more with this viewpoint.

As it is, I'd argue that straightforward protocol-layer client public key authentication is one of the biggest benefits of using SSH, although I do wish X509/PKI server authentication could optionally be used with it.


> straightforward protocol layer client public key authentication is one of the biggest benefits of using SSH

I disagree, it seems not straightforward at all.

First as a user you have to generate a key. You need to install software and look up a manual. Then you have to give a server admin your public key, which for many users is often confused for the private key and thus defeats the purpose. Then when you connect to the server you have to manually validate the host fingerprint, which again requires more commands and documentation.

The alternative (in HTTPS) is to use Let's Encrypt on the server, and enable HTTP Basic authentication from either a web server or a reverse proxy. The admin gives the user their login and the client just connects.

With those two features you get 1) strong encryption that your user does not need to manually validate, and 2) a non-bruteforceable authentication token. Not only that, but it's compatible with a thousand other applications and modifiable in a million ways.


Agreed on all points about self-signed key server authentication being clunky, but my point is that SSH is doing client authentication better than HTTPS, which is orthogonal to it doing server authentication worse.

I think an ideal solution would offer some hybrid of TOFU and PKI for server authentication, and self-signed keys for client authentication (like FIDO and WebAuthN, for example), but at the protocol level.

HTTPS really falls short here and means that everybody has to implement something custom on top of it, and too often that is a login form driving OAuth.


> Continuing to use SSH is literally just being obsessed with a backwards old protocol for nostalgia reasons.

Honest question:

Client authorization (for private repos) is an important feature of SSH.

How does this work with HTTPS?

EDIT: I just remembered the existence of TLS client certificate authentication. I wonder if it's possible to use an SSH client key for this (making the server accept a self-signed client certificate from the list of keys in authorized_keys).


The usual way HTTPS client auth works is to issue some sort of persistent access token that then ends up sitting on the filesystem without any sort of additional protection, or alternatively to just fall back to username and password without any kind of MFA. It's somehow even worse than the SSH situation. The use of x509 client certs would be a huge improvement here, but I don't know of any git hosting services that support that.


Corporate networks will happily do DPI on TLS, not so much with SSH. So this 'error' of GitHub might be intentional, in order to mold everything into something which they can monitor.


There are SSHFP records in DNS that could be used... but unfortunately, Amazon Route53 doesn't support that record type [1] - or a bunch of other potentially useful DNS records.

[1] https://docs.aws.amazon.com/Route53/latest/DeveloperGuide/Re...


I've always thought of SSH host certificates as somewhat of an abomination, and maybe I am in the minority, because to me the better solution was always making SSH speak x509 properly:

https://roumenpetrov.info/secsh/index.html

Roumen has been regularly updating this for a pretty long time now.


> SSH host certificates as somewhat of an abomination

Many would say x509 is the real abomination.


I might not disagree with you there.

However, for all its warts, x509, thanks to hardware implementations, seems a great deal more secure than SSH host certificates sitting on the filesystem.


OpenSSH has supported FIDO keys since 8.2p1 and has supported smart cards via GPG for longer.


Yeah. Actually the ssh agent speaks PKCS#11 (both client and server side), so it's possible to interface with the hardware token quite easily. I'm using that to store my client key in a TPM, for example.
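For anyone wanting to try the same, roughly (a sketch; the provider path varies by distro and token, this one assumes the tpm2-pkcs11 library):

  # Load keys from a PKCS#11 token (e.g. a TPM) into the agent
  ssh-add -s /usr/lib/x86_64-linux-gnu/libtpm2_pkcs11.so

  # Or point the client at the provider directly in ~/.ssh/config
  Host *
      PKCS11Provider /usr/lib/x86_64-linux-gnu/libtpm2_pkcs11.so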


Can't view this site on mobile. Constantly refreshing and redirecting to itself on Firefox / Android...


I wonder how many thousands (millions?) of man-hours of productivity the GitHub SSH debacle cost across the world. Not only the time for teams to fix the issue in build pipelines no one had touched for several years, but also the lost productivity while teams' build/deploy pipelines were broken, etc.


It's just a tiny, tiny fraction of the productivity gained by using platforms like Github.


Can also use tools like Teleport that handle all of this programmatically. Open-source core as well (github.com/gravitational/teleport).

Source: I work for Teleport so a little biased, but love the fact that more and more orgs are going away from static creds/keys and using short-lived certs and/or passwordless solutions. Spent many years in a production role and wish I had these tools (or was aware of these tools) back then.


Another problem with SSH is that people remove the host key check completely in automation, since some services just rotate the keys without telling anyone. Or out of laziness.


That's also solved with a host CA. You can rotate the host keys and add new machines/keys however you want; all that matters is whether the host keys are signed by a trusted CA, once you set up your automation to trust that CA.


Yes, that would work.


How about delivering the public keys via another trusted path: https://paul.totterman.name/posts/ssh-pki-web-pki/


I think the comments are just more proof that the OP's assertion was correct, and also that we live in a Wild West industry. One day, I hope there will be a much more standardised way of doing things properly, perhaps caveated with "more secure + more work" vs "simple and OK secure".

At the moment, there are too many strong opinions about 100 different "best" ways to do things leaving most of the rest of us just using whatever is easiest to find on Google rather than what the industry has discussed and approved.


How about something very pragmatic: Use the same key you use for your webserver, which already has a valid SSL cert. Clients first connect to port 443 to get the public key and can verify the certificate. During the SSH handshake, the client then compares the host key with the previously obtained key of the certificate.


Part of the problem is also that ssh has host key rotation support but openssh client has it turned off by default


At least on macOS and Debian it is on by default:

> UpdateHostKeys is enabled by default if the user has not overridden the default UserKnownHostsFile setting and has not enabled VerifyHostKeyDNS, otherwise UpdateHostKeys will be set to no.

It's not that useful for actual key rotations, though; the use case seems to be more geared towards switching to newer key types (e.g. adding an Ed25519 key seamlessly) than replacing a potentially compromised key. At least I couldn't find a straightforward way for the OpenSSH sshd to actually provide two keys of the same type to my client.

All in all, I'm not a fan of the feature – it seems to complicate key security for pretty marginal benefits.

That said, it did auto-upgrade the Github key for me as far as I can tell!
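For reference, the client-side knob is just (a sketch):

  # ~/.ssh/config: accept additional/rotated host keys the server proves
  # it holds, after the usual host authentication has already succeeded
  Host *
      UpdateHostKeys ask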


> (It's actually possible to glue the general PKI infrastructure into SSH certificates. Please do not do this)

Can someone elaborate why? We are already depending on lists of known good CAs for everything (even banking). Why not leverage the same for SSH as well?


Anyone else relying on GSSAPI key exchange & authentication instead?


I assume that means Kerberos, as opposed to GSSAPI mechanisms like (the moribund?) ABFAB, (now rightly deprecated?) Globus GSI, or even NTLM. I've used Kerberos GSSAPI with ssh long ago, but no longer can at work. Incidentally, people arguing for X.509 might consider why GSI is such a dirty word in the research computing circles I know! I think it's still widely used in HEP, and even the US Access system, though.


Yeah, Kerberos. Don't think I've ever seen the GSSAPI used for anything else!


I have had a pretty good experience with SSH certificates generated by Hashicorp Vault, both on the client and host sides (it was not for git but for general purpose SSH).


SSH doesn’t have a good revocation procedure. Perhaps integrating SSH certificates with OCSP is the way forward.


Another thing with SSH: is there something like a Host header in SSH that could be used for blocking direct IP access?


Not that I know of


No, you need to abolish SSH. It's an incredibly limiting protocol that does not support modern computing needs. Hacks and add-ons solve some but never all of its many problems.

Wire-level network protocols like Wireguard are somewhat useful, but largely a step away from modern best practices. We need more Zero Trust, Federated Identity, Fine-grained Access Control, and Least Privilege. Those solutions exist, but they are almost always for-pay, because SSH is always used as the default option, and so no more effort is put into better security practices.

And even without putting any real effort into a better protocol, you can just implement on top of HTTPS. Look at HTTPS+Git, compared to SSH+Git. First and foremost, this RSA key leak bullshit just wouldn't happen. Even if the TLS key on the server got leaked (and why the hell would it?! it gets generated automatically by Let's Encrypt), revoking and issuing a new one happens automatically for the client with no fraught extra options, and nobody's client configuration by default disables validating the certs! Then there's the fact that you can use a variety of AuthN+Z options, it goes over standard ports, and most providers give more fine-grained access control for it.

Nerds love SSH. But it is literally worse than the alternatives, and is in practice often not used in a secure manner. Kill your darlings and use something demonstrably better.


> you need to abolish SSH. It's an incredibly limiting protocol that does not support modern computing needs

Yes! Please invent something else, and leave SSH alone.


I meant the protocol, not the program. A secure shell is still useful, but the old protocol is like a unix neckbeard that doesn't wanna learn containers. (And while we're on the subject, SSHD should support an HTTPS port and either serve a javascript client or accept websocket connections, because it is 2023 and that's what everyone wants anyway)


You may need to do some more research. SSHD can run on an HTTP port, but why bundle a web server and additional protocols into something? There are web based ssh clients - https://github.com/billchurch/webssh2 - simple google search.

Plenty of "neckbeards" understand containers and probably understand the underlying technologies (cgroups, etc) as well as comparable (or historical) approaches (jails, zones, etc) better than most.

But because something doesn't cater to your whims, we must complicate things that actually work.


Unix has had containers before Linux ever had them, and certainly before the cloud hipsters embraced docker.


> Then there's the fact that you can use a variety of AuthN+Z options, it goes over standard ports

What's a standard port? AFAIK 22 is a standard port. https://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbe...

Are you saying that everything should only go over 443 (=HTTPS)?


SSH is used for a lot more than just git. And for those usecases it's really good. Federated auth is also possible if needed, we use it at work.

SSHv1 is indeed an outdated protocol but nobody uses it anymore.


> worse than the alternatives

What alternative is there to SSH? You want to go back to telnet?



