If you’re not using SSH certificates you’re doing SSH wrong (2019) (smallstep.com)
459 points by noyesno on March 24, 2022 | 336 comments



Rant engaged. As a person who feels responsible for ensuring what I build is secure, the security space feels inscrutably defeating. Is there a dummies guide, MOOC, cert, or other instructional material to better get a handle on all these things?

SSH keys make sense. But certificates? Is this OIDC, SAML, what? Is it unreasonable to request better and deeper "how to do {new security thing}" guides when PKI is a new acronym to someone? Where can I point my data science managers so they can understand the need and how to implement measures to have security on PII-laden dashboards? And so on.


My needs may be entirely different than yours and I don't want to downplay the importance of security, but...

Security writing in "engagement" obsessed media yields lots of people screaming "FIRE!" whenever seeing something even theoretically flammable, and bandwagoneers— already imagining their hair is on fire— lambasting everyone not immediately evacuating for being careless about fire safety. It reminds me of politicians being 'tough on crime'— they reflexively jump at opportunities to tighten the screws regardless of its necessity or efficacy. It's an emotional response involving self-image, peer pressure, and fashion rather than rational cost benefit analysis.

Perfect is the enemy of good. Attacking every theoretical threat like an international bank's network admin yields no practical benefit for most. Not nobody but most. If this TLA is new to me, there will be another new one that people will lambast me for not knowing in a couple of years, max.

For me, this problem was a better fit for the Wizard of Oz than a security education resource— what I really needed was the right frame of mind rather than learning the implementation details of every incremental certificate authority update.

I evaluate my attack surfaces and reduce them if I can, evaluate the real importance of keeping what I'm protecting secret, implement standard precautions and architecture to mitigate those risks, pay attention to the systems, pay attention to new vulnerabilities, and re-evaluate upon changes. The process is technology-agnostic and only requires you to deep-dive into the stuff you need to know, without feeling like you need a new certification every 6 months to run your company's CalDAV server.


Relevant article: How I learned to stop worrying (mostly) and love my threat model

https://arstechnica.com/information-technology/2017/07/how-i...


When you start dealing with hundreds of servers or more (perhaps it starts earlier at the high tens), you start looking at all things as trade-offs, and doing so yields interesting insights that aren't necessarily obvious when you're working at smaller levels.

What is the cost (in time and effort and manpower and complexity) to implement? What is the cost to maintain? What is the cost to manage, when you are adding and removing people often? What are the failure scenarios, when any one server that needs to manage things starts to become a liability for disaster recovery and redundancy purposes?

Sometimes the destination is clearly better than your current place, but the road to get there has a cost all its own that makes traveling it the non-optimal choice.

It's very easy for 1-3 admins to decide to implement something over 10-30 servers and keep themselves up to date and with the right access and knowledge to manage and maintain it. It's quite another thing when you're talking about hundreds of servers and you've implemented clear delineations about access and you have 10-20 admins ranging from junior to expert with associated levels of access to servers and tools, and the fleet of servers has evolved over years (or multiple decades, in some cases). Applying changes over that type of system can be complex and error prone and when it affects your ability to actually access and maintain the systems in question, it can be very hard to reason about the problems until you start encountering them. Change comes with risk, and risk assessment of technology becomes a large part of the planning requirements.


> evaluate the real importance of keeping what I'm protecting secret

Excellent. This is often ignored by the security obsessed, those people yelling FIRE! as you say.

Securing access to my cloud-hosted cat photos does not demand the same energy as securing ICBM launch codes.


I feel for you. Security is a complex, evolving topic, with a dizzying array of concepts.

At work, we develop Teleport (https://goteleport.com/) to provide a secure access solution that is also easy to use and hard to get wrong. (Note: you cannot truly have "hard to use" and "secure" access, because people will always develop "backdoors" that are easier to use but not secure.)

If you are interested in some accessible writing about security check out: https://goteleport.com/blog/

On SAML: https://goteleport.com/blog/how-saml-authentication-works/

On OIDC: https://goteleport.com/blog/how-oidc-authentication-works/

I can recommend the YouTube channel too: https://www.youtube.com/channel/UCmtTJaeEKYxCjfNGiijOyJw


Teleport seems like a genuinely cool product.

With that said, the company really needs to improve its interview process--my experience was downright terrible, and Glassdoor shows that other people had a similar experience.


I'm with you. I really like the concept of their product and would be interested in using it. I applied a while ago but bowed out during the phone screen. There were a couple strange things that came up during the short call but there was one that wasn't forgivable. The post was clearly for a rust developer but they were upfront that they don't have any rust and are primarily a go shop. He said they put rust in the job title because it helps attract smart, passionate people.

It really put me off. I’m not dead set on developing in any given language. I like rust and have been working with it for a while but that isn’t a deal breaker for me. The thing is that if our introduction starts off with dishonesty I don't have any reason to expect it to get better from there. What will they mislead me about after I’m hired?


Roles shouldn't be put up as Rust, but there is clearly Rust in the GitHub repository. It's not a lot, but my understanding is the usage is somewhat growing.

https://github.com/gravitational/teleport/search?l=rust


According to the git logs, this conversation happened about a year before those were added. We talked about this pretty point blank. It was made clear that while they might use Rust in the future and they had Rust fans internally, it was a Go position.


FWIW I had the same experience with Embark Studios. (Game Dev in Stockholm that prides themselves on doing gamedev in rust.)

Applied for a rust job. Got a Go coding assessment. Was told that the job was Go based.


Ah right, that sucks


Hey, I'm Sasha, CTO @ Teleport. I have designed our interview process and have described it here:

https://goteleport.com/blog/coding-challenge/

We are also trying to be as transparent as possible with our challenges being open source:

https://github.com/gravitational/careers/tree/main/challenge...

and requirements being published here:

https://github.com/gravitational/careers/blob/main/levels.pd...

I am sorry to hear that you had a bad experience. Our interview process is a trade-off and has one big downside - it may take more time and effort compared to classic interviews. It could also feel disappointing if the team does not vote in favor of the candidate's application.

However, if there was something else wrong with your experience and you are willing to share, please send me an email to sasha@goteleport.com.


Non-involved opinion here - it appears that a self-confident and clearly communicating C*O person is explaining exactly why the company is completely correct, while at least two actual non-company people show evidence of this not being the case. Isn't it common for self-assured execs to explain away all the objections of outsiders, despite evidence directly presented? Looks like it here. $0.02


bingo. CTOs should realise that job ads are screened by devs just as they attempt to screen for mini-me's and protect from dead weight.

Devs (who aren't desperate for riches) look at your company and think: how cr*p would it be to work there? Where are the indicators?


No specific criticism of the process was offered, so a general justification is warranted.

Personally I became interested in working for Teleport in large measure because the interview process tested my practical skills, rather than having me pull leetcode trivia out of my ass. I haven’t regretted my decision whatsoever, all of my engineering teammates here that I’ve worked directly with are very responsible and competent and the company appears to be growing mostly in the right directions.


I like Teleport. If you're doing work samples, why is your team voting in favor of applications? Part of the point of work samples is factoring out that kind of subjectivity.


That's a fair question. The team votes on specific aspects of implementation that can not be verified by running a program, for example:

* Error handling and code structure - whether the code handles errors well and has a clear, modular structure, or crashes on invalid inputs, or works but is all in one function.

* Communication - whether all PR comments have been acknowledged during the code review process and fixed.

Others, like whether the code uses a proper HTTPS setup and has authn, are more clear-cut.

However, you have a good point. I will chat to the team and see if we can reduce the amount of things that are subject to personal interpretation and see if we can replace them with auto checks going forward.


We're a work-sample culture here too, and one of the big concerns we have is asking people to do work-sample tests and then face a subjective interview. Too many companies have cargo-culted work-sample tests as just another hurdle in the standard interview loop, and everyone just knows that the whole game is about winning the interview loop, not about the homework assignments.

A rubric written in advance that would allow a single person to vet a work sample response mostly cures the problem you have right now. The red flag is the vote.


That's a fair concern. We don't have extra steps in the interview process; our team votes only on the submitted code. However, we did not spend as much time as we should have thinking about automating as many of those steps as possible.

For some challenges we wrote a public linter and tester, so folks can self-test and iterate before they submit the code:

https://github.com/gravitational/fakeiot

I'll go back and revise these with the team, thanks for the hint.


The good news is, if you've run this process a bunch of times with votes, you should have a lot of raw material from which to make a rubric, and then the only process change you need is "lose the vote, and instead randomly select someone to evaluate the rubric against the submission". Your process will get more efficient and more accurate at the same time, which isn't usually a win you get to have. :)


Disclaimer: I'm a Teleport employee, and participate in hiring for our SRE and tools folks.

> A rubric written in advance that would allow a single person to vet a work sample response mostly cures the problem you have right now. The red flag is the vote.

I argue the opposite: Not having multiple human opinions and a hiring discussion/vote/consensus is a red flag.

The one engineer vetting the submission may be reviewing it right before lunch or may have had a bad week, turning a hire into a no-hire. [1] Not a deal breaker in an iterated PR review game, but rough for a single-round hiring game. Beyond that, multiple samples from a population give data closer to the truth than any single sample.

There is also a humanist element related to current employees: Giving peers a role and voice in hiring builds trust, camaraderie, and empathy for candidates. When a new hire lands, I want peers to be invested and excited to see them.

If you treat hiring as a mechanical process, you'll hire machines. Great software isn't built by machines... (yet)

[1] https://en.wikipedia.org/wiki/Hungry_judge_effect


Disclaimer: this comment ticked me off a bit.

If you really, honestly believe that multiple human opinions and a consensus process is a requirement for hiring, I think you shouldn't be asking people to do work samples, because you're not serious about them. You're asking people to do work --- probably uncompensated --- to demonstrate their ability to solve problems. But then you're asking your team to override what the work sample says, mooting some (or all) of the work you asked candidates to do. This is why people hate work sample processes. It's why we go way out of our way not to have processes that work this way.

We've done group discussions about candidates before, too. But we do them to build a rubric, so that we can lock in a consistent set of guidelines about what technically qualifies a candidate. The goal of spending the effort (and inviting the nondeterminism and bias) of having a group process is to get to a point where you can stop doing that, so your engineering team learns, and locks in a consistent decision process --- so that you can then communicate that decision process to candidates and not have them worry if you're going to jerk them around because a cranky backend engineer forgets their coffee before the group vote.

I don't so much care whether you use consensus processes to evaluate "culture fit", beyond that I think "culture fit" is a terrible idea that mostly serves to ensure you're hiring people with the same opinion on Elden Ring vs. HFW. But if you're using consensus to judge a work sample, as was said upthread, I think you're misusing work samples.

You can also not hire people with work samples. We've hired people that way! There are people our team has worked with for years that we've picked up, and there are people we picked up for other reasons (like doing neat stuff with our platform). In none of these cases did we ever take a vote.

(If I had my way, we'd work sample everyone, if only to collect the data on how people we're confident about do against our rubric, so we can tune the rubric. But I'm just one person here.)

Finally: a rubric doesn't mean "scored by machines". I just got finished saying, you build a rubric so that a person can go evaluate it. I've never managed to get to a point where I could just run a script to make a decision, and I've never been tempted to try.

I'll add: I'm not just making this stuff up. This is how I've run hiring processes for about 12 years, not at crazy scale but "a dozen a year" easily? It's also how we hire at our current company. I object, strongly, to the idea that we have a culture of "machines", and not just because if they were machines I'd get my way more often in engineering debates. We have one of the best and most human cultures I've ever worked at here, and we reject the idea that a lack of team votes is a red flag.


Strongly agree with this, two key concepts in particular:

1. Using group discussion to make the principled rubric is incredibly respectful of everyone’s (employee and candidate) time, not just now but future time. Using the rubric is also unreasonably effective at getting clearer pictures of people quickly.

2. Systematic doesn’t mean automated, and hiring should aspire to be systematic to the point that it makes no difference who interviewed the candidate, and all the difference which candidate was interviewed.

I’ll add one …

3. If you have a rubric setting a consistent bar, share feedback with the candidate in real time (such as asking, ‘help me understand a choice you made that I might have done differently?’) as well as synthesized feedback at the end: “This is my takeaway, is it fair?”

Contrary to urban legend, this never got us sued. Every candidate, particularly those being told no, said it was refreshing to hear where they stood and appreciated the opportunity to revisit or clarify before leaving the room. The key is a non-judgmental, clear synthesis followed by, “Is that fair?”


You’re mistaken, we do have a rubric. All of the members of the interview team grade the interviewee according to the rubric, and the scores are then combined into “votes”.


That's good. I'm responding to "Not having multiple human opinions and a hiring discussion/vote/consensus is a red flag". I think having combined scores is an own-goal, but having people vote based on their opinions is something worse than that (if you're having people do work samples).


Thanks for replying!

Here's what I think it boils down to: working on a codebase with your coworkers is (or at least certainly should be) an inherently collaborative process. On the other hand, a job interview is, in a sense, inherently antagonistic. No matter what shape the interview takes, these people aren't your friends, they aren't your coworkers, they are gatekeepers.

I already have a job as a programmer. At work, I can push back on my coworkers and debate the merits of various designs until we all reach a consensus. But with the Teleport interview, there's an inherent power imbalance that makes that impossible: "I'd really like to argue about this, because I don't think I agree, but I'm afraid that will decrease the chances of them hiring me."

And the only people who are in a position to change this process are the ones who have already gotten through it successfully.


From my perspective you’re unfairly projecting bad faith onto Teleport and shooting yourself in the foot in the process.

1) You’re assuming that a good faith argument would decrease the chances of us hiring you, but for the most part that isn’t the case. We’re an engineering company building a complex security product — the only way that can be done well is via a culture that’s perennially open to criticism, debate, and going with the better argument. In my tenure at Teleport, I’ve never experienced explicit or implicit punishment for voicing my opinion, even when it contradicted a more senior engineer’s opinion. The argument has always been evaluated on its merits and the correct option taken. An interviewee making a good argument and proving an interviewer wrong should, and based on my experience would, increase your chances of being hired.

2) I can imagine you retorting that even if that’s truly the case at Teleport, there’s no way you could know that beforehand, and due to the “antagonistic” nature of us being the “gatekeepers”, you’re forced to assume the worst. But if your goal is to work in a collaborative environment where criticism and debate is tolerated, then your implicit strategy makes no sense. If Teleport is that type of place you’d like to work, then pushback in the interview process will be well received; if it isn’t, then you won’t even get an offer. So you have nothing to lose by giving your true opinion, but if you assume the worst and self censor in an attempt to brown nose the hiring team, you risk ending up in a shitty work environment that you were hoping to avoid.


Yep, imbalance, dynamics, so much to skew the process. If you think your interview process works, great, but likely it doesn't and you just get lucky. All the good people you screened out vs all the cruft you saved yourself from... you will never know!

Being a programmer isn't about what you know, it's about how you learn. Born programmers vs learned programmers: you got a coding test for that? Really? If you think you can screen for anything more than familiarity, you've been sniffing that corporate glue for too long.

If you come to me thinking I am suitable for a job, you reach out via LinkedIn, you see my public repos, then ask me to code for you on demand like a monkey?! Pull the other one!

(Not referencing OP; general comment on interview processes.)


I work at Smallstep

We are hiring and we have a non-terrible interview process (and amazing culture)!


Their pricing is bat shit crazy. Stay far, far away.


Sasha, CTO @ Teleport here.

I agree, our enterprise product is quite expensive. Let me explain why:

* We go through security audits by third-party agencies several times per year. We try to hire the best security agencies to audit our code, and it is quite expensive.

* We are recruiting globally and try to place our comp at the 90th+ percentile of compensation as listed on opencomp.com and other sources we have access to.

* Our sales process also takes time, and the sales team employs sales engineers, sales and customer success specialists to assist with deployments of such a critical piece of the infrastructure.

* For all our employees we have wellness benefits for home office improvement, personal development, healthcare packages.

All of these factors above add up and we charge a lot for building a quality security product supported 24/7 across the globe.

However, this might not work for everyone, and we have a completely free and open source version that people can use without ever talking to our sales team:

https://github.com/gravitational/teleport


Hey Sasha :) Price should be justified by value to the customer, not overhead costs of the company. Even though your value/benefits are listed on the site, this is a good opportunity to reiterate them.


It’s an intersection of those two things. Hawks can profitably prey on squirrels, while lions could not.

There’s room in the security market for $10/mo/user products and room for <whatever it is that Teleport charges>. If not, they’ll find out in an expensive and painful fashion…

Given that they have paying customers, their price is justified to at least those customers.


gk1 thanks, this is a valid point!

Teleport solves many quite important problems for our enterprise customers' infrastructure. Our users use Teleport to replace secrets and static keys with short-lived certificates, manage certificate authorities, add audit and compliance controls for access to critical data, and consolidate access for SSH, Kubernetes, databases, and desktops.


You have no idea how much money you are leaving on the table because of your insane pricing strategy. Your expenses do not scale with a customer's use. Amateur mistake.


I don’t follow this comment. The last time I engaged with Teleport’s sales team they quoted somewhere between $40-$80/host (server, VPS, etc). That seems like it would definitely scale with use.

Edit: per year. And there was a minimum order quantity.


Free, extremely capable open source version: https://goteleport.com/teleport/download/

You don't get support and some other things (see: https://goteleport.com/docs/enterprise/introduction/), but this is not a "demo" version where you cannot do actual work.

Kind of crazy indeed.


It's a security product that could be a huge productivity gainer.

All competitors I can think of are also expensive.


If for some bureaucratic reason you can't use SSH, which the industry has been quite happily using for 20+ years....


It's not about not using SSH, it's about:

* having an easy way to connect to all machines in environments where not everything is built the same way and on the same cloud or whatever. A big company can have a ton of teams building stuff across a variety of clouds and DCs. Not to mention those machines could be dynamic, so you need to add discovery. Heck, there might be Windows boxes here and there.

* having audit logs of who ran what command on which server and when

* extra security features like team management, MFA, etc.

You can do all that (minus audit logging) with SSH, sure, but it takes time and effort from the people who care least about those things (practitioners) rather than those who care most (security teams). Buying something like Teleport or Wallix or Boundary solves all those problems at once.


You don't need their paid product. The free (open source) version is excellent.


Is it really expensive? Their website lists a 14 day trial but I don't see any pricing, just links to "Contact Sales".


Wow, a pricing page with no numbers: https://goteleport.com/pricing/ Amazing


I share the dislike for “call us for pricing” model.

But in fairness there is a de facto number on this pricing page, and that’s zero. Their free open source plan.

So I give them a bit of credit for that.

It’s the companies that have no free tier or even an advertised monthly cost plan at all and just a “call us for pricing” that I find a real turn off (even in roles where I have been a potential “enterprise” customer). So I’d definitely draw a distinction between the two.


I work at Smallstep

here's one with numbers: https://smallstep.com/sso-ssh/pricing/#pricing


I previously contacted their sales to get a sense of pricing while evaluating options. Their enterprise pricing starts at $24,000. In the realm of business security products, that might not be overly expensive. I don't know what that translates to per user. I decided not to go beyond the initial email exchanges because their sales process, with excessively opaque pricing, gave me the same vibe as someone trying to sell me a timeshare.


As a current candidate, I'd be interested in hearing more about your interview experience.


I'd love to use this product in my organisation, but I don't want to self host, and it's really unclear what it would cost me.

Seeing an "enterprise, call for a quote" type tier makes me assume it's going to be too expensive for an agency securing 10-20 servers.


Disclaimer: I work at smallstep. https://smallstep.com/pricing/.

For our hosted product, you're looking at $30-$60/month.


This looks great btw - I'm not ready to move yet but this is in my plans.


I can surely recommend reading up on SSH certificates using the ssh-keygen manpage. No extra tools required.

I sign SSH certificates for all my keypairs on my client devices; the principal is set to my unix username and the expiry is a few weeks or months.

The servers have my CA set via TrustedUserCAKeys in sshd_config (see manpages). SSH into root is forbidden by default; I SSH into an account with my principal name and then sudo or doas.

My gain in all of this: I have n clients and m servers. Instead of having to maintain all keys for all clients on all servers, I now only need to maintain the certificate on each client individually. If I lose or forget about a client, its certificate runs out and becomes invalid.
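A minimal sketch of that workflow (file names and the four-week validity here are just illustrative):

    # one-time: create the CA keypair; keep ca_key offline or in a secrets store
    ssh-keygen -t ed25519 -f ca_key -C "user-ca"

    # per client: sign the client's existing public key; the principal matches
    # the unix account name, and the cert is valid for four weeks
    ssh-keygen -s ca_key -I "my-laptop" -n myuser -V +4w ~/.ssh/id_ed25519.pub
    # this writes ~/.ssh/id_ed25519-cert.pub next to the private key

    # per server, once, in /etc/ssh/sshd_config:
    #   TrustedUserCAKeys /etc/ssh/user_ca.pub
    # (the CA *public* key; authorized_keys can stay empty)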


Expiries are not protection against compromise.

Compromises happen in seconds, even milliseconds, and once they do, the attacker will establish persistence. Expiry systems do not and have never been protection against compromise. They're an auxiliary to revocation systems to let you keep revocation lists manageable.

If you don't have revocation lists, or your number of changes is small, you should go ahead and just set your credential expiries to whatever you want - infinity, 100 years, whatever - it won't make the slightest bit of difference.

Particularly in the case when they're protecting sudo user credentials, they're no defense at all.


Yeah, the lack of any mention of a CRL really stood out when reading this. I actually didn't know about SSH certificates until I saw this article (I always assumed that SSH did not support this), but I do run my own CA and authentication for internal web services, EAP-TLS, and VPN. The CRL is your first line of defense in the sense that it blocks the use of that credential instantly when it is revoked.

I will argue though that the use of a short expiry produces slightly better protection than no expiry at all. If an employee leaves the company (with no CRL in place) and their certs expire in 16 hours, then unless their credentials are stolen in that timeframe your systems are still safe.

Likewise, if a CRL is in place and credentials are stolen without you being aware of it, the expiry still provides a form of buffer if the stolen credentials end up being used after the cert expires. In this case the expiry would trigger before you realised that credentials were stolen and updated the CRL. Now yes compromises can happen in seconds, but that's not in every single case.

That being said, I definitely agree that the expiry is not a substitute for a CRL, and any certificate system should have revocation systems in place. In the end you really should have both a CRL and an expiry date if possible.


Rookie mistake: SSH has no CRL, it has a KRL.

And it's actually a separate thing, since it operates largely independently from the CA.

I have one in place. Used it once to terminate access for someone.


Rookie mistake: SSH's KRL is also a CRL. See KEY REVOCATION LISTS in ssh-keygen(1). You can revoke plain keys with it, but also revoke certs (both by serial number and identity) with it.

The infrastructure I built for access control using SSH certs used it. I know it works because I tested for it specifically.
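For reference, the moving parts are small (file names illustrative); this is straight out of ssh-keygen(1):

    # create a KRL, then add revocations to it in place
    ssh-keygen -k -f /etc/ssh/revoked_keys
    ssh-keygen -k -u -f /etc/ssh/revoked_keys departed_user.pub   # plain key
    ssh-keygen -k -u -f /etc/ssh/revoked_keys stolen_cert.pub     # certificate

    # test whether a key or cert has been revoked
    ssh-keygen -Q -f /etc/ssh/revoked_keys suspect.pub

    # and point sshd at it in sshd_config:
    #   RevokedKeys /etc/ssh/revoked_keys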


It sounds like you could be making the rookie mistake instead by not reading what he/she actually wrote.

> Yeah, the lack of any mention of a CRL really stood out when reading this. I actually didn't know about SSH certificates until I saw this article (I always assumed that SSH did not support this), but I do run my own CA and authentication for internal web services, EAP-TLS, and VPN. The CRL is your first line of defense in the sense that it blocks the use of that credential instantly when it is revoked.

This sounds like he/she is running an x509 CA. He/she is generating certs for various use-cases.

It is possible to use x509 certs with SSH of course, and so he/she could leverage his/her pre-existing CA for that function.

Given the above context, CRL is completely accurate, and KRL is not.


No, SSH CA certificates are in no way like OpenSSL (X.509) CA-issued ones.

The certificate formats diverged between the two ecosystems about a decade ago.


If you didn’t know about SSH certs, you shouldn’t be giving advice. You should study the fundamentals


I think you may also have missed the context that he/she used, as they described running an x509 CA first.

In an organizational context, many organizations are not going to jump to creating a novel CA type (SSH CA) when in fact regular x509 CAs are well known and the basis for much security, and many in regulated industries are using them already.

Additionally, given that he/she is running an x509 CA, telling someone with that experience to study the fundamentals is not very polite. It assumes the author of the comment is not educated, but the very description of his/her use-cases shows they are not simplistic ones.

Engineering is all about tradeoffs after all.


That’s a fantastic point. Mea culpa


Your pronoun thing makes your text painful to read.


... it genuinely pains you to read "he/she"?


That’s why I use “one”.


I just didn't want to assume gender, and didn't want to go through comment history in order to find it.


You would seem to have a very low pain threshold.


I'm not familiar with SSH certificates, but I do know the fundamentals of certificate-based authentication. If you don't have a way to revoke the cert, then the server will assume that your properly signed, unexpired certificate is valid. You will need some way to let the server know that the previously issued cert is not valid anymore.

This is how this type of authentication works, and the article did not address the important case of wanting to revoke a user's credentials.


To connect back on my rant -- isn't it amazing the disparity of thoughts around security best practices? How does someone who knows next to nothing become a reliable security professional if even the security professionals disagree on fundamentals?


The fundamentals are that you need roughly more than 80 bits of complexity in your keys (an attack cost on the order of 2^80). Adding some padding to that is a good idea because some algorithms can be simplified in theory (for instance AES-128 is actually simplified down to something like ~118 already through known math).

This is for symmetric encryption; for asymmetric the equivalent is ~1024 bits, so padding it up to 2048 bits is generally the "minimum" for RSA, and some of that math is advancing too, so bumping it to 4096 bits isn't a bad idea. If you want to be quantum-proof, RSA will be broken (as will EC, both via Shor's algorithm), so you would need a post-quantum scheme. AES would be halved (Grover's algorithm gives an O(sqrt(N)) search), so AES-128 becomes the equivalent of AES-64; if you want to be quantum-proof there you need to jump up to AES-256 (unless you are using XTS/tweak mode, in which case AES-512). Keep in mind quantum is also not exactly practical to accomplish short term at the moment.

You can use whatever technology to accomplish that complexity, be it passwords, SSH keys, or SSH certs. Anything else is just technology architecture noise. Passwords absolutely can clear the ~80-bit threshold. It's just about bytes, and how you store them.

Nobody is going to be brute forcing a sufficiently complex password over the network anytime soon unless it isn't actually random but some default password that looks random.
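Back-of-the-envelope arithmetic for that threshold, bits ≈ length × log2(alphabet size):

    awk 'BEGIN {
      printf "16-char alphanumeric password: ~%.0f bits\n", 16 * log(62)   / log(2)   # ~95
      printf "10-word diceware passphrase:   ~%.0f bits\n", 10 * log(7776) / log(2)   # ~129
    }'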

Just look at the title of this post: "If you're not using SSH certificates you're doing SSH wrong". It's completely devoid of any consideration of environment issues, user issues, or datacenter issues, and reeks of elitism. There is no "one true way", despite people's insistence that they are the arbiters of truth. I keep reading here that "you should just use serial over network instead of SSH!" but fail to read about how those serial-over-network connections are usually less secure than SSH itself.

Best practices guides have gone off the rails. They are generally good guidelines, but you have to make sure you are taking into account your own environment and user needs and take them with a grain of salt. Learn for yourself, and read raw facts from real cryptographers and people in the field. Don't take best practices guides as absolute truth, but learn from them.

How does one become a security professional? Maybe not with one of those "become a security professional in 30 minutes" packages, followed by starting a blog about how everyone isn't conforming to their tiny worldview. No matter what, it'll take >10 years of actual experience, just like any profession. One has to start from the bottom and make their way up. Most environments are too complicated for any "one size fits all" solution:

https://xkcd.com/927/

EDIT: Further discussion on this here is interesting. The top comments go all in on SSH certificates, then down the line people start questioning why passwords are bad in the same ways. A lot of the "SSL certificate" push theorized here from their perspective seems to come from VPN providers that need it from lesser skilled clients/users (think, people who bought VPNs off YouTube video recommendations):

https://arstechnica.com/information-technology/2022/02/after...


> You can use whatever technology to accomplish that complexity, be it passwords, SSH keys, or SSH certs. Anything else is just technology architecture noise. Passwords absolutely can clear the ~80-bit threshold. It's just about bytes, and how you store them.

I always try to assume breach in my thought processes, but I recognize that this can lead to overengineered solutions, because sometimes the mitigation is not worth the cost.

> Just look at the title of this post: "If you're not using SSH certificates you're doing SSH wrong". It's just completely devoid of environment issues, user issues, datacenter issues, and reeks of elitism.

I think this is an excellent point you make. There are a few different ways to use SSH securely and I probably lean a little towards the x509 and other alternatives, given the established base of x509 within my industry.

I don't use SSH certificates at work because they really don't make sense for me when I am using a strong credential already (HSMs)

> There is no "one true way" despite people's insistence that they are the arbiters of truth. I keep reading here about "you should just use serial over network instead of SSH!" but fail to read about how those serial over network connections are usually less secure than SSH itself. Best practices guides have gone off the rails. They are generally good guidelines, but you have to make sure you are taking into account your own environment and user needs and take them with a grain of salt. Learn for yourself, and read raw facts from real cryptographers and people in the field. Don't take best practices guides as absolute truth, but learn from them.

These are some other seasoned points you make.

I like to think about "Security Objectives". In most cases what I am concerned about is whether something is secure from a confidentiality or integrity perspective. But since I also deal with an ICS/SCADA community, their context is completely driven by "Availability as Paramount", defined performance within an acceptable range being next, and only after that do the other objectives come into play.

However, given the varying use-cases of machine, mobile, app, connectivity basis or lack thereof (internet, transient, air-gap, etc) and the limitations of each, sometimes a smorgasbord of solutions is needed to satisfy within constraints.

> How does one become a security professional? Maybe not with one of those "become a security professional in 30 minutes" packages then start a blog about how everyone isn't conforming to their tiny worldview. No matter what it'll take >10 years with actual experience, just like any profession. One has to start from the bottom and make their way up. Most environments are too complicated for any "one size fits all" solution:

Appreciate the words of wisdom.

I view security as having much in common with other rapidly evolving fields of expertise. The generalists became specialists, who are now becoming sub-specialties, adding fellowships, etc. When I was a young force-sensitive I had the good fortune to fall in with the right community in which to collaborate.

My opinion is that many of the security communities are among the most welcoming, diverse, and inviting folks around.


> I always try to assume breach in my thought processes, but I recognize that this can lead to overengineered solutions, because sometimes the mitigation is not worth the cost.

I agree with this mindset, I do the same. But at the same time, yes you do have to realize that sometimes it's not worth it. For instance, there are two types of attack you might encounter, a strong nation-state and a drive-by botnet using known exploits and weak passwords to grab the low hanging fruit. If you are patched and using strong passwords, you aren't going to be affected by the drive-by botnet. If you are patched and using MFA and whatever strong credentials, a zero-day sat on by a nation-state is going to plow through anyway. Then they have gotten into that outer ring as a user and you are trying to protect against privilege escalation. Most things to protect against that here that are actually going to work are going to be strong process control or integrity checking (Windows), or Mandatory Access control systems (SELinux), or just basic user silo-ing and not running things as privileged accounts (either one). Most of that is going to be on the OS design itself or architecture of the process.

So we go to privilege escalation exploits. Take this year, at time of writing this is March. I have been patching nothing but privilege escalation flaws on Linux machines (I don't admin Windows, so I don't know that landscape) all year in 2022. It's only been three months. There's no short supply of them being discovered, and many of them are mildly, moderately, or entirely mitigated by just using SELinux. Some of them go all the way past it, though, so sometimes it can be futile.

So the nation-state threat in almost any case will likely have the ability to jump right past the zero-day to root level. So what about in-between? Well, learning about attack and if you are stockpiling or developing zero-days, those tend to add up quick or you just get locked out entirely because they get patched. Your skills also ramp up pretty quickly, too, as an exploit hunter. So you either develop a strong foothold or you fall out of the criminal world entirely. I'm sure it's probably the most paranoia-driven and stressful "job" to have while you are striving not to completely fall apart and get locked out due to defense ramping up or locked up (not that trying not to get hacked isn't paranoia-driven enough).

I also want to emphasize, you REALLY don't want to get compromised AT ALL at this point. Patching is probably the best way to do that, and the most important step. The reason being, you can't necessarily prove that you have kicked out the user after you think you have unless you just completely wiped the machine, and even then you have no idea if they got as far as a firmware exploit (in the instance of a nation-state), which is the more terrifying exploits that are being discovered and sought after.

But regardless, if you find out that you've been compromised and you're using a random password, you're going to change that password anyway if you are doing things right.

> I don't use SSH certificates at work because they really don't make sense for me when I am using a strong credential already (HSMs)

And that's a great point, too. HSMs are a great way to secure SSH as it is, and use the same or similar cryptography as SSH certs as long as they are well developed.

What comes to mind for me for a complicated environment where SSH certs don't help is that there might be inter-organizational issues where you have to make a connection work over multiple crazy hops. So for instance, an end-user's laptop has to connect to Citrix from home, then RDP into a local machine in organization A, then over an existing IPSEC tunnel use OpenVPN software to VPN into organization B, then SSH into a server in organization B. Organization B just did things using OpenVPN, and then SSH, but the rest had to be tacked on due to the client's environment. Real world example. So, the best usage in this case was for organization B to use Yubikeys in OTP mode to type the AES signed secrets typed as a keyboard through the multiple connections. Organization B had no control over organization A's infrastructure or ability to tell them to stop doing anything the way they were doing it, but had to consider the security implications of the way they had set their systems up anyway because the "client" was working in this environment. Then there was the issue of training the users, and explaining SSH certs OR keys to them would have been impossible. Telling them to hit a button was hard enough.

I've heard much crazier stories from the military involving piping encrypted sessions over satellite and jumping it over cable connections, etc (including patching live Super Bowl feeds over serial connections for officers which are always fun stories, especially when dealing with legal copyright issues involving the government in the 80s and fudging reasoning), but there are just some things when you are involved with multiple organizations or multiple connections or inter-organization or international things that you just can't control every single detail of. This is going to get more and more complicated as remote-work gets adopted more as well, so these old stories of network insanity are extremely useful for application level connectivity for sysadmins now.

Long story short, sometimes that thing you think is engineered terribly has a reason for it. Usually it involves stupid logistical nightmares, weird requirements, or bureaucratic/legal hopping. It's only going to get worse, too.


I'm not sure I understand the point here: are you saying that a CRL is an effective protection against compromise? If so, how exactly does that work?


If I'm not using a device for a long time, it ceases to be an authorized client. This is what I want.


`ssh-keygen` #Certificates: https://man7.org/linux/man-pages/man1/ssh-keygen.1.html#CERT...

"DevSec SSH Baseline" ssh_spec.rb, sshd_spec.rb https://github.com/dev-sec/ssh-baseline/blob/master/controls...

"SLIP-0039: Shamir's Secret-Sharing for Mnemonic Codes" https://github.com/satoshilabs/slips/blob/master/slip-0039.m...

> Shamir's secret-sharing provides a better mechanism for backing up secrets by distributing custodianship among a number of trusted parties in a manner that can prevent loss even if one or a few of those parties become compromised.

> However, the lack of SSS standardization to date presents a risk of being unable to perform secret recovery in the future should the tooling change. Therefore, we propose standardizing SSS so that SLIP-0039 compatible implementations will be interoperable.


Now you have a centralized single point of failure. While the ease of use is inherently obvious with the implementation, if/when it does fail you will have to fall back to public key/password auth anyways.


Centralized single points of control are a basic goal of corpsec. They trade availability for security. The alternative model of individual SSH keys is theoretically more highly available, but has many single points of security failure.


Please enlighten me on the ‘many single points of security failure.’


Which failure mode do you mean? The CA is accessible via offline means. I can walk to it and sign me a new keypair.


What happens when the building the CA is in burns down?


The CA is in a gpg-encrypted secrets store (pass) and has a password on itself, so it can be backed up like normal data to an off-site location.


Scan printed QR codes of your private key that you had backed up off-site.


Ideally, k-of-n key shards, stored in safety deposit boxes.


That's actually pretty brilliant.


Provided you keep said papers away from prying cameras in a verifiable way, that is.

For more inspiration, check out the Glacier Protocol.

https://glacierprotocol.org/


Thanks for the heads up!

I wish I'd thought about this when playing with bitcoin a few months after launch and amassing an integer value larger than zero. That wallet died with the hard drive.


Please tell me you still have the hard drive. There’s a chance for recovery, and I have some experience in this area if you want some tips. Step 0 is always keep your drives for future recovery attempts.


It was dumped many, many years ago while BTC was still a novelty, paying for a pizza in the thousands of BTC. I went to see if I still had a backup of the wallet during a USD:BTC spike a few years back, and it was gone.

Life goes on, even when sad things happen :(


Think of it this way: by starving the supply of that one bitcoin, you have contributed in some small way to the eventual loss of all bitcoins through similar events - speeding up the rate at which the world can move on from this silly fad.


ddrescue may be of interest if you still have the disk.

That's `dd` for broken disks. It keeps a log of data it couldn't read, and can keep trying to read it indefinitely, it even supports a save state and can resume trying again later.

I've recovered filesystems from several failed disks using it. It's not fast though!
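A typical two-pass run looks something like this (device name obviously illustrative):

    # pass 1: copy everything that reads easily, record bad areas in a map file
    ddrescue -n /dev/sdX disk.img rescue.map
    # pass 2: go back for the bad areas with direct access and a few retries
    ddrescue -d -r3 /dev/sdX disk.img rescue.map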


The extreme version of this is using an HSM, and putting one in a safe deposit box.


It's not so extreme, you have to trust the HSM manufacturer.

Try generating randomness using casino-grade dice, and xor-ing it with the HSM. Maybe then.


Now I'm wondering who's managed to pull off supply chain attacks on dice, since I'm sure it's happened already.


Also, this doesn’t apply to most real scenarios (especially not “how I run my personal stuff” type scenarios), but is a fun one to contemplate: what happens when your customer has requirements that specify all keys (including root signing keys) to be rotated at a certain point in the future? Having a process for this is an interesting challenge.


The CA is a key, not a network service.


Sign with two or three CAs, and have sshd accept any of them.
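TrustedUserCAKeys takes one CA public key per line, so the rotation can look roughly like this (keys abbreviated):

    # /etc/ssh/user_ca.pub -- sshd accepts certs signed by any key listed here
    ssh-ed25519 AAAA...old... user-ca-2021
    ssh-ed25519 AAAA...new... user-ca-2022

    # sshd_config:
    #   TrustedUserCAKeys /etc/ssh/user_ca.pub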


> SSH keys make sense. But certificates?

I am equally mystified... Never understood how involving a possibly malicious third party can make communication more trustworthy.

But then again, I was also sure when I first heard about it that public key cryptography was obviously impossible. You just could not have secret communication when everything is out in the open! Is there any simple explanation that we ignorant people can read about certificates to get an "aha! insight" moment? For the case of public key cryptography, the moment where everything snapped together was when I read the mathematical description of the Diffie-Hellman key exchange [0].

I'm not interested in how to do certificates with ssh, but on what problem do certificates solve, exactly.

[0] https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exc...


People talk too much about "certificates" as a good thing, but they're just a means to an end. The two major goals you're trying to solve with SSH:

(1) You want all of your authentication to route through a single point of control where you can enforce group-based access control, MFA, onboarding/offboarding, and audit logging.

(2) You want the actual secrets that allow you access to an SSH server not to live for a long time on anyone's laptop, because it is effectively impossible to ensure that, on a sufficiently large engineering team, nobody's laptop will ever get compromised; there's just too many of them, and developers do weird shit so the machines can't be ruthlessly locked down. You want people to have SSH login secrets for exactly as long as they need them for a specific server, and no longer.

Certificates solve the problem of having dynamic access control to SSH servers without having some weird system that is constantly replacing authorized_keys on all your servers; instead, there's a single root of trust on all the SSH servers (the CA public key) and a single place that mints valid certificates that enforces all the stuff I mentioned in (1) above.

It's worth knowing here that SSH certificates are nothing like X.509 certs; they're far simpler, and you could bang out an implementation of them yourself in a couple hours if you wanted.


Using certificates does not provide anything useful for a small private network, with a single administrator, or where all the users are trusted.

On the other hand, they are useful for large organizations, with needs for differentiated access rights and management rights.

The centralized control over the certification authority allows the delegation of restricted rights to other levels of network administration.


A certificate is just a formatted list of attributes that has been signed by a particular private key. Username, UID, GID, membership in this, privilege for that, good-after date and good-until date, for instance.

Everybody knows the public key associated with that private key, so you can verify that the private key did sign this list of attributes.

An ssh keypair is an actual public/private keypair, but a certificate is just signed and encoded (but not encrypted) formatted data.

If an ssh daemon has knowledge of a public key used to sign a cert, and has been instructed to trust that cert, and all the dates are good, then the ssh daemon can accept that cert as proof of identity and allow a login.
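You can see that attribute list for yourself on any signed cert:

    # prints key type, Key ID, Signing CA, serial, validity window,
    # principals, critical options, and extensions
    ssh-keygen -L -f ~/.ssh/id_ed25519-cert.pub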


I understand what you mean, thanks for the explanation.

But why would you want to do that? What problem does it solve? Just that you can connect without having a private key yourself? This doesn't sound very safe.


You still need your own private key plus the certificate.

I have n clients, m servers.

On clients, I sign the local keypair with the CA key and log in via certificate. The client-side certificate basically replaces the line in the server-side authorized_keys. The editing stays local.

On servers, I register the CA key as "certificates signed by this keypair are trustable"; the authorized_keys file stays empty. No further editing required.

During normal daywork, the CA key sits unused and can be shut away.

Key Advantage: I don't need to edit anything on the countless servers anymore.
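On the client side nothing special is needed; ssh offers the cert automatically when it sits next to the key, or you can name it explicitly (host pattern illustrative):

    # ~/.ssh/id_ed25519           private key, never leaves the client
    # ~/.ssh/id_ed25519-cert.pub  certificate signed by the CA

    # optional, in ~/.ssh/config:
    Host *.internal.example
        IdentityFile    ~/.ssh/id_ed25519
        CertificateFile ~/.ssh/id_ed25519-cert.pub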


Because the keys aren't directly coupled to server configurations, but rather indirected through a CA which hosts the only durable key, those "private keys" users have to have can be extremely short-lived, and tailored for each individual access request.

I think people really get into trouble with SSH certificates trying to reason about the properties of certificates versus SSH keys versus passwords. The format isn't the point; making the endpoint keys dynamic is. If you built a secure messaging system that propagated one-time-use SSH keys, it would address the same problem. Nobody will, because certificates are easier and already work, but you could.


The common way of managing ssh keys involves having some central entity that somehow updates the authorized_keys on all relevant hosts, which involves interaction with all the hosts which is somehow triggered by interaction with the user requesting access. With ssh certificates the central trusted node only interacts with the user (by signing the certificate) and does not have to update anything anywhere else.


It partly solves managing authorized_keys files. If you have a team, separate keys can be difficult to manage. Shared keys are even worse. Certs can help with this if you properly manage the cert signing server (like HashiCorp Vault). All of that is currently free and open source. You can also now have short expiry times if desired.


One example also mentioned in the article:

If you connect to an SSH server for the first time, ssh will give you a warning and let you know that you have to verify the fingerprint of the host key.

This becomes annoying when you connect to many different servers and I would not trust everyone (including me) to do this check correctly every single time.

SSH certificates solve this by having the SSH host key be signed in a way that your SSH client can verify, and you only have to add a key-signing key to your known_hosts once.

Now you have to sign the SSH host key, but you only have to do it once per server, as opposed to each user having to do it locally on every first connect.
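Roughly, the host-side flow (names illustrative): sign the host key with a host CA, have sshd present the cert, and trust the CA once on each client.

    # on the CA machine: sign the server's host key (-h marks it as a host cert)
    ssh-keygen -s host_ca_key -h -I "web01" -n web01.internal.example -V +52w \
        /etc/ssh/ssh_host_ed25519_key.pub

    # on the server, in sshd_config:
    #   HostCertificate /etc/ssh/ssh_host_ed25519_key-cert.pub

    # on each client, one known_hosts line replaces per-host fingerprints:
    #   @cert-authority *.internal.example ssh-ed25519 AAAA...host-ca...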



This is a (quite convoluted) explanation of public key cryptography, which I already understand. My question was about certificates.


For public key cryptography my go-to analogy for non-technical people has always been the mailing-padlocks example (where the padlock is the public key, and the key to unlock it is the private key that stays with the sender).


I would say PKI (and especially the associated X509 standards) is by far the least understood (or most misunderstood) part of actually building secure stuff.

It would be nice if there was a dummies guide but I'm not really aware of one. Doesn't help that most of "how to PKI" on the web amounts to a bunch of unexplained cryptic openssl CLI incantations.


I recommend Security without Obscurity: A Guide to PKI Operations by W. Clay Epstein and Bulletproof TLS and PKI by Ivan Ristić.

I started working in this space a year ago (I'm on a project deploying zero trust networking at a large company) and these books have been invaluable.

https://www.amazon.com/gp/product/036765864X

https://www.feistyduck.com/books/bulletproof-tls-and-pki/


It is a combination of two things. X.509 is arguably an overly complex ASN.1/X.500 thing, but that is not the main issue.

Main issue is that most people do not even grasp the concept of a certificate (ie. binding of public key to some additional information that is signed by some other entity).


Also, it is a moving space. Browsers don't accept a single certificate for a site anymore; you also have to have it signed by a CA. You can create such a certificate yourself too, but as of today you will need at least two certificates for browsers to fully accept a TLS-secured connection. It hasn't been that long since that rule has been in place.

So it isn't only the technicalities of asymmetric encryption; there is also specific behavior of applications that use certificates to prove identities.


Practical Cryptography by Bruce Schneier and Niels Ferguson is decent in that it gives a good lay of the land without diving too deep into the mathematical rigor. The first half explains at a high level the concepts of encryption, key exchange, asymmetric encryption, and digital signatures, and lays out the problem statement that PKI solves.

It's nice in that it will list out a bunch of available encryption algorithms or hash algorithms, but at the end of the chapter say "Just use this one, it's considered safe right now." i.e. AES256 and SHA256.

Unfortunately, it mostly avoids the practical steps of web security; it's not going to print out the command to type into your shell to generate an SSL signing certificate. So I wouldn't recommend it if you're looking for an immediately practical book to help you secure your web server. But it orients you to the landscape so you have a general idea of what you're trying to achieve, and can google yourself the rest of the way there.


If they're willing to read a book on security design, I would recommend Security Engineering, 3rd Edition [0]. It includes a broad survey of what matters in the security space (rather than just cryptography), and generally in sufficient depth to understand how we may build secure platforms in the face of adversity.

Also, many of the chapters are available to read for free; see the author's text under the cover photo.

[0]: https://www.cl.cam.ac.uk/~rja14/book.html


I feel this is the exact right thing for me right now -- people trusted in industry. I can follow tutorials and documentation. The part where a concept is explained is often missing and can be guessed at (albeit often wrongly).

I'll look into this and perhaps supplement with some good tutorials for my developers and data scientists. I appreciate your input!


I don't think Practical Cryptography is going to give you much of an intuition about why this article is advocating for certificates.


I spent an afternoon implementing a SSH cert service in Go using the standard "x/crypto/ssh" package, with little to no prior knowledge of SSH internals.

SaaS like Smallstep and Teleport are trying to middleman and monetize what is actually a simple process that more developers should be comfortable implementing themselves.

This isn't "rolling your own crypto", this is standard SSH key stuff plus a bit more nuance and LOC to make things more secure.

When you pay for those services, you are essentially paying for a wrapped SSH command + dead-simple web app + someone to be your CA (read: store the resulting files of `ssh-keygen`, hopefully securely). And all the potential headaches of relying on yet another SaaS.

Plenty of developers are comfortable writing a script, a simple web app, and securely storing a file on the webserver. This is all that is required to build SSH cert support into your internal apps/tools, plus an afternoon understanding how CAs work (in short, CA private key can sign any SSH public key, then that SSH public key can be validated by anyone holding the CA public key, no TOFU required).
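
For reference, the core of that signing step boils down to what ssh-keygen already does from the command line; a minimal sketch (CA file name, identity, principal, and validity period are made up):

  # CA signs a user's public key, producing id_ed25519-cert.pub next to it
  ssh-keygen -s user_ca -I alice@example.com -n alice -V +1h ~/.ssh/id_ed25519.pub

  # sshd_config on the servers: trust any cert signed by that CA
  TrustedUserCAKeys /etc/ssh/user_ca.pub

A service like the one described presumably just performs that same signing step programmatically (the x/crypto/ssh package exposes a Certificate type with a SignCert method) and hands the resulting certificate back to the client.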


I have the same feeling, and it motivated me to recently purchase this "Bulletproof TLS and PKI" [0]

I haven't read it yet, so I'm posting in hope of someone else giving a quick review.

https://www.feistyduck.com/books/bulletproof-tls-and-pki/


What is the scale you are operating on?

If you have 10-50 servers and 5-10 people working on those, SSH keys are definitely good enough; it might be a bit of a hassle to manage keys, but it's quite OK.

If you go into large-corporation territory, with more than 100 servers and more than 50 tech people that need to log in to those servers, you will probably have already found out that there are other options, and you will probably have to run your own internal CA (certificate authority).

If your org grows, you will probably have a CTO and other technical people who have the experience and knowledge to implement things differently.


Check out Teleport. It abstracts away the certificate bit and manages it for you. You run 'tsh login' once and you get a cert good for 12 hours (then you get access to all the Teleport resources you are allowed to use, whether that is ssh server access, db access, Kubernetes access, etc.). I am evaluating the product now and am quite impressed.

https://goteleport.com/


It's not unreasonable, but we need some kind of universal knowledge base for tech stuff. Security is just one of many inscrutable topics in tech where you need weeks of research to understand the best practices.


Scalable and secure access with SSH @FB https://news.ycombinator.com/item?id=12482212


This blog post seemed eminently understandable to me, at least as someone aware of public key, but not certificate based authentication.


I feel that trying to make SSH keys short-lived is becoming more painful each year because there's an increase of tools that use SSH keys for purposes other than SSH logins. For example, age [1] encrypts files with SSH keys, agenix [2] does secrets management with it, Git can now sign commits with it [3], and even ssh-keygen can now sign arbitrary data [4]. All of these become useless the moment you start using short-lived keys.

[1]: https://github.com/FiloSottile/age

[2]: https://github.com/ryantm/agenix

[3]: https://calebhearth.com/sign-git-with-ssh

[4]: https://www.man7.org/linux/man-pages/man1/ssh-keygen.1.html
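
For example, signing and verifying arbitrary data with an ordinary long-lived key ([4]) looks roughly like this in recent OpenSSH versions (the key path, namespace, and signer identity are placeholders):

  # sign a file with an SSH key (writes data.txt.sig)
  ssh-keygen -Y sign -f ~/.ssh/id_ed25519 -n file data.txt

  # verify against an allowed_signers list ("alice@example.com ssh-ed25519 AAAA...")
  ssh-keygen -Y verify -f allowed_signers -I alice@example.com -n file -s data.txt.sig < data.txt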


Umm, please correct me if I'm wrong, but I think you're confusing SSH keys with SSH certificates. An SSH client key can be reused to create short-lived SSH client certificates. You can keep using that SSH client key to encrypt data, sign data, log in to GitHub, etc. There's no such thing as "short-lived keys"; there are short-lived SSH certificates.


Yes, a cert is just a public key that's been "stamped" by a certificate authority (CA), allowing it to be validated by servers holding the CA public key (as well as enforcing other policies like lifespan and principals). It is a totally separate file and does not modify the original public or private key, which indeed have no notion of lifespan.

If you are constantly regenerating uncompromised SSH keys, you are probably doing something wrong.

The GP is misleading in this way.


> If you are constantly regenerating uncompromised SSH keys, you are probably doing something wrong.

Yup, there's no real reason to generate a new SSH key pair each and every time you want to get a short lived certificate. The SSO or CA management system (like Vault) is responsible for verifying your identity.


This is also true for X.509: there is exactly zero reason to generate a new key (if it was not compromised) or even a new CSR for certificate renewal. Yet people tend to do this, which only increases the opportunities to fuck something up in the process. (Well, it does not make much sense, but over the last year I have seen at least three instances of somebody overwriting the only copy of a newly generated private key with the old one…)


> This is also true for X.509, there is exactly zero reason to generate new key (if it was not compromised) or even new CSR for certificate renewal.

It might make sense to regenerate the private key on each certificate renewal if the private keys are kept unencrypted, as they often are in the X.509 scenario. If the keys are encrypted and if your web server manages to get the password to decrypt that key each and every time it serves a request, then yeah, I don't see the point of regenerating the private key on certificate renewal.


One of the first things the article mentions is rekeying. Since the step utility does in fact regenerate keys when obtaining certificates, it actually does have a lifespan. Besides, how do you even tell that your keys have been compromised?


> One of the first things the article mentions is rekeying. Since the step utility does in fact regenerate keys when obtaining certificates, it actually does have a lifespan.

I went through the article again. Essentially, rekeying makes sense if the private keys in question, whether they are host private keys or client private keys, are kept unencrypted on disk. Host private keys typically are, so it might make sense to rekey host private keys. However, if your user private key is kept encrypted on disk, as it should be, there isn't really a good reason to rekey.

The step tool seems to abstract that process, and it also generates a new key pair on each login, but that keypair never even touches the disk, according to the article. This makes sense assuming the step tool generates a key pair and doesn't encrypt the private key. In that case, yes, rotating/regenerating the client keypair on each login makes sense.


Those are choices made by the Smallstep SaaS and are not reflective of the underlying SSH cert technology.


The thing is, the topic of the article centers around the step utility and security practices that it considers is the best. My comments were in relation to the article, not all the possible ways in which SSH certificates can be used.


Yes, but the entire point of the described setup is to get rid of traditional long-lasting keys in favor of ephemeral certificates (which I believe is another way of saying signed keys) obtained through SSO. Signing certificates with your existing keys kind of makes the whole point moot.


I doubt we're getting rid of traditional long lasting keys anytime soon. They do have their uses and it'd be a waste not to use them.

> Signing certificates with your existing keys kind of make the whole point moot.

You mean getting signed certificates from an SSH CA with an existing client public key? Why does it make the whole point moot? The SSO is responsible for verifying your identity. Rather, it seems pointless to generate a new SSH keypair each and every time you want to get a certificate and log in to a machine. You can certainly do it if you want, but I don't see the point. You can keep using your existing SSH public keys to get short-lived certificates and log in to a machine, do your job, log out, and repeat the process with the same keys.


When you sign short term certificates with your existing keys, you're still using your existing keys, managed in the exact same way, to authenticate. The certificate would just be another layer of indirection. I fail to see how that would be a meaningful change.

One of the primary benefits of the described setup is that there would only be a single long term key. It would be managed more securely because it won't be lying around on each user's personal machines.


> The certificate would just be another layer of indirection. I fail to see how that would be a meaningful change.

That certificate would only be issued after you've gone through SSO or some other certificate management process and it would have a defined expiry period, maybe even as short as a few minutes. It is a meaningful change.

When using key based authentication, as long as you had your public key in the authorized_keys file on the server, you would be able to login without issues, even if your keypair is compromised. With certificates, even if your keypair is compromised, you wouldn't be able to login to that server because you'd have to compromise the SSO/MFA authentication step as well, which adds another layer of meaningful security in the process.


I think we're in full agreement here. My prior comment was about using your existing keys as the CA.


Oh, I didn't realize you were talking about not using your SSH keypair as the SSH CA keypair. Well, that's kinda expected. The SSH CA keypair is the single point of failure in this case, which sounds worse than it is but it's okay to have such a thing because of the benefits, so it should be protected and isolated.


Different strokes for different folks.

Different keys for uhh, different complex application use cases.


This article was very illuminating. And, I wish it was written like this:

  To really secure your SSH server, do these three things:
  1. Setup a trusted authority
  2. Use ssh certificates
  3. With ssh certs, you can easily add MFA; logins expire until you reauth.
  Now that we've established this, here are the gory details.
  Lorem ipsum, lorem ipsum, lorem ipsum.

It's a great narrative, but I wish I knew the big payout in advance.


MFA for ssh means you can't automate cluster-level ops. Unless, well, you automate the MFA, which defeats the MFA.

This drives me a bit nuts about security people in the age of cloudscale. They assume you don't mind MFA'ing every hour and are manually doing logins and accesses for everything. Yeah, uh, I need to script orchestration on several hundred machines at once, and orchestrate/access on the scale of hours or even days for some things like "Big Data" backups or restores.

If certificates are anything like SSL certs and the horrorshow cli tools / options / management involved in those, no thanks. I'd rather have an automated sshkey switchover, or for stateless just routinely cycle the infrastructure with new keys.

It's been a long-time tenet of security that you want an open algorithm that gets broadly and publicly challenged so you know it's secure. Well, in the age of state actors, this might not be the whole truth.

I think layering some kludgy not-invented-here obfuscation atop the more battle-tested methods is a useful and important deterrent/delay. Sure, someone will figure it out, but they have to TRY HARD. A lot of the institutional attacks seem to be based on human attacks on standardized systems, which HAVE to allow human access and vectors.

So in AWS land, secrets manager is secure unless you get the permissions. Then you have everything. But if each of those secrets has some whacko obfuscation for the various apps, then that is a big slowdown to the human attack vectors.

And the state actors? Well, even they have budgets. They'll probably move on to easier targets. If a state actor is motivated at targeting your company in particular, well, given that they'll have malware in the firmware of your hard drives and motherboards and the like, you're probably helpless.

Finally, what really bothers me about most security is that it leaves out one of the most important "canary in the coal mine" aspects of security: honeypots. Sure, do the diligence on securing the access, but how about some turnkey approaches for setting up honeypots to detect when people are poking around? Honeypots are perfect for that, because the devs only care about the stuff they are working on. The intruders are doing the scanning.


Cluster-level ops really shouldn't be done by direct SSH. Tools like ansible, salt, chef, puppet, etc... all can be run in some form of daemon mode with a central management server for a reason. You should be authenticating against the management service and running your configuration or automation scripts from there, not as a massive pool of CSSH or whatever directly from your laptop.

Over the last 6 years I've worked for three different companies that all universally disabled ssh and we never had troubles running management scripts or tooling.


Stateful databases or disposable api servers?

Disposable API servers you can eliminate ssh access and the like. Containers generally don't run sshd, but they do still often have kubectl/dockerrun if you absolutely need to.

SSM sucks on a certain level because the output is capped at ?1MB? I think and you need to poll S3 to use it.

Salt daemon polls fine.

As you kind of alluded to, well, you can do SSM to "port knock" or simply do a change to a secgrp to flip on the ssh access, and then flip it off afterward.

I find Salt/Ansible/k8s too big, you can't "step through" to debug your orchestrations. I would categorize them as "heavyweight". SSH is a good substrate for everything else, including adhoc stuff.

Anything that can be run off a laptop can be run off an admin server. Remotely debugged. I love it. With the heavyweight stuff there is too much trial and error: try recipe, it craps, guess what's wrong, it craps, guess again, it craps.

The stuff I use actually doesn't require SSH. As long as you can deliver a command and get the output, I can use SSH, SSM (with the crappy limitations but its GREAT for stuff in China), kubectl, salt, dockerrun, teleport (until the token runs out), or combo ssh-to-bastion then do other stuff.


Until the daemon breaks and someone has to get in with SSH anyway? Chef is notorious for doing this. Or at least it was before we got rid of it. Some random script would break and then the run would be incomplete. And we couldn't just fix cookbooks as the run wouldn't complete.

Some deployments make the daemon approach (that phones home) difficult. Such as management in a corporate network. It's easy to configure AWS and the like to accept requests from well known corporate gateways. It's not as easy to make them from the outside the corporate network in. And even when that's doable, different cloud providers and regions make it difficult. You end up having a bunch of chef (or similar) servers scattered around.


In the rare case this happens we still don't use raw SSH. We rely on something identity-driven like SSM in AWS or IAP in GCP to initiate the tunnel.


GCP IAP sounds like Teleport, which we've already run into issues with since the Teleport daemon will die/not accept connections in some situations, while the good ol sshd does. Like: full disks, memory stress, or (I think) the teleport daemon getting killed.

SSM sounds like an advanced port knock. Or you could toggle the security group port access, or keep the bastion down and spin it up if you need it.


You mean like Thinkst Canary?

Or OpenCanary if you can't afford $arm&leg?


I am not a security consultant, but I've never heard of either.

Which says something... but then again if the security group is doing its job with setting up canaries, why would peon dev like me even be awares?


I agree, something like that. From what I've experienced it's (for me) a common problem in that realm:

a lot of blah-blah that makes my eyes glaze/lose focus before getting to the core/overview.

~15 years ago, when the company I'm working for first implemented PKI, I needed something like 100 hours of "help" from local security engineers and the external software-vendor's programmers to understand how that worked and what the SW was supposed to do. After that, explaining to colleagues at least on a high level how that works became a matter of minutes.

To be fair towards myself, even the local security engineering gurus (relaxed, as they themselves didn't have to "deliver" anything) and the external vendor's programming gurus (hardcore, as nothing would be paid if the SW could not be implemented) were often absolutely not understanding each other, so I ended up becoming their unofficial mediator/translator:

if I felt that one of the parties didn't dare to ask an obvious question, I sacrificed my self-esteem and dared to ask it; if "even I" managed to understand some concept, then both parties were supposed to get it as well, and if not, at least I could act as a gateway to explain/expand offline :P

It was an interesting time - not nice nor very bad (a little bit bad, as we had of course an implementation deadline), but at least I learned a lot, as well in the area of social skills :)


Is that what I get out of the box using tailscale for ssh?

https://tailscale.com/kb/1009/protect-ssh-servers/


Two examples of things Tailscale doesn't give you for this usage model that SSH CAs can:

* Transcript-level audit trails for what people are actually doing on SSH sessions.

* Differential access to different groups of users to the same machines.

Tailscale and SSH CAs work together nicely: require membership in the right Tailscale group to talk to SSH at all, thus tying access to SSH to your (e.g.) Google login and MFA requirement, and use something like Teleport for the actual SSH login, to get the audit log, group access, and an additional authentication factor.


Tailscale (and other similar solutions) works on the network level. This is not a bad idea in itself, but SSH certs operate on the application level.

The fact you can ping the server shouldn't mean you are allowed to actually access it.


Meh, you're not wrong for skipping SSH certs. They're mature security, but not mature enough for everyone. And like everything else, they break, predictably and unpredictably. If you're not ready for someone to emergency-access their way into prod to fix a broken SSH issue on Christmas morning, you're not ready for SSH certs. Maintenance will get you, one way or another...


Is there something wrong with installing client public keys in the server and preventing password based logins?

Seems more secure than handing over the security of your servers to a CA?

Managing the authorized_keys file is trivial and I like simple file-based solutions.

I find simpler is often more secure because its easier to understand.

I have had plenty of ssh servers on the Internet even, (on non-22 ports to prevent logs filling) and not had a problem yet AFAIK.

I have also been on call when certs expired gawd knows how many times.


I'm not sure I buy it, but, TFA tries to explain:

Situation: A person's machine dies and takes the SSH private key with it

No Certs: They now need to get IT to distribute the public keys to a set of hosts, where the full list is possibly not known by any one person, so for the next week it will be "oh shit, I forgot to ask IT to put the pub key on server-12, so I'll open a ticket and not get any work done for the next few hours"

With Certs: They get IT to sign the new key.


> They now need to get IT to distribute the public keys to a set of hosts, where the full list is possibly not known by any one person

If the team responsible for server security isn't able to identify all the servers they're responsible for, you've got far bigger problems than SSH certificates are going to solve.


You could just get your authorized keys via an `AuthorizedKeysCommand`. It will make it slower to load but even something as simple as `AuthorizedKeysCommand aws s3 cp s3://myprivatebucket/authorized_keys - | grep USER` will do the trick. Then revocation is just deletion from the file and adding is just addition to the file.
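
One wrinkle: sshd runs AuthorizedKeysCommand directly rather than through a shell, so a pipeline like that usually lives in a small wrapper script. A sketch (the bucket name is made up; %u passes the username in as $1):

  #!/bin/sh
  # /usr/local/bin/fetch-authorized-keys -- $1 is the username sshd passes in
  aws s3 cp s3://myprivatebucket/authorized_keys - | grep "$1"

  # sshd_config
  AuthorizedKeysCommand /usr/local/bin/fetch-authorized-keys %u
  AuthorizedKeysCommandUser nobody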


"Situation: A person's machine dies and takes the SSH private key with it"

Not sure I buy that; people take backups. No reason why a CA can't lose its private key if we're presuming backups are not being taken, and then everyone is affected.


If the IT team is treating servers like pets, then this is a problem. If they aren't amateurs, then it's not a problem to update the script and push the configuration out to all of the servers.


The second part is only kind of accurate, because usually the "sign a new key" part is automated as part of some other authorization flow. For instance you can log into your corporate SSO provider, and that can actually provision a short-term (i.e. 1 hour) SSH key that is signed by the CA for you. That short-term key is then used to shell into hosts, and you periodically renew either the key or the SSO session token. There's a bunch of points in the design space you can do to make this transparent. Normally it's all wrapped up in some command that invokes ssh for you; so you just say 'corp-auth ssh username@host' instead of just 'ssh username@host' and it all "Just Works" if you're lucky.

You can also bake some basic rules/policies into the certificate, i.e. the above flow returns a signed certificate that is only valid for SSH'ing into hosts in the 'www' group (not the 'database' group.) So then operations people can add you to groups in some other place (LDAP, Exchange, whatever) and when they add you to the 'database' group, any future short-term SSH keys issued by the central authority allow you to now shell into the 'database' hosts. The hosts themselves can also have some automation to check these permissions are accurate when an SSH login request occurs. So you just file an IT ticket asking to shell into a host or group of hosts, they fiddle with some knobs somewhere, and a few minutes later you're done.
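
A rough sketch of how that group scoping can be expressed with stock OpenSSH (the CA name, principal names, account name, and paths are assumptions, not necessarily how any particular shop does it):

  # issue a short-lived cert whose principal encodes group membership
  ssh-keygen -s user_ca -I alice -n group-www -V +1h ~/.ssh/id_ed25519.pub

  # sshd_config on the www hosts
  TrustedUserCAKeys /etc/ssh/user_ca.pub
  AuthorizedPrincipalsFile /etc/ssh/principals/%u

  # /etc/ssh/principals/deploy on a www host lists the principals allowed
  # to log in as the "deploy" account:
  group-www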

So the actual second part is more like "Get a new laptop and log back in through Okta" or whatever, and you have all the exact same permissions and whatnot you did before. It's easier for both the user and the administrators.

An immediate advantage of this is that keys are easy to immediately audit and revoke (because a central authority issues them, so you have a trusted trail). But there are some more subtle advantages; one, for example, is that you make servers more homogeneous and identical. They all just ship some specific sshd_config file, normally identical, rather than each server potentially having a different authorized_keys file; instead, the central authority that issues keys is where the mapping of hosts/authorized users lives, rather than each server having that knowledge "individually" in its own file. That's easier to understand, track, keep up to date (who has access where?) etc. You can just go modify a single global user and have the permissions flow downwards.

None of this really matters if you only have like 1 or 2 people doing everything, or it's your home network, but once you get above like, 5-10 people, or you have at least one sysadmin, it's actually really useful IMO. Also, to some extent, you can mix and match various parts of the above concepts (e.g. running code on every login request, to add extra authorization checks based on out-of-band information.) You have to decide what's appropriate for you. Read the ssh and sshd man pages and you'd be surprised at what you can do.

Source: I basically implemented my own SSH certificate authority infrastructure (server/client automation) "for fun."


I guess I don't see how this is better than using LDAP or AD authentication on each host?

Also, users tend to hate having to always relogin to SSO; maybe that's because the implementations have poor UX, and maybe there's no secure way around it.


Sure, you can use LDAP or AD or any other number of things to control server authentication as well as mapping some global database of user IDs to accounts. You could also do other things like combine this with a 2FA solution like Duo.

One thing SSH certificates certainly have going for them is that they're actually easy to script and integrate with, and "piecewise" migrate to, in my experience, while using a flow you already are pretty familiar with. I personally didn't use any sort of LDAP or AD setup to back my design; you can implement a custom backend for all this pretty easily yourself. There's nothing inherently confusing about the concept of cryptographic certificate authorities or anything, anymore than public key cryptography itself. It's a relatively natural extension of the SSH design you know already, is my point. Again, the man page is worth reading to understand it all a bit better.

> Also, users tend to hate having to always relogin to SSO; maybe that's because the implementations have poor UX, and maybe there's no secure way around it.

Well, I'll be honest, people who tend to use SSH and would be impacted by this stuff tend to hate lots of things and not always for good reasons. Put another way, listening to developers or whatever about what they hate and what's actually good isn't something I would factor into something like this. SSO is mandatory for very good reasons at any reasonable scale (and by "reasonable" my opinion is you should have it in place at, like, 10+ people.)

Anyway, besides that: there's nothing in theory that prevents you from doing something specific like having the backend refresh the SSO token issued for your SSH certificates every time you log into some server, up to some given interval; e.g. logging in at least once a day seems reasonable, but if you log in every 5 minutes to a new set of hosts you can refresh the token.

In my case the flow was something like 'my-ssh-ca-wrapper ssh user@bar', which would ask you for a token. I would then get this token by visiting a little webpage I wrote, but in theory it could also just launch the browser itself with xdg-open with a direct link. I just use a password manager to fill out those "SSO" credentials. It isn't ideal or fully integrated but in practice it would only take a few seconds and it's similar enough to corporate SSO setups. But yes, polish is everything for those final few steps. The actual backbone is pretty straightforward, though.


The SSO flow is way too complex for something that must work on an emergency.


> This makes it operationally challenging to reuse host names. If prod01.example.com has a hardware failure, and it’s replaced with a new host using the same name, host key verification failures will ensue.

A new machine should just have a new name. If one really wants to pretend that it's the old one, they'd better really copy it, including the keys. But even skipping that, sorting this out doesn't seem like a big deal (at least at a small scale; I suspect the article makes more sense in some scenarios than in others).

> Curiously, OpenSSH chooses to soft-fail with an easily bypassed prompt when the key isn’t known (TOFU), but hard-fails with a much scarier and harder to bypass error when there’s a mismatch.

Seems to me like a sensible behaviour for TOFU, not sure what's curious about it. Sounds like it implies that an unknown key is at least as bad as a different-than-known key, but that sounds wrong in context of TOFU.

> Once the user completes SSO, a bearer token (e.g., an OIDC identity token) is returned to the login utility. The utility generates a new key pair and requests a signed certificate from the CA, using the bearer token to authenticate and authorize the certificate request.

So the weakest point will likely be the SSO and the related infrastructure, instead of SSH and actual keys, and you'll probably depend on third-party services and/or custom/uncommon self-hosted infrastructure. Likely with a SPOF too. Doesn't sound good in general.

It probably does make sense in some organizations, but this particular setup doesn't seem to apply to all SSH uses, and to justify the title.


Again, the alternative to SSO's single SPOF is many SPOFs for each individual engineer with access.


Or you could automate key distribution and revocation instead of giving away the keys to your little kingdom to Google.

(Let's not pretend that "SSO" doesn't mean "let Google or Microsoft handle password storage for me".)


No:

(1) Authentication bypass to your email service is already game-over for almost every serious company.

(2) Google (if that's what you're using) has better MFA and access control than what you're going to roll yourself.

(3) Having multiple sources of truth for authentication is a corpsec nightmare, and companies that don't have that invariably wind up accidentally persisting access for departed team members or contractors, and, worse, have no single place to consult for a reliable catalog of who has access to what, which is why if you poll CISOs at large-ish tech companies, they'll universally tell you that one of the first 5 things they did when they took over was get SSO stood up.

(4) The "automated key distribution and revocation" system you roll yourself will be jankier and less safe than the certificate-based systems that already exist.

(5) Because that automated key distribution and revocation system does not in fact exist, what you're really saying is that you're going to live with developers having long-lived keys on their laptops.

If you don't trust Google, set up Shibboleth or something; the Google stuff is a sideshow. But the idea that you should manage SSH authentication separately from the rest of your authentication is pretty unserious. I spent about 4 years, recently, parachuting into dozens of mid-sized startups, all of them clueful, and except for the teams that had SSO-linked SSH access, SSH management was invariably a total nightmare. The "just manage SSH directly" approach is, empirically, a failed model.


I'm sure it's different for large megacorps, but if you have less than 100 devs then the single point of failure in your SSO scheme is a far bigger security and operational risk than having long-lived keys on some dev's laptop.

> the rest of your authentication

What is "the rest of your authentication" in this context? Corporate email? As far as I know, SSH is the only real authentication possible here.


How so? Are you picturing that alternative as regular records in ~/.ssh/authorized_keys, or something else?


It depends on the scale. If a company has a handful of hosts, I'd argue that deploying the full AAA and PKI systems to back cert auth is doing it wrong.

Traditional ssh-key auth is simple and reliable; it's not until you have a large, complex and diverse user base that you need something more. That's why the huge FAANG sites use it. Every org doesn't need to mimic FAANG.


Exactly!

I don't understand this obsession with enterprise-level security on home networks and hobby projects. If you think it's fun and educational to set up, then you're doing it for fun and education, not security. If you're doing it for security, you're basically setting up anti aircraft guns to do what a drone jammer could do with way less resources spent.


Anti-aircraft guns can't actually take out most drones.

Good analogy.


I'm using SSH certificates to manage a few nodes in my homelab and it's a pleasure to not have to deal with managing the known_hosts file on my clients and authorized_keys file on my servers. There's only 1 line in my known_hosts for my nodes and authorized_keys doesn't even exist on any of my servers. If I add a new node to my homelab, I don't have to make any changes in known_hosts or authorized_keys in the existing nodes and it's easy to bootstrap the same known_hosts and sshd_config that I use everywhere in the new node.

SSH keys would make managing these few nodes a lot more complex than it is.


You really don't need to be anywhere close to mega-scale to benefit from SSH certificates and integrated authentication flows, though. Even at the scale of "only" 10 people with SSH access, the whole system can be massively simplified and made more secure by integrating centralized logins, and SSH certificates are rather perfect for this.

I implemented my own SSH certificate authority myself more or less, and while it's overkill for my own homelab-level stuff, I absolutely would never use anything else once I have more than like, 5 people logging into some set of machines. The benefits of centralized SSH access control that you can freely integrate (and pretty easily too, thanks to OpenSSH!) with your existing identity provider is really nice.


True, and while the title is somewhat clickbaity, I think your point was pretty clear in the article.


I still don't care about any of this nonsense for my personal stuff when I can avoid it. Passwords all day for me. 0 security incidents in my lifetime.

Sucks that Github and some other things force SSH keys which are just passwords except always saved to your disk so that anyone who steals your laptop gets access.

It adds insult to injury when you try to capitulate to this malarkey, generate a key in PuTTY's key generator, and then GitHub whines that the default setting isn't overkill enough and you have to make a whole NEW key with some other setting. I miss the good old days.


> Sucks that Github and some other things force SSH keys which are just passwords except always saved to your disk so that anyone who steals your laptop gets access.

This is the reason to encrypt ssh private keys with a passphrase. If the key is leaked it's still protected by the password.

It's a built-in feature of ssh. For an existing key downloaded from a cloud provider, use ssh-keygen -p to add/change the passphrase.
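
Concretely (the key path is assumed):

  # add or change the passphrase on an existing private key
  ssh-keygen -p -f ~/.ssh/id_ed25519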


How hard is it to crack these passwords? Since it's local (unlike the github password) I guess you can run brute force attacks etc at full speed ?


Yes it's local, but also can be taken away to run on a cluster. Looks like ssh-keygen is using 16 rounds of bcrypt_pbkdf. My laptop just took 185ms to try a password. So I guess I could run less than 10 passwords per second (per core?).

I don't keep an ssh key on disk though. I use my gpg key on my hardware security token, which gives you 3 attempts before you have to unblock it with a separate management password, which again you get 3 attempts at before the key is entirely locked.


The longer the better. A memorable sentence is a good place to start.

ssh-agent will cache the passphrase in memory, which helps avoid needing to type in a long phrase repeatedly.

But it's worth saying that if any private key is leaked (passphrase or not), it's time to revoke it and generate a new one.

Having a passphrase in place raises the bar from "key leaked, 3rd party has access to everything" to "key leaked, 3rd party has to now attempt to crack the passphrase". It mitigates a very bad scenario and buys time.


Pass phrase not password. You are going for length to protect from brute force.

Soylent Green is NOT people

Was one of my shorter pass phrases


I'm sceptical about the entropy of easy to remember pass phrases, including negations and simple capitalizations. Even when going for something like "correct horse battery staple", which requires a memorization technique to remember, the space of words we are realistically drawing from when prompted by a shell is probably not that large.


That’s what diceware is for: https://theworld.com/~reinhold/diceware.html


How many times a day did you type that wrong!


That's going to depend on the length of your password. Longer is more entropy and orders of magnitude more difficult to 'brute force' with each character added.


Yes. This is precisely why passphrases are a bad idea - people tend to use their easy-to-remember default password, which gets compromised along the way if an attacker can get their hands on the key file and throw their full processing power at it.

SSH certificates are a solution to that problem.


What mechanism is used to protect the key of the certificate authority?


That’s a different situation - the CA key resides on some high security server, not a developer laptop that may get stolen or compromised by ordinary usage.


sure, but that's why you're using a password manager that lets you generate 24 character mixed everything random passwords and use them easily, right? Right? Guys?


I miss the good old days before passwords were a thing and everyone just trusted others to behave ;-)

You can secure access to Github (and other places) with hardware keys, e.g. from https://www.yubico.com/.


But you add a password to the key, so it's the same.

And not everyone saves it to disk. My ssh key is my gpg key. It's stored on a yubikey and can't ever leave it. If I do a `git pull` then my yubikey flashes and I have to tap it to allow that connection to happen. Steal my yubikey, well you can't unlock it. Hack my laptop and you can't tap the key.


> I still don't care about any of this nonsense for my personal stuff when I can avoid it. Passwords all day for me. 0 security incidents in my lifetime.

Even for personal stuff, why would you want to use passwords? Keys are more secure AND more convenient. Sure you don't need certificates but I don't understand how keys are more 'nonsensical' than passwords.

Keys have more flexibility, you can use SSH Agent, you can do SSH agent forwarding, etc.

> except always saved to your disk so that anyone who steals your laptop gets access.

This is wrong. First of all, your laptop should have disk encryption. Always. I don't care what your threat model is, encrypt the disk. Second, SSH keys can (and SHOULD) have a passphrase.


Ever done the math on the amount of time spent entering passwords? I have. Stopped using passwords for personal stuff.

I multiplied by the cost per minute of downtime in professional support, and the cost of typing passwords was more than my yearly wage.


This contains several misconceptions.

Keys are not just "passwords saved to disk". My private keys exist in hardware, on Yubikeys. They aren't on disk. The hardware requires authentication to access.

You're typing your passwords zillions of times. I log in to my system once, authenticate to my HSM once, and I can access many hosts via scripts and automated tools. This is impossible to do securely with password auth.

You're also training yourself to manually input the entirety of your authentication credential multiple times per day (or hour). This is bad practice, as anyone stealing it then has the keys (ha) to your kingdom (and they have way more opportunities to steal it!). Even if you just replace password auth with a password-protected key on disk, and don't use a password-caching agent that holds the decrypted key in ram (as would be typical), so that you're still typing your password each and every authentication, you've raised the bar substantially because someone would need to steal your encrypted key from disk in addition to obtaining your password.

Then there's the issue of cycling credentials, and the mental loads involved. I can cycle my keys without changing my workflow or having to type anything differently.

Passwords are not good authentication tools. Use actual cryptography.


not if you encrypt your ssh keys, which is what everyone I know does - then your potentially weak password requires physical access to exploit while things accessible over the internet can't really be brute forced.

further, this is even more convenient when paired with an ssh-agent that will securely hold your private key in memory and not allow anyone to export that key...you could dump the memory but that would require root access, which again should be password protected


My servers expect an environment variable to be sent too, (new sshd and ssh can do this). This gives basic 2fa without typing anything.


The github change also messed up my workflow, which involves pulling/cloning my company's git repo from lots of machines, many of them being short lived or disposable. Now I have to save the password forced on me in a file because I'm unable to memorize it easily and that made our setup less secure. Thanks github...


Passwords are safe if you can memorize them. It is not too hard in my opinion. I also think they should always be an option for any kind of auth. Maybe I want to authenticate against a system but I don't want others to know my ID. For that use case a password is the better solution.


The SSH servers that I'm familiar with are spun up with a host cert, so all of the FUD in this article about connecting to an unknown host is a non-issue. Check that the host cert matches the one you expect once, and the tooling makes sure to notify you if it changes.

As far as provisioning, maintaining a secure CA signing practice is a nightmare. It's K8S level of self-inflicted pain for a startup. If you're running at a larger scale and can dedicate a team to it, fine. If you're a dozen people trying to launch, getting the devops guy to run `ssh-copy-id` is not the challenge that this article makes it out to be. Nor is the slightly more automated Terraform script that installs and uninstalls authorized keys from servers.


So now simple and reliable SSH keys must now be replaced by a far more complex security architecture with a lot of interworking parts and a full-blown PKI and certificate authority, and the opportunity for any of the nodes to DoS the CA and prevent me from logging in.

I guess it's time to toss out my local (self-hosted) Userify setup that has been reliably working for years, where I can just instantly update my keys across all servers, and still log in even if some bad guys start DDoS'ing my Userify host, and just switch over to certs.

Oh, wait, now I see. This is a sales pitch for their web-based SSO app. If you don't use it, you're "doing SSH wrong". Good to know.


As long as it is not as easy as let’s encrypt, it won’t take off.


I toyed with SSH certificates a few years ago. They seemed cool and secure but, ultimately, the x509 stuff was quite (!) arcane. And trying to get the IT team on board would have been a nightmare.

I quite agree: as long as it's not as easy as Let's Encrypt then it won't take off.


SSH with x.509 requires patching the SSH client and daemon. It is an unofficial patch to add the functionality.

SSH certificates themselves are similar to x.509 but a lighter version. They are supported out of the box.


man ssh-keygen lists me this:

> Note that OpenSSH certificates are a different, and much simpler, format to the X.509 certificates used in ssl(8).

Where do you work with x.509 certificates when doing an SSH ca? I use SSH certificates productively and only ever used ssh-keygen commands.


Like I said, it's been a few years so maybe things have changed since then. But if I recall...

x509 supports all of the things that ssh key file format does ... and also a lot more. ssh supports x.509 too so that makes it "easier". When you use `ssh` to connect, you'd specify -I and point to the private key and it will automatically try to find a certificate file whose name is identical to the identity file but with "-cert.pub" appended (see `man ssh_config`). There's another option to explicitly specify the key certificate file `CertificateFile` but there's no short-argument version so you have to use `-o CertificateFile [filename]` or add that to your `~/ssh/.config` file.

You'd need to use the `openssl` command (or, of course, any openssl-like command line) to sign your SSH key. And, of course, openssl doesn't understand ssh key files, so you have to use `ssh-keygen` to convert the key to x509 format or else generate the key using `openssl` (but again, at least ssh understands x509 natively, so the downside is having the much longer x509 text). And that's quite the arcane part: the openssl cli is fucking awful. It doesn't follow normal command line conventions, doesn't have tab autocompletion, and its documentation is obscure/difficult to find, extremely terse, and difficult to even understand if you're not already explicitly familiar with exactly your inputs and outputs.

And then there's the whole process of getting your key file signed. At least that process is (in general) identical to having a web SSL key signed -- because the private key is actually identical but the only nominal difference is the format of the file that you normally think of using for it. But the workflow is different because the certificate doesn't get installed to somewhere that a webserver would want. I had ended up creating my own Certificate Authority to test with and that was yet another rabbit hole of anger management.


Yeah, you should re-educate yourself. Like I said, I used SSH certificates managed exclusively with ssh-keygen; I don't have any connection between the OpenSSL and SSH cryptosystems, nor do I see a point in that. My advice: disregard any blog posts mixing SSH certs with OpenSSL and take the same time to read the ssh-keygen manpage.


> Yeah, you should re-educate yourself.

Thanks! That sounds quite condescending. I'd educated myself by reading about ssh keys in ssh and wondering how to use them.

Perhaps you should make your own blog post describing how to do it all using only ssh-keygen then.


Here is my "blogpost": https://www.man7.org/linux/man-pages/man1/ssh-keygen.1.html . The signing process is the `ssh-keygen -I` part.

The source of truth (in sync with your currently installed version) is right on your disk. Invoke `man ssh-keygen`.

Sorry for my tone. I'm pissed that, every second day, some person comes along and demonstrates to me that I'm seemingly the only person in the world who is reading the documentation on the tools we all use. Like, how do you even know what you are doing?


And yet: I did read the documentation and using openssl was the solution I had come up with. Perhaps I was using a nonstandard version of ssh. Telling people to "just read the documentation" is quite condescending when they're demonstrating that they've come up with a different solution. It was years ago and I don't remember what version I was using.

Indeed, using ssh-keygen for the whole process certainly seems easier.


I'm sorry for my harsh words.


I didn't know people thought Let's Encrypt was easy. My first foray into setting up certificates for a web server let me realize how convoluted the system itself is, not just specifically Let's Encrypt. I guess it's easier compared to the alternatives. But holy cow I hate the process.


yeah, it's really confusing.

Many folks are comfortable with jumping through the hoops (following instructions) WITHOUT understanding what's really happening.

Those of us who aren't comfortable with "just works" need to plow through an enormous amount of persnickety jargon and easily forgettable material to reach an understanding that gives us confidence.


Thank you for this reminder! It has slowly dawned on me over the years that when "everyone" thinks something is easy and I don't, it's often because we're comparing a surface level understanding vs. a full grok.


Server setup isn't hard. No need for authorized_keys files anymore, as you just trust the signing server instead. What is hard is the user experience. ssh-agent doesn't support the certs. PuTTY (the most popular client on Windows?) doesn't support certs. I want to use them but need better client support. Vault makes it almost easy to create and manage certs, but you have to use the OpenSSH client only, with an extra argument every time.
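
For anyone curious, the Vault flow being described typically looks something like this (the mount path "ssh-client-signer" and role name "my-role" are assumptions that depend on your setup):

  # ask Vault's SSH secrets engine to sign your public key
  vault write -field=signed_key ssh-client-signer/sign/my-role public_key=@$HOME/.ssh/id_ed25519.pub > ~/.ssh/id_ed25519-cert.pub

  # saved next to the key with the -cert.pub suffix, OpenSSH finds it automatically;
  # stored anywhere else, you need the extra argument every time:
  ssh -o CertificateFile=~/.ssh/some-cert.pub -i ~/.ssh/id_ed25519 user@host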


The USP of Step is to make setting up a custom CA as easy as it is with Let's Encrypt


What seems to be missing from this is how I assign permissions to individual hosts. When I'm using public keys, I do so by only adding the user's public key to the hosts I want them to access. It seems the way that certificates were presented that adding a user implicitly gives them access to all hosts using the same CA. I'm sure there's a solution to this problem, but does anyone have any pointers?

For example, I want hostA and hostB to use the same CA. But some users should only have access to hostA but others should only have access to hostB. Others may have access to both.



Thanks! This seems like a pretty simple approach to implement but I'd also imagine it would scale reasonably well after automating various pieces.


For scalable, centrally managed solution see: https://goteleport.com/docs/access-controls/reference/#rbac-...

Disclaimer: day job.


Set the principal name in the certificates to the name of the user's team.

On the server side, set the AuthorizedPrincipals* for root to the list of allowed teams.
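
A minimal sketch of that (team names, account name, and paths are made up): the cert carries the team as a principal, and each host lists which principals may log in.

  # cert issued to a member of team-a
  ssh-keygen -s user_ca -I carol -n team-a -V +8h ~/.ssh/id_ed25519.pub

  # sshd_config on hostA and hostB
  TrustedUserCAKeys /etc/ssh/user_ca.pub
  AuthorizedPrincipalsFile /etc/ssh/principals/%u

  # /etc/ssh/principals/admin on hostA:
  team-a
  team-both

  # /etc/ssh/principals/admin on hostB:
  team-b
  team-both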


Simply don’t create a user account for these users on the hosts they shouldn’t access.


SSH certificates make sense. But can you use hardware backed ones like the OpenPGP applet on yubikey with this?

I currently use this method to store my SSH keys safely. But I don't know how this would work with certificates. If I have to store them in the computer instead of a hardware token it's a huge step back in security.

By the way, what do home users use to set up a PKI? Scripting everything with OpenSSL is not very nice. It would be cool if there were an open source PKI platform, ideally even with an IdP built in, with a nice web interface and easy to install with Docker. Never found one though.


Yes, ssh-keygen allows you to use private keys from any PKCS#11 backend using the -D option. This includes smartcards and tokens.

Including for the CA key.
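
For signing, that looks roughly like the example below (the library path is just an example; when -D is used, -s takes the public half of the CA key, since the private half stays on the token):

  ssh-keygen -D /usr/lib/x86_64-linux-gnu/opensc-pkcs11.so -s ca_key.pub -I alice -n alice -V +1h ~/.ssh/id_ed25519.pub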


Ah Yes I used this method before with PIV cards and OpenSC/OpenCT.

The toolchain is a real PITA though. Unstable, difficult to provision. Proprietary tools for card management. Not all platforms support PKCS#11 modules. As far as I remember, openssh on macOS wasn't compiled with support for it, so I had to replace it with one from brew, which is much harder these days. Mind you, we're talking 5-6 years ago.

OpenPGP is really user-friendly. Nice config menu with gpg --card-edit . SSH agent functionality built into the gpg agent.

I kinda want to retain this level of comfort to be honest.


Why use gpg at all? SSH supports FIDO.


Good point, but it's not well supported yet. There are no mobile clients I know of that support it (let alone with agent forwarding, which I really need for jumpboxes).

And of course once you do fido, you're back to the same issues around SSH keys that certificates are a solution to (as the article demonstrates). So moving to Fido is not a whole lot better than using SSH keys which work very well everywhere.

Also, another important point: I also use GPG a lot to encrypt files. It's great to use the same key (sometimes, for sensitive stuff I use a different OpenPGP card) and toolchain (always) for this. So I need it anyway, might as well use it for SSH authentication as well.


I tried the FIDO2 way of authenticating to SSH when I finally had a Linux system with new enough versions of OpenSSH etc. in the repos.

It works quite nicely, but the PIN has to be entered every time I auth to SSH. This may be a desirable feature to some, but I prefer the GPG way of the PIN being cached until the key is removed from the system.

(This could possibly be a shortcoming of KWallet, as it would pop up the dialog asking for the PIN, but checking the "remember this password" would achieve absolutely nothing, and besides, I wouldn't want to save it permanently, which that checkbox would otherwise do.)


Ooh yeah I definitely don't want this either.

I want GPG to ask for the pincode once, and the yubikey to require a physical touch for each authentication (including the first, obviously). This way it can't be automated by malware either (a pincode can be sniffed and replayed through the keyboard driver, the physical touch can't).

The PIN is great against attackers that find your key on the street. Not against a determined attacker that is already on your computer. For that the physical touch thing is a great solution (though the yubikey doesn't require it by default, you can easily turn it on).

Asking the pin upon first use and a touch every time is the perfect compromise between security and usability IMO. There's still some weakness around attackers with physical access but they are more easily mitigated.


I don't have any PIN entry at all with FIDO.


But then someone with your key just has direct access?


It's a second factor, not first. This is fine.


But this is not FIDO2 in CTAP (passwordless) mode, which is what the previous poster referred to. It's FIDO1 and it's not supported by default in SSH, only with some PAM plugins.

OpenSSH supports FIDO2 passwordless mode natively in the latest versions.


FIDO is still of extremely limited availability. I'm still running into hosts I need to access that can only use RSA/DSA keys, as opposed to the ed25519 key on my Yubikey, nevermind FIDO.


The 'ideal ssh flow' involves a program that I think the author has written and a website login. Do these certificates require an existing system running single sign on or similar to hand out access to other machines, in single point of failure fashion?


I'm not doing anything wrong.

My sshd is behind spiped + logging in requires me to physically tap my yubikey.


Isn’t this the process that Teleport streamlines?


Teleport completely streamlined all sorts of access management in my projects. Every new feature they release (Kubernetes, Database, App access) work as advertised and the Helm chart + Terraform plugin (which for some reason is not published in the marketplace??) are great for automating the whole deployment + configuration. Great piece of software.


The article does a pretty good job explaining the advantages of certificates. It overlooks the existence of solutions to some of the individual problems it mentions with keys, though. Tools like sssd exist, where letting each end user keep their public key updated in a directory takes care of key distribution and at the same time limits who can use sudo, all without changing any per-user files on the server.


> you’re doing SSH wrong

I understand the theoretical superiority over keys, but do we have any data on how often, in practice, key security has actually failed someone?


I would very much like to read about that too.

Setting some sane security parameters for your SSH setup looks like a less jarring/drastic approach into securing SSH further[1]:

- Use keys.

- Allow only strong ciphers.

- Remove weak primes.

[1]: https://disknotifier.com/blog/simple-ssh-security/


Aside from being a pain to set up, what’s wrong with using GSSAPI / Kerberos? (within an org)


Yeah, my org had a Kerberos setup for some DB/middleware/CLI tools. I inherited a bunch of random script tools that required typing in passwords for everything. Screw that; the first thing I did was write a Kerberos module and set up a Kerberos keystore. Presto, no more typing passwords for everything. Typically it ends up boiling down to whether or not the backend thing you're talking to supports Kerberos; many don't and only have the whole SSL thing. A well-set-up Kerberos was much nicer to work with than chained SSL certs.


Kids these days hate using technology that's older than themselves? Kerberos is amazing, and it makes me sad to see all these people reinventing stuff that was solved 30 years ago. The number of companies selling SSO and doing it poorly (see this week's Okta hack) is unfortunate.


21yo here, I think Kerberos is bloody awesome, but then I was introduced to it by a somewhat old timer, who showed me its benefits when properly integrated in the company.


I long ago made up a corollary to Greenspun's tenth rule: any sufficiently complex or mature access regime will re-implement half of Kerberos, poorly.


This typically still involves typing in your domain username and password, no?


I wish revocation was covered as well. The article mentions this issue in SSH:

> Keys are trusted permanently, so mistakes are fail-open.

But you can make the same mistake with certificates by issuing a certificate with a distant expiration date. AFAIR you _can_ revoke valid certs, but that involves making changes to every box running sshd, much like revoking a public key.


You can distribute a key revocation list along with your regular configuration management. The key revocation list can be generated and maintained via ssh-keygen.

I used to have a central location with a revocation list. All servers had a cronjob fetching it.
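A rough sketch of that workflow (file names are just examples):

  # create a KRL revoking one key, then add another key to it later
  ssh-keygen -k -f /etc/ssh/revoked_keys departed_user.pub
  ssh-keygen -k -u -f /etc/ssh/revoked_keys another_key.pub

  # /etc/ssh/sshd_config on every server
  RevokedKeys /etc/ssh/revoked_keys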


And that's pretty much the exact same infrastructure you'd need for distributing authorized_keys.

Instant revocation seems to be a giant hole in this article's argument which isn't addressed at all. If you fire somebody partway through the work day, end-of-day cert expiry is not good enough to prevent compromises.


Active reconfiguration doesn't work well if your machines aren't unconditionally reachable.


Hm, that is true.


Previous discussion from 2019: https://news.ycombinator.com/item?id=20955465


SSH certificates are great in theory, but the whole certificate management, ad-hoc issuance, and revocation require boatloads of infrastructure. If you do it right, certificates will be signed as needed and have a short validity period, say half an hour or something. That means you need an automated signing application, or a very cheap full-time certificate manager.

I’ve actually started working on such an app recently, including a web portal, CA rotation, automated configuration distribution, etc. Still far from usable, but if you’re interested in contributing: https://github.com/Radiergummi/fides


SSH certificates are useful in large environments when scaling, automatic onboarding and offboarding are important, but IMO, small teams can (and should) continue using authorized key files as they have for years. They don't really need these features.


> What you’re supposed to do is verify the key fingerprint out-of-band by asking an administrator or consulting a database or something. But no one does that.

Actually they do where I work, pretty much every time, and I didn't ask them to. Eyebrows rise even higher if I forget to notify everyone that I replaced a server at a domain, causing a new host key (and the scary "possible MITM attack" message). This is a good thing, but I should probably make it more efficient by publishing the fingerprints.

Although the article does point out this specific disadvantage of domain reuse with known_hosts. I can see why this solution could make things easier at larger scales.


> I should probably make it more efficient by publishing the fingerprints.

When I last ran a fleet of servers, I published the known hosts files via git and strongly suggested using them in new user documentation (you needed to set up ssh_config for our jumphost/bastion anyway, so may as well link to the known host keys). There's a tool to generate the files that comes with openssh, iirc.
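The tool in question is presumably ssh-keyscan, which ships with OpenSSH; a sketch (host names are placeholders):

  # collect current host keys and commit the result to the known-hosts repo
  ssh-keyscan -t ed25519,rsa bastion.example.com app1.example.com app2.example.com > ssh_known_hosts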


Agreed--SSH certificate authorities (and principals) are powerful things that can be used to manage SSH access at scale. My workplace is a large enterprise that uses our own CA for getting access to systems--the keys it issues are good for 8 hours, then we have to grab a new key (using an internal utility).

For anyone who is interested, I put together a little playground which can be spun up in Docker that allows you to play around with and learn how SSH CAs and Principals work:

https://github.com/dmuth/ssh-principal-and-ca-playground


I would love this to work, as it indeed fixes several issues addressed in the article.

However, I don't think openssh-server supports OCSP natively, so while you might be doing SSH right, you're doing certificates wrong.


OCSP natively? No.

Does current OpenSSH have its own KRL?


I have built a small signing service that works by a user SSHing in, performing LDAP (password) authentication and 2FA with duo, then injecting a time-limited certificate signed by Hashicorp Vault back to their user agent (although it could be modified to remove the Vault requirement). The UX is very simple (SSH to this address once per day), but the backend is complicated, so if there is any demand for me to put this on Github I am happy to do so.


Did it make good use of the `AuthorizedKeysCommand` option in `sshd_config`?


If this is true then why does ssh not get set up this way by default?

Honestly what I would prefer is a better (and faster) version of monkeysphere [1]. That was honestly the most natural-feeling solution to SSH security.

[1] https://www.systutorials.com/docs/linux/man/1-monkeysphere/


Because it's not suitable for everyone in all circumstances.


So if I am not using certificates then I am not doing it wrong? :)


Yes, SSH certificates are the way to go and pretty easy to set up. But what these articles fail to address is the user management aspect.

For the SSH certificate to be accepted, the unix user must first be present on the system. As far as I can understand, FreeIPA (or similar LDAP systems) cannot be used in conjunction with SSH certs, whereas SSH keys are supported by these systems.

Can anyone provide any insight/experience with this?


It's not the username that needs to match, it's the principal. You can allow any principal for the root user, for example.

You can define principals when allowing a CA via authorized_keys, or you can configure allowed principals globally using sshd_config directives like AuthorizedPrincipals* .
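A minimal sketch of the global variant (paths and the "infra-admins" principal name are made up for illustration):

  # /etc/ssh/sshd_config
  TrustedUserCAKeys /etc/ssh/user_ca.pub
  AuthorizedPrincipalsFile /etc/ssh/principals/%u

  # /etc/ssh/principals/root then lists the principals allowed to log in
  # as root, one per line, e.g.:
  #   infra-admins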


Many years ago, I did all of this with an LDAP system. Public keys were generated by the user and entered into LDAP (or you could auto-generate keys, etc). Users were authenticated with their ssh key (stored in ldap, password based access was restricted). Authorization for access to each host was also in LDAP, as was sudoer status (as a group setting).

It was actually quite an elegant setup. You would still need to setup a CA for generating local certificates for TLS connections to LDAPS, but the auth was handled all in the LDAP server.

I think the main downside would be concentrating the authentication overhead on a single server (the LDAP server) when you are dealing with many hosts. Over a handful of systems, it's great. But it doesn't scale when you're talking thousands of hosts (or cloud VMs that spin up/down).


In most circumstances, you want these two things (user is authorized on the system, user can be identified and authenticated) to be different. Having a process that creates the user on system in order to authorize them to login is pretty similar to all your other configuration management tasks.


I imagine it's a matter of automating the certificate insertion on the target servers when it's updated on the user's account in the LDAP server. In other words, it depends entirely on your systems and how far your administration is willing to go to automate it.


I disagree.

SSH keys seem great at first: 4-8 KiB public/private key pairs are tremendously more secure than something like a 10-character password. The math checks out at an academic level, but the implementation has a glaring flaw: one cannot easily ensure that private keys are themselves protected by a 10-character password unlock. Put another way, people using private keys NOT protected by a passphrase are extremely vulnerable to compromise. For example, physically take the laptop that contains the private key, and BOOM!

That's the gist. Private keys can be set up with passphrases, but the person in control of the private key can change their key's passphrase at any time afterwards, so straightforward key escrow strategies don't work. Inspecting the public key does not indicate whether the associated private key has any protection, and that's good insofar as one key shouldn't leak information about its counterpart.


Stuff like this is really cool. But the problem is that certain clients (as in clients that pay your bills) can't even get public keys to work and we end up allowing password logins. So I don't know how I could make this work in the real world.

That's where security ends, when 'it just has to work' because 'they're the ones paying/in charge'.


You can always host your own SSH CA pubkey server.

It is called an “AuthorizedKeysCommand” in `/etc/ssh/sshd_config`.

https://jpmens.net/2019/03/02/sshd-and-authorizedkeyscommand...
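A minimal sketch of the sshd side, where /usr/local/bin/fetch-user-keys is a hypothetical script that queries your key server and prints authorized_keys lines for the given user:

  # /etc/ssh/sshd_config
  AuthorizedKeysCommand /usr/local/bin/fetch-user-keys %u
  AuthorizedKeysCommandUser nobody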


The comparison to public-key authentication is wrong. Introducing cert authentication brings a third party: public key infrastructure (which isn't addressed at all and is described as some type of security panacea).

The more apt comparison would be to other authentication technologies, in particular Kerberos.


I love ssh certificates for access and indeed we're using this for accessing our production network using a few small pieces of home-grown infrastructure.

However, there's one big issue nobody is talking about: There is zero support for certificate authentication in any SSH clients for iOS and most people who have network access here also have iOS devices.

And even if there were support, just supporting the certificates alone is not enough - there would need to be some automatable way of getting a new certificate into an app as the whole idea of the certificates is that they are very short-lived (days or even hours if possible)


This is definitely true... and a lot of people are doing it wrong.

This is one of those things that drives me nuts as an experienced/old developer, seeing people type passwords for ssh/git/whatever several times per day. Sometimes there are tasks that require copying / checking some file on N servers, and these people seem to think that cannot be done in a shell script because the password needs to be entered interactively.

Then there's SSH port forwarding, X11 forwarding, etc., but it's amazing how many people use ssh for years without so much as glancing at the man page.


It is amazing how many people use any CLI tools for years without reading the man page.

The man page for the shell (man bash; man zsh) is a good place to start.


Yes that too... "the kids of today" seem to regard basic shell tools and scripting as a dark art rather than an everyday part of using a computer and doing development productively. It's kind of sad.


I can't speak for others but personally I've never built up enough motivation to learn shell scripts (and related hackery like awk and sed) properly, even though I've learnt maybe 5-10 programming languages quite well and use the shell interactively all the time.

I can't explain with certainty why, but I think it's due to lack of discoverability, inconsistent conventions for flags and positionals, esoteric syntax for simple control flow, lack of errors/feedback for things like undefined variables, no scoping/namespaces, unclear type system (not asking for much, strings, bools and ints would suffice). That said, piping/streaming is amazing and often better than in modern languages.

In short, it's quite different from other imperative languages - the design feels arbitrary and the learnings non-transferable, even though I know it is useful and ubiquitous.


IIRC, one could have said the same of "the kids of" twenty years ago. It still blows my mind that I didn't learn scripting at university. Could it have been because all the CS classes insisted on tcsh? My first job, I was using an internal tool and thought to myself "this could be better". My boss said "call this guy", and a brief phone call completely changed my understanding of Unix.


> these people seem to think that cannot be done in a shell script because the password needs to be entered interactively.

That's what autopw[1] is for. ;p

[1] https://github.com/jschauma/sshscan/blob/master/src/autopw


Keybase had an elegant solution I use: https://keybase.io/blog/keybase-ssh-ca


I used to like Keybase so much, it felt like a Next Big Thing. I just wish they had stuck to identity management and validation, providing SSO etc. Instead they tried to be a chat client, git host, and crypto wallet too. They spread themselves too thinly trying to compete with dozens of rivals for each function they added.

I wish they'd have been a standard that Microsoft, Apple, Google etc provided implementations of.


AFAIK, unless something has changed, SSH password authentication doesn't use any kind of PAKE or zero-knowledge proof. The client just sends the password straight to the server over the established channel, as the password box on a website login page would.

Really a missed opportunity to add perhaps even more security than the server cert can provide (because of how many users override it).

Come to think of it, password-based website logins shouldn't exist either; they're way too common, and I don't see why there's no PAKE-based HTTP basic auth feature.


Too bad you can't do that with dropbear


Which is why I installed openssh-server on my OpenWrt hosts. However, due to a bug [0] in the openssh-server package in the latest 21.02.2 release of OpenWrt, OpenSSH doesn't let you log in in failsafe mode on an OpenWrt host. Because my OpenWrt host lacked a recovery mode, it was essentially soft-bricked.

I was able to recover it using the serial port, but even after all this, the comfort of using SSH certificates on all of my nodes was enough to keep me using it instead of Dropbear.

[0]: https://github.com/openwrt/packages/issues/17833


Anyone using GitHub as a public key repository? I found it very convenient to set up a cron job that pulls https://github.com/USERNAME.keys into authorized_keys, so I can SSH from any client that has GitHub access.
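A rough sketch of such a cron entry (USERNAME is a placeholder; the temp-file dance is just to avoid wiping authorized_keys if the download fails or comes back empty):

  # crontab: refresh authorized_keys from GitHub once an hour
  0 * * * * curl -fsS https://github.com/USERNAME.keys -o /tmp/gh.keys && [ -s /tmp/gh.keys ] && mv /tmp/gh.keys $HOME/.ssh/authorized_keys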


Another headline that completely ignores the concept of threat modeling. If you're preaching about what's "right" and "wrong" regarding security without considering the threat model, you're only doing harm to your readers.


Great article. All of your internal authentication should be using certificates. Web auth, Wifi, VPN, SSH

In the late 90s we came close to having this for the public internet as well but it never caught on. We paid the price with endless breaches and unmanageable credentials.


I'm only disappointed that SSH certificates didn't leverage OpenSSL for its more powerful capabilities.

I guess that's the small price to pay for speed of development without going through the international committees of ASN.1.


dumb question but .... isn't it a problem that private ssh keys are stored in ~/.ssh and that any random app, npm dependency, build script, etc could copy them across the network?


You can password-protect them and optionally load them into an agent at logon.
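For example (assuming a default key path):

  # add or change the passphrase on an existing private key
  ssh-keygen -p -f ~/.ssh/id_ed25519

  # load it into the running agent for this session
  ssh-add ~/.ssh/id_ed25519

  # or let ssh add it lazily on first use, via ~/.ssh/config:
  #   AddKeysToAgent yes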


Anyone have experience with using SSHFP records to avoid the so-called anti pattern of trust on first use?


The biggest problem with the SSHFP RR is the trustworthiness of DNS to deliver the answer record.

Almost nothing enforces that its DNS resolver return only DNSSEC-verified answer RRs.

It's not a problem at all if you set the resolver to return only DNSSEC-verified answer RRs; then again, many common websites would then stop working simply because they don't use DNSSEC or don't have it set up properly.

Most deployments end up distributing SSH public keys under cover of TLS, IPsec, or some other variant of secure tunneling, precisely because the fingerprints are metadata worth protecting.
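For anyone curious, the moving parts look roughly like this (the host name is a placeholder, and as noted it only really helps when the resolver validates DNSSEC):

  # on the server: print SSHFP records ready to paste into the DNS zone
  ssh-keygen -r host.example.com

  # on the client: accept host keys that match a (DNSSEC-secured) SSHFP record
  ssh -o VerifyHostKeyDNS=yes user@host.example.com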


If you trust a third party you're doing ssh wrong :)


Who really wants to run a CA though?


Cool, how do I set up a CA?


All you need is an ssh key, which you can generate like this:

  ssh-keygen -f ca.key
Then you can generate certificates like this:

  # user key
  ssh-keygen -s ca.key -I key_id /path/to/user_key.pub
  # host key
  ssh-keygen -s ca.key -I key_id -h /path/to/host_key.pub
Secure ca.key according to whatever level of paranoia you desire, e.g. passphrase, hardware security module (a CA key on a PKCS#11 token is supported for signing certs), airgap the machine. Anyone who gets access to ca.key has access to everything that trusts ca.key.
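The other half, which the above leaves implicit, is telling servers and clients to trust that CA; a sketch assuming the CA public key is ca.key.pub and your hosts live under *.example.com:

  # on each server (/etc/ssh/sshd_config): trust user certs signed by the CA
  TrustedUserCAKeys /etc/ssh/ca.key.pub

  # on each client (~/.ssh/known_hosts): trust host certs signed by the CA
  @cert-authority *.example.com ssh-ed25519 AAAA... (the contents of ca.key.pub)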


Precisely this. From what I've read, it isn't that easy to set up a CA.

Look into step-ca though, I've heard it's.. Okay? I don't know. It seems too complicated still - I'd rather stick with pubkey auth


Set up HashiCorp Vault. Almost easy, but actually hard to do right. Policies are easy to make too open and possibly insecure.


> Users are exposed to key material and encouraged to reuse keys across devices. Keys are trusted permanently, so mistakes are fail-open.

What? This just sounds like you're doing it all wrong then blaming the tools.

Maybe I'm just thinking about it from my POV as a mainly hobbyist user of SSH, but I rotate my keys (not as frequently as I should, but I do), I remove old ones from authorized_keys files, and I sometimes use different keypairs for different servers too. I never copy keys to other devices: I run ssh-keygen on every device and add the public key to the authorized_keys file manually.

It isn't even "hard" to do it this way. Sure, it doesn't "scale" for big corps - but I don't need it to scale, so I'm not "doing SSH wrong" by not using certificates.


I decided that I wanted to automate my rotation, so I built a shell script to do it, then wrote about it.

If you Google "ssh key rotation" you will find my article as a "featured snippet."

I don't think that I need a CA, as I only have personal and admin keys (two sets) and I'm flipping these once a quarter. Plus, I don't have expired entries in my authorized_keys (do CA users ever clean out authorized_keys?), and these are very clean as I see them regularly.

https://www.linuxjournal.com/content/ssh-key-rotation-posix-...


It's more effort than I would like. I should not be bothered with updating hidden config files containing data that is mostly not human readable. I want to prove my identity once and let the system take care of the rest.


Well that's fair enough, but I take issue more with the title of the post telling me I'm doing "ssh wrong" because I can't be bothered to set up a CA and fuck around with certificates.

To me, setting up a CA is more effort than I would like.


It becomes feasible when you have a larger number of key pairs that are supposed to have access to the same set of machines. I did it as a private person because I'm an SSH nomad, using several clients with different key pairs each.

I agree with you; for a regular user with a single client device (or two) it's not worth it.


It's very much a use-case and risk driven decision. A company should be using Teleport, which is a lot more than just certificates (but they do use certs). For your personal VPS or GitHub account, nobody is going to go out of their way to get your SSH keys.

The biggest "you're doing it wrong" I see is people who disable host key verification because their servers' IPs change constantly. Do you want MITM?! Because this is how you get MITM! Might as well use Telnet for connections.


> brew install step

The audacity of mac users who think everyone uses a mac.


The same audacity as how-tos that say "apt-get" without any distro-specific language.

Linux desperately needs a standard for writing how-tos. Even better would be definitive how-tos for every common task a user might need to do, posted and maintained on a distro-owned site.


Right? Most folks just need dnf.


I work at smallstep and we support a wide range of Linux distros already. https://smallstep.com/docs/step-cli/installation

One of the things on my punch list is to get the step CLI into upstream Fedora, Debian, and Ubuntu to make installation a bit easier.


Bro you almost made me spit out my latte.


Brew has been available on Linux for a long time and works very very well.


Sure, but if you're not on a Mac, brew shouldn't be the first tool you reach for. You can use brew if you're transitioning from Mac and need time to get acclimated, or if the software hasn't been integrated into the local OS's package management system yet.

Having said that, `brew` appears once in passing in the entire article. The article is almost entirely about ssh without regard to platform. There's nothing "audacious" about suggesting users use brew, so the anti-Apple sniping here is entirely unwarranted.


There are other OSes than Linux and macOS; in fact, one of them is far more popular than both put together: Windows. It has had built-in OpenSSH support since 2018.


I'd argue Windows isn't terribly popular, in the sense that "popular" means people like it more than not; rather, it is used often, if grudgingly, by folks who have experienced alternatives or are unaware objectively better options exist.


I use Linux (bare metal on a System76 machine, and WSL) and Windows daily. I can't stand OSX. As in I literally want to trash it, and the device it's installed on, within 10 minutes. It's a beautiful OS, but as a window manager it's literally unusable, at least with a QWERTY keyboard. Like, what genius put cmd+q right beside cmd+a, with cmd+q not even prompting the user? I could rant for days… so I'm just going to stop.


I also use Linux -- when clients require OSX on furnished equipment I cringe a bit :)

There are orders of magnitude of UX difference between how poor Windows is versus OSX and Linux, in my mind. OSX is still worse than Linux, in part because of the inconsistency of key mappings and the lack of options to make things consistent.


And they have an open issue for producing a chocolatey package: https://github.com/smallstep/cli/issues/365


There are cases where, if you leave SSH enabled, you are doing it wrong. If you only need SSH to do a few things once in a while, you should just turn it off and only turn it on when needed. In these situations, I feel it's OK to just use a password to log in.


Honestly, I agree. My ssh passwords aren't going to be brute-forced anytime this century. It's also pretty easy to put a fake sshd on port 22 and set the real one to some other port (preferably one low enough to still require root privileges though).

I don't have encrypted drives on all my devices. I don't want to have to worry about what could happen if one of those gets lost/stolen. I'd rather not leave keys or certificates lying around.

Also, things sometimes go wrong and I need to get access to a server from a device I've never been on. It's nice to be able to do that. Passwords do that.

To be fair, I usually have a single VPS which I keep as locked down as possible that has VPN access to the server I really need. The VPS doesn't even need to be running most of the time. So I can spin it up to get access to the VPN, then ssh into the server with a password. If the VPS gets compromised, the VPN alone won't give an attacker immediate access to the server like it would if I left keys / certificates on there. I have to trust the VPS, and if it gets compromised without me noticing, and I then log in to my server, yeah, I'm SOL, but certificates don't solve that problem.


How do I turn it on if I can't SSH into the machine? In most cases, SSH is what you use to do management tasks.


Many hosting services have a control panel to do this on a VPS. Some hardware, such as Synology devices, also has a web interface for it.


Ugh. Go read up on ACME (Let's Encrypt). Unless you run your own certificate authority root, or configure things very carefully, using TLS certificates grants host-level access to your DNS provider and to every organization that reliably routes external traffic to your host.

To what end? The threat model rekeying tries to protect against involves compromised authenticated client machines.

Once an attacker has those, they have a shell on the server, and it's game over. There are these things called "advanced persistent threats" that have been in the news a lot already.


This article has nothing to do with Let's Encrypt, ACME, TLS, or DNS.


I'm tempted to write a competing article: "If you're using SSH in 2022, you're doing it wrong"

There's no need to open port 22 if tools like AWS Session Manager (and GCP's equivalent) are available to you.


But even if you do it through that, SSH is a much nicer protocol than typing on a remote console. You get file transfer, X11, agent, and port forwarding, terminal window resizing, and much lower bandwidth usage.

Also, not all servers are on cloud platforms.


Is it possible to run Zmodem over this AWS pseudo-console the way you can over SSH?

Not all servers are on cloud platforms, but there are somewhat comparable ways to shunt a serial or out-of-band management console over the network.


Not sure, I haven't used AWS much. Most of the web consoles I have used are like VNC. So no, not really possible to run anything like Zmodem over.

Not that I'd want to either, of course. Bringing a 1988 solution back to fix a conceptual 2022 problem does not sound like a great fix :)

I understand that in some workload types you want to have full autodeployment on servers, using ansible, kerberos, whatever. In that case interactive login is never needed.

But this is a very specific subset of 'servers' in my opinion. A lot of HN contributors work in this so this approach may work for them but it won't everywhere.


Don't forget about the no-ops "If you're opening a shell on any server, you're doing it wrong (2022)" method.


What's next? "If you're using a computer, you're doing it wrong"?


"If you have data, code, or processor execution, you're doing it wrong."


That's really not that far off from what some people think. "You should never touch your own data, that is what the cloud is for."


Now this is where I want to go! I mean using technology in 2022 feels so outdated. Don't we have cloud to do everything for us.


There are actually people working on things other than CRUD web apps!


And can I access my university's server with that? My office computer from home?


I think you're being downvoted by the same nerds obsessed with self-hosting. "But how can I self-host without ssh?!"



