Since the author is on here - reckon you could beat up on simple random tokens a _little_ bit more? In particular how hard they are to identify, which makes leaks hard to catch (easily fixed by adding a prefix).
I work on secret scanning at GitHub. When token issuers use easily identifiable formats for their tokens we can easily spot them when they're accidentally committed. We can then work with the token issuer to automatically alert them of those leaks. A good example is AWS - if you commit an AWS key and secret to a public GitHub repo we will tell AWS about it and they will tell you about it (and quarantine the exposed keys) within a few seconds. We work with dozens of other token issuers too, though - some of the latest we added were Linear, PlanetScale and Ionic.
The above relies on tokens being identifiable - we can't send hundreds of partners everything that looks like 32 hex chars. In future we want to be able to do even more sophisticated things, like ask users for confirmation before they push code that contains secrets. We recently changed our own token pattern for that reason.
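To make the identifiability point concrete, here's a rough sketch of what pattern-based matching looks like. The patterns below are illustrative approximations of publicly documented prefixes (AKIA, SG., ghp_), not our actual scanning rules:

```python
import re

# Illustrative patterns only. Identifiable prefixes can be matched with very
# few false positives; a bare 32-hex-char string cannot.
PARTNER_PATTERNS = {
    "aws":      re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "sendgrid": re.compile(r"\bSG\.[A-Za-z0-9_\-.]{20,}\b"),
    "github":   re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
}

def scan_blob(text: str):
    """Yield (partner, matched string) for anything that looks like a known token."""
    for partner, pattern in PARTNER_PATTERNS.items():
        for match in pattern.finditer(text):
            yield partner, match.group(0)
```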
I bring this up every time this is mentioned, but I really wish the API token format included a domain to notify in case of leaks. Having services register with your in-house secret scanning system works very well if you're GitHub, but otherwise it's a very closed mechanism.
If sendgrid tokens were `secret:sendgrid.com/91on9SIkbUfSs` instead of `SG.91on9SIkbUfSs`, or Amazon keys looked like `amazon.com/JGUIERHT` instead of `AKIAJGUIERHT`, we wouldn't need a database of regexes and endpoints to report secret leaks.
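Something like this is all a scanner would need - a rough sketch, where the `secret:` prefix and the reporting path are just made up for illustration, not any existing convention:

```python
import re

# Hypothetical format from the comment above: secret:<domain>/<random-part>
GENERIC_TOKEN = re.compile(r"\bsecret:([a-z0-9.-]+\.[a-z]{2,})/([A-Za-z0-9_\-]+)\b")

def reporting_endpoint(text: str):
    """Return (domain, hypothetical report URL) for a domain-prefixed token, or None."""
    m = GENERIC_TOKEN.search(text)
    if m is None:
        return None
    domain = m.group(1)
    # Where a leak report *might* go -- just one possible convention.
    return domain, f"https://{domain}/.well-known/report-leaked-secrets"
```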
Appreciate your passion here Remi. I don't think a full standardisation of API token formats is ever likely, but I do think there's value in nudging things in that direction.
One big challenge is that it's hard to get service providers to change their token formats. Very few have this at the top of their priority list - they're busy with other things. Here's an example playing out in OSS that is pretty typical: I tried to persuade the (excellent) team at Sentry to update their format, and they essentially told me "we have other priorities" https://github.com/getsentry/sentry/pull/26313. And that's a relatively simple change, not the adoption of a whole standard.
In addition, as Thomas points out in this article, there are a lot of different token types for someone thinking about minting API tokens to choose between. They might rationally have different preferences over them. A standard that is prescriptive of format and approach is likely to struggle given that diversity.
With that said, I do see an opportunity here for a more modest standard targeted at service providers that already use JWTs or Macaroons. Generic tokens of those types are relatively easy for scanning providers to identify, and it's easy (and hopefully uncontroversial) for service providers to encode more information in them, like an "if found" link. I think a standard that defines the attribute name there, and the API for reporting / responding, would be a good start that might see adoption.
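As a sketch (the claim name here is made up, not part of any standard), a provider already issuing JWTs could embed the reporting link directly in the token:

```python
import jwt  # PyJWT, used here just to illustrate the idea

claims = {
    "sub": "api-client-1234",
    "iss": "https://api.example.com",
    # Hypothetical claim: where a scanner should report this token if found in public.
    "if_found": "https://api.example.com/report-leaked-token",
}
token = jwt.encode(claims, "server-side-secret", algorithm="HS256")

# A scanner that spots something shaped like a JWT can read the claim without
# verifying the signature -- it only needs the reporting URL, not trust in the token.
unverified = jwt.decode(token, options={"verify_signature": False})
print(unverified["if_found"])
```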
Importantly I am not proposing a big change at all: the tokens can stay exactly the same (in the database and crypto code), you can still use UUIDs or Macaroons or JWTs, you only change the frontend to add this prefix. Apologies if this wasn't clear in the two examples I posted without explanations. The benefits would also be a bit higher than the PR you reference, which seems to help with scanning on GitHub (you mention that it would already work without the change).
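To be concrete, the change is roughly this (the prefix value is hypothetical; everything behind it stays untouched):

```python
PREFIX = "secret:example.com/"  # hypothetical, per the examples above

def display_token(raw: str) -> str:
    """What the user sees and copies: the raw token unchanged, prefix added on the way out."""
    return PREFIX + raw

def parse_token(presented: str) -> str:
    """What the API strips off before doing exactly what it did before."""
    return presented.removeprefix(PREFIX)
```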
As you note in your PR, many tokens are already identifiable, so standardizing a way to put the reporting domain in there shouldn't reduce security (by obscurity).
Taken a step further, the secret should just be a URL that revokes itself - like https://revoke.sendgrid.com/91on9SIkbUfSs. GitHub should then just make a GET request to every URL (abuse can be whittled down a bit by requiring https://revoke.sendgrid.com/robots.txt to have a `Revoke: YES` section). They and anyone else could maintain an allowlist of revocation URLs to pattern match as well. This makes a global registry unnecessary, and standardises the act of revocation.
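Roughly like this - the `Revoke: YES` marker and URL shape are just the convention I'm proposing here, nothing standard:

```python
import requests
from urllib.parse import urlparse

def try_revoke(secret_url: str) -> bool:
    """Hit a self-revoking secret URL, but only if the host opts in via robots.txt."""
    host = urlparse(secret_url).netloc
    robots = requests.get(f"https://{host}/robots.txt", timeout=5)
    # Opt-in marker proposed above -- not a real robots.txt directive.
    if robots.ok and "Revoke: YES" in robots.text:
        return requests.get(secret_url, timeout=5).ok
    return False

# e.g. try_revoke("https://revoke.sendgrid.com/91on9SIkbUfSs")
```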
"Woops. I hit ctrl+enter instead of ctrl+c while copying my secret. Guess production's down for a bit while we roll new ones!"
I mean your core idea is decent but that's just really funny.
There's some amount of practicality being lost if your secrets start growing massively. There are also potential restrictions on what you can put in them, and a prefix with an underscore or colon might be easier than something that has slashes in it.
Your idea probably belongs as a queryable DNS record on the domain in question. Or a standard subdomain, or even a .well-known path.
The alternative (as in, current reality) is "Whoops. I hit ctrl+enter while copying my secret and no-one noticed for a month. Guess all our data is leaked now!"
It's also up to each provider what actually happens when the "revoke" action is triggered. Maybe they just warn you immediately, which is still better than nothing.
What I had in mind is posting them to <domain>/.well-known/report-leaked-secrets or a location looked up from the domain using DNS. Making them URLs is an interesting idea, but they are likely to look awkward (e.g. include "revoke" like your example) and get a lot of non-revocation traffic (even if we have a way to tell scanning apart from actual revocation requests, we'd probably rather only get the revocation traffic).
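Concretely, something like this - the path, payload shape, and DNS record name are all hypothetical, and the DNS option uses dnspython just as one way to do the lookup:

```python
import requests
import dns.resolver  # dnspython, for the DNS-based discovery variant

def report_leak(domain: str, token: str, where_found: str):
    """POST a leak report to the domain's hypothetical well-known endpoint."""
    url = f"https://{domain}/.well-known/report-leaked-secrets"
    requests.post(url, json={"token": token, "found_at": where_found}, timeout=5)

def reporting_endpoint_from_dns(domain: str):
    """Alternative: discover the reporting endpoint from a (hypothetical) TXT record."""
    for record in dns.resolver.resolve(f"_report-leaks.{domain}", "TXT"):
        return record.to_text().strip('"')
    return None
```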
My professional email also mangles URLs, turning them into urldefense.proofpoint.com/... Such solutions are sure to interfere with tokens looking like URLs.
Maybe don't let anyone delete other people's tokens, even if leaked, but automatically alerting the admin if anyone accesses the "URL" would probably be a good option.
It would be fine, in that repeatedly hitting the URL won't have any effect other than disabling the token the first time.
But yeah, auto-link followers will invalidate them immediately. There's a case to be made for that being a good thing, but don't want to get into that.
I agree with you. Someone brought this up on Twitter and I'm kicking myself for not remembering to include the notion of adding identifiable markings to sensitive tokens (I'd do it now, but I'd feel like I was plagiarizing).
And it's a noodly and somewhat incoherent notion of "safety" I'm using here, because of course, random tokens are unconstrained bearer tokens --- authenticated requests, CATs, Macaroons, and Biscuits all address that weakness. I'm biased by my concern over cryptographic implementation mistakes.
It's a neat property of Macaroons (and maybe Biscuits) that you can come up with sane configurations where checking a Macaroon into source code can be, if not totally safe, at least not a major incident. I wish I'd thought of that, too, since I think "checking the token into source control" is a more vivid example than "emailing tokens" or "passing them around".
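As a sketch (using pymacaroons; the caveat language is whatever your verifier understands, nothing standard), an attenuated Macaroon that only allows read access to one resource for a short window is a much less exciting thing to find in a repo:

```python
from pymacaroons import Macaroon, Verifier

root_key = "server-side-root-key"  # never leaves the service

# Broad token minted by the service...
m = Macaroon(location="api.example.com", identifier="token-42", key=root_key)

# ...attenuated by the holder before it goes anywhere risky (CI config, a script, etc.).
m.add_first_party_caveat("action = read")
m.add_first_party_caveat("resource = reports/2021")
m.add_first_party_caveat("expires < 2021-05-01T00:00:00Z")

# The service checks every caveat is satisfied before honoring the token.
v = Verifier()
v.satisfy_exact("action = read")
v.satisfy_exact("resource = reports/2021")
v.satisfy_general(lambda caveat: caveat.startswith("expires < "))  # a real check would compare times
assert v.verify(m, root_key)
```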
Surely you needn't feel like you're plagiarizing - just give credit. The credit won't be any less deserved next week or next year. Even if you "know" you'd definitely have fixed this without prompting, nobody else knows that, so it looks like you avoided giving credit where it was due. So, don't do that.
Nobody's asking for royalties, so "Shout out to @SomeoneOnTwitter for reminding me" is enough.
tbh I actually do think this is an error. I worked somewhere where we didn't do this and I think of it as a bug rather than a missing feature. It makes it very difficult to identify a leaked token without checking it against our database!
Macaroons are great. I was using them successfully in a (now dead) product. Very easy to reason about once you get the basics. Unfortunately the ecosystem is small and the effort spent getting colleagues on board (i.e. convincing them you're not using some fringe thing) was substantial and ongoing.
I think part of the problem of Macaroons is the belief that there should be an ecosystem of them, and a standard, and standard libraries. They may work best when they're custom-tailored to the applications that really want them.
> In future we want to be able to do even more sophisticated things, like ask users for confirmation before they push code that contains secrets
If you all have thought about it, do you imagine you'd only warn in the presence of some generic token identifier, like `secret-token` a la https://datatracker.ietf.org/doc/html/rfc8959 ? Or, would you be able to warn on everything that matches the regular expressions your partners give you to identify their API tokens?
The latter. Our objective for secret scanning is to prevent as many serious secret leaks as possible. Where a service already has a token format that is highly identifiable we want to take advantage of that, rather than rely on the adoption of generic token identifiers.
GitHub secret scanning program: https://docs.github.com/en/developers/overview/secret-scanni...
GitHub's updated token format: https://github.blog/2021-04-05-behind-githubs-new-authentica...