I would like to cover basic username/password auth, OAuth and Active Directory, security keys and everything in between. I would like to do this in a linear fashion, i.e. like a Coursera course with practice problems.
What made me understand these things the most was setting this up just for myself.
For example host your own instance of Zitadel, Authentik or whatever you find most appealing.
Tinker around with it a bit. Then use that instance to authenticate yourself somewhere, e.g. another service where you can set up your own OAuth provider. Take a look at the API requests, and take a look at the code of some OAuth implementation, for example in projects like Gitea or Nextcloud.
May not be it for everyone, though I really like learning by doing.
I'd highly recommend this approach as well. What started as a fascination for me turned into owning a suite of authentication capabilities as a technical PM for one of the SaaS vendors.
The problem with auth is that in practice, it's a lot of messy implementations of insufficient specifications requiring customizations/extensions. Reading a good explainer for, say, OAuth is a great thing to do, but it won't be sufficient because of the myriad weird quirks across various apps' and vendors' implementations of Auth <Approach>. Dealing with those quirks helped me more deeply understand the underlying concepts and the intent of the spec.
The messiness of the space really means that to be successful, developing some deeper intuitions about how everything fits together can be extremely helpful. And to do this, I think that just playing with auth - a lot - is one of the best ways to develop these intuitions.
I personally liked to spin up dirt simple Express.js apps, and then I'd wire up a sparse UI to the auth endpoints for some vendor, doing a barebones implementation of each thing needed to satisfy the auth protocol used. This really cemented the concepts for me and gave me a library of code that I could easily copy and tweak to experiment with something else.
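As a rough idea of what such a barebones implementation involves, here is a minimal sketch of the first leg of the OAuth authorization code flow. It is written in Python rather than Express.js, and the endpoint, client ID and redirect URI are hypothetical placeholders for whatever your own test identity provider gives you.

```python
import hmac
import secrets
from typing import Tuple
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical values: replace with what your own identity provider issues.
AUTHORIZE_ENDPOINT = "https://idp.example.com/oauth2/authorize"
CLIENT_ID = "my-test-client"
REDIRECT_URI = "http://localhost:3000/callback"


def build_authorization_url(scope: str = "openid profile") -> Tuple[str, str]:
    """Return (url, state) for sending the user's browser to the provider."""
    state = secrets.token_urlsafe(16)  # random value tying the callback to this request
    params = {
        "response_type": "code",
        "client_id": CLIENT_ID,
        "redirect_uri": REDIRECT_URI,
        "scope": scope,
        "state": state,
    }
    return f"{AUTHORIZE_ENDPOINT}?{urlencode(params)}", state


def extract_code(callback_url: str, expected_state: str) -> str:
    """Pull the authorization code out of the redirect, rejecting a mismatched state."""
    query = parse_qs(urlparse(callback_url).query)
    returned_state = query.get("state", [""])[0]
    if not hmac.compare_digest(returned_state.encode(), expected_state.encode()):
        raise ValueError("state mismatch: possible CSRF or stale request")
    return query["code"][0]
```

The code returned by `extract_code` would then be exchanged for tokens at the provider's token endpoint, which is a good next thing to wire up and watch in the network tab.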
I also read a lot of blog posts, watched a lot of YouTube videos, and went to an Auth conference or two.
That’s how I’ve learned about authentication, and most other things too. I do worry, sometimes, that I’ve missed a big detail that opens my server to a vulnerability of some kind. That’s where education can help.
As someone interested in upskilling, can you perhaps expand on what you mean by "tinker"? I hear a lot of devs talking about "learning by doing" but I'm trapped in tutorial hell.
When you want to learn something, you probably have some kind of goal in mind. Let's use this as an example: you want to learn about Auth, because you need to integrate OAuth in an application.
Sure, you could just read a tutorial that accomplishes exactly your task, and you are "done". The tutorial will probably tell you to call this API endpoint, use this library, do this and that. But now you only know how to implement OAuth in your application, which may be fine, or enough for you.
But by tinkering I mean that you don't just learn/read about the requirements for your task, but go a bit 'beyond'. For example, you set up an Identity Provider (e.g. Authentik, Zitadel, Keycloak, ...), which is 'out of scope' for your task, but now you are on the other side of the API endpoint. You can play around with the settings and see what changes in the OAuth integration in your application. Treat it like a sandbox: it's not production, so you can 'build' anything you like. You now have power over both sides of the application, which you would never have at work, where you are only tasked with integrating that API endpoint into your application and never with deploying that API, as this is 'not your job'. You'll maybe spot one or two things that are good to know for your integration and that you would never have seen in a tutorial. You'll also pick up some terms that will later pop up while debugging your integration.
Your goal may be something completely different from my example, but what I'm trying to say is: if you want to learn something because you need/want to do $X, don't only learn to do $X but also take a peek at what happens end-to-end. Be curious, and maybe question why someone in a tutorial says do $A and doesn't even mention $B. Otherwise you'll never know that $B exists, even if it may be the better solution to do $X in your case.
Just because you want to get a project you're working on done doesn't mean that a tutorial that gets things working will be the be-all and end-all of your learning of auth. You can always come back after getting the project done and working, and play with auth in ways beyond what the tutorial taught.
I can second Hackmanit from my own experience. We had them come on site some years ago to train our team. This was incredibly helpful as they also focused on what we use in house in a separate chapter.
This is practical, but awful advice. Auth (z or n) has been very badly over-engineered. You don't need anything more than HTTP basic auth; the rest is just people with too much time on their hands. OAuth particularly is a travesty that its authors should be ashamed of.
OAuth 2.0 took the best features of what was already being deployed by Google, Microsoft, Yahoo, etc. and added in scopes and refresh tokens. The objective was to standardize how to delegate authorization so that developers did not have to learn slightly different ways of doing effectively the same thing.
Typing your username and password into a 3P website so it could crawl your contacts was a horrible anti-pattern.
I work on the cloud security team for a Fortune 500 company. They won’t even consider a third party service that doesn’t provide an enterprise SSO/SAML integration with our auth provider. I suspect this is the more common approach for enterprise level companies given that at 40k+ employees it’s just not possible to manage employee auth across hundreds of services.
No. They used OAuth. I wrote their entire OAuth system. And it was a nightmare reading through OAuth/OIDC specs for something that could be handled trivially with HTTP basic auth.
I think the most important place to start is appreciating the distinction between authentication ("is the person trying to use my application really the person they say they are?", abbreviated "authn") and authorization ("is this person allowed to perform the action they're trying to perform?", abbreviated "authz").
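To make the distinction concrete, here is a minimal sketch in Python. The session table, permission table and function names are all made up for illustration; the point is only that the two questions are answered by separate steps and fail with different status codes.

```python
from typing import Optional

# Hypothetical in-memory data, for illustration only.
SESSIONS = {"token-abc": "alice"}          # session token -> user name
PERMISSIONS = {"alice": {"read_report"}}   # user name -> allowed actions


def authenticate(token: str) -> Optional[str]:
    """Authn: who is making this request? Returns a user name or None."""
    return SESSIONS.get(token)


def authorize(user: str, action: str) -> bool:
    """Authz: is this already-identified user allowed to perform the action?"""
    return action in PERMISSIONS.get(user, set())


def handle(token: str, action: str) -> int:
    """Return an HTTP-style status code for the request."""
    user = authenticate(token)
    if user is None:
        return 401  # we do not know who this is
    if not authorize(user, action):
        return 403  # we know who this is, but they may not do this
    return 200
```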
Most of the comments on this page are referring to authentication. It's important to know, but also the piece you're likely to spend far less time on. It's where most of the heavy lifting will be done by some vendor or tool you set up instead of by your own code.
Authorization is far less likely to be something you get off the shelf and far more likely to be where you spend significant time. It can be very intimately connected to your business logic. Active Directory roles and groups are one authorization solution for a particular class of problems but I have only seen them used for controlling business internal assets (mostly file servers); not public-facing applications.
I really like Oso Academy as a resource for authorization topics. It's structured like a progressive course, though I don't know if they have the kind of exercises you mentioned.
It’s true that authZ requires a lot of customization. In my experience, though, authN is the harder one to implement when there is no existing infra to support it. How do we store and distribute credentials? How do we allow user-defined identities? How do we implement session keys? How do we scale out the authN layer? If we decide to use certs for authN, how do we manage certs lifecycle? The list is long.
> Authorization is far less likely to be something you get off the shelf and far more likely to be where you spend significant time
Agreed. It is business logic, which means that it is harder to do off the shelf.
That said, there are some startups trying to make this work. Here are the ones I'm aware of:
* permit.io
* cerbos.dev
* osohq.com
RBAC (role based access controls) can take you a long way for many applications, but at some point you will be more interested in ABAC (attribute based access control) or PBAC (policy based access control).
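A small sketch of the difference, assuming a made-up document system: the RBAC check only looks at the user's role, while the ABAC check also looks at attributes of the user and the resource. The roles, attributes and rules below are hypothetical.

```python
from dataclasses import dataclass

# RBAC: permissions hang off roles, and nothing else is consulted.
ROLE_PERMISSIONS = {
    "viewer": {"document:read"},
    "editor": {"document:read", "document:write"},
}


@dataclass
class User:
    name: str
    role: str
    department: str


@dataclass
class Document:
    owner: str
    department: str
    archived: bool = False


def rbac_allows(user: User, permission: str) -> bool:
    """Role-based: the answer depends only on the user's role."""
    return permission in ROLE_PERMISSIONS.get(user.role, set())


def abac_allows(user: User, doc: Document, action: str) -> bool:
    """Attribute-based: the answer depends on attributes of user and resource."""
    if action == "read":
        return user.department == doc.department
    if action == "write":
        return user.name == doc.owner and not doc.archived
    return False
```

The RBAC version cannot express "editors may only write their own, non-archived documents" without inventing a role per document, which is roughly the point at which people start looking at ABAC or policy engines.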
Just to add another AuthZ approach to your great comment:
ReBAC for Fine Grained Authorization (FGA) is also something that's becoming more common at the moment. Google released their Zanzibar whitepaper explaining how they implement FGA for things like YouTube and Drive, and it has led to a lot of new tooling based upon it.
I'm working on a project at the moment with quite complex document management with various levels of access. Auth0 open sourced their FGA implementation recently as OpenFGA which looks ideal for our use case. As it's all fairly new there isn't much info out there about different ways of implementing it so we're kind of figuring it out as we go.
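For readers who have not seen the model, here is a toy sketch of relationship-based checks in Python. It is only in the spirit of Zanzibar-style tuples, not how OpenFGA or any real engine is implemented; the object names and the two inheritance rules (editors can view, viewers of a parent folder can view its documents) are assumptions made up for this example.

```python
# Relationship tuples: (object, relation, subject).
# A subject is either a user or a "userset" such as "group:accounting#member".
TUPLES = {
    ("group:accounting", "member", "user:alice"),
    ("folder:finance", "viewer", "group:accounting#member"),
    ("doc:budget", "parent", "folder:finance"),
    ("doc:budget", "editor", "user:bob"),
}


def check(obj, relation, user, _seen=None):
    """Return True if `user` has `relation` on `obj`, directly or indirectly."""
    seen = set() if _seen is None else _seen
    key = (obj, relation, user)
    if key in seen:  # guards against cycles in the tuple graph
        return False
    seen.add(key)

    for o, r, subject in TUPLES:
        if (o, r) != (obj, relation):
            continue
        if subject == user:  # direct relationship
            return True
        if "#" in subject:  # userset: follow it, e.g. members of a group
            group, group_relation = subject.split("#", 1)
            if check(group, group_relation, user, seen):
                return True

    if relation == "viewer":
        # Assumed rule 1: editors of an object can also view it.
        if check(obj, "editor", user, seen):
            return True
        # Assumed rule 2: viewers of a parent object can view its children.
        for o, r, parent in TUPLES:
            if (o, r) == (obj, "parent") and check(parent, "viewer", user, seen):
                return True
    return False
```

Here `check("doc:budget", "viewer", "user:alice")` succeeds only because alice is a member of a group that can view the parent folder, which is the kind of indirect access that is awkward to express with plain roles.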
This is the thing about the "OAuth isn't about authentication" argument... there is quite a bit of overlap between RBAC and authorization. And that in itself is quite confusing.
What most annoys me is that OAuth is also very much about authentication, specifically outsourcing your authentication to a third party. It's not like OAuth has nothing to do with authentication, which is the knee jerk response you get from people when they attempt to simplify an explanation about what OAuth does and doesn't do.
One thing that might be interesting is how SASL has evolved over the years. Most of it is specified in RFCs, which are well-written, short and open specifications. This gives you one larger thing to learn. It should be rather linear if you sort by RFC number.
It would lead well into advanced user/password schemes.
The problem is that even an advanced mechanism like SCRAM-based authentication with additional 2FA is rather simple to grasp and implement, but really hard to get right and secure.
A lot of the evolution is rather an evolution of attacks and issues, leading to new schemes. OWASP is thus pretty relevant, too.
I'm also interested in this, but specifically something that covers authentication between services and in particular situations where a user authenticates against service a and now service a needs to ask service b to do something on behalf of the user. Not just a handwavy "use OAuth" but more concrete and thorough.
To be honest, my first instinct was to give a hand-wavy "use OAuth". But to elaborate a bit further, OAuth is made for this and is the industry standard for this kind of thing. There's lots out there on OAuth that tells you more than "just use it"; it's just a couple of searches away. So I don't think I really understand the question.
I’ve learned a lot about these things by working on a project using Ory Kratos. The documentation is a bit patchy but it’s open source so you can dive into the gritty details of how a fairly large id provider implements the various aspects of OAuth and so on.
(One nice thing about Azure Active Directory is that it supports OAuth2 integrations so if you understand and can implement OAuth2 then you can also implement AD).
I know it’s not a linear learning answer but hope it helps you perhaps later. Good luck!
Another vote for Ory setup. It forces you a bit to go through it on your own, but you also learn a lot. There's Kratos for authentication, as well as Hydra for federated authentication. Then, there's also Keto for authorization which is really nice and more flexible than Keycloak in certain things, and ultimately there's also Oathkeeper if you want things to be transparent to application developers.
for example if you need roles defined per client and you have a lot of clients. In Keycloak that would be realm(s) and it's not really designed to have a lot of those. In Keto world it's all a function of queries, which admittedly can also be a bit of a challenge if you need to cache it and be part of a fast response chain (like high speed API).
I found Nate Barbettini's video on OAuth and OpenID Connect incredibly insightful for understanding these topics. He explains everything so well: https://youtu.be/996OiexHze0
Additionally, I'm part of the ZITADEL team, an open-source project that's free to download or use in our cloud offering. So, you can always tinker around with it as some others have already suggested. Our blog dives into various security topics, ranging from OAuth, OpenID Connect, and Single Sign-On, Authentication, Federation to emerging issues like Passkeys. We also discuss real-world Identity Management problems and solutions seen by ZITADEL users— https://zitadel.com/blog.
For any specific security-related queries, feel free to join the conversation on our Discord chat: https://zitadel.com/chat. We're always discussing and sharing insights on these topics.
I recently had to learn OIDC which is the standard for auth that most people really mean when they say OAuth now, I think. I learned by implementing (using Keycloak) and most importantly by reading the OIDC specs. It may seem intimidating, but the real core of it is not that large.
It's a topic I'd be interested in writing more about, and I'm happy to start here if you would find it useful.
As mentioned elsewhere, I'd probably start with OAuth2.1 (not quite a standard but well on its way) as this updates the OAuth2 standard, as well as consolidates lots of improvements.
OAuth 2.1 has no new features. It is OAuth 2.0 rolled up with all the specs since 2.0. It is the better place to start for learning about delegated authorization.
I wonder. Do open source OAuth servers actually implement all of 2.0 these days? Do clients? What do they do for the bits the spec leaves... unspecified? My memory isn't the best, but I remember ten or so years ago, when the spec was fresh, that so-called off-the-shelf servers at the time didn't actually implement anything of value, so I had to write my own barebones version. I remember thinking the 1.x spec was actually better, but it didn't matter anyway because every real app would just write code targeting whatever it was that social media companies were doing and calling OAuth. (One notable thing was not ever presenting the user with an HTTP Basic experience, and everyone is still addicted to JSON vs. form-encoded body parameters.)
Fair! I considered "OAuth 2.0 rolled up with all the specs since 2.0" an update, but you are correct. They specifically didn't want to set out any new features in OAuth 2.1.
From the spec:
"This Standards Track specification consolidates the information in all of these documents and removes features that have been found to be insecure..."
A bit sales-y, but Oso has a pretty nice overview of the problems that led to their product and how they reason about AuthN/AuthZ: https://www.osohq.com/academy
It's more focused on application level architecture rather than the whole domain of AuthN/AuthZ, but I've found it's a decent reference for folks unfamiliar with a lot of the common issues one encounters in implementation.
Alternatively you can go down the OIDC or SAML paths (generally the path of a developer)
While I've worked with Keycloak, I've always found [Curity's resources](https://curity.io/resources/openid-connect/) very good for understanding OIDC core and extensions.
The order in which you learn these things isn't really important but they're both important (but really depends on your problem domain)
I don't know of any, but here are the resources I've found useful as I've worked in the space (disclosure: I work for an auth vendor, FusionAuth).
* Solving Identity Management In Modern Applications is a great book offering an overview of the entire identity process, including provisioning (adding users), authentication and more. I read and reference the 2019 edition; don't have the 2023 edition but expect it is just as good: https://link.springer.com/book/10.1007/978-1-4842-8261-8
* OAuth 2 in Action walks you through building an OAuth2 server from scratch (in JavaScript). You'll learn about the fundamentals of tokens, clients, registration, and more. Very accessible. https://www.manning.com/books/oauth-2-in-action
* Security Engineering (Ross Anderson) is great for foundational security knowledge, like 'What does a hash look like, and what makes a good hashing algorithm' as well as a lot of broader security topics: https://www.cl.cam.ac.uk/~rja14/book.html
* The Identity Unlocked podcast with Vittorio Bertocci (RIP). This is not about the basics at all, but is a deeper dive into the dev focused side of authentication, and will give you great pointers for more reading: https://identityunlocked.auth0.com/
* I have a substack where I talk about aspects of customer identity and access management that I think is pretty good :) : https://ciamweekly.substack.com/
I think this would be a great linkedin learning, udacity or coursera course, but didn't see anything when I searched there. I've put together courses before and it's a ton of work, but hmmm, maybe it'd be fun to do for this topic.
Edit: corrected spelling of Vittorio Bertocci's name.
One thing I did early on, that I would highly recommend, is picking up a Security+ study guide book and reading it. I recommend a digital copy, since it's easier to ignore the fact that the book is quite large. Even if you never do the certification (I haven't), the Security+ curriculum gives a really nice broad overview of a ton of the concepts involved and how they're used practically. From there, as a few others have mentioned, it's hard to beat reading some of the specs for OAuth2, OIDC, SAML, etc., to understand how the primitives are woven together and what the different terms mean.
Perhaps the higher level architecture reference guides can provide a good overview of all these items? Written from a GCP perspective, but nevertheless the concepts can be shared cross cloud:
I made the simplest Rust server I could to learn the basic workflow of OAuth2. It gets the user's Gmail address after you log in with Google. I also included some instructions on how to set up the Google account. Feel free to check it out!
https://github.com/alexgf0/oauth
I'm currently in the boat where I need to set up authentication (and eventually authorization) for a startup catering to big enterprises almost exclusively. I'd love to be recommended resources for setting up something like Keycloak or Auth0 (or anything else) for that use case.
Huh, I always forget a lot of programmers weren't around when this stuff was invented. It's all actually pretty simple, and very little complexity. However, there are so many "gotchas" (that can result in zero security) that anyone writing a guide like this would probably have you sign a waiver, then any company you work for sign a waiver, and include your firstborn child.
For example, user/pass is pretty simple on the surface:
1. app sends server user/password.
2. check if it matches the password in the database.
3. if so, respond with a token the app can send back that is associated with the user. if not, return with a 401.
The number of gotchas in this simple 3-step process is insane... here's some off the top of my head (not exhaustive):
- make sure the login form includes a CSRF token.
- do not store the password in plaintext in the db. or encrypted, probably. Since an attacker can possibly get the encryption key and then decrypt all your passwords. Use strong, slow hashes.
- rate limit your logins to prevent brute-forcing (slow hashes work great here)
- use constant-time comparisons to check if the password matches (e.g., hash_equals() in PHP), RTFM for whatever constant time check you are using or you will open yourself up to timing attacks.
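The three steps and a couple of the gotchas above could be sketched roughly like this in Python, using only the standard library. This is an illustration under stated assumptions, not a vetted implementation: the in-memory dictionaries stand in for a database, PBKDF2 with 600,000 iterations is just one reasonable slow hash and cost, and CSRF protection and rate limiting are deliberately left out.

```python
import hashlib
import hmac
import secrets
from typing import Optional

ITERATIONS = 600_000  # assumed work factor; tune for your own hardware

USERS = {}   # username -> (salt, password_hash); stands in for the database
TOKENS = {}  # session token -> username


def hash_password(password: str, salt: bytes) -> bytes:
    """Slow, salted hash so a leaked table is expensive to brute-force."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)


def register(username: str, password: str) -> None:
    salt = secrets.token_bytes(16)  # fresh random salt per account
    USERS[username] = (salt, hash_password(password, salt))


def login(username: str, password: str) -> Optional[str]:
    """Return a session token on success, or None (the caller responds 401)."""
    record = USERS.get(username)
    if record is None:
        # Hash anyway so unknown users take about as long as known ones.
        hash_password(password, b"\x00" * 16)
        return None
    salt, expected = record
    # Constant-time comparison, the Python counterpart of PHP's hash_equals().
    if not hmac.compare_digest(hash_password(password, salt), expected):
        return None
    token = secrets.token_urlsafe(32)  # unguessable token tied to the user
    TOKENS[token] = username
    return token
```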
That's the issue with security stuff, there are so many gotchas that anyone writing a course would open themselves up to getting sued (at least in the US) just for missing a gotcha or someone with Dunning-Kruger thinking they know everything and getting hacked ... it's too risky. You have to just get into the industry and learn it the hard way. At least that's how I learned everything I learned.
Actually, I think we're doing a huge disservice to our profession as programmers when we call stuff like this "an insane number of gotchas". This is no critique of you or your post specifically, mind you, and I know where you're coming from. But it's a critique of a general tendency among programmers to call anything that requires a bit of knowledge and thought beyond the simplest surface level solution "complex" or "insane to implement on your own". It's not. While I know that your list of gotchas isn't exhaustive, the real list is not so much longer that it's not perfectly reasonable to expect someone to be able to implement it correctly.
I say that as someone who was on the "receiving end" of this kind of advice for years btw. I always thought that the things that are "better left to libraries" are really arcane and impossible to understand, which only led to confusion and an inability to truly assess options. And it's really just a matter of semantics and framing. It would be perfectly reasonable to say "it's not complex as long as you keep this reasonably long list of gotchas in mind".
I don't know why you're getting downvoted (I have no idea why people are on HN if they think this is just Reddit. If you downvote, say why and start a discussion), but you're right. My intent wasn't to imply "just don't do it" or "leave it to libraries." I was trying to say why you can't really find a guide like the post is asking for (at least for free!) and it likely has a lot to do with liability and things like that.
I was trying to say exactly what you are saying, and that is just get in there and learn. It isn't that complex to implement this stuff yourself if you need to. I've implemented this stuff myself dozens of times over the years... but I try to use a library before implementing it myself. Interestingly, over the years, I've reviewed libraries and found bugs in them. So, do read the code of the library you're using. Once you've reviewed a few of them (and implemented it yourself a few times), you kinda get an idea of what to look for.
Haha, thanks for your reply and for understanding where I'm coming from. And while your original comment might not be a _direct_ reply to OPs question, I would've hoped for it to be far more upvoted as well, as it's easily one of the most valuable comments in this thread in my opinion. HN truly is weird sometimes.
The only thing I took issue with in your original comment was the "The number of gotchas in this simple 3-step process is insane" and the "there are so many gotchas" parts, as I think that this exact wording made me read the whole thing in the wrong way. I just wish we would tell people new to these kinds of topics "Don't fear this, this is normal, but totally manageable! It might seem like a minefield at first, but actually, there's a very well-trodden path through it. Here's (part of) the map." (Basically, you provided that map, which is great, and more than people generally do, but the wording above made the map seem more daunting than I would wish.)
IMO it is insane to implement Auth on your own in almost all real life use cases.
You wouldn't roll your own crypto either.
Good for learning but for real users use something that is tried and tested.
Implementing auth is nowhere near as risky as implementing crypto. The argument against doing it should mainly be from practicality. Even if you only need a basic auth scheme and not a complex net that must integrate with other services, even though such basic schemes can be done in an afternoon from scratch without problems, it can be done in even less time just using one of the bigger-than-you-need libraries for it. Sometimes it's just a few lines in an XML config. Though still, arguments for minimizing dependencies (especially frequently updating ones, which are more likely the bigger the thing is) can overrule that, case-by-case.
* On the client side, store the token in a Secure, HttpOnly cookie, as local storage is accessible by any module of your app (see supply-chain attacks).
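As a sketch of what that looks like from the server side, here is how such a cookie header could be built with Python's standard library. The cookie name `session`, the `SameSite=Lax` setting and the one-hour lifetime are assumptions for the example, not recommendations from the comment above.

```python
from http.cookies import SimpleCookie


def session_cookie_header(token: str, max_age_seconds: int = 3600) -> str:
    """Build the value of a Set-Cookie header for a session token."""
    cookie = SimpleCookie()
    cookie["session"] = token
    morsel = cookie["session"]
    morsel["secure"] = True        # only sent over HTTPS
    morsel["httponly"] = True      # not readable from JavaScript (document.cookie)
    morsel["samesite"] = "Lax"     # assumed setting; limits cross-site sending
    morsel["path"] = "/"
    morsel["max-age"] = max_age_seconds
    return morsel.OutputString()
```

Because the cookie is marked `HttpOnly`, a compromised script running in the page cannot read the token directly, which is the difference from local storage that the comment points at.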
I'm having to deal with this stuff right now. Firebase stores their refresh token in local storage and that allows minting new session tokens once they expire, are they wrong? Is there any other way to remain signed in "forever"? (until logout or until token is revoked)
That is what I'm using but I have to authenticate again every few days. If I used the client library it would autorefresh the token periodically, but that stores the refresh token in local storage. Since that is something you recommend against, I was wondering why.
Also... don't rely on slow hashes themselves for rate limiting. They're slow because they eat up CPU. Rate limit the requests themselves or you're setting yourself up for denial of service fun. (And also, slow for your server does not necessarily mean prohibitively slow for an attacker's cluster if they do manage to dump your DB. Salting is useful and hopefully uniquely done per account for you by your hashing function, but it's also useful to just forbid very weak passwords entirely, and maybe go so far as to forbid even strong-looking ones that have shown up in data leaks.)
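Rate limiting the requests themselves, before any hashing happens, could look something like this sliding-window sketch in Python. The class name, the limits and the idea of keying on username or IP are assumptions for illustration; a real deployment would usually keep this state in something shared, such as a cache, rather than in process memory.

```python
import time
from collections import defaultdict, deque


class LoginRateLimiter:
    """Allow at most `max_attempts` login attempts per key in a sliding window."""

    def __init__(self, max_attempts=5, window_seconds=60.0, clock=time.monotonic):
        self.max_attempts = max_attempts
        self.window_seconds = window_seconds
        self._clock = clock  # injectable so the limiter can be tested without sleeping
        self._attempts = defaultdict(deque)  # key -> timestamps of recent attempts

    def allow(self, key: str) -> bool:
        """Record an attempt for `key` (e.g. username or IP) if it is within the limit."""
        now = self._clock()
        attempts = self._attempts[key]
        # Drop attempts that have fallen out of the window.
        while attempts and now - attempts[0] >= self.window_seconds:
            attempts.popleft()
        if len(attempts) >= self.max_attempts:
            return False  # reject before doing any expensive password hashing
        attempts.append(now)
        return True
```

The login handler would call `allow()` first and only run the slow hash when it returns `True`, so a flood of attempts costs the server a dictionary lookup instead of CPU time.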
[extremely popular framework] doesn’t do salting per account. Imagine my surprise when I could simply copy my user’s password field to another user in the db and login with my password as them. Luckily it is super pluggable so we could implement proper salting. It’s entertaining how even popular frameworks can miss simple gotchas.
The best thing is to use a salt that is a fact about the user that can’t reasonably change (like the user ID). So you can copy the password field, but if you copy the user id too then nothing happens (assuming there isn’t a unique constraint in the db), you are still logged in with the same user id.
However, user ids really only work for UUIDs as numbers are not random (and very easy to create rainbow tables for). If your users can’t change their user name or email address, you could use that as well.
It seems you'd need to include the same key as you'd log in with. If you hash a UUID primary key then the same attack just requires changing the username as well as the password hash. Like, if I change withinboredom to projektfu with projektfu's password hash, I could log in and mess around, then change everything back.
All assuming database access but for some reason reading the database doesn't get me what I want. Perhaps this system is used to get a bearer token or session cookie for another system I can't access.
The way it usually works for something like bcrypt is this: password comes in as input, you choose a work factor / cost, your bcrypt library takes both and returns a string that you store in a single hashed pw field. The string consists of the work factor followed by a random salt (that it generated and concatenated to the input on its own) followed by the hash, so each account has its unique salt, you're good. If your DB gets dumped, people can't generate rainbow tables to discover the passwords of multiple accounts at once. Devs do sometimes concatenate other data to the pw before passing it to the bcrypt library, basically a runtime salt not stored in the (same) DB, so that just having a DB dump actually isn't enough to find users using the password "password".
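To illustrate the single-field layout described above without a third-party dependency, here is a sketch that uses PBKDF2 from Python's standard library in place of bcrypt. The `scheme$cost$salt$hash` format below is made up for this example and is not bcrypt's actual string format; the optional `pepper` argument stands in for the extra runtime value that is not stored in the database.

```python
import base64
import hashlib
import hmac
import secrets


def encode_password(password: str, iterations: int = 600_000, pepper: bytes = b"") -> str:
    """Return one string holding the work factor, a fresh random salt and the hash."""
    salt = secrets.token_bytes(16)  # unique per call, so unique per account
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8") + pepper, salt, iterations)
    return "$".join([
        "pbkdf2_sha256",
        str(iterations),
        base64.b64encode(salt).decode("ascii"),
        base64.b64encode(digest).decode("ascii"),
    ])


def check_password(password: str, stored: str, pepper: bytes = b"") -> bool:
    """Re-derive the hash using the cost and salt read back from the stored string."""
    scheme, iterations, salt_b64, digest_b64 = stored.split("$")
    if scheme != "pbkdf2_sha256":
        return False
    salt = base64.b64decode(salt_b64)
    expected = base64.b64decode(digest_b64)
    actual = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8") + pepper, salt, int(iterations))
    return hmac.compare_digest(actual, expected)
```

Two accounts with the same password end up with different stored strings because each call draws its own salt, and raising the cost later only affects newly encoded passwords since the old cost travels with each stored value.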
Sure, if someone swaps the stored fields of two accounts around, they can login as them, but I don't think I've ever seen attempts at preventing this with just hashing policy. Things are rather dire for you already if an attacker can swap the fields of two accounts. Is the attacker in this scenario an employee...? As you say, any additional data from the row that is used along with the main random salt can just be swapped around too. I guess it's possible that an attacker can only swap the hash field, but not anything else, or especially not something like the primary key they would need. Some older schema designs store the per account salt in a separate column from the hashed password, if only one field can be altered then I guess that would stop the given threat scenario too. (Edit: I guess another scenario the GP might be thinking of is that all other data is gated behind queries that filter on a user id, so an attacker partially modifying just the user login row to login "as them" doesn't actually help get access to anything else if they had to modify the user id.)
I think it's better here to focus on the broader scenario of "unauthorized access" which includes concerns like: attacker actually knows the user's password (from keylogger or whatever) and thus doesn't even need to alter anything in the DB or care about the hashing policy. It might seem like it's over in such a case, but many systems out there still manage to prevent many cases of such unauthorized access. e.g. with 2FA, or pseudo-2FA with things like "we don't recognize this device fingerprint/IP, so re-verify the account's email and/or phone and/or trusted recovery email" -- and even after that sometimes the service (Google) still might not let you in.
How much are you willing to pay for it so you would get a knowledge base that is not superficial, but thorough and you'll really know the ins and outs of it?
imo, the best way to learn these is to implement them on a small scale yourself. When I wanted to learn about JOSE, I implemented a JOSE library and read the RFCs alongside my implementation. It taught me a lot.
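As a taste of what such an exercise involves, here is a minimal sketch of signing and verifying a compact JWS with HS256 in Python's standard library. It covers only one algorithm and none of the claim validation (expiry, audience and so on) that a real library needs, and the function names are made up for this example.

```python
import base64
import hashlib
import hmac
import json


def b64url_encode(data: bytes) -> str:
    """Base64url without padding, as used by JOSE."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    padding = "=" * (-len(text) % 4)  # restore the padding that was stripped
    return base64.urlsafe_b64decode(text + padding)


def sign_hs256(payload: dict, key: bytes) -> str:
    """Return header.payload.signature for the given payload."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(key, signing_input.encode("ascii"), hashlib.sha256).digest()
    return signing_input + "." + b64url_encode(signature)


def verify_hs256(token: str, key: bytes) -> dict:
    """Return the payload if the signature checks out, else raise ValueError."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("not a compact JWS")
    header_b64, payload_b64, signature_b64 = parts
    header = json.loads(b64url_decode(header_b64))
    # Pin the algorithm instead of trusting whatever the token claims.
    if header.get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(key, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256).digest()
    if not hmac.compare_digest(expected, b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    return json.loads(b64url_decode(payload_b64))
```

Writing even this much alongside the RFCs makes it clear why the `alg` header is a classic gotcha: the verifier has to decide which algorithm it accepts rather than letting the token choose.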