SAML Is Insecure by Design (joonas.fi)
300 points by aj3 on Aug 5, 2021 | 180 comments



What kills me about SAML is that instead of this straightforward XML style:

    <claims>
        <nameid>foo.bar@test.com</nameid>
        <givenname>Foo</givenname>
        <surname>Bar</surname>    
    </claims>
It instead encodes elements and attributes using elements and attributes, in the manner of the classic "inner platform effect" anti-pattern:

    <saml:AttributeStatement>
      <saml:Attribute Name="uid" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:basic">
        <saml:AttributeValue xsi:type="xs:string">test</saml:AttributeValue>
      </saml:Attribute>
      <saml:Attribute Name="mail" NameFormat="urn:oasis:names:tc:SAML:2.0:attrname-format:basic">
        <saml:AttributeValue xsi:type="xs:string">foo.bar@test.com</saml:AttributeValue>
      </saml:Attribute>
    </saml:AttributeStatement>    
This is roughly equivalent to a SQL schema where there's one table with the columns "rowid, columnname, value", along with a "schema table" that some hand-rolled code uses to validate that the data table contains only data that is valid according to the schema.
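
A minimal sketch of that shape, with invented names (note that the "validation" against the schema table lives in hand-rolled application code, not in the database itself):

    -- The "data" table: every value of every logical column lives here.
    CREATE TABLE data (
        rowid      bigint       NOT NULL,
        columnname varchar(255) NOT NULL,
        value      varchar(255),
        PRIMARY KEY (rowid, columnname)
    );

    -- The hand-rolled "schema table" that application code consults to decide
    -- whether a given (columnname, value) pair is allowed at all.
    CREATE TABLE schema_table (
        columnname varchar(255) PRIMARY KEY,
        value_type varchar(50)  NOT NULL,  -- 'string', 'int', 'date', ...
        required   bit          NOT NULL
    );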

It's not XML, it's XML squared: using XML to encode XML concepts instead of just using XML directly.


Oh wow. I'd never looked close up. That's fascinating, and a term of art I expect to use in the future.

Btw I think your SQL example is called "EAV Schema" and does have a legitimate purpose once in a while.


I think it was coined here https://thedailywtf.com/articles/The_Inner-Platform_Effect (Haven't thought about that website in a while)

I've always wondered why more people didn't actually issue DDL for these kinds of databases. Like if you want to let your user define their own custom fields, just take what they enter and actually CREATE TABLE / ALTER TABLE to build the tables they need. And then more or less you could just write regular queries.
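
Something like this, as a rough sketch (table and column names invented; the real work is safely quoting/escaping whatever identifier the user typed):

    -- User adds a custom field "favorite_color" to their contacts object:
    ALTER TABLE user123_contacts ADD favorite_color varchar(100);

    -- ...and from then on, queries are just regular SQL:
    SELECT name, favorite_color
    FROM user123_contacts
    WHERE favorite_color = 'green';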


I've worked on a system like that, writing software on a project base for enterprise clients.

Turns out sysadmins get REALLY nervous when you tell them your application needs permission to use DDL at runtime. It required a lot more filling out forms.


"Just" don't use the same database that the application is using for it's internal data.


You can scope the DDL permissions to a single schema while still using the same database (and benefit from all the transactional guarantees a single database offers).


Yes, if the database supports it (I guess every major one does now) then it's great. Postgres's transactional DDL seems like a great fit.


I'm sorry, I don't follow.

I was talking about an application and its accompanying database, which would still be paid for and administered by the customer, not a database outside the application which it would be accessing.

That still raised many eyebrows.


I think the GP was assuming the reason sysadmins worry about running DDL is the danger of overwriting/modifying tables the application depends on.

If this is the main concern, you could have two databases (or schemas, if your database supports it).

One for application configuration/storage that the application never runs DDL on and another for user data, where the tables can be created/modified dynamically.

That separation means that, at worst, user data is screwed up if there's a bug in the DDL generation, but the application should still be able to run.

At least, that was my read.


This is how ArcGIS works, and it really sucks when your tier 1 support starts adding columns to your schema because they made a typo.


Some people do that, and at one point it was much faster to do that so it was more common, but as databases have generally gotten faster that kind of optimization is less interesting to people.


I assume performance is a lot less predictable with that method, no?


These days, if you are modeling the kind of "sparse, high-dimensional data" that an EAV schema handles well, you should probably be reaching for JSON columns instead of an EAV schema.


Some databases support "sparse columns", which use EAV under the hood but are better optimised: https://docs.microsoft.com/en-us/sql/relational-databases/ta...


Thanks, I forgot about that! :)


Isn't that the problem that these "columnar store" DBs address?

A bunch of dictionaries containing UUID/data, where each dimension is its own dict, and the dicts might or might not be running on the same system?


> "EAV Schema" and does have a legitimate purpose once in a while

Well, technically, the database's own system catalog has tables with such column names and responsibilities.


There's a reason for that. The attribute statement / attribute name / attribute value style has a consistent schema that won't change despite any custom attributes that are being added. Your "claims" example does not have a consistent schema; it changes when new claims are added, and therefore makes it impossible to validate the response (the notion of arbitrary objects does not exist in XML schemas).


Typically this would be implemented by having a set of predefined elements in the XSD (name, email, etc...) and also an 'additional' element that has a list of "Any" as the children. Easy!

E.g.:

    <claims>
        <nameid>foo.bar@test.com</nameid>
        <givenname>Foo</givenname>
        <surname>Bar</surname>    
        <extensions>
            <myCustomClaim>010231239dfadsf</myCustomClaim>
        </extensions>
    </claims>
It's also possible to use namespaces for extensibility, and this can even be used to guarantee that the tag names don't conflict.

So instead of this garbage:

    <saml:AuthnContextClassRef>urn:oasis:names:tc:SAML:2.0:ac:classes:Password</saml:AuthnContextClassRef>
You'd have standard namespaced and extensible elements such as:

    <claims xmlns:saml="urn:oasis:names:tc:SAML:2.0:ac:classes" xmlns:adfs="urn:com:microsoft:ADFS">
        <saml:nameid>foo.bar@test.com</saml:nameid>
        <saml:givenname>Foo</saml:givenname>
        <saml:surname>Bar</saml:surname>    
        <adfs:securityId>S-0-1-52534-12362341234-12312315</adfs:securityId>
        <adfs:objectGuid>47e3a2d3-266a-48ad-b080-b785b4d0b45f</adfs:objectGuid>
    </claims>
This would enable clients to validate the claims they do understand and ignore the claims they don't. Similarly to how X.509 does it, elements could be added to "must process" or "optional" parent elements to ensure correctness and security even in the face of a wide range of client capabilities.

This is common in the XML world, but SAML had to invent its own wheels.


Surely there are better ways to do that, if it is even needed. If your validation only validates that you have

> attribute statement / attribute name / attribute value style

Does that even have a benefit? Is it any different than simply validating that a string is valid xml/json/something syntax?


> there's one table with the columns "rowid, columnname, value"

Yeah, that's known as "entity-attribute-value model" (or just "EAV") and is considered an anti-pattern unless you actually need the flexibility it provides (which is fairly rare).


Yup. We run a system that uses this pattern for user-defined data fields. It works, but not without difficulty. It's very difficult to validate with business rules because you can't eliminate nulls and the "value" field is forced to be a character field. The system does have type checking enforced by the application, but multiple applications access the data.


> the "value" field is forced to be a character field

It doesn't have to be. We use this (under SQL Server) for our simple "custom attributes" mechanism (some irrelevant details omitted):

    CREATE TABLE private_data.ATTRIBUTE_VERSION (

        OBJECT_ID bigint,
        ATTRIBUTE_NAME nvarchar(255),
        ATTRIBUTE_GEN bigint,
        IS_TRASH bit NOT NULL,

        VALUE_BOOL bit,
        VALUE_INT bigint,
        VALUE_DECIMAL decimal(38, 9),
        VALUE_DATETIME datetime2,
        VALUE_STRING nvarchar(255),

        ATTRIBUTE_TYPE AS private_attributes.ATTRIBUTE_VALUE_TO_TYPE(
            VALUE_BOOL,
            VALUE_INT,
            VALUE_DECIMAL,
            VALUE_DATETIME,
            VALUE_STRING
        ),

        CONSTRAINT ATTRIBUTE_VERSION_PK PRIMARY KEY (OBJECT_ID, ATTRIBUTE_NAME, ATTRIBUTE_GEN),

        CONSTRAINT ATTRIBUTE_VERSION_FK1 FOREIGN KEY (OBJECT_ID, ATTRIBUTE_NAME) REFERENCES private_data.ATTRIBUTE,
        CONSTRAINT ATTRIBUTE_VERSION_FK2 FOREIGN KEY (ATTRIBUTE_GEN) REFERENCES private_data.GENERATION,

        CONSTRAINT ATTRIBUTE_VERSION_C1 CHECK ( -- At most one of the various value types can be non-NULL.
            private_attributes.ATTRIBUTE_VALUE_CHECK(
                VALUE_BOOL,
                VALUE_INT,
                VALUE_DECIMAL,
                VALUE_DATETIME,
                VALUE_STRING
            ) = 1
        )

    );
Since SQL Server encodes NULLs in a bitmap, having a bunch of NULLs for every non-NULL takes almost no extra space at all. But even on a "wasteful" database such as Oracle, this shouldn't add more than an extra byte per NULL.

> The system does have type checking enforced by the application, but multiple applications access the data.

Traditionally, this is solved by funneling all applications through a public API of stored procedures and views/functions which enforce the desired business logic, but I understand this approach has fallen out of favor in recent years (decades?).


That's really not an improvement. This is complex enough that you're going to run into problems when multiple systems use the same database. I can imagine the fun that comes when another system wedges a field because it used the wrong type. That's the same problem we already have with a single text column. This is way more complex and still has the same problems.


The reason for that is that you still want to be able to have an XSD to validate against.

In the context of SAML, these values are completely arbitrary metadata without fixed meaning in the protocol itself.

You could even use attribute names that aren't valid for XML tag names.

It's the same reason why -- for example -- AWS APIs return tags as a similar entity list. You don't see

    <Name>Example</Name>
    <BillingGroup>prod</BillingGroup>
but rather

    <tagSet>
      <item>
        <key>Name</key>
        <value>Example</value>
      </item>
      <item>
        <key>BillingGroup</key>
        <value>prod</value>
      </item>
    </tagSet>
---

Database EAV would likewise be 100% appropriate if users could use the database to store arbitrary attributes. (Think JIRA custom fields.)

But if users are limited to certain attributes, then encode that in the schema.


>In the context of SAML, these values are completely arbitrary metadata without fixed meaning in the protocol itself.

Well, that's the problem though. YAGNI, KISS, and all that.

>The reason for that is that you still want to be able have an XSD to validate.

Well, you can still validate fine when you have specific fixed metadata...


A big part of SAML development was the attempt to satisfy every single requirement put forth by every stakeholder. The outcome of this was that they couldn't come up with a single schema that covered everything (schema and taxonomies that attempt to cover too much have their issues). They couldn't even agree on what to call a user ID. Instead they ended up with a schema to describe schemas, making SAML able to flex to handle nearly any use case, but resulting in a terrible mess and wrecking any sane interoperability.

What we have now are groups like OASIS, the Shibboleth Consortium, eduGAIN, and InCommon, each defining the schema their members agree to support, and an explosion of competing standards.

By the way, even Microsoft can't interoperate with themselves. Traditional ADFS SAML can handle arbitrary multi-value attributes that Azure ADFS rejects.


The most astonishing thing about the XML ecosystem is that the XPath query language specification doesn't use XML syntax.


That's a good thing


Why is it astonishing? The hype wasn't manufactured by the spec authors. The sales departments were responsible for that.


Consider XSLT syntax. Consider XSD syntax.


Thank you for this comment. Now I know I do not need to learn SAML; I will wait for the next technology. It reminds me of how SOAP was superseded by JSON.


On top of this, I have inside knowledge that some extremely common libraries that implement SAML were built not by reading and understanding the spec, but simply by looking at sample XML documents in the wild and writing code that handled it.

Multiple exceedingly obvious vulnerabilities have been the result. One fun one was: looking at an XML signature in the document, verifying it, then ignoring the assertion it was claiming to sign and just trusting the assertion at the document root.

I tried to write a standards-based implementation and gave up. The standard is enormous, and consists of three parts:

    1. The definitions of what each XML tag means in a vacuum
    2. Patterns on how to assemble those XML tags into a document that means something useful
    3. Protocols that exchange these documents back and forth to accomplish some authentication objective
Half the problem comes from the fact that it's meant to do anything and everything, and so you can theoretically just mix and match all the above parts to get what you want. But that also means that it's exceedingly simple to mix and match stuff in ways that are subtly (or not so subtly) insecure. The other half comes from the fact that the standard is so damned complicated in order to handle everything under the sun that it's damn near impossible to wrap your head around it all. So people just glance at the spec occasionally and just write something that handles documents they see in the wild and hope for the best, with predictable outcomes.

The whole thing is a tire fire.

Note, I last worked with it about a decade ago so I may have gotten some of the characterizations wrong.


I get the feeling that this seems to be a common issue with any authentication or authorization "standard" nowadays. When I had to implement SASL OAUTHBEARER support, and I started poking around how to actually implement OAuth2 from the standard itself, it was similarly full of "here's several ways that you might do something, and whether or not they're supported by the provider is implementation-defined, and what you have to provide with the requests is implementation-defined, and where you go to find the stuff is implementation-defined."

And frustratingly for me--trying to implement this in the course of a SASL method as a client--there's not even anything hooked up in SASL that might have hinted at the client what to do. Which is insane because the entire point of SASL is to bridge the gap of "how to request authorization given username, hostname, protocol." It makes me want to retreat back into Kerberos as a better way of supporting SSO than anything invented in the decades after it.


Last I checked, and having worked on a few oauth clients, OAuth2 is more of a set of rough guidelines.


Correct, it's officially a framework and not a protocol. It's a framework for building a specific protocol, which then uses the same patterns as other OAuth2-based protocols but is not necessarily compatible with them. For example, URL endpoints are not defined in a strict sense, and the provider can also add arbitrary parameters to calls as long as the basic OAuth2 parameters are present as well. OpenID Connect 1.0 builds on that to make the framework more strict.

RFC6749 The OAuth 2.0 Authorization _Framework_


I ran into a similar issue trying to write an FTP client implementation from the spec. It turns out that most servers don't support 90% of the spec, and some very important things (such as the output format of LS) are completely unspecified.

I threw out that first implementation and just got the 3 most popular servers and made it work with them.

Random example: I'd like to present directories and files differently so users can chdir to a directory. There was no single method for detecting a symlink to a directory that worked on any 2 of the 3 servers I targeted for support (other than issuing a CHDIR and checking for success, and I wasn't going to issue a CHDIR for every item found when listing a directory for performance reasons; I did have to issue a CHDIR for every symlink on one of the 3 though).


I was once tasked with writing a crawler for FTP servers and lost about half the hair on my head over this madness. Just when I tuned the robot to work with 10 servers, an 11th would pop up that behaved totally differently. Thankfully, I'm not in that job anymore -- I'll probably never touch FTP again.


The arbitrary parameter stuff can be a real pain because there’s some loose conventions around which parameters to use, but sometimes they’re named slightly differently or clash. Terribly annoying to work with.


Isn't this the usual practice with any nontrivial XML handling? You get some WSDL and XSDs for a coarse-grained service (as in having 300 optional input elements with no idea what can or can't be combined), so you throw real-world examples at your implementation until it works.


It is indeed a common way of parsing XML. Unfortunately, it's a bad idea when dealing with SAML.

SAML is a way of exchanging 'security assertions' - for example, if Alice logs into the AWS Console with her employer's single-sign-on service, the SSO service gives Alice a signed document saying "The bearer of this document can log into AWS as Alice within the next 60 seconds and access accounts B and C", which she hands to AWS. The digital signature on the document is very important.

Unfortunately, SAML uses the 'XML Signature' standard [1]

The people who wrote the XML signature standard didn't just take an XML document and whack a PGP signature around the whole thing! They thought about use cases like 'what if you wanted to only sign part of a document' and 'what if, after signing, the document was processed by something that reordered the attributes' and 'what if you have signatures from multiple sources and want to collate them into the same document'. They decided they wanted to support all of them.

As such, it's entirely legal for a document to contain a valid set of properly signed security assertions and a load of different security assertions that aren't signed.

So SAML documents have to be parsed very carefully - if you naively check the document has a valid signature then read the assertions, an attacker can add extra assertions outside of the signed part of the document, granting them broader permissions than they ought to have.
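
Heavily abbreviated, the shape of that attack looks something like this (illustrative only, not a working exploit; element names follow the SAML/XML-DSig schemas, but the IDs and addresses are invented):

    <samlp:Response ...>
      <!-- Injected by the attacker: not covered by any signature -->
      <saml:Assertion ID="_evil">
        <saml:Subject><saml:NameID>admin@victim.example</saml:NameID></saml:Subject>
        ...
      </saml:Assertion>
      <!-- The assertion the IdP actually issued and signed -->
      <saml:Assertion ID="_legit">
        <ds:Signature>
          <ds:SignedInfo>
            <ds:Reference URI="#_legit">...</ds:Reference>
          </ds:SignedInfo>
          ...
        </ds:Signature>
        <saml:Subject><saml:NameID>alice@victim.example</saml:NameID></saml:Subject>
        ...
      </saml:Assertion>
    </samlp:Response>

A careful implementation resolves the signature's Reference URI and trusts only the exact element it points to; "there's a valid signature somewhere in here, so read whatever assertions are present" is exactly the naive-parsing bug described above.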

[1] https://en.wikipedia.org/wiki/XML_Signature


I feel like we keep making this mistake again and again. Like JWTs can be signed with the "none" algorithm, which as you might guess is very easy to forge. Conformant implementations can see that and say "yup, that's a valid signature" and happily let any user grant themselves any access they want. Big mistake. It probably shouldn't be in the standard.
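
For the unfamiliar, a minimal sketch of what such a forgery looks like (hand-rolled with the Python standard library, no JWT library involved; the claim names are invented):

    import base64, json

    def b64url(data: bytes) -> str:
        # JWTs use unpadded base64url encoding.
        return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

    # Anyone can mint this "token": the header says alg=none, so per the spec
    # the signature part is simply empty -- there is nothing to verify.
    header  = b64url(json.dumps({"alg": "none", "typ": "JWT"}).encode())
    payload = b64url(json.dumps({"sub": "admin", "admin": True}).encode())
    forged  = f"{header}.{payload}."

    # A verifier that honors the token's own alg field will accept this.
    # The fix: pin an explicit allowlist of algorithms (e.g. only RS256/ES256)
    # and reject "none" outright.
    print(forged)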

(I like PASETO instead of JWT, but nobody else in the world does, it seems, so we're stuck with the broken thing forever. Client libraries at least special-case "none" now and make it auto-fail, which has probably saved a few people that didn't actually read the spec.)


Thanks for mentioning PASETO. It looks good, except that it doesn't have a standard way of specifying expiration like JWT. But at least it's easy and safe to roll your own implementation of that part. Edit: My mistake; it's just the Elixir implementation of PASETO that doesn't support expiration.


PASETO is a bit of an odd duck, and isn't really an answer to the OIDC vs. SAML question. OIDC relies on JWT in the same way that SAML relies on DSIG, but there's a whole authentication protocol built around it. You would probably not be better off replacing an OIDC implementation with a hand-rolled authentication protocol based on PASETO.

The big problem here is that none of this matters. People do abuse OIDC as a general-purpose token for individual applications, but almost nobody does that with SAML. The point of these schemes is to federate authentication. In other words: it only matters what Okta and Google will agree to do. Your options are OIDC or SAML.

If you had to pick between the two, I guess you should pick OIDC, and think about what Ryan Sleevi said downthread about certificate verification; you'll want some kind of pinning scheme (pinning is a mess in client-server situations, but still sane in server-server applications; in fact, server-server pinning long predates browser public key pinning).

If you're not federating authentication, I don't think you should use JSON tokens of any sort.


> The people who wrote the XML signature standard didn't just take an XML document and whack a PGP signature around the whole thing! They thought about use cases like 'what if you wanted to only sign part of a document' and 'what if, after signing, the document was processed by something that reordered the attributes' and 'what if you have signatures from multiple sources and want to collate them into the same document'. They decided they wanted to support all of them.

You forgot to add the part - what if someone wants to add comments randomly without it changing the signature, even when those comments (mildly) change how the xml doc is parsed!


I did forget. I've spent years trying to.


It's too late to reform SAML, but this is where TDF shines. [1]

This is what we use in the military to provide provenance assurance, classification markings, and authorize dissemination destinations for digital intelligence products, and also what we use for manifests to enable trusted data transfers from unclassified to classified networks.

It's pretty interesting to be on a network where everyone has a PKI identity. Your client certificate gets sent to your sponsoring agency's personnel database, which returns the exact set of clearances you have, that gets sent to the server you're requesting information from, and you get back a web page with exactly what you're authorized to see and everything else redacted. No need to login and you can access any page you know the name or address for. You just may not see anything on it.

[1] https://www.dni.gov/index.php/who-we-are/organizations/ic-ci...


> 'what if you wanted to only sign part of a document' and 'what if, after signing, the document was processed by something that reordered the attributes'

Those use cases are valid. The second one is actually the reason why JSON can't be used for a similar purpose (there is no canonicalization standard, and reading then writing a file could reorder the elements).


The problem isn't that there's no valid use case for it; it's that there's no use case for it in SAML, just as there's no valid SAML use case for XML comments (which also broke SAML).

You can defend DSIG in a lot of different ways. I'll disagree with all of them and feel like I could probably win the argument; DSIG is a cursed standard. But that's not the bar you have to clear in this discussion; here you have to establish that DSIG is good for SAML. It clearly isn't.


I might be wrong but I vaguely remember that something like “all” of the popular open source SAML libraries in the wild were vulnerable due to DSIG mishandling circa 2012. They’d blindly accept any document as valid as long as there was a signature in place that signed some element, but didn’t bother to check that that element was the set of assertions actually being used.


There were similar issues more recently (2018/19) with custom XML entities or comments. The XML signature was unaffected, but if code used the equivalent of element.children[0] to get the contents, it was possible for attackers to truncate the attribute values seen by the service provider library.


Yep. You still find variants of that today.


The latter is not a valid use case for anything using signatures, because it² is virtually impossible to implement correctly.

The former might be important for something? But generally also very nearly impossible to do correctly. When that is the case, one should contemplate whether the actual business requirements can be fulfilled another way, instead of making things broken by design.

² "canonicalize-encode-sign" as opposed to the correct way of "sign blob, parse blob".


"Throw stuff at it until it looks like it works" is probably not the development practice you want to employ for an authentication service.


and yet, "throw features at it until it meets the client's requested idealized spec, with weird carve-outs for some legacy system that we promise will be gone in a year or so, maybe, while also maybe conforming to some mythological security model drunkenly conceived by Baphomet" is where we arrive in practice


Surely not! I'm just saying I'm not surprised too much.


I think there's a mismatch between security and software engineering. Software engineers are paid to make things work, but the point of security is to make things not work (for attackers). So there is that fundamental mismatch which leads to a lot of strife, I think. (The real answer, of course, is that the software needs to do two things instead of just one. It needs to permit authorized users AND deny unauthorized users. One is not more important than the other, both must work perfectly. But when testing it, you're mostly happy when you, the authorized user, are allowed to log in, and so tend to forget about the "what if the assertion is malformed" case.)

I think it's funny because engineers are always portrayed as being grumpy and that they hate everything, but if you look at how they handle errors or security, they seem like the type of people that like everything and always imagine the happy path ;)


I've often made this comparison. Software engineering seems primarily focused on making it work correctly, security engineering is about making it fail correctly.

But I also think that latter part should just be given more weight in software engineering in general. A simple example is regex: most people tend to think about writing regexes that positively match expected input, but this generally results in expressions that match things they shouldn't. Writing them is generally easier and more straightforward when you think about not matching what they shouldn't!
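
A tiny illustration of that point (Python; the inputs are invented): the "positive match" version happily matches inputs it shouldn't, while the anchored allowlist version matches only what's expected.

    import re

    candidates = ["abc123", "abc123; rm -rf /", "../abc123"]

    # "Does it contain a valid-looking ID somewhere?" -- matches all three.
    loose = [s for s in candidates if re.search(r"[a-z]+\d+", s)]

    # "Is the whole string a valid ID, and nothing else?" -- matches only the first.
    strict = [s for s in candidates if re.fullmatch(r"[a-z]{1,16}\d{1,16}", s)]

    print(loose)   # ['abc123', 'abc123; rm -rf /', '../abc123']
    print(strict)  # ['abc123']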


That's a good point, but I'd actually go a step further and argue that we need to think more generally about actually defining when things should fail. This is exacerbated by a tendency of many people to actually require systems to not fail on bad input but instead try to make as much sense of it as possible so that now compatibility requires you to make the same guesses as to what the user actually meant. See, for example, the tag soup parsing of HTML (which was eventually standardized in HTML5), or the insanity that is trying to parse email messages.


> This is exacerbated by a tendency of many people to actually require systems to not fail on bad input but instead try to make as much sense of it as possible so that now compatibility requires you to make the same guesses as to what the user actually meant.

Isn't this Postel's law in practice? https://en.wikipedia.org/wiki/Robustness_principle

There's a competitive advantage in handling bad input, right? I'm not saying it should always be the primary consideration and certainly not in security situations, but I think that competitive pressures are how we arrived at this situation.


You framed the issue perfectly. A mismatch of competing incentives. That seems to be a problem everywhere, beyond just security. It seems that bonuses should be tied to security as well. Then maybe software engineers would be incentivized to build things securely.

A clawback of bonuses for security vulnerabilities might be an interesting idea? Or better yet bigger bonuses if there are no breaches.


> A clawback of bonuses for security vulnerabilities might be an interesting idea? Or better yet bigger bonuses if there are no breaches.

Agency problem. Those who decide budgets and project timelines get the upside in such a bonus scheme, while those who implement get the downside. Claw back bonuses, and the engineers not only get blamed, they get laid off to make the quarterly numbers for the management who want to make up for lost bonuses.

I suspect the solution lies in the spec complexity. I haven't thought about this much, so my wrongthink knee-jerk reaction is to make a set of reference test harnesses part of the SOT of the spec, and that is the first step towards more robust implementations. Would like to hear how others have approached this problem space.


All this would accomplish is engineers feeling less like they should report issues with systems.


> On top of this, I have inside knowledge that some extremely common libraries that implement SAML were built not by reading and understanding the spec, but simply by looking at sample XML documents in the wild and writing code that handled it.

Yup, and its complexity leads to something of a shortage of libraries in the first place so there's not much choice.

There's still no good modern maintained OSS .NET 5 library for doing SAML. There are some commercial offerings (ComponentSpace have a neat product and they weren't affected by VU#475445 either) but my worry is that folks will end up using whatever free libraries they can find without thinking too much about the consequences.


By "doing SAML" do you mean from the server perspective or the client perspective? This simple client library works with .NET 5.

https://github.com/jitbit/AspNetSaml/


If I'm not mistaken, the correct terms are identity provider and service provider, respectively. After all, they're both servers, in the sense that they both receive requests from the other.


The coolest part imo is that the servers aren't communicating directly with one another. The client brokers the communication so that both IDP and SP can be on completely different networks (bridged by the client). You probably already knew this but sharing because I find it interesting (and for the edification of those who are unfamiliar).


This client library is nowhere near enough by itself. I could tell this within five seconds, as it's a single source file only a few hundred lines long.

It checks that you get a signed assertion back, which is valid, and lets you pull out attributes. That's great, but there are a gajillion tags that can be set within an assertion that have genuine, important security implications and zero of them are handled by this library.

If your IdP sends those tags along with the assertion, this library will happily ignore them unless you implement support yourself. And I would bet heavily that approximating zero of this library's users write that support.

One trivial example, there's a `<saml:Conditions>` tag. This tag can have all sorts of attributes like `NotBefore` or `NotOnOrAfter` that specify a time period for which the assertion is valid. This library does not do anything to implement support for `<saml:Conditions>` tags at all, so every consumer of this library will happily process assertions as valid even if they're outside of the duration for which the IdP is claiming it's valid. If I get my hands on an assertion at some point in time, I can hold onto it and replay it forever to unsuspecting users of this library. There are other attributes that can set conditions, and there are other tags with implications on whether or not an assertion should be considered valid. None of them are handled.

Further, only minimal document validation is performed. The signature is checked, but the Issuer is not. Multiple Issuers can share a signing key, though I suspect actual cases of this are rare. At any rate, there's a billion ways users of this library can blithely chug along with an assertion that isn't valid and shouldn't be trusted, but nobody notices or cares because the happy path works.

With authentication, yes it's important that legitimate users are able to sign in. But it's as important that illegitimate authentication attempts are disallowed. And this half is completely forgotten about in literally 100% of the common SAML service provider libraries I've seen in the wild (which is admittedly not 100% of the ones available, just the ones I've looked at).

Edit: As @tptacek points out, every user of this library is trivially susceptible to AudienceRestriction authz vulnerabilities. An IdP can include a statement in an assertion that limits it to one intended service provider (e.g., "this document authenticates user FOO for Google service BAR"). Users of this library will simply see "this document authenticates user FOO" and authenticate that user, even though this document was never intended for them. Again it's been awhile so I might be mischaracterizing it a bit but that should be the gist.
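
For reference, this is roughly what such a Conditions element looks like inside an assertion (values invented; a correct SP has to enforce the time window and check that its own entity ID appears in the Audience, on top of verifying the signature):

    <saml:Conditions NotBefore="2021-08-05T12:00:00Z"
                     NotOnOrAfter="2021-08-05T12:05:00Z">
      <saml:AudienceRestriction>
        <saml:Audience>https://sp.example.com/metadata</saml:Audience>
      </saml:AudienceRestriction>
    </saml:Conditions>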


This is neat, because every program that actually uses this library probably has the AudienceRestriction authz vulnerability. It doesn't look like it's checking the Response Destination= field either.


I'd wager that most SAML implementations are a rich, bountiful source of severe security bugs. If I ever decided to get into bug bounty programs I'd absolutely start here.

FWIW, yep. It has no knowledge of AudienceRestriction:

    $ grep -i AudienceRestriction Saml.cs
    $


But why would the client need to know about the specific claims and conditions in the IDP assertion? Isn't it the SP that's supposed to verify the claims, not the client?

(note that I don't know anything about SAML except how to configure certain applications for it).


XMLSignature is one of the worst security specs I have ever read. If you want to learn what not to do, just read that.


I implemented a SAML IdP from the spec and it wasn't too difficult to get through (I implemented everything including the XML parser, XML DS (Digital Signature), XML C14N (Canonicalization), RSA implementation). I did take some shortcuts in the IdP part which just needs to produce but not validate XML DS (unless you're supporting signed requests).



This is actually something I hadn’t really considered before. Writing an IdP is in many ways probably much easier than writing a service provider because there are many fewer knobs you need to turn. Not none, but fewer. You probably have one template for assertions that contains all the clauses and conditions you care about and you set them all every time.

Clients can be much harder because you can have any number of combinations of instructions thrown your way and you have to respect their intent. This is much harder to deal with.


The decryption and signature verification is the worst nightmare. Why? Because the spec signs the actual bytes of the XML, which means that if you change the order of two elements, even if everything else is exactly identical, the signature changes. Everything has to preserve the XML exactly as generated, down to the character set, or else the verification breaks.

Edit: I know the article talks about this, but my point is that his description of the round-trip instability isn't quite right. You can't change the bytes arbitrarily.


I feel that all authentication standards are bloated. Maybe there are reasons why they are like that.


Would something like this be a good use case for copilot / gpt based coding? You have massive specs in relatively structured documents, which could be used in isolated parses as prompts. The particular tuning of the model shouldn't matter if the source specifications and good examples are in the training data already.

Use few-shot prompts with well-put-together examples, then iterate over every item in the specs, and finally run the generated code through a human review process - it seems like you could shift a lot of effort towards validation and, in doing so, devise a process subject to further automation.

Crawling the resulting codebase with various prompts like "this part of the specification could be improved by..." and then reviewing a set of 20 completions for any real value.

This might work for XML, HTML, and so forth, but even if you only have a system that generates a baseline library for one standard, it could be powerful.


Comprehension of the details of a security standard is necessary for semantically correct and secure implementation of that security standard.

I can't personally fathom how throwing GPT at the problem will improve things, but I can trivially see how it could result in code that looks correct to the author but isn't to anyone who read the spec. Maybe that's a failure of imagination on my part.

Then again, I suppose that's the situation we're already in where well-meaning authors don't think to actually read the specs and so don't actually comprehend any of the details.


To an extremely coarse approximation, a lot of AI techniques boil down to similarity matching. If you've got an algorithm that is most commonly implemented wrongly (binary search is the classic example here [1]), I would fully expect something like Copilot to happily reproduce the common wrong code instead of the rare correct code. I mean, it's already been established that Copilot happily suggests completions susceptible to SQL injection attacks, and that's a security flaw that "everyone" is aware about.

And as sibling comment alludes to, security implications are very frequently areas that require very deep knowledge both of the standard and of how breaks in security protocols usually end up happening. How many commenters on this article were aware of the potential for canonicalization attacks in security before reading the article or the comments here? I'd wager relatively few.

[1] If you're wondering how almost all binary searches could be wrong, the answer is they fail to account for overflow in calculating the midpoint.
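
A quick sketch of the classic midpoint bug from [1] (written in Python for brevity; Python's arbitrary-precision integers don't actually overflow, so the comment describes the fixed-width-integer case in languages like C or Java):

    def binary_search(a, target):
        lo, hi = 0, len(a) - 1
        while lo <= hi:
            # Classic bug: mid = (lo + hi) // 2 -- with fixed-width integers,
            # lo + hi can overflow for very large arrays.
            # The overflow-safe form:
            mid = lo + (hi - lo) // 2
            if a[mid] < target:
                lo = mid + 1
            elif a[mid] > target:
                hi = mid - 1
            else:
                return mid
        return -1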


Most problems with security specs and libraries that implement them are communication problems. They involve people incompletely describing or understanding their requirements, capabilities, or threat model. Often this also involves providing/using interfaces that are not ergonomic (https://github.com/google/mundane/blob/master/DESIGN.md), which in turn can be caused by the spec trying to do too much (as XML Signature does).

I don't know how GPT could help with that. If anything I would expect it to bias toward things it has already seen, which is the opposite of what you want when writing a new spec/library aiming to avoid past mistakes.


On the flip side, I've at least one SAML implementation that was written based on the spec, that then failed in the wild because identity providers didn't follow the spec. Go figure.


> The standard is enormous

And somehow still manages not to include any helpful description of SAML.


> Let’s get rid of SAML.

Good luck, it's prolific throughout so many enterprises and academic institutions.

We might see some support start to materialise for OIDC as larger enterprises start to make use of e.g. AzureAD properly, but you're still going to see tons of universities especially running their own Shib instance. And larger enterprises are going to be difficult to shift off of SAML when they've had X years of experience with it and their supplier on-boarding processes are built around it.

The entire Jisc UK Access Management Federation is based on top of SAML, so that'll power the vast majority of academic resource authentication flows. You can argue that security is less of an issue for this use-case (particularly when e.g. Sci-Hub exists) but I think it's likely that once you have the infrastructure set-up for this it gets used for more and more services.


The largest enterprises support OIDC just fine.

Active Directory supports both SAML and OIDC as of version 2016. Okta and similar support both indifferently.

SAML had a head start of a few years but OIDC caught up. It helps that all social sign-ons are exclusively OIDC.


Anecdotal but many on-premise applications I've worked with recently didn't work with our SSO when I tried using OIDC, but worked fine using SAML. The reasons were various, none were really valid.


> Good luck, it's prolific throughout so many enterprises and academic institutions.

Easy peasy. Just start mounting SAML vulnerability attacks on all those organizations, with friendly suggestions to fix their problems, on top of juicy demonstrations of the dangers. After a few are affected, a trend will appear - in all kinds of media, among other things - that one ought to move away from SAML. Of course the teaching should be done remotely, from friendly jurisdictions.

/s

We have had this problem with security for years - no, decades even. Vista (Microsoft OS) was universally hated - <appropriate xkcd picture with blinking Hitler inserted> - as it tried to address gaping holes in XP security; fortunately Win 7 was much more user-friendly, so security still had some wins. But not nearly "ultimate" wins, no.


The problem is that while we're all pretty sure there are more grave SAML vulnerabilities, they're not super easy to find. You can't just read this blog post and then go attack a SAML system; finding SAML bugs is actual research.


Is that... true? I've got a moderate SAML background - written a few SAML services. What sort of attacks (or attack ideas) would you need a research background on in order to judge whether an attack is valid? As an example, one of my projects has pretty good unit/integration test coverage, and the integrations always confirm that the SAML SP will accept/reject valid/invalid assertions. Seems like it would be pretty easy to spin up a REPL and run a bunch of example docs against the service.


The audience checking authorization vulnerability is a simple example of something that test suites don't tend to cover. Case in point: there's an example of a SAML library on this thread that doesn't even know about audience restrictions.


(for a reference to "appropriate xkcd picture with blinking Hitler inserted" - https://xkcd.com/528/ )


cough eIDAS cough


> eIDAS

I'm sure a lot of governments have based regulations and standards on top of SAML... it seems exactly like the sort of thing they'd be drawn to - "mature", more complex than necessary, verbose.

Pretty sure Gov.UK Verify is SAML-based, though I get the impression the new single sign-on solution (now entirely in-house with no external IdPs?) will be OIDC based.


After having implemented this recently, I have to agree with many points. It just feels so clunky to me and there are really shitty caveats with sign out and session lifetime that I still don't fully comprehend after reading 2 dead trees worth of Azure docs.

Ignoring all other business constraints, the most secure architecture in my mind is integrated security w/ single-instance applications which are isolated to their own physical hosts.

Integrated security can include MFA as long as the additional factors are scoped exclusively to each applications' secure environment.

There is obviously a huge convenience speed bump if you leave it here, but there are small, incremental enhancements you can make to dramatically improve the UX without centralizing everything at the convenience of your attackers. For instance, you allow users to link accounts between 2 popular systems, and then rely on a simple pin code or similar for quickly jumping between them.


Well the whole critique applies directly to the XML Signature Standard, doesn't it (of which the SAML standard is just a user/application)? And sure, it is very complex and prone to bugs and security issues in implementations. Cf. e. g. the list of potential problems in http://www.w3.org/TR/xmldsig-bestpractices/


Mostly yes. But the SAML spec itself also has too many degrees of freedom, which just means too many potential bugs. SAML could get rid of 90% of the spec and still support 99% of what it is used for, while having way less security pitfalls.


The same could be said for x509 - so much crud for what boils down to a 3rd party signing a domain name asserting you have control over it. The unneeded complexity causes innumerable interoperability problems.

As for the problems caused by ambiguous representations, the most memorable one for me was the Bitcoin malleability problem, which boiled down to the thing people based a transaction ID on having multiple representations. Bitcoin used the DER format precisely because it isn't supposed to allow multiple representations of the same thing, mind you - that bug was created by OpenSSL following John Postel's advice "be conservative in what you do, be liberal in what you accept".


I do wish the author had kept it all in XML instead of doing some JSON conversion and then disclaiming it every time. I wish they had used real, accurate examples.

The author constantly disclaims their approach. Why are they demonstrating in JSON, and why are they demonstrating in what they call 'pseudo-SAML' when everything they need is right there?

They don't prove that SAML is insecure, because there is no vulnerable SAML in the post.


> I do wish the author had kept it all in XML instead of doing some JSON conversion and then disclaiming it every time. I wish they had used real, accurate examples.

I wasn't sure what the point of that was either. I found that confusing and it seemed to detract from the strengths of the article.


Having a security protocol that is so complex that it is practically impossible for a single person to understand all aspects of it sounds like a recipe for disaster. Most XML-based standards from that era seem to suffer from the disease of wanting to capture every possible use case, with the result of being so general purpose and configurable that they end up being almost meaningless - in practice, implementations will only work for a very specific profile of the standard, and that is then what you need to implement, because implementing all of the standard is an insurmountable task. Because the standard is so general, it ends up becoming unusable as a reference for validating your implementation, and as a result, you end up using non-normative descriptions of the specific profile as a reference, or even worse, you validate your implementation against a random set of example documents that you have seen on the web.

I think OIDC is better than SAML (it avoids the particular problem with malleable signatures described in TFA). You can actually read the standard to understand how the protocol works, whereas SAML is a jumble of abstract nonsense. I still think OIDC is a bit too complex, and I wonder if it really needs all those different profiles (hybrid flow, authorization code flow, implicit flow).


> Most XML-based standards from that era seem to suffer from the disease of wanting to capture every possible use case, with the result of being so general purpose and configurable that they end up being almost meaningless

Thankfully we've progressed from that era. s/XML/YAML


Another protocol that is much easier to understand (v1 is anyway) and was a really great least-bullshit introduction to how centralized auth should work is the Central Authentication Service (CAS) protocol developed at Yale[0]

> The Central Authentication Service (CAS) was developed here at Yale University between 2000-2002. In 2003, a release of the server core code was refactored in collaboration with Rutgers University and in 2004 we collectively placed the code in the public domain under the oversight of Jasig (later Apereo).

v1 of the spec is a relatively short read, and relatively easy to implement correctly. I even wrote an implementation way back when with some homegrown changes (JSON for responses). It doesn't seem to get used much outside of academia but it's very minimal and understandable, and OAuth/SAML work in a roughly similar pattern (but different; CAS is centralized, though there are proxies). It's now stewarded by an organization called Apereo[1].

Agreed with the other commenters though, SAML login is an "enterprise" feature, so it's going to be around for a very long time -- things that warrant upselling tend to stick around.

[0]: https://developers.yale.edu/cas-central-authentication-servi...

[1]: https://apereo.github.io/cas/5.1.x/protocol/CAS-Protocol-Spe...


The vulnerabilities happened, we're aware of them now, and the major libraries (which you should be using anyway) will be sure to fix them. This isn't a compelling case to throw away SAML, just to be mindful not to make certain implementation errors.


I think this is missing a huge part of the point of the article: the complexity of the standard and the domain space it provides makes implementation errors more likely.

It’s akin to saying “making a web browser is easy, just follow the spec”; but the spec is a sprawling set of standards, with many layers of revisions and many important details left up to implementations.

In both cases, implementations may not be nearly as cautious about the finer points of the spec. That’s a huge weight for either to carry, and probably the only reason we haven’t seen a similar proliferation of browser implementations with horrific vulnerabilities is that the spec is so huge and the incentives to build a new one are so low.


> (Note: this post is all pseudo code - it’s not real SAML. Here’s a real example if you’re interested.)

*clicks link knowing full well the travesty about to be witnessed*

So many businesses want me to support SAML. I always say no.


Aha, brings back memories. Had to implement SAML login as a Service Provider once - a true abomination of a protocol, will never go near it again, and that was the easier side of things.


Looking at Keycloak as a SAML2 provider to get a BeyondCorp-ish setup to enable remote work without VPN. Every employee gets a YubiKey, and all web assets are protected by a reverse proxy which checks the YubiKey via WebAuthn and authorizes the request with a Keycloak SAML IdP, which would authenticate/authorize the user against the company Active Directory.

Assuming you now require that you deploy a solution which is free of SAML2, what do you deploy instead?


Keep pretty much everything you just mentioned, just with OIDC, and possibly add a Zanzibar-like system for permissions, with OIDC tokens being limited to just being identity bearers?


> SAML uses signatures based on computed values.

"Computed values" is almost as general as "unit". It tells me absolutely nothing.


The issue isn't that the signature is based on a computed value—the input would always be computed at some point. The issue is that the signature checking code interprets the document one way and the code using the document interprets it another way. This is less likely to occur (though not quite impossible) when the input to the signature check is defined to be a simple byte array rather than a complex XML document where only certain parts are covered by the signature.

One way to avoid this would be to require the input to be in some non-malleable normalized form: If normalization changes the document in any way then it fails the signature check. The advantages of this approach are that it doesn't require Base64-encoding the signed content and that it has no problem with embedded signature blocks.


IMO the crazy part is reusing the input directly, instead of normalizing, serializing, then verifying the signature, deserializing, and only ever trusting the deserialized verified data.

This way you always accept as validated exactly the same input that is signed, and any normalization or processing is irrelevant¹. OTOH, once you do it like this, I'm not sure what's the benefit of allowing malleable data in the first place.

You may need to splice data back from multiple signed parts, but I'm not sure what else would be proper, if only parts of a document are signed.

  -- Note: Only ever trust SignedData.
  playWithData :: Dangerous -> Dangerous -> Dangerous

  verifySignature
    :: Dangerous
    -> VerificationParameters
    -> (SignatureInfo, Maybe SignedData)
¹ Normalization/processing will be exposed to attackers, so it still needs to be memory and computationally safe, and you should carefully evaluate whether you really need preprocessing.


I felt the same way, but I found the article’s explanation and elaboration thorough, easy to understand, and surprisingly effective at communicating the problem. You could be forgiven for stopping reading there for this reason, but if you did and you’re still interested I encourage you to read further.


Yeah, it'd be much appreciated if the author explained that a bit more in the "why is saml insecure?" section


He literally does just that in the very next section; “Why is signing computed values dangerous?”


Really the only complaint I have about the article is that this separate heading was unnecessary, and people who read fast with a short attention span might take that short section as "author assumed knowledge not in evidence"/"I'm not the intended audience".


I assume he was being sarcastic


Minor nitpick: sign-in with Google and Facebook don't use SAML, they use OAuth and OpenID Connect. Leading with those as an example undermines the author's later points.


But at least Google can be used as a SAML IdP for external services, which is what I think the author meant.

SAML as far as I know doesn't specify how exactly an identity provider authenticates a user but only how, once a user is authenticated, the user has a specific "identity" in the context of the service provider that initiated the authorization/authentication process. Therefore the authentication mechanism on Google/Facebook's side can be OAuth or something else, but once completed, the mechanism to convey the identity of the user to the originating service is SAML.


Why does Microsoft seem to default to SAML for organisations using Azure AD?

All our enterprise customers on the Microsoft stack indicate SAML as the only viable option, whereas those on Google Workspace or on more custom IdAM setups in my experience don’t care if you as a vendor prefer SAML or OpenID Connect.


The Enterprise Apps section is heavily SAML based, but if you want to look up how to write an app for AAD you won't likely find any SAML docs, you'll find the OIDC docs and oauth SDKs we build. If you see other places where you feel we default to SAML, I'd love to fix that.


Probably the wrong person to ask, but it would be great if there were guides on replacing SAML with OIDC, if you're already using AzureAD. Our architects are so f'ing clueless they're still telling us to use SAML rather than OAuth2/OIDC to integrate our apps with AzureAD. But if there was an official guide, I could send an e-mail blast to a few higher-ups and say "SAML is teh suck, but here is MS's guide on upgrading to OIDC, it's easy, no more excuses plz kthx"


Very much so the right person to ask! This is great to hear that there's a need for this. I think our mental model of a lot of enterprises is that "old code" stays SAML and new code (that isn't copy paste from old) is OIDC. If there's actual desire to migrate apps between the protocols, that's great. Can't promise easy though given the diversity of starting places. Thanks!


Actually our tendency is to re-use the old solution on the new code if it provides the same functionality. We still have people trying to use LDAP because nobody wants to take the time to learn something new. A migration guide makes it much more likely that they'll make an attempt to use the new thing.

Our desire to upgrade pretty much only comes from Architects telling us "thou shalt follow my shiny new standard" (and by the way, they read your docs; if you suggest SAML be dropped, they'll update their standards!). In that case we have to find time to upgrade, and of course we never just document how to do it once for our whole org, so all these engineers will be wasting time re-learning how to do the same upgrade. I'll bet you we'd save hundreds to thousands of hours per year by having really good migration docs. Even if your migration guide doesn't cover everything, they still take a significant chunk out of the time we need to figure it all out, and it lowers the mental barrier to the change.


where does it default to saml? btw. we use azure ad and only rely on openid connect.


It might just be cultural with the customers I've integrated with, but we've had a policy of requesting OIDC and then only doing SAML if that causes hiccups, and of a handful of SSO integrations with customers on the Microsoft stack there have always been hiccups. There might be other correlations here, such as the IT departments at Microsoft shops in our cases being more driven by consultants and managers.


I've definitely seen this cultural bias towards SAML. I think it may be the case that a lot of enterprises have done a recent transition into Azure AD but have the same staff who had managed a legacy AD FS and have not adjusted with the times.

My approach has been to use Keycloak as an identity broker. Its implementation is quite robust and supports a lot of flexibility in terms of mapping custom assertions and the like. But the actual application "only speaks OIDC" and relies on access tokens reissued by Keycloak.


AD FS on-premises has supported both since the 2016 version; some people are not aware of that. Azure supports both.

The way it works in enterprise is that somebody wrote guidelines years ago that external software must support SSO with SAML. Then the guidelines were never updated and in some cases the company never realized they can support OIDC out of the box.

The exception is education, where Shibboleth is very entrenched, with federations spanning all universities in some countries. Another exception is healthcare/defense, which may not have updated any of their systems for 20 years, though they may not be customers if they have no internet connectivity and no SSO :D


Most likely they are talking about adding Non-Gallery Applications (Custom Applications) for SSO; the only option there is SAML.

For OpenID Connect the developer has to sign up with Azure and have their app added to the Gallery; you cannot add a custom app yourself.

Right now there are over 1100 Gallery apps using SAML, and only 500 using OpenID Connect


This is inaccurate. Multi-tenant apps can simply be signed into via OIDC and they're added to your tenant, if you allow it. All Microsoft apps use OIDC and are not allowed to use SAML, but SaaS app developers are not quite as far ahead.

But yes, we don't support dynamic registration of apps for eg OIDC/oauth


So no OIDC for "internal" apps?

I had thought I'd missed something when setting up some customers, and having to refactor to use SAML for SSO where we'd used OIDC for G Suite, but you're saying unless I throw it on the gallery (public app store?), it's only SAML?


No, you register and you can use either SAML or OpenID Connect. Btw, when you register your app you select whether it is only for your tenant or for multiple tenants (or for Microsoft accounts). Your app never gets in the "public app store" unless you manually submit it. Btw, this information is all public on their great docs.


> this information is all public on their great docs.

Public it may be, searchable it does not seem to be.

Every search seems to direct me towards [1], which is about gallery apps, and then directs me to a table of contents entry that doesn't seem to exist (the closest I found was a tutorials page, which is about connecting to a bunch of pre-existing SaaS apps, not a "custom-developed app").

...maybe this is something where I have to burn half a week playing with Azure AD on a trial account to figure out..

[1] https://docs.microsoft.com/en-us/azure/active-directory/mana...


Steps 2 and 3 are unrelated to the gallery. The gallery entry is basically a registered url that takes you to the app, to trigger a login (2) which triggers provisioning (3) in your tenant. So you can visit any app and add it to your tenant by signing into the app. Lots of folks want a gallery entry though, so that's what the docs focus on


https://docs.microsoft.com/en-us/azure/active-directory/deve...

This is the correct link if you want to develop something with the identity platform; the other link is more or less admin documentation...


>yes, we don't support dynamic registration of apps for eg OIDC/oauth

Which is what I am talking about, and a LOT of SaaS vendors do this.

They have you go to Azure AD, go to Enterprise Applications, click "New Application", and then choose "Application not found in Gallery".

When you do that, I can see no way to use anything other than SAML.

Nothing in the Microsoft docs, nor anything I can see in any portal, gives an enterprise the ability to add their own custom OpenID Connect application; you have to go through the process of adding an app to the Gallery.


Thanks for the feedback and experience! This is the difference between provisioning (adding an instance of an app to your tenant) and registration (Azure AD knowing your app exists). The pattern you're talking about is provisioning, and sadly yes, the manual route here is very focused on SAML. But anyone at all can register an app, and then anyone else can provision it into their tenant just by signing into it.

Will check out the docs and see what we can do, it's not good this was hard to learn.


What are you talking about? You can create OpenID Connect applications in Azure. And you need to sign up with Azure one way or the other; SAML 2.0 and OpenID Connect applications both need an app registration in Azure, which can only be created once you're signed up?!

The app registration process for OpenID Connect and SAML is basically the same one?!


Say I am in the Azure AD portal and go to Enterprise Applications, click "New Application", and then choose "Application not found in Gallery".

When you do that, there are 3 SSO options for the new application, and none of them are OpenID Connect.

This is how 99% of SaaS vendors instruct people to add new applications.


You need to add an App Registration. The docs are pretty woeful!


The naming does not help make things discoverable...


Well, it is the de facto standard in the Microsoft world. Honestly, I don't actually share the author's opinion, but it is well presented.


I think I agree with all of this. SAML is by far the worst commonly-used industry cryptographic protocol (you could generalize and say "anything reliant on XML signatures is the worst industry cryptographic protocol", and if you go looking for stuff like that outside of SAML, you should know that a lot of the bugs are portable between systems).

I think the root cause of insecurity here is the near-universal attempt to use general-purpose XML libraries to build SAML. When the Go encoding/xml round-trip instability bugs were announced, I think there was a general take from the Go community that `encoding/xml` was not an appropriate foundation for SAML libraries, and, in this instance, I think the Go community was right: I think it should have been self-evident that you couldn't safely build SAML on `encoding/xml` (it doesn't even really handle namespaces!)

When I did my own SAML in Go at my last job, I wrote my own soup-to-nuts XML, including DSIG canonicalization. It was annoying, but you can't afford to get SAML wrong; you need to be able to predict what every component in the system is going to do. What makes this worse is that most SAML systems defer the cryptographic stuff to libxml/xmlsec; for obvious reasons, some of them sane, nobody wants to implement DSIG themselves. But then they interpret the signed message with a general-purpose XML library, and now you have competing XMLs in a single system. It's bananas.

In the esoterica of DSIG, there are even worse problems. DSIG is a very flexible format; it has pluggable canonicalizations, happily supports multiple signed subtrees under different keys in the same parent message, and supports both detached and embedded signatures. There's a famous, respawning DSIG bug where you can trick validators into verifying a signature on one subtree but handing a different subtree to the calling application. A researcher at Duo Security tricked SAML implementations by embedding comments in text fields, which, after canonicalization, tricked parsers into returning altered text fields. Technically, SAML responses are supposed to be strictly schema-validated, so you can't sneak arbitrary subtrees into the middle of a SAML response --- but there's at least one extension field in a SAML message that has an any-typed free-form tag.
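
(To make that Duo-style comment trick concrete, here's roughly the shape of it, with a made-up NameID purely for illustration. Canonicalization strips comments before signing, so the signature covers the full string; a buggy SP that only reads the first text node sees the truncated value.)

    <!-- Signature covers (after canonicalization): user@example.com.evil.com -->
    <!-- A parser that returns only the first text node child yields:         -->
    <!--   user@example.com                                                   -->
    <saml:NameID>user@example.com<!---->.evil.com</saml:NameID>

So an attacker who can register user@example.com.evil.com at the IdP gets a perfectly valid signature, while the vulnerable SP logs them in as user@example.com.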

It's also worth remembering that SAML's problem domain is difficult even setting the XML part of this aside. It's pretty common to find very bad authorization vulnerabilities in multitenant SAML RPs, because you have to check messages carefully beyond their signatures. Many of the high-level OAuth2 protocol issues port over to SAML as well. It's tricky to audit!

People stick up for SAML because what it does (facilitating centralized SSO) is extremely valuable; it's probably more valuable than the bugs are harmful, as long as you're using well-known tools that people have already been incentivized to scrape for SAML vulnerabilities. If I were adding SAML support to something new, I'd consider, beyond all the standard SAML checks, also rejecting any message that doesn't have the same shape as what Okta, OneLogin, Google, or Shib generates.

But if you have the option, I'd also say avoid SAML.


What options for addressing the SSO problem would you recommend over SAML if one had the option? OIDC? Kerberos?


OIDC for everything.

Kerberos is limited to the internal network and some very specific use cases (desktop auth). It's not competing.

If the company has fully integrated Active Directory/Kerberos, then on any desktop computer it's possible to get an OIDC/JWT token for the current user with a single API call. It's transparent; the user doesn't need to enter their password because they are already authenticated on the machine. That is to say, no application ever needs to support Kerberos in the current age.


We recommend OIDC, but support SAML because customers.

We implemented our own SAML processing library, too: https://github.com/FusionAuth/fusionauth-samlv2

(We pay for valid security bugs.)


How is Shibboleth SP's track record on SAML vulnerabilities, either patching them quickly or avoiding them altogether?

My company needed to implement SAML SP support in one of our products so we could get academic customers, particularly those that are part of the InCommon federation. We contracted with a company that specializes in SAML and Shibboleth to help us get it right. We decided to use Shibboleth SP running in a container; that container also has Apache httpd (as practically required by Shib SP) and a little Python shim app that generates a JWT and passes it back to our main app. Hopefully that's a good way of using the nearest thing to a canonical SAML SP implementation, without running our whole application through Apache httpd. In case anyone's interested, our Shib SP container setup, with the Python shim app, is here:

https://github.com/PneumaSolutions/shib-sp

It's probably still too specific to our application, but might be useful as a starting point for others.


Shibboleth is written by one of the authors/editors of the SAML 2.0 standard, with well funded support and a global community of very smart folks using/maintaining it. It's a pretty mature piece of software.

Things get dicier when you go to languages like JavaScript, where there aren't really well maintained SAML implementations. But then that's true for everything.


> Things get dicier when you go to languages like JavaScript, where there aren't really well maintained SAML implementations.

Elixir for me. That's why I ended up running Shib SP with a Python-based Shim app in a container.


As others have noted, many of these issues are fundamental to XML DSig, which is insecure by design. [1]

However, the “what does the future hold” of OIDC is not much brighter. OIDC is based on JSON Web Tokens (JWT), which manages to avoid some of these issues (e.g. signs the encoded value), but introduces new ones (JSON interpretation bugs, algorithm substitution bugs, etc). They’re similarly terrible by design [2].

However, what OIDC does relating to signing is far worse. In many OIDC deployments, the idea is you use something called “OIDC Discovery” [3] to discover the expected signing keys for the OIDC server. You fetch those regularly (e.g. daily), and do so over TLS. With SAML, you exchange certificates, and then rotate them every 2-3 years (with things blowing up on expiration), but with OIDC, you often end up using OIDC-Discovery, and thus can change keys daily.

This means that a single malicious TLS certificate can be used to MITM your OIDC Discovery exchange, and from there, impersonate any user from the identity provider to your system, the relying party.

I spend my days in the TLS trenches, working to improve the CA ecosystem, but I absolutely would not trust the security of all users to a TLS certificate. The reality is that BGP hijacks are still a regular thing, as are registrar hijacks. Even if you find out about a malicious certificate (via Certificate Transparency) and revoke it, virtually none of the tools doing the OIDC-Discovery fetch (like programming languages or curl) support revocation checking, and even if they did, it doesn't work at Internet scale. To deal with this problem, some relying parties do a form of poor-man's certificate pinning, but now they're at risk of even greater operational failures than the SAML certificate expirations mentioned at the start.

In practice, it seems plenty of OIDC clients just shrug and go “yolo” - if they’re talking TLS to the IDP, that’s good enough, and no need to bother with signature validation of the assertion at all.
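
(For context, this is roughly what the discovery-then-validate dance looks like on the relying party side. A minimal sketch in Python, assuming the `requests` and `PyJWT` libraries; the issuer URL and audience are placeholders. Note that everything downstream trusts the two HTTPS fetches at the top.)

    import jwt          # PyJWT
    import requests

    ISSUER = "https://idp.example.com"   # placeholder IdP
    AUDIENCE = "my-client-id"            # placeholder OAuth client_id

    # Step 1: fetch the discovery document over TLS -- this is the MITM target.
    config = requests.get(
        f"{ISSUER}/.well-known/openid-configuration", timeout=10
    ).json()

    # Step 2: client for the signing keys (JWKS) the discovery document points at.
    jwks_client = jwt.PyJWKClient(config["jwks_uri"])

    def validate_id_token(token: str) -> dict:
        # Look up the key referenced by the token's "kid" header.
        signing_key = jwks_client.get_signing_key_from_jwt(token)
        # Pin the algorithm list and check issuer/audience explicitly;
        # skipping any of these is the "yolo" mode described above.
        return jwt.decode(
            token,
            signing_key.key,
            algorithms=["RS256"],
            audience=AUDIENCE,
            issuer=ISSUER,
        )

If the key set is refreshed automatically from that URL every day, anyone who can serve a bogus discovery document or JWKS once gets to mint "valid" tokens until the next rotation.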

For all my hatred of XML DSig and SAML, I’ve seen few auth standards as bad as OIDC: because it looks good, but is hell to implement correctly. At least with SAML, you know it looks bad to begin with, so you’re hopefully on guard.

[1] https://www.nccgroup.com/globalassets/resources/us/presentat...
[2] https://news.ycombinator.com/item?id=14292223
[3] https://openid.net/specs/openid-connect-discovery-1_0.html


> However, what OIDC does relating to signing is far worse. In many OIDC deployments, the idea is you use something called “OIDC Discovery” [3] to discover the expected signing keys for the OIDC server. You fetch those regularly (e.g. daily), and do so over TLS. With SAML, you exchange certificates, and then rotate them every 2-3 years (with things blowing up on expiration), but with OIDC, you often end up using OIDC-Discovery, and thus can change keys daily.

I would bet a lot of money that a non-trivial number of people do exactly this in the real-world using SAML (Shibboleth: FileBackedHTTPMetadataProvider or DynamicHTTPMetadataProvider). It's not always manually managed.


We do retrieve SAML federation metadata daily, but the metadata feed is signed using a pinned long-term key of the federation manager, so there's no reliance on WebPKI or even TLS. (Not Shibboleth, but it would be SignatureValidationFilter there.)


The author did a good job, but these have been known issues for quite a while. TL;DR: implementation bugs with known countermeasures.

https://www.usenix.org/system/files/conference/usenixsecurit... (from 2012)

We've implemented SAML for tens of millions of users and devices. The spec is verbose, but the approach solves common business problems. My suggestion is to use SAML simply: federation and passing attributes between trusted parties, with the ability to verify the payloads. SAML can do a lot, but keep it simple and use out-of-band services for more orchestration/metadata.


The other word for "implementation bug" is "footgun". Standards with lots of footguns are bad standards. SAML has a lot of footguns. It is a bad standard.


This. Lots of SAML libraries don't fully implement the spec, but I second what the parent says. It does well for federating attributes.


Nobody wants to write or maintain SAML code

->

SaaS and other types of companies start solving this issue at scale

->

SAML becomes code that is mostly going through a handful of large companies that sell products in this space

->

since the protocol is now centralized, it can be updated to a better protocol


I recently worked on an integration with suomi.fi, which is mentioned multiple times in this article. I even used the DVV-published fork of passport-saml. I have a few points:

- that library is not supported as such. It was published under the DVV but that’s it.

- my understanding is that SAML is legacy that they are trying to move away from. This is critical gov infrastructure, so I doubt they take it lightly, and I would presume it's been incredibly heavily pen-tested.

- integration was time-consuming and errors were difficult to debug. Would not recommend.


ASN.1 signatures aren't really much better: they go through a similar normalization step (BER vs. DER), and the payload contains the signature, so you have to pull it out before validating. It's just that it's been around long enough that (we hope) the kinks in the system have been discovered and worked out.


> Sure, purists may argue that storing XML inside XML as a string or bytes is ugly

Ho ho ho, they really haven't looked at a lot of XML.


I worry that this thread dissuades people from doing SAML. For our business we only do Identity Provider SAML, and it works wonderfully.

SAML has been a game changer for us. We are a SaaS business, and 80% of our help desk tickets were people who could not log in. SAML has largely fixed that for us.


Just be careful that it's now not too easy for users to log in...


You can configure the application side to re-request authentication. This is similar to how Google requires you to sign in with a password when accessing sensitive endpoints like passwords.google.com.
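
(In SAML terms that re-request is typically the ForceAuthn flag on the AuthnRequest the SP sends; an abridged, illustrative example with placeholder values:)

    <samlp:AuthnRequest
        xmlns:samlp="urn:oasis:names:tc:SAML:2.0:protocol"
        ID="_example123"
        Version="2.0"
        IssueInstant="2021-08-05T12:00:00Z"
        ForceAuthn="true">
      ...
    </samlp:AuthnRequest>

ForceAuthn="true" asks the IdP to re-authenticate the user even if they already have a live session.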


Oh, I only meant that with SAML, the SP has to deal with maliciously-constructed assertions. A buggy SP (and as the comments on this article show, these are common) can be fooled into allowing an attacker to bypass authentication. Thereby making it a little bit too easy for users to log in. ;)


Until all of these old-ass systems that finally got to SAML move to a new SSO architecture, this could largely be mitigated by using the HTTP artifact binding instead of the POST or redirect bindings.

Getting rid of SAML will be like getting rid of SMS 2FA.


This is because with the HTTP artifact binding, the SP (relying party) gets the claims directly from the IdP, and can therefore trust that the contents aren't malicious, right?

It would therefore be comparable to OpenID Connect's authorization code flow.


Yep


Thanks!


For others who may be unfamiliar with HTTP artifact binding, I found this blog post useful: https://everything1know.wordpress.com/2019/02/19/saml-2-0-ar...


Does FB use SAML for their login flows? I know Google has several different options and SAML is available, but was unaware about FB. Maybe in one of their more enterprisey apps/services?


If you want to federate e.g. AAD with G Suite, it'll be via SAML, but I don't know of any consumer service that's ever supported SAML. It's so weird the author used Twitter when they were one of the classic original OAuth creators.


My thoughts exactly! A lot of the "social" things are OAuth based, not SAML. I know Google has both, but when doing consumerish things via Google, that's also OAuth based (iirc).


SAML + 2FA would still be strong vs say username/password + 2FA? An attacker would need the assertion + 2FA authenticator code.


SAML uses XML. That alone makes it at least difficult to do securely, due to XXE vulnerabilities.
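
(At minimum the parser needs entity expansion and external fetches switched off before it ever touches an assertion. A minimal sketch in Python using lxml; defusedxml is another common option:)

    from lxml import etree

    def parse_untrusted_saml(xml_bytes):
        # Refuse entity expansion, DTD loading, and network access,
        # which is where classic XXE payloads get their leverage.
        parser = etree.XMLParser(
            resolve_entities=False,
            load_dtd=False,
            no_network=True,
        )
        return etree.fromstring(xml_bytes, parser=parser)

That only covers the XXE class, of course; it does nothing about the signature-handling problems discussed elsewhere in the thread.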


It’s not insecure by design: use is so complicated that you never look back once you get it working. It's too complex, and therefore too error-prone for most devs, which makes the risk of vulnerabilities high.


... which is a way of being insecure by design. Design does not happen in a vacuum and must take the context and the audience into account.


This. I ducked out of an open security-related NPM RFC and decided my efforts are better spent recommending alternative package managers, because the maintainers/proposal advocates were exceedingly uninterested in addressing a wide variety of edge cases, with a "you shouldn't do that" attitude. Well, yeah, I agree, but people can/will/do do that. If your security solution doesn't account for how it might be misused, it's insecure by design.

Edit: it’s the audit assertions proposal for anyone interested. I consider it so high risk that I’m actively working to move projects in my purview off NPM in case it lands.


I found it a tad bit childish with all the swearing. Swearing is acceptable and sometimes natural in live speech but writing swear words is ugly and shows a lack of education and class, even if the actual education is of the highest calibre.


Well, shit. I was surprised I made it almost to the bottom of the comments without becoming incredibly fucking annoyed by something ignorant and opinionated, but here we are. Swearing, used effectively (which I noted the article does before I got here), is a damn fine part of speech that’s valuable both for emotional expression and for relatable levity.


Note that XML is not the problem. The very same problems exist with JSON also. There is still no clarification on duplicate keys, key ordering, and similar unspec'd oversights.

Signing the normalized content is of course a nightmare, given that there is no spec'd normalization.
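
(For example, duplicate keys: the JSON spec doesn't say which one wins, and parsers disagree. CPython's json module quietly keeps the last value; a toy illustration, not specific to any token format:)

    import json

    # Two values for the same key: the spec doesn't say which one wins.
    doc = '{"sub": "alice@example.com", "sub": "admin@example.com"}'

    print(json.loads(doc))  # CPython keeps the last one: {'sub': 'admin@example.com'}

A different parser, or a streaming one, might surface the first value instead, which is exactly the kind of differential that sign-then-parse designs trip over.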


No, this is an XML-specific problem. JWT is bad, but it's not "you can create arbitrary forests of signed and unsigned trees" bad. XML sort of begs you to do that. JSON doesn't even support comments.


There are attempts underway to do inline signing of JWT (https://datatracker.ietf.org/doc/draft-jordan-jws-ct/), though that's still not nearly as bad as XML-DSIG, of course. I'm sure some vulns will shake out of it.



