> If you write software, never be "helpful" and try to fix problems with input ...

josephcooney · on Sept 18, 2015

I don't think it is quite that clear cut. See, for example, Postel's Law AKA the Robustness Principle.

> Be conservative in what you do, be liberal in what you accept from others

https://en.wikipedia.org/wiki/Robustness_principle

MichaelGG · on Sept 18, 2015

The Robustness principle isn't, and Postel's Law is pretty much a failure. This kind of thinking introduces tons of implementation compatibility issues. By being liberal in accepting from others (aka accepting malformed messages), you allow broken implementations to "work". Now those broken implementations form a de facto standard that everyone else must implement.

I demonstrated how this kind of thinking, coupled with "simple" text-based protocols, introduces security issues. SIP is a protocol with nutty parsing rules like HTTP. Lines end with CRLF, body is separated from headers by two CRLFs.

Some implementations act liberal and will accept any combination of CR and LF instead of just CRLF. So header \r\r body is OK with some implementations, and not others. Which means some stacks will read body as more headers. It's not hard to see how this creates a security problem: you pass a message to a trusted proxy and it asserts things are OK, except the two stacks don't agree on what the headers actually are. Oops. This is a real, live issue that affects SIP networks today and can be exploited for profit. And it's hard to fix, because some networks are actually sending non-CRLF lines, creating a compat issue. If implementations had been harsh on the CRLF requirement, those networks wouldn't be sending non-CRLF lines, as it would never have worked in the first place.
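A rough sketch of that disagreement, not taken from any real SIP stack. The three functions, the sample message and the `X-Role` header are all made up for illustration; the only point is that a liberal splitter and a CRLF-only splitter put the header/body boundary in different places, while a strict one refuses the message outright.

```python
import re

def split_liberal(raw: bytes):
    """Liberal: CRLF, bare CR, or bare LF all count as a line ending."""
    lines = re.split(rb"\r\n|\r|\n", raw)
    end = lines.index(b"")  # the first empty line ends the headers
    return lines[:end], b"\n".join(lines[end + 1:])

def split_crlf_only(raw: bytes):
    """Recognises only CRLF, but does not reject stray CR or LF bytes."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("missing CRLF CRLF terminator")
    return head.split(b"\r\n"), body

def split_strict(raw: bytes):
    """Same as split_crlf_only, but refuses any bare CR or LF in the headers."""
    lines, body = split_crlf_only(raw)
    for line in lines:
        if b"\r" in line or b"\n" in line:
            raise ValueError("bare CR or LF in header block")
    return lines, body

# Hypothetical message: "\r\r" sits in the middle of the header block.
message = (b"INVITE sip:bob@example.com SIP/2.0\r\n"
           b"From: alice\r\r"
           b"X-Role: admin\r\n"
           b"\r\n"
           b"hello")

print(split_liberal(message))    # headers stop at "\r\r"; X-Role lands in the body
print(split_crlf_only(message))  # X-Role stays inside the header block
```

Calling `split_strict(message)` raises `ValueError`, which is the behaviour being argued for: the sender finds out immediately instead of two stacks silently disagreeing.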

In short, being liberal just means "each implementation creates its own interpretation". This is because not all impls are going to agree on what "liberal" means. And if "liberal" could be defined, then it should be defined in the spec! No need for interpretations.

pilif · on Sept 18, 2015

The problem with

> should accept non-conformant input as long as the meaning is clear

on that article is that the definition of "clear" is somewhat murky. Let's use the MySQL string truncation that spurred this thread as an example.

When asked to put a 300-character string in a 255-character-wide field, for MySQL it was "clear" that the user only meant to actually store 255 characters. From the perspective of MySQL, that actually makes sense: The field is declared as storing a maximum of 255 characters, so obviously, the user intends to store a maximum of 255 characters in that field.

Now look where this interpretation of "clear" facts has led us.

The issues that can be caused by "fixing" up invalid data are usually much harder (and more embarrassing) to fix than an exception that happens early.

Trust me: I'd rather get an exception than a CVE when given the choice.
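To make the difference concrete, here is a toy sketch in Python. It is not how MySQL is implemented; `MAX_LEN`, `store_truncating` and `store_strict` are names invented for this example, standing in for a 255-character column.

```python
MAX_LEN = 255  # assumed column width, matching the example above

def store_truncating(value: str) -> str:
    """'Helpful' behaviour: silently cut the value to fit the column."""
    return value[:MAX_LEN]

def store_strict(value: str) -> str:
    """Fail early instead of guessing what the caller meant."""
    if len(value) > MAX_LEN:
        raise ValueError(
            f"value is {len(value)} characters, column holds {MAX_LEN}"
        )
    return value

# Two different inputs that only differ after character 255.
long_a = "x" * 255 + "-alice"
long_b = "x" * 255 + "-bob"

# After truncation the two can no longer be told apart.
print(store_truncating(long_a) == store_truncating(long_b))
```

With `store_strict`, both inputs raise `ValueError` at write time, so the caller learns about the problem before anything downstream compares or trusts the stored value.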

Furthermore: By "fixing" up data, your fixes become part of the protocol. Implementations of clients derived from the broken implementation might not notice the brokenness and suddenly you're stuck with having to fix issues the same way for all eternity because becoming more strict will cause backwards compatibility breaks.

Worse: You also lock yourself out of the ability to later actually extend the protocol in a meaningful way because you're already accepting broken data.

Let's say you have a JSON based protocol that has a flag "foo" that can be set to some value: {"foo": 12}. Now there's a broken implementation of a client around that sends {"foobar": 12}. As you're "sure" that they actually mean "foo", you add an alias "foobar" to mean "foo".

Even though you've never intended it, now "foobar" is part of the official protocol and clients start sending this all over the place.

If at a later point, you actually want to support "foobar", you can't because that's already a hack to mean "foo", so now you'll end up with some crap like {"real_foobar": 1234}.

So not only is this behaviour irresponsible to clients (see the Bugzilla issue), it's also a sure way to make your own life harder in the future: the code becomes harder to maintain and you lose flexibility in protocol design.
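For the JSON example, the strict alternative could look something like this. It's only a sketch: `ALLOWED_KEYS` and `parse_message` are hypothetical names, and a real protocol would validate value types as well.

```python
import json

ALLOWED_KEYS = {"foo"}  # the keys the protocol actually defines

def parse_message(text: str) -> dict:
    """Parse a message and reject any key the protocol does not define."""
    data = json.loads(text)
    if not isinstance(data, dict):
        raise ValueError("message must be a JSON object")
    unknown = set(data) - ALLOWED_KEYS
    if unknown:
        # No guessing that "foobar" probably meant "foo".
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    return data

print(parse_message('{"foo": 12}'))
```

`parse_message('{"foobar": 12}')` raises `ValueError`, so the broken client gets fixed instead of copied, and "foobar" stays free for a real meaning later.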

dspillett · on Sept 18, 2015

> should accept non-conformant input as long as the meaning is clear

This is the part I don't agree with there. If the data is wrong it isn't getting into my database.

The principle breaks itself IMO: if I accept junk, then when asked for data I may have no choice but to respond with junk. Accepting too liberally precludes being in control of how conservative you are in what you output.

Of course I'm talking ideally here. In the real world there are sometimes sources you have to accept data from even though you have neither the control to correct it nor the authority to reject it. Except when that is absolutely the case, reject away and demand the other side fix their output. If you do accept iffy data, make sure it is marked as such as early as possible and that the mark stays with it for as long as relevant, so you can identify any data that has had an unsafe transformation applied or has simply been left incorrect.
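One way to keep that mark attached, as a minimal sketch. `Record`, `ingest` and the two checks are invented for illustration, with whitespace stripping standing in for whatever "fix-up" you are forced to apply.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Record:
    value: str
    issues: tuple = ()  # empty means the value was accepted untouched

    @property
    def suspect(self) -> bool:
        return bool(self.issues)

def ingest(raw: str, max_len: int = 255) -> Record:
    """Accept input we cannot reject, recording everything done to it."""
    issues = []
    value = raw
    if value != value.strip():
        value = value.strip()
        issues.append("whitespace stripped")
    if len(value) > max_len:
        # Left as received rather than truncated, but flagged.
        issues.append(f"exceeds {max_len} characters, left as received")
    return Record(value, tuple(issues))

clean = ingest("ok")
fixed = ingest("  ok  ")
print(clean.suspect, fixed.suspect, fixed.issues)
```

Because `Record` is frozen and the issues travel with the value, anything reading the data later can tell clean input from input that was altered or knowingly left wrong.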