> If you write software, never be "helpful" and try to fix problems with input data you can't handle.
Yep. Accepting bad data in any way is a bug, not a feature, and it isn't always (I'd suggest it isn't commonly) intentional. For instance, MySQL used to accept dates like 2015-02-30 simply because its internal date format permitted it, not because anyone explicitly chose to allow it - it was an accident, not an attempt to be helpful.
Accidental bugs that accept bad data are often indistinguishable from intentional "help", and they get passed up through a stack of calls (app A uses library B, which wraps library C, which calls service D, which uses library E...). Even if the help is very well documented at the source, that knowledge is usually lost as you move up the chain. The application dev is left with a library that accepts bad data without their knowing, which could end up masking a serious bug that will bite them later.
> "Fixing" broken input data is nothing but actually corrupting
Exactly. The old "be strict in what you send, but generous in what you receive" is often misinterpreted - you should only be generous in what you receive if everything you receive can be properly handled all the way into your stack and all the way back out again, and never be generous enough to accept bad data. Back to the date example: yes, accept any unambiguous date format the user might throw at you, but never accept an invalid date.
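To make the date example concrete, here is a minimal Python sketch. The function name `parse_date` and the list of accepted formats are my own assumptions for illustration, not anything from a real library: it is generous about spelling but never about validity.

```python
from datetime import date, datetime

# Hypothetical set of accepted, unambiguous formats (an assumption for this sketch).
ACCEPTED_FORMATS = ("%Y-%m-%d", "%Y/%m/%d")


def parse_date(text: str) -> date:
    """Accept a few unambiguous spellings, but reject dates that do not exist."""
    cleaned = text.strip()
    for fmt in ACCEPTED_FORMATS:
        try:
            # strptime raises ValueError both for a format mismatch and for
            # impossible dates such as 30 February.
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"not a valid date: {text!r}")
```

With this, `parse_date("2015-02-28")` and `parse_date("2015/02/28")` both give the same date, while `parse_date("2015-02-30")` raises `ValueError` instead of being quietly "fixed".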
> If you are in a position where you can't do something based on the input data, then blow up.
Definitely. If something could fail it should fail as early as possible, and not quietly in development (caveat: failures should not be outwardly verbose in production, for security reasons). That way you are more likely to catch problems early, before they compound into something worse.
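A rough sketch of that development/production split, in Python. Everything here (`handle_age`, the `debug` flag, the response strings) is made up for illustration: the input is rejected at the boundary either way, the full detail goes to the log, and only the development mode lets the exception surface.

```python
import logging

logger = logging.getLogger("app")


def handle_age(raw: str, debug: bool) -> str:
    """Validate at the boundary; be loud in development, terse in production."""
    try:
        age = int(raw)
        if age < 0:
            raise ValueError("age must be non-negative")
    except ValueError as exc:
        # The detail always goes to the internal log.
        logger.error("rejected input %r: %s", raw, exc)
        if debug:
            raise  # development: blow up with the full traceback
        return "400 Bad Request"  # production: reveal nothing about internals
    return f"200 OK age={age}"
```

The point is that "not verbose in production" changes what the caller sees, not whether the bad input is rejected.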
The Robustness principle isn't, and Postel's Law is pretty much a failure. This kind of thinking introduces tons of implementation compatibility issues. By being liberal in accepting from others (aka accepting malformed messages), you allow broken implementations to "work". Now those broken implementations form a de facto standard that everyone else must implement.
I demonstrated how this kind of thinking, coupled with "simple" text-based protocols, introduces security issues. SIP, like HTTP, is a protocol with nutty parsing rules: lines end with CRLF, and the body is separated from the headers by two CRLFs.
Some implementations act liberal and will accept any combination of CR and LF instead of just CRLF. So header \r\r body is OK with some implementations, and not others. Which means some stacks will read body as more headers. It's not hard to see how this creates a security problem, as you pass a message to a trusted proxy and it asserts things are OK, except the two stacks don't agree on what the headers actually are. Oops. This is a real, live, issue that affects SIP networks today and can be exploited for profit. And it's hard to fix, because some networks are actually sending non-CRLF lines, creating a compat issue. If implementations had been harsh on the CRLF requirement, those networks wouldn't be sending non-CRLF lines, as it would never have worked in the first place.
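For illustration, a strict splitter might look something like the Python below. This is a deliberately simplified sketch, not a real SIP parser (it ignores header folding and everything else in the spec), and `split_message` is a name I made up. The only thing it shows is refusing any line ending other than CRLF.

```python
def split_message(raw: bytes) -> tuple[list[bytes], bytes]:
    """Split a message into header lines and body, insisting on CRLF."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("no CRLF CRLF separator between headers and body")
    lines = head.split(b"\r\n")
    for line in lines:
        # A bare CR or LF left inside a line means the sender did not use CRLF.
        if b"\r" in line or b"\n" in line:
            raise ValueError(f"bare CR or LF in header line: {line!r}")
    return lines, body
```

A message like `b"A: 1\r\rbody"` is rejected outright, so there is no room for two stacks to disagree about where the headers end.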
In short, being liberal just means "each implementation creates its own interpretation". This is because not all impls are going to agree on what "liberal" means. And if "liberal" could be defined, then it should be defined in the spec! No need for interpretations.
> should accept non-conformant input as long as the meaning is clear
The problem with that line in the article is that the definition of "clear" is somewhat murky. Let's use the MySQL string truncation that spurred this thread as an example.
When asked to put a 300-character string into a 255-character-wide field, for MySQL it was "clear" that the user only meant to actually store 255 characters. From the perspective of MySQL, that actually makes sense: the field is declared as storing a maximum of 255 characters, so obviously, the user intends to store a maximum of 255 characters in that field.
Now look where this interpretation of "clear" facts has led us.
The issues that can be caused by "fixing" up invalid data are usually much harder (and more embarrassing) to fix than an exception that happens early.
Trust me: I'd rather get an exception than a CVE when given the choice.
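In application code, the "exception instead of truncation" choice is about this much work. A minimal Python sketch, where `validate_name` and `MAX_NAME_LENGTH` are hypothetical names and 255 just mirrors the field width from the example:

```python
MAX_NAME_LENGTH = 255  # mirrors the 255-character field in the example above


def validate_name(value: str) -> str:
    """Return the value unchanged, or refuse it; never shorten it."""
    if len(value) > MAX_NAME_LENGTH:
        raise ValueError(
            f"value is {len(value)} characters, limit is {MAX_NAME_LENGTH}"
        )
    return value
```

The caller either stores exactly what it sent or finds out immediately that it can't.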
Furthermore: By "fixing" up data, your fixes become part of the protocol. Implementations of clients derived from the broken implementation might not notice the brokenness and suddenly you're stuck with having to fix issues the same way for all eternity because becoming more strict will cause backwards compatibility breaks.
Worse: You also lock yourself out of the ability to later actually extend the protocol in a meaningful way because you're already accepting broken data.
Let's say you have a JSON based protocol that has a flag "foo" that can be set to some value: {"foo": 12}. Now there's a broken implementation of a client around that sends {"foobar": 12}. As you're "sure" that they actually mean "foo", you add an alias "foobar" to mean "foo".
Even though you've never intended it, now "foobar" is part of the official protocol and clients start sending this all over the place.
If at a later point, you actually want to support "foobar", you can't because that's already a hack to mean "foo", so now you'll end up with some crap like {"real_foobar": 1234}.
So not only is this behaviour irresponsible to clients (see the Bugzilla issue), it's also a sure way to make your own life harder in the future, as it makes for harder-to-maintain code and makes you lose flexibility in protocol design.
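The strict alternative for the {"foo": 12} case is to reject keys you don't know instead of guessing. A small Python sketch, assuming a made-up `parse_request` function and an `ALLOWED_KEYS` set that stands in for whatever the protocol actually defines:

```python
import json

# Keys the (hypothetical) protocol defines today.
ALLOWED_KEYS = {"foo"}


def parse_request(payload: str) -> dict:
    """Parse a request, refusing any key the protocol does not define."""
    data = json.loads(payload)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    unknown = set(data) - ALLOWED_KEYS
    if unknown:
        # No aliasing, no guessing: the broken client finds out right away.
        raise ValueError(f"unknown keys: {sorted(unknown)}")
    return data
```

Because {"foobar": 12} never worked, "foobar" stays free for a real meaning later: supporting it becomes a one-line change to `ALLOWED_KEYS` rather than a compatibility break.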
> should accept non-conformant input as long as the meaning is clear
This is the part I don't agree with there. If the data is wrong it isn't getting into my database.
The principle breaks itself IMO: if I accept junk, then when asked for data I may have no choice but to respond with junk. Accepting too liberally precludes being in control of how conservative you are in what you output.
Of course I'm talking ideally here. In the real world there are sometimes sources you have to accept input from that you have neither the control to correct nor the authority to reject, but unless that is absolutely the case, reject away and demand the other side fix their output. If you do accept iffy data, make sure it is marked as such as early as possible and that the mark stays with it for as long as relevant, so you can identify any data that has had an unsafe transformation applied or has simply been left incorrect.
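One way to keep that mark attached is to wrap the value at the point of ingestion. A minimal Python sketch, where the `Received` type and `ingest_date` are invented for illustration and the source is assumed to be one you can't reject:

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Received:
    """A value from a source we cannot reject, plus a mark saying whether we trust it."""
    value: str
    suspect: bool = False
    reason: str = ""


def ingest_date(raw: str) -> Received:
    """Keep the raw value untouched, but flag it if it is not a real date."""
    try:
        datetime.strptime(raw, "%Y-%m-%d")
    except ValueError as exc:
        # Stored as received, with the mark and the reason travelling alongside.
        return Received(raw, suspect=True, reason=str(exc))
    return Received(raw)
```

Since the flag lives on the same object as the value, anything downstream that outputs the data has to look at `suspect` first, or at least can be audited for ignoring it.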