False, surely. As another poster commented, you want to sanitise inputs, for example user signatures, to remove JavaScript and other nasties. Sanitising inputs isn't just about protecting against SQL injection.
What the author actually means is that removing apostrophes to prevent SQL injection can affect your data integrity, so parameterise your queries.
Alternatively, replacing each single apostrophe with two apostrophes in your queries also works, but parameterising queries is a much better practice to get into.
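For anyone who hasn't seen it spelled out, here is a minimal sketch of a parameterised query using Python's standard sqlite3 module. The `users` table and the `find_user` helper are made up for illustration; the point is only that the value travels separately from the SQL text, so the apostrophe is stored intact and never needs stripping or doubling.

```python
import sqlite3


def find_user(conn, name):
    # The "?" placeholder passes the value separately from the SQL text,
    # so quotes inside `name` cannot change the structure of the query.
    cur = conn.execute("SELECT name FROM users WHERE name = ?", (name,))
    return [row[0] for row in cur.fetchall()]


# Hypothetical in-memory table, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("O'Brien",))
conn.commit()

print(find_user(conn, "O'Brien"))        # the apostrophe survives untouched
print(find_user(conn, "x' OR '1'='1"))   # treated as a plain string, matches nothing
```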
No, TFA is right. If the user wants to post <script>alert("I am a hacker.")</script>, so be it. Display it literally. You do have to take care to escape it when you are rendering your HTML. But guess what? You have to anyways since <script> is not the only evil tag out there. XSS can be performed a number of ways and you are not going to catch them all by removing stuff from user input.
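A rough sketch of what "display it literally" looks like, assuming Python and its standard `html.escape`. The `render_comment` helper is hypothetical, not anything from TFA: the raw comment is stored as typed and only encoded at the moment it is placed into HTML.

```python
import html


def render_comment(raw_comment):
    # Escape at output time: <, >, & and quotes become entities, so the
    # browser shows the text instead of interpreting it as markup.
    return "<p>" + html.escape(raw_comment, quote=True) + "</p>"


print(render_comment('<script>alert("I am a hacker.")</script>'))
```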
No. What you are describing is a way of doing things that inherently leads to security vulnerabilities, because it depends on someone else remembering to include something in their code, and people will always forget to do the right thing at least some of the time.
Developers should never allow input into the system that doesn't match their expectations about what is valid for that value/field/etc. If you have a user text input that only requires alphanumeric characters, space, period, and comma, then strip out any character which is not one of those things. That field is now no longer a possible source of XSS.
The problem is that there are plenty of fields where your attempts at filtering will break user expectations horribly if you filter the data even remotely strictly enough to ensure security.
Such as, say, comment fields. It'd be terribly restrictive for your users if they can't write about <script> tags on a technical forum without munging it.
And you're still not safe. All the characters needed for an SQL injection attack, for example, commonly occur in normal English usage. All the characters needed for XSS commonly occur too, so you'd need more restrictive filtering.
And have fun when a bug makes your filter more restrictive than it should be: the original data is now unrecoverable, because all you stored was the sanitised output of your buggy filter.
Once you've dealt with that, you're still facing the issue of changing filtering requirements: What is safe for HTML may not be safe for your CSV export. What is safe for your PDF generation may not be safe for your HTML generation, and vice versa. Suddenly you're asked to pass data via an API, with different expectations of what a "safe" value contains. Boom.
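To make the "different outputs, different rules" point concrete, here is a small sketch using only the Python standard library. The helper names and the sample value are invented for illustration. The same stored string gets a different encoding depending on where it is going, which is why no single input filter can cover every destination.

```python
import csv
import html
import io


def to_html_cell(value):
    # HTML context: angle brackets, ampersands and quotes must become entities.
    return "<td>" + html.escape(value) + "</td>"


def to_csv_line(values):
    # CSV context: commas and double quotes matter, angle brackets do not.
    buf = io.StringIO()
    csv.writer(buf).writerow(values)
    return buf.getvalue()


# One raw value as it would sit in the database, unmodified.
stored = 'Smith, "Bob" <bob@example.com>'

print(to_html_cell(stored))
print(to_csv_line([stored, "x"]))
```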
In other words, if you believe that what is in your database is safe from causing security problems, you've lost. You need to treat every piece of data that may possibly contain user input as a potential cause of problems whenever you output it or pass it on anywhere, whether or not you've attempted to validate and restrict the input.
A typical example I used to have to deal with: mail systems. HTML that is entirely safe when downloaded and rendered by a mail client, which confines it to a document holding just that one e-mail, can leak data all over the place and compromise the user's account if it is left unfiltered when rendered on the web server. You can't insert it pre-filtered into the database without inserting the raw content too, because the user may want to download it.
And because evolving standards mean the only reasonably safe filtering method is white-listing tags and CSS, you will regularly have to revise the filters and add functionality. People will be very annoyed if their e-mails still don't render correctly after you've fixed the bugs (and if you have to tighten the filters again, you don't want to have to re-filter all the data).
Because we can talk about the <script> tag on HN, the string "<script>" surely is a valid and expected input.
> If you have a user text input that only requires alphanumeric characters, space, period, and comma, then strip out any character which is not one of those things. That field is now no longer a possible source of XSS.
Except when somebody puts such a "safe" string in an unquoted HTML attribute... Seriously, thinking of data as "safe" (safe to be carelessly mishandled...) is a fragile approach.
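Here is a small sketch of the unquoted attribute problem, using Python's standard `html.parser` to show what a parser actually sees. The whitelist regex mirrors the quoted rule (alphanumerics, space, period, comma); the helper names and the `<input>` markup are made up for illustration. A string that passes the whitelist still adds a second attribute when it is dropped into an unquoted attribute, while quoting and escaping at output keeps it as one value.

```python
import html
import re
from html.parser import HTMLParser

# Whitelist from the quoted comment: alphanumerics, space, period, comma.
NOT_ALLOWED = re.compile(r"[^A-Za-z0-9 .,]")


def strip_to_whitelist(text):
    return NOT_ALLOWED.sub("", text)


class AttrCollector(HTMLParser):
    """Collects the attributes of every start tag it parses."""

    def __init__(self):
        super().__init__()
        self.attrs = []

    def handle_starttag(self, tag, attrs):
        self.attrs.extend(attrs)


def parsed_attrs(markup):
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.attrs


value = strip_to_whitelist("hello disabled")  # passes the whitelist unchanged

unquoted = "<input value=" + value + ">"
quoted = '<input value="' + html.escape(value) + '">'

print(parsed_attrs(unquoted))  # the space splits it into two attributes
print(parsed_attrs(quoted))    # one attribute holding the whole string
```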
> it depends on someone else remembering to include something in their code
Get a template engine that escapes everything everywhere by default, so you won't need to remember to escape (or "sanitize"!) each thing.
But instead you are randomly changing information, which is a terrible idea.
If you get invalid input, you have to reject it, not just silently make it fit by changing the input (the only exception is when you can be sure that changing the data will not under any circumstances change its meaning).
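A quick sketch of the difference, in Python, with hypothetical helper names and the same alphanumeric/space/period/comma rule from upthread. Stripping silently turns one name into a different one; validating and rejecting leaves the decision with the user.

```python
import re

VALID_NAME = re.compile(r"[A-Za-z0-9 .,]+")


def strip_name(text):
    # Silent "fix": drops anything outside the whitelist and carries on.
    return re.sub(r"[^A-Za-z0-9 .,]", "", text)


def validate_name(text):
    # Reject instead: the caller has to report the problem back to the user.
    if VALID_NAME.fullmatch(text) is None:
        raise ValueError("name contains characters that are not allowed")
    return text


print(strip_name("O'Brien"))  # the stored name is no longer what was typed

try:
    validate_name("O'Brien")
except ValueError as err:
    print("rejected:", err)
```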
Your comment highlights exactly why the article makes a compelling point. I mean, you make several suggestions before you get to one thing you need to do on input:
> but parameterising queries is a much better practice to get into.