Perhaps I was unclear, but I did not claim that there could be one single saniti...

jameshart · on Aug 2, 2013

That's not called 'sanitizing', it's called 'escaping' and 'encoding'.

The byte sequences I need to store to communicate the name "Kei$ha O'Shaughnessey, Jr." in a UTF-8 JSON string literal, a UTF-8 HTML attribute, a UTF-16 big-endian CSV file, or an ISO-8859 SQL parameter are all going to be different - but so long as all the characters I need to pass are representable in all of those domains, all I have to do is perform the correct escaping and encoding. At no point do I need to 'sanitize' the name. It's a name, it's not dirty.
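To make the distinction concrete, here is a rough sketch in Python of the same name being escaped and encoded for each of those targets. It uses only the standard library; the `<input>` markup, the `users` table, and the use of an in-memory SQLite database as a stand-in for "a SQL parameter" are assumptions made for illustration, not anything from the systems under discussion.

```python
import csv
import html
import io
import json
import sqlite3

name = "Kei$ha O'Shaughnessey, Jr."

# JSON string literal, then UTF-8 bytes.
json_bytes = json.dumps(name).encode("utf-8")

# HTML attribute value: escape &, <, > and both quote characters, then UTF-8 bytes.
html_attr = html.escape(name, quote=True)
html_bytes = ('<input value="%s">' % html_attr).encode("utf-8")

# CSV field: the writer quotes the field because it contains a comma,
# then the text is encoded as UTF-16 big-endian.
buf = io.StringIO()
csv.writer(buf).writerow([name])
csv_bytes = buf.getvalue().encode("utf-16-be")

# SQL parameter: pass the value out of band and let the driver handle it.
# (SQLite is only a stand-in here; an ISO-8859 database would also need
# name.encode("iso-8859-1") to succeed, which it does for this name.)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
stored = conn.execute("SELECT name FROM users").fetchone()[0]
latin1_bytes = name.encode("iso-8859-1")

for label, data in [("json", json_bytes), ("html", html_bytes),
                    ("csv", csv_bytes), ("latin-1", latin1_bytes)]:
    print(label, data)
```

Each target gets different bytes, and in every case the original name can be recovered exactly by the matching decode and unescape step. Nothing was stripped or rejected along the way.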

If there are characters there that I can't represent in the target domain, then I need to handle the loss of information.
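As a minimal sketch of what "handling the loss of information" might look like, assuming Python and a hypothetical helper named `unrepresentable`, one option is to detect which characters the target encoding cannot hold before deciding what to do, rather than letting a lossy conversion happen silently. The sample name is made up for the example.

```python
def unrepresentable(text, encoding):
    """Return the distinct characters in text that the target encoding cannot hold."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


name = "Zoë 李"

print(unrepresentable(name, "utf-8"))       # nothing is lost
print(unrepresentable(name, "iso-8859-1"))  # the CJK character does not fit
print(unrepresentable(name, "ascii"))       # neither non-ASCII character fits

# A strict encode fails loudly, so the caller has to make a decision.
try:
    name.encode("iso-8859-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# errors="replace" succeeds, but the information is gone for good.
lossy = name.encode("iso-8859-1", errors="replace")
```

Whether to reject the value, ask the user for an alternative spelling, or accept a lossy substitution is a policy choice for the receiving system. The point is only that the choice is made explicitly.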

neilk · on Aug 10, 2013

A strategy of 'escaping' assumes that the partner system does the right thing with its data. This is not always the case.

For instance, it may be perfectly fine in my system to have a user named '<script>alert("ha!")</script>'. Are you sure that's okay in your PHP-based web forum? Really sure? Every place they've ever shown a username to the user, it's well-escaped?

And even if that's true today, what about the day when someone decides to change the web forum software to something else? What about the day when someone turns on a feature that copies certain forum threads to an internal support system, also provided by a third party?
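The concern can be sketched as follows, assuming Python and two hypothetical render functions standing in for different display sites in a partner system. One escapes the username on output and one does not; the second is the kind of place the questions above are asking about.

```python
import html

username = '<script>alert("ha!")</script>'


def render_profile(name):
    """Hypothetical display site that escapes the name on output."""
    return "<h1>" + html.escape(name) + "</h1>"


def render_legacy_sidebar(name):
    """Hypothetical display site that interpolates the name unescaped."""
    return "<li>" + name + "</li>"


print(render_profile(username))         # markup is shown as text
print(render_legacy_sidebar(username))  # markup reaches the browser as-is
```

The stored value is identical in both cases. Whether it is harmless depends entirely on every consumer escaping correctly at every output site, which is the assumption being questioned.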