var t = document.createTextNode(msg);
content.appendChild(t);
That code sanitises all possible content in msg: whatever msg contains is displayed as literal text and never parsed as markup. I don't need to list HTML tags or script/style tags, special-case Unicode exploits, etc.
You do need to list which variables are "unsafe", but you don't need to list the ways they might be unsafe. If a variable could be unsafe, assume it's completely unsafe in every conceivable way, and don't use it in any context other than as an unsafe text string.
The rookie code is something like:
msg = msg.replace("something I think is unsafe", "something safer");
content.innerHTML += msg;
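To make the failure concrete, here is a minimal sketch of that blacklist style, using a hypothetical `naiveSanitise` that strips the `onerror=` attribute (an `<img>` with a broken `src` and an `onerror` handler is a classic payload that does run when inserted with innerHTML). Note that `replace()` with a string pattern only removes the first match, and no fixed pattern anticipates every variant:

```javascript
// Hypothetical blacklist filter in the style of the rookie code:
// strip the one thing we "know" is dangerous.
function naiveSanitise(msg) {
  // With a string pattern, replace() only removes the FIRST occurrence.
  return msg.replace("onerror=", "");
}

const attempts = [
  '<img src=x onerror=alert(1)><img src=y onerror=alert(2)>', // second one survives
  '<img src=x ononerror=error=alert(1)>', // removing the inner match rebuilds it
  '<img src=x ONERROR=alert(1)>',         // attribute names are case-insensitive
  '<img src=x onerror =alert(1)>',        // whitespace before "=" is allowed
];

for (const attempt of attempts) {
  // Each output still carries an onerror handler that would run
  // if the string were assigned with innerHTML.
  console.log(naiveSanitise(attempt));
}
```

Switching to `replaceAll()` would fix only the first case; the other three get through unchanged in spirit, which is the argument for not trying to enumerate the dangers at all.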
And agreed: innerHTML should be removed from browsers.
I think the point was that, all other things being equal, allowing arbitrary markup and then attempting to sanitize it is inherently less safe than writing a full parser that's incapable of generating unsafe HTML at any stage.
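As a rough illustration of a parser that cannot generate unsafe HTML, here is a sketch of a hypothetical mini-markup in which `*text*` means bold and everything else is plain text. The names `parseMini` and `renderMini` are made up for this example; the idea is that the parser can only ever emit two node types, and the renderer builds DOM nodes from them instead of an HTML string:

```javascript
// Hypothetical mini-markup: *text* becomes bold, everything else is text.
// The parser can only emit "strong" and "text" nodes, so no input
// characters can ever turn into tags or attributes.
function parseMini(input) {
  const nodes = [];
  const re = /\*([^*]+)\*/g;
  let last = 0;
  let m;
  while ((m = re.exec(input)) !== null) {
    if (m.index > last) {
      nodes.push({ type: "text", text: input.slice(last, m.index) });
    }
    nodes.push({ type: "strong", text: m[1] });
    last = re.lastIndex;
  }
  if (last < input.length) {
    nodes.push({ type: "text", text: input.slice(last) });
  }
  return nodes;
}

// Each node type maps to a fixed DOM construction; user text only reaches
// the page through createTextNode. Needs a browser DOM to actually run.
function renderMini(nodes, container) {
  const doc = container.ownerDocument;
  for (const node of nodes) {
    const text = doc.createTextNode(node.text);
    if (node.type === "strong") {
      const strong = doc.createElement("strong");
      strong.appendChild(text);
      container.appendChild(strong);
    } else {
      container.appendChild(text);
    }
  }
}

console.log(parseMini("*hi* <img src=x onerror=alert(1)>"));
```

Here the `<img ...>` markup comes back as an ordinary text node, so rendering it shows the characters rather than creating an element.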
The safety of widely deployed Markdown + sanitizer libraries is largely thanks to testing at scale and a history of patches for XSS vulnerabilities.
I agree
> You sanitise everything to start with.
So you need to list everything you need to sanitise...
A better approach is to ban "innerHTML" from your code. You should always display user-generated text in text nodes.