
In general: never mix user input with control content in a way that makes the two indistinguishable.

In HTML generation, that means you never build an HTML string directly from raw data and then parse it. E.g. in React, you define your content entirely in terms of components that are explicitly defined in your code (no HTML parsing: they're objects in your code, explicitly built by your code) and content for those components (still no HTML parsing: it directly becomes text, either for attribute values or for text nodes that sit within your HTML tags).
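
To make that concrete outside of React, here's a minimal sketch in Python using the standard library's `xml.etree.ElementTree`. It's only an illustration of the same idea, not how React works internally, and `render_comment` and its parameters are names I made up for the example. The structure is built as objects in code, and user input is only ever assigned as a text node or an attribute value, so it is never parsed as markup.

```python
import xml.etree.ElementTree as ET


def render_comment(author: str, body: str) -> str:
    """Build a comment element as objects; user input is never parsed as HTML.

    `render_comment`, `author`, and `body` are hypothetical names for this sketch.
    """
    # The structure (tags, attribute names) comes only from our own code.
    div = ET.Element("div", {"class": "comment", "title": author})
    p = ET.SubElement(div, "p")
    # User input becomes a text node; the serializer escapes it on output.
    p.text = body
    return ET.tostring(div, encoding="unicode", method="html")


# Hostile input stays inert: it comes out as escaped text, not as a tag.
html_out = render_comment('x" onmouseover="alert(1)', "<script>alert(1)</script>")
print(html_out)
```

The point isn't this particular library, it's that there's no step where a string containing user data gets handed to an HTML parser.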

It's the same as protecting against SQL injection. Don't interpolate user input directly into a SQL string and then parse it. Define/generate the query in your code separately, with explicit parameter placeholders for where dynamic input will go, and then provide variables that can be safely used for those parameters, but which are never parsed as part of the query itself.
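
A minimal sketch of the SQL side, using Python's built-in `sqlite3` module. The `users` table and the `find_user` helper are made up for illustration. The query text is fixed in code with a `?` placeholder, and the user's value is passed separately, so it's never parsed as part of the SQL.

```python
import sqlite3


def find_user(conn: sqlite3.Connection, name: str) -> list:
    """Look up users by name with a parameterized query.

    `find_user` and the `users` table are hypothetical, for illustration only.
    """
    # The SQL text is a constant; `name` is bound as data, not interpolated.
    cur = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cur.fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# A classic injection attempt is just compared as a literal string.
hostile = "alice' OR '1'='1"
print(find_user(conn, "alice"))
print(find_user(conn, hostile))
```

With string interpolation, the hostile value would have changed the meaning of the `WHERE` clause; with a bound parameter it's only ever a name that matches nobody.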

For HTML, the only time this becomes challenging is when you accept user input that is itself HTML already. In that case, you have to parse it, and you have a big problem.

Don't do that, wherever possible. Accept structured data, accept plain text, accept markdown (with HTML tags either disabled or _extremely_ carefully limited), but wherever possible, don't accept and output HTML given to you by a user. For similar reasons, don't accept SQL queries uploaded by a user.

DOMPurify exists for the case where you can't do this, but in general it's better, and not that hard, to avoid needing it at all.



