That is a poor example since you should not normally have to code like this when handling user input on the web.
You must escape all user input you send to the database, and this is normally done by your database driver by using parametrized queries (be they prepared or not). And when the data is retrieved from the database you generally do not have to unescape anything, especially not since the returned data is not SQL strings but in whatever binary protocol is used and is unescaped (if necessary) by your driver.
So what your code should look like is something like:
db.execute("INSERT INTO foo (a, b) VALUES ($1, $2)", a, b)
So for the database you just have two types. Queries and data. Data should always be treated as unsafe while queries is the special class that could help out if you do a lot of manual query building.
Not so. What if something fails database validation so you want to display it in the form with an error? This can lead to attacks where a malicious user gives the victim a specially crafted URL that inserts a script into their page, then when they access the URL the script executes inside their browser with their session.
For this reason any user input needs to be escaped whenever it is rendered in HTML as well. If you got it from the DB it should be safe, but if you got it from the url parameters it isn't.
Data from the database is just as unsafe as data from URL parameters which is why escaping of data must be done in the presentation layer. This is basically the same thing as SQL queries. You have a template like this (most of the HTML omitted).
<li class="<%= user_type %>"><%= username %></li>
Where username and user_type are variables containing data fetched from the database. This template is then compiled to something like the blow. Strings literals of the HTML-safe type start with s", and concatetnation to the output buffer is <<.
So to implement your own templating language and and template helper functions this is a useful concept. But query building should be in the database layer and HTML building in the presentation layer. People using the database should not have to know of or care about that the data might at some point be displayed in a HTML page.
This makes sense. You're totally right about the database. I've run into that issue before, but I had forgotten.
Your example is similar to the approach that Yesod, a Haskell web-framework, takes. The templating language, Hamlet, will accept many types, such as Text or String, but ultimately everything gets translated into RepHTML before being rendered to a real HTML string that is sent to the client. Whenever you give the template a value it automatically runs the appropriate conversion function based on type to generate something safe for display on the page.
The only way to get from a String to RepHTML is either to use the proper escaping function (usually done automatically) or use a special function to do an unsafe conversion, forcing you to explicitly state that a conversion is safe.
The type system is one of the tools used to guarantee that these things get sanitized. You can generate RepHTML in other places, such as through a widget, or other helper functions, and it knows at that point that you properly escaped it. Without the type system you could run into issues when a function generates HTML (sanitized) and then the final template re-sanitizes it. It would be very awkward to do things like layout templates if things are getting sanitized many times.
Yeah, that is how I expect a templating system to work. The one in Rails (modified ERB)works in the same way. It has a SafeBuffer (name taken from memory) class which is a subclass of String. Strings can be converted into the safe class either by escaping or through unsafe conversion which emans that we say the string is safe.
You must escape all user input you send to the database, and this is normally done by your database driver by using parametrized queries (be they prepared or not). And when the data is retrieved from the database you generally do not have to unescape anything, especially not since the returned data is not SQL strings but in whatever binary protocol is used and is unescaped (if necessary) by your driver.
So what your code should look like is something like:
So for the database you just have two types. Queries and data. Data should always be treated as unsafe while queries is the special class that could help out if you do a lot of manual query building.