HTML Validation: Does It Matter? (codinghorror.com)
21 points by twampss on March 6, 2009 | 17 comments



I think the point of HTML validation is that there was a time when many browsers behaved incredibly differently with malformed HTML - the days when everyone was creating their own rendering engine and all of them were terrible; the days of HotJava and iCab. I assume people back then expected that we'd get more, not fewer, rendering engines to deal with, and that if there were standard, valid ways of writing HTML, the engines could all conform to the standard and life would be good.

Little did they know that Microsoft would come along and tear down that world. Today, we've seen it bounce back a little, but really there are three rendering engines - Trident, Gecko, and WebKit. And, compared to what we had back in the Netscape days, they're all good engines. Sure, we might consider Trident crap by comparison to Gecko and WebKit today, but it's clearly a cut above what we were seeing in the Netscape days.

I think the reason that people today don't care about validation is that perfectly valid code can render differently in the three different rendering engines. If valid HTML doesn't give you the benefit of proper rendering, then what value does it give you? If a rendering engine is "wrong" in how it renders your page, you can bet that's your problem and not the rendering engine's problem - at least it will be according to your visitors. And three rendering engines aren't too hard to test - especially given the similarity in a lot of the rendering between WebKit and Gecko.

As for XHTML and HTML, Apple's "Surfin' Safari" blog has a great post on it: http://webkit.org/blog/68/understanding-html-xml-and-xhtml/. I guess XHTML never really fulfilled its purpose of embedding machine-readable data in HTML - it was possibly eclipsed by formats like RSS.


As others have pointed out, search engines also need to parse your pages to extract text and links. You can't really test if Google parses malformed HTML the way you think it ought, so I would say better safe (valid) than sorry.


"I think the reason that people today don't care about validation is that perfectly valid code can render differently in the three different rendering engines."

This is exactly true, and the reason is that web standards aren't real standards by the textbook definition of the word: they lack what's called a "reference implementation".

I forget where I read it first, but the overall gist is that in other fields, say industrial design, a standard typically isn't ratified as finished until the governing body creates an example for everyone else to refer to - the reference implementation - rather than just relying on written descriptions of how the thing should work. For example, FireWire has a reference implementation that's certified by the IEEE and that hardware makers would refer to when building out the first versions of it. You could grab the example version, plug it in the way it's supposed to be plugged in, test the data flowing through it, measure what speeds it's capable of, etc.

Since web standards don't have a reference implementation from the W3C, browser makers are left to their own devices to parse the insanely complicated technical language on their own. And it's only getting worse - the HTML5 spec is something like ten times as long as the previous version of HTML. This is one of the biggest reasons rendering engines have effectively stagnated in the last few years - things are "good enough" that browser makers can wait around until the spec makes sense and there are working examples in the real world, and/or they are spending a gazillion hours trying to figure out what the new specs actually mean.

This is the problem with functional specs that 37signals rails against when they say that "Functional specs only lead to an illusion of agreement":

http://gettingreal.37signals.com/ch11_Theres_Nothing_Functio...

This is a super sad state of affairs, and not just in the theoretical sense of how much faster browser development could move if things were more streamlined; it's also a huge hindrance for developers in the trenches like me. I haven't thought about this for years, but I used to relish reading the CSS spec and exploring new ways of pushing my work forward. Trying to read the W3C docs now is like speed-reading Pig Latin - you know there's some cool stuff in there, but it hurts your head just to try hunting for it.


I wish people who know little about HTML, XHTML and CSS wouldn't write such posts just for the hell of it. Jeff is a respected programmer, but front-end developer he is not.

I'm not going to go into details of why he's wrong, because a) this has been discussed to death by people who actually matter in this space, and b) there's a ton of comments on his post explaining what's wrong with his view.

There are ways to argue that validation is unnecessary; however, they do not include: 1) bringing up examples such as the "target" attribute, and 2) saying "who cares if it works anyway."


He does seem to be pretty misinformed. For example, he chooses to validate against HTML4 Strict, and then complains that attributes like target and width don't validate. However, HTML4 includes two additional doctypes (besides Strict) - Transitional and Frameset - exactly to cater to people like him who like to use frame-related features and presentational HTML.
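For reference, these are the three HTML 4.01 doctype declarations; the Transitional one permits target and most of the presentational leftovers that Strict rejects:

    <!-- HTML 4.01 Strict: presentational and frame-related attributes rejected -->
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
        "http://www.w3.org/TR/html4/strict.dtd">

    <!-- HTML 4.01 Transitional: attributes like target still allowed -->
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
        "http://www.w3.org/TR/html4/loose.dtd">

    <!-- HTML 4.01 Frameset: for documents built around framesets -->
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN"
        "http://www.w3.org/TR/html4/frameset.dtd">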

His argument about user-generated content is also spurious. If you don't parse and filter user-generated content (with a real HTML parser, not just regexes), then you are vulnerable to script-injection attacks.
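A minimal sketch of that whitelist approach, using Python's standard-library parser (the tag and attribute lists are illustrative, not a complete policy):

    from html import escape
    from html.parser import HTMLParser

    ALLOWED_TAGS = {"b", "i", "em", "strong", "p", "a", "code", "pre"}
    ALLOWED_ATTRS = {"a": {"href"}}

    class Sanitizer(HTMLParser):
        # re-emits whitelisted tags/attributes, escapes everything else
        def __init__(self):
            super().__init__(convert_charrefs=True)
            self.out = []

        def handle_starttag(self, tag, attrs):
            if tag not in ALLOWED_TAGS:
                return  # drop <script>, <style>, unknown tags
            allowed = ALLOWED_ATTRS.get(tag, set())
            kept = [(k, v or "") for k, v in attrs
                    if k in allowed
                    and not (v or "").lower().startswith("javascript:")]
            self.out.append("<%s%s>" % (tag, "".join(
                ' %s="%s"' % (k, escape(v, quote=True)) for k, v in kept)))

        def handle_endtag(self, tag):
            if tag in ALLOWED_TAGS:
                self.out.append("</%s>" % tag)

        def handle_data(self, data):
            # note: text inside dropped tags like <script> still arrives
            # here and is re-emitted as escaped text; a stricter policy
            # would suppress it entirely
            self.out.append(escape(data))

    def sanitize(user_html):
        s = Sanitizer()
        s.feed(user_html)
        s.close()
        return "".join(s.out)

    print(sanitize('<p onclick="evil()">hi <script>alert(1)</script></p>'))
    # -> <p>hi alert(1)</p>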


I don't know any programmers who respect Jeff; I can't imagine anyone who values C respecting him as a programmer.

Jeff is a respected blogger.


Jeff is a popular blogger.


Two consecutive comments to the submitted article:

"Google actually ranks it's indexed pages. The more valid the (X)HTML of your pages, the higher it'll appear in a search."

"If you do write XHTML, you'd better get it right. I heard about CodeProject practically dropping off the map because of an XHTML error that caused Google to stop ranking them."

Is this indeed documented or well-confirmed behavior of Google? That might be a pretty important reason to validate.


As the Google crawler is just a basic HTML parser, it makes sense that certain errors could cause major problems for a site - messed-up nesting, missing quotes, etc. Google likely has fairly sophisticated error recovery in place, but I'm sure it doesn't cover everything. As such, it's always to your advantage to validate your site in order to minimize friction.


Building an entire site, then fighting with it to make it validate, is obviously going to be frustrating. Would you wire up a house, then whip out your electrical code book and go back to fix everything that's wrong? Just because you don't agree with something and it still works doesn't mean it's a good idea...

These days I actually like validating; I use it to check that nothing is blatantly wrong. There are a few rules that don't make much sense, but overall I find it very helpful. To get that use out of it, though, don't build the whole site and then try to fix all the problems and bad decisions made earlier - validate throughout the design process.

I know this blog post comes to the same conclusion by the end, but it just feels so wimpy and pandering. Over the course of the post, validation goes from pointless, to painful, to useful, to helpful... do whatever you want. Maybe the author should just take a stand and stop trying not to offend anyone.


The interpretation of valid HTML is standardized. The interpretation of invalid HTML is undefined. Make your choice.

Error handling is not defined in the standards, so if you rely on it, you're at the mercy of every idiosyncratic error-handling decision of every browser implementor.


HTML validation is of great value when debugging, even for JavaScript. It can catch things that are easy to miss, like unclosed tags and repeated ids. (A quick script for the repeated-id case is sketched below.)

It is painful to write a whole website, then validate it and fix all the errors at once - at that point, validating has little value. By validating periodically, though, you get more out of the tools available. I don't force myself to write 100% strict HTML, but minimizing validation errors means I'll have less to wade through when it's time to fix things.
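Here's what that quick repeated-id check might look like, sketched with Python's standard-library parser (the filename is a placeholder):

    from collections import Counter
    from html.parser import HTMLParser

    class IdCollector(HTMLParser):
        # counts every id attribute so duplicates can be reported
        def __init__(self):
            super().__init__()
            self.ids = Counter()

        def handle_starttag(self, tag, attrs):
            for name, value in attrs:
                if name == "id":
                    self.ids[value] += 1

    collector = IdCollector()
    with open("page.html") as f:
        collector.feed(f.read())
    dupes = [i for i, n in collector.ids.items() if n > 1]
    print("repeated ids:", dupes or "none")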


Validation probably doesn't matter to your users anymore. (If everyone had written valid markup back in the day, browsers would be faster, more secure, and more stable. But that ship has sailed, so you're stuck with broken browsers for backwards-compatibility reasons.)

With that in mind, though, it's easy to write valid markup. All the rest of the code in your app has to validate; why neglect the HTML? Making sure your markup is valid will save you time when you have to debug a weird rendering issue (or JavaScript's view of the DOM).

Finally, if you even have the opportunity to pass invalid markup from your app to the browser, you are probably doing something wrong. In my apps, I load the web designer's templates (which may be invalid HTML) with libxml2 (it has an HTML loader), process them programmatically, and then serialize the DOM for the browser. This way, the browser always gets something valid - my program's type system enforces this.
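A rough Python equivalent of that pipeline, using lxml (the Python binding over libxml2); the filename and the class selector are made up for illustration:

    from lxml import html  # pip install lxml; wraps libxml2

    # libxml2's HTML loader recovers from invalid markup while parsing
    with open("template.html") as f:
        doc = html.fromstring(f.read())

    # process the tree programmatically, e.g. fill in a content slot
    for slot in doc.xpath('//div[@class="content"]'):
        slot.text = "Hello from the app"

    # serializing the parsed tree always yields well-formed markup
    print(html.tostring(doc, pretty_print=True).decode())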

Valid HTML means you have one less thing to worry about.


Never can tell??

A couple of days ago I built a new web page and as usual tried to validate it as HTML 4 Strict and CSS3. The HTML4 validated just great with CSS 2.1, but threw a tantrum when I pasted in the CSS3 validation link and reran HTML4 Strict.

This is the offending line:

&profile=css3&usermedium=all&warning=1

which is OK on the CSS validation page, but will not pass inspection if run AFTER pasting the WHOLE link into my web page. Everything is the same except for that last line.

Run as CSS2.1 and all is well.

Run as CSS3 and the validator gives 6 errors on this part of the code alone. It seems to hate the " & " and the " = " and even the " W " in warning?!

Would like to validate HTML 4.01 Strict with CSS3.

Any help??
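If it's the same issue I've hit (a guess, not a diagnosis of your exact page): the HTML validator isn't objecting to CSS3 itself, but to the raw ampersands in the validator link's query string. Inside an HTML attribute value, each & must be escaped as &amp;; otherwise the parser reads &usermedium and &warning as broken entity references, which would explain the complaint about the "W" in warning. Something like this (the host part of the URL is elided here):

    <!-- fails HTML 4.01 Strict: each raw & starts a (bogus) entity reference -->
    <a href="...check/referer?profile=css3&usermedium=all&warning=1">CSS3</a>

    <!-- validates: the ampersands are escaped -->
    <a href="...check/referer?profile=css3&amp;usermedium=all&amp;warning=1">CSS3</a>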


It seems to me that validation would be especially helpful when you're composing one Web page out of a bunch of smaller components; if, say, each component is in its own <div> and your validator confirms that it's a proper <div>, then you can have more assurance that your code will stitch them all together into proper <html>. If one of the components is invalid, you run a greater risk that when that component is included in a page with other stuff, the interaction among all the pieces will create an ugly surprise.
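A cheap version of that per-component check, assuming the components are XHTML-style fragments (this tests well-formedness, which is weaker than full validation):

    import xml.etree.ElementTree as ET

    def is_well_formed(fragment):
        # True if the fragment parses as a single well-formed element
        try:
            ET.fromstring(fragment)
            return True
        except ET.ParseError:
            return False

    components = ['<div id="nav">ok</div>', '<div id="main">broken']
    for c in components:
        print(is_well_formed(c), repr(c))
    # True  '<div id="nav">ok</div>'
    # False '<div id="main">broken'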


My rule is this: it must validate unless you have a good reason for it not to.

When I develop, I use a Firefox addon, http://users.skynet.be/mgueury/mozilla/ , that automatically tells me how many validation errors I have (it runs the validation locally).

So as much as possible my code validates; when it doesn't, there is a reason for it each time, not just an accidental mistake.


Ever try debugging JavaScript in a dynamic site with invalid HTML? Enough said. No, the user shouldn't care, but so little of the code we write is actually for the user.



