
Um... Incoming Naive Questions.

Why do we need yet another HTML5 parser? What's wrong with WebKit? What's wrong with the new Gecko2 HTML5 parser?

And what license is it?




A standalone parser written in C is a great asset. Pretty much any language worth mentioning has C bindings, so each of them is now just a bindings implementation away from having a reasonably fast (notwithstanding that performance was a non-goal), standards-compliant HTML parser. This is an improvement over the status quo. Most languages have bindings to lxml, which is fast but has made-up error handling and a tendency to deal poorly with quite a lot of content. Some languages also have slow, native implementations of the HTML standard parsing algorithm (I wrote much of Python's html5lib, so I am aware both that it is slow and that it is non-trivial to speed up).

Compared to Gecko and WebKit, this gives you just the parser, which is significantly simpler than the whole engine and is all you want for many applications.


Because, for starters, WebKit and Gecko are not only parsers. They are much more complex layout engines, which is a whole other can of worms.


The new Gecko parser is based on a Java->C++ translation from this standalone parser: http://about.validator.nu/htmlparser/




