An HTML5 parser for JavaScript

I’ve been porting the HTML5lib HTML5 parser to JavaScript. At the moment, the port specifically targets node.js.

The parsing algorithms laid out in the spec are really excellent: the fallbacks for the various cases where tags are omitted are mostly elegant and entirely clever. Support for inline fragments of XML languages like SVG and MathML is just as welcome – with any luck, we’ll see a lot more rich vector graphics in web pages now, without dropping down to a box full of Flash.
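To give a flavour of what an omitted-tag fallback looks like, here is a toy sketch of just one rule: a new `<p>` start tag implicitly closes a paragraph that is still open. The function name `closeImpliedParagraphs` and the flat token array are made up for illustration; the real algorithm works on a stack of open elements and covers far more cases, so treat this as a simplification rather than how the port does it.

```javascript
// Toy illustration only: models a single fallback (an open <p> is closed
// by the next <p> start tag, or by the end of input). Hypothetical helper,
// not part of the parser.
function closeImpliedParagraphs(tokens) {
  const out = [];
  let paragraphOpen = false;

  for (const token of tokens) {
    if (token === '<p>') {
      // The author omitted </p>; emit it before opening the next paragraph.
      if (paragraphOpen) out.push('</p>');
      paragraphOpen = true;
    } else if (token === '</p>') {
      paragraphOpen = false;
    }
    out.push(token);
  }

  // A paragraph still open at the end of input is closed as well.
  if (paragraphOpen) out.push('</p>');
  return out;
}

const fixed = closeImpliedParagraphs(['<p>', 'one', '<p>', 'two']).join('');
console.log(fixed);
```

Fed `<p>one<p>two`, this yields two sibling paragraphs rather than one nested inside the other, which is the behaviour the spec asks for in this case.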

The parser is currently a bit slow, and I’ll blog about why soon – suffice it to say that V8’s string handling leaves a lot to be desired when you’re poking at numerous tiny pieces of strings rather than performing fewer, larger manipulations.
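As a rough picture of the difference between poking at tiny pieces and doing one larger manipulation, the sketch below reads a run of text up to the next `<` in two ways. Both function names are hypothetical and neither is taken from the parser; it only shows the two styles side by side and makes no claim about measured speed.

```javascript
// Hypothetical helpers for illustration; not the parser's actual code.

// Character at a time: one charAt() call and one small concatenation
// per input character.
function readTextByChar(input, start) {
  let text = '';
  let i = start;
  while (i < input.length && input.charAt(i) !== '<') {
    text += input.charAt(i);
    i++;
  }
  return { text: text, next: i };
}

// Bulk: locate the delimiter once, then take a single slice.
function readTextBySlice(input, start) {
  let end = input.indexOf('<', start);
  if (end === -1) end = input.length;
  return { text: input.slice(start, end), next: end };
}

const sample = 'hello<b>world</b>';
console.log(readTextByChar(sample, 0));
console.log(readTextBySlice(sample, 0));
```

Both return the same text and the same next position; the only difference is how many small string operations it takes to get there.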

Anyway, check it out.