html5lib is nice, but it's pretty slow. Parsing a fairly large test file took 50 ms with lxml and 5 seconds with html5lib, which is 100 times slower. Are there any particular hot spots in html5lib that could be optimized? Would compiling it with Cython help?
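For anyone who wants to reproduce the comparison or look for hot spots, here is a minimal sketch using only the standard library. The helper names `time_call` and `profile_call` are my own (hypothetical, not part of html5lib or lxml); they accept any callable, so the parser under test is passed in rather than hard-coded.

```python
import cProfile
import io
import pstats
import time


def time_call(func, *args, repeat=3):
    """Return the best wall-clock time in seconds over `repeat` calls."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def profile_call(func, *args, top=15):
    """Run func(*args) under cProfile and return a text report of the
    `top` entries, sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()
```

With both libraries installed, something like `time_call(html5lib.parse, markup)` versus `time_call(lxml.html.fromstring, markup)` should give the timing comparison, and `print(profile_call(html5lib.parse, markup))` should show where html5lib spends its time. Here `markup` is assumed to be the test file's contents read into a string. Note that cProfile adds overhead to every Python-level call, so the profile is better for ranking functions than for absolute timings.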