lxml is definitely faster, but I've found BSoup to be more forgiving with poorly...

kenneth_reitz · on Oct 18, 2011

BeautifulSoup is poorly maintained; you have to be very specific about which version you're using.

Note: lxml has a number of repair modes that allow it to parse virtually anything. CPU cycles and memory go up quite a bit when they're activated, but it's still better than BeautifulSoup.
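The comment doesn't name the options it means by "repair modes", so this is only a minimal sketch, assuming the `recover` flag on lxml's parsers is what's meant. The strict `etree.XMLParser` rejects an unclosed tag, while `recover=True` asks libxml2 to build a tree anyway. (lxml's `HTMLParser` already has `recover` enabled by default.) `BROKEN`, `parse_strict`, and `parse_forgiving` are hypothetical names, and nothing here measures the CPU or memory cost mentioned above.

```python
from lxml import etree

# Hypothetical malformed input: the <a> element is never closed.
BROKEN = b"<root><a>text</root>"


def parse_strict(data: bytes):
    """Parse with lxml's default XML parser, which raises on malformed input."""
    return etree.fromstring(data)


def parse_forgiving(data: bytes):
    """Parse with recover=True so libxml2 repairs what it can instead of failing."""
    parser = etree.XMLParser(recover=True)
    return etree.fromstring(data, parser)


if __name__ == "__main__":
    try:
        parse_strict(BROKEN)
    except etree.XMLSyntaxError as exc:
        print("strict parser failed:", exc)

    root = parse_forgiving(BROKEN)
    print(etree.tostring(root))
```

The recovered tree is libxml2's best guess, so it is worth inspecting the output rather than assuming it matches what the author of the markup intended.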

ericflo · on Oct 18, 2011

Thankfully, lxml has a slower but more forgiving mode you can use with poorly formatted HTML; it uses BeautifulSoup as the parser and hands back lxml elements: http://lxml.de/elementsoup.html
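A minimal sketch of that mode, assuming a current lxml with the `lxml.html.soupparser` module and BeautifulSoup 4 (`bs4`) installed. BeautifulSoup builds the tree from the sloppy markup, and `soupparser.fromstring` converts it into ordinary lxml elements under an `<html>` root, so XPath and `find` work as usual afterwards. `SLOPPY` and `parse_with_soup` are hypothetical names for illustration.

```python
from lxml import etree
from lxml.html import soupparser  # requires BeautifulSoup (bs4) to be installed

# Hypothetical sloppy fragment: neither <p> nor <b> is ever closed.
SLOPPY = "<p>Hello <b>world"


def parse_with_soup(markup: str):
    """Let BeautifulSoup parse the markup, then return an lxml element tree."""
    return soupparser.fromstring(markup)


if __name__ == "__main__":
    root = parse_with_soup(SLOPPY)
    print(root.tag)
    print(etree.tostring(root, encoding="unicode"))
```

The trade-off is the one the comment describes: you pay for a pure-Python parse plus a tree conversion, so it makes sense as a fallback for pages lxml's own parser mangles rather than as the default.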

sudont · on Oct 18, 2011

I’ve found the exact opposite. BSoup will choke on invalid tags in the DOM, such as: <div id="content"><content>…</content></div>

If I try to return the innerHTML of #content, I get '<div id="content"><content>' as a string, nothing else.

While I know that’s inexcusable markup, it’s nothing I have control over.

lxml (if it builds on the target system) has been much better for my scripts.