Broken using beautifulsoup4 #276

Closed
nabeelio opened this issue Jul 15, 2016 · 6 comments

Comments

@nabeelio

nabeelio commented Jul 15, 2016

Started with 1.0b9

  File "/Users/.../.tox/py35/lib/python3.5/site-packages/bs4/builder/_html5lib.py", line 70, in <module>
    class TreeBuilderForHtml5lib(html5lib.treebuilders._base.TreeBuilder):
AttributeError: module 'html5lib.treebuilders' has no attribute '_base'

To replicate, install beautifulsoup4==4.4.1 and then html5lib. After that, a plain from bs4 import BeautifulSoup is enough to break it.

Not sure if this is a bs4 issue or an html5lib problem, but bs4 hasn't been updated in quite a while.

Can the versioning also be a little more sane? It's hard to see how many 9's there are, so it took me much longer to figure out that a newer version was getting installed. With a (super-) minor version change, I don't think this interface should really have changed, but it's hard to tell with the versioning being abnormal.

@gsnedders
Member

The version number is hard to make sane at this point. It started off as a bit of a joke but it has rather ended up farcical… (It was never intended to get anywhere beyond 0.9999!)

Essentially, 0.99999999 contains most of the API breakage that's needed to reach a point where there's a sane, stable API for 1.0 (and I think I can genuinely say we're actually, finally, realistically close…). Some of it was needed to fix long-standing bugs, and the rest we decided to ship in the same release so that there is a single breaking release rather than many. As with many projects, while we've been 0.y we've never guaranteed API stability (though it has indeed been a while since we last broke anything).

I believe the majority of the breakage around BS4 is going to simply be html5lib.treebuilders._base being renamed to .base (it's not private and never has been, so shouldn't have ever been underscore prefixed); I did debate fixing this myself but when BS4 has had major bugs with html5lib for years (seemingly without any attempt to run the tests for the treebuilders…) and my patch for all of those still hasn't been commented on or anything after 7 months, I don't feel like there's much point in my writing a patch for the API changes.
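For code that has to run against releases on both sides of the rename, one option is to try the new module name first and fall back to the old one. This is only a minimal sketch, assuming nothing beyond the two module paths named above; `import_first_available` and `load_treebuilder_base` are hypothetical helper names, not part of html5lib or BS4.

```python
import importlib


def import_first_available(candidates):
    """Return the first module in `candidates` that can be imported."""
    last_error = None
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError as exc:  # ModuleNotFoundError is a subclass
            last_error = exc
    raise ImportError(
        "none of %r could be imported" % (list(candidates),)
    ) from last_error


def load_treebuilder_base():
    """Prefer the renamed module, fall back to the pre-rename name."""
    return import_first_available(
        ["html5lib.treebuilders.base", "html5lib.treebuilders._base"]
    )
```

A caller would then subclass `load_treebuilder_base().TreeBuilder` rather than naming either module directly. As noted in this thread, this only papers over the rename and does nothing for the other API changes.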

@nabeelio
Author

nabeelio commented Jul 15, 2016

@gsnedders No worries, thanks for the quick reply. Yeah, I tried disabling the html5lib parser inside BS4 (just defaulting to the built-in html.parser), but it still loads the html5lib classes on import 👎. For now I've pinned the version to the Sept 2015 release.

Is it possible, within the treebuilders.__init__ to:

from . import base
_base = base

Kind of a hacky workaround, I know. If you want, I can submit a pull request over the weekend; while I am philosophically against it, it might save others some time, especially if they're using both libs.

Or, failing that, just add a note to the README saying to use the 0.9999999 version with bs4?
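To make the aliasing idea above concrete, here is a small self-contained sketch. It does not touch html5lib at all: `fakelib` and the placeholder `TreeBuilder` class are invented stand-ins, built with `types.ModuleType` so the snippet runs without either library installed.

```python
import types

# Stand-ins for a package and its renamed submodule ("fakelib" is made up).
treebuilders = types.ModuleType("fakelib.treebuilders")
base = types.ModuleType("fakelib.treebuilders.base")


class TreeBuilder:
    """Placeholder for the class that downstream code subclasses."""


base.TreeBuilder = TreeBuilder

# The new, public name.
treebuilders.base = base
# The backwards-compatible alias: both attributes refer to the same module object.
treebuilders._base = base

# Old-style and new-style attribute access now resolve to the same class.
old_style = treebuilders._base.TreeBuilder
new_style = treebuilders.base.TreeBuilder
```

An alias like this only covers attribute access, which is how the traceback above reaches `_base`. A statement such as `import pkg._base` goes through the import system instead and would not be satisfied by the attribute alone.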

@patoroco

We fixed it by forcing the version in our project requirements.

It has already been reported at https://bugs.launchpad.net/beautifulsoup/+bug/1603299
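Since the versions in this scheme differ only in how many 9s they carry, a pin is easy to get wrong by one digit. The sketch below counts the digits instead of eyeballing them. It relies only on what this thread states (0.99999999 contains the API breakage, 0.9999999 is the version suggested for use with bs4). The function names are hypothetical, and it deliberately rejects anything outside the `0.999…` scheme, such as `1.0b9`.

```python
FIRST_BREAKING = "0.99999999"  # the release this thread says contains the API breakage


def count_nines(version):
    """Count the 9s after '0.' in a version such as '0.9999999'."""
    whole, dot, fraction = version.partition(".")
    if whole != "0" or dot != "." or not fraction or set(fraction) != {"9"}:
        raise ValueError("not a 0.999... style version: %r" % (version,))
    return len(fraction)


def predates_api_break(version):
    """True if `version` has fewer 9s than the first breaking release."""
    return count_nines(version) < count_nines(FIRST_BREAKING)
```

A project could call `predates_api_break` on the version string it pins in its requirements, as a sanity check that the pin has the intended number of digits.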

@gsnedders
Member

@nabeelio I debated putting something like that in, but it essentially just becomes a ticking time bomb until the point at which it is removed (and that would probably be 1.0, which I hope isn't far off). That's also an approach that only works for some of the API changes, as some are pretty fundamental and impossible to work around (the sanitizer changes come to mind here), so it would still leave plenty of breakage. I'll try to add something to the README later today.

@nabeelio
Author

nabeelio commented Jul 15, 2016

@gsnedders yeah, I agree. Thanks!

@gsnedders
Member

#278 is an attempt at that.

3 participants