Add read_pdf to IO Tools #4556
Comments
lightning fast (and slightly superficial) review of the above tools:
I should add that that list is not comprehensive, especially the list of projects that convert PDFs to HTML (or text or CSV).
As in several recent FRs, I'll reiterate that pandas accepts data. We do not need to swallow
We want pandas to make it easy to work with any reasonable data source, that does not
Where we can provide a significantly better workflow or solve a major pain point, it's fine to add on
Please, no more of this "library X can read Y, you should have X as a dep and add read_Y to the API"; just "use X".
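To illustrate the "use X, then hand the result to pandas" workflow suggested above, here is a minimal sketch. It assumes some external tool has already converted a PDF table to layout-preserving text in which columns are separated by runs of two or more spaces. The sample text, its values, and the helper names `layout_text_to_rows` and `rows_to_csv` are hypothetical and are not part of pandas or of any tool mentioned in this thread.

```python
import csv
import io
import re

# Hypothetical sample (made-up values): text as a layout-preserving
# PDF-to-text converter might emit it, with columns separated by
# runs of two or more spaces.
SAMPLE_TEXT = """\
Region        Year   Count
North East    2011   100
South West    2011   250
"""


def layout_text_to_rows(text):
    """Split each non-blank line into cells on runs of 2+ spaces."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if line:
            rows.append(re.split(r"\s{2,}", line))
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text that any CSV reader can consume."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = layout_text_to_rows(SAMPLE_TEXT)
csv_text = rows_to_csv(rows)
# With pandas installed, the table can then be loaded with:
#   pandas.read_csv(io.StringIO(csv_text))
```

Splitting on two or more spaces keeps single-space cells such as "North East" intact. Real PDF output is rarely this tidy, so treat this as a starting point rather than a general solution.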
related #3281
Create a read_pdf method in IO tools for reading tables from PDF documents. Many data sets are released in PDF form.
For example:
There are a number of standalone tools and projects for this:
There are also a number of sites and projects that convert PDF to HTML: