Add Hugging Face filesystem support to fsspec #1997
Hey @lhoestq Thanks for raising this PR. I think this is super interesting! I think the PR needs a couple more things:
I updated pyproject.toml and added some docs :) PS: I also added the "hf" extra in pyproject.toml, lmk if this is fine
LGTM! Thanks for adding this.
@@ -95,6 +95,7 @@ Iceberg works with the concept of a FileIO which is a pluggable module for readi
- **hdfs**: `PyArrowFileIO`
- **abfs**, **abfss**: `FsspecFileIO`
- **oss**: `PyArrowFileIO`
- **hf**: `FsspecFileIO`
Is there a way to allow `PyArrowFileIO` as well?
There is no HF filesystem implementation in Arrow C++ yet, unfortunately! But hopefully soon.
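The scheme list in the diff above can be read as a lookup from URL scheme to FileIO implementation. Here is a minimal sketch of that idea, assuming a plain dictionary: the table `SCHEME_TO_FILE_IO`, the helper `file_io_for`, and the example dataset path are all hypothetical and only mirror the docs lines quoted in this thread, not PyIceberg's internal data structures.

```python
from urllib.parse import urlparse

# Hypothetical table copied from the docs lines in the diff; illustration only,
# not PyIceberg's actual registry.
SCHEME_TO_FILE_IO = {
    "hdfs": "PyArrowFileIO",
    "abfs": "FsspecFileIO",
    "abfss": "FsspecFileIO",
    "oss": "PyArrowFileIO",
    "hf": "FsspecFileIO",  # the entry this PR adds to the docs
}


def file_io_for(location: str) -> str:
    """Return the FileIO name listed for the scheme of `location`."""
    scheme = urlparse(location).scheme
    try:
        return SCHEME_TO_FILE_IO[scheme]
    except KeyError:
        raise ValueError(f"No FileIO listed for scheme {scheme!r}") from None


# The dataset path below is made up for the example.
print(file_io_for("hf://datasets/example-user/example-dataset/data.parquet"))
# prints "FsspecFileIO"
```

Because `hf` only maps to `FsspecFileIO` here, an `hf://` location has no PyArrow-backed alternative in this sketch, which matches the reply above.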
just retriggered CI
Also retriggered the CI 😄
all green! thanks
Rationale for this change
Add support for the Hugging Face filesystem in `fsspec`, which uses `hf://` paths. This makes it possible to import HF datasets.
Authentication is done using the `"hf.token"` property.
Are these changes tested?
I tried locally but haven't added tests in test_fsspec.py (lmk if it's a requirement)
Are there any user-facing changes?
No changes, it simply adds support for `hf://` URLs
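For illustration, here is a sketch of how the `"hf.token"` property from the description might be assembled into a FileIO properties dictionary. The helper `hf_io_properties` is hypothetical. Reading the token from the `HF_TOKEN` environment variable and selecting the implementation through the `py-io-impl` key are assumptions made for this example; the PR itself only states that authentication uses `"hf.token"`.

```python
import os
from typing import Dict, Optional


def hf_io_properties(token: Optional[str] = None) -> Dict[str, str]:
    """Build a properties dict for reading hf:// paths (hypothetical helper)."""
    # Assumption: fall back to the HF_TOKEN environment variable if no token is passed.
    token = token or os.environ.get("HF_TOKEN")
    # Assumption: "py-io-impl" selects the fsspec-based FileIO.
    props = {"py-io-impl": "pyiceberg.io.fsspec.FsspecFileIO"}
    if token:
        # "hf.token" is the property named in the PR description.
        props["hf.token"] = token
    return props


props = hf_io_properties("example-token")  # placeholder value, not a real token
```

The resulting dictionary would then be passed wherever FileIO properties are accepted, for example as catalog properties; leaving the token out yields a dictionary without `"hf.token"`, which in this sketch means unauthenticated access.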