Add Hugging Face filesystem support to fsspec #1997

Merged: 4 commits merged into apache:main on May 16, 2025

Conversation

@lhoestq lhoestq commented May 13, 2025

Rationale for this change

Add support for the Hugging Face filesystem in fsspec, which uses hf:// paths.
This makes it possible to import HF datasets.

Authentication is done using the "hf.token" property.
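
A minimal sketch of how the token might be supplied, assuming the FileIO is configured through a flat dictionary of string properties. Only the `hf.token` key comes from this PR; the helper name `hf_fileio_properties` and the `HF_TOKEN` environment variable fallback are illustrative assumptions, and `py-io-impl` is included on the assumption that the fsspec-based FileIO should be selected explicitly.

```python
import os
from typing import Dict, Optional


def hf_fileio_properties(token: Optional[str] = None) -> Dict[str, str]:
    """Build FileIO properties for hf:// paths (hypothetical helper)."""
    # Assumption: fall back to an environment variable when no token is given.
    token = token or os.environ.get("HF_TOKEN")
    # Assumption: select the fsspec-based FileIO explicitly.
    props = {"py-io-impl": "pyiceberg.io.fsspec.FsspecFileIO"}
    if token:
        # "hf.token" is the property named in this PR.
        props["hf.token"] = token
    return props
```

The resulting dictionary could then be passed along as catalog properties, for example `load_catalog("default", **hf_fileio_properties())`. The token is left out entirely when none is available.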

Are these changes tested?

I tested this locally but haven't added tests in test_fsspec.py (let me know if that's a requirement).

Are there any user-facing changes?

No changes to existing behavior; it simply adds support for hf:// URLs.

Fokko commented May 14, 2025

Hey @lhoestq, thanks for raising this PR. I think this is super interesting!

I think the PR needs a couple more things:

lhoestq commented May 14, 2025

I updated pyproject.toml and added some docs :)

PS: I also added the "hf" extra in pyproject.toml, lmk if this is fine

@kevinjqliu kevinjqliu left a comment

LGTM! Thanks for adding this.

@@ -95,6 +95,7 @@ Iceberg works with the concept of a FileIO which is a pluggable module for readi
- **hdfs**: `PyArrowFileIO`
- **abfs**, **abfss**: `FsspecFileIO`
- **oss**: `PyArrowFileIO`
- **hf**: `FsspecFileIO`
Contributor

Is there a way to allow PyArrowFileIO as well?

Contributor Author

Unfortunately, there is no HF filesystem implementation in Arrow C++ yet, but hopefully there will be one soon.
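
As a rough illustration of the scheme-to-FileIO pairing listed in the docs hunk of this thread, the lookup could be sketched as below. This is not PyIceberg's actual resolution code: the table holds only the entries visible in the hunk, and the names `SCHEME_TO_FILE_IO` and `file_io_for` are hypothetical.

```python
from urllib.parse import urlparse

# Only the pairs visible in the docs hunk; the real table is longer.
SCHEME_TO_FILE_IO = {
    "hdfs": "PyArrowFileIO",
    "abfs": "FsspecFileIO",
    "abfss": "FsspecFileIO",
    "oss": "PyArrowFileIO",
    "hf": "FsspecFileIO",  # the entry added by this PR
}


def file_io_for(location: str) -> str:
    """Return the FileIO name listed for the location's URL scheme."""
    scheme = urlparse(location).scheme
    try:
        return SCHEME_TO_FILE_IO[scheme]
    except KeyError:
        raise ValueError(f"No FileIO listed for scheme {scheme!r}") from None
```

With this table, any `hf://` location resolves to `FsspecFileIO` only, which matches the reply above: there is no PyArrow-backed option for that scheme yet.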

@kevinjqliu

#24 34.52 E: Failed to fetch http://deb.debian.org/debian-security/pool/updates/main/o/openjdk-11/openjdk-11-jdk-headless_11.0.26%2b4-1%7edeb11u1_amd64.deb Error reading from server - read (104: Connection reset by peer) [IP: 146.75.30.132 80]

just retriggered CI

Fokko commented May 16, 2025

Also retriggered the CI 😄

lhoestq commented May 16, 2025

All green! Thanks.

@kevinjqliu changed the title from "Enable Hugging Face filesystem" to "Add Hugging Face filesystem support to fsspec" on May 16, 2025
@kevinjqliu merged commit 55b75ca into apache:main on May 16, 2025
11 checks passed
@kevinjqliu

Thanks @lhoestq for the contribution and @Fokko for the review :)

3 participants