This repository was archived by the owner on Sep 11, 2023. It is now read-only.

Use fsspec and/or pathy to provide a unified interface to GCP, AWS, and local filesystems #153

Closed
JackKelly opened this issue Sep 22, 2021 · 2 comments · Fixed by #214, #281 or #283

Comments

@JackKelly
Member

JackKelly commented Sep 22, 2021

At the moment, our code implements different functions to interact with data on GCP vs the local filesystem. This is exactly the problem that fsspec solves: fsspec provides a unified interface to cloud and local storage. If we use fsspec then we can probably throw away quite a lot of our filesystem functions :)
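To illustrate the idea, here is a minimal sketch of what a single pair of read/write helpers might look like with fsspec. The helper names `read_text` and `write_text` are hypothetical (they are not functions from this repo), and remote protocols such as `gs://` or `s3://` additionally assume that the matching backend package (e.g. `gcsfs` or `s3fs`) is installed.

```python
import fsspec


def write_text(path: str, text: str) -> None:
    """Write text to a local path or a URL such as gs://bucket/key.

    Hypothetical helper: fsspec picks the filesystem from the protocol prefix.
    """
    with fsspec.open(path, "wt") as f:
        f.write(text)


def read_text(path: str) -> str:
    """Read text from a local path or a URL, using the same code path for both."""
    with fsspec.open(path, "rt") as f:
        return f.read()
```

The same two functions would then be called with `/local/file.txt`, `gs://bucket/foo/bar.txt`, or `s3://bucket/foo/bar.txt`, without any platform-specific branches in our own code.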

Pathy also looks very useful.

Before starting work on this issue, we should wait for #152 to be merged into main (UPDATE: #152 is now merged into main!)

We can probably remove the cloud field in the yaml config (in model.py) after making this change, because the code will be able to infer the compute platform from the protocol prefix on the paths (e.g. gs:// for Google Cloud).
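As a rough sketch of how that inference could work, the snippet below maps the protocol prefix of a path to a platform name using only the standard library. The function name `infer_cloud`, the mapping, and the returned labels (`"gcp"`, `"aws"`, `"local"`) are assumptions made for illustration, not values taken from the existing config.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from protocol prefix to platform label.
_PROTOCOL_TO_CLOUD = {
    "gs": "gcp",
    "s3": "aws",
    "file": "local",
    "": "local",  # no prefix at all, e.g. /local/file.txt
}


def infer_cloud(path: str) -> str:
    """Infer the platform from the protocol prefix of a path."""
    protocol = urlsplit(path).scheme
    try:
        return _PROTOCOL_TO_CLOUD[protocol]
    except KeyError:
        raise ValueError(f"Unrecognised protocol {protocol!r} in path {path!r}")
```

One caveat with this approach: a Windows path such as `C:\data` would be parsed as having the protocol `c`, so a real implementation would need to handle drive letters separately.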

@JackKelly JackKelly added this to the WP1 essential tasks milestone Sep 22, 2021
@JackKelly JackKelly self-assigned this Sep 22, 2021
@JackKelly JackKelly changed the title Use fsspec to provide a unified interface to GCP, AWS, and local filesystems Use fsspec and/or pathy to provide a unified interface to GCP, AWS, and local filesystems Sep 22, 2021
@JackKelly
Member Author

I'm making a start on this in #152 (only where necessary to make it easy to write to the local filesystem). I'll leave it to a later PR to complete this task.

@JackKelly
Member Author

JackKelly commented Sep 29, 2021

A word of warning: Pathy.exists() seems buggy to me. Pathy('gs://bucket/foo/bar.txt').exists() sometimes hangs, and Pathy('/local/file.txt').exists() sometimes returns the wrong answer (although maybe we're supposed to include a protocol prefix like 'file://local/file.txt'?). We should explore this more and, if it really does turn out to be a bug, report it to the Pathy project.

That said, using Pathy as a prettier way to join paths works really well. e.g. Pathy('gs://bucket') / 'foo' / 'bar.txt'

@peterdudfield peterdudfield mentioned this issue Oct 8, 2021
7 tasks
JackKelly added a commit that referenced this issue Oct 26, 2021
JackKelly added a commit that referenced this issue Oct 26, 2021