gs: support directories as external dependencies/outputs #2814
Comments
Hi @willypicard! #2678 is about a specific bug we have in directory support for s3. The issue you are reporting is related to #1654, as we don't currently support gs directories as external outputs or dependencies. Could you elaborate on your scenario, so we can better understand whether support for gs dirs is what you really need? 🙂
I am using kubeflow to preprocess datasets and I would like to use dvc to manage my datasets and models. I have large datasets stored on GCS, and I would like them to be versioned and kept on GCS. It would therefore be convenient to provide the directory containing a dataset instead of each file in it (some of my datasets contain hundreds of thousands of files).
How big are those datasets? Just checking whether you are also aware that you can mount that bucket through s3fuse and work with it as with any local files. 🙂
I have a dataset that is 250 GB, so rather large...
And obviously it is also possible to use
@willypicard Got it. Makes sense, let's implement it. 🤝 Unfortunately, we don't have enough space in the current sprint, so if you would be willing to give it a shot, we'll try to help with everything we can. It is really not complex, as we already have all the generalized logic in place, so the only things that one would need to implement are: Make One could look at s3.py from https://github.com/iterative/dvc/pull/2619/files as an example. 🙂 Let us know what you think. Thanks for the feedback!
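For anyone picking this up, here is a rough sketch of the core idea behind treating a "directory" on an object store such as GCS as a key prefix. This is not DVC's code and not what s3.py in the linked PR does; `walk_prefix` and the sample object names are hypothetical, and in practice the names would probably come from a listing call such as `list_blobs(..., prefix=...)` in the google-cloud-storage client.

```python
from typing import Iterable, Iterator


def walk_prefix(object_names: Iterable[str], directory: str) -> Iterator[str]:
    """Yield paths, relative to `directory`, of the objects stored under it.

    Hypothetical helper for illustration only; not part of DVC.
    """
    # Normalise to exactly one trailing slash so "data" does not match "data2/...".
    prefix = directory.strip("/") + "/"
    for name in object_names:
        # Skip objects outside the directory and "folder" placeholder keys.
        if name.startswith(prefix) and not name.endswith("/"):
            yield name[len(prefix):]


# Made-up object names standing in for a bucket listing.
names = ["data/img1.png", "data/sub/img2.png", "data2/img3.png", "data/"]
print(sorted(walk_prefix(names, "data")))
```

The trailing-slash handling is the part that is easy to get wrong: a bare prefix match on `data` would also pick up everything under `data2/`.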
I have a similar issue to #2678 but for GS.
I have a bucket with the following structure
I have then created a clean project
The output is as follows:
Adding a single file works (dvc add gs://my_bucket/data/img1.png).
A more verbose version:
dvc --version = 0.68.1. I am using Ubuntu; I installed using conda, with Python 3.7.5.