DVC pull doesn't work when I push data from one Git repo to S3 remote storage and try to pull it from another Git repo #4253
Comments
Hi @aswin-datakalp, short answer: it should be the same repo :) Information about DVC-tracked files is stored in Git-tracked DVC files, so you not only need to …
Hi @efiop, I think you have misunderstood. Your solution above will definitely work if two people are using the same Git repo: person 1 adds data and runs dvc push and git push, and then person 2 first has to run git pull to get the latest DVC-tracked Git files, after which dvc pull will work for them. My problem is different. In my case, I have two different Git repositories, both initialized with DVC and both pointing to the same remote storage (an AWS S3 bucket). I push data from repo 1 to the S3 bucket, and the push succeeds. Then I go to the other repo, which was initialized with dvc init, and add the remote storage with dvc remote add -d "name bucket_url", which also succeeds. This confirms that both repos share the same remote storage. When I run dvc pull in repo 2 (the data was added and pushed to the remote from repo 1, and I am pulling it from repo 2), no data is pulled. Can you help me do this kind of explicit pull, where I push to AWS from one repo and then pull from a completely different repo that has the same remote storage configured and nothing else? I hope the problem is clear now!
@aswin-datakalp That won't work. A DVC remote is just content-based storage that doesn't know the names of the files it stores. The names are stored in your Git repo (e.g. in … What you probably want is …
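To make "content-based storage" concrete, here is a minimal Python sketch of how an object key can be derived purely from file content. It assumes the DVC 1.x remote layout, in which an object is stored under the first two hex characters of its MD5 as a directory and the remaining characters as the file name, and it ignores DVC's special line-ending handling for text files. The helper name remote_key is hypothetical, not part of DVC's API.

```python
import hashlib


def remote_key(content: bytes) -> str:
    """Map file content to a DVC-1.x-style remote object key (hypothetical helper).

    The key depends only on the MD5 of the bytes; the original file name
    is not part of it. Simplification: ignores DVC's line-ending
    normalization for text files.
    """
    digest = hashlib.md5(content).hexdigest()
    # First two hex characters become a directory, the rest the object name.
    return f"{digest[:2]}/{digest[2:]}"


# Two files with different names but identical content map to the same key,
# so the remote alone cannot tell which workspace path an object belongs to.
key_a = remote_key(b"hello\n")  # e.g. data/a.txt in Repo_1
key_b = remote_key(b"hello\n")  # e.g. other/b.txt in some other repo
print(key_a, key_a == key_b)
```

Because nothing in the key records a path, a freshly initialized Repo_2 with no .dvc files has no way of knowing which objects to download or where to place them.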
@efiop https://dvc.org/doc/use-cases/sharing-data-and-model-files returns a 404 error. I pulled a repo containing the .dvc file from GitHub into a new location and ran dvc pull, but it gives the following message:
WARNING: No file hash info found for '/dhaval/dvc_test/DVC_test/nlp_outbound/normalization/config'. It won't be created.
@dparmar61 It looks like all the data was pulled except for …
@dberenbaum The config directory contains some data required to run the code, and I do not want to keep it on GitHub. So I moved the config directory to S3 and pushed only config.dvc to GitHub, along with the other required DVC files such as config and .gitignore. I think "dvc pull" should be able to fetch the original data from S3 using the hash stored in this .dvc file.
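The idea that the hash in a .dvc file is what lets "dvc pull" find the data can be sketched in Python as below. This is an illustration under assumptions, not DVC's implementation: the .dvc file is shown as an already-parsed dict with the DVC 1.x "outs" / "md5" / "path" fields, the bucket URL and hash are hypothetical, and the remote layout is assumed to be the same two-character-prefix scheme used by DVC 1.x. For a directory output the hash ends in ".dir", and that object is a manifest listing the directory's individual files, which would be resolved the same way.

```python
def remote_objects(dvc_file: dict, remote_url: str) -> dict:
    """Return {workspace path: remote object URL} for each output (hypothetical helper).

    dvc_file is the already-parsed content of a .dvc file; remote_url is
    the remote configured with dvc remote add.
    """
    result = {}
    for out in dvc_file.get("outs", []):
        md5 = out.get("md5")
        if md5 is None:
            # Without a hash there is nothing to look up on the remote.
            continue
        result[out["path"]] = f"{remote_url.rstrip('/')}/{md5[:2]}/{md5[2:]}"
    return result


# Hypothetical parsed config.dvc for a tracked directory named "config".
config_dvc = {
    "outs": [{"md5": "0123456789abcdef0123456789abcdef.dir", "path": "config"}],
}
print(remote_objects(config_dvc, "s3://my-bucket/dvcstore"))
```

The path comes from the .dvc file committed to Git and the location on S3 comes from the hash, which is why pulling works once the .dvc files are present in the repo.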
@dberenbaum It is working now. Maybe the problem was due to different DVC versions being used for push and pull.
Hi,
Details:
DVC version: 1.1.7
Python version: 3.6.9
Platform: Linux-5.3.0-62-generic-x86_64-with-Ubuntu-18.04-bionic
Binary: False
Package: pip
Supported remotes: http, https, s3
Problem:
I have two Git repos, for example "Repo_1" and "Repo_2".
Let's say I initialize DVC in Repo_1, add a few data files, add an S3 remote, and push the data there with the dvc push command. I can see the data in the AWS S3 bucket.
Now I go to "Repo_2", initialize DVC with "dvc init", add the same S3 bucket as remote storage, and try to pull the data with "dvc pull". But no data is pulled!
Can you explain why it is not possible to link two Git repos this way, pushing data to AWS from one repo and pulling it from another repo that has the same AWS remote storage?
Thanks