dvc push after dvc push is slow #2867
Comments
Hi @JohanMollevik ! I suppose it was the "Collecting information from remote cache" stage that was taking a lot of time, right? If so, it is caused by the fact that dvc needs to check each local file against the remote to make sure that all files from that dataset are present; this is needed to provide per-file granularity for directories. It is a known optimization issue and we are solving it in #2147. If your dataset is fairly static, I would consider adding it to dvc as an archive and unpacking it with something like
I do not think it said that. The progress bar was only 80 characters wide, so some of the text might have been hidden.
@JohanMollevik Hm, we are in the middle of revisiting the UI for pull/push/etc, so I might be missing something, but from the description it does seem to be the case. The first progress bar was collecting local status and the second one was collecting remote status. Please see my updated comment above.
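To illustrate why collecting remote status can dominate a push with nothing to upload, here is a minimal sketch comparing a per-file existence check with a single bulk listing. It is not DVC's actual code: `FakeRemote`, `missing_per_file`, and `missing_bulk` are hypothetical names, and the in-memory remote only counts requests instead of talking to real storage (a real listing would also be paginated).

```python
from typing import Iterable, Set


class FakeRemote:
    """Hypothetical in-memory stand-in for a remote cache; it counts requests."""

    def __init__(self, checksums: Iterable[str]) -> None:
        self._objects: Set[str] = set(checksums)
        self.requests = 0

    def exists(self, checksum: str) -> bool:
        # One simulated round trip per file.
        self.requests += 1
        return checksum in self._objects

    def list_all(self) -> Set[str]:
        # One simulated round trip for the whole listing (simplified, unpaginated).
        self.requests += 1
        return set(self._objects)


def missing_per_file(local: Iterable[str], remote: FakeRemote) -> Set[str]:
    """Ask the remote about every local checksum individually."""
    return {c for c in local if not remote.exists(c)}


def missing_bulk(local: Iterable[str], remote: FakeRemote) -> Set[str]:
    """Fetch the remote listing once and compute the difference locally."""
    return set(local) - remote.list_all()


if __name__ == "__main__":
    local_cache = {f"{i:032x}" for i in range(1000)}

    per_file_remote = FakeRemote(local_cache)
    print(missing_per_file(local_cache, per_file_remote), per_file_remote.requests)

    bulk_remote = FakeRemote(local_cache)
    print(missing_bulk(local_cache, bulk_remote), bulk_remote.requests)
```

Both strategies report that nothing is missing, but the per-file variant issues one request per file, which is what makes an unchanged push scale with the number of files rather than with the amount of changed data.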
@JohanMollevik I'll close this ticket in favor of #2147. Please ping us if you have any follow-up questions :)
Please provide information about your setup
DVC version (i.e. dvc --version), platform and method of installation (pip, homebrew, pkg Mac, exe (Windows), DEB (Linux), RPM (Linux)):
$ dvc --version
0.71.0
I have a large dataset (#2512) and have been trying to debug performance to evaluate whether dvc will work for this type of data.
I did one dvc push against Azure, which took 4 hours for 132 GB of data in 2.5M files tracked by 1 .dvc file. That is OK, assuming there were changes. I then immediately ran dvc push again, and it is taking 4 hours again.
Why does dvc not complete much faster on the second run? There should be no changes between the remote and the local cache, so I was expecting it to finish after a few minutes.
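For a rough sense of scale, the figures above can be turned into per-file rates. This is only a back-of-the-envelope sketch: it assumes decimal gigabytes and treats the run as if files were handled one after another, neither of which is stated here.

```python
# Figures quoted in this report.
files = 2_500_000
seconds = 4 * 3600              # 4 hours
data_bytes = 132 * 10**9        # assumption: decimal GB

files_per_second = files / seconds
ms_per_file = 1000 * seconds / files          # assumption: strictly sequential handling
mb_per_second = data_bytes / seconds / 10**6

print(f"{files_per_second:.0f} files/s")
print(f"{ms_per_file:.2f} ms per file")
print(f"{mb_per_second:.2f} MB/s average over the first push")
```

A few milliseconds per file is in the range of a single network round trip, so a second run that takes just as long is at least consistent with time being spent per file rather than per byte.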