This issue was moved to a discussion.
You can continue the conversation there. Go to discussion →
Change remote globally in Git history #2960
Comments
@dmpetrov This is mostly about properly maintaining your project, e.g. using tags or branches for released versions, which can be easily updated to point to the new bucket. |
E.g. say some project died and closed its remote, but someone has the cache somewhere. In that case, you fork it, put in your own remote, and rely on it. Or, you could specify |
@efiop could you please elaborate? How should the team properly maintain the project to minimize this amount of work? Use case: A team has a long-running project (a year or more), hundreds of commits, dozens of releases (with tags). All Git history is important. Some clients might still use old ML models, and all changes in data sources have to be fully tracked from the inception. Unreleased commits (without tags) might also be important (say we decided to release an old but promising experiment). Suddenly the team decides to change cloud providers. It looks like a tremendous amount of work has to be done to make it happen. It would be great if DVC could do it in a few commands. |
Oops, sorry for the delay.
Well, giving that to your customers was a bad idea from the beginning 🙂 So say you have some tag like v1 that was using s3. Now you are migrating to gs, so what you do is go to v1, create a branch from it, adjust the remote to point to the new gs location, commit, and move v1 to this new commit. Then you'll have to make your users update. But if you, as a maintainer, had taken it more seriously earlier, you would have created some kind of proxy remote, e.g. an HTTP remote (like we do in the dvc core project), which you could trivially switch from s3 to gs without needing to adjust anything in the projects themselves. |
@efiop these are good workarounds. I urge everyone to think about a holistic solution instead. Ideally, we need the same solution as Git has: I can easily move a repo with its entire history from, for example, GitHub to GitLab with a couple of commands, just by changing the remote, pushing, and removing the old one. Is there a way to implement something similar in DVC? |
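The "adjust the remote" step in the comment above comes down to rewriting one URL in the committed config. A minimal sketch, assuming the config is an INI-style file whose remote section is named `'remote "<name>"'`; the remote name `storage`, both bucket URLs, and the helper `repoint_remote` are hypothetical illustrations and not part of DVC:

```python
import configparser
import io

# Hypothetical config contents; the remote name and URLs are made up.
SAMPLE_CONFIG = """\
[core]
remote = storage

['remote "storage"']
url = s3://old-bucket/data
"""


def repoint_remote(config_text: str, remote_name: str, new_url: str) -> str:
    """Return config_text with the named remote's url replaced by new_url."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)

    # Assumed section naming: the header text between [ and ] is
    # 'remote "<name>"', including the single quotes.
    section = f"'remote \"{remote_name}\"'"
    if not parser.has_section(section):
        raise KeyError(f"no remote named {remote_name!r} in config")

    parser.set(section, "url", new_url)

    # Serialize back to text. configparser drops comments and indentation.
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(repoint_remote(SAMPLE_CONFIG, "storage", "gs://new-bucket/data"))
```

The rewritten file would still have to be committed on a branch created from the tag, and the tag moved to that commit, once per release that has to keep working.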
If you don't care about modifying your Git history, |
@MrOutis sure, you can use the tools. But it is not clear how to change the links in Git history - you will still have Modifying the commit - yes, it is a possibility. Is there any better way to define and change data remotes? (most likely it should not be committed to config) |
👍 on my end, it feels that there should be a better solution to this. |
Sure:
Maybe I don't quite understand how you propose to "modify git history". The workarounds I've provided initially update the branches/tags that are used or propose to use some type of proxy to route your |
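One way to picture the proxy idea is a tiny HTTP service that owns a stable address and redirects every request to wherever the data currently lives. This is only a sketch under assumptions: the backend URL is hypothetical, the thread does not say how the dvc core project's HTTP remote is actually set up, and whether a given client follows redirects would need to be verified.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical public base URL of whichever bucket currently holds the data.
# Switching providers would mean changing only this value.
BACKEND_BASE_URL = "https://storage.example.com/new-bucket"


def redirect_target(request_path: str,
                    backend_base_url: str = BACKEND_BASE_URL) -> str:
    """Map a path requested from the proxy onto the current backend."""
    return backend_base_url.rstrip("/") + "/" + request_path.lstrip("/")


class RedirectHandler(BaseHTTPRequestHandler):
    """Answer every GET/HEAD with a temporary redirect to the backend."""

    def _redirect(self) -> None:
        self.send_response(302)
        self.send_header("Location", redirect_target(self.path))
        self.end_headers()

    do_GET = _redirect
    do_HEAD = _redirect


def serve(port: int = 8080) -> None:
    """Run the redirecting proxy until interrupted."""
    HTTPServer(("", port), RedirectHandler).serve_forever()
```

With something like this in front, the URL committed in every revision stays the same, and only the proxy's backend setting changes during a migration.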
How about we always rely on the latest commit in a branch to determine the actual remote? No matter what is committed in the history. |
@shcheklein sounds fragile and non-obvious. Plus it again won't work until the user |
@efiop The code you provided looks like another workaround, not a holistic solution for changing remotes "globally". The problem - if a user checks out an old revision (clones or imports @shcheklein thank you! It is definitely a global solution that might work. There are some issues with this approach (thanks to @efiop for pointing this out), but at least we have something to consider and/or improve. |
It looks like you simply want to rewrite the history for the config file. EDIT. Another option is completely separating config from git history, which we already support in the form of global/user/local configs. |
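The layering this option relies on can be pictured as a per-section merge in which later levels win. A sketch only, assuming the order of precedence is system, then global, then the committed repo config, then local; the dictionaries and `merge_configs` are hypothetical and this is not DVC's actual loader:

```python
from typing import Dict, Mapping

Config = Dict[str, Dict[str, str]]


def merge_configs(*levels: Mapping[str, Mapping[str, str]]) -> Config:
    """Merge config levels section by section; later levels override earlier."""
    merged: Config = {}
    for level in levels:
        for section, options in level.items():
            merged.setdefault(section, {}).update(options)
    return merged


# What an old commit carries in its committed config (hypothetical values).
committed = {
    "core": {"remote": "storage"},
    'remote "storage"': {"url": "s3://old-bucket/data"},
}

# What lives outside Git history in a local config (hypothetical values).
local = {
    'remote "storage"': {"url": "gs://new-bucket/data"},
}

effective = merge_configs(committed, local)
```

Because the local level is not part of any commit, checking out an old revision changes `committed` but leaves the override in place, which is why the effective URL stays on the new bucket.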
@efiop can you please elaborate or point to the details of proxy-remote implementation? Also to summarize possible solutions I see in the thread as I am facing the same inconvenience (I want to
|
Hi @dimitry12, for this option, would |
@dberenbaum yes local config will also work, as long as you manage it properly, i.e. update it over all your working copies. |
Thanks, @Suor! I'm wondering if this issue can be closed then since it seems that the introduction of the local config makes changing remotes globally possible and on par with Git. |
Local config works for me. |
@dberenbaum my 2 cents on this: local config can be a good temporary solution, but it somewhat defeats the point of repos being self-descriptive and self-contained. Why can that be important? For two reasons (and maybe I'm missing something else):
|
Another argument for why local config is insufficient: for data registry repos where the data is being fetched from outside the project via |
Changing buckets and clouds easily for a single project is a very compelling feature.
But in fact, when I need to transfer a data remote from one bucket to another (or to another cloud), I can do that properly only on the HEAD of the repo. All the old commits will still have the old remote (in .dvc/config). As a result, when I check out an old commit in the Git history, the old remote will be used. So I need to keep the old data remote (bucket), or I'll have trouble using my old Git commits.
Is it possible to make remote settings "global"? A single remote change should change it everywhere in the Git history. How does Git do that, and can it work for DVC? Are there other options?
All ideas are welcome!