Closed
Description
Part of #7995
Cloud versioning does not support multiple remotes.
Reproduction
You can reproduce the problem with the script below. Set BUCKET, REMOTE_1, and REMOTE_2 to your own S3 bucket and prefixes, and set AWS_PROFILE (or otherwise configure your AWS credentials).
BUCKET=mybucket
REMOTE_1=remote1
REMOTE_2=remote2
export AWS_PROFILE=iterative-sandbox
echo "Get repo."
rm -rf repo
git init repo
cd repo
dvc init
echo "Add two cloud-versioned DVC remotes."
dvc remote add -d cloud-1 s3://$BUCKET/$REMOTE_1
dvc remote modify cloud-1 version_aware true
dvc remote modify cloud-1 worktree true
dvc remote add cloud-2 s3://$BUCKET/$REMOTE_2
dvc remote modify cloud-2 version_aware true
dvc remote modify cloud-2 worktree true
git add .
git commit -m "initialized repo"
echo "Add data"
mkdir data
echo image1 > data/image1.png
echo image2 > data/image2.png
echo model > model.h5
dvc add data
dvc add model.h5
git add .
git commit -m "add data"
echo "Push data to default remote"
dvc push
git --no-pager diff
git commit -am "push data to default remote"
echo "Push data to other remote"
dvc push -r cloud-2
git --no-pager diff # Problem 1: overwrites all version_ids
git commit -am "push data to other remote"
echo "Push to different remotes per output"
echo "  remote: cloud-2" >> model.h5.dvc  # two-space indent keeps the key inside the outs entry
dvc push
git --no-pager diff # Problem 2: pushes model.h5 to the default remote
echo "See model.h5 versions on remote 1"
aws s3api list-object-versions --bucket $BUCKET --prefix $REMOTE_1/model.h5
echo "See model.h5 versions on remote 2"
aws s3api list-object-versions --bucket $BUCKET --prefix $REMOTE_2/model.h5
Expected
- DVC should keep track of the remote for each version_id, so that pushing to a different remote appends a version_id for that remote instead of overwriting the existing one.
- When using the per-output remote: syntax in a .dvc file, DVC should push that output to the specified remote instead of the default.
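The two expectations can be modeled with a short POSIX shell sketch. It is hypothetical and is not how DVC implements them: record_push, version_for, and resolve_remote are invented names, and a temporary text file stands in for the metadata DVC writes to .dvc files. Keying each version_id by both output and remote means a push to cloud-2 adds or updates only the cloud-2 entry, and an output's own remote: field takes precedence over the default remote.

```shell
#!/bin/sh
# Hypothetical model of the expected behavior; this is NOT DVC's code.
# A temporary text file stands in for the .dvc metadata, holding one
# "<output> <remote> <version_id>" line per (output, remote) pair.
STORE=$(mktemp)
trap 'rm -f "$STORE"' EXIT

# Record a push: replace the entry for this (output, remote) pair only,
# so version_ids recorded for other remotes are kept.
record_push() {
  awk -v o="$1" -v r="$2" '!($1 == o && $2 == r)' "$STORE" > "$STORE.tmp" &&
    mv "$STORE.tmp" "$STORE"
  printf '%s %s %s\n' "$1" "$2" "$3" >> "$STORE"
}

# Look up the version_id recorded for an output on a given remote.
version_for() {
  awk -v o="$1" -v r="$2" '$1 == o && $2 == r { print $3 }' "$STORE"
}

# Choose the remote for an output: the "remote:" field of its .dvc file
# if present, otherwise the default remote given as the second argument.
resolve_remote() {
  _remote=$(sed -n 's/^[[:space:]]*remote:[[:space:]]*//p' "$1" | head -n 1)
  echo "${_remote:-$2}"
}

# Pushing model.h5 to both remotes keeps one version_id per remote.
record_push model.h5 cloud-1 version-a
record_push model.h5 cloud-2 version-b
version_for model.h5 cloud-1    # prints version-a
version_for model.h5 cloud-2    # prints version-b

# An output pinned to cloud-2 resolves to cloud-2, not the default.
example=$(mktemp)
printf 'outs:\n- path: model.h5\n  remote: cloud-2\n' > "$example"
resolve_remote "$example" cloud-1    # prints cloud-2
rm -f "$example"
```

Keying by (output, remote) rather than by output alone is the essential change: with a single key per output, the second push would replace the cloud-1 version_id, which is the behavior described as Problem 1.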