
push: cannot push to ssh cache #5358

Closed
pedroasad opened this issue Jan 28, 2021 · 2 comments
Labels
awaiting response we are waiting for your reply, please respond! :)

Comments

@pedroasad

pedroasad commented Jan 28, 2021

Bug Report

Description

When I try to push my local cache to an SSH remote cache, I get a permission error with a message similar to:

ERROR: failed to upload .dvc/cache/2d/e4073152d5f0eb75b2ce8f1d7d5543 to ssh://user@example-ip/cache-directory/2d/e4073152d5f0eb75b2ce8f1d7d5543

In this example (see the steps to reproduce below), an SSH remote named ssh-cache with the URL ssh://user@example-ip/cache-directory was created and set as the SSH cache. I believe this is caused by paramiko treating the remote paths as absolute filesystem paths instead of paths relative to the remote user's home directory (see the Suggestions section below).

Reproduce

git init && dvc init
dvc remote add ssh-cache ssh://user@example-ip/cache-directory
dvc config cache.ssh ssh-cache
dvc run -n echo_message -o output.txt "echo 'Hello SSH remote' > output.txt"
dvc push -r ssh-cache

Traceback

Traceback (most recent call last):
  File "/home/pedro/.pyenv/versions/3.8.5/lib/python3.8/site-packages/dvc/tree/ssh/connection.py", line 117, in makedirs
    self.sftp.mkdir(path)
  File "/home/pedro/.pyenv/versions/anaconda3-2020.07/lib/python3.8/site-packages/paramiko/sftp_client.py", line 460, in mkdir
    self._request(CMD_MKDIR, path, attr)
  File "/home/pedro/.pyenv/versions/anaconda3-2020.07/lib/python3.8/site-packages/paramiko/sftp_client.py", line 813, in _request
    return self._read_response(num)
  File "/home/pedro/.pyenv/versions/anaconda3-2020.07/lib/python3.8/site-packages/paramiko/sftp_client.py", line 865, in _read_response
    self._convert_status(msg)
  File "/home/pedro/.pyenv/versions/anaconda3-2020.07/lib/python3.8/site-packages/paramiko/sftp_client.py", line 896, in _convert_status
    raise IOError(errno.EACCES, text)
PermissionError: [Errno 13] Permission denied

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/pedro/.pyenv/versions/3.8.5/lib/python3.8/site-packages/dvc/remote/base.py", line 35, in wrapper
    func(from_info, to_info, *args, **kwargs)
  File "/home/pedro/.pyenv/versions/3.8.5/lib/python3.8/site-packages/dvc/tree/base.py", line 377, in upload
    self._upload(  # noqa, pylint: disable=no-member
  File "/home/pedro/.pyenv/versions/3.8.5/lib/python3.8/site-packages/dvc/tree/ssh/__init__.py", line 267, in _upload
    ssh.upload(
  File "/home/pedro/.pyenv/versions/3.8.5/lib/python3.8/site-packages/dvc/tree/ssh/connection.py", line 218, in upload
    self.makedirs(posixpath.dirname(dest))
  File "/home/pedro/.pyenv/versions/3.8.5/lib/python3.8/site-packages/dvc/tree/ssh/connection.py", line 113, in makedirs
    self.makedirs(head)
  File "/home/pedro/.pyenv/versions/3.8.5/lib/python3.8/site-packages/dvc/tree/ssh/connection.py", line 122, in makedirs
    raise DvcException(
dvc.exceptions.DvcException: unable to create remote directory '/cache-directory'

Suggestions

I believe the problematic code lies in dvc/tree/ssh/connection.py:98. When the SSHConnection.makedirs(path) method creates remote directories (recursively, creating parent directories first), it receives a remote path such as /cache-directory/2d/e4073152d5f0eb75b2ce8f1d7d5543, and because of the leading slash this path seems to be interpreted by paramiko as an absolute filesystem path. While debugging, I verified that if the leading slash is removed from the path, the successive attempts to create cache-directory, cache-directory/2d, and cache-directory/2d/e4073152d5f0eb75b2ce8f1d7d5543 succeed.
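To make the failure mode described above concrete, here is a minimal, self-contained sketch. It does not use DVC or paramiko: FakeSFTP and makedirs are hypothetical stand-ins, and the sketch assumes a server where the user may write only under /home/user and where relative SFTP paths resolve against that home directory (typical, but server-dependent). The absolute path fails on the very first mkdir of '/cache-directory', matching the traceback, while the same path with the leading slash stripped succeeds.

```python
import errno
import posixpath


class FakeSFTP:
    """Hypothetical stand-in for an SFTP session (not paramiko): the user may
    only write under `home`, and relative paths resolve against `home`."""

    def __init__(self, home="/home/user"):
        self.home = home
        self.created = []

    def mkdir(self, path):
        # Relative paths are resolved against the session's starting directory.
        full = path if path.startswith("/") else posixpath.join(self.home, path)
        if not full.startswith(self.home + "/"):
            raise PermissionError(errno.EACCES, "Permission denied")
        self.created.append(full)


def makedirs(sftp, path):
    """Create `path` parent-first by recursing on its parent (sketch only;
    unlike a real implementation, it does not skip existing directories)."""
    path = posixpath.normpath(path)
    head = posixpath.dirname(path)
    # Stop recursing at the root "/" (absolute) or at "" (relative path).
    if head and head != "/":
        makedirs(sftp, head)
    sftp.mkdir(path)


sftp = FakeSFTP()
try:
    makedirs(sftp, "/cache-directory/2d")  # first mkdir is "/cache-directory"
except PermissionError as exc:
    print("absolute path failed:", exc)  # prints: absolute path failed: [Errno 13] Permission denied

makedirs(sftp, "/cache-directory/2d".lstrip("/"))  # leading slash removed
print(sftp.created)  # prints: ['/home/user/cache-directory', '/home/user/cache-directory/2d']
```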

Environment information

Output of dvc version:

$ dvc version
DVC version: 1.11.11 (pip)
---------------------------------
Platform: Python 3.8.5 on Linux-5.4.0-65-generic-x86_64-with-glibc2.29
Supports: azure, gdrive, gs, hdfs, http, https, s3, ssh, oss, webdav, webdavs
Cache types: hardlink, symlink
Cache directory: ext4 on /dev/sda5
Caches: local, ssh
Remotes: s3, ssh
Workspace directory: ext4 on /dev/sda5
Repo: dvc (subdir), git

Note: the bug persists in version 1.11.13.

Additional information

A similar error occurs if I try to use external outputs, that is, by modifying the example to

git init && dvc init
dvc remote add ssh-cache ssh://user@example-ip/cache-directory
dvc config cache.ssh ssh-cache
dvc run -n echo_message --external -o ssh://user@example-ip/destination/directory/output.txt "echo 'Hello SSH remote' > output.txt && scp output.txt user@example-ip:destination/directory"

In this case, I tried both '-o ssh://user@example-ip/destination/directory/output.txt' and '-o remote://ssh-dest/destination/directory/output.txt', after first setting up a second named remote with

dvc remote add ssh-dest ssh://user@example-ip/destination/directory

However, in either case, I get an error like

ERROR: output 'remote://ssh-dest/hello.txt' does not exist 

Notes:

  • I'm aware of #3703 ("remote: should DVC prevent external cache overlap default remote?"), which is why the SSH URLs for the cache and the destination are slightly different.
  • I'm not sure whether this should be considered a separate bug report, and I hope this is not too much information. If the developers think it should be, I'll be happy to open a separate issue.
@pmrowla
Contributor

pmrowla commented Jan 29, 2021

This is a known issue and there is an existing feature request for handling it: #4167

Are you able to use the absolute path in your SSH URL as a workaround?
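A minimal sketch of that workaround, assuming the remote user's home directory is /home/user (this can be checked with ssh user@example-ip pwd). absolutize_ssh_url is a hypothetical helper, not part of DVC: it rewrites a URL whose path was meant relative to the home directory into one with an absolute path.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def absolutize_ssh_url(url, remote_home):
    """Hypothetical helper: turn ssh://user@host/dir (meant relative to the
    remote home) into ssh://user@host/<remote_home>/dir."""
    parts = urlsplit(url)
    # Drop the leading slash so the path is joined under remote_home.
    relative = parts.path.lstrip("/")
    absolute = posixpath.join(remote_home, relative)
    return urlunsplit((parts.scheme, parts.netloc, absolute, parts.query, parts.fragment))


# Assumes the remote user's home directory is /home/user.
print(absolutize_ssh_url("ssh://user@example-ip/cache-directory", "/home/user"))
# prints: ssh://user@example-ip/home/user/cache-directory
```

The resulting URL could then replace the original one with dvc remote modify ssh-cache url ssh://user@example-ip/home/user/cache-directory.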

@pmrowla pmrowla added the awaiting response we are waiting for your reply, please respond! :) label Jan 29, 2021
@pedroasad
Author

Oh, I'm sorry for duplicating the issue, then. I searched the open issues but didn't notice that #4167 was about the same topic. And yes, that's one workaround I'm considering. Thank you.
