pull: Fails to pull some files from azure remote #7605

Closed
@plomtb

Bug Report

Description

dvc pull fails for some files on an Azure remote. The files that fail do exist on the remote. This happens for both images and text data.
Output of dvc pull -v for one of the files:

2022-04-20 14:31:37,043 ERROR: failed to transfer 'md5: 2b764da58921d765c111b8dfddb181d2'
------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Users\user\Miniconda3\envs\test\lib\site-packages\dvc\data\transfer.py", line 25, in wrapper
    func(fs_path, *args, **kwargs)
  File "C:\Users\user\Miniconda3\envs\test\lib\site-packages\dvc\data\transfer.py", line 162, in func
    return dest.add(
  File "C:\Users\user\Miniconda3\envs\test\lib\site-packages\dvc\objects\db.py", line 117, in add
    self._add_file(
  File "C:\Users\user\Miniconda3\envs\test\lib\site-packages\dvc\objects\db.py", line 89, in _add_file
    return fs.utils.transfer(
  File "C:\Users\user\Miniconda3\envs\test\lib\site-packages\dvc\fs\utils.py", line 96, in transfer
    _try_links(links, from_fs, from_path, to_fs, to_path)
  File "C:\Users\user\Miniconda3\envs\test\lib\site-packages\dvc\fs\utils.py", line 66, in _try_links
    return _copy(from_fs, from_path, to_fs, to_path)
  File "C:\Users\user\Miniconda3\envs\test\lib\site-packages\dvc\fs\utils.py", line 47, in _copy
    return from_fs.download_file(from_path, to_path)
  File "C:\Users\user\Miniconda3\envs\test\lib\site-packages\dvc\fs\base.py", line 292, in download
    return self._download_file(from_info, to_info, callback=callback)
  File "C:\Users\user\Miniconda3\envs\test\lib\site-packages\dvc\fs\base.py", line 354, in _download_file
    self.get_file(from_info, tmp_file, callback=callback)
  File "C:\Users\user\Miniconda3\envs\test\lib\site-packages\dvc\fs\fsspec_wrapper.py", line 251, in get_file
    total: int = self.getsize(from_info)
  File "C:\Users\user\Miniconda3\envs\test\lib\site-packages\dvc\fs\base.py", line 177, in getsize
    return self.info(path).get("size")
  File "C:\Users\user\Miniconda3\envs\test\lib\site-packages\dvc\fs\fsspec_wrapper.py", line 106, in info
    return self.fs.info(path)
  File "C:\Users\user\Miniconda3\envs\test\lib\site-packages\adlfs\spec.py", line 627, in info
    return sync(self.loop, self._info, path, refresh)
  File "C:\Users\user\Miniconda3\envs\test\lib\site-packages\fsspec\asyn.py", line 65, in sync
    raise return_result
  File "C:\Users\user\Miniconda3\envs\test\lib\site-packages\fsspec\asyn.py", line 25, in _runner
    result[0] = await coro
  File "C:\Users\user\Miniconda3\envs\test\lib\site-packages\adlfs\spec.py", line 654, in _info
    out = await self._ls(path, invalidate_cache=invalidate_cache, **kwargs)
  File "C:\Users\user\Miniconda3\envs\test\lib\site-packages\adlfs\spec.py", line 876, in _ls
    raise FileNotFoundError
FileNotFoundError
------------------------------------------------------------

I tried different versions of DVC; the error message above is from this one:

DVC version: 2.10.1 (conda)
---------------------------------
Platform: Python 3.9.12 on Windows-10-10.0.19042-SP0
Supports:
        azure (adlfs = 2022.4.0, knack = 0.6.3, azure-identity = 1.9.0),
        webhdfs (fsspec = 2022.3.0),
        http (aiohttp = 3.8.1, aiohttp-retry = 2.4.6),
        https (aiohttp = 3.8.1, aiohttp-retry = 2.4.6)
Cache types: hardlink
Cache directory: NTFS on C:\
Caches: local
Remotes: azure
Workspace directory: NTFS on C:\
Repo: dvc, git
Remotes: azure
Workspace directory: NTFS on F:\
Repo: dvc, git
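
The traceback above ends with adlfs raising FileNotFoundError while looking up the object's size, so one thing worth checking is whether a blob exists at the path DVC expects for that md5. Below is a minimal sketch of how that path could be derived, assuming the DVC 2.x remote layout (first two hex characters of the md5 as a directory, the remaining characters as the file name). The helper name `remote_object_path` and the prefix `mycontainer/dvcstore` are hypothetical placeholders, not taken from this report.

```python
import posixpath


def remote_object_path(remote_prefix: str, md5: str) -> str:
    """Return the path where a DVC 2.x remote is assumed to store a file object.

    Assumed layout: <prefix>/<first 2 hex chars>/<remaining 30 hex chars>.
    Directory objects (hashes ending in ".dir") are not handled here.
    """
    md5 = md5.strip().lower()
    if len(md5) != 32 or any(c not in "0123456789abcdef" for c in md5):
        raise ValueError(f"not a 32-character hex md5: {md5!r}")
    return posixpath.join(remote_prefix.rstrip("/"), md5[:2], md5[2:])


if __name__ == "__main__":
    # The md5 comes from the error message; the prefix is a placeholder.
    print(remote_object_path("mycontainer/dvcstore",
                             "2b764da58921d765c111b8dfddb181d2"))
```

The resulting path can be compared against the container listing (for example in the Azure portal) to see whether the blob is really present under that exact name.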

Labels: A: data-sync (related to dvc get/fetch/import/pull/push), P: windows (related to the Windows platform), awaiting response
