
push: can't use user key in webhdfs remote configuration #10062

Closed
gdiepen opened this issue Oct 31, 2023 · 0 comments · Fixed by #10075

@gdiepen
Contributor

gdiepen commented Oct 31, 2023

Bug Report

Description

When using WebHDFS as the remote backend, the DVC documentation states that you can set the user key (see https://dvc.org/doc/user-guide/data-management/remote-storage/hdfs#webhdfs-configuration-parameters).

However, if you try to provide a user key, DVC reports that this key is not expected.

I am trying to connect to a WebHDFS server behind a proxy that requires basic authentication. For this to work, I need at least the user key, and also a password and a data_proxy key.
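Why a data proxy matters here: in WebHDFS, the NameNode answers file read and write requests with a redirect to a DataNode address, and behind a proxy that address is typically not reachable from the client. The data_proxy key mentioned above is meant to rewrite those redirect locations. The following is a simplified sketch of that idea, assuming a mapping from internal URL prefixes to proxy URLs (or a callable that rewrites the whole URL); apply_data_proxy and the host names are hypothetical and this is not fsspec's actual implementation.

```python
from typing import Callable, Mapping, Union

# Hypothetical type for illustration: either a mapping of internal URL
# prefixes to proxy URL prefixes, or a callable that rewrites a full URL.
DataProxy = Union[Mapping[str, str], Callable[[str], str]]


def apply_data_proxy(location: str, data_proxy: DataProxy) -> str:
    """Rewrite a DataNode redirect URL so it is routed through a proxy.

    Hypothetical helper; a simplified stand-in, not fsspec's code.
    """
    if callable(data_proxy):
        # A callable receives the full redirect URL and returns the new one.
        return data_proxy(location)
    for internal, external in data_proxy.items():
        if location.startswith(internal):
            # Swap only the matching prefix; keep path and query intact.
            return external + location[len(internal):]
    # No rule matched: leave the redirect URL unchanged.
    return location


if __name__ == "__main__":
    redirect = "http://datanode1.internal:9864/webhdfs/v1/data/file.csv?op=OPEN"
    proxy = {"http://datanode1.internal:9864": "https://proxy.example.com/dn1"}
    print(apply_data_proxy(redirect, proxy))
```

A prefix mapping covers the common case of a fixed set of DataNodes; the callable form is a fallback for proxies whose routing cannot be expressed as simple prefix replacement.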

I have already created a PR for filesystem_spec (fsspec/filesystem_spec#1409) that adds basic authentication support to its WebHDFS implementation.
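For reference, HTTP basic authentication (RFC 7617) sends the user name and password as a base64-encoded user:password pair in the Authorization request header. The sketch below builds that header with the standard library only; basic_auth_header is a hypothetical helper for illustration, not part of fsspec or DVC.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict:
    """Return an HTTP Authorization header for basic auth (RFC 7617).

    Hypothetical helper for illustration only.
    """
    # Join the credentials with a colon, encode as UTF-8, then base64.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


if __name__ == "__main__":
    # Example credentials taken from RFC 7617.
    print(basic_auth_header("Aladdin", "open sesame"))
    # → {'Authorization': 'Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ=='}
```

Base64 is an encoding, not encryption, so basic authentication should only be used over HTTPS.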

I already have the corresponding DVC changes ready. Once the filesystem_spec PR is merged, I will open a PR with those modifications so that DVC supports basic authentication as well.

Reproduce

  1. dvc init
  2. dvc remote add foobar webhdfs://server
  3. dvc remote modify foobar user aaa
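The failure in the last step is consistent with strict validation of remote options: each remote type accepts a fixed set of option names, and any name outside that set is rejected. The sketch below illustrates that behaviour with plain Python sets; validate_remote_options, ConfigError and the option sets are hypothetical and deliberately small, and they do not reproduce DVC's actual configuration schema.

```python
class ConfigError(ValueError):
    """Raised when a remote option is not in the allowed set (hypothetical)."""


def validate_remote_options(options: dict, allowed: set) -> dict:
    """Reject any option name the (hypothetical) schema does not know."""
    unexpected = sorted(set(options) - allowed)
    if unexpected:
        raise ConfigError(f"unexpected option(s): {', '.join(unexpected)}")
    return options


# Hypothetical, deliberately small option sets; not DVC's real schema.
ALLOWED_WITHOUT_USER = {"url", "token"}
ALLOWED_WITH_USER = ALLOWED_WITHOUT_USER | {"user"}

if __name__ == "__main__":
    remote = {"url": "webhdfs://server", "user": "aaa"}
    try:
        validate_remote_options(remote, ALLOWED_WITHOUT_USER)
    except ConfigError as exc:
        # Mirrors the reported failure: "user" is not a known option.
        print(f"rejected: {exc}")
    # Once "user" is part of the schema, the same options validate cleanly.
    print(validate_remote_options(remote, ALLOWED_WITH_USER))
```

In this model the fix is purely additive: declaring the documented key in the schema makes the reproduce steps pass without changing how other options are validated.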

Expected

No error

Environment information

Output of dvc doctor:

DVC version: 3.27.0 (pip)
-------------------------
Platform: Python 3.10.13 on Linux-5.15.0-87-generic-x86_64-with-glibc2.35
Subprojects:
        dvc_data = 2.18.1
        dvc_objects = 1.0.1
        dvc_render = 0.6.0
        dvc_task = 0.3.0
        scmrepo = 1.4.0
Supports:
        http (aiohttp = 3.8.6, aiohttp-retry = 2.8.3),
        https (aiohttp = 3.8.6, aiohttp-retry = 2.8.3),
        s3 (s3fs = 2023.9.2, boto3 = 1.28.17),
        webhdfs (fsspec = 2023.9.2)
Config:
        Global: /home/guido/.config/dvc
        System: /home/guido/.config/kdedefaults/dvc
Cache types: <https://error.dvc.org/no-dvc-cache>
Caches: local
Remotes: webhdfs
Workspace directory: ext4 on /dev/sda2
Repo: dvc, git
Repo.site_cache_dir: /var/tmp/dvc/repo/05bc8ac7d6f64ba8a1c0383c98f12453

Additional Information (if any):

As mentioned above, I am waiting for the fsspec PR to be merged. After that, I will create a small PR that enables support for the new fsspec features in DVC.

I already have this working with locally patched versions of DVC and fsspec.

Related PRs:
