Skip to content

The prefix file:// results in ignoring the first path component. #108

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
pedro-r-marques opened this issue Jun 21, 2023 · 7 comments
Closed
Labels
bug 🐛 Something isn't working

Comments

@pedro-r-marques
Copy link

I'm seeing this issue in both macos and linux

PATHS = ['/var/tmp', 'file:/var/tmp', 'file://var/tmp', 'file:///var/tmp']
{p:list(upath.UPath(p).iterdir()) for p in PATHS}

The variant "file://var/tmp" gives me the contents of "/tmp". The other variants work as expected.

@ap--
Copy link
Collaborator

ap-- commented Jun 21, 2023

Hi @pedro-r-marques

thank you for opening an issue!

I do have a pretty good idea where the error is occurring here, but I'd love to have some more information from you beforehand.

  1. Could you tell me what the first thing you tried was, before you were investigating the behavior of the different amount of slashes after file:, i.e. how did you generate your URIs?
  2. Could you have a short read of this: https://en.wikipedia.org/wiki/File_URI_scheme#How_many_slashes? and let me know if it is clear to you why the behavior is different
  3. Would a warning have helped to address this?

Basically file://var/tmp is interpreted as host=var and path=/tmp and we currently seem to ignore the host for local paths. The question is, how we should address this in the best way. Very likely with both: better documentation AND actually using the host and erroring with a clear error msg.

Cheers,
Andreas 😃

@pedro-r-marques
Copy link
Author

I initially tried "file://" expecting it to follow other url-like parameters. If the implementation interprets the first string as hostname it should make sure that the value is usable and throw an error otherwise. i.e. if hostname != 'localhost' and hostname != '' then it should imho raise an ArgumentError.

@ap-- ap-- added the bug 🐛 Something isn't working label Aug 2, 2023
@ap--
Copy link
Collaborator

ap-- commented Aug 28, 2023

Notes:

We should provide a correct error message for now, and in case pathlib or we decide to support parsing relative path uri's we can switch, see:

moved to new issue #142

@barneygale
Copy link

file://foo/bar does not represent a relative path on Windows - it represents a UNC path \\foo\bar.

In my CPython PR I made it work the same on POSIX, and so:

>>> pathlib.Path.from_uri('file://var/tmp')
PosixPath('//var/tmp')

Note that paths with two leading slashes are allowed by POSIX:

A pathname consisting of a single slash shall resolve to the root directory of the process. A null pathname shall not be successfully resolved. A pathname that begins with two successive slashes may be interpreted in an implementation-defined manner, although more than two leading slashes shall be treated as a single slash.

@ap--
Copy link
Collaborator

ap-- commented Sep 3, 2023

Thanks for the clarification @barneygale!

So does that mean file://localhost/absolute/path/file.txt would actually not map to /absolute/path/file.txt on posix systems, but to the path //localhost/absolute/path/file.txt?

how does that fit together with rfc8089? Are "file" ":" "//" "auth-part" -style URIs with a host not compatible with posix?

@barneygale
Copy link

I've never come across file URIs that use file://localhost/ in the wild, but they should probably be handled specially. I'll adjust the PR.

The RFC also says:

   The "host" is the fully qualified domain name of the system on which
   the file is accessible.  This allows a client on another system to
   know that it cannot access the file system, or perhaps that it needs
   to use some other local mechanism to access the file.

... which works fine on Windows, but not on other systems :(

@ap--
Copy link
Collaborator

ap-- commented Feb 15, 2024

Fixed in universal_pathlib==0.2.0. Behavior for "file://path" uris is now identical to fsspec.

>>> import os
>>> os.getcwd()
'/home/user'
>>> import upath
>>> PATHS = ['/tmp/test', 'file:/tmp/test', 'file://tmp/test', 'file:///tmp/test']
>>> {p: upath.UPath(p) for p in PATHS}
{
  '/tmp/buh': PosixUPath('/tmp/buh'),
  'file:/tmp/buh': FilePath('file:///tmp/buh'),
  'file://tmp/buh': FilePath('file:///home/user/tmp/buh'),
  'file:///tmp/buh': FilePath('file:///tmp/buh'),
}

@ap-- ap-- closed this as completed Feb 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug 🐛 Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants