Skip to content

os.path.normpath of relative path r".\C:\x" returns absolute path r"C:\x" on Windows, similar in pathlib #100162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
gpshead opened this issue Dec 10, 2022 · 7 comments
Labels
3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes 3.12 only security fixes OS-windows stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error type-security A security issue

Comments

@gpshead
Copy link
Member

gpshead commented Dec 10, 2022

os.path.normpath normalizes "./" or "../" path elements to clean the path.

As os.path.normpath doesn't consider a case where the normalized path starts with a drive letter, a relative path with a drive letter-like path element will be normalized to an absolute path. This behavior can result in a path traversal, depending on the implementation.

The minimal snippet to reproduce this behavior is the following:

> os.path.normpath("./C:/Windows/System32")

This snippet will return "C:\Windows\System32", which is an absolute path on Windows. (tested with Python 3.11.1 on Windows)

This vulnerability is similar to CVE-2022-29804 of Go.

as reported to security@ by RyotaK on 2022-12-09.


Golang solved their issue in golang/go#52476 which may be helpful to look at.

The Python Security Response Team agreed that this issue did not require an embargo.

It looks like pathlib probably also has issues in this area:

>>> p1 = pathlib.PureWindowsPath('P:\\windows')
>>> p2 = pathlib.PureWindowsPath('.\\P:\\windows')
>>> p1 == p2
False
>>> p1
PureWindowsPath('P:/windows')
>>> p2
PureWindowsPath('P:/windows')
>>> p1.is_absolute()
True
>>> p2.is_absolute()
False
>>> str(p1) == str(p2)
True
@gpshead gpshead added type-bug An unexpected behavior, bug, or error type-security A security issue OS-windows stdlib Python modules in the Lib dir labels Dec 10, 2022
@eryksun
Copy link
Contributor

eryksun commented Dec 10, 2022

I wouldn't hold up ".\C:\Windows\System32" as a shining example because it's an invalid path. Removing the leading "." component in this case doesn't break a path that would otherwise work. That said, removing the leading "." component leads to a couple of other issues that have been discussed previously.

The first is the need to use a leading "." component in order to avoid ambiguity with a single-letter filename that contains named file streams. For example, ".\C:spam" refers to a data stream named "spam" in a file or directory named "C" in the current working directory. Without the leading "." component, "C:spam" is a file or directory named "spam" relative to the current working directory of drive "C:". (Each drive has its own working directory, which defaults to the root directory.)

The other issue is that removing the leading "." component can change the meaning of a path of an executable that's passed to POSIX exec*() or WinAPI CreateProcess*W() or SearchPathW(). In a search context, the path "./spam" is limited to the current directory, while just "spam" is searched for. (WinAPI CreateProcess*W() and SearchPathW() also search for "spam\eggs", but ".\spam\eggs" is limited to the current directory.)

IIRC, neither of these issues has been seen as urgent enough to change the long-standing behavior normpath().

@eryksun
Copy link
Contributor

eryksun commented Dec 11, 2022

It's worth noting that WinAPI PathCchCanonicalizeEx() has the same behavior for the garbage in, garbage out case of r".\C:\x". Also, PathCchCombineEx() canonicalizes the result the same way. For example:

>>> import ctypes
>>> winpath = ctypes.OleDLL('api-ms-win-core-path-l1-1-0')
>>> out_path = (ctypes.c_wchar * 1000)()

>>> winpath.PathCchCanonicalizeEx(out_path, 1000, r'.\C:\spam', 0)
0
>>> out_path.value
'C:\spam'

>>> winpath.PathCchCombineEx(out_path, 1000, r'.\C:\spam', 'eggs', 0)
0
>>> out_path.value
'C:\\spam\\eggs'

On Windows, the internal functions _Py_join_relfile() and _Py_add_relfile() call PathCchCombineEx(). _Py_join_relfile() isn't used on Windows. _Py_add_relfile() is used by getpath_joinpath() in "Modules/getpath.c". This is the joinpath() function in "Modules/getpath.py". However, I don't think it's used on Windows in a way that's vulnerable. Unless I missed a case, the first item is always a qualified path such as the prefix directory or the directory of the executable or interpreter DLL.

@zooba
Copy link
Member

zooba commented Dec 12, 2022

Agreed, I don't think Python itself is vulnerable here, but our users may have written code that is exploitable by an error case becoming a legitimate case:

path = input("Provide your path:")
# path = r".\D:\secrets.txt"
assert not os.path.isabsolute(path)
path = os.path.normpath(path)
print("Reading from", path)
...

First immediate fix should be to update the docs to discourage calling normpath on relative paths if you're going to use them later (it's fine enough for display purposes). You should always join to an absolute path first. We jump through as many hoops as we can to make it work, but it's still going to be less reliable than not doing it. (And if you're really concerned about symlinks or junctions, you shouldn't normpath at all, though you're going to run into inconsistencies all over in that case anyway.)

I'm inclined to think that the best fix is for normpath to preserve one leading ./ segment on all platforms (unless we have a leading ../, or an absolute path). Right now, it explicitly skips that one segment and drops it from the result, so really we're just going to bring it back at the end of the algorithm unless we start with ../.

I can't come up with a reason to not do this on all platforms, especially given it's common code for all platforms, but maybe there's something I missed?

@zooba zooba added 3.11 only security fixes 3.10 only security fixes 3.9 only security fixes 3.8 (EOL) end of life 3.7 (EOL) end of life 3.12 only security fixes labels Dec 12, 2022
@eryksun
Copy link
Contributor

eryksun commented Dec 13, 2022

I'm inclined to think that the best fix is for normpath to preserve one leading ./ segment on all platforms

We would also have to handle cases such as "spam/../C:eggs", which is a valid path for a file named "C" containing a data stream named "eggs" . This example should be normalized as r".\C:eggs".

I suggest the following change to the final step of the pure Python implementation:

    # Prepend curdir either for an empty path or to avoid changing the
    # meaning of a path (e.g. avoid "spam/../C:eggs" -> "C:eggs").
    if not prefix and (not comps or comps[0][1:2] == colon):
        comps.insert(0, curdir)
    return prefix + sep.join(comps)

This is a minimal change, which doesn't always preserve an initial "." component in the path argument. We could easily implement the latter as well by checking the initial value of comps[0] before processing the path components.

@zooba
Copy link
Member

zooba commented Dec 13, 2022

Good point. Perhaps this check may also work?

new_path = old_normpath(path)
try:
    i = new_path.index(':')
    if i < path.index(':'):
        # colon moved closer to the start
        try:
            new_path.index('\\', 0, i) # find slash before colon
        except ValueError:
            new_path = ".\\" + new_path # colon in first segment
except ValueError:
    pass # no colon

Should be easy to tack onto the end of either approach, and will avoid changing anything if the path didn't change. There shouldn't be any scenarios where the first colon moves later in the path, and anywhere it stays at the same location we can ignore.

@eryksun
Copy link
Contributor

eryksun commented Dec 13, 2022

Consider the valid path "foo/../spam/c:eggs", which normalizes as r'spam\c:eggs', for which the colon moves closer to the start of the path. This result is correct.

Consider the invalid path "foo/../c:/bar", which normalizes as r'c:\bar', for which the colon moves closer to the start of the path. This is a problem because an invalid path normalizes as a valid path. It's a case of garbage in, garbage out, but we should try to return a result that's still an invalid path, such as r'.\c:\bar'.


Primarily, there's a serious issue if the initial path has no drive or root (i.e. prefix is empty), and the first component after normalization has a colon as its second character. If the component has just two characters (e.g. "C:"), then the initial path is invalid, and the result should be invalid. If the component has more than two characters (e.g. "C:spam"), then the initial path component is a file stream, which should not be returned as a relative drive path. Prepending a "." component fixes both cases.

Secondarily, there's a problem if the path intentionally begins with a "." component that gets normalized away. There's no difference if the path is opened, but it changes the intent of the path in a search context, such as exec*(), CreateProcess*W(), or SearchPathW(). This is especially the case on Windows, for which any relative path that doesn't begin with an explicit reference to the current working directory (i.e. a "." or ".." component) is searched for, even if the path contains slashes. For example, with a search path that consists of just "C:\Temp":

>>> import os
>>> from win32api import SearchPath
>>> open(r'C:\Temp\test\spam.exe', 'w').close()
>>> os.getcwd()
'C:\\'

>>> SearchPath(r'C:\Temp', r'.\test\spam', '.exe')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
pywintypes.error: (2, 'SearchPath', 'The system cannot find the file specified.')

>>> SearchPath(r'C:\Temp', r'test\spam', '.exe')
('C:\\Temp\\test\\spam.exe', 13)

@zooba
Copy link
Member

zooba commented Dec 13, 2022

Thanks. So this leads us back to:

  • we should preserve a leading .\ if it's there explicitly
    ** (maybe remove it if it gets replaced by ..\, but nothing else)
    ** could also do this on POSIX (for ./, of course)
  • if the initial path has no root, and the final path starts with a single letter and a colon, we should insert a .\

Both are potentially breaking changes, unfortunately. The second part is probably justifiable as a security fix, but I'm less sure about the first change.

I don't like framing the fix in terms of the Python implementation, because it doesn't look anything like the native implementation anymore. Let's agree on behaviour in terms of the input value, and then we can implement that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.7 (EOL) end of life 3.8 (EOL) end of life 3.9 only security fixes 3.10 only security fixes 3.11 only security fixes 3.12 only security fixes OS-windows stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error type-security A security issue
Projects
None yet
Development

No branches or pull requests

3 participants