Skip to content

[3.11] gh-123270: Replaced SanitizedNames with a more surgical fix. (GH-123354) #123425

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 4, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
72 changes: 66 additions & 6 deletions Lib/test/test_zipfile.py
Original file line number Diff line number Diff line change
Expand Up @@ -3653,7 +3653,11 @@ def test_extract_orig_with_implied_dirs(self, alpharep):

def test_malformed_paths(self):
"""
Path should handle malformed paths.
Path should handle malformed paths gracefully.

Paths with leading slashes are not visible.

Paths with dots are treated like regular files.
"""
data = io.BytesIO()
zf = zipfile.ZipFile(data, "w")
Expand All @@ -3662,11 +3666,67 @@ def test_malformed_paths(self):
zf.writestr("../parent.txt", b"content")
zf.filename = ''
root = zipfile.Path(zf)
assert list(map(str, root.iterdir())) == [
'one-slash.txt',
'two-slash.txt',
'parent.txt',
]
assert list(map(str, root.iterdir())) == ['../']
assert root.joinpath('..').joinpath('parent.txt').read_bytes() == b'content'

def test_unsupported_names(self):
"""
Path segments with special characters are readable.

On some platforms or file systems, characters like
``:`` and ``?`` are not allowed, but they are valid
in the zip file.
"""
data = io.BytesIO()
zf = zipfile.ZipFile(data, "w")
zf.writestr("path?", b"content")
zf.writestr("V: NMS.flac", b"fLaC...")
zf.filename = ''
root = zipfile.Path(zf)
contents = root.iterdir()
assert next(contents).name == 'path?'
assert next(contents).name == 'V: NMS.flac'
assert root.joinpath('V: NMS.flac').read_bytes() == b"fLaC..."

def test_backslash_not_separator(self):
"""
In a zip file, backslashes are not separators.
"""
data = io.BytesIO()
zf = zipfile.ZipFile(data, "w")
zf.writestr(DirtyZipInfo.for_name("foo\\bar", zf), b"content")
zf.filename = ''
root = zipfile.Path(zf)
(first,) = root.iterdir()
assert not first.is_dir()
assert first.name == 'foo\\bar'


class DirtyZipInfo(zipfile.ZipInfo):
"""
Bypass name sanitization.
"""

def __init__(self, filename, *args, **kwargs):
super().__init__(filename, *args, **kwargs)
self.filename = filename

@classmethod
def for_name(cls, name, archive):
"""
Construct the same way that ZipFile.writestr does.

TODO: extract this functionality and re-use
"""
self = cls(filename=name, date_time=time.localtime(time.time())[:6])
self.compress_type = archive.compression
self.compress_level = archive.compresslevel
if self.filename.endswith('/'): # pragma: no cover
self.external_attr = 0o40775 << 16 # drwxrwxr-x
self.external_attr |= 0x10 # MS-DOS directory flag
else:
self.external_attr = 0o600 << 16 # ?rw-------
return self


class EncodedMetadataTests(unittest.TestCase):
Expand Down
69 changes: 8 additions & 61 deletions Lib/zipfile.py
Original file line number Diff line number Diff line change
Expand Up @@ -2213,7 +2213,7 @@ def _parents(path):
def _ancestry(path):
"""
Given a path with elements separated by
posixpath.sep, generate all elements of that path
posixpath.sep, generate all elements of that path.

>>> list(_ancestry('b/d'))
['b/d', 'b']
Expand All @@ -2225,9 +2225,14 @@ def _ancestry(path):
['b']
>>> list(_ancestry(''))
[]

Multiple separators are treated like a single.

>>> list(_ancestry('//b//d///f//'))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if this should be obvious, but is this doctest run as part of the test suite? If not, wouldn't that mean removing //two-slash.txt from the test_malformed_paths test means the security fix for the infinite loop doesn't have test coverage in 3.8..3.11?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. Thanks for asking. I thought about this too.

The doctests are not run as part of the test suite. Perhaps that's something worth pursuing, but not something I've yet invested time or energy on. Since I'm the primary or sole maintainer on these projects that are preferably developed upstream in the third-party repos, I'm comfortable relying on the doctests, but you're absolutely right that it's a risk. There are other tests too, like "test_complexity" that exist only in the third-party repo (due to the dependency on other third-party packages to perform the tests).

I had thought that the //two-slash.txt was still part of the unit test, but I see now, looking at the diff, that it is indeed missing. Looking at the same test in zipp, the lines are still present, so I need to assess where those lines went missing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As your comment implies, the lines are still there for Python 3.12, so the lines must have been lost when porting the test to 3.11. I don't believe that's intentional nor necessary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch! Indeed, those lines were omitted by accident. I don't recall why; perhaps I failed to save the full file or pass -a to git or something. Regardless, adding those lines back in restores the coverage for the security issue (0478c46).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll cherry-pick that commit to the other PRs as well.

Copy link
Contributor

@obfusk obfusk Sep 2, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As your comment implies, the lines are still there for Python 3.12.

Yes. They are present in the PRs for main, 3.13 and 3.12, but gone in 3.11 (this PR), 3.10, 3.9, and 3.8.

['//b//d///f', '//b//d', '//b']
"""
path = path.rstrip(posixpath.sep)
while path and path != posixpath.sep:
while path.rstrip(posixpath.sep):
yield path
path, tail = posixpath.split(path)

Expand All @@ -2244,65 +2249,7 @@ def _difference(minuend, subtrahend):
return itertools.filterfalse(set(subtrahend).__contains__, minuend)


class SanitizedNames:
"""
ZipFile mix-in to ensure names are sanitized.
"""

def namelist(self):
return list(map(self._sanitize, super().namelist()))

@staticmethod
def _sanitize(name):
r"""
Ensure a relative path with posix separators and no dot names.
Modeled after
https://github.com/python/cpython/blob/bcc1be39cb1d04ad9fc0bd1b9193d3972835a57c/Lib/zipfile/__init__.py#L1799-L1813
but provides consistent cross-platform behavior.
>>> san = SanitizedNames._sanitize
>>> san('/foo/bar')
'foo/bar'
>>> san('//foo.txt')
'foo.txt'
>>> san('foo/.././bar.txt')
'foo/bar.txt'
>>> san('foo../.bar.txt')
'foo../.bar.txt'
>>> san('\\foo\\bar.txt')
'foo/bar.txt'
>>> san('D:\\foo.txt')
'D/foo.txt'
>>> san('\\\\server\\share\\file.txt')
'server/share/file.txt'
>>> san('\\\\?\\GLOBALROOT\\Volume3')
'?/GLOBALROOT/Volume3'
>>> san('\\\\.\\PhysicalDrive1\\root')
'PhysicalDrive1/root'
Retain any trailing slash.
>>> san('abc/')
'abc/'
Raises a ValueError if the result is empty.
>>> san('../..')
Traceback (most recent call last):
...
ValueError: Empty filename
"""

def allowed(part):
return part and part not in {'..', '.'}

# Remove the drive letter.
# Don't use ntpath.splitdrive, because that also strips UNC paths
bare = re.sub('^([A-Z]):', r'\1', name, flags=re.IGNORECASE)
clean = bare.replace('\\', '/')
parts = clean.split('/')
joined = '/'.join(filter(allowed, parts))
if not joined:
raise ValueError("Empty filename")
return joined + '/' * name.endswith('/')


class CompleteDirs(SanitizedNames, ZipFile):
class CompleteDirs(ZipFile):
"""
A ZipFile subclass that ensures that implied directories
are always included in the namelist.
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
Applied a more surgical fix for malformed payloads in :class:`zipfile.Path`
causing infinite loops (gh-122905) without breaking contents using
legitimate characters.
Loading