Skip to content

gh-121999: Change default tarfile filter to 'data' #122002

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Jul 26, 2024
Merged
8 changes: 5 additions & 3 deletions Doc/library/shutil.rst
Original file line number Diff line number Diff line change
Expand Up @@ -706,11 +706,9 @@ provided. They rely on the :mod:`zipfile` and :mod:`tarfile` modules.

The keyword-only *filter* argument is passed to the underlying unpacking
function. For zip files, *filter* is not accepted.
For tar files, it is recommended to set it to ``'data'``,
For tar files, it is recommended to use the default, ``'data'``,
unless using features specific to tar and UNIX-like filesystems.
(See :ref:`tarfile-extraction-filter` for details.)
The ``'data'`` filter will become the default for tar files
in Python 3.14.

.. audit-event:: shutil.unpack_archive filename,extract_dir,format shutil.unpack_archive

Expand All @@ -721,6 +719,10 @@ provided. They rely on the :mod:`zipfile` and :mod:`tarfile` modules.
the *extract_dir* argument, e.g. members that have absolute filenames
starting with "/" or filenames with two dots "..".

The default filter is set to ``filter='data'`` to prevent the most
dangerous security issues. Read the :ref:`tarfile-extraction-filter`
section for details.

.. versionchanged:: 3.7
Accepts a :term:`path-like object` for *filename* and *extract_dir*.

Expand Down
35 changes: 15 additions & 20 deletions Doc/library/tarfile.rst
Original file line number Diff line number Diff line change
Expand Up @@ -40,9 +40,12 @@ Some facts and figures:
Archives are extracted using a :ref:`filter <tarfile-extraction-filter>`,
which makes it possible to either limit surprising/dangerous features,
or to acknowledge that they are expected and the archive is fully trusted.
By default, archives are fully trusted, but this default is deprecated
and slated to change in Python 3.14.

.. versionchanged:: 3.14
Set the default extraction filter to :func:`data <data_filter>`,
which disallows dangerous features such as links to absolute paths
or paths outside of the destination. Previously, the filter strategy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about this one but the outside of now feels weird to me :')

@AA-Turner As a native speaker (you're the only one I know...), should it be "outside the destination", or "outside of"? (or something else entirely?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think "outside of" is conventional in North America, less conventional in the UK.
https://learningenglish.voanews.com/a/should-we-think-outside-or-outside-of-the-box-/6434530.html

I think I was reading from PEP 706 when I wrote that update.

Refuse to extract links (hard or soft) which end up linking to a path outside of the destination. (On systems that don’t support links, tarfile will, in most cases, fall back to creating regular files. This proposal doesn’t change that behaviour.)

was :func:`fully_trusted <fully_trusted_filter>`.

.. function:: open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs)

Expand Down Expand Up @@ -495,8 +498,8 @@ be finalized; only the internally used file object will be closed. See the
The *filter* argument specifies how ``members`` are modified or rejected
before extraction.
See :ref:`tarfile-extraction-filter` for details.
It is recommended to set this explicitly depending on which *tar* features
you need to support.
It is recommended to set this explicitly only if unusual *tar* features
are required.

.. warning::

Expand All @@ -505,8 +508,9 @@ be finalized; only the internally used file object will be closed. See the
that have absolute filenames starting with ``"/"`` or filenames with two
dots ``".."``.

Set ``filter='data'`` to prevent the most dangerous security issues,
and read the :ref:`tarfile-extraction-filter` section for details.
The default filter is set to ``filter='data'`` to prevent the most
dangerous security issues. Read the :ref:`tarfile-extraction-filter`
section for details.

.. versionchanged:: 3.5
Added the *numeric_owner* parameter.
Expand Down Expand Up @@ -538,8 +542,9 @@ be finalized; only the internally used file object will be closed. See the

See the warning for :meth:`extractall`.

Set ``filter='data'`` to prevent the most dangerous security issues,
and read the :ref:`tarfile-extraction-filter` section for details.
The default filter is set to ``filter='data'`` to prevent the most
dangerous security issues. Read the :ref:`tarfile-extraction-filter`
section for details.

.. versionchanged:: 3.2
Added the *set_attrs* parameter.
Expand Down Expand Up @@ -603,12 +608,7 @@ be finalized; only the internally used file object will be closed. See the
argument to :meth:`~TarFile.extract`.

If ``extraction_filter`` is ``None`` (the default),
calling an extraction method without a *filter* argument will raise a
``DeprecationWarning``,
and fall back to the :func:`fully_trusted <fully_trusted_filter>` filter,
whose dangerous behavior matches previous versions of Python.

In Python 3.14+, leaving ``extraction_filter=None`` will cause
calling an extraction method without a *filter* argument will cause
extraction methods to use the :func:`data <data_filter>` filter by default.

The attribute may be set on instances or overridden in subclasses.
Expand Down Expand Up @@ -992,12 +992,7 @@ can be:

* ``None`` (default): Use :attr:`TarFile.extraction_filter`.

If that is also ``None`` (the default), raise a ``DeprecationWarning``,
and fall back to the ``'fully_trusted'`` filter, whose dangerous behavior
matches previous versions of Python.

In Python 3.14, the ``'data'`` filter will become the default instead.
It's possible to switch earlier; see :attr:`TarFile.extraction_filter`.
If that is also ``None`` (the default), the ``'data'`` filter will be used.

* A callable which will be called for each extracted member with a
:ref:`TarInfo <tarinfo-objects>` describing the member and the destination
Expand Down
8 changes: 1 addition & 7 deletions Lib/tarfile.py
Original file line number Diff line number Diff line change
Expand Up @@ -2248,13 +2248,7 @@ def _get_filter_function(self, filter):
if filter is None:
filter = self.extraction_filter
if filter is None:
import warnings
warnings.warn(
'Python 3.14 will, by default, filter extracted tar '
+ 'archives and reject files or modify their metadata. '
+ 'Use the filter argument to control this behavior.',
DeprecationWarning, stacklevel=3)
return fully_trusted_filter
return data_filter
if isinstance(filter, str):
raise TypeError(
'String names are not supported for '
Expand Down
3 changes: 0 additions & 3 deletions Lib/test/test_shutil.py
Original file line number Diff line number Diff line change
Expand Up @@ -2145,9 +2145,6 @@ def check_unpack_archive_with_converter(self, format, converter, **kwargs):
def check_unpack_tarball(self, format):
self.check_unpack_archive(format, filter='fully_trusted')
self.check_unpack_archive(format, filter='data')
with warnings_helper.check_warnings(
('Python 3.14', DeprecationWarning)):
self.check_unpack_archive(format)

def test_unpack_archive_tar(self):
self.check_unpack_tarball('tar')
Expand Down
64 changes: 23 additions & 41 deletions Lib/test/test_tarfile.py
Original file line number Diff line number Diff line change
Expand Up @@ -694,10 +694,11 @@ def test_extract_hardlink(self):
def test_extractall(self):
# Test if extractall() correctly restores directory permissions
# and times (see issue1735).
tar = tarfile.open(tarname, encoding="iso8859-1")
DIR = os.path.join(TEMPDIR, "extractall")
os.mkdir(DIR)
try:
DIR = pathlib.Path(TEMPDIR) / "extractall"
with (
os_helper.temp_dir(DIR),
tarfile.open(tarname, encoding="iso8859-1") as tar
):
directories = [t for t in tar if t.isdir()]
tar.extractall(DIR, directories, filter='fully_trusted')
for tarinfo in directories:
Expand All @@ -718,9 +719,24 @@ def format_mtime(mtime):
format_mtime(file_mtime),
path)
self.assertEqual(tarinfo.mtime, file_mtime, errmsg)
finally:
tar.close()
os_helper.rmtree(DIR)

@staticmethod
def test_extractall_default_filter():
# Test that the default filter is now "data", and the other filter types are not used.
DIR = pathlib.Path(TEMPDIR) / "extractall_default_filter"
with (
os_helper.temp_dir(DIR),
tarfile.open(tarname, encoding="iso8859-1") as tar,
unittest.mock.patch("tarfile.data_filter", wraps=tarfile.data_filter) as mock_data_filter,
unittest.mock.patch("tarfile.tar_filter", wraps=tarfile.tar_filter) as mock_tar_filter,
unittest.mock.patch("tarfile.fully_trusted_filter", wraps=tarfile.fully_trusted_filter) as mock_ft_filter
):
directories = [t for t in tar if t.isdir()]
tar.extractall(DIR, directories)

mock_data_filter.assert_called()
mock_ft_filter.assert_not_called()
mock_tar_filter.assert_not_called()

@os_helper.skip_unless_working_chmod
def test_extract_directory(self):
Expand All @@ -738,31 +754,6 @@ def test_extract_directory(self):
finally:
os_helper.rmtree(DIR)

def test_deprecation_if_no_filter_passed_to_extractall(self):
DIR = pathlib.Path(TEMPDIR) / "extractall"
with (
os_helper.temp_dir(DIR),
tarfile.open(tarname, encoding="iso8859-1") as tar
):
directories = [t for t in tar if t.isdir()]
with self.assertWarnsRegex(DeprecationWarning, "Use the filter argument") as cm:
tar.extractall(DIR, directories)
# check that the stacklevel of the deprecation warning is correct:
self.assertEqual(cm.filename, __file__)

def test_deprecation_if_no_filter_passed_to_extract(self):
dirtype = "ustar/dirtype"
DIR = pathlib.Path(TEMPDIR) / "extractall"
with (
os_helper.temp_dir(DIR),
tarfile.open(tarname, encoding="iso8859-1") as tar
):
tarinfo = tar.getmember(dirtype)
with self.assertWarnsRegex(DeprecationWarning, "Use the filter argument") as cm:
tar.extract(tarinfo, path=DIR)
# check that the stacklevel of the deprecation warning is correct:
self.assertEqual(cm.filename, __file__)

def test_extractall_pathlike_dir(self):
DIR = os.path.join(TEMPDIR, "extractall")
with os_helper.temp_dir(DIR), \
Expand Down Expand Up @@ -4011,15 +4002,6 @@ def test_data_filter(self):
self.assertIs(filtered.name, tarinfo.name)
self.assertIs(filtered.type, tarinfo.type)

def test_default_filter_warns(self):
"""Ensure the default filter warns"""
with ArchiveMaker() as arc:
arc.add('foo')
with warnings_helper.check_warnings(
('Python 3.14', DeprecationWarning)):
with self.check_context(arc.open(), None):
self.expect_file('foo')

def test_change_default_filter_on_instance(self):
tar = tarfile.TarFile(tarname, 'r')
def strict_filter(tarinfo, path):
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
The default extraction filter for the :mod:`tarfile` module is now
set to :func:`'data' <tarfile.data_filter>`.
Loading