[3.12] Fix use-after-free in the unicode-escape decoder with error handler (GH-133767) #134255

Closed

@mcepl mcepl commented May 19, 2025

(backport of #133944 from 3.13, and originally from #129648)

If the error handler is used, a new bytes object is created to set as the object attribute of UnicodeDecodeError, and that bytes object then replaces the original data. A pointer into the decoded data becomes invalid once that temporary bytes object is destroyed, so we need another way to return the first invalid escape from _PyUnicode_DecodeUnicodeEscapeInternal().

_PyBytes_DecodeEscape() does not have this issue, because it does not use the error handlers registry, but it should be changed for compatibility with _PyUnicode_DecodeUnicodeEscapeInternal().
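
The two mechanisms described above can be observed from Python. What follows is a minimal sketch, not a reproducer for the crash: it registers a hypothetical error handler (`sketch_replace` and `sketch_handler` are names made up for illustration) and decodes input containing both an invalid escape and a truncated `\x` escape. The invalid escape is deliberately placed before the truncated one, so the pointer to it still refers to the caller's original buffer and the faulty path should not be exercised.

```python
import codecs
import warnings

seen = []  # what the handler observed, kept for inspection


def sketch_handler(exc):
    # Hypothetical handler for illustration; exc is a UnicodeDecodeError.
    # exc.object is the bytes object that the decoder attached to the error.
    seen.append((type(exc.object), exc.object, exc.start, exc.end))
    # Substitute U+FFFD and resume decoding after the bad sequence.
    return ("\ufffd", exc.end)


codecs.register_error("sketch_replace", sketch_handler)

# An invalid escape (\e) followed by a truncated \x escape.
data = b"\\e\\x4"

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    text, consumed = codecs.unicode_escape_decode(data, "sketch_replace")

print(repr(text), consumed)
print([w.category.__name__ for w in caught])
print(seen)
```

The handler sees a bytes object equal to the input, and the invalid escape is reported separately as a warning after decoding. The exact warning text varies between Python versions, so the sketch prints only the category.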

I still haven’t managed to fix it completely; one test is still failing:

[  778s] ======================================================================
[  778s] FAIL: test_warning (test.test_codeop.CodeopTests.test_warning)
[  778s] ----------------------------------------------------------------------
[  778s] Traceback (most recent call last):
[  778s]   File "/home/abuild/rpmbuild/BUILD/python312-3.12.10-build/Python-3.12.10/Lib/test/test_codeop.py", line 283, in test_warning
[  778s]     with warnings_helper.check_warnings(
[  778s]          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[  778s]   File "/home/abuild/rpmbuild/BUILD/python312-3.12.10-build/Python-3.12.10/Lib/contextlib.py", line 144, in __exit__
[  778s]     next(self.gen)
[  778s]   File "/home/abuild/rpmbuild/BUILD/python312-3.12.10-build/Python-3.12.10/Lib/test/support/warnings_helper.py", line 185, in _filterwarnings
[  778s]     raise AssertionError("unhandled warning %s" % reraise[0])
[  778s] AssertionError: unhandled warning {message : SyntaxWarning("'\\e' is an invalid escape sequence. "), category : 'SyntaxWarning', filename : '<input>', lineno : 1, line : None}
[  778s] 
[  778s] ----------------------------------------------------------------------
[  778s] Ran 1 test in 0.001s

Complete build log
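
One possible reading of the "unhandled warning" failure, offered as a guess rather than a diagnosis: if the test filters warnings with a pattern that is matched from the start of the message, the reworded message in the log would no longer match. The sketch below assumes the filter pattern is `invalid escape sequence` and that matching is anchored at the beginning, as `re.match` does; neither assumption is confirmed by the log.

```python
import re

# Message text copied from the failing log.
message = "'\\e' is an invalid escape sequence. "

# ASSUMPTION: the pattern the test filters on; not shown in the log.
assumed_pattern = "invalid escape sequence"

# Anchored at the start of the message, as re.match does.
anchored = re.match(assumed_pattern, message, re.I)
# Unanchored search, for comparison.
unanchored = re.search(assumed_pattern, message, re.I)

print(anchored is not None, unanchored is not None)  # prints: False True
```

If that assumption holds, either the warning text in the backport or the expected pattern in the test would need to change so that the two agree.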

Any idea how to fix it?

@bedevere-app bot added the topic-unicode, type-crash, type-security, and awaiting review labels May 19, 2025
mcepl commented May 29, 2025

Duplicate of #134337
