Skip to content

Incorrect handling of start and end values in codecs error handlers #126004

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
picnixz opened this issue Oct 26, 2024 · 2 comments
Closed

Incorrect handling of start and end values in codecs error handlers #126004

picnixz opened this issue Oct 26, 2024 · 2 comments
Assignees
Labels
3.14 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-C-API type-crash A hard crash of the interpreter, possibly with a core dump

Comments

@picnixz
Copy link
Member

picnixz commented Oct 26, 2024

Crash report

What happened?

./python -c "import codecs; codecs.xmlcharrefreplace_errors(UnicodeEncodeError('bad', '', 0, 1, 'reason'))"
python: ./Include/cpython/unicodeobject.h:339: PyUnicode_READ_CHAR: Assertion `index >= 0' failed.
Aborted (core dumped)
./python -c "import codecs; codecs.backslashreplace_errors(UnicodeDecodeError('utf-8', b'00000', 9, 2, 'reason'))"
Traceback (most recent call last):
  File "<string>", line 1, in <module>
SystemError: Negative size passed to PyUnicode_New
./python -c "import codecs; codecs.replace_errors(UnicodeTranslateError('000', 1, -7, 'reason'))"
python: Python/codecs.c:743: PyCodec_ReplaceErrors: Assertion `PyUnicode_KIND(res) == PyUnicode_2BYTE_KIND' failed.
Aborted (core dumped)

See #123378 for the root cause. Since we are still wondering how to fix the getters and setters, I suggest we first fix the crash by adding the checks inside at the handler's level (for now). I'm not sure if the handler itself is handling corner cases correctly as well.

Linked PRs

@picnixz picnixz added type-crash A hard crash of the interpreter, possibly with a core dump topic-C-API labels Oct 26, 2024
@picnixz picnixz self-assigned this Oct 26, 2024
@picnixz picnixz added the interpreter-core (Objects, Python, Grammar, and Parser dirs) label Oct 26, 2024
@picnixz picnixz changed the title Incorrect handling of start and end values in codecs.xmlcharrefreplace_errors handler Incorrect handling of start and end values in codecs error handlers Nov 29, 2024
@picnixz
Copy link
Member Author

picnixz commented Nov 29, 2024

FTR: I prepared three branches that fix (and cleanup) the error handlers.

Those branches include #123380 and thus I'll wait for the latter to be merged since otherwise there will be a crash even before the actual "bad" code happens.

The fixes are actually quite straightforward: there is actually no check on the consistency of start and end after they've been retrieved via *GetStart and *GetEnd, and this leads to OOB issues or other kind of issues. So, even if we fix the getters, we still need to handle the consistency of the bounds.

cc @encukou

@picnixz
Copy link
Member Author

picnixz commented Dec 6, 2024

Since #123378 is only 3.14+ (not backported), the fixes for those handlers will also be 3.14+ only.

@picnixz picnixz added the 3.14 bugs and security fixes label Dec 6, 2024
picnixz added a commit that referenced this issue Jan 23, 2025
#127675)

This fixes how `PyCodec_XMLCharRefReplaceErrors` handles the `start` and `end`
attributes of `UnicodeError` objects via the `_PyUnicodeError_GetParams` helper.
picnixz added a commit that referenced this issue Jan 23, 2025
This fixes how `PyCodec_ReplaceErrors` handles the `start` and `end` attributes
of `UnicodeError` objects via the `_PyUnicodeError_GetParams` helper.
picnixz added a commit that referenced this issue Jan 23, 2025
…#127676)

This fixes how `PyCodec_BackslashReplaceErrors` handles the `start` and `end`
attributes of `UnicodeError` objects via the `_PyUnicodeError_GetParams` helper.
picnixz added a commit that referenced this issue Jan 23, 2025
…7680)

We remove the safeguards that were added in `Lib/test/test_capi/test_codecs.py`
since they are now redundant (see 32e07fd
for additional context).

Indeed, the codecs handlers now correctly handle the `start` and `end` positions
of `UnicodeError` objects and thus should not crash.
@picnixz picnixz closed this as completed Jan 23, 2025
@github-project-automation github-project-automation bot moved this from In Progress to Done in Codecs and encodings issues Jan 23, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.14 bugs and security fixes interpreter-core (Objects, Python, Grammar, and Parser dirs) topic-C-API type-crash A hard crash of the interpreter, possibly with a core dump
Projects
Development

No branches or pull requests

1 participant