-
-
Notifications
You must be signed in to change notification settings - Fork 31.9k
bpo-40120: Fix unbounded struct char[] undefined behavior. #19232
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
It is undefined behaviour if index is beyond array size. The workaround for this is standard C99 feature known as "Flexible array member". https://en.wikipedia.org/wiki/Flexible_array_member
(yes a BPO issue is needed, i'm starting with the PR to see how the CI bots fare) |
It looks like we thankfully got rid of any use of sizeof(PyBytesObject) a long time ago (which never really made much sense anyways) as part of https://bugs.python.org/issue4445 so I believe this should have no negative consequences. |
There are other instances of this pattern in our codebase. Objects/unicodeobject.c has encoding_map who's level32 member appears to use this? Grepping I don't immediately see others. This one was found by a coworker working to enable clang analyzer. |
Well, there was no other choice in ISO C89 than using Is You can use the new buildbot label to test you change on more platforms. |
Moving discussion to the bug. (TL;DR I will do the buildbot tests and add this feature to PEP 7 unless we find an important platform that balks about it). Not being explicit in PEP7 yet suggests this should not be backported. |
This is really nothing more than a change test, it needs to match the C struct when no level23 mappings will be allocated. An empty char[] does not consume any C sizeof() space whereas the previous `char[1]` consumed one alignment size worth, even when unused.
@@ -28,6 +28,9 @@ _Py_IDENTIFIER(__bytes__); | |||
|
|||
Using PyBytesObject_SIZE instead of sizeof(PyBytesObject) saves | |||
3 bytes per string allocation on a typical system. | |||
|
|||
The + 1 accounts for the trailing \0 byte that we include as a safety | |||
measure for code that treats the underlying char * as a C string. | |||
*/ | |||
#define PyBytesObject_SIZE (offsetof(PyBytesObject, ob_sval) + 1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wait, you made PyBytesObject 1 byte smaller, and you didn't have to substract 1 somewhere? Does it mean that Python overallocates 1 byte since forever?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thankfully not. Because this code used offsetof
the macro did not need to change. The offset never changes. The existing + 1
in the macro is easy to misinterpret as having been there to account for the [1]
but that isn't how it was used - PyBytesObject
is always allocated with that one extra byte that is filled with a NUL for C safety for PyBytes_AsString
users.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The [1]
thing is used in more places:
PyObject *f_localsplus[1];
in PyFrameObjectPyObject *ob_item[1];
in PyTupleObjectvoid *elements[1];
in asdl_seq and int elements[1];` in asdl_int_seqdigit ob_digit[1];
in PyLongObjectPy_ssize_t ob_array[1];
in PyMemoryViewObject- _tracemalloc.c:
frame_t frames[1];
in traceback_t
Do we need to update all of them? Do you want to update them all?
Refleaks buildbots are likely caused by https://bugs.python.org/issue40115 (known leak) |
Yep, this is supported by xlc. |
@@ -8208,14 +8208,14 @@ struct encoding_map { | |||
PyObject_HEAD | |||
unsigned char level1[32]; | |||
int count2, count3; | |||
unsigned char level23[1]; | |||
unsigned char level23[]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gpshead: Can you please write a PR which only contains this change? Since it doesn't touch the public C API, C++ compatibility is not an issue (I don't think that Python can be built with a C++ compiler). We can start with this change and see how it goes? See also the discussion at python/peps#1349
@@ -5,7 +5,7 @@ | |||
typedef struct { | |||
PyObject_VAR_HEAD | |||
Py_hash_t ob_shash; | |||
char ob_sval[1]; | |||
char ob_sval[]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest to:
- Leave the structure as it is
- Use a code search to check which C extension access directly PyBytesObject structure members (well, first check Cython and pybind11)
- Prepare affected C extensions
- Once the number of affected C extensions is low enough: make the structure opaque, enforce the usage of public C functions.
see the bug, a new PR can be made. |
It is undefined behaviour if index is beyond array size. The workaround
for this is the standard C99 feature known as "Flexible array member".
https://en.wikipedia.org/wiki/Flexible_array_member
https://bugs.python.org/issue40120