
PyTupleObject::ob_item[1] with out-of-bounds access => undefined behavior #94250

Closed as not planned
@MaskRay

Description


Include/cpython/tupleobject.h has

typedef struct {
    PyObject_VAR_HEAD
    /* ob_item contains space for 'ob_size' elements.
       Items must normally not be NULL, except during construction when
       the tuple is not yet visible outside the function that builds it. */
    PyObject *ob_item[1];
} PyTupleObject;

CPython allocates the object with room for trailing elements, then accesses them with expressions such as
PyUnicode_InternInPlace(&_PyTuple_ITEMS(tuple)[i]); where i > 0.

Because ob_item is declared with exactly one element, indexing it with i > 0 is an out-of-bounds access, which is UB. (https://stackoverflow.com/questions/44745677/flexible-array-members-can-lead-to-undefined-behavior mentions that before C99 TC2 there was a non-normative example which suggested that [1] can be used. That was incorrect and was removed by TC2.)
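For contrast, here is a minimal sketch of the same "header plus trailing elements" layout written with a C99 flexible array member, which has no declared bound to exceed. The names demo_tuple, demo_tuple_new, demo_tuple_set and demo_tuple_get are hypothetical and only for illustration. They are not CPython API, and this is not a proposed patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for PyTupleObject; not CPython's real layout. */
typedef struct {
    size_t size;     /* number of trailing elements actually allocated */
    void *item[];    /* C99 flexible array member: no declared bound */
} demo_tuple;

/* Allocate a demo_tuple with room for n trailing pointers, all NULL.
   Returns NULL on overflow or allocation failure. */
static demo_tuple *demo_tuple_new(size_t n)
{
    if (n > (SIZE_MAX - offsetof(demo_tuple, item)) / sizeof(void *))
        return NULL;
    demo_tuple *t = malloc(offsetof(demo_tuple, item) + n * sizeof(void *));
    if (t == NULL)
        return NULL;
    t->size = n;
    for (size_t i = 0; i < n; i++)
        t->item[i] = NULL;
    return t;
}

/* Store p at index i; the caller must pass i < t->size. */
static void demo_tuple_set(demo_tuple *t, size_t i, void *p)
{
    assert(i < t->size);
    t->item[i] = p;
}

/* Return the element at index i, or NULL if i is out of range. */
static void *demo_tuple_get(const demo_tuple *t, size_t i)
{
    return i < t->size ? t->item[i] : NULL;
}
```

With this declaration, t->item[2] on an object allocated for three elements stays within the allocated object, so a bounds checker has no fixed array size of 1 to compare against. One caveat: flexible array members are a C99 feature and are not part of standard C++, which may matter for a header that is also included from C++ code.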

The 2022-06-24 Clang -fstrict-flex-arrays commit (https://reviews.llvm.org/D126864, https://reviews.llvm.org/rG886715af962de2c92fac4bd37104450345711e4a) made -fsanitize=array-bounds stricter, so it now catches this UB. Note: the Clang patch appears non-comprehensive. It misses many similar UB cases but catches the CPython one.

Reproduce (with clang compiled from latest llvm-project):

% mkdir -p out/bounds && cd out/bounds
% ../../configure CC=/tmp/RelA/bin/clang CXX=/tmp/RelA/bin/clang++ CFLAGS=-fsanitize=bounds LDFLAGS=-fsanitize=bounds
% make -j 60
 CC='/tmp/RelA/bin/clang' LDSHARED='/tmp/RelA/bin/clang -shared -fsanitize=bounds   ' OPT='-DNDEBUG -g -fwrapv -O3 -Wall'       ./python -E ../../setup.py  build
../../Objects/codeobject.c:49:34: runtime error: index 2 out of bounds for type 'PyObject *[1]' (aka 'struct _object *[1]')
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior ../../Objects/codeobject.c:49:34 in 
running build
running build_ext

See also https://gcc.gnu.org/onlinedocs/gcc/Zero-Length.html. GCC appears to be more permissive, at least for now: "Although using one-element arrays this way is discouraged, GCC handles accesses to trailing one-element array members analogously to zero-length arrays."

Several places in CPython use the same one-element trailing array pattern and look suspicious:

% rg 'ob_.*\[1\]'
Include/memoryobject.h
65:    Py_ssize_t ob_array[1];       /* shape, strides, suboffsets */

Tools/gdb/libpython.py
876:               digit ob_digit[1];

Include/cpython/tupleobject.h
10:    PyObject *ob_item[1];

Include/cpython/longintrepr.h
81:    digit ob_digit[1];

Include/cpython/bytesobject.h
8:    char ob_sval[1];
