Skip to content

CLN validate some Py_* Functions in C Extensions check for NULL #50997

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions pandas/_libs/src/parser/io.c
Original file line number Diff line number Diff line change
Expand Up @@ -66,6 +66,9 @@ void *buffer_rd_bytes(void *source, size_t nbytes, size_t *bytes_read,
args = Py_BuildValue("(i)", nbytes);

func = PyObject_GetAttrString(src->obj, "read");
if (func == NULL) {
return NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you also need to Py_DECREF(args) - looks to be Py_XDECREF a few lines down but won't happen if you return here.

As a side note - I think we also too liberally use Py_XDECREF. But that's another issue for another day

}

/* TODO: does this release the GIL? */
result = PyObject_CallObject(func, args);
Expand Down
6 changes: 6 additions & 0 deletions pandas/_libs/src/ujson/python/JSONtoObj.c
Original file line number Diff line number Diff line change
Expand Up @@ -108,6 +108,9 @@ JSOBJ Object_npyNewArray(void *prv, void *_decoder) {
if (decoder->curdim <= 0) {
// start of array - initialise the context buffer
npyarr = decoder->npyarr = PyObject_Malloc(sizeof(NpyArrContext));
if (npyarr == NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bit redundant with L116, though I find the npyarr == NULL approach more explicit. I also think the position of this is better than L116 but need to combine these pieces of code

return NULL;
}
decoder->npyarr_addr = npyarr;

if (!npyarr) {
Expand All @@ -119,6 +122,9 @@ JSOBJ Object_npyNewArray(void *prv, void *_decoder) {
npyarr->labels[0] = npyarr->labels[1] = NULL;

npyarr->shape.ptr = PyObject_Malloc(sizeof(npy_intp) * NPY_MAXDIMS);
if (npyarr->shape.ptr == NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point in the code you have malloc'ed decoder->npyarr, so you also need to free it (using PyObject_Free instead of the built-in)

return NULL;
}
npyarr->shape.len = 1;
npyarr->ret = NULL;

Expand Down
4 changes: 4 additions & 0 deletions pandas/_libs/src/ujson/python/date_conversions.c
Original file line number Diff line number Diff line change
Expand Up @@ -90,6 +90,10 @@ char *PyDateTimeToIso(PyObject *obj, NPY_DATETIMEUNIT base,

*len = (size_t)get_datetime_iso_8601_strlen(0, base);
char *result = PyObject_Malloc(*len);
if (result == NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PyErr_NoMemory();
return NULL;
}
// Check to see if PyDateTime has a timezone.
// Don't convert to UTC if it doesn't.
int is_tz_aware = 0;
Expand Down
24 changes: 24 additions & 0 deletions pandas/_libs/src/ujson/python/objToJSON.c
Original file line number Diff line number Diff line change
Expand Up @@ -149,6 +149,9 @@ static TypeContext *createTypeContext(void) {
TypeContext *pc;

pc = PyObject_Malloc(sizeof(TypeContext));
if (pc == NULL) {
return NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment here about PyErr_NoMemory

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though again this is the same as L155, though I think the pc == NULL check is a lot more explicit

}
if (!pc) {
PyErr_NoMemory();
return NULL;
Expand Down Expand Up @@ -183,11 +186,17 @@ static PyObject *get_values(PyObject *obj) {
// without going through and object array of Timestamps.
if (PyObject_HasAttrString(obj, "tz")) {
PyObject *tz = PyObject_GetAttrString(obj, "tz");
if (tz == NULL) {
return NULL;
}
if (tz != Py_None) {
// Go through object array if we have dt64tz, since tz info will
// be lost if values is used directly.
Py_DECREF(tz);
values = PyObject_CallMethod(obj, "__array__", NULL);
if (values == NULL) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm +/-0 on having to do this here. If values is NULL the return values naturally propagates up the caller, who is also responsible for checking. But if it helps enforce patterns then this is harmless

return NULL;
}
return values;
}
Py_DECREF(tz);
Expand All @@ -211,10 +220,19 @@ static PyObject *get_values(PyObject *obj) {

if (values == NULL) {
PyObject *typeRepr = PyObject_Repr((PyObject *)Py_TYPE(obj));
if (typeRepr == NULL) {
return NULL;
}
PyObject *repr;
if (PyObject_HasAttrString(obj, "dtype")) {
PyObject *dtype = PyObject_GetAttrString(obj, "dtype");
if (dtype == NULL) {
return NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Need to Py_DECREF(typeRepr) before returning

}
repr = PyObject_Repr(dtype);
if (repr == NULL) {
return NULL;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment above on Py_DECREF(typeRepr); might be a clever way to refactor the function so you don't need to repeat in every error handler, but also OK to repeat for now

}
Py_DECREF(dtype);
} else {
repr = PyUnicode_FromString("<unknown dtype>");
Expand All @@ -233,12 +251,18 @@ static PyObject *get_values(PyObject *obj) {

static PyObject *get_sub_attr(PyObject *obj, char *attr, char *subAttr) {
PyObject *tmp = PyObject_GetAttrString(obj, attr);
if (tmp == NULL) {
return NULL;
}
PyObject *ret;

if (tmp == 0) {
return 0;
}
ret = PyObject_GetAttrString(tmp, subAttr);
if (ret == NULL) {
return ret;
}
Py_DECREF(tmp);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess this is a case where it could make sense to have say Py_DECREF right before the error checking; whether or not things worked you are down with the lifecycle of tmp


return ret;
Expand Down