gh-127787: refactor helpers for `PyUnicodeErrorObject` internal interface #127789
- Unify `get_unicode` and `get_string` in a single function.
- Allow retrieving the underlying `object` attribute and its size in one call.
- Use a common implementation for the following functions:
  - `PyUnicode{Decode,Encode}Error_GetEncoding`
  - `PyUnicode{Decode,Encode,Translate}Error_GetObject`
  - `PyUnicode{Decode,Encode,Translate}Error_{Get,Set}Reason`
  - `PyUnicode{Decode,Encode,Translate}Error_{Get,Set}{Start,End}`
@encukou I've designed a NVM: just removing the parameter. It's easier to make the check |
@encukou A little implementation question. Do you think it's preferable to have

PyObject *
PyUnicodeEncodeError_GetEncoding(PyObject *self)
{
    int rc = check_unicode_error_type(self, "UnicodeEncodeError");
    return rc < 0 ? NULL : unicode_error_get_encoding_impl(self);
}

with `int rc = check_unicode_error_type(self, expect_type);` Unless I use generating macros, I'll end up either duplicating the |
I think it's fine to silently accept “wrong” subclasses of |
Maybe I wasn't clear but I wasn't talking about subclasses or not. Since I'm using

static inline PyObject *
unicode_error_get_encoding_impl(PyObject *self)
{
    PyUnicodeErrorObject *exc = PyUnicodeError_CAST(self);
    return as_unicode_error_attribute(exc->encoding, "encoding", false);
}

PyObject *
PyUnicodeEncodeError_GetEncoding(PyObject *self)
{
    int rc = check_unicode_error_type(self, "UnicodeEncodeError");
    return rc < 0 ? NULL : unicode_error_get_encoding_impl(self);
}

or

static inline PyUnicodeErrorObject *
as_unicode_error(PyObject *self, const char *expect_type)
{
    int rc = check_unicode_error_type(self, expect_type);
    return rc < 0 ? NULL : _PyUnicodeError_CAST(self);
}

static inline PyObject *
unicode_error_get_encoding_impl(PyObject *self, const char *expect_type)
{
    PyUnicodeErrorObject *exc = as_unicode_error(self, expect_type);
    return as_unicode_error_attribute(exc->encoding, "encoding", false);
}

PyObject *
PyUnicodeEncodeError_GetEncoding(PyObject *self)
{
    return unicode_error_get_encoding_impl(self, "UnicodeEncodeError");
}

The first solution delegates type-checking and attribute retrieval to two different functions ( |
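To compare the two layouts outside of CPython, here is a minimal, self-contained sketch. Everything in it is hypothetical: `mock_error`, `mock_check_type` and the `mock_get_encoding_v*` functions stand in for `PyUnicodeErrorObject`, `check_unicode_error_type` and the public getter, and the type check is a plain string comparison rather than a real type check. The second variant also adds an explicit guard before dereferencing, which is an assumption about how a failed check would be handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PyUnicodeErrorObject; not CPython's layout. */
typedef struct {
    const char *type_name;   /* e.g. "UnicodeEncodeError" */
    const char *encoding;    /* e.g. "utf-8" */
} mock_error;

/* Return 0 if 'self' has the expected type name, -1 otherwise. */
static int
mock_check_type(const mock_error *self, const char *expect_type)
{
    if (self == NULL || strcmp(self->type_name, expect_type) != 0) {
        return -1;
    }
    return 0;
}

/* Option 1: the public function checks, the helper only reads. */
static const char *
get_encoding_impl_v1(const mock_error *self)
{
    assert(self != NULL);    /* the caller must have validated 'self' */
    return self->encoding;
}

const char *
mock_get_encoding_v1(const mock_error *self)
{
    int rc = mock_check_type(self, "UnicodeEncodeError");
    return rc < 0 ? NULL : get_encoding_impl_v1(self);
}

/* Option 2: the helper receives the expected type and checks it itself. */
static const char *
get_encoding_impl_v2(const mock_error *self, const char *expect_type)
{
    if (mock_check_type(self, expect_type) < 0) {
        return NULL;         /* guard before touching any field */
    }
    return self->encoding;
}

const char *
mock_get_encoding_v2(const mock_error *self)
{
    return get_encoding_impl_v2(self, "UnicodeEncodeError");
}
```

Both variants return the same results; the difference is only where the check lives, and therefore whether the helper can be reused by callers that have already validated their argument.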
Objects/exceptions.c
Outdated
/*
 * Return the underlying (str) 'encoding' attribute of a Unicode Error object.
 *
 * The caller is responsible to ensure that 'self' is a PyUnicodeErrorObject.
I'd prefer an `assert` over a “The caller is responsible to ensure...” comment. To a human reader, they should be equivalent.
The `assert` is actually inside the `_CAST` macro. It's just to document that this function would crash otherwise. The alternative is to remove the assert inside the CAST macro and make it an explicit one, though that would add lines.
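As a rough illustration of an assertion that lives inside a cast macro, the sketch below uses hypothetical names throughout (`mock_object`, `mock_unicode_error`, `MOCK_UNICODE_ERROR_CAST`); it is not the macro from `Objects/exceptions.c`, only the same general shape.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical base object carrying a type tag. */
typedef struct {
    int kind;
} mock_object;

/* Hypothetical derived object; 'base' must stay the first member. */
typedef struct {
    mock_object base;
    const char *reason;
} mock_unicode_error;

enum { MOCK_KIND_OTHER = 0, MOCK_KIND_UNICODE_ERROR = 1 };

static inline int
mock_is_unicode_error(const mock_object *op)
{
    return op != NULL && op->kind == MOCK_KIND_UNICODE_ERROR;
}

/* The assertion is part of the cast, so every user of the macro gets the
 * check in debug builds and pays nothing when NDEBUG is defined. */
#define MOCK_UNICODE_ERROR_CAST(op) \
    (assert(mock_is_unicode_error(op)), (mock_unicode_error *)(op))

/* The helper carries no explicit assert; the cast documents and enforces
 * the precondition that 'self' is a mock_unicode_error. */
static inline const char *
mock_get_reason(mock_object *self)
{
    mock_unicode_error *exc = MOCK_UNICODE_ERROR_CAST(self);
    return exc->reason;
}
```

The trade-off matches the one discussed in the thread: the check is invisible at the call site, which is why a comment on the helper can still be useful to a reader.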
Hm, that looks like a style choice I can leave to you; they look similarly complex. There's one more style you can consider for internal functions, “error pass-through”:

/* if self is NULL, return NULL; an exception must already be set */
static inline PyObject *
unicode_error_get_encoding_impl(PyObject *self) {
    if (!self) {
        return NULL;
    }
    PyUnicodeErrorObject *exc = PyUnicodeError_CAST(self);
    return as_unicode_error_attribute(exc->encoding, "encoding", false);
}

PyObject *
PyUnicodeEncodeError_GetEncoding(PyObject *self)
{
    PyObject *err = check_unicode_error_type(self, "UnicodeEncodeError");
    return unicode_error_get_encoding_impl(err);
}

Are you happy with the current iteration of the PR?
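The error pass-through style can also be tried in isolation. The sketch below is a plain-C approximation under stated assumptions: `mock_error`, `mock_check_type` and `mock_get_encoding` are hypothetical names, and the static `mock_pending_error` string stands in for the interpreter's "an exception is set" state.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for PyUnicodeErrorObject. */
typedef struct {
    const char *type_name;
    const char *encoding;
} mock_error;

/* Stand-in for the interpreter's pending exception (NULL means none). */
static const char *mock_pending_error = NULL;

/* Return 'self' if it has the expected type; otherwise record an error
 * and return NULL. */
static mock_error *
mock_check_type(mock_error *self, const char *expect_type)
{
    if (self == NULL || strcmp(self->type_name, expect_type) != 0) {
        mock_pending_error = "TypeError";
        return NULL;
    }
    return self;
}

/* Pass-through: if 'self' is NULL, an error must already be recorded,
 * and the NULL is simply forwarded to the caller. */
static const char *
mock_get_encoding_impl(mock_error *self)
{
    if (self == NULL) {
        assert(mock_pending_error != NULL);
        return NULL;
    }
    return self->encoding;
}

const char *
mock_get_encoding(mock_error *self)
{
    return mock_get_encoding_impl(
        mock_check_type(self, "UnicodeEncodeError"));
}
```

The public function collapses to a single expression, at the cost of the helper relying on its caller to have recorded the error.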
Co-authored-by: Petr Viktorin <[email protected]>
No worries! I have a lot of PRs that are identical (namely UBSan ones) which you can just skip for now. Other PRs related to unicode error objects are those with codecs. I'm not on my dev session now and since it's the holiday period, I don't want to overwhelm you with review requests.
I'll have a look again tomorrow to decide the final state of the PR. I'm pretty happy with the current implementation (namely no pass-through, and assertion in the CAST) but I can consider the pass-through approach. It looks nice and could reduce the number of overall lines. I can also make sure that an exception is set before returning NULL so it would at least suit what I wanted to do (it would also decouple the logic of checking and performing the actual operation in

I don't know how the exception class will evolve in the future, especially how we will decide to handle relative start/end indices (I think we're unfortunately stuck and won't really be able to change the behaviour since it's part of the stable ABI).
The merge plan I had in mind was:
So until this one is merged, there is no need to review the others as the code will change a bit. Though, you can review them if you want to look at the logic only.
This is typically useful for future refactoring and for keeping lines below 80 characters. It also avoids having to remember where to place the NULL arguments.
I eventually decided to avoid a pass-through. While it would work, I feel that it's not right to expect the callee to eventually rely on the fact that an exception has been set. However, do you want me to add NULL checks? (without those, the assertions would also crash)
LGTM! Just a few comment nitpicks left.
Thank you!
… interface (pythonGH-127789)

- Unify `get_unicode` and `get_string` in a single function.
- Allow retrieving the underlying `object` attribute, its size, and the adjusted 'start' and 'end', all at once. Add a new `_PyUnicodeError_GetParams` internal function for this. (In `exceptions.c`, it's somewhat common to not need all the attributes, but the compiler has the opportunity to inline the function and optimize unneeded work away. Outside that file, we'll usually need all or most of them at once.)
- Use a common implementation for the following functions:
  - `PyUnicode{Decode,Encode}Error_GetEncoding`
  - `PyUnicode{Decode,Encode,Translate}Error_GetObject`
  - `PyUnicode{Decode,Encode,Translate}Error_{Get,Set}Reason`
  - `PyUnicode{Decode,Encode,Translate}Error_{Get,Set}{Start,End}`
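The "fetch everything at once" idea can be sketched with out-parameters that may each be NULL when the caller does not need them. This is not the signature or body of `_PyUnicodeError_GetParams`; `mock_error`, `mock_get_params` and the two clamping helpers are hypothetical, and the clamping rule (start into `[0, len - 1]`, end into `[1, len]`, both collapsing to 0 for an empty object) is an assumption about what "adjusted" means here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a Unicode error object. */
typedef struct {
    const char *object;   /* the text being encoded or decoded */
    ptrdiff_t start;      /* raw, possibly out-of-range indices */
    ptrdiff_t end;
} mock_error;

/* Assumed rule: clamp 'start' into [0, len - 1], or 0 if len == 0. */
static inline ptrdiff_t
mock_adjust_start(ptrdiff_t start, ptrdiff_t len)
{
    if (start < 0) {
        start = 0;
    }
    if (start >= len) {
        start = (len == 0) ? 0 : len - 1;
    }
    return start;
}

/* Assumed rule: clamp 'end' into [1, len], or 0 if len == 0. */
static inline ptrdiff_t
mock_adjust_end(ptrdiff_t end, ptrdiff_t len)
{
    if (end < 1) {
        end = 1;
    }
    if (end > len) {
        end = len;
    }
    return end;
}

/* Fill every non-NULL output in one call.
 * Return 0 on success and -1 if 'self' or its object is missing. */
int
mock_get_params(const mock_error *self, const char **obj,
                ptrdiff_t *objlen, ptrdiff_t *start, ptrdiff_t *end)
{
    if (self == NULL || self->object == NULL) {
        return -1;
    }
    ptrdiff_t len = (ptrdiff_t)strlen(self->object);
    if (obj != NULL) {
        *obj = self->object;
    }
    if (objlen != NULL) {
        *objlen = len;
    }
    if (start != NULL) {
        *start = mock_adjust_start(self->start, len);
    }
    if (end != NULL) {
        *end = mock_adjust_end(self->end, len);
    }
    return 0;
}
```

A caller that only needs the indices passes NULL for the other outputs, for example `mock_get_params(&err, NULL, NULL, &start, &end)`.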
- Unify `get_unicode` and `get_string` in a single function.
- Allow retrieving the underlying `object` attribute, its size, and its start and end indices in one call.
- Use a common implementation for the following functions:
  - `PyUnicode{Decode,Encode}Error_GetEncoding`
  - `PyUnicode{Decode,Encode,Translate}Error_GetObject`
  - `PyUnicode{Decode,Encode,Translate}Error_{Get,Set}Reason`
  - `PyUnicode{Decode,Encode,Translate}Error_{Get,Set}{Start,End}`
Note that there are some cosmetic changes here and there (in the naming of parameters), but these are essentially in anticipation of #127694, in order to reduce the conflicts I'll need to resolve (there will probably be conflicts, but ideally I want them to be minimal).
I've moved all helpers before the public API. I could move them in between, but I felt that it's cleaner that way (it also allowed me to put double blank lines between functions a bit more easily).
`PyUnicodeError` internal C helpers #127787