Enhance zend_dump_op_array to Properly Represent Non-Printable Characters (GH-15680) #15730
…acters (phpGH-15680) This change enhances `zend_dump_op_array` to properly represent non-printable characters in strings. This is useful for debugging purposes, as it allows developers to see the actual content of strings that contain non-printable characters.
See the optimisation suggestion.
The rest is fine.
I don't see big problems, but I also don't see a real need for this.
@iluuu1994 what do you think?
    const unsigned char len;
} char_repr_t;

static const char_repr_t char_reprs[256] = {
What is the reason to separate this array into a header?
It's going to be duplicated every time the header is included.
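To illustrate the duplication concern: a `static const` array defined in a header gets its own copy in every translation unit that includes it. Besides moving the table into string.c, a common alternative is to declare the table `extern` in the header and define it exactly once. The sketch below puts both halves in one file so it compiles on its own; the struct fields, the sample entries and `char_repr_lookup()` are hypothetical, not the PR's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical entry layout: the escaped text and its length. */
typedef struct {
    const char *repr;
    unsigned char len;
} char_repr_t;

/* Header part: a declaration only, so each includer reserves no storage. */
extern const char_repr_t char_reprs[256];

/* Source part: the single definition, placed in exactly one .c file.
 * Entries that are not listed are zero-initialized (repr == NULL). */
const char_repr_t char_reprs[256] = {
    ['\n'] = {"\\n", 2},
    ['\t'] = {"\\t", 2},
    ['"']  = {"\\\"", 2},
    ['A']  = {"A", 1},
};

/* Accessor that other translation units could call through the header. */
const char_repr_t *char_repr_lookup(unsigned char c)
{
    const char_repr_t *r = &char_reprs[c];
    assert(r->repr == NULL || r->len > 0); /* listed entries are non-empty */
    return r;
}
```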
The reason I separated this array into a header file was primarily for modularity and reusability. The current file (`string.c`) is already quite long, and combining these contents with the C file might make it harder to read and maintain.
However, if you believe this separation is unnecessary, I can certainly move this part back into `string.c`.
Was this code copied from somewhere? There's no header/license that would indicate so.
Thank you for your feedback. While I understand that this enhancement might not seem immediately necessary, I believe it could bring several significant benefits:
I believe this change, while relatively minor, could add substantial value by addressing these points. I’m open to further refining the implementation based on your feedback. What are your thoughts on this? @iluuu1994
@iluuu1994 @nielsdos @arnaud-lb please make a decision.
I don't object to changing this. LGTM otherwise.
ext/standard/charrepr.h (outdated)
{"\\\"", 2},
{"#", 1},
{"$", 1},
{"%%", 1},
This looks fishy. `%%` only needs to be escaped in `printf`. Given the impl uses `memcpy` rather than `strcpy`, it shouldn't lead to a bug though.
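A minimal sketch of the reviewer's point, using a hypothetical `{repr, len}` entry type: as data, the literal `"%%"` is simply two percent signs, but with `len == 1` a `memcpy`-based writer copies only the first one, so the output is still a single `%`. `%%` only stands for one `%` inside a printf-style format string, so spelling the entry `{"%", 1}` removes the confusion. The names `append_repr()` and `format_percent()` are illustrative.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical table entry: the bytes to emit and how many of them to copy. */
typedef struct {
    const char *repr;
    unsigned char len;
} char_repr_t;

/* As reviewed: two bytes of data, but len says only one is used. */
const char_repr_t percent_as_reviewed = {"%%", 1};
/* The clearer spelling: one byte, one percent sign. */
const char_repr_t percent_clean = {"%", 1};

/* memcpy-based writer: copies exactly len bytes and ignores the rest of repr.
 * A strcpy-based writer would copy all of "%%" and emit two percent signs. */
size_t append_repr(char *dst, const char_repr_t *r)
{
    assert(strlen(r->repr) >= r->len);
    memcpy(dst, r->repr, r->len);
    return r->len;
}

/* Only in a printf-style format string does "%%" mean a single '%'. */
int format_percent(char *buf, size_t size)
{
    return snprintf(buf, size, "%%");
}
```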
Was this code copied from somewhere? There's no header/license that would indicate so.
Actually, I just generated this header file with a few lines of ad-hoc Python code plus a few manual tweaks.
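Since the table was produced by a script, one option would be to build an equivalent table in C, which keeps the generation rules reviewable and avoids hand-edited entries such as the `%%` one. This is a hypothetical sketch (the `char_repr_entry` layout and `build_char_reprs()` are assumptions), not the contents of charrepr.h: it uses letter escapes for `\n`, `\r`, `\t`, escapes `"` and `\`, writes lowercase `\xHH` for any other byte outside 0x20..0x7E, and copies printable bytes, including `%`, as themselves.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical entry layout: up to 4 visible bytes plus a terminator. */
typedef struct {
    char repr[5];
    unsigned char len;
} char_repr_entry;

/* Fills all 256 entries from a few explicit rules. */
void build_char_reprs(char_repr_entry table[256])
{
    for (int c = 0; c < 256; c++) {
        const char *esc = NULL;
        switch (c) {
            case '\n': esc = "\\n"; break;
            case '\r': esc = "\\r"; break;
            case '\t': esc = "\\t"; break;
            case '"':  esc = "\\\""; break;
            case '\\': esc = "\\\\"; break;
        }
        if (esc != NULL) {
            strcpy(table[c].repr, esc);
        } else if (c < 0x20 || c > 0x7e) {
            /* Non-printable: "\x" plus two lowercase hex digits. */
            snprintf(table[c].repr, sizeof table[c].repr, "\\x%02x", c);
        } else {
            /* Printable ASCII (including '%') maps to itself. */
            table[c].repr[0] = (char)c;
            table[c].repr[1] = '\0';
        }
        table[c].len = (unsigned char)strlen(table[c].repr);
        assert(table[c].len >= 1 && table[c].len <= 4);
    }
}
```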
This looks fishy. %% only needs to be escaped in printf. Given the impl uses memcpy rather than strcpy, it shouldn't lead to a bug though.
Ahhhh, yes, I will fix it soon.
@@ -3863,6 +3866,38 @@ PHPAPI zend_string *php_addcslashes_str(const char *str, size_t len, const char
}
/* }}} */

/* {{{ php_repr_str */
PHPAPI zend_string *php_repr_str(const char *str, size_t len) {
Ah, and I would also improve the naming. I don't find this particularly descriptive.
Any suggestions for the naming? Basically, this function takes a string as input and returns a `zend_string` in which all non-printable characters are converted to a human-readable form (e.g. `chr(255)` is converted to `"\xff"`).
The `repr` part of the name was inspired by Python's `repr` function; `php_repr_str` is meant to read as REPResenting the STRing.
In Python, `repr` returns a string containing a printable representation of an object.
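A standalone sketch of the behaviour described above, analogous to Python's `repr()` on a byte string. `repr_str_sketch()` is a hypothetical stand-in for `php_repr_str()`: it works on plain `char` buffers and `malloc` instead of `zend_string`, wraps the result in double quotes, and its escape set (`\n`, `\r`, `\t`, `\"`, `\\`, lowercase `\xHH`) is an assumption rather than the PR's exact table.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical analogue of the proposed php_repr_str(): returns a malloc'd,
 * NUL-terminated, double-quoted copy of the len input bytes with
 * non-printable bytes escaped. Embedded NULs are fine because the length is
 * passed explicitly. The caller must free() the result. */
char *repr_str_sketch(const char *str, size_t len)
{
    /* Worst case: every byte becomes "\xHH" (4 chars), plus 2 quotes and a NUL. */
    char *out = malloc(len * 4 + 3);
    if (out == NULL) {
        return NULL;
    }

    char *p = out;
    *p++ = '"';
    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)str[i];
        switch (c) {
            case '\n': *p++ = '\\'; *p++ = 'n'; break;
            case '\r': *p++ = '\\'; *p++ = 'r'; break;
            case '\t': *p++ = '\\'; *p++ = 't'; break;
            case '"':  *p++ = '\\'; *p++ = '"'; break;
            case '\\': *p++ = '\\'; *p++ = '\\'; break;
            default:
                if (c < 0x20 || c > 0x7e) {
                    /* Lowercase hex, matching the "\xff" example. */
                    p += sprintf(p, "\\x%02x", c);
                } else {
                    *p++ = (char)c;
                }
        }
    }
    *p++ = '"';
    *p = '\0';
    /* Every byte below 0x20 was escaped, so the result contains no raw NUL. */
    assert(strlen(out) <= len * 4 + 2);
    return out;
}
```

For example, passing the single byte 0xFF (`chr(255)` in PHP) yields the six characters `"\xff"`, quotes included.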
I like this change. This looks good to me, but I think we could reuse some existing code.
I think this function has the same output as `smart_str_append_escaped()` (minus the enclosing double quotes). Is there anything blocking us from using it?
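For comparison, here is a standalone re-implementation of the escaping rules that `smart_str_append_escaped()` appears to apply, based on a reading of Zend/zend_smart_str.c that is worth double-checking against the current source: bytes below 0x20, bytes above 0x7E and the backslash are escaped; `\n`, `\r`, `\t`, `\f`, `\v`, `\\` and ESC (`\e`) get letter escapes; everything else becomes `\x` plus two uppercase hex digits. No enclosing quotes are added and `"` is left alone. `append_escaped_sketch()` and its fixed-buffer interface are hypothetical; the real function appends to a `smart_str`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for smart_str_append_escaped(): writes the escaped
 * form of s[0..l) into dst (at least 4 * l + 1 bytes), NUL-terminates it,
 * and returns the number of bytes written. */
size_t append_escaped_sketch(char *dst, const char *s, size_t l)
{
    static const char hex[] = "0123456789ABCDEF";
    char *res = dst;

    for (size_t i = 0; i < l; i++) {
        unsigned char c = (unsigned char)s[i];
        if (c < 32 || c == '\\' || c > 126) {
            *res++ = '\\';
            switch (c) {
                case '\n': *res++ = 'n'; break;
                case '\r': *res++ = 'r'; break;
                case '\t': *res++ = 't'; break;
                case '\f': *res++ = 'f'; break;
                case '\v': *res++ = 'v'; break;
                case '\\': *res++ = '\\'; break;
                case 0x1b: *res++ = 'e'; break; /* ESC */
                default:
                    *res++ = 'x';
                    *res++ = hex[c >> 4];
                    *res++ = hex[c & 0xf];
            }
        } else {
            /* Printable ASCII, including '"', is copied unchanged. */
            *res++ = (char)c;
        }
    }
    *res = '\0';
    /* All bytes below 32 were escaped, so strlen sees the whole result. */
    assert(strlen(dst) == (size_t)(res - dst));
    return (size_t)(res - dst);
}
```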
- zend_string *escaped_string = php_addcslashes(Z_STR_P(zv), "\"\\", 2);
- fprintf(stderr, " string(\"%s\")", ZSTR_VAL(escaped_string));
+ zend_string *escaped_string = php_repr_str(Z_STR_P(zv)->val, Z_STR_P(zv)->len);
+ fprintf(stderr, " string(%s)", ZSTR_VAL(escaped_string));
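One concrete reason the old line falls short: `php_addcslashes(..., "\"\\", 2)` only escapes `"` and `\`, and the result is printed with `%s`, which stops at the first NUL byte and passes raw control characters straight through. The hypothetical helper `format_dump_line()` below reproduces just that formatting step with `snprintf()` so the effect can be checked in isolation.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the old dump line's formatting step:
 * " string(\"%s\")" applied to the (only partially escaped) bytes.
 * "%s" reads up to the first NUL, so anything after an embedded NUL is
 * dropped, and control bytes such as '\n' are copied through raw. */
int format_dump_line(char *buf, size_t size, const char *bytes)
{
    assert(buf != NULL && size > 0 && bytes != NULL);
    return snprintf(buf, size, " string(\"%s\")", bytes);
}
```

With the 5-byte input `"ab\0cd"`, the formatted line is ` string("ab")`: the `cd` after the NUL is silently lost, and an input containing `\n` would split the dump line in two. Escaping every non-printable byte before formatting avoids both problems.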
We can achieve a similar result with `php_addcslashes(Z_STR_P(zv), "\x00..\x1f\\\\\x7f..\xff", 10)` (with the difference that it uses an octal representation), or the same result with `smart_str_append_escaped()`.
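To make the octal-versus-hex difference concrete, the sketch below escapes a single byte both ways. `escape_byte_octal()` roughly follows what `php_addcslashes()` does for a byte in its character list, though it omits some of the letter escapes such as `\a` and `\b`; `escape_byte_hex()` shows the `\xHH` form used for bytes that have no letter escape. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Octal style, roughly as php_addcslashes() renders a listed byte:
 * common controls keep their letters, other non-printables become \ooo,
 * and a listed printable byte just gets a backslash in front. */
size_t escape_byte_octal(unsigned char c, char *out)
{
    assert(out != NULL);
    switch (c) {
        case '\n': return (size_t)sprintf(out, "\\n");
        case '\t': return (size_t)sprintf(out, "\\t");
        case '\r': return (size_t)sprintf(out, "\\r");
    }
    if (c < 32 || c > 126) {
        return (size_t)sprintf(out, "\\%03o", c); /* 0xff -> \377 */
    }
    return (size_t)sprintf(out, "\\%c", c);       /* '\\' -> \\ */
}

/* Hex style for bytes without a letter escape: "\x" plus two hex digits. */
size_t escape_byte_hex(unsigned char c, char *out)
{
    assert(out != NULL);
    return (size_t)sprintf(out, "\\x%02X", c);    /* 0xff -> \xFF */
}
```

Byte 0xFF therefore comes out as `\377` in the octal style and as `\xFF` in the hex style; both are four characters, but the hex form is what the `"\xff"` example in this thread expects.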
…nt Non-Printable Characters in String Literals. Replaces phpGH-15730, as that PR became stale. Instead of introducing a new helper, this reuses smart_str_append_escaped(), which also removes the dependency on ext/standard.
This pull request addresses issue GH-15680, which proposed enhancing `zend_dump_op_array` to properly represent non-printable characters in strings. This is useful for debugging, as it lets developers see the actual content of strings that contain non-printable characters.
I have also added test cases for this change, and all tests related to this pull request pass.
For example, for the following PHP code:
Current output:
With this change, the output might look like:
Related Issues/Pull requests:
zend_dump_op_array to Properly Represent Non-Printable Characters in String Literals #15680
zend_dump_const output #10576