Skip to content

Enhance zend_dump_op_array to Properly Represent Non-Printable Characters (GH-15680) #15730

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from

Conversation

WangYihang
Copy link

@WangYihang WangYihang commented Sep 3, 2024

This pull request addresses issue GH-15680 which proposed an enhancement for zend_dump_op_array to properly represent non-printable characters in strings.

This is useful for debugging purposes, as it allows developers to see the actual content of strings that contain non-printable characters.

I have also added testcases for this change, and all tests related to this pull request were passed.

For example, for the following php code:

<?php
fwrite($buffer, "\x00\x01\xff--\n");

Current Output:

0040 INIT_FCALL 2 112 string("fwrite")
0041 SEND_VAR CV6($buffer) 1
0042 SEND_VAL string("--
") 2                                     <------ new line (\n) is directly printed
0043 DO_ICALL
Expected Output:

With this change, the output might looks like:

0040 INIT_FCALL 2 112 string("fwrite")
0041 SEND_VAR CV6($buffer) 1
0042 SEND_VAL string("\x00\x01\xff--\n") 2           <------ properly escaped with readable representations
0043 DO_ICALL

Related Issues/Pull requests:

…acters (phpGH-15680)

This change enhances `zend_dump_op_array` to properly represent non-printable characters in strings. This is useful for debugging purposes, as it allows developers to see the actual content of strings that contain non-printable characters.
Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See the optimisation suggestion.
The rest is fine.

Copy link
Member

@dstogov dstogov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see big problems, but I also don't see a real need for this.
@iluuu1994 what do you think?

const unsigned char len;
} char_repr_t;

static const char_repr_t char_reprs[256] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason to separate this array into a header?
It's going to be duplicated every time the header is included.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I separated this array into a header file was primarily for modularity and reusability.

The current file (string.c) is already quite long, and combining these contents with the C file might make it harder to read and maintain.

However, if you believe this separation is unnecessary, I can certainly move this part back into string.c.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this code copied from somewhere? There's no header/license that would indicate so.

@WangYihang
Copy link
Author

I don't see big problems, but I also don't see a real need for this. @iluuu1994 what do you think?

Thank you for your feedback.

While I understand that this enhancement might not seem immediately necessary, I believe it could bring several significant benefits:

  • Improved Developer Experience: Introducing a dedicated function or enhancing the existing one would provide developers with a more reliable way to handle non-printable characters, reducing the risk of errors in edge cases and improving code readability and maintainability. This need has been raised by several developers concerning string representation in PHP. Similar work has already been merged ([1] [2]), which focused on quoting a few specific characters like newlines. My pull request builds on that foundation by extending the functionality to cover all non-printable characters, making it a logical next step that should be considered for inclusion.

  • Consistency Across Languages: Many modern programming languages, such as Python (repr), Ruby (inspect), and JavaScript (JSON.stringify), offer similar functionality. Having a comparable feature in PHP would align it more closely with these languages, providing a familiar toolset for developers who work across multiple environments.

  • Prevention of Display Issues: During debugging, dumping non-printable control characters can disrupt the terminal's display, making the output difficult to interpret. A function that safely represents these characters would prevent such issues, improving debugging efficiency.

  • Easier Debugging with var_dump: Currently, PHP's var_dump function does not quote strings, which can make it challenging for developers to distinguish between visible and non-visible characters during debugging. This can lead to confusion and increased difficulty in identifying issues within the code. A standardized way to represent strings, including all non-printable characters, would alleviate this problem. Which could be left for the future work (If you guys agreed that it worthwhile, I can help to implement that).

  • Future-Proofing: Even if the need isn’t pressing at this moment, implementing this feature could support future debugging and logging functionalities, helping to keep PHP robust and user-friendly as it evolves.

I believe this change, while relatively minor, could add substantial value by addressing these points. I’m open to further refining the implementation based on your feedback.

What are your thoughts on this? @iluuu1994

@dstogov
Copy link
Member

dstogov commented Sep 9, 2024

@iluuu1994 @nielsdos @arnaud-lb please make a decision.

Copy link
Member

@iluuu1994 iluuu1994 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't object to changing this. LGTM otherwise.

const unsigned char len;
} char_repr_t;

static const char_repr_t char_reprs[256] = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this code copied from somewhere? There's no header/license that would indicate so.

{"\\\"", 2},
{"#", 1},
{"$", 1},
{"%%", 1},
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fishy. %% only needs to be escaped in printf. Given the impl uses memcpy rather than strcpy, it shouldn't lead to a bug though.

Copy link
Author

@WangYihang WangYihang Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this code copied from somewhere? There's no header/license that would indicate so.

Actually, I just generated this header file by a few lines of ad-hoc Python code with a few manual tweaks.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks fishy. %% only needs to be escaped in printf. Given the impl uses memcpy rather than strcpy, it shouldn't lead to a bug though.

Ahhhh, yes, I will fix it soon.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -3863,6 +3866,38 @@ PHPAPI zend_string *php_addcslashes_str(const char *str, size_t len, const char
}
/* }}} */

/* {{{ php_repr_str */
PHPAPI zend_string *php_repr_str(const char *str, size_t len) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, and I would also improve the naming. I don't find this particularly descriptive.

Copy link
Author

@WangYihang WangYihang Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any suggestions about the naming? basically this function take a string as input, returns a zend_string which converts all non-printable chars to a human readable format (e.g. chr(256) will be convert to "\xff").

The repr part in the function name was inspired by the python repr function which means REPResenting the STRing.

repr returns a string containing a printable representation of an object in Python.

Copy link
Member

@arnaud-lb arnaud-lb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this change. This looks good to me, but I think we could reuse some existing code.

@@ -3863,6 +3866,38 @@ PHPAPI zend_string *php_addcslashes_str(const char *str, size_t len, const char
}
/* }}} */

/* {{{ php_repr_str */
PHPAPI zend_string *php_repr_str(const char *str, size_t len) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this function has the same output as smart_str_append_escaped() (minus the enclosing double quotes). Is there anything blocking us from using it?

Comment on lines -70 to +71
zend_string *escaped_string = php_addcslashes(Z_STR_P(zv), "\"\\", 2);

fprintf(stderr, " string(\"%s\")", ZSTR_VAL(escaped_string));

zend_string *escaped_string = php_repr_str(Z_STR_P(zv)->val, Z_STR_P(zv)->len);
fprintf(stderr, " string(%s)", ZSTR_VAL(escaped_string));
Copy link
Member

@arnaud-lb arnaud-lb Sep 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can achieve a similar result with php_addcslashes(Z_STR_P(zv), "\x00..\x1f\\\\\x7f..\xff", 10) (with the difference that it uses an octal representation), or the same result with smart_str_append_escaped().

nielsdos added a commit to nielsdos/php-src that referenced this pull request Dec 26, 2024
…nt Non-Printable Characters in String Literals

Replaces phpGH-15730 as that PR became stale.

But instead of introducing a new helper, reuse
smart_str_append_escaped(), this also removes the dependency on
ext/standard.
nielsdos added a commit to nielsdos/php-src that referenced this pull request Dec 26, 2024
…nt Non-Printable Characters in String Literals

Replaces phpGH-15730 as that PR became stale.

But instead of introducing a new helper, reuse
smart_str_append_escaped(), this also removes the dependency on
ext/standard.
nielsdos added a commit to nielsdos/php-src that referenced this pull request Dec 26, 2024
…nt Non-Printable Characters in String Literals

Replaces phpGH-15730 as that PR became stale.

But instead of introducing a new helper, reuse
smart_str_append_escaped(), this also removes the dependency on
ext/standard.
@nielsdos nielsdos closed this in 55afe8b Dec 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants