-
Notifications
You must be signed in to change notification settings - Fork 589
pp_pack.c: Convert from using 'to_uvchr' functions to 'to_uv' #23564
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
khwilliamson
wants to merge
9
commits into
Perl:blead
Choose a base branch
from
khwilliamson:pack_uvchr
base: blead
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Change a few lines that affect the next commits that use a coding style not conforming to current expectations And also add some blank lines to serve as "paragraph" markers between code segments, and change the indentation of a few lines in preparation for blocks being added and removed in the next commits.
This converts a call to utf8n_to_uvchr() to instead use utf8_to_uv_flags(). This is part of the process of replacing all the '_uvchr' functions with '_uv' functions, which are now preferred. See perlapi. There is a subtle change of behavior here when the string to convert is malformed. I believe this is a bug fix. The macro NEXT_UNI_VAL assumed the failure return of utf8n_to_uvchr() is 'retlen' being -1. However, that value is only ever returned if the flags passed to the function include UTF8_CHECK_ONLY. Inspection of the code showed that this macro is never called with that flag; so this never currently fails. I looked at Perl v5.8.9, and the situation was the same. I doubt it has changed until now. This commit changes to use utf8_to_uv_flags(), and its failure return is properly checked for. The upshot is that this could now fail; whereas it couldn't before, even though the clear intent of the code has been to fail upon error.
Like the previous commit, this converts a call to utf8n_to_uvchr() to instead use utf8_to_uv_flags(). And like that commit, the previous code thought it was checking for failure and croaking when it occurred, but was checking the wrong way so that it never could fail. And now it does a proper check and can fail.
The new form croaks under failure; the previous didn't. But to get here the previous paragraph of code had to succeed, which means this should too. The flag in the changed call is the default for utf8_to_uv_or_die(), so it can be removed.
This converts a call to utf8n_to_uvchr(), using instead utf8_to_uv(). The flag parameter UTF8_ALLOW_DEFAULT before this commit is the default for utf8_to_uv(), so it can be omitted. Like previous commits, the previous code thought it was checking for failure and croaking, but was checking the wrong way so that it never could fail. In the new commit, I had to remove that checking, because it caused test suite failures.
Like previous commits, this converts a call to utf8n_to_uvchr() to use instead utf8_to_uv_flags(). And like previous commits, the previous code thought it was checking for failure and croaking, but was checking the wrong way so that it never could fail. With the new commit, it can fail. Also, the previous commit stored the result in an 'int', whose range can be significantly less than the code point returned. The value was truncated without notice. Now, I croak if the value got truncated. I wonder what I should do, as this is for unpacking a 'c' format, which is supposed to be a signed 8-bit value.
Like previous commits, this converts a call to utf8n_to_uvchr() to instead use utf8_to_uv_flags(). Unlike those commits, the previous code did in fact properly check for failure. But upon failure, it semi-unsafely assumed the start byte was accurate as to the intended byte length of the input character. (It did check that the returned value did not go past the end of the buffer, by using UTF8_SAFE_SKIP() ). The new function call always returns the correct number of bytes, so just use that instead of trying to recompute it. The mechanism for deteriming failure no longer depends on the CHECK_ONLY flag, so that is removed.
Prior to this commit, the code went through the buffer with warnings turned off, but saving the fact if there were problems. If so, it ran through the buffer again, with warnings turned on to display those problems. I'm unsure of why it didn't output any warnings until it got to the end. But utf8_to_uv_msgs() allows us to store up the messages and output them later, without having to reparse
The reason for it to be more than this was removed in the previous commit
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This series of commits converts the calls that convert from UTF-8 to code points from using the utf8n_to_uvchr() forms to instead use the newer utf8_to_uv_foo() forms.
One reason to do this is that the newer forms have a much easier interface for returning to the caller if the input was malformed. The old interface was arcane, and this file did not use it correctly. Since at least 5.8.9, just one of the calls in this file knew that the string was malformed. Everywhere else, it thought things were hunky dory when they were not.
That meant that the code to cope with malformedness was dead; it never got executed. This p.r. rectifies that. However, it turns out that
t/utf8_decode.t
was relying on one of the code paths being dead. I had to remove the check when it came alive.So I took care to make each commit in this series change only one call; to aid in bisecting if field experience indicates other calls should not check for validity.