Skip to content

mb_detect_encoding is more accurate on strings with UTF-8/16 BOM #10373

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

alexdowad
Copy link
Contributor

Thanks to the GitHub user 'titanz35' for pointing out that the new implementation of mb_detect_encoding had poor detection accuracy on UTF-8 and UTF-16 strings with a byte-order mark.

This relates to #7871. That issue is primarily about the detection accuracy of mb_detect_encoding on strings which contain emoji, but one commenter also pointed out a problem with strings that start with a byte-order mark. Once the emoji issue is also worked out, then #7871 can be closed.

@cmb69 @Girgias @nikic @kamil-tekiela @youkidearitai

Thanks to the GitHub user 'titanz35' for pointing out that the new
implementation of mb_detect_encoding had poor detection accuracy on
UTF-8 and UTF-16 strings with a byte-order mark.
Copy link
Member

@Girgias Girgias left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to make sense, should this be backported to PHP 8.1?

@alexdowad
Copy link
Contributor Author

Seems to make sense, should this be backported to PHP 8.1?

That's a good idea.

@youkidearitai
Copy link
Contributor

Looks good to me.

@alexdowad
Copy link
Contributor Author

Hmm, the patch doesn't apply on PHP-8.1 because the code has been refactored on PHP-8.2...

@alexdowad
Copy link
Contributor Author

Hmm, it doesn't even apply on PHP-8.2.

@alexdowad
Copy link
Contributor Author

Merging into PHP 8.3 for now. Applying the same fix to 8.1 and 8.2 might be done later (though it's more difficult to do this with the legacy code).

@alexdowad alexdowad closed this Jan 19, 2023
@alexdowad alexdowad deleted the bom branch January 19, 2023 06:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants