When mb_convert_encoding() interprets ISO-2022-JP byte sequences in JIS X 0201 mode, 0x5C is converted to a different character before and after a 0x0E ("Shift Out") / 0x0F ("Shift In") pair.
Since these control characters cannot switch from the JIS X 0201 character set to another one, I believe the same byte sequence must be mapped to the same character before and after 0x0E 0x0F.
Wait a moment, this behavior may be historical.
According to the English Wikipedia, Shift Out (0x0E) means "Japanese letters" and Shift In (0x0F) means "Latin letters". So "Shift In" can be read as "change to Latin letters (ASCII)", in which case the second 0x5C is probably meant to decode as \ (U+005C).
This is not part of RFC 1468, but it was one technique for putting half-width (hankaku) katakana into email.
I must say this is surprising to me. My understanding was that in JIS7/8, 0x0E is used to start a section of JIS X 0201 kana, and 0x0F is used to end a section of JIS X 0201 kana.
Since the default mode for all ISO-2022-JP variants (the mode which we start every string in) is ASCII mode, I thought that 0x0F should return back to the default (ASCII) mode.
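The two readings can be compared with a minimal Python sketch. It is not mbstring's C implementation: `decode_7bit` and its `si_resets_to_ascii` flag are hypothetical names, and only ESC ( B, ESC ( J, SO and SI are handled (other escape sequences, such as the JIS X 0208 ones, are rejected). With the flag off, SI only ends the katakana section and the previously designated JIS X 0201 Roman set stays active, which is what the report expects. With it on, SI returns to the default ASCII mode, as described in the comment above.

```python
# Hypothetical sketch: decode a tiny subset of 7-bit ISO-2022-JP-style input
# under two possible readings of SI (0x0F). Not mbstring's implementation.
ESC, SO, SI = 0x1B, 0x0E, 0x0F


def decode_7bit(data: bytes, si_resets_to_ascii: bool) -> str:
    g0 = "ascii"     # every string starts in ASCII mode
    shifted = False  # True between SO and SI (JIS X 0201 katakana section)
    out = []
    i = 0
    while i < len(data):
        b = data[i]
        if b == ESC:
            seq = data[i:i + 3]
            if seq == b"\x1b(B":
                g0 = "ascii"
            elif seq == b"\x1b(J":
                g0 = "roman"  # JIS X 0201 Roman set
            else:
                raise ValueError(f"unsupported escape sequence {seq!r}")
            i += 3
            continue
        if b == SO:
            shifted = True
        elif b == SI:
            shifted = False
            if si_resets_to_ascii:
                g0 = "ascii"  # reading 2: SI goes back to the default mode
            # reading 1: SI keeps whatever G0 set was designated before SO
        elif shifted and 0x21 <= b <= 0x5F:
            # JIS X 0201 katakana: 7-bit 0x21..0x5F -> U+FF61..U+FF9F
            out.append(chr(0xFF61 + b - 0x21))
        elif g0 == "roman" and b == 0x5C:
            out.append("\u00a5")  # YEN SIGN
        elif g0 == "roman" and b == 0x7E:
            out.append("\u203e")  # OVERLINE
        else:
            out.append(chr(b))  # everything else decoded as ASCII
        i += 1
    return "".join(out)


# ESC ( J, 0x5C, SO, SI, 0x5C, ESC ( B
sample = b"\x1b(J\x5c\x0e\x0f\x5c\x1b(B"
print(decode_7bit(sample, si_resets_to_ascii=False))  # ¥¥
print(decode_7bit(sample, si_resets_to_ascii=True))   # ¥\
```

The only difference between the two readings is whether SI touches the G0 designation; that single branch decides whether the second 0x5C becomes ¥ or \.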
I think this is what mbstring has always done, for more than 10 years. But if that is not true, please let me know.
Since this would be a BC (backward compatibility) break, we would really need to think about it carefully. From past experience, I know that our Japanese users are very sensitive to any BC breaks involving Japanese text encodings. 😉
Hmm, it certainly seems that way when I read the web page @youkidearitai sent me.
I think that in the JIS X 0201 specification, SO/SI switch between the two halves of JIS X 0201 (Roman and katakana), but I don't know how that carries over to ISO-2022-JP.
Reading IETF RFC 1468, there is no mention of JIS X 0201 katakana (only the JIS X 0201 Roman set, selected with ESC ( J), and I don't think many ISO-2022-JP implementations use SO/SI in the first place.
It might be better not to change this carelessly.
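The RFC 1468 point can be checked mechanically. The sketch below is Python, and `RFC1468_ESCAPES` and `uses_only_rfc1468` are hypothetical names. It accepts only the four escape sequences RFC 1468 lists (ASCII, the JIS X 0201 Roman set, and the 1978 and 1983 editions of JIS X 0208), none of which selects katakana, and it rejects SO/SI and 8-bit bytes. It only checks the byte repertoire; it does not validate JIS X 0208 byte pairs or where the text returns to ASCII.

```python
# Escape sequences listed in RFC 1468 for ISO-2022-JP.
RFC1468_ESCAPES = {
    b"\x1b(B": "ASCII",
    b"\x1b(J": "JIS X 0201-1976 Roman",
    b"\x1b$@": "JIS X 0208-1978",
    b"\x1b$B": "JIS X 0208-1983",
}


def uses_only_rfc1468(data: bytes) -> bool:
    """Return True if data has no SO/SI, no 8-bit bytes and only RFC 1468 escapes."""
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x1B:
            if data[i:i + 3] not in RFC1468_ESCAPES:
                return False  # e.g. ESC ( I (katakana) is not in RFC 1468
            i += 3
        elif b in (0x0E, 0x0F) or b >= 0x80:
            return False      # SO/SI and 8-bit kana are outside RFC 1468
        else:
            i += 1
    return True


print(uses_only_rfc1468(b"\x1b$B\x30\x21\x1b(B"))     # True
print(uses_only_rfc1468(b"\x1b(J\x5c\x0e\x0f\x5c"))   # False (uses SO/SI)
```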
Thank you!
Description
3v4l: https://3v4l.org/XrFGE
The following code:
Resulted in this output:
But I expected this output instead:
PHP Version
PHP 8.1.16
Operating System
No response