You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I got a question about support for a "pattern" expression that matches only printable Unicode characters. It appears many implementations do not have support for multibyte characters in regular expressions, especially in ECMAScript. This means that non-BMP characters (which must be represented with surrogate pairs in UTF-16) don't work as expected:
This regular expression is only matching the second character of the surrogate pair. Even though JSON Schema suggests limiting patterns to ECMAScript-compatible regular expressions, this is not intended to constrain the encoding of the string; JSON only decodes to a string of Unicode code points. ECMAScript implementations that want to match Unicode characters correctly will need to match the two-byte sequence like so:
>newRegExp('^(🐲)*$').test('🐲🐲')true
Or use the u flag available in newer implementations of ECMAScript:
>newRegExp('^🐲*$','u').test('🐲🐲')true
The "pattern" and "patternProperties" tests should test for this behavior.
The text was updated successfully, but these errors were encountered:
On Wed, May 22, 2019, 20:15 Austin Wright ***@***.***> wrote:
I got a question about support for a "pattern" expression that matches
only printable Unicode characters. It appears many implementations do not
have support for multibyte characters in regular expressions, especially in
ECMAScript. This means that non-MBP characters (which must be represented
with surrogate pairs in UTF-16) don't work as expected:
> new RegExp('^🐲*$').test('')
false
> new RegExp('^🐲*$').test('🐲')
true
> new RegExp('^🐲*$').test('🐲🐲')
false
This regular expression is only matching the second character of the
regular expression. Even though JSON Schema suggests ECMAScript-compatible
regular expressions, this is not intended to constrain the type of string
supplied to the regexp test; JSON only decodes to a string of Unicode code
points. ECMAScript implementations that want to match Unicode characters
correctly will need to match the two-byte sequence like so:
> new RegExp('^(🐲)*$').test('🐲🐲')
true
Or use the u flag available in newer implementations of ECMAScript:
> new RegExp('^🐲*$', 'u').test('🐲🐲')
true
The "pattern" and "patternProperties" tests should test for this behavior.
—
You are receiving this because you are subscribed to this thread.
Reply to this email directly, view it on GitHub
<#264?email_source=notifications&email_token=AACQQXRZD2U42WC743NDHG3PWXO23A5CNFSM4HOY5IU2YY3PNVWWK3TUL52HS4DFUVEXG43VMWVGG33NNVSW45C7NFSM4GVKWWVA>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AACQQXWSTPBCS2N3XM5B2ODPWXO23ANCNFSM4HOY5IUQ>
.
I got a question about support for a "pattern" expression that matches only printable Unicode characters. It appears many implementations do not have support for multibyte characters in regular expressions, especially in ECMAScript. This means that non-BMP characters (which must be represented with surrogate pairs in UTF-16) don't work as expected:
This regular expression is only matching the second character of the surrogate pair. Even though JSON Schema suggests limiting patterns to ECMAScript-compatible regular expressions, this is not intended to constrain the encoding of the string; JSON only decodes to a string of Unicode code points. ECMAScript implementations that want to match Unicode characters correctly will need to match the two-byte sequence like so:
Or use the
u
flag available in newer implementations of ECMAScript:The "pattern" and "patternProperties" tests should test for this behavior.
The text was updated successfully, but these errors were encountered: