Double-encoded IDNA labels don't roundtrip #603
Comments
I think double-encoded labels must be considered invalid. I see two ways to achieve this:
|
Setting Note that this would also seem to impact #438 depending on where things land. |
@rmisev Indeed! As evident in UTS 46, the CheckHyphens boolean was first introduced to allow YouTube labels of form "r3---sn-apo3qvuoxuxbt-j5pe". (Previously the boolean was effectively always "true".) But I suppose the Unicode folks didn't consider the possibility of xn--, which now needs to be forbidden explicitly. |
Domains (in GTLD) are generally restricted from dashes in the 3rd and 4th
positions /^[A-z0-9][A-z0-9]--/ if this helps. IDNA takes advantage of
this and was the catalyst.
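
A minimal sketch of the check described above, assuming Python's standard `re` module. The name `has_reserved_hyphens` is hypothetical, and the pattern is a tightened variant of the one quoted: the range `[A-z]` also matches the ASCII punctuation between `Z` and `a`, so this version spells out `A-Za-z`.

```python
import re

# Assumption: a tightened form of the quoted pattern. [A-z] would also
# match "[", "\", "]", "^", "_" and "`", so the ranges are written out.
RESERVED_HYPHENS = re.compile(r"^[A-Za-z0-9]{2}--")


def has_reserved_hyphens(label: str) -> bool:
    """Return True if the label has hyphens in its 3rd and 4th positions."""
    return RESERVED_HYPHENS.match(label) is not None


if __name__ == "__main__":
    for label in ("xn--caf-dma", "r3---sn-apo3qvuoxuxbt-j5pe", "example"):
        print(label, has_reserved_hyphens(label))
```

Both the ACE-prefixed label and the YouTube-style label trip this check, which is why a hyphen rule alone cannot tell them apart.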
…On Mon, May 17, 2021, 4:01 PM Timothy Gu ***@***.***> wrote:
@rmisev <https://github.com/rmisev> Indeed! As evident in UTS 46, the
*CheckHyphens* boolean was first introduced to allow YouTube labels of
form "r3---sn-apo3qvuoxuxbt-j5pe". (Previously it was always "true".) But I
suppose the Unicode folks didn't consider the possibility of xn--, which
now needs to be forbidden explicitly.
|
@macchiati not sure what we should do here, but wanted to bring this to your attention. |
That is no longer the case. Firefox now throws for |
@valenting does that follow from the specification in any way though? |
The change happened as a consequence of Bug 1724233 - IDNA does not conform to RFC and is interpreted as a different hostname. |
From the bug report, it looks like this was just not following the spec
…On Fri, Dec 9, 2022, 05:10 Valentin Gosu ***@***.***> wrote:
The change happened as a consequence of Bug 1724233 - IDNA does not
conform to RFC and is interpreted as a different hostname
<https://bugzilla.mozilla.org/show_bug.cgi?id=1724233>.
|
I think @macchiati is correct.
We should add a WPT for this, but I think this case is adequately covered by the specification and CheckHyphens doesn't impact it one way or another. |
There is test coverage for this already in
{
  "comment": "Invalid Punycode (contains non-ASCII character)",
  "input": "xn--tešla",
  "output": null
}
Per https://wpt.fyi/results/url/toascii.window.html Chromium-based browsers have some issues there for
Closing this therefore, but please comment if my analysis was lacking somehow. |
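
To illustrate why an input like the one in that test case is rejected, here is a rough sketch using Python's built-in `punycode` codec. The helper name `ace_label_is_decodable` is hypothetical, and Python's codec is not guaranteed to match the decoder that the URL Standard or UTS 46 relies on; the sketch only shows that the part after `xn--` has to be ASCII before Punycode decoding can start.

```python
ACE_PREFIX = "xn--"


def ace_label_is_decodable(label: str) -> bool:
    """Return False if an 'xn--' label cannot be Punycode-decoded.

    Labels without the ACE prefix are reported as True because no
    decoding is attempted for them in this sketch.
    """
    if not label.lower().startswith(ACE_PREFIX):
        return True
    try:
        # Punycode input is ASCII-only, so a non-ASCII character
        # already fails at the encode("ascii") step.
        label[len(ACE_PREFIX):].encode("ascii").decode("punycode")
    except UnicodeError:
        return False
    return True


if __name__ == "__main__":
    print(ace_label_is_decodable("xn--tešla"))
    print(ace_label_is_decodable("xn--caf-dma"))
```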
@annevk rust-url fuzzing has found another test case for IDNA that doesn't round-trip:
Should we reopen this, or open a new issue? |
I'd prefer a new issue. Both of those result in failure in WebKit so I'd appreciate it if you could go through the steps as I did in #603 (comment) to find out if this is an actual bug in the specification or if we should add these as tests. |
Consider xn--xn---epa. It appears that using the current domain to Unicode algorithm (as implemented by Node.js), this would get converted to xn--é. But applying domain to ASCII on xn--é would produce a Punycode decoding failure. It sounds like domain to Unicode (or even UTS 46) should return failure on such labels.

This is somewhat important because Firefox can create such double-encoded labels:
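
Separately from what any browser does, the xn--xn---epa case described above can be reproduced with a small sketch based on Python's built-in `punycode` codec. The helpers `decode_label` and `is_double_encoded` are hypothetical names; the code performs raw Punycode decoding only and none of the UTS 46 validity checks.

```python
ACE_PREFIX = "xn--"


def decode_label(label: str) -> str:
    """Decode one 'xn--' label with raw Punycode (no UTS 46 checks)."""
    if not label.lower().startswith(ACE_PREFIX):
        return label
    return label[len(ACE_PREFIX):].encode("ascii").decode("punycode")


def is_double_encoded(label: str) -> bool:
    """Return True if an 'xn--' label decodes to another 'xn--' label."""
    if not label.lower().startswith(ACE_PREFIX):
        return False
    return decode_label(label).lower().startswith(ACE_PREFIX)


if __name__ == "__main__":
    print(decode_label("xn--xn---epa"))       # the label from this issue
    print(is_double_encoded("xn--xn---epa"))
    print(is_double_encoded("xn--caf-dma"))
```

A check along these lines is one possible way to reject such labels during decoding; whether it belongs in domain to Unicode or in UTS 46 itself is the open question in this issue.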