-
Notifications
You must be signed in to change notification settings - Fork 18k
net: figure out IDNA Punycode story #13835
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I suppose it's too late to delete Punycode from the world, but I would like to understand exactly what security aspects are affected both by adding support to the library and by not adding it. |
The security aspect I'm most aware of is homograph attacks: https://en.wikipedia.org/wiki/IDN_homograph_attack Those are why presumably why browsers render http://кто.рф/eng/ as http://xn--j1ail.xn--p1ai/eng/ in the URL bar. But should this work? c, err := net.Dial("кто.рф:80") Or: res, err := http.Get("http://кто.рф/") I can type http://кто.рф/ into my URL bar and it loads. Requiring all users working with such URLs to do this seems onerous: u, err := url.Parse("http://кто.рф/")
if err != nil {
...
}
u.Host, err = idna.ToASCII(u.Host)
if err != nil {
...
}
res, err := http.Get(u.String()) The other security aspect is substring searching. But it's already the case that I think for security the best thing to do is be consistent which fields contain which form at all times, and document it. |
For what little it's worth, I did look into the topic of Unicode host names when working on url.Parse earlier in the cycle. Go 1.4 and earlier preserved them as is. Go 1.5 %-encoded the non-ASCII bytes (clearly not right). Go 1.6 goes back to the earlier preservation. For URL parsing the general rule is that the fields of the URL struct are unencoded, so it seemed best to keep Host in its original form, especially in non-web uses of URLs (like when people use mysql://whatever). I looked at what seemed to be relevant RFCs at the time and it looked like the HTTP protocol needed one algorithm for mangling to ASCII while DNS needed a different one. Maybe I misunderstood, but that split seemed another argument in favor of keeping the URL.Host field unmangled, and for keeping URL.String() unmangled at least for now. I agree that we should figure out how to make http.Get etc do the right thing. Internally http.Get already calls url.Parse, so if there is http-specific url mangling to do, it can be done there. Similarly we should figure out how to make net.Dial do the right thing, whatever that is. |
I've experienced needing to handle this at Bitly, so i'll add my 2cents. I think url.Parse should not handle IDN/Punycode conversion, as an IDN formatted Host is a valid URL format, and a URL struct constructed with an IDN Host (without using Parse) should also work. Changing round trip semantics of Also, as a side note, the same IDN/Punycode issue also applies to the construction of the When handling requests on the server side, I think the Host header should be left as-is (presumably in punycode if from a well behaved browser/client, but we all know that isn't always the case), but It's unclear to me what, if any, restriction net.Dial should have on unicode or IDN domains as DNS as i believe punycode is a DNS convention not a requirement (i need to do some RFC reading to be better informed on this one). |
Didn't happen for 1.7. |
The parser and encoder of net/mail.Address doesn't handle Punycode either. I've found it as a surprise(perhaps should be documented) because Address.Name is word-encoded/decoded. Edit: net/mail doesn't need additional documentation. After more research I've found that |
CL https://golang.org/cl/28961 mentions this issue. |
Adds golang.org/x/net/idna to the Go repo from the golang.org/x/net repo's git rev 7db922ba (Dec 2012). Punycode is needed for http.Get("привет.рф") etc., which will come in separate commits. Updates #13835 Change-Id: I313ed82d292737579a3ec5dcf8a9e766f2f75138 Reviewed-on: https://go-review.googlesource.com/28961 Run-TryBot: Brad Fitzpatrick <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Ian Lance Taylor <[email protected]>
CL https://golang.org/cl/29072 mentions this issue. |
CL https://golang.org/cl/29071 mentions this issue. |
No tests, because tests are in std. (See TestTransportIDNA_h2) Updates golang/go#13835 Change-Id: I0a327d9652ea5e6f32dfa279550915af61567bed Reviewed-on: https://go-review.googlesource.com/29071 Run-TryBot: Brad Fitzpatrick <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Chris Broadfoot <[email protected]>
CL https://golang.org/cl/29110 mentions this issue. |
Updates x/net/http2 (and x/net/lex/httplex) to git rev 749a502 for: http2: don't sniff first Request.Body byte in Transport until we have a conn https://golang.org/cl/29074 Fixes #17071 http2: add Transport support for unicode domain names https://golang.org/cl/29071 Updates #13835 http2: don't send bogus :path pseudo headers if Request.URL.Opaque is set https://golang.org/cl/27632 + http2: fix bug where '*' as a valid :path value in Transport https://golang.org/cl/29070 Updates #16847 http2: fix all vet warnings https://golang.org/cl/28344 Updates #16228 Updates #11041 Also uses the new -underscore flag to x/tools/cmd/bundle from https://golang.org/cl/29086 Change-Id: Ica0f6bf6e33266237e37527a166a783d78c059c4 Reviewed-on: https://go-review.googlesource.com/29110 Run-TryBot: Brad Fitzpatrick <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Chris Broadfoot <[email protected]>
This CL makes code like this work: res, err := http.Get("https://фу.бар/баз") So far, IDNA support is limited to the http1 and http2 Transports. The http package is currently responsible for converting domain names into Punycode before calling the net layer. The http package also has to Punycode-ify the hostname for the Host & :authority headers for HTTP/1 and HTTP/2, respectively. No automatic translation from Punycode back to Unicode is performed, per Go's historical behavior. Docs are updated where relevant. No changes needed to the Server package. Things are already in ASCII at that point. No changes to the net package, at least yet. Updates x/net/http2 to git rev 57c7820 for https://golang.org/cl/29071 Updates #13835 Change-Id: I1e9a74c60d00a197ea951a9505da5c3c3187099b Reviewed-on: https://go-review.googlesource.com/29072 Reviewed-by: Chris Broadfoot <[email protected]> Run-TryBot: Brad Fitzpatrick <[email protected]> TryBot-Result: Gobot Gobot <[email protected]>
What is the status here? Is this going to be part of Go 1.8? What is left? |
Are there any decisions remaining here or is it just all implementation? Assuming the latter. |
Per earlier chat with @ianlancetaylor, we're going to make |
Actually, I forgot why we might not want to do this in I think we might just want to keep this in I'll keep this bug open until I add some more tests (around case folding), and then we'll call it good for Go 1.8. |
Is it possible that we introduce a net/idna package which if imported for
side effects will add international domain name support to the net package?
Perhaps we don't even need to bloat net/http for that and rely on users to
choose whether to support idna or not.
|
CL https://golang.org/cl/29860 mentions this issue. |
CL https://golang.org/cl/29859 mentions this issue. |
@minux, perhaps. But feels slightly odd. We kinda do that for DNS if you include cgo, but that's pretty explicit. I'd rather consider that later, in Go 1.9. I have other things to focus on during 1.8. |
…pport Add golang.org/x/text/unicode/norm from x/text git rev a7c0236. Needed by net/http for IDNA normalization. Updates #13835 Change-Id: I8b024e179d573f2b093c209a4b9e4f71f7d4a1f2 Reviewed-on: https://go-review.googlesource.com/29859 Run-TryBot: Brad Fitzpatrick <[email protected]> TryBot-Result: Gobot Gobot <[email protected]> Reviewed-by: Marcel van Lohuizen <[email protected]>
In order to store URL's they need to be normalized. If I read all this correctly, then the following is still required to normalize URLs even in 1.8 correct?
|
(Fork of http://thread.gmane.org/gmane.comp.lang.go.devel/101215)
net/url and net/http say nothing about whose responsibility it is to map non-ASCII hostnames to/from Punycode.
Whose is it?
I was going to fix #13686 for Go 1.6 by vendoring the golang.org/x/net/idna package and using ToASCII when sending HTTP requests (when writing the Host header) in the HTTP client code.
But should I also use ToUnicode when reading Host lines in the HTTP server code? Or should I leave the url.URL.Host string in Punycode form?
@nigeltao said:
We should figure this out for Go 1.7.
Should net.Dial work on UTF-8 domain names?
Should url.Parse decode them into URL.Host? Should URL.Host accept either UTF-8 or Punycode?
Should Punycode be exposed in the stdlib?
The text was updated successfully, but these errors were encountered: