URL Parsing behaviour changed on Linux between 5.10 and 6.0 #863

Description

@Lukasa
Contributor

When attempting to parse the string "ftp://user:password@*.xn--poema-9qae5a.com.br:4343/cat.txt" as a URL, the result on Linux differs between 5.10 and 6.0, and the 6.0 result on Linux also differs from Apple platforms. The difference is at the very least in the host component.

This can be shown by using the following test program:

import Foundation

func main() {
    let x = URL(string: "ftp://user:password@*.xn--poema-9qae5a.com.br:4343/cat.txt")!
    print(x.host!)
}

main()
  • On Linux, Swift 5.10 (Swift version 5.10.1 (swift-5.10.1-RELEASE)), the result of this program is *.xn--poema-9qae5a.com.br.
  • On Linux, Swift 6.0 (Swift version 6.0-dev (LLVM 3751470251df3e4, Swift 026ffddec66fd08)), the result of the program is *.p%C3%A3oema%C3%A7%C3%A3.com.br.
  • On the latest macOS Sequoia 15.1 beta, the result is *.xn--poema-9qae5a.com.br.

Note that on Linux with Swift 6.0, the URL type seems to have decided to thoroughly mangle the hostname. Specifically, it appears to have decoded the IDNA-encoded hostname, and then percent-encoded it when we got it back out. That can't possibly be the right thing to do.
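
The second half of that mangling can be reproduced in isolation. The sketch below assumes the Unicode form of the host is `*.pãoemaçã.com.br`; that string is inferred from the percent-encoded output reported for Linux 6.0, not confirmed with an IDNA decoder. Percent-encoding it with `.urlHostAllowed` should give the same host the test program printed on Linux 6.0.

```swift
import Foundation

// Assumption: this is the Unicode host whose UTF-8 percent-encoding matches
// the value reported on Linux with Swift 6.0 (U+00E3 = ã, U+00E7 = ç).
let unicodeHost = "*.p\u{E3}oema\u{E7}\u{E3}.com.br"

// Percent-encode every character outside the host-allowed set.
// "*" and "." are allowed in a host, so only the non-ASCII letters change.
let percentEncoded = unicodeHost.addingPercentEncoding(withAllowedCharacters: .urlHostAllowed)

print(percentEncoded ?? "nil")
```

This only illustrates the percent-encoding step; it says nothing about where in URL the IDNA decoding happens.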

Activity

self-assigned this
on Aug 19, 2024
0xTim (Member) commented on Aug 20, 2024

We're seeing similar issues in Vapor's tests for URL parsing which are failing on the 6.0 nightlies and on main.

Essentially, creating a URLComponents with a scheme of http+unix is returning nil instead of a correct URL.

jrflat (Contributor) commented on Aug 20, 2024

> We're seeing similar issues in Vapor's tests for URL parsing which are failing on the 6.0 nightlies and on main.
>
> Essentially, creating a URLComponents with a scheme of http+unix is returning nil instead of a correct URL.

http+unix is a bit special since it requires the host be percent-encoded. URLComponents was updated to use IDNA-encoding for hosts instead of percent-encoding, which is what modern WHATWG parsers do, too. A workaround would be to percent-encode the host string before assigning .encodedHost, for example:

comp.scheme = "https+unix"
comp.encodedHost = "%2Fsocket%2Fpath"

But perhaps a better solution on our end could be to add http(s)+unix to the list of special schemes for which we percent-encode the host. I can look into that, but we should also keep an eye on discussions within the IETF and WHATWG regarding this scheme for standardization guidance.
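
As a minimal sketch of the percent-encoding step in that workaround: the helper name `percentEncodedSocketHost` is hypothetical (it is not part of Foundation or Vapor), and `/socket/path` is just the example path from the snippet above.

```swift
import Foundation

// Hypothetical helper (not a Foundation or Vapor API): percent-encodes a
// Unix socket path so it can be carried in the host position of a URL.
func percentEncodedSocketHost(_ socketPath: String) -> String? {
    // "/" is not in .urlHostAllowed, so each slash becomes "%2F".
    socketPath.addingPercentEncoding(withAllowedCharacters: .urlHostAllowed)
}

let encodedHost = percentEncodedSocketHost("/socket/path")
print(encodedHost ?? "nil")

// Decoding reverses the transformation and recovers the original path.
print(encodedHost?.removingPercentEncoding ?? "nil")
```

The resulting string is the kind of value the workaround assigns to `.encodedHost`; whether `.encodedHost` is available depends on the platform and toolchain in use.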

0xTim (Member) commented on Aug 20, 2024

@jrflat I'm assuming that will roll out to the macOS releases of 6.0 eventually too? Currently the tests pass on the latest Xcode beta on macOS but not on Linux. (I'm trying to get a nightly toolchain to work so I can check, but the compiler is crashing, and the latest nightly I have that works doesn't show the same behaviour as the nightlies on Linux.)

Assuming they're the same, we can add a fix to Vapor.

(Sorry Cory for hijacking the issue!)

Lukasa (Contributor, Author) commented on Aug 20, 2024

Not a problem! It'll be relevant for AHC so it's a worthy digression.

jrflat (Contributor) commented on Aug 21, 2024

@0xTim We're working on updating the URL behavior to be the same across all Darwin and non-Darwin platforms, so a solution like special-casing http(s)+unix in swift-foundation would apply to both, and I think that's probably the best way forward.

0xTim (Member) commented on Aug 21, 2024

Cool, I've verified that it's the URLComponents.string computed property where we're seeing the behaviour. You can verify a fix by checking out the swift-6-url-failing-tests branch of Vapor and running the tests.
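
A small standalone probe for the same symptom might look like the sketch below. It is not taken from Vapor's test suite; the host and the `/status` path are illustrative values, and the output is expected to differ between toolchains, so none is asserted here.

```swift
import Foundation

var components = URLComponents()
components.scheme = "http+unix"
// The host is already percent-encoded ("/socket/path"), so assign it as-is.
components.percentEncodedHost = "%2Fsocket%2Fpath"
// Illustrative path, not from the original report.
components.path = "/status"

// On affected toolchains this property is reported to return nil.
print(components.string ?? "nil")
```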

jrflat (Contributor) commented on Aug 21, 2024

Just posted PR #883 which should fix URLComponents.string returning nil! I'll verify on that branch, thanks!

jrflat (Contributor) commented on Aug 21, 2024

@0xTim Looking at the URI.swift code, no Vapor workaround is needed for macOS, and the issue should be fixed on Linux when #883 lands. No workaround is needed for macOS because the code

#if canImport(Darwin)
if #available(macOS 13.0, iOS 16.0, tvOS 16.0, watchOS 9.0, *) {
    components.encodedHost = host.addingPercentEncoding(withAllowedCharacters: .urlHostAllowed)
} else {
    components.percentEncodedHost = host.addingPercentEncoding(withAllowedCharacters: .urlHostAllowed)
}
#else
components.percentEncodedHost = host.addingPercentEncoding(withAllowedCharacters: .urlHostAllowed)
#endif

uses .encodedHost for the macOS versions with updated URLComponents behavior, and .encodedHost behaves as expected. Linux will get the fixed .percentEncodedHost behavior when #883 lands. Hope this clears things up and thanks for bringing it to our attention!

0xTim (Member) commented on Aug 22, 2024

@jrflat would it be easier to just migrate to use the same code on macOS and Linux with encodedHost?

jrflat (Contributor) commented on Aug 23, 2024

That's probably a good idea in general, but we'll still update the URLComponents behavior, too.

      URL Parsing behaviour changed on Linux between 5.10 and 6.0 · Issue #863 · swiftlang/swift-foundation