
Non IDNA Mode #1044


Closed
mitsuhiko opened this issue Apr 28, 2025 · 5 comments

@mitsuhiko

mitsuhiko commented Apr 28, 2025

I'm not sure at what point this spiraled out of control, but right now this crate pulls in a highly excessive dependency tree. Most of it comes from idna_adapter (which is also discussed at length in #1009 and #938).

I understand that Unicode is a core part of what this crate has to deal with, but it's pretty evident that dependency counts are not a major concern in the maintenance of this crate at the moment. On my rather beefy MacBook Pro, a project using url takes 7.4 seconds to build in release mode. The crate itself takes 0.8 seconds; the rest is all dependencies.

If you need full URL support, that makes quite a bit of sense, but there are other uses of URLs for which you can get away with something much lighter. For instance, if you are working on an HTTP server, you never actually get Unicode URLs sent your way. Likewise, if you want to support URLs as a service target (e.g. postgres://user@database or something similar), you can also get away without Unicode support.

I understand that there is a way to opt into other backends by downgrading idna, but that does not really help if you want to use the url crate in your interface without forcing that many dependencies onto people.

Would it be an option to create a (default) idna feature that turns all of this on?
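To make the proposal concrete, here is a minimal sketch of what a default-on `idna` Cargo feature could look like from the code side. Everything in it is an assumption for illustration: the feature name `idna`, the `HostError` type, and the `host_to_ascii` function are hypothetical and are not part of the url crate. Only `idna::domain_to_ascii` is an existing function of the idna crate. With the feature disabled, non-ASCII hosts and Punycode (`xn--`) labels are rejected outright rather than silently mishandled.

```rust
/// Error type for this sketch only (hypothetical, not part of the `url` crate).
#[derive(Debug, PartialEq)]
pub enum HostError {
    /// The host needs IDNA processing, but IDNA support was compiled out.
    IdnaDisabled,
    /// The host was rejected by IDNA processing.
    Invalid,
}

/// With the hypothetical `idna` feature enabled, delegate to the `idna` crate.
#[cfg(feature = "idna")]
pub fn host_to_ascii(host: &str) -> Result<String, HostError> {
    idna::domain_to_ascii(host).map_err(|_| HostError::Invalid)
}

/// With the feature disabled, accept plain ASCII hosts only.
/// This sketch only lowercases; real host parsing must also reject
/// forbidden characters and handle IP addresses.
#[cfg(not(feature = "idna"))]
pub fn host_to_ascii(host: &str) -> Result<String, HostError> {
    // Punycode labels would need Unicode data to validate, so refuse them too.
    let has_punycode_label = host.split('.').any(|label| {
        label
            .as_bytes()
            .get(..4)
            .map_or(false, |prefix| prefix.eq_ignore_ascii_case(b"xn--"))
    });
    if !host.is_ascii() || has_punycode_label {
        return Err(HostError::IdnaDisabled);
    }
    Ok(host.to_ascii_lowercase())
}
```

The point of failing with an explicit error in the feature-off build is that a crate which disables the feature finds out immediately when it is handed a host it cannot process.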

@hsivonen
Collaborator

> Would it be an option to create a (default) idna feature that turns all of this on?

There's already a mechanism to turn this stuff off. It's documented in the READMEs of both url and idna:

> Alternative Unicode back ends
>
> url depends on the idna crate. By default, idna uses ICU4X as its Unicode back end. If you wish to opt for different tradeoffs between correctness, run-time performance, binary size, compile time, and MSRV, please see the README of the latest version of the idna_adapter crate for how to opt into a different Unicode back end.

That is, you get correctness by default, but you can opt out of correctness to get faster builds / smaller binary.

@hsivonen hsivonen reopened this Apr 28, 2025
@mitsuhiko
Author

As I mentioned, the alternative mechanism is known. The issue here is this one:

> I understand that there is a way to opt into other backends by downgrading idna, but that does not really help if you want to use the url crate in your interface without forcing that many dependencies onto people.

Right now, if you use the redis crate, for instance, you get that entire dependency chain along with it. I don't think that's a great user experience, and it forces such crates to consider alternatives.

@hsivonen
Collaborator

> I don't think that's a great user experience

More build time is indeed worse than less build time.

However, once you have the dependencies built, you can iterate on your own code without having to rebuild the dependencies on every edit-compile-test cycle, so it's not a case of adding the seconds all the time within the development workflow. Also, this way, you get correctness, smaller binary, and better run-time performance (of the compiled binary) by default. Having to opt into correctness (having IDNA support vs. not having it) or having to opt into smaller binary size and better run-time performance isn't great, either.

From Rust itself, we've seen again and again that someone does cargo build and is ready to write Rust off as generating large and slow binaries even though there's no shortage of advice saying that you should add --release before measuring such things.

If you only connect to a known ASCII-hostname (or IP address) end point, then having IDNA by default is a nuisance. However, for apps that deal with arbitrary host names, it would be bad to appear to work for ASCII host names but fail for non-ASCII host names unless further action is taken.

@mitsuhiko
Author

> If you only connect to a known ASCII-hostname (or IP address) end point, then having IDNA by default is a nuisance. However, for apps that deal with arbitrary host names, it would be bad to appear to work for ASCII host names but fail for non-ASCII host names unless further action is taken.

The use case I care about here is being able to target backend services in a config file (e.g. YAML, TOML, etc.). You are quite unlikely to connect to a database that has a non-ASCII hostname. In my particular case, I'm working on a project with pluggable backend storages, and the url crate and its dependencies now take up the majority of the compile time. That is why I opted against using it for now, which is unfortunate for my use case.

Sure, further down the line I might re-introduce that crate anyway, but I'd rather avoid paying all that cost from day zero.
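For readers wondering what "getting away without Unicode support" might look like for config-file service targets, the following is a rough std-only sketch. It is not the code used in the project mentioned above and it is not a replacement for the url crate: the names `ServiceTarget`, `TargetError`, and `parse_service_target` are hypothetical, and the parser deliberately ignores passwords, percent-decoding, query strings, and IPv6 literals. It only splits `scheme://user@host:port/path` and rejects non-ASCII hosts instead of attempting IDNA.

```rust
/// Parsed pieces of a service target such as `postgres://user@database:5432/app`.
/// Hypothetical type for this sketch.
#[derive(Debug, PartialEq)]
pub struct ServiceTarget {
    pub scheme: String,
    pub user: Option<String>,
    pub host: String,
    pub port: Option<u16>,
    pub path: String,
}

#[derive(Debug, PartialEq)]
pub enum TargetError {
    MissingScheme,
    EmptyHost,
    NonAsciiHost,
    BadPort,
}

pub fn parse_service_target(input: &str) -> Result<ServiceTarget, TargetError> {
    let (scheme, rest) = input.split_once("://").ok_or(TargetError::MissingScheme)?;

    // The authority ends at the first '/'; the remainder is the path.
    let (authority, path) = match rest.find('/') {
        Some(i) => (&rest[..i], &rest[i..]),
        None => (rest, ""),
    };

    // Optional user name before the last '@'.
    let (user, host_port) = match authority.rsplit_once('@') {
        Some((user, host_port)) => (Some(user.to_string()), host_port),
        None => (None, authority),
    };

    // Optional port after the last ':' (this is why IPv6 literals are unsupported).
    let (host, port) = match host_port.rsplit_once(':') {
        Some((host, port)) => {
            let port = port.parse::<u16>().map_err(|_| TargetError::BadPort)?;
            (host, Some(port))
        }
        None => (host_port, None),
    };

    if host.is_empty() {
        return Err(TargetError::EmptyHost);
    }
    // No IDNA here: refuse anything that would need it.
    if !host.is_ascii() {
        return Err(TargetError::NonAsciiHost);
    }

    Ok(ServiceTarget {
        scheme: scheme.to_ascii_lowercase(),
        user,
        host: host.to_ascii_lowercase(),
        port,
        path: path.to_string(),
    })
}
```

The trade-off is the one discussed in this thread: a parser like this builds quickly and has no dependencies, but it fails on any host name outside ASCII, so it only suits cases where the set of targets is under the operator's control.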

@hsivonen
Collaborator

hsivonen commented May 9, 2025

> The use case I care about here is being able to target backend services in a config file (e.g. YAML, TOML, etc.).

The common case for URLs is dealing with https URLs, and dealing with arbitrary https URLs entails IDNA support, so it would be bad to optimize the defaults for your use case instead and require others to take some "actually make stuff work" configuration action.

As of 23 hours ago or so, idna_adapter 1.2.1 depends on ICU4X 2.0. Compared to the previous default of ICU4X 1.5, this reduces the crate graph size by 5 crates and increases build parallelism a bit. It does not address the core of your complaint, though.

It's not particularly relevant to look at the MacBook Pro build time of url or idna alone. When compiling reqwest trunk on M3 Pro, IDNA support costs about 1 second in build time. In an actual application using reqwest, especially one that has other reasons to have proc-macro2 in the dependency graph, there are further opportunities to absorb some of the 1 second into build parallelism.
