Feature selection in workspace depends on the set of packages compiled #4463

Open

Labels: A-features, A-workspaces, C-bug, E-hard, S-needs-design

matklad (Contributor) opened on Sep 3, 2017 · edited by epage

Maintainers notes

The recompilation was fixed, but this issue is still open regarding having features change based on what is being built simultaneously.
The cargo hack plugin will automatically expand cargo check --workspace (etc) to cargo check -p fail_test && cargo check -p lang_rust && ....

Reproduction:

1. Check out this commit: matklad/fall@3022be4
2. Build some tests with cargo test -p fall_test -p fall_test -p lang_rust -p lang_rust -p lang_json --verbose --no-run
3. Build other tests with cargo test --all --verbose --no-run
4. Run cargo test -p fall_test -p fall_test -p lang_rust -p lang_rust -p lang_json --verbose --no-run again and observe that memchr and some other dependencies are recompiled.
5. Run cargo test --all --verbose --no-run and observe that memchr is recompiled again.

The verbose flag gives the following commands for memchr:

Running `rustc --crate-name memchr /home/matklad/trash/registry/src/github.com-1ecc6299db9ec823/memchr-1.0.1/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 --cfg 'feature="default"' --cfg 'feature="libc"' --cfg 'feature="use_std"' -C metadata=be49c4722e8b48bf -C extra-filename=-be49c4722e8b48bf --out-dir /home/matklad/trash/fall/target/debug/deps -L dependency=/home/matklad/trash/fall/target/debug/deps --extern libc=/home/matklad/trash/fall/target/debug/deps/liblibc-90ba32719d46f457.rlib --cap-lints allow -C target-cpu=native`
Running `rustc --crate-name memchr /home/matklad/trash/registry/src/github.com-1ecc6299db9ec823/memchr-1.0.1/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 --cfg 'feature="default"' --cfg 'feature="libc"' --cfg 'feature="use_std"' -C metadata=be49c4722e8b48bf -C extra-filename=-be49c4722e8b48bf --out-dir /home/matklad/trash/fall/target/debug/deps -L dependency=/home/matklad/trash/fall/target/debug/deps --extern libc=/home/matklad/trash/fall/target/debug/deps/liblibc-335251832eb2b7ec.rlib --cap-lints allow -C target-cpu=native`

Here's the single difference:

--extern libc=/home/matklad/trash/fall/target/debug/deps/liblibc-90ba32719d46f457.rlib 
--extern libc=/home/matklad/trash/fall/target/debug/deps/liblibc-335251832eb2b7ec.rlib

Versions (whyyyyy cargo is 0.21 and rustc is 1.20??? This is soo confusing)

λ cargo --version --verbose
cargo 0.21.0 (5b4b8b2ae 2017-08-12)
release: 0.21.0
commit-hash: 5b4b8b2ae3f6a884099544ce66dbb41626110ece
commit-date: 2017-08-12

~/trash/fall master
λ rustc --version
rustc 1.20.0 (f3d6973f4 2017-08-27)

Contributor, Author

So, it has to do with features. Namely, two cargo invocations produce two different libcs:

Running `rustc --crate-name libc /home/matklad/trash/registry/src/github.com-1ecc6299db9ec823/libc-0.2.30/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 --cfg 'feature="use_std"' -C metadata=335251832eb2b7ec -C extra-filename=-335251832eb2b7ec --out-dir /home/matklad/trash/fall/target/debug/deps -L dependency=/home/matklad/trash/fall/target/debug/deps --cap-lints allow -C target-cpu=native`
Running `rustc --crate-name libc /home/matklad/trash/registry/src/github.com-1ecc6299db9ec823/libc-0.2.30/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 --cfg 'feature="default"' --cfg 'feature="use_std"' -C metadata=90ba32719d46f457 -C extra-filename=-90ba32719d46f457 --out-dir /home/matklad/trash/fall/target/debug/deps -L dependency=/home/matklad/trash/fall/target/debug/deps --cap-lints allow -C target-cpu=native`

The only difference is --cfg 'feature="default"'.

So, I get two different libcs in target:

λ ls target/debug/deps | grep liblibc
.rw-r--r-- 982k matklad  3 Sep 14:06 liblibc-90ba32719d46f457.rlib
.rw-r--r-- 982k matklad  3 Sep 14:03 liblibc-335251832eb2b7ec.rlib

But I get a single memchr:

λ ls target/debug/deps | grep libmemchr
.rw-r--r-- 186k matklad  3 Sep 14:09 libmemchr-be49c4722e8b48bf.rlib

The file name is the same for both cargo commands, but the actual contents differ.
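To illustrate why libc ends up with two file names while memchr keeps one, here is a rough sketch. It assumes the metadata hash covers only a unit's own name, version, and features, and not the hashes of its dependencies. That assumption matches the output above, but the `Unit` struct and `metadata_hash` function are hypothetical and are not Cargo's actual code.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeSet;
use std::hash::{Hash, Hasher};

/// Hypothetical model of a compilation unit; not Cargo's real data structure.
struct Unit<'a> {
    name: &'a str,
    version: &'a str,
    features: BTreeSet<&'a str>,
}

/// Hashes only the unit's own identity and features and ignores its
/// dependencies. This is an assumption that mirrors the observed behavior.
fn metadata_hash(unit: &Unit<'_>) -> u64 {
    let mut hasher = DefaultHasher::new();
    unit.name.hash(&mut hasher);
    unit.version.hash(&mut hasher);
    // A BTreeSet iterates in sorted order, so feature order does not matter.
    unit.features.hash(&mut hasher);
    hasher.finish()
}

/// Convenience constructor for the sketch.
fn unit<'a>(name: &'a str, version: &'a str, features: &[&'a str]) -> Unit<'a> {
    Unit {
        name,
        version,
        features: features.iter().copied().collect(),
    }
}
```

Under this model, the two libc units (with and without "default") hash differently, while memchr hashes the same in both builds even though it links against a different libc each time, so both builds write to the same file.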

matklad

Contributor, Author

Hm, so this looks more serious than a spurious rebuild!

Depending on what -p options you pass, you might end up with different final artifacts for the same package. This should not happen, right?
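A minimal sketch of the effect, assuming features of a shared dependency are simply unioned over whichever workspace members are selected for the current build. The member names and the `unified_features` helper are made up for illustration and do not correspond to Cargo internals.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Features each workspace member requests from a shared dependency.
/// Keys are hypothetical member names; values are requested feature names.
type Requests<'a> = BTreeMap<&'a str, Vec<&'a str>>;

/// Union of the features requested by the members selected for this build.
/// Members that are not selected contribute nothing.
fn unified_features<'a>(requests: &Requests<'a>, selected: &[&str]) -> BTreeSet<&'a str> {
    selected
        .iter()
        .filter_map(|member| requests.get(*member))
        .flatten()
        .copied()
        .collect()
}
```

If member_a requests only "use_std" and member_b requests "default" and "use_std", then selecting member_a alone yields {"use_std"}, while selecting both yields {"default", "use_std"}. The dependency that member_a is built against therefore changes with the command line.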

matklad

Contributor, Author

Minimized example here: https://github.com/matklad/workspace-vs-feaures

mentioned this

Hashed dependencies of metadata into the metadata of a lib #4469

matklad

changed the title from "Spurious rebuilds when testing different packages of a workspace" to "Feature selection in workspace depends on the set of packages compiled"

on Sep 5, 2017

matklad

Contributor, Author

@alexcrichton continuing discussion here, instead of #4469 which is somewhat orthogonal, as you've rightly pointed out!

> I don't think this'd be too hard to implement, but I'm not sure if this is what we'd want implemented per se. If one target of a workspace doesn't want a particular feature activated, wouldn't it be surprising if some other target present in a workspace far away activated the feature?

Yeah, it looks like what we ideally want here is that each final artifact gets the minimal set of features. And this should work even within a single package: currently, activating a feature in a dev-dependency will activate it for the usual dependency as well. This is also something to keep in mind if we go the route of binary-only (or per-target) dependencies.

Such fine-grained feature activation will cause more compilation work overall, though, so using the union of features might be a pragmatic choice, as long as we keep features additive. It sort of makes sense, because crates in a workspace share dependencies anyway. And it definitely seems better than some random unrelated target activating features for you depending on the command-line flags.

alexcrichton

Member

I think one of the main problems right now is that we're doing feature resolution far too soon, during the crate graph resolution. Instead what we should be doing is assuming all features are activated until we actually start compiling crates. That way if you have multiple targets all requesting different sets of features they'll all get separately compiled copies with the correct set of features.

Does that make sense? Or perhaps solving a different problem?
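As a rough sketch of the idea of resolving features per target rather than once for the whole selection: each selected root gets its own feature set for the shared dependency, so the result for a root does not depend on what else is being built. The names `resolve_per_root` and `distinct_copies` are hypothetical, and the model covers a single shared dependency only.

```rust
use std::collections::{BTreeMap, BTreeSet};

type FeatureSet<'a> = BTreeSet<&'a str>;

/// Resolves the shared dependency's features separately for each selected
/// root. Roots without an entry in `requests` are skipped.
fn resolve_per_root<'a>(
    requests: &BTreeMap<&'a str, Vec<&'a str>>,
    selected: &[&'a str],
) -> BTreeMap<&'a str, FeatureSet<'a>> {
    selected
        .iter()
        .filter_map(|root| {
            let features = requests.get(*root)?;
            let set: FeatureSet<'a> = features.iter().copied().collect();
            Some((*root, set))
        })
        .collect()
}

/// Number of distinct copies of the dependency that must be compiled:
/// one per distinct feature set across the selected roots.
fn distinct_copies(resolved: &BTreeMap<&str, FeatureSet<'_>>) -> usize {
    resolved.values().collect::<BTreeSet<_>>().len()
}
```

With this scheme the feature set seen by a given root is the same whether it is built alone or together with other members. The cost is that roots with different requests need separately compiled copies of the dependency.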

matklad

Contributor, Author

> Does that make sense? Or perhaps solving a different problem?

Yeah, totally, "they'll all get separately compiled copies with the correct set of features" is the perfect solution here, and it could be implemented by moving feature selection after the dependency resolution.

But I am really worried about additional work to get separately compiled copies, because it is multiplicative. Let's say you have a workspace with the following layout:

- Leaf crates A and B, which transitively depend on the external crate libc with different features
- A large number of intermediate crates, on which A and B also depend
- A ubiquitous utils crate that depends on libc and is a dependency of every other crate

Because A and B require different features from libc, and because libc happens to be at the bottom of the dependency graph, that means that for cargo build --all we will compile every crate twice. Moreover, editing utils and then doing cargo build --all again recompiles everything two times.

So it's not that only libc will get duplicated, the whole graph may be duplicated in the worst case.
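The worst case can be sketched with a small counting model. It assumes that every crate which transitively reaches the shared dependency has to be compiled once per distinct feature set of that dependency, and every other crate once. The graph representation, the string label standing in for a feature set, and the function names are all illustrative assumptions.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Crate name mapped to the names of its direct dependencies.
type Graph<'a> = BTreeMap<&'a str, Vec<&'a str>>;

/// Returns true if `krate` is `target` or reaches it through its dependencies.
/// Assumes the graph is acyclic.
fn reaches(graph: &Graph<'_>, krate: &str, target: &str) -> bool {
    krate == target
        || graph
            .get(krate)
            .map_or(false, |deps| deps.iter().any(|dep| reaches(graph, dep, target)))
}

/// Collects all crates reachable from `root`, including `root` itself.
fn collect_reachable<'a>(graph: &Graph<'a>, root: &'a str, seen: &mut BTreeSet<&'a str>) {
    if seen.insert(root) {
        for &dep in graph.get(root).into_iter().flatten() {
            collect_reachable(graph, dep, seen);
        }
    }
}

/// Counts compilation units when each root picks its own feature set
/// (represented here as a plain label) for the `shared` crate.
fn count_units<'a>(graph: &Graph<'a>, roots: &[(&'a str, &'a str)], shared: &str) -> usize {
    let mut units = BTreeSet::new();
    for &(root, shared_features) in roots {
        let mut seen = BTreeSet::new();
        collect_reachable(graph, root, &mut seen);
        for krate in seen {
            // Crates that do not reach `shared` have a single variant.
            let variant = if reaches(graph, krate, shared) { shared_features } else { "" };
            units.insert((krate, variant));
        }
    }
    units.len()
}
```

For a graph with roots a and b, two intermediate crates, utils, and libc, where everything depends on utils and utils depends on libc, the model gives 6 units when a and b agree on libc's features and 10 when they differ: everything below the roots is built twice.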

nipunn1313

Contributor

If we assume that features are additive (as intended), then the innermost crate could be compiled once with the union of all features.

Additive features are a bit of a subtle point though (see #3620). Recompiling is the safest way, though expensive.

alexcrichton

Member

@matklad yeah you're definitely right that the more aggressively we cache the more we end up caching :). @nipunn1313 you're also right that it should be safe for features to be unioned, but they often come with runtime or linkage implications. For example if a workspace has a no_std project and an executable, compiling both you wouldn't want to enable the standard library in the dependencies of the no_std project by accident!

I basically see this as there's a specification of what Cargo should be doing here. We've got, for example, two crates in a workspace, each of which activates various sets of features in shared dependencies. Today Cargo does the "thing that caches too much" if you compile each separately (and also suffers from a bug where it recompiles too much when you switch between projects). Cargo also does the "union all the features" if you build both crates simultaneously (e.g. cargo build --all). Basically Cargo's not consistent!

I'd advocate that Cargo should try to stick to the "caches too much" solution as it's following the letter of the law of what you wrote down for a workspace. It also means that crates in a workspace don't need to worry too much about interfering with other crates in a workspace. Projects that run into problems of the "too much is cached" nature I'd imagine could then do the investigation to figure out what features are turned on where, and try to get each workspace member to share more dependencies by unifying the features.

matklad

Contributor, Author

> Projects that run into problems of the "too much is cached" nature I'd imagine could then do the investigation to figure out what features are turned on where, and try to get each workspace member to share more dependencies by unifying the features.

This somewhat resolves my concern about build times, but not entirely. I am worried that it might not be easy to unify features manually, if they are turned on by private transitive dependencies. It would be possible to do by adding this private transitive dependency as an explicit and unused dependency, but this looks accidental.

But now I too lean towards the fine-grained features solution.

nipunn1313

Contributor

For what it's worth, we've done that exact trick with the parallel feature of the gcc crate. It does happen, but the workaround is ok.


RalfJung

mentioned this

on Mar 10, 2024

Explore using cargo-hakari to avoid duplicate builds between miri test and miri run rust-lang/miri#3372

Arnavion

> First cargo test will build it together with the tests, which means more features are enabled in some dependency, and then cargo run will build it again against a crate graph that has fewer features.

To be clear, this is the desirable behavior for many people, including me. I do *not* want the cargo run-compiled binary to have unnecessary features and dependencies enabled and creating bloat. After all, that's why I didn't enable those features in the first place. It may even be *incorrect* to enable those features, e.g. tests might require "std" to be able to unwrap() but the compiled binary must not depend on libstd.

> It would be great if there was some flag or so that made cargo run use the same crate graph as cargo test

Yes, if it's opt-in, then there's no problem.

sunshowers

Contributor

It is definitely desirable for many people to not do feature unification at times, either partially or fully. Hakari comes with several knobs to make that possible: https://docs.rs/cargo-hakari/latest/cargo_hakari/config/index.html#traversal-excludes

This is a complicated problem with no easy answers. Any solution in Cargo is going to need a ton of configuration knobs.

Contributor

I’ve got a use case that doesn’t appear to have been written up yet. I have some libraries with optional-but-default std features; with those features disabled, the libraries are no_std-capable. I then have a binary, which uses a subset of those libraries in no_std mode. Because the binary uses the libraries in no_std mode, it defines its own #[panic_handler]. I naïvely thought I could just put all the crates into a workspace and expect a plain cargo clippy at the workspace root to check all the crates using their individual default settings (i.e. mylib would be checked with std enabled, because that’s the default, but mybinary would be checked against mylib[-std], because that’s what it asked for). Unfortunately it doesn’t work that way; cargo clippy chooses only one feature set for mylib (which is, due to it being a default feature, +std), and then the check of mybinary fails because there’s now a duplicate panic handler (one in mybinary and one in std).

I suppose one could say this is a case where the language kind of forces features to be non-additive. If we take “additive” to mean “works with the feature everywhere it would work without the feature”, then the std feature cannot be additive: it works in a no_std, panic_handler-defining binary without the feature but not with it.