Skip to content

Race condition when trying to compile a lib multiple times for different purposes. #5444

@ehuss

Description

@ehuss
Contributor

As part of #5384 I created some circumstances where the same lib may be built concurrently with different settings. I thought this would be safe since they have different hashes in the filename, but I neglected to address this case when the library is hardlinked. There is a race condition in hardlink_or_copy where one compilation will stomp on another via the hardlink.

Details for the curious.
  1. One compilation creates a hardlink.
  2. The other compilation fails to create a hardlink with EEXIST.
  3. 2nd compilation copies its lib into the existing hardlink, modifying the first one's contents.

This can happen in a variety of situations (mainly with panic), but here is a small repro:

cargo new --lib stomp
cd stomp
cat >> Cargo.toml <<EOL
[profile.dev]
panic = "abort"
EOL
cat > src/main.rs <<EOL
extern crate stomp;
fn main() {}
EOL

# This will fail some of the time on some platforms with wrong panic strategy.
# Note: This requires #5384.  Earlier versions would fail 100% of the time.
# This attempts to build `lib.rs` twice at the same time, once with `panic` 
# and once without (for tests).
cargo build --all-targets -v

I'm trying to think of a fix, but I haven't come up with anything, yet. Generally I'm thinking that these "non-panic" dependencies should not be hard linked at all, but there's no obvious way to detect when a Unit is one of these. We could add a flag to Unit to detect this scenario, but that would require adding some extra de-duplication logic and generally seems a little too complex.

I don't think it is worth it to try to fix the race condition itself since that is very tricky, and would end up with a random lib in the output directory anyway.

I'll keep trying to think of a solution, but if anyone has any thoughts I'd be happy to hear.

Activity

alexcrichton

alexcrichton commented on May 1, 2018

@alexcrichton
Member

Thanks for tracking this down! This I think is the cause of the errors I saw yesterday, right?

I may be reaching a different conclusion with this though, when I use your above example I get compliations that look like:

  • Library, panic=abort
  • Library, as a benchmark
  • LIbrary, as a test harness
  • Binary, as a benchmark, linked to library as panic=abort
  • Binary, as a test harness, linked to library as panic=abort
  • Binary, panic=abort, linked to library as panic=abort

It looks like the bug here is that the library is never compiled in a linkable format with panic=unwind so the binary, when build as a benchmark or test, doesn't have a suitable library to link against.

Now I'd definitely imagine that there's also some stomping over hard links going on here! I'm curious though what you think of the above?

ehuss

ehuss commented on May 1, 2018

@ehuss
ContributorAuthor

It is either this or #5445 (cargo build -p wasm-bindgen-cli --release && cargo build -p add --release reliably triggers #5445 because the wasm-bindgen-macro package depends on serde).

What you're describing is the old behavior. With a build that includes #5384, lib.rs is built four times:

  1. lib with/panic=abort
  2. lib without panic (for tests/benches)
  3. lib as --test
  4. lib as --test for benchmarks

And because 1 and 2 are built at the same time, and both want to hardlink at the same time, that's what causes this problem.

Can you describe what the use cases are for hardlinking libs? I'm guessing it's for integrating with other build systems? Are there other scenarios? Why are only workspace libs elevated (and not dependencies)?

alexcrichton

alexcrichton commented on May 1, 2018

@alexcrichton
Member

Aha right sorry my mistake, it looks like that hasn't yet made its way into the nightlies which is why I was confused!

Currently yeah it's primarily for integrating into other build systems, but we should only link up the "primary" dependency which in this case is the one with panic=abort. I also think it's fine to actually skip hardlinking rlibs entirely, I don't think any build system integration happens at the rlib layer, only at other staticlib/executable/dylib layers.

I think that'd fix the issue, right? If we stopped hardlinking rlibs?

alexcrichton

alexcrichton commented on May 2, 2018

@alexcrichton
Member

I think this may have caused this failure: #5456 (comment)

matklad

matklad commented on May 2, 2018

@matklad
Member

I also think it's fine to actually skip hardlinking rlibs entirely

+1 for hard linking less stuff. Note that #5203 specifically went through several hoops to copy only relevant files to the --out-dir. The logic that that PR uses is "link only top-level units, requested from the command line". Perhaps we can use the same logic for the target dir as well?

alexcrichton

alexcrichton commented on May 2, 2018

@alexcrichton
Member

So before #5458 I'm able to reproduce a spurious failure via:

$ cargo test --test testsuite profile_selection_build_all_targets

(basically reproduce #5456 (comment))

After that PR, however, I cannot reproduce a failure. Writing a test without that PR as well using the OP here it also doesn't seem to fail.

Put another way, I'm not sure if this is still observable after #5458 since panic=unwind and panic=abort have different filenames rather than being placed into the same bucket (and racing).

Now I'd definitely believe that the hard link is still nodeterministically being stomped over, but the severity of this issue is probably far less decreased after #5458. Mind confirming that though @ehuss?

ehuss

ehuss commented on May 2, 2018

@ehuss
ContributorAuthor

If we stopped hardlinking rlibs?

In my tests, other output types have the same problem. (dylib is particularly broken since it doesn't have a hash in the filename.)

Another idea I was thinking about is only hardlinking the top-level targets (essentially the targets created in generate_targets()). Do you think that would be a problem? That would skip hardlinking implied dependencies in a few cases (like cargo build --bin foo would no longer link the lib, but if you want the lib you can just add --lib, or just run cargo build with no arguments).

EDIT: Just read your comments, I'm a little behind.

ehuss

ehuss commented on May 2, 2018

@ehuss
ContributorAuthor

@alexcrichton I'm still able to repro the example up top with #5458. 5458 should only address #5445. It maybe happens about 10% of the time on my particular system.

alexcrichton

alexcrichton commented on May 2, 2018

@alexcrichton
Member

@ehuss and to confirm, by reproduce you mean that it fails spuriously?

ehuss

ehuss commented on May 2, 2018

@ehuss
ContributorAuthor
matklad

matklad commented on May 2, 2018

@matklad
Member

Hm, could this failure be related? I think hardlink stomping happens for binaries as well, because, IIRC, panic flag affects them as well?

ehuss

ehuss commented on May 2, 2018

@ehuss
ContributorAuthor

Hm, could this failure be related?

I don't think so. This issue would only appear when it attempts to compile the same lib at the same time with panic set. build --all-targets is one of the only ways to trigger this. I'll see if I can repro the appveyor error, I don't see how it should be possible.

added a commit that references this issue on May 10, 2018
27edc43
added a commit that references this issue on May 10, 2018
ef719bc
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Participants

      @ehuss@alexcrichton@matklad

      Issue actions

        Race condition when trying to compile a lib multiple times for different purposes. · Issue #5444 · rust-lang/cargo