-
Notifications
You must be signed in to change notification settings - Fork 2.6k
Description
As part of #5384 I created some circumstances where the same lib may be built concurrently with different settings. I thought this would be safe since they have different hashes in the filename, but I neglected to address this case when the library is hardlinked. There is a race condition in hardlink_or_copy
where one compilation will stomp on another via the hardlink.
Details for the curious.
- One compilation creates a hardlink.
- The other compilation fails to create a hardlink with EEXIST.
- 2nd compilation copies its lib into the existing hardlink, modifying the first one's contents.
This can happen in a variety of situations (mainly with panic
), but here is a small repro:
cargo new --lib stomp
cd stomp
cat >> Cargo.toml <<EOL
[profile.dev]
panic = "abort"
EOL
cat > src/main.rs <<EOL
extern crate stomp;
fn main() {}
EOL
# This will fail some of the time on some platforms with wrong panic strategy.
# Note: This requires #5384. Earlier versions would fail 100% of the time.
# This attempts to build `lib.rs` twice at the same time, once with `panic`
# and once without (for tests).
cargo build --all-targets -v
I'm trying to think of a fix, but I haven't come up with anything, yet. Generally I'm thinking that these "non-panic" dependencies should not be hard linked at all, but there's no obvious way to detect when a Unit
is one of these. We could add a flag to Unit
to detect this scenario, but that would require adding some extra de-duplication logic and generally seems a little too complex.
I don't think it is worth it to try to fix the race condition itself since that is very tricky, and would end up with a random lib in the output directory anyway.
I'll keep trying to think of a solution, but if anyone has any thoughts I'd be happy to hear.
Activity
panic
not always tracked in built dependencies #5445alexcrichton commentedon May 1, 2018
Thanks for tracking this down! This I think is the cause of the errors I saw yesterday, right?
I may be reaching a different conclusion with this though, when I use your above example I get compliations that look like:
It looks like the bug here is that the library is never compiled in a linkable format with panic=unwind so the binary, when build as a benchmark or test, doesn't have a suitable library to link against.
Now I'd definitely imagine that there's also some stomping over hard links going on here! I'm curious though what you think of the above?
ehuss commentedon May 1, 2018
It is either this or #5445 (
cargo build -p wasm-bindgen-cli --release && cargo build -p add --release
reliably triggers #5445 because thewasm-bindgen-macro
package depends on serde).What you're describing is the old behavior. With a build that includes #5384,
lib.rs
is built four times:--test
--test
for benchmarksAnd because 1 and 2 are built at the same time, and both want to hardlink at the same time, that's what causes this problem.
Can you describe what the use cases are for hardlinking libs? I'm guessing it's for integrating with other build systems? Are there other scenarios? Why are only workspace libs elevated (and not dependencies)?
alexcrichton commentedon May 1, 2018
Aha right sorry my mistake, it looks like that hasn't yet made its way into the nightlies which is why I was confused!
Currently yeah it's primarily for integrating into other build systems, but we should only link up the "primary" dependency which in this case is the one with panic=abort. I also think it's fine to actually skip hardlinking rlibs entirely, I don't think any build system integration happens at the rlib layer, only at other staticlib/executable/dylib layers.
I think that'd fix the issue, right? If we stopped hardlinking rlibs?
alexcrichton commentedon May 2, 2018
I think this may have caused this failure: #5456 (comment)
matklad commentedon May 2, 2018
+1 for hard linking less stuff. Note that #5203 specifically went through several hoops to copy only relevant files to the
--out-dir
. The logic that that PR uses is "link only top-level units, requested from the command line". Perhaps we can use the same logic for thetarget
dir as well?alexcrichton commentedon May 2, 2018
So before #5458 I'm able to reproduce a spurious failure via:
(basically reproduce #5456 (comment))
After that PR, however, I cannot reproduce a failure. Writing a test without that PR as well using the OP here it also doesn't seem to fail.
Put another way, I'm not sure if this is still observable after #5458 since panic=unwind and panic=abort have different filenames rather than being placed into the same bucket (and racing).
Now I'd definitely believe that the hard link is still nodeterministically being stomped over, but the severity of this issue is probably far less decreased after #5458. Mind confirming that though @ehuss?
ehuss commentedon May 2, 2018
In my tests, other output types have the same problem. (dylib is particularly broken since it doesn't have a hash in the filename.)
Another idea I was thinking about is only hardlinking the top-level targets (essentially the targets created in
generate_targets()
). Do you think that would be a problem? That would skip hardlinking implied dependencies in a few cases (likecargo build --bin foo
would no longer link the lib, but if you want the lib you can just add--lib
, or just runcargo build
with no arguments).EDIT: Just read your comments, I'm a little behind.
ehuss commentedon May 2, 2018
@alexcrichton I'm still able to repro the example up top with #5458. 5458 should only address #5445. It maybe happens about 10% of the time on my particular system.
alexcrichton commentedon May 2, 2018
@ehuss and to confirm, by reproduce you mean that it fails spuriously?
ehuss commentedon May 2, 2018
@alexcrichton yes
matklad commentedon May 2, 2018
Hm, could this failure be related? I think hardlink stomping happens for binaries as well, because, IIRC, panic flag affects them as well?
ehuss commentedon May 2, 2018
I don't think so. This issue would only appear when it attempts to compile the same lib at the same time with panic set.
build --all-targets
is one of the only ways to trigger this. I'll see if I can repro the appveyor error, I don't see how it should be possible.Be more conservative about which files are linked to the output dir.
Auto merge of #5460 - ehuss:conservative-link, r=alexcrichton