Closed
Description
Hi all,
My usual test: can Rust compile itself after a compiler upgrade?
Rust 1.43 dies at this stage:
Building stage0 compiler artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
running: "/datastore/dev/rust/bin/cargo" "build" "--target" "x86_64-unknown-linux-gnu" "-Zbinary-dep-depinfo" "-j" "32" "--release" "--frozen" "--features" " llvm" "--manifest-path" "/datastore/rpmbuild/BUILD/rustc-1.43.0-src/src/rustc/Cargo.toml" "--message-format" "json-render-diagnostics"
error: failed to run `rustc` to learn about target-specific information
Caused by:
process didn't exit successfully: `/datastore/rpmbuild/BUILD/rustc-1.43.0-src/build/bootstrap/debug/rustc - --crate-name ___ --print=file-names --crate-type bin --crate-type rlib --crate-type dylib --crate-type cdylib --crate-type staticlib --crate-type proc-macro --print=sysroot --print=cfg` (exit code: 1)
--- stderr
error: character literal may only contain one codepoint
--> <anon>:3:35
|
3 | messages will be sent to 'stderr'.
| ^^^^^^^^
|
help: if you meant to write a `str` literal, use double quotes
|
3 | messages will be sent to "stderr".
| ^^^^^^^^
error: character literal may only contain one codepoint
--> <anon>:6:35
|
6 | messages will be sent to 'stderr'.
| ^^^^^^^^
|
help: if you meant to write a `str` literal, use double quotes
|
6 | messages will be sent to "stderr".
| ^^^^^^^^
error: unterminated character literal
--> <anon>:9:20
|
9 | FATAL: Cannot open '/dev/null' for writing.
| ^
error: aborting due to 3 previous errors
command did not execute successfully: "/datastore/dev/rust/bin/cargo" "build" "--target" "x86_64-unknown-linux-gnu" "-Zbinary-dep-depinfo" "-j" "32" "--release" "--frozen" "--features" " llvm" "--manifest-path" "/datastore/rpmbuild/BUILD/rustc-1.43.0-src/src/rustc/Cargo.toml" "--message-format" "json-render-diagnostics"
expected success, got: exit code: 101
I haven't identified the problematic code yet; I just wanted to create a tracking issue first.
Rust 1.43 compiles other crates fine; I quickly tested rust-zip and rust-askama.
Activity
jonas-schievink commented on Apr 24, 2020
Do you need this for anything? This is not a supported configuration.
thomasjfox commented on Apr 24, 2020
Yes, Linux distributions often need to recompile a (compiler) package with some patches applied or after a major LLVM / glibc / other major update.
See #69953 for the discussion on the Rust 1.42.0 release.
infinity0 commented on Apr 24, 2020
Really? I thought this had been an attempted guarantee for years. (Even though it was broken for various previous releases.) Debian certainly needs this guarantee and I will have to figure out how to fix this when packaging 1.43 for Debian.
This one seems like a new case not similar to the previous cases, gah.
jonas-schievink commented on Apr 24, 2020
Well, we don't test this at all, so it's not surprising that it breaks almost every release. Why can't Debian just use the previous release to bootstrap?
infinity0 commented on Apr 24, 2020
We do, but the porters (for tier-2 architectures like riscv64) and other people occasionally need to rebuild stuff afterwards, when the old version is no longer available (has been moved to archives).
Mark-Simulacrum commented on Apr 24, 2020
It's not really unsupported so much as not tested. I would ask that if Debian (or others) want this to work we find out before the stable release -- ideally when beta is cut, or earlier -- that way we can do much more to fix it.
I'll try to take a look at this specific failure soon and see if there's a simple patch.
est31 commented on Apr 24, 2020
Bootstrapping rustc twice would make merging PRs much slower than it is today. However, maybe one could add it only to the beta branch's CI. Or, alternatively, add rustc to crater. The "old state" would serve as a check of whether it can bootstrap normally, and the "new state" would check whether it can bootstrap itself. There's a proposal to add servo already: rust-lang/crater#133
Mark-Simulacrum commented on Apr 24, 2020
I don't think there's been any proposal that we test this in PR CI? To be clear, I don't currently believe that's something we want to do, though we'd generally be willing to accept PRs that make self-bootstrapping possible, which is why I suggested testing on beta releases (instead of stable releases, where we can't do much so all patches must be downstream).
est31 commented on Apr 24, 2020
It wasn't in response to any proposal or such, just about how to stop regressions like this from happening. It has been mentioned in the thread that it's not being tested. So naturally one would wonder what the testing would look like, so I wrote down my thoughts.
Also, yes, optimally the users would test beta releases (question to @thomasjfox: have you tested 1.44 beta yet?), and I've filed a PR for rav1e today to do precisely this, but the entire premise of CI as well as crater is to be proactive about breakage.
infinity0 commented on Apr 24, 2020
Understood, I will make some more effort to test the beta releases! We had done so a few times in the past but not really made it a regular part of the process.
infinity0 commented on Apr 24, 2020
I agree it would be too onerous for regular PRs, but perhaps it could be done when anointing nightly as beta, and then on all PRs to beta; that doesn't sound too expensive.
cuviper commented on Apr 27, 2020
@thomasjfox Have you found this yet? Maybe something wrong with your /dev/null? I also don't see "messages will be sent to 'stderr'" anywhere in the Rust sources, but googling indicates this often comes from syslog. I don't know why that would be fed into your rustc stdin though...
FWIW, the automatic rebuild in Fedora rawhide was just fine:
https://koji.fedoraproject.org/koji/taskinfo?taskID=43860983
thomasjfox commented on Apr 28, 2020
Yesterday evening I continued with the investigation. I managed to recompile the Fedora 31 source rpm on my Fedora 31 workstation. So it must be something triggered by my "custom distro" setup.
The plan for today is to limit the number of parallel build jobs to one and run "forkstat" in the background. I want to isolate the exact command invocation that triggers the error.
The /dev/null device should be fine, the build process is not running inside a container or chroot.
thomasjfox commented on Apr 28, 2020
I think we might be onto something here. After I had another failed Rust 1.43.0 re-compile attempt, I noticed /dev/null was broken:
(tmux crashed on startup, that's how I noticed)
I've rebooted the devbox, /dev/null is fine again and I will now compile the previous Rust 1.42.0 twice. If /dev/null stays fine, I'll re-try with Rust 1.43.0 and see if /dev/null gets replaced.
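One way to check whether /dev/null has been replaced between builds is to test whether the path is still a character device. Below is a minimal Rust sketch of that check, assuming a Unix system; the helper name is_char_device is made up for illustration and is not part of the Rust build system.

```rust
use std::fs;
use std::io;
use std::os::unix::fs::FileTypeExt;
use std::path::Path;

/// Hypothetical helper: returns true if `path` exists and is a character device.
fn is_char_device(path: &Path) -> io::Result<bool> {
    let meta = fs::metadata(path)?;
    Ok(meta.file_type().is_char_device())
}

fn main() -> io::Result<()> {
    let path = Path::new("/dev/null");
    if is_char_device(path)? {
        println!("{} is still a character device", path.display());
    } else {
        // A clobbered /dev/null typically shows up as a regular file.
        println!("{} has been replaced by something else", path.display());
    }
    Ok(())
}
```

Running this before and after each build attempt would show at which step the device node gets replaced.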
thomasjfox commented on Apr 28, 2020
The only remotely related change to /dev/null handling from 1.42.0 to 1.43.0 is c2bbe33 in src/libstd/sys/unix/process/process_common.rs. It looks entirely unsuspicious to me.
cuviper commented on Apr 28, 2020
That kind of error usually occurs when one uses something like -o /dev/null, versus redirecting >/dev/null, since -o usually writes temp names first and then renames over the target. I did find this in one test, new in 1.42 via #67458:
rust/src/test/ui/non-ice-error-on-worker-io-fail.rs
Line 8 in b7bd7c1
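To illustrate the difference between the two write strategies mentioned here, the following is a minimal sketch in Rust. It is not rustc's actual output code; the function names are hypothetical, and it works on a scratch file in the temp directory rather than on /dev/null. Writing in place keeps the same inode, while write-temp-then-rename swaps in a different file under the same name, which is how a device node can end up replaced by a regular file.

```rust
use std::fs;
use std::io::{self, Write};
use std::os::unix::fs::MetadataExt;
use std::path::Path;

/// Hypothetical: write to a temporary sibling file, then rename it over `target`.
fn write_via_rename(target: &Path, data: &[u8]) -> io::Result<()> {
    let tmp = target.with_extension("tmp");
    fs::write(&tmp, data)?;
    fs::rename(&tmp, target)
}

/// Hypothetical: open the existing `target` and write into it, like a shell redirect.
fn write_in_place(target: &Path, data: &[u8]) -> io::Result<()> {
    let mut f = fs::OpenOptions::new().write(true).truncate(true).open(target)?;
    f.write_all(data)
}

/// Returns the inode of a scratch file: initially, after an in-place write,
/// and after a write-then-rename.
fn demo() -> io::Result<(u64, u64, u64)> {
    let dir = std::env::temp_dir().join(format!("rename-demo-{}", std::process::id()));
    fs::create_dir_all(&dir)?;
    let target = dir.join("target");
    fs::write(&target, b"original")?;

    let before = fs::metadata(&target)?.ino();
    write_in_place(&target, b"in place")?;
    let after_in_place = fs::metadata(&target)?.ino();
    write_via_rename(&target, b"renamed")?;
    let after_rename = fs::metadata(&target)?.ino();

    fs::remove_dir_all(&dir)?;
    Ok((before, after_in_place, after_rename))
}

fn main() -> io::Result<()> {
    let (before, in_place, renamed) = demo()?;
    println!("in-place write kept the inode: {}", before == in_place);
    println!("rename replaced the file:      {}", before != renamed);
    Ok(())
}
```

An unprivileged user cannot rename over /dev/null because /dev is not writable for them, so the effect only shows up when running as root.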
cuviper commented on Apr 28, 2020
When I run that test manually as root, I do indeed get a clobbered /dev/null:
Mark-Simulacrum commented on Apr 28, 2020
Running tests as root is indeed very trusting :)
But it seems like in this case we should probably write to e.g. /dev/rust-nonexistent-path instead of null to avoid the accidental breakage? Alternatively maybe we can move this test to a make test and chmod our permissions away...
cuviper commented on Apr 28, 2020
I don't know if a nonexistent path will exercise the intended failure mode, but a simple chmod is insufficient to stop root.
thomasjfox commented on Apr 28, 2020
There's the immutable xattr; even root can't delete those files. This is how Android malware persists across factory resets:
https://arstechnica.com/information-technology/2020/04/solved-how-android-backdoor-called-xhelper-survives-factory-resets/
But requiring filesystem xattrs just for one unit test is a no-no.
Another option would be a read-only bind mount, but again, the setup is overkill.
Maybe a read-only place in proc like /proc/version would do:
I've started a build without the unit test in question for the night, results will be in tomorrow.
thomasjfox commented on Apr 29, 2020
With the mentioned unit test disabled, I can finally build Rust 1.43.0 with itself. Rustception all the way, thanks!
I've verified again that it didn't blow up with Rust 1.42.0, even though the unit test was already part of it.
Maybe the way rustc handles output file creation changed in 1.43.0, so the two in combination led to the failure.
I've tested the /proc/version approach:
Using an existing directory as output file gives a different error message:
cuviper commented on Apr 29, 2020
I think -o /does-not-exist/output will be fine, even though it's a different IO error. It still triggers the old ICE on 1.41.0, and is caught as a proper fatal error on 1.42+, even as root.
Mark-Simulacrum commented on Apr 30, 2020
I'd be happy to review a PR here, FWIW, it sounds like it should be a pretty simple update to the UI test.
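For illustration, here is a minimal sketch of why an output path under a missing directory is a safe failing target: creating the file fails before anything on disk can be touched, regardless of privileges. This is a standalone Rust program, not the actual UI test; it assumes /does-not-exist is absent on the machine, and create_error_kind is a hypothetical helper.

```rust
use std::fs::File;
use std::io;
use std::path::Path;

/// Hypothetical helper: try to create `path` and return the error kind on failure.
fn create_error_kind(path: &Path) -> Option<io::ErrorKind> {
    File::create(path).err().map(|e| e.kind())
}

fn main() {
    // Assumption: the directory /does-not-exist is not present on this system.
    let path = Path::new("/does-not-exist/output");
    match create_error_kind(path) {
        Some(kind) => println!("creating {} failed: {:?}", path.display(), kind),
        None => println!("creating {} unexpectedly succeeded", path.display()),
    }
}
```

Because the parent directory is missing, the failure does not depend on file permissions, so running as root makes no difference.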
Rollup merge of rust-lang#71782 - cuviper:leave-dev-null-alone, r=Mar…
thomasjfox commented on May 15, 2020
Just wanted to say thank you for the quick fix and happy birthday to Rust!
Also Rust 1.43.1 compiled itself without problems.