Search code, repositories, users, issues, pull requests...

Hi all,

my usual test: Can Rust compile itself after a compiler upgrade.
Rust 1.43 dies at this stage:

Building stage0 compiler artifacts (x86_64-unknown-linux-gnu -> x86_64-unknown-linux-gnu)
running: "/datastore/dev/rust/bin/cargo" "build" "--target" "x86_64-unknown-linux-gnu" "-Zbinary-dep-depinfo" "-j" "32" "--release" "--frozen" "--features" " llvm" "--manifest-path" "/datastore/rpmbuild/BUILD/rustc-1.43.0-src/src/rustc/Cargo.toml" "--message-format" "json-render-diagnostics"
error: failed to run `rustc` to learn about target-specific information

Caused by:
  process didn't exit successfully: `/datastore/rpmbuild/BUILD/rustc-1.43.0-src/build/bootstrap/debug/rustc - --crate-name ___ --print=file-names --crate-type bin --crate-type rlib --crate-type dylib --crate-type cdylib --crate-type staticlib --crate-type proc-macro --print=sysroot --print=cfg` (exit code: 1)

--- stderr
error: character literal may only contain one codepoint
 --> <anon>:3:35
  |
3 |          messages will be sent to 'stderr'.
  |                                   ^^^^^^^^
  |
help: if you meant to write a `str` literal, use double quotes
  |
3 |          messages will be sent to "stderr".
  |                                   ^^^^^^^^

error: character literal may only contain one codepoint
 --> <anon>:6:35
  |
6 |          messages will be sent to 'stderr'.
  |                                   ^^^^^^^^
  |
help: if you meant to write a `str` literal, use double quotes
  |
6 |          messages will be sent to "stderr".
  |                                   ^^^^^^^^

error: unterminated character literal
 --> <anon>:9:20
  |
9 | FATAL: Cannot open '/dev/null' for writing.
  |                    ^

error: aborting due to 3 previous errors


command did not execute successfully: "/datastore/dev/rust/bin/cargo" "build" "--target" "x86_64-unknown-linux-gnu" "-Zbinary-dep-depinfo" "-j" "32" "--release" "--frozen" "--features" " llvm" "--manifest-path" "/datastore/rpmbuild/BUILD/rustc-1.43.0-src/src/rustc/Cargo.toml" "--message-format" "json-render-diagnostics"
expected success, got: exit code: 101

I didn't identify the problematic code yet, I just wanted to create a tracking issue first.
Rust 1.43 can compile other crates fine, I quickly tested rust-zip or rust-askama.

added

Contributor

Do you need this for anything? This is not a supported configuration

Author

Do you need this for anything? This is not a supported configuration

yes, Linux distributions often need to recompile a (compiler) package with some patches applied or after a major LLVM / glibc / other major update.

See #69953 for the discussion on the Rust 1.42.0 release.

Contributor

This is not a supported configuration

Really? I thought this has been an attempted guarantee since years ago. (Even though it was broken for various previous releases.) Debian certainly needs this guarantee and I will have to figure out how to fix this when packaging 1.43 for Debian.

This one seems like a new case not similar to the previous cases, gah.

jonas-schievink

Contributor

Well, we don't test this at all, so it's not surprising that it breaks almost every release. Why can't Debian just use the previous release to bootstrap?

Contributor

We do, but the porters (for tier-2 architectures like riscv64) and other people occasionally need to rebuild stuff afterwards, when the old version is no longer available (has been moved to archives).

Member

It's not really unsupported so much as not tested. I would ask that if Debian (or others) want this to work we find out before the stable release -- ideally when beta is cut, or earlier -- that way we can do much more to fix it.

I'll try to take a look at this specific failure soon and see if there's a simple patch.

est31

Member

Bootstrapping rustc twice would make merging PRs much slower than it's today. However, maybe one could add it only to the beta branch's CI. Or, alternatively, add rustc to crater. The "old state" would serve as check whether it can bootstrap normally, the "new state" would check whether it can bootstrap itself. There's a proposal to add servo already: rust-lang/crater#133

Member

I don't think there's been any proposal that we test this in PR CI? To be clear, I don't currently believe that's something we want to do, though we'd generally be willing to accept PRs that make self-bootstrapping possible, which is why I suggested testing on beta releases (instead of stable releases, where we can't do much so all patches must be downstream).

est31

Member

I don't think there's been any proposal that we test this in PR CI?

It wasn't in response to any proposal or such, just about how to stop regressions like this from happening. It has been mentioned in the thread that it's not being tested. So naturally one would wonder how the testing would look like so I wrote down my thoughts.

Also, yes, optimally the users would test beta releases (question to @thomasjfox have you tested 1.44 beta yet?), and I've filed a pr for rav1e today to do precisely this, but the entire premise of CI as well as crater is to be proactive about breakage.

Contributor

Understood, I will make some more effort to test the beta releases! We had done so a few times in the past but not really made it a regular part of the process.

Contributor

I don't think there's been any proposal that we test this in PR CI?

It wasn't in response to any proposal or such, just about how to stop regressions like this from happening.

I agree it would be too onerous for regular PRs, but perhaps it could be done when anointing nightly as beta, and then on all PRs to beta, that sounds not too expensive.

Member

@thomasjfox

I didn't identify the problematic code yet, I just wanted to create a tracking issue first.

Have you found this yet? Maybe something wrong with your /dev/null? I also don't see "messages will be sent to 'stderr'" anywhere in the Rust sources, but googling indicates this often comes from syslog. I don't know why that would be fed into your rustc stdin though...

FWIW, the automatic rebuild in Fedora rawhide was just fine:
https://koji.fedoraproject.org/koji/taskinfo?taskID=43860983

Author

Yesterday evening I continued with the investigation. I managed to recompile the Fedora 31 source rpm on my Fedora 31 workstation. So it must be something triggered by my "custom distro" setup.

The plan for today is to limit the number of parallel build jobs to one and run "forkstat" in the background. I want to isolate the exact command invocation that triggers the error.

The /dev/null device should be fine, the build process is not running inside a container or chroot.

Author

Have you found this yet? Maybe something wrong with your /dev/null?

I think we might be onto something here. After I had another failed Rust 1.43.0 re-compile attempt, I noticed /dev/null was broken:

# ll /dev/null 
-rw-r--r-- 1 root root 0 Apr 28 18:14 /dev/null

(tmux crashed on startup, that's how I noticed)

I've rebooted the devbox, /dev/null is fine again and I will now compile the previous Rust 1.42.0 twice. If /dev/null stays fine, I'll re-try with Rust 1.43.0 and see if /dev/null gets replaced.

Author

the only remotely related change to /dev/null handling from 1.42.0 to 1.43.0 is this c2bbe33
in src/libstd/sys/unix/process/process_common.rs. It looks totally unsuspicious to me.

rust/src/test/ui/non-ice-error-on-worker-io-fail.rs

Member

That kind of error usually occurs when one uses something like -o /dev/null, versus redirecting >/dev/null, since -o usually writes temp names first and then renames over the target. I did find this in one test, new in 1.42 via #67458:

Line 8 in b7bd7c1

// compile-flags: -o /dev/null

Member

When I run that test manually as root, I do indeed get a clobbered /dev/null:

# rustc -o /dev/null src/test/ui/non-ice-error-on-worker-io-fail.rs
warning: unused import: `crate::task::Waker`
  --> src/test/ui/non-ice-error-on-worker-io-fail.rs:34:13
   |
34 |         use crate::task::Waker;
   |             ^^^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

# ls -l /dev/null
-rw-r--r--. 1 root root 3562 Apr 28 09:48 /dev/null

Member

Running tests as root is indeed very trusting :)

But it seems like in this case we should probably write to e.g. /dev/rust-nonexistent-path instead of null to avoid the accidental breakage? Alternatively maybe we can move this test to a make test and chmod our permissions away...

Member

I don't know if a nonexistent path will exercise the intended failure mode, but a simple chmod is insufficient to stop root.

Author

There's the immutable xattr, even root can't delete those files. This is how Android malware persists across factory resets:
https://arstechnica.com/information-technology/2020/04/solved-how-android-backdoor-called-xhelper-survives-factory-resets/

But requiring filesystem xattrs just for one unit test is a no no.
Another option would be a read only bind mount, but again, the setup is overkill.

May be a read only place in proc like /proc/version would do:

[root@host ~]# rm -f /proc/version 
rm: cannot remove '/proc/version': Operation not permitted

[root@host ~]# echo "test" >>/proc/version 
-bash: echo: write error: Input/output error

I've started a build without the unit test in question for the night, results will be in tomorrow.

Author

I've started a build without the unit test in question for the night, results will be in tomorrow.

with the mentioned unit test disabled, I can finally build Rust 1.43.0 with itself. Rustception all the way, thanks!

I've verified again it didn't blow up with Rust 1.42.0, even though the unit test was part of it already.
May be the way rustc handles output file creation changed in 1.43.0, so these two in combination led to failure.

I've tested the /proc/version approach:

[root@host]# rustc -o /proc/version non-ice-error-on-worker-io-fail.rs 
warning: unused import: `crate::task::Waker`
  --> non-ice-error-on-worker-io-fail.rs:34:13
   |
34 |         use crate::task::Waker;
   |             ^^^^^^^^^^^^^^^^^^
   |
   = note: `#[warn(unused_imports)]` on by default

error: failed to write bytecode to /proc/version.non_ice_error_on_worker_io_fail.3a1fbbbh-cgu.0.rcgu.bc.z: No such file or directory (os error 2)

error: aborting due to previous error

Using an existing directory as output file gives a different error message:

[root@host]# rustc -o existing-dir-instead-of-file non-ice-error-on-worker-io-fail.rs 
error: the generated executable for the input file "non-ice-error-on-worker-io-fail.rs" conflicts with the existing directory "existing-dir-instead-of-file"

error: aborting due to previous error

Member

I think -o /does-not-exist/output will be fine, even though it's a different IO error. It still triggers the old ICE on 1.41.0, and is caught as a proper fatal error on 1.42+, even as root.