Skip to content

Miscompilation while working on PR #50882 #52694

@glandium

Description

@glandium
Contributor

(Moving this to a separate issue because PR #50882 is already full of noise)
Reduced STR:

  • Take current rust master
  • Apply the following patch:
diff --git a/src/liballoc/alloc.rs b/src/liballoc/alloc.rs
index 84bd275..4df7730 100644
--- a/src/liballoc/alloc.rs
+++ b/src/liballoc/alloc.rs
@@ -172,7 +172,7 @@ pub(crate) unsafe fn box_free<T: ?Sized>(ptr: Unique<T>) {
     // We do not allocate for Box<T> when T is ZST, so deallocation is also not necessary.
     if size != 0 {
         let layout = Layout::from_size_align_unchecked(size, align);
-        dealloc(ptr as *mut u8, layout);
+        Global.dealloc(NonNull::new_unchecked(ptr).cast(), layout);
     }
 }
 
  • Compile rust against the system llvm (5 or 6)
  • Run libstd tests (x.py test src/libstd --stage 1)

What happens then is that sync::once::tests::wait_for_force_to_finish fails with:

thread '<unnamed>' panicked at 'assertion failed: t1.join().is_ok()', libstd/sync/once.rs:582:9

The disassembly for wait_for_force_to_finish only contains one call to std::thread::JoinHandle::join, instead of the expected two when the code is not miscompiled. That call is followed by a test that jumps to a panic when the result of JoinHandle::join is ... Ok(()):

   93371:       e8 aa fd 07 00          callq  113120 <_ZN41_$LT$std..thread..JoinHandle$LT$T$GT$$GT
$4join17h0fb3b129ec2e38aeE>
   93376:       48 89 c3                mov    %rax,%rbx
   93379:       49 89 d7                mov    %rdx,%r15
   9337c:       48 85 db                test   %rbx,%rbx
   9337f:       74 1b                   je     9339c <_ZN3std4sync4once5tests24wait_for_force_to_fin
ish17h6199051fcaa3ff6aE+0x21c>

That JoinHandle::join returns a Result<(), Box<Any>>, and Ok(()) is represented as (0, whatever). The destination of that jump is the panic code.

At some point, I'm not entirely sure with what state of the tree, it was even better. The error was:

thread '<unnamed>' panicked at 'assertion failed: t2.join().is_ok()', libstd/sync/once.rs:583:9

And there were two calls to std::thread::JoinHandle::join, as expected, but they didn't have the same result handling:

   93c79:       e8 32 c5 07 00          callq  1101b0 <_ZN41_$LT$std..thread..JoinHandle$LT$T$GT$$GT$4join17hc6e7f9bb7d72483aE>
   93c7e:       48 89 c3                mov    %rax,%rbx
   93c81:       49 89 d7                mov    %rdx,%r15
   93c84:       48 85 db                test   %rbx,%rbx
   93c87:       75 5e                   jne    93ce7 <_ZN3std4sync4once5tests24wait_for_force_to_finish17h13c0ef8dd5eb6a3aE+0x267>
(snip)
   93c9f:       e8 0c c5 07 00          callq  1101b0 <_ZN41_$LT$std..thread..JoinHandle$LT$T$GT$$GT$4join17hc6e7f9bb7d72483aE>
   93ca4:       48 89 c3                mov    %rax,%rbx
   93ca7:       49 89 d7                mov    %rdx,%r15
   93caa:       48 85 db                test   %rbx,%rbx
   93cad:       74 1b                   je     93cca <_ZN3std4sync4once5tests24wait_for_force_to_finish17h13c0ef8dd5eb6a3aE+0x24a>

The destination of both jumps is panic code. The first jump, corresponding to t1.join().is_ok() is correct, and the second, corresponding to t2.join().is_ok() is broken, thus the test failure.

Even better: this doesn't happen when compiling with the bundled llvm. It also doesn't happen when extracting the test from libstd and compiling with a faulty compiler. It seems the fact that it's part of libstd, and that most of libstd is compiled along the test, plays a role.

I'm trying to get the corresponding mir and llvm-ir.

Activity

glandium

glandium commented on Jul 25, 2018

@glandium
ContributorAuthor

There are, interestingly, tons of differences between the mir between a build that works, and a build that doesn't. Both with rust master + the patch quoted above, the only difference being one was built with system llvm 5, and the other with the bundled llvm. I'm puzzled.

glandium

glandium commented on Jul 25, 2018

@glandium
ContributorAuthor

Some more data:

  • I checked out a one-month-old commit, around the time I did the last push to PR Associate an allocator to boxes #50882 that didn't hit the error, applied the same patch, and tried again with system llvm 5. It was similarly broken.
  • But back then, travis builds were using system llvm 3.9, so I tried that too, and that was not broken.

So this could very well be a case of llvm versions being broken (or compiler integration of said versions).

glandium

glandium commented on Jul 25, 2018

@glandium
ContributorAuthor

Even more data:
- With static system llvm trunk built manually: not broken
- With static system llvm 6.0 built manually: not broken
- With shared system llvm 6.0 built manually: not broken
- With shared system llvm 5.0 built manually: not broken

All the above built with clang 5.0. So... tried again with GCC 5.4:
- With shared system llvm 5.0 built manually: not broken

So... this is only happening with Ubuntu builds of llvm (and presumably Debian)?
@sylvestre do you know what's special about them?

Ok, It's too late and I'm tired, all those last attempts were without the patch applied, so they don't count. I'll try again tomorrow.

sylvestre

sylvestre commented on Jul 25, 2018

@sylvestre
Contributor

Not sure. Some ideas:

Ericson2314

Ericson2314 commented on Jul 25, 2018

@Ericson2314
Contributor

This reminds me that it would nice to be able to duplicate/"virtualize" lang items for testing purposes, so the old and new standard library (fragments) are completely separated. I should open an issue door that.

glandium

glandium commented on Jul 25, 2018

@glandium
ContributorAuthor

So, restarting afresh:

  • With shared system llvm 5.0 built manually with clang 5: broken
  • With shared system llvm 6.0 built manually with clang 5: broken
  • With shared system llvm trunk built manually with clang 5: not broken

So at least this is consistent with the initial results with ubuntu system llvm and bundled llvm. Time for a bisect.

glandium

glandium commented on Jul 26, 2018

@glandium
ContributorAuthor

Bisection identified llvm-mirror/llvm@a57b9df as having fixed this. Which leads me to more questions:

  • what does that have anything to do?
  • are there more things that might be miscompiled than this one libstd testcase?
  • what's the path forward here? Apparently, system llvm 5 and 6 can miscompile for whatever reason, and that's what some downstreams (Linux distros, BSDs, ...) would tend to use...
eddyb

eddyb commented on Jul 26, 2018

@eddyb
Member

The LLVM patch needs to be backported officially to new 5.* and 6.* patch versions if it fixes miscopilations IMO.

sylvestre

sylvestre commented on Jul 26, 2018

@sylvestre
Contributor

I don't think a new dot release of 5 will happen, 6 is also unlikely.
However, I don't mind taking the patch to debian & ubuntu packages.

Having an upstream bug would help also.

glandium

glandium commented on Jul 26, 2018

@glandium
ContributorAuthor

Some more notes:

  • the upstream patch applies cleanly on llvm 6, and fixes the miscompilation. On llvm 5, it has only a few straightforward conflicts, but fails to compile. It looks like it depends on at least llvm-mirror/llvm@f39b8ab, which has tons of conflicts.
  • beta is using llvm 6, but for some reason, it's not affected. Well, it may actually be affected, but it's not visible on this testcase. Who knows.
  • travis builds for pull requests are using system llvm 5. Those are essentially never going to be fixed if they stay on that version. What do we do about that?
glandium

glandium commented on Jul 26, 2018

@glandium
ContributorAuthor

Needless to say, #50882 is blocked until the latter is addressed in some way.

sylvestre

sylvestre commented on Jul 26, 2018

@sylvestre
Contributor
glandium

glandium commented on Jul 26, 2018

@glandium
ContributorAuthor

It's not really a matter of what travis provides, it's a matter of what minimal version of system llvm rustc supports, and as of #51899, it's llvm 5.

glandium

glandium commented on Jul 26, 2018

@glandium
ContributorAuthor

Interestingly, if the minimal supported version was still 3.9, this would have gone undetected.

glandium

glandium commented on Jul 26, 2018

@glandium
ContributorAuthor

Guess why beta is not affected? Because rust-lang/llvm#106 applied llvm-mirror/llvm@a57b9df

20 remaining items

Loading
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    A-LLVMArea: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @eddyb@nikic@sylvestre@glandium@Ericson2314

        Issue actions

          Miscompilation while working on PR #50882 · Issue #52694 · rust-lang/rust