[LLVM 6.0] core::fmt::Debug implementations causing 'error: ran out of registers' #95

Closed
dylanmckay opened this issue Mar 7, 2018 · 14 comments
Labels: A-llvm (affects the LLVM AVR backend), has-reduced-testcase (a small LLVM IR file exists that demonstrates the problem), help wanted (this is a good candidate issue to fix)

Comments

@dylanmckay
Member

dylanmckay commented Mar 7, 2018

A large number of core::fmt::Debug::fmt implementations are causing a 'ran out of registers' error. This problem has come up in the past (avr-llvm/llvm#1), but it generally happened with quite complex code. Now even fairly trivial fmt functions are causing this error.

This is the error message

LLVM ERROR: ran out of registers during register allocation
error: Could not compile `core`.

Here is the reproduction code

; ModuleID = 'bugpoint-reduced-simplified.bc'
source_filename = "bugpoint-output-81150bd.bc"
target datalayout = "e-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
target triple = "avr-unknown-unknown"

%"fmt::Formatter.1.77.153.229.305.381.1673" = type { [0 x i8], i32, [0 x i8], i32, [0 x i8], i8, [0 x i8], %"option::Option<usize>.0.76.152.228.304.380.1672", [0 x i8], %"option::Option<usize>.0.76.152.228.304.380.1672", [0 x i8], { {}*, {}* }, [0 x i8], { i8*, i8* }, [0 x i8], { [0 x { i8*, i8* }]*, i16 }, [0 x i8] }
%"option::Option<usize>.0.76.152.228.304.380.1672" = type { [0 x i8], i8, [2 x i8] }

@str.4S = external constant [5 x i8]

; Function Attrs: uwtable
define void @"_ZN65_$LT$lib..str..Chars$LT$$u27$a$GT$$u20$as$u20$lib..fmt..Debug$GT$3fmt17h76a537e22649f739E"(%"fmt::Formatter.1.77.153.229.305.381.1673"* dereferenceable(27) %__arg_0) unnamed_addr #0 personality i32 (...)* @rust_eh_personality {
start:
  %0 = getelementptr inbounds %"fmt::Formatter.1.77.153.229.305.381.1673", %"fmt::Formatter.1.77.153.229.305.381.1673"* %__arg_0, i16 0, i32 11, i32 0
  %1 = load {}*, {}** %0, align 1, !noalias !0, !nonnull !9
  %2 = getelementptr inbounds %"fmt::Formatter.1.77.153.229.305.381.1673", %"fmt::Formatter.1.77.153.229.305.381.1673"* %__arg_0, i16 0, i32 11, i32 1
  %3 = bitcast {}** %2 to i1 ({}*, [0 x i8]*, i16)***
  %4 = load i1 ({}*, [0 x i8]*, i16)**, i1 ({}*, [0 x i8]*, i16)*** %3, align 1, !noalias !0, !nonnull !9
  %5 = getelementptr inbounds i1 ({}*, [0 x i8]*, i16)*, i1 ({}*, [0 x i8]*, i16)** %4, i16 3
  %6 = load i1 ({}*, [0 x i8]*, i16)*, i1 ({}*, [0 x i8]*, i16)** %5, align 1, !invariant.load !9, !noalias !0, !nonnull !9
  %7 = tail call zeroext i1 %6({}* nonnull %1, [0 x i8]* noalias nonnull readonly bitcast ([5 x i8]* @str.4S to [0 x i8]*), i16 5), !noalias !10
  unreachable
}

declare i32 @rust_eh_personality(...) unnamed_addr

attributes #0 = { uwtable }

!0 = !{!1, !3, !5, !6, !8}
!1 = distinct !{!1, !2, !"_ZN3lib3fmt9Formatter9write_str17ha1a9656fc66ccbe5E: %data.0"}
!2 = distinct !{!2, !"_ZN3lib3fmt9Formatter9write_str17ha1a9656fc66ccbe5E"}
!3 = distinct !{!3, !4, !"_ZN3lib3fmt8builders16debug_struct_new17h352a1de8f89c2bc3E: argument 0"}
!4 = distinct !{!4, !"_ZN3lib3fmt8builders16debug_struct_new17h352a1de8f89c2bc3E"}
!5 = distinct !{!5, !4, !"_ZN3lib3fmt8builders16debug_struct_new17h352a1de8f89c2bc3E: %name.0"}
!6 = distinct !{!6, !7, !"_ZN3lib3fmt9Formatter12debug_struct17ha1ff79f633171b68E: argument 0"}
!7 = distinct !{!7, !"_ZN3lib3fmt9Formatter12debug_struct17ha1ff79f633171b68E"}
!8 = distinct !{!8, !7, !"_ZN3lib3fmt9Formatter12debug_struct17ha1ff79f633171b68E: %name.0"}
!9 = !{}
!10 = !{!3, !6}

I have also attached the full libcore LLVM IR that generates this error.

Here is what the function looks like during register allocation when the error is actually hit.

# Machine code for function _ZN65_$LT$lib..sync..atomic..AtomicBool$u20$as$u20$lib..fmt..Debug$GT$3fmt17hd605133ba71344ebE: NoPHIs, TracksLiveness
Frame Objects:
  fi#0: size=1, align=1, at location [SP+2]
  fi#1: size=6, align=1, at location [SP+2]
Function Live Ins: $r25r24 in %11, $r23r22 in %12

bb.0.start:
  successors: %bb.2(0x40000000), %bb.1(0x40000000); %bb.2(50.00%), %bb.1(50.00%)
  liveins: $r25r24, $r23r22
  %95:ptrdispregs = COPY $r23r22
  %92:dregs = COPY $r25r24
  early-clobber %13:dregs = LDDWRdPtrQ %95:ptrdispregs, 15; mem:LD2[%1](align=1)(noalias=!82,!84,!86,!87,!89)(dereferenceable)
  %96:dregs = COPY %95:ptrdispregs
  %97:ptrdispregs = COPY %96:dregs
  early-clobber %18:ptrdispregs = LDDWRdPtrQ %97:ptrdispregs, 17; mem:LD2[%4](align=1)(noalias=!82,!84,!86,!87,!89)(dereferenceable)
  early-clobber %17:dregs = LDDWRdPtrQ %18:ptrdispregs, 6; mem:LD2[%6](align=1)(noalias=!82,!84,!86,!87,!89)(invariant)
  ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp
  %19:dldregs = LDIWRdK @str.w
  %20:dldregs = LDIWRdK 10
  $r25r24 = COPY %13:dregs
  $r23r22 = COPY %19:dldregs
  $r21r20 = COPY %20:dldregs
  $r31r30 = COPY %17:dregs

  ICALL $r31r30, $r25r24, $r23r22, $r21r20, <regmask $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $r12 $r13 $r14 $r15 $r16 $r17 $r28 $r29 $r3r2 $r5r4 $r7r6 $r9r8 $r11r10 $r13r12 $r15r14 $r17r16 $r29r28>, implicit $sp, implicit $r31r30, implicit-def $sp, implicit-def $r24

  ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp
  %21:gpr8 = COPY $r24
  STDWPtrQRr %stack.1._6, 0, %96:dregs; mem:ST2[%9](align=1)(alias.scope=!84,!87)(noalias=!86,!89)
  STDPtrQRr %stack.1._6, 2, %21:gpr8; mem:ST1[%10](alias.scope=!84,!87)(noalias=!86,!89)
  %22:dldregs = LDIWRdK 0
  STDWPtrQRr %stack.1._6, 3, %22:dldregs; mem:ST2[%12](align=1)(alias.scope=!84,!87)(noalias=!86,!89)
  %23:ld8 = LDIRdK 0
  STDPtrQRr %stack.1._6, 5, %23:ld8; mem:ST1[%13](alias.scope=!84,!87)(noalias=!86,!89)
  %93:ptrregs = COPY %92:dregs
  %24:ld8 = AtomicLoad8 %93:ptrregs; mem:Volatile LD1[%self]
  %83:ld8 = LDIRdK 1
  CPIRdK %24:ld8, 0, implicit-def $sreg
  BRNEk %bb.1, implicit killed $sreg
  RJMPk %bb.2

@shepmaster suspects that:

My underlying suspicion is that the format struct is simply huge and tries to be passed in registers in release mode. In debug mode, it's behind some pointer indirection. I haven't verified this at all.

core-noregs.ll.txt

@dylanmckay
Member Author

dylanmckay commented Mar 7, 2018

In the case of _ZN65_$LT$lib..sync..atomic..AtomicBool$u20$as$u20$lib..fmt..Debug$GT$3fmt17hd605133ba71344ebE

Here is the live interval that it is failing on

#0  llvm::RegAllocBase::allocatePhysRegs (this=0x555555c4f638) at /home/dylan/projects/llvm-project/llvm/lib/CodeGen/RegAllocBase.cpp:131
131             report_fatal_error("ran out of registers during register allocation");
(gdb) print VirtReg->dump()
%97 [76r,92r:0)  0@76r weight:INF
$8 = void

That variable is only live over the slot range 76:92. This corresponds to these instructions (from RegAllocGreedy::Indexes->dump()).

Look for the rows 76-92 inclusive.

0
16 %95:ptrdispregs = COPY $r23r22
32 %92:dregs = COPY $r25r24
48
64 early-clobber %13:dregs = LDDWRdPtrQ %95:ptrdispregs, 15; mem:LD2[%1](align=1)(noalias=!82,!84,!86,!87,!89)(dereferenceable)
72 %96:dregs = COPY %95:ptrdispregs
76 %97:ptrdispregs = COPY %96:dregs
84
92 early-clobber %18:ptrdispregs = LDDWRdPtrQ %97:ptrdispregs, 17; mem:LD2[%4](align=1)(noalias=!82,!84,!86,!87,!89)(dereferenceable)
96
112 early-clobber %17:dregs = LDDWRdPtrQ %18:ptrdispregs, 6; mem:LD2[%6](align=1)(noalias=!82,!84,!86,!87,!89)(invariant)
128 ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp
144 %19:dldregs = LDIWRdK @str.w

@dylanmckay
Member Author

I suspect that this is not actually related to the format struct being passed fully in registers, because that logic is fully decided by the calling convention, and LLVM would not be able to "optimize" this because it'd break the ABI.

@dylanmckay
Member Author

Note that the ICALL instruction itself does not take any operands - all of its inputs and outputs are implicit.

ICALL always calls the function whose address is in r31:r30 (the Z register). This function probably also has a frame pointer, because the backend currently dedicates r29:r28 to the frame pointer in any function that has arguments on the stack.

This problem could simply be ordinary register pressure, combined with avr-llvm/llvm#1.
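
The register-pressure theory can be illustrated with a toy model. This is only a sketch under assumptions that this thread does not confirm: that the ptrdispregs class contains just the two pointer pairs usable with displacement addressing, r29:r28 (Y) and r31:r30 (Z), and that a frame pointer takes Y out of the pool. The slot numbers for %95, %97 and %18 are read off the slot index dump earlier in this issue. All type and function names are hypothetical, and this is not how LLVM's allocator is implemented.

```rust
/// A live range over slot indexes. Both ends are inclusive so that an
/// early-clobber def overlaps the last use of its source operand.
#[derive(Clone, Copy, Debug)]
pub struct LiveRange {
    pub start: u32,
    pub end: u32,
}

/// Largest number of ranges that are live at the same point.
/// For inclusive ranges the maximum is always reached at some range start.
pub fn max_overlap(ranges: &[LiveRange]) -> usize {
    ranges
        .iter()
        .map(|a| {
            ranges
                .iter()
                .filter(|b| b.start <= a.start && a.start <= b.end)
                .count()
        })
        .max()
        .unwrap_or(0)
}

/// True if every range can get its own physical register from a class of
/// `class_size` registers, of which `reserved` are unavailable.
pub fn allocatable(ranges: &[LiveRange], class_size: usize, reserved: usize) -> bool {
    max_overlap(ranges) <= class_size.saturating_sub(reserved)
}

/// ptrdispregs virtual registers from the dump (assumed slot ranges):
/// %95 is defined at 16 and last used at 72, %97 lives over 76..92,
/// and %18 is an early-clobber def at 92 that is last used at 112.
pub fn issue_ranges() -> [LiveRange; 3] {
    [
        LiveRange { start: 16, end: 72 },
        LiveRange { start: 76, end: 92 },
        LiveRange { start: 92, end: 112 },
    ]
}
```

Under these assumptions, `allocatable(&issue_ranges(), 2, 0)` holds, while `allocatable(&issue_ranges(), 2, 1)` does not: with Y reserved, %97 and the early-clobber %18 would both need Z at slot 92.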

Surprising that this didn't occur on the previous LLVM version.

@dylanmckay
Member Author

I found an oldish LLVM build on my computer, built on Oct 5th 2017

The binaries were compiled at this commit

commit 0149e8bd35f09463404affe879aac2e1466a6bd6 (HEAD -> avr-rust-llvm-release-4-0-1, avr-rust/avr-rust-llvm-release-4-0-1)
Author: Dylan McKay <[email protected]>
Date:   Sun Nov 12 23:10:52 2017 +1300

    [LOCAL] Relax unaligned access assertion when type is byte aligned

    This commit has been cherry-picked from upstream LLVM review D39946.
    Once that patch lands in LLVM trunk, we should revert this commit and
    cherry-pick the official one.

    Original message:
    -----------------

    This relaxes an assertion inside SelectionDAGBuilder which is overly
    restrictive on targets which have no concept of alignment (such as AVR).

    In these architectures, all types are aligned to 8-bits.

    After this, LLVM will only assert that accesses are aligned on targets
    which actually require alignment.

Note the "avr-rust-llvm-release-4-0-1". This is the current version avr-rust/rust uses.

I ran the testcase from the description against this build, and it failed there too. This means that whatever AVR backend bug is being triggered, it has existed since at least LLVM 4.0.

I wonder if current Rust itself is generating different LLVM IR for this function than it used to. That would explain why this has only come up now.

@dylanmckay
Member Author

dylanmckay commented Mar 7, 2018

The LLVM IR calls a function via a function pointer derived from the fmt::Formatter argument. This is probably dynamic dispatch reading from the vtable.

define void @"_ZN65_$LT$lib..str..Chars$LT$$u27$a$GT$$u20$as$u20$lib..fmt..Debug$GT$3fmt17h76a537e22649f739E"(%"fmt::Formatter.1.77.153.229.305.381.1673"* %__arg_0) {
start:
  %0 = getelementptr %"fmt::Formatter.1.77.153.229.305.381.1673", %"fmt::Formatter.1.77.153.229.305.381.1673"* %__arg_0, i16 0, i32 11, i32 0
  %1 = load {}*, {}** %0, align 1
  %2 = getelementptr %"fmt::Formatter.1.77.153.229.305.381.1673", %"fmt::Formatter.1.77.153.229.305.381.1673"* %__arg_0, i16 0, i32 11, i32 1
  %3 = bitcast {}** %2 to i1 ({}*, [0 x i8]*, i16)***
  %4 = load i1 ({}*, [0 x i8]*, i16)**, i1 ({}*, [0 x i8]*, i16)*** %3, align 1
  %5 = getelementptr i1 ({}*, [0 x i8]*, i16)*, i1 ({}*, [0 x i8]*, i16)** %4, i16 3
  %6 = load i1 ({}*, [0 x i8]*, i16)*, i1 ({}*, [0 x i8]*, i16)** %5, align 1
  ; %7 = call i1 %6({}* %1, [0 x i8]* bitcast ([5 x i8]* @str.4S to [0 x i8]*), i16 5)
  %7 = call i1 @foobar({}* %1, [0 x i8]* bitcast ([5 x i8]* @str.4S to [0 x i8]*), i16 5)
  unreachable
}
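
As a rough illustration of where that indirect call comes from, here is a minimal sketch of a formatter that holds a type-erased writer. `MiniFormatter` and `write_name` are hypothetical stand-ins, not the real core::fmt::Formatter. The comment about the vtable slot relies on a rustc implementation detail (drop, size and align entries come before the methods), which is an assumption that happens to match the `i16 3` index in the IR.

```rust
use std::fmt::{self, Write};

/// Hypothetical stand-in for `fmt::Formatter`: it only keeps the
/// type-erased output sink, stored as a (data pointer, vtable pointer) pair.
pub struct MiniFormatter<'a> {
    pub buf: &'a mut dyn Write,
}

impl<'a> MiniFormatter<'a> {
    /// Forwards to the sink. Because `buf` is a trait object, this is an
    /// indirect call through the vtable. Assuming the vtable starts with
    /// drop/size/align, `write_str` is the entry at index 3.
    pub fn write_str(&mut self, data: &str) -> fmt::Result {
        self.buf.write_str(data)
    }
}

/// Mirrors the reduced test case: write the 5-byte struct name and nothing else.
pub fn write_name(f: &mut MiniFormatter<'_>) -> fmt::Result {
    f.write_str("Chars")
}
```

Calling `write_name` with a `MiniFormatter` wrapping a `String` appends "Chars" to it. The point is only that the callee is not known statically, so the backend has to load a function pointer and call it indirectly.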

The fmt::Debug implementation for str::Chars is automatically #[derive]d.

The expanded implementation looks like this

#[stable(feature = "rust1", since = "1.0.0")]
pub struct Chars<'a> {
  iter: slice::Iter<'a, u8>,
}

#[automatically_derived]
#[allow(unused_qualifications)]
#[stable(feature = "rust1", since = "1.0.0")]
impl <'a> ::fmt::Debug for Chars<'a> {
  fn fmt(&self, __arg_0: &mut ::fmt::Formatter) -> ::fmt::Result {
    match *self {
      Chars { iter: ref __self_0_0 } => {
        let mut builder = __arg_0.debug_struct("Chars");
        let _ = builder.field("iter", &&(*__self_0_0));
        builder.finish()
      }
    }
  }
}
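
For anyone who wants to reproduce this outside of libcore, a hand-written equivalent of the derived impl might look like the sketch below. It assumes a host toolchain with `std` rather than the in-tree `core` paths, and this local `Chars` is a hypothetical copy, not the real `core::str::Chars`.

```rust
use std::fmt;
use std::slice;

/// Hypothetical local copy of the struct from the expanded code above.
pub struct Chars<'a> {
    pub iter: slice::Iter<'a, u8>,
}

impl<'a> fmt::Debug for Chars<'a> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // Same builder calls as the derived code: `debug_struct` writes the
        // struct name through the formatter, then each field is appended.
        f.debug_struct("Chars").field("iter", &self.iter).finish()
    }
}
```

Formatting a value with `{:?}` goes through `debug_struct`, which is where the reduced IR's `write_str` call with the 5-byte name originates.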

@dylanmckay
Member Author

Here is the code prior to register allocation with optimisations

/bin/llc -march=avr -mcpu=atmega328p ran-out-of-regs.ll  -o /dev/null -O2 -print-before-all 2>&1|less
# *** IR Dump Before Greedy Register Allocator ***:
# Machine code for function _ZN65_$LT$lib..str..Chars$LT$$u27$a$GT$$u20$as$u20$lib..fmt..Debug$GT$3fmt17h76a537e22649f739E: NoPHIs, TracksLiveness
Function Live Ins: $r25r24 in %0

0B      bb.0.start:
          liveins: $r25r24
16B       %2:ptrdispregs = COPY $r25r24
48B       early-clobber %4:ptrdispregs = LDDWRdPtrQ %2:ptrdispregs, 17; mem:LD2[%3](align=1)(noalias=!1,!3,!5,!6,!8)(dereferenceable)
80B       early-clobber %3:dregs = LDDWRdPtrQ %4:ptrdispregs, 6; mem:LD2[%5](align=1)(noalias=!1,!3,!5,!6,!8)(invariant)
96B       early-clobber %5:dregs = LDDWRdPtrQ %2:ptrdispregs, 15; mem:LD2[%0](align=1)(noalias=!1,!3,!5,!6,!8)(dereferenceable)
112B      ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp
128B      %7:dldregs = LDIWRdK @str.4S
144B      %8:dldregs = LDIWRdK 5
160B      $r25r24 = COPY %5:dregs
176B      $r23r22 = COPY %7:dldregs
192B      $r21r20 = COPY %8:dldregs
208B      $r31r30 = COPY %3:dregs
224B      ICALL killed $r31r30, $r25r24, killed $r23r22, killed $r21r20, <regmask $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $r12 $r13 $r14 $r15 $r16 $r17 $r28 $r29 $r3r2 $r5r4 $r7r6 $r9r8 $r11r10 $r13r12 $r15r14 $r17r16 $r29r28>, implicit $sp, implicit $r31r30, implicit-def $sp, implicit-def dead $r24
240B      ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp

# End machine code for function _ZN65_$LT$lib..str..Chars$LT$$u27$a$GT$$u20$as$u20$lib..fmt..Debug$GT$3fmt17h76a537e22649f739E.
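
The displacements 15, 17 and 6 in the LDDWRdPtrQ instructions can be cross-checked against the reduced IR. The sketch below assumes 16-bit pointers and 1-byte alignment for every type, as the target datalayout string states, and skips the zero-sized `[0 x i8]` members. The function and constant names are hypothetical.

```rust
/// Byte offset of each field plus the total size, for a struct whose fields
/// are all 1-byte aligned, so no padding is inserted between them.
pub fn packed_offsets(sizes: &[usize]) -> (Vec<usize>, usize) {
    let mut offsets = Vec::with_capacity(sizes.len());
    let mut next = 0;
    for &size in sizes {
        offsets.push(next);
        next += size;
    }
    (offsets, next)
}

/// Sizes in bytes of the non-zero-sized Formatter fields in the reduced IR:
/// i32, i32, i8, Option<usize>, Option<usize>, then three two-word pairs.
/// Entry 5 is the `{ {}*, {}* }` pair, which is struct index 11 in the IR.
pub const FORMATTER_FIELD_SIZES: [usize; 8] = [4, 4, 1, 3, 3, 4, 4, 4];

/// Size of a pointer on AVR, per the `p:16:8` part of the datalayout.
pub const POINTER_SIZE: usize = 2;
```

With these inputs the pair starts at offset 15 (the data pointer), its second word is at 15 + 2 = 17 (the vtable pointer), the total is 27 bytes (matching `dereferenceable(27)`), and vtable entry 3 is at 3 * 2 = 6.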

Here is the code prior to register allocation with no optimisations

/bin/llc -march=avr -mcpu=atmega328p ran-out-of-regs.ll  -o /dev/null -O0 -print-before-all 2>&1|less
# *** IR Dump Before Fast Register Allocator ***:
# Machine code for function _ZN65_$LT$lib..str..Chars$LT$$u27$a$GT$$u20$as$u20$lib..fmt..Debug$GT$3fmt17h76a537e22649f739E: NoPHIs, TracksLiveness
Function Live Ins: $r25r24 in %0

bb.0.start:
  liveins: $r25r24
  %0:dregs = COPY $r25r24
  %1:dregs = COPY %0:dregs
  %3:ptrdispregs = COPY %0:dregs
  early-clobber %2:dregs = LDDWRdPtrQ %3:ptrdispregs, 15; mem:LD2[%0](align=1)(noalias=!1,!3,!5,!6,!8)(dereferenceable)
  %5:ptrdispregs = COPY %0:dregs
  early-clobber %4:dregs = LDDWRdPtrQ %5:ptrdispregs, 17; mem:LD2[%3](align=1)(noalias=!1,!3,!5,!6,!8)(dereferenceable)
  %7:ptrdispregs = COPY %4:dregs
  early-clobber %6:dregs = LDDWRdPtrQ killed %7:ptrdispregs, 6; mem:LD2[%5](align=1)(noalias=!1,!3,!5,!6,!8)(invariant)
  ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp
  %8:dldregs = LDIWRdK @str.4S
  %9:dldregs = LDIWRdK 5
  $r25r24 = COPY %2:dregs
  $r23r22 = COPY %8:dldregs
  $r21r20 = COPY %9:dldregs
  $r31r30 = COPY %6:dregs
  ICALL $r31r30, $r25r24, $r23r22, $r21r20, <regmask $r2 $r3 $r4 $r5 $r6 $r7 $r8 $r9 $r10 $r11 $r12 $r13 $r14 $r15 $r16 $r17 $r28 $r29 $r3r2 $r5r4 $r7r6 $r9r8 $r11r10 $r13r12 $r15r14 $r17r16 $r29r28>, implicit $sp, implicit $r31r30, implicit-def $sp, implicit-def $r24
  ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp
  %10:gpr8 = COPY $r24

# End machine code for function _ZN65_$LT$lib..str..Chars$LT$$u27$a$GT$$u20$as$u20$lib..fmt..Debug$GT$3fmt17h76a537e22649f739E.

The only differences I can see:

  • The unoptimised version has more intermediate COPY instructions, potentially reducing pressure
  • The unoptimised version does not kill the $r23r22, $r21r20 registers

@dylanmckay
Member Author

dylanmckay commented Mar 7, 2018

Here's a full LLVM MIR dump of the reproduction, taken just prior to register allocation.

--- |
  ; ModuleID = 'ran-out-of-regs.ll'
  source_filename = "bugpoint-output-81150bd.bc"
  target datalayout = "e-P1-p:16:8-i8:8-i16:8-i32:8-i64:8-f32:8-f64:8-n8-a:8"
  target triple = "avr-unknown-unknown"
  
  %"fmt::Formatter.1.77.153.229.305.381.1673" = type { [0 x i8], i32, [0 x i8], i32, [0 x i8], i8, [0 x i8], %"option::Option<usize>.0.76.152.228.304.380.1672", [0 x i8], %"option::Option<usize>.0.76.152.228.304.380.1672", [0 x i8], { {}*, {}* }, [0 x i8], { i8*, i8* }, [0 x i8], { [0 x { i8*, i8* }]*, i16 }, [0 x i8] }
  %"option::Option<usize>.0.76.152.228.304.380.1672" = type { [0 x i8], i8, [2 x i8] }
  
  @str.4S = external constant [5 x i8]
  
  ; Function Attrs: uwtable
  define void @"_ZN65_$LT$lib..str..Chars$LT$$u27$a$GT$$u20$as$u20$lib..fmt..Debug$GT$3fmt17h76a537e22649f739E"(%"fmt::Formatter.1.77.153.229.305.381.1673"* dereferenceable(27) %__arg_0) unnamed_addr #0 personality i32 (...)* @rust_eh_personality {
  start:
    %0 = getelementptr inbounds %"fmt::Formatter.1.77.153.229.305.381.1673", %"fmt::Formatter.1.77.153.229.305.381.1673"* %__arg_0, i16 0, i32 11, i32 0
    %1 = load {}*, {}** %0, align 1, !noalias !0, !nonnull !9
    %2 = getelementptr inbounds %"fmt::Formatter.1.77.153.229.305.381.1673", %"fmt::Formatter.1.77.153.229.305.381.1673"* %__arg_0, i16 0, i32 11, i32 1
    %3 = bitcast {}** %2 to i1 ({}*, [0 x i8]*, i16)***
    %4 = load i1 ({}*, [0 x i8]*, i16)**, i1 ({}*, [0 x i8]*, i16)*** %3, align 1, !noalias !0, !nonnull !9
    %5 = getelementptr inbounds i1 ({}*, [0 x i8]*, i16)*, i1 ({}*, [0 x i8]*, i16)** %4, i16 3
    %6 = load i1 ({}*, [0 x i8]*, i16)*, i1 ({}*, [0 x i8]*, i16)** %5, align 1, !invariant.load !9, !noalias !0, !nonnull !9
    %7 = tail call zeroext i1 %6({}* nonnull %1, [0 x i8]* noalias nonnull readonly bitcast ([5 x i8]* @str.4S to [0 x i8]*), i16 5), !noalias !10
    unreachable
  }
  
  declare i32 @rust_eh_personality(...) unnamed_addr #1
  
  ; Function Attrs: nounwind
  declare void @llvm.stackprotector(i8*, i8**) #2
  
  attributes #0 = { uwtable "target-cpu"="atmega328p" }
  attributes #1 = { "target-cpu"="atmega328p" }
  attributes #2 = { nounwind }
  
  !0 = !{!1, !3, !5, !6, !8}
  !1 = distinct !{!1, !2, !"_ZN3lib3fmt9Formatter9write_str17ha1a9656fc66ccbe5E: %data.0"}
  !2 = distinct !{!2, !"_ZN3lib3fmt9Formatter9write_str17ha1a9656fc66ccbe5E"}
  !3 = distinct !{!3, !4, !"_ZN3lib3fmt8builders16debug_struct_new17h352a1de8f89c2bc3E: argument 0"}
  !4 = distinct !{!4, !"_ZN3lib3fmt8builders16debug_struct_new17h352a1de8f89c2bc3E"}
  !5 = distinct !{!5, !4, !"_ZN3lib3fmt8builders16debug_struct_new17h352a1de8f89c2bc3E: %name.0"}
  !6 = distinct !{!6, !7, !"_ZN3lib3fmt9Formatter12debug_struct17ha1ff79f633171b68E: argument 0"}
  !7 = distinct !{!7, !"_ZN3lib3fmt9Formatter12debug_struct17ha1ff79f633171b68E"}
  !8 = distinct !{!8, !7, !"_ZN3lib3fmt9Formatter12debug_struct17ha1ff79f633171b68E: %name.0"}
  !9 = !{}
  !10 = !{!3, !6}

...
---
name:            '_ZN65_$LT$lib..str..Chars$LT$$u27$a$GT$$u20$as$u20$lib..fmt..Debug$GT$3fmt17h76a537e22649f739E'
alignment:       1
exposesReturnsTwice: false
legalized:       false
regBankSelected: false
selected:        false
failedISel:      false
tracksRegLiveness: true
registers:       
  - { id: 0, class: dregs, preferred-register: '' }
  - { id: 1, class: dregs, preferred-register: '' }
  - { id: 2, class: ptrdispregs, preferred-register: '' }
  - { id: 3, class: dregs, preferred-register: '' }
  - { id: 4, class: ptrdispregs, preferred-register: '' }
  - { id: 5, class: dregs, preferred-register: '' }
  - { id: 6, class: ptrdispregs, preferred-register: '' }
  - { id: 7, class: dldregs, preferred-register: '' }
  - { id: 8, class: dldregs, preferred-register: '' }
  - { id: 9, class: gpr8, preferred-register: '' }
liveins:         
  - { reg: '$r25r24', virtual-reg: '%0' }
frameInfo:       
  isFrameAddressTaken: false
  isReturnAddressTaken: false
  hasStackMap:     false
  hasPatchPoint:   false
  stackSize:       0
  offsetAdjustment: 0
  maxAlignment:    0
  adjustsStack:    false
  hasCalls:        true
  stackProtector:  ''
  maxCallFrameSize: 4294967295
  hasOpaqueSPAdjustment: false
  hasVAStart:      false
  hasMustTailInVarArgFunc: false
  savePoint:       ''
  restorePoint:    ''
fixedStack:      
stack:           
constants:       
body:             |
  bb.0.start:
    liveins: $r25r24
  
    %2:ptrdispregs = COPY $r25r24
    early-clobber %4:ptrdispregs = LDDWRdPtrQ %2, 17 :: (dereferenceable load 2 from %ir.3, align 1, !noalias !0)
    early-clobber %3:dregs = LDDWRdPtrQ %4, 6 :: (invariant load 2 from %ir.5, align 1, !noalias !0)
    early-clobber %5:dregs = LDDWRdPtrQ %2, 15 :: (dereferenceable load 2 from %ir.0, align 1, !noalias !0)
    ADJCALLSTACKDOWN 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp
    %7:dldregs = LDIWRdK @str.4S
    %8:dldregs = LDIWRdK 5
    $r25r24 = COPY %5
    $r23r22 = COPY %7
    $r21r20 = COPY %8
    $r31r30 = COPY %3
    ICALL killed $r31r30, $r25r24, killed $r23r22, killed $r21r20, csr_normal, implicit $sp, implicit $r31r30, implicit-def $sp, implicit-def dead $r24
    ADJCALLSTACKUP 0, 0, implicit-def dead $sp, implicit-def dead $sreg, implicit $sp

@dylanmckay
Member Author

N.B. The function in question does not have a frame pointer

@shepmaster
Member

I wonder if current Rust itself is generating different LLVM IR

that logic is fully decided by the calling convention

These are all "regular" Rust functions, which means that Rust itself gets to determine the calling convention. This can (and has) changed over time. I even believe that with inlining, the calling convention could go out the window, so long as there's no "external visibility" of the change.

@shepmaster
Member

I checked for professional opinions in #rustc:

nagisa
definitely LLVM backend code
registers are entirely abstracted from the IR generator by LLVM. Changing IR might work-around the issue, but LLVM should handle such stuff transparently
this is similar to e.g. i128 support -- it should be entirely abstracted by LLVM, except for possibly user having to provide software implementations of certain operations
as per the attached "What the function looks like during register allocation", the suspicion seems right

eddyb
worse case you just spill everything
I can't think of any single situation where you need more than three-four hardware registers live
it can end up being really inefficient
but it shouldn't fail compilation
translate the IR back into C and showcase it with clang? if you want to bother
otherwise just try to minimize the IR and show it to LLVM devs
not as an official bug
but asking LLVM devs on IRC

nagisa
do ask help from folks at #llvm, I’m sure they’ll be able to point y'all at the best places to look at.
this is definitely something that LLVM should handle well, without throwing up

irc://irc.oftc.net/#llvm

@nagisa

nagisa commented Mar 7, 2018

as per the attached "What the function looks like during register allocation", the suspicion seems right

Something I wrote after skimming the first post here. Probably not true.


The first thing I’d do is double-check whether the tables for machine instructions are correct, and maybe try an alternative register allocator instead of fast-reg-alloc.

@brainlag

Isn't this the same bug as #37?

@dylanmckay
Member Author

Yes, but I raised this because the bug seems to occur much more frequently under LLVM 6.0.

IIRC, before LLVM 6.0 this issue didn't affect core much, if at all, and most of the Debug implementations were left enabled.

For example, here is the same debug implementation above that LLVM 6 had a problem with. This has not been cfg'd out.

@shepmaster
Member

We've recently upgraded to LLVM 8 🎉 I'm going to close any bug that is reported against an older version of LLVM. If you are still having this issue with the LLVM-8 based code, please ping me and I can reopen the issue!
