Skip to content
This repository was archived by the owner on Sep 2, 2018. It is now read-only.

LLVM ERROR: Cannot select — umul_lohi not implemented #174

Closed
shepmaster opened this issue Jan 18, 2016 · 36 comments
Closed

LLVM ERROR: Cannot select — umul_lohi not implemented #174

shepmaster opened this issue Jan 18, 2016 · 36 comments

Comments

@shepmaster
Copy link

LLVM ERROR: Cannot select: t35: i8 = mulhs t2, t4
  t2: i8,ch = CopyFromReg t0, Register:i8 %vreg27
    t1: i8 = Register %vreg27
  t4: i8,ch = CopyFromReg t0, Register:i8 %vreg25
    t3: i8 = Register %vreg25
In function: _ZN3num14from_str_radix20h7837196669301488416E

More information is available in the original issue. I'll let @dylanmckay copy over whatever information is important.

@shepmaster
Copy link
Author

Compiled with llc -march=avr -mcpu=atmega328p -filetype=obj all-core.ll:

; ModuleID = 'bugpoint-reduced-simplified.bc'
target datalayout = "e-p:16:8:8-i8:8:8-i16:8:8-i32:8:8-i64:8:8-f32:8:8-f64:8:8-n8"
target triple = "avr-atmel-none"

; Function Attrs: nounwind readnone
declare i16 @llvm.bswap.i16(i16) #1

; Function Attrs: nounwind readnone uwtable
define i16 @_ZN3num6bignum10u8.FullOps8full_mul20h0abbd05633bc14b3GhgE(i8, i8, i8) unnamed_addr #3 {
entry-block:
  %3 = zext i8 %2 to i16
  %4 = add i16 0, %3
  %sret_slot.sroa.0.0.insert.insert = tail call i16 @llvm.bswap.i16(i16 %4)
  ret i16 %sret_slot.sroa.0.0.insert.insert
}

@dylanmckay
Copy link
Member

Test case added in test/CodeGen/AVR/issue-cannot-select-mulhs.ll at bf5a68a.

@shepmaster
Copy link
Author

Note that the error for the reduced case is:

LLVM ERROR: Cannot select: t9: i16 = bswap t7
  t7: i16 = zero_extend t6
    t6: i8,ch = CopyFromReg t0, Register:i8 %vreg2
      t5: i8 = Register %vreg2

Perhaps that's a separate bug or it explains when you said "we currently do support multiplication"?

@shepmaster
Copy link
Author

Popping into the debugger, I see that SelectionDAGISel::DoInstructionSelection is calling Select with a node with opcode 219. I believe that to be a NodeType::RET_FLAG. SelectionDAGISel::SelectCodeCommon doesn't know how to handle that, so it asserts.

@shepmaster
Copy link
Author

with opcode 219

That's from the wrong iteration. I think the real culprit is 103 (BSWAP), which seems suspiciously specific. I'm going to try re-bugpointing with an even more specific test.

@shepmaster
Copy link
Author

If I constrain the issue to the original mulhs, I reduce to this:

; ModuleID = 'bugpoint-reduced-simplified.bc'
target datalayout = "e-p:16:8:8-i8:8:8-i16:8:8-i32:8:8-i64:8:8-f32:8:8-f64:8:8-n8"
target triple = "avr-atmel-none"

define fastcc void @from_str_radix(i32) unnamed_addr {
entry-block:
  br i1 undef, label %return, label %next

next:                                             ; preds = %entry-block
  %1 = trunc i32 %0 to i8
  br label %exit

exit:                                             ; preds = %match_case1, %next
  %result.0282 = phi i8 [ %5, %match_case1 ], [ 0, %next ]
  %2 = icmp ult i32 undef, %0
  br i1 %2, label %match_case0, label %return

match_case0:                                      ; preds = %exit
  %3 = tail call { i8, i1 } @llvm.smul.with.overflow.i8(i8 %result.0282, i8 %1)
  %4 = extractvalue { i8, i1 } %3, 1
  %brmerge = or i1 %4, undef
  br i1 %brmerge, label %return, label %match_case1

match_case1:                                      ; preds = %match_case0
  %5 = extractvalue { i8, i1 } undef, 0
  br label %exit

return:                                           ; preds = %match_case0, %exit, %entry-block
  ret void
}

; Function Attrs: nounwind readnone
declare { i8, i1 } @llvm.smul.with.overflow.i8(i8, i8) #0

attributes #0 = { nounwind readnone }

@shepmaster
Copy link
Author

Repeating the same debugging as before, the offending opcode is 90, MULHS.

FWIW, the backtrace at the interesting point is

llvm::SelectionDAGISel::CannotYetSelect(this=..., N=...) at SelectionDAGISel.cpp:3417
llvm::SelectionDAGISel::SelectCodeCommon(this=..., NodeToMatch=..., MatcherTable="\x16Ny", TableSize=2485) at SelectionDAGISel.cpp:3368
llvm::AVRDAGToDAGISel::SelectCode(this=..., N=...) at AVRGenDAGISel.inc:1309
llvm::AVRDAGToDAGISel::Select(this=..., N=...) at AVRISelDAGToDAG.cpp:544
llvm::SelectionDAGISel::DoInstructionSelection(this=...) at SelectionDAGISel.cpp:940
llvm::SelectionDAGISel::CodeGenAndEmitDAG(this=...) at SelectionDAGISel.cpp:840
llvm::SelectionDAGISel::SelectBasicBlock(this=..., Begin=llvm::BasicBlock::const_iterator @ ..., End=llvm::BasicBlock::const_iterator @ ..., HadTailCall=...) at SelectionDAGISel.cpp:668
llvm::SelectionDAGISel::SelectAllBasicBlocks(this=..., Fn=...) at SelectionDAGISel.cpp:1359

@dylanmckay
Copy link
Member

From memory, you can pass -debug-only=isel to llc to trace the ISel selection process. Can you try this to figure out the sequence of operations it took to map (presumably a mul) into mulhs?.

@shepmaster
Copy link
Author

trace the ISel selection process

Ah, that's how you enable those debug prints. I hadn't dug into that yet. Here are the full results (577 lines). It ends with:

ISEL: Starting pattern match on root node: t25: i8 = mulhs t2, t4

  Initial Opcode index to 0
  Match failed at index 0
LLVM ERROR: Cannot select: t25: i8 = mulhs t2, t4
  t2: i8,ch = CopyFromReg t0, Register:i8 %vreg1
    t1: i8 = Register %vreg1
  t4: i8,ch = CopyFromReg t0, Register:i8 %vreg0
    t3: i8 = Register %vreg0
In function: from_str_radix

@shepmaster
Copy link
Author

Note that the lines like Truncating 16 to 16 are from my own hack to work around another assertion in APInt:

commit 508ab43fd226303dd9e58576060bde576f05a2a3
Author: Jake Goulding <[email protected]>
Date:   Thu Jan 14 12:15:20 2016 -0500

    [HACK] Avoid truncate, zero-, sign-extending 16-to-16

diff --git a/lib/Support/APInt.cpp b/lib/Support/APInt.cpp
index 23f89bb..f0706c1 100644
--- a/lib/Support/APInt.cpp
+++ b/lib/Support/APInt.cpp
@@ -930,6 +930,10 @@ double APInt::roundToDouble(bool isSigned) const {

 // Truncate to new width.
 APInt APInt::trunc(unsigned width) const {
+  if (width == BitWidth) {
+    fprintf(stderr, "Truncating %d to %d\n", BitWidth, width);
+    return *this;
+  }
   assert(width < BitWidth && "Invalid APInt Truncate request");
   assert(width && "Can't truncate to 0 bits");

@@ -953,6 +957,11 @@ APInt APInt::trunc(unsigned width) const {

 // Sign extend to a new width.
 APInt APInt::sext(unsigned width) const {
+  if (width == BitWidth) {
+    fprintf(stderr, "Sign extending %d to %d\n", BitWidth, width);
+    return *this;
+  }
+
   assert(width > BitWidth && "Invalid APInt SignExtend request");

   if (width <= APINT_BITS_PER_WORD) {
@@ -994,6 +1003,11 @@ APInt APInt::sext(unsigned width) const {

 //  Zero extend to a new width.
 APInt APInt::zext(unsigned width) const {
+  if (width == BitWidth) {
+    fprintf(stderr, "Zero extending %d to %d\n", BitWidth, width);
+    return *this;
+  }
+
   assert(width > BitWidth && "Invalid APInt ZeroExtend request");

   if (width <= APINT_BITS_PER_WORD)

@shepmaster
Copy link
Author

And the bswap isel trace, for good measure.

@shepmaster
Copy link
Author

This stuff is mind-bending, to say the least. From what I can tell, MatcherTable contains a bunch of cases to use for instruction selection. We start at the beginning of that table and exhaust it before finding one that matches opcode 90, MULHS. Thus the failure.

Opcode 103, BSWAP, does seem to occur in the table, but I haven't dug in further there.

@shepmaster
Copy link
Author

I feel like something is being lost between AVRInstrInfo.td and AVRGenDAGISel.inc, as they don't seem to be 1-to-1 mappings. I can't quite tell what controls that though.

@shepmaster
Copy link
Author

I think the bswap issue is more straight forward - bswap is only defined for 8-bits and needs a 16 bit version (and maybe all the rest?)

@shepmaster
Copy link
Author

Yeah, bswap can be further simplified to this:

declare i16 @llvm.bswap.i16(i16) #1

define i16 @bswap(i16) unnamed_addr #3 {
entry-block:
  %1 = tail call i16 @llvm.bswap.i16(i16 %0)
  ret i16 %1
}

@dylanmckay
Copy link
Member

bswaping a single byte is a nop, so it is likely LLVM handles this generically.

The problem then boils down to us not handling bswap at all currently.

@shepmaster
Copy link
Author

not handling bswap at all currently.

Hmm. Where are intrinsic implementations supposed to reside? The docs I'm seeing suggest I should see them in a tablegen file, but nothing obvious is popping up. Presumably some intrinsic is already implemented...

@dylanmckay
Copy link
Member

LLVM converts the intrinsic call into a dag node. see
the corresponding ISDNode for BSWAP.

If we have an AVR instruction which byte swaps, we could simply match it
with a pattern. I really doubt this is the case.

Otherwise we could write our own 'custom lowering' hook inside
AVRISelLowering.cpp.

However the best way to do it would be to tell LLVM to Expand this
instruction into other ones. Look at AVRISelLowering.cpp for details.

From memory, we should be able to do:

setOperationAction(ISD::BSWAP, MVT::i16, Expand);

And it will be expanded into the correct "copy into tmp reg, copy into high
byte, copy tmp reg back to low byte" set of instructions.

On Thu, Jan 21, 2016 at 12:03 PM, Jake Goulding [email protected]
wrote:

not handling bswap at all currently.

Hmm. Where are intrinsic implementations supposed to reside? The docs I'm
seeing http://llvm.org/docs/ExtendingLLVM.html suggest I should see
them in a tablegen file, but nothing obvious is popping up. Presumably
some intrinsic is already implemented...


Reply to this email directly or view it on GitHub
#174 (comment).

@shepmaster
Copy link
Author

From memory, we should be able to do:

Yup, that worked. Went ahead and did it for {16,32,64} bits. Will update the tests and submit that soon. Do you think it's time to split this into two issues? If so, which would you prefer to stay here?

@shepmaster
Copy link
Author

Yup, that worked

Although it does expand into 20+ instructions with a lot of rotates and shifts.

@dylanmckay
Copy link
Member

If the bswap issue is now fixed, I wouldn't worry about creating a new
issue. Once we are fully merged into master we'll need to start using
LLVM's bugtracker.

On Thu, Jan 21, 2016 at 1:22 PM, Jake Goulding [email protected]
wrote:

Yup, that worked

Although it does expand into 20+ instructions with a lot of rotates and
shifts.


Reply to this email directly or view it on GitHub
#174 (comment).

@shepmaster
Copy link
Author

If the bswap issue is now fixed

The bswap issue is fixed (with overly inefficient code), but the mulhs issue remains. You'd like me to submit that fix without being attached to a Github issue?

@dylanmckay
Copy link
Member

Yeah our generated code is pretty bad, nobody has worked on it (correctness
before speed, and all that).

Yeah don't worry about filing an issue, just put a note in the commit
message.

On Thu, Jan 21, 2016 at 1:39 PM, Jake Goulding [email protected]
wrote:

If the bswap issue is now fixed

The bswap issue is fixed (with overly inefficient code), but the mulhs
issue remains. You'd like me to submit that fix without being attached to a
Github issue?


Reply to this email directly or view it on GitHub
#174 (comment).

@shepmaster
Copy link
Author

I noticed this warning while building:

avr-rust/build-no-core-debug/x86_64-apple-darwin/llvm/lib/Target/AVR/AVRGenCallingConv.inc:54:13: warning:
      unused function 'ArgCC_AVR_BUILTIN_MUL' [-Wunused-function]
static bool ArgCC_AVR_BUILTIN_MUL(unsigned ValNo, MVT ValVT,
            ^

Could that help explain this issue?

@dylanmckay
Copy link
Member

No, we previously had a case for ArgCC_AVR_BUILTIN_MUL but we don't yet
generate calls to multiplication builtins, so the code is left there for
the future.

AVR only has single-bit shift instructions, so in some part it is
unavoidable. We definitely can optimise our generated code in general
though.

On Thu, Jan 21, 2016 at 4:34 PM, Jake Goulding [email protected]
wrote:

I noticed this warning while building:

avr-rust/build-no-core-debug/x86_64-apple-darwin/llvm/lib/Target/AVR/AVRGenCallingConv.inc:54:13: warning:
unused function 'ArgCC_AVR_BUILTIN_MUL' [-Wunused-function]
static bool ArgCC_AVR_BUILTIN_MUL(unsigned ValNo, MVT ValVT,
^

Could that help explain this issue?


Reply to this email directly or view it on GitHub
#174 (comment).

@shepmaster
Copy link
Author

A smaller reproduction:

; RUN: llc < %s -march=avr | FileCheck %s
; XFAIL:

define i1 @foo(i8, i8) unnamed_addr {
; CHECK-LABEL: foo:
entry-block:
  %2 = tail call { i8, i1 } @llvm.smul.with.overflow.i8(i8 %0, i8 %1)
  %3 = extractvalue { i8, i1 } %2, 1
  ret i1 %3
}

declare { i8, i1 } @llvm.smul.with.overflow.i8(i8, i8)

@shepmaster
Copy link
Author

If I'm reading the mailing list correctly, then we need to actually expand something like $dest = mul($src1, $src2) into mul($src1, $src2); copy R0 -> $dest:0; copy R1 -> $dest:1 (apologies for silly pseudocode)?

@shepmaster
Copy link
Author

It's also interesting as AVR doesn't have a u8 * u8 -> u8. Instead, we'd have to ignore the upper half of the result (or really use that combined with the carry flag).

@shepmaster
Copy link
Author

I think I've run out of road here. I have a high-level concept of what needs to be done, but not enough on-the-ground experience. If you have some time to handwavingly point me in the right direction, I might get be able to pick up steam again.

@dylanmckay
Copy link
Member

Hey Shep, if you've got something more specific, I'd be happy to help :)

@shepmaster
Copy link
Author

I'd like to pick this back up as it's the first error that occurs when building Rust's libcore. I'm going to basically ignore everything from before and start back with the smaller reproduction:

; RUN: llc < %s -march=avr | FileCheck %s
; XFAIL:

define i1 @foo(i8, i8) unnamed_addr {
; CHECK-LABEL: foo:
entry-block:
  %2 = tail call { i8, i1 } @llvm.smul.with.overflow.i8(i8 %0, i8 %1)
  %3 = extractvalue { i8, i1 } %2, 1
  ret i1 %3
}

declare { i8, i1 } @llvm.smul.with.overflow.i8(i8, i8)

This fails with the error:

LLVM ERROR: Cannot select: t12: i8 = mulhs t2, t4
  t2: i8,ch = CopyFromReg t0, Register:i8 %vreg0
    t1: i8 = Register %vreg0
  t4: i8,ch = CopyFromReg t0, Register:i8 %vreg1
    t3: i8 = Register %vreg1

As I understand the error, the LLVM intrinsic smul.with.overflow.* is not currently supported by AVR-LLVM.

The ISel trace logs indicate that the intrinsic is ultimately converted to a mulhs, which is where ISel fails.

I guess my next question would be: where about in the code could I find how mulhs should correspond to the next level down?

@dylanmckay
Copy link
Member

First, we need to know what sequence of AVR instructions we want to generate.

Then, we want to know if there is an ISDNode for the sequence.

If there is:

We want to expand the original smul_with_overflow.* intrinsic into this ISDNode. We can then add a new pattern definition to AVRInstrInfo.td. A quick look at the manual makes me think that there is no instruction for overflowing multiplication (but don't take my word for it).

This means we need to write a custom lowering hook to lower the mulhs to some set of AVR instructions. This is done in two steps.

  • Mark MULHS as TargetLowering::Custom in AVRISelLowering.cpp
setOperationAction(ISD::MULHS, MVT::i8, Custom);
setOperationAction(ISD::MULHS, MVT::i16, Custom);
  • Add a new case to AVRTargetLowering::LowerOperation

@shepmaster
Copy link
Author

Thanks!

if there is an ISDNode for the sequence.

I believe there is:

case Intrinsic::smul_with_overflow: Op = ISD::SMULO; break;

a custom lowering hook

I see similar concepts in the AArch64 code and some of the general ideas are described elsewhere.

For unsigned, I think the unsigned 8-bit code is straight-forward:

MULU Ra, Rb ; Stores result in R1:R0
CPI R1, 0   ; Compare high byte to 0
BRNE 2      ; If high byte set, we had overflow
RET (R0, 0) ; No overflow
RET (R0, 1)

As the AArch64 code shows, the signed is a bit more complicated. I might just try to copy the "64 bit multiply" section of AArch64 to see if that does anything useful by itself.

@dylanmckay
Copy link
Member

Nice!

If you need any help writing tests (LLVM has its own test infrastructure), feel free to ping me :)

@shepmaster
Copy link
Author

Fixed in #190

@dylanmckay
Copy link
Member

Closing because #190 has been merged.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

2 participants