LLVM ERROR: Cannot select — umul_lohi not implemented #174
Compiled with ; ModuleID = 'bugpoint-reduced-simplified.bc'
target datalayout = "e-p:16:8:8-i8:8:8-i16:8:8-i32:8:8-i64:8:8-f32:8:8-f64:8:8-n8"
target triple = "avr-atmel-none"
; Function Attrs: nounwind readnone
declare i16 @llvm.bswap.i16(i16) #1
; Function Attrs: nounwind readnone uwtable
define i16 @_ZN3num6bignum10u8.FullOps8full_mul20h0abbd05633bc14b3GhgE(i8, i8, i8) unnamed_addr #3 {
entry-block:
%3 = zext i8 %2 to i16
%4 = add i16 0, %3
%sret_slot.sroa.0.0.insert.insert = tail call i16 @llvm.bswap.i16(i16 %4)
ret i16 %sret_slot.sroa.0.0.insert.insert
} |
Test case added in |
Note that the error for the reduced case is:
Perhaps that's a separate bug or it explains when you said "we currently do support multiplication"? |
Popping into the debugger, I see that |
That's from the wrong iteration. I think the real culprit is |
If I constrain the issue to the original ; ModuleID = 'bugpoint-reduced-simplified.bc'
target datalayout = "e-p:16:8:8-i8:8:8-i16:8:8-i32:8:8-i64:8:8-f32:8:8-f64:8:8-n8"
target triple = "avr-atmel-none"
define fastcc void @from_str_radix(i32) unnamed_addr {
entry-block:
br i1 undef, label %return, label %next
next: ; preds = %entry-block
%1 = trunc i32 %0 to i8
br label %exit
exit: ; preds = %match_case1, %next
%result.0282 = phi i8 [ %5, %match_case1 ], [ 0, %next ]
%2 = icmp ult i32 undef, %0
br i1 %2, label %match_case0, label %return
match_case0: ; preds = %exit
%3 = tail call { i8, i1 } @llvm.smul.with.overflow.i8(i8 %result.0282, i8 %1)
%4 = extractvalue { i8, i1 } %3, 1
%brmerge = or i1 %4, undef
br i1 %brmerge, label %return, label %match_case1
match_case1: ; preds = %match_case0
%5 = extractvalue { i8, i1 } undef, 0
br label %exit
return: ; preds = %match_case0, %exit, %entry-block
ret void
}
; Function Attrs: nounwind readnone
declare { i8, i1 } @llvm.smul.with.overflow.i8(i8, i8) #0
attributes #0 = { nounwind readnone } |
Repeating the same debugging as before, the offending opcode is FWIW, the backtrace at the interesting point is
|
From memory, you can pass |
Ah, that's how you enable those debug prints. I hadn't dug into that yet. Here are the full results (577 lines). It ends with:
|
Note that the lines like commit 508ab43fd226303dd9e58576060bde576f05a2a3
Author: Jake Goulding <[email protected]>
Date: Thu Jan 14 12:15:20 2016 -0500
[HACK] Avoid truncate, zero-, sign-extending 16-to-16
diff --git a/lib/Support/APInt.cpp b/lib/Support/APInt.cpp
index 23f89bb..f0706c1 100644
--- a/lib/Support/APInt.cpp
+++ b/lib/Support/APInt.cpp
@@ -930,6 +930,10 @@ double APInt::roundToDouble(bool isSigned) const {
// Truncate to new width.
APInt APInt::trunc(unsigned width) const {
+ if (width == BitWidth) {
+ fprintf(stderr, "Truncating %d to %d\n", BitWidth, width);
+ return *this;
+ }
assert(width < BitWidth && "Invalid APInt Truncate request");
assert(width && "Can't truncate to 0 bits");
@@ -953,6 +957,11 @@ APInt APInt::trunc(unsigned width) const {
// Sign extend to a new width.
APInt APInt::sext(unsigned width) const {
+ if (width == BitWidth) {
+ fprintf(stderr, "Sign extending %d to %d\n", BitWidth, width);
+ return *this;
+ }
+
assert(width > BitWidth && "Invalid APInt SignExtend request");
if (width <= APINT_BITS_PER_WORD) {
@@ -994,6 +1003,11 @@ APInt APInt::sext(unsigned width) const {
// Zero extend to a new width.
APInt APInt::zext(unsigned width) const {
+ if (width == BitWidth) {
+ fprintf(stderr, "Zero extending %d to %d\n", BitWidth, width);
+ return *this;
+ }
+
assert(width > BitWidth && "Invalid APInt ZeroExtend request");
if (width <= APINT_BITS_PER_WORD) |
And the bswap isel trace, for good measure. |
This stuff is mind-bending, to say the least. From what I can tell, Opcode 103, BSWAP, does seem to occur in the table, but I haven't dug in further there. |
I feel like something is being lost between |
I think the bswap issue is more straightforward: bswap is only defined for 8 bits and needs a 16-bit version (and maybe all the rest?) |
Yeah, declare i16 @llvm.bswap.i16(i16) #1
define i16 @bswap(i16) unnamed_addr #3 {
entry-block:
%1 = tail call i16 @llvm.bswap.i16(i16 %0)
ret i16 %1
} |
The problem then boils down to us not handling |
Hmm. Where are intrinsic implementations supposed to reside? The docs I'm seeing suggest I should see them in a tablegen file, but nothing obvious is popping up. Presumably some intrinsic is already implemented... |
LLVM converts the intrinsic call into a DAG node. See
If we have an AVR instruction which byte swaps, we could simply match it
Otherwise we could write our own 'custom lowering' hook inside
However, the best way to do it would be to tell LLVM to
From memory, we should be able to do: setOperationAction(ISD::BSWAP, MVT::i16, Expand);
And it will be expanded into the correct "copy into tmp reg, copy into high
|
Yup, that worked. Went ahead and did it for {16,32,64} bits. Will update the tests and submit that soon. Do you think it's time to split this into two issues? If so, which would you prefer to stay here? |
Although it does expand into 20+ instructions with a lot of rotates and shifts. |
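To make the cost of the expansion more concrete, here is a minimal C++ sketch of what a 16-bit byte swap computes when it is written with shifts and an OR. The function name `bswap16_expanded` is hypothetical and only illustrates the semantics of `llvm.bswap.i16`; it is not the code LLVM's legalizer actually emits for AVR.

```cpp
#include <cstdint>

// Hypothetical helper: models the result of llvm.bswap.i16 using only
// shifts and a bitwise OR, which is roughly the shape of a generic expansion.
constexpr std::uint16_t bswap16_expanded(std::uint16_t x) {
    // Move the low byte up and the high byte down, then combine them.
    // The cast drops the bits that the left shift pushed above bit 15.
    return static_cast<std::uint16_t>((x << 8) | (x >> 8));
}

// Compile-time checks of the sketch.
static_assert(bswap16_expanded(0x1234) == 0x3412, "bytes are exchanged");
static_assert(bswap16_expanded(0x00FF) == 0xFF00, "low byte moves to high byte");
```

On a target with 8-bit registers, a 16-bit value already lives in a register pair, so in principle the swap could be a handful of register moves; the shift-based form above is only what the generic expansion works from.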
If the
|
The |
Yeah, our generated code is pretty bad; nobody has worked on it (correctness
Yeah, don't worry about filing an issue, just put a note in the commit
|
I noticed this warning while building:
Could that help explain this issue? |
No, we previously had a case for
AVR only has single-bit shift instructions, so in some part it is
|
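As a rough illustration of the remark that AVR only has single-bit shift instructions, the sketch below models an 8-bit left shift by `count` as `count` repeated one-bit shifts. The name `shl8_by_single_steps` is hypothetical, and this is a model of the idea rather than the instruction sequence the backend produces.

```cpp
#include <cstdint>

// Hypothetical helper: shifts an 8-bit value left by `count` positions using
// only one-bit shifts, mirroring a target without multi-bit shift instructions.
constexpr std::uint8_t shl8_by_single_steps(std::uint8_t value, unsigned count) {
    for (unsigned i = 0; i < count; ++i) {
        // One single-bit shift per iteration; the cast discards the carry-out.
        value = static_cast<std::uint8_t>(value << 1);
    }
    return value;
}

static_assert(shl8_by_single_steps(0x01, 3) == 0x08, "three single-bit steps");
```

Each extra bit of shift distance costs at least one more instruction (or a loop), which is one plausible reason shift-heavy expansions get long on this target.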
A smaller reproduction: ; RUN: llc < %s -march=avr | FileCheck %s
; XFAIL:
define i1 @foo(i8, i8) unnamed_addr {
; CHECK-LABEL: foo:
entry-block:
%2 = tail call { i8, i1 } @llvm.smul.with.overflow.i8(i8 %0, i8 %1)
%3 = extractvalue { i8, i1 } %2, 1
ret i1 %3
}
declare { i8, i1 } @llvm.smul.with.overflow.i8(i8, i8) |
If I'm reading the mailing list correctly, then we need to actually expand something like |
It's also interesting as AVR doesn't have a |
I think I've run out of road here. I have a high-level concept of what needs to be done, but not enough on-the-ground experience. If you have some time to handwavingly point me in the right direction, I might be able to pick up steam again. |
Hey Shep, if you've got something more specific, I'd be happy to help :) |
I'd like to pick this back up as it's the first error that occurs when building Rust's libcore. I'm going to basically ignore everything from before and start back with the smaller reproduction: ; RUN: llc < %s -march=avr | FileCheck %s
; XFAIL:
define i1 @foo(i8, i8) unnamed_addr {
; CHECK-LABEL: foo:
entry-block:
%2 = tail call { i8, i1 } @llvm.smul.with.overflow.i8(i8 %0, i8 %1)
%3 = extractvalue { i8, i1 } %2, 1
ret i1 %3
}
declare { i8, i1 } @llvm.smul.with.overflow.i8(i8, i8) This fails with the error:
As I understand the error, the LLVM intrinsic
The ISel trace logs indicate that the intrinsic is ultimately converted to a
I guess my next question would be: whereabouts in the code could I find how |
First, we need to know what sequence of AVR instructions we want to generate. Then, we want to know if there is an ISDNode for the sequence. If there is: We want to expand the original This means we need to write a custom lowering hook to lower the
|
Thanks!
case Intrinsic::smul_with_overflow: Op = ISD::SMULO; break;
I see similar concepts in the AArch64 code and some of the general ideas are described elsewhere. For unsigned, I think the 8-bit code is straightforward: MULU Ra, Rb ; Stores result in R1:R0
CPI R1, 0 ; Compare high byte to 0
BRNE 2 ; If high byte set, we had overflow
RET (R0, 0) ; No overflow
RET (R0, 1) As the AArch64 code shows, the signed is a bit more complicated. I might just try to copy the "64 bit multiply" section of AArch64 to see if that does anything useful by itself. |
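The overflow check sketched in the pseudo-assembly above can be modelled in C++ by widening to 16 bits, multiplying, and inspecting the part that does not fit in 8 bits. The struct `MulResult8` and both function names are hypothetical; the signed variant uses a plain range check on the widened product, which is one simple way to express the semantics of `llvm.smul.with.overflow.i8` and is not necessarily how AArch64 or the eventual AVR lowering does it.

```cpp
#include <cstdint>

// Hypothetical result type: the low 8 bits of the product plus an overflow flag.
struct MulResult8 {
    std::uint8_t low;
    bool overflow;
};

// Unsigned: overflow happened exactly when the high byte of the 16-bit
// product is non-zero (the "compare high byte to 0" step in the pseudocode).
constexpr MulResult8 umul_with_overflow_u8(std::uint8_t a, std::uint8_t b) {
    return MulResult8{
        static_cast<std::uint8_t>((a * b) & 0xFF),
        ((a * b) >> 8) != 0,
    };
}

// Signed: overflow happened when the widened product is outside [-128, 127].
// `low` holds the two's complement bit pattern of the truncated result.
constexpr MulResult8 smul_with_overflow_i8(std::int8_t a, std::int8_t b) {
    return MulResult8{
        static_cast<std::uint8_t>(a * b),
        (a * b) < -128 || (a * b) > 127,
    };
}

static_assert(umul_with_overflow_u8(16, 16).overflow, "256 does not fit in u8");
static_assert(!smul_with_overflow_i8(-8, 16).overflow, "-128 fits in i8");
```

The signed case is the more awkward one because a non-zero high byte is not enough to signal overflow: a negative product that fits in 8 bits has a high byte of 0xFF after sign extension.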
Nice! If you need any help writing tests (LLVM has its own test infrastructure), feel free to ping me :) |
Fixed in #190 |
Closing because #190 has been merged. |
More information is available in the original issue. I'll let @dylanmckay copy over whatever information is important.