[RISC-V] rv64gcv miscompile with pass --riscv-gather-scatter-lowering (#89833)
@llvm/issue-subscribers-backend-risc-v

Author: Patrick O'Neill (patrick-rivos)
Testcase:
```c
short a;
long long b;
int c = 4;
signed char d[10], e[10];
int main() {
  for (long h = 0; h < 10; ++h)
    d[h] = 1;
  for (int h = 0; h < 10; h += c || a)
    e[h] = 256 / d[h];
  __builtin_printf("%d\n", e[1]);
}
```
Commands:
```
> /scratch/tc-testing/tc-apr-23/build-rv64gcv/build-llvm-linux/bin/clang -fwrapv -march=rv64gcv -flto -O2 red.c -o red.out
> /scratch/tc-testing/tc-apr-23/build-rv64gcv/bin/qemu-riscv64 red.out
1
> /scratch/tc-testing/tc-apr-23/build-rv64gcv/build-llvm-linux/bin/clang -fwrapv red.c -o red.out
> /scratch/tc-testing/tc-apr-23/build-rv64gcv/bin/qemu-riscv64 red.out
0
```

Discovered/tested using cebc960 (not bisected). Found via fuzzer.
Reduced LLVM IR:
```llvm
target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "riscv64-unknown-linux-gnu"

@d = global [10 x i8] zeroinitializer
@e = global [10 x i8] zeroinitializer
@.str = constant [4 x i8] c"%d\0A\00"

; Function Attrs: nofree nounwind uwtable vscale_range(2,1024)
define noundef signext i32 @main() #0 {
  tail call void @llvm.memset.p0.i64(ptr noundef nonnull align 1 dereferenceable(10) @d, i8 1, i64 1, i1 false)
  %1 = tail call <vscale x 16 x i64> @llvm.experimental.stepvector.nxv16i64()
  %2 = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 1)
  %3 = getelementptr inbounds [10 x i8], ptr @d, i64 0, <vscale x 16 x i64> %1
  %4 = tail call <vscale x 16 x i8> @llvm.masked.gather.nxv16i8.nxv16p0(<vscale x 16 x ptr> %3, i32 0, <vscale x 16 x i1> shufflevector (<vscale x 16 x i1> insertelement (<vscale x 16 x i1> poison, i1 true, i64 0), <vscale x 16 x i1> poison, <vscale x 16 x i32> zeroinitializer), <vscale x 16 x i8> zeroinitializer)
  %5 = sext <vscale x 16 x i8> shufflevector (<vscale x 16 x i8> insertelement (<vscale x 16 x i8> poison, i8 1, i64 0), <vscale x 16 x i8> poison, <vscale x 16 x i32> zeroinitializer) to <vscale x 16 x i16>
  %6 = select <vscale x 16 x i1> shufflevector (<vscale x 16 x i1> insertelement (<vscale x 16 x i1> poison, i1 true, i64 0), <vscale x 16 x i1> poison, <vscale x 16 x i32> zeroinitializer), <vscale x 16 x i16> shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer), <vscale x 16 x i16> zeroinitializer
  %7 = sdiv <vscale x 16 x i16> shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 256, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer), %6
  %8 = trunc <vscale x 16 x i16> %7 to <vscale x 16 x i8>
  %9 = getelementptr inbounds [10 x i8], ptr @e, i64 0, <vscale x 16 x i64> %1
  tail call void @llvm.masked.scatter.nxv16i8.nxv16p0(<vscale x 16 x i8> %8, <vscale x 16 x ptr> %9, i32 1, <vscale x 16 x i1> %2)
  %10 = load i8, ptr getelementptr inbounds ([10 x i8], ptr @e, i64 0, i64 1), align 1
  %11 = sext i8 %10 to i32
  %12 = tail call signext i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str, i32 noundef signext %11) #6
  ret i32 0
}

; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #1

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <vscale x 16 x i64> @llvm.experimental.stepvector.nxv16i64() #2

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64, i64) #2

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(read)
declare <vscale x 16 x i8> @llvm.masked.gather.nxv16i8.nxv16p0(<vscale x 16 x ptr>, i32 immarg, <vscale x 16 x i1>, <vscale x 16 x i8>) #3

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(write)
declare void @llvm.masked.scatter.nxv16i8.nxv16p0(<vscale x 16 x i8>, <vscale x 16 x ptr>, i32 immarg, <vscale x 16 x i1>) #4

; Function Attrs: nofree nounwind
declare noundef signext i32 @printf(ptr nocapture noundef readonly, ...) #5

; uselistorder directives
uselistorder i8 1, { 1, 0 }
uselistorder i64 1, { 2, 0, 1 }
uselistorder i64 0, { 0, 1, 6, 7, 4, 5, 2, 3 }

attributes #0 = { nofree nounwind uwtable vscale_range(2,1024) "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic-rv64" "target-features"="+64bit,+v" }
attributes #1 = { nocallback nofree nounwind willreturn memory(argmem: write) }
attributes #2 = { nocallback nofree nosync nounwind willreturn memory(none) }
attributes #3 = { nocallback nofree nosync nounwind willreturn memory(read) }
attributes #4 = { nocallback nofree nosync nounwind willreturn memory(write) }
attributes #5 = { nofree nounwind "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic-rv64" "target-features"="+64bit,+a,+v" }
attributes #6 = { nounwind }
```
Bisecting the `-O2 -flto` pipeline points to the gather/scatter lowering pass (`--riscv-gather-scatter-lowering`).

It seems we don't lower truncated masked_store correctly.
This issue is caused by a wrong MemoryVT after combining. With the wrong MemoryVT, the DAGCombiner will combine:

```
Combining: t41: ch = llvm.riscv.masked.strided.store<(store unknown-size into @e, align 1)> t23:1, TargetConstant:i64<9782>, t38, GlobalAddress:i64<ptr @e> 0, Constant:i64<1>, t14
Creating new node: t66: ch = masked_store<(store unknown-size into @e, align 1)> t23:1, t38, GlobalAddress:i64<ptr @e> 0, undef:i64, t14
... into: t66: ch = masked_store<(store unknown-size into @e, align 1)> t23:1, t38, GlobalAddress:i64<ptr @e> 0, undef:i64, t14
Combining: t66: ch = masked_store<(store unknown-size into @e, align 1)> t23:1, t38, GlobalAddress:i64<ptr @e> 0, undef:i64, t14
Creating new node: t67: ch = masked_store<(store unknown-size into @e, align 1), trunc to i8> t23:1, t37, GlobalAddress:i64<ptr @e> 0, undef:i64, t14
... into: t67: ch = masked_store<(store unknown-size into @e, align 1), trunc to i8> t23:1, t37, GlobalAddress:i64<ptr @e> 0, undef:i64, t14
```
…v.masked.strided.store (#89874)

According to `RISCVTargetLowering::getTgtMemIntrinsic`, the MemoryVT is the scalar element VT for a strided store, while for a unit-stride store the MemoryVT is the same as the store value's VT. After combining `riscv.masked.strided.store` to `masked.store`, we just use the scalar element VT to construct the `masked.store`, which is wrong. With the wrong MemoryVT, the DAGCombiner will combine `trunc+masked.store` into a truncating `masked.store` because `TLI.canCombineTruncStore` returns true. So we should use the store value's VT as the MemoryVT. This fixes #89833.
`-Wl,-mllvm,-opt-bisect-limit` points to pass (70) RISC-V gather/scatter lowering on function (main).