[RISC-V] rv64gcv miscompile with pass --riscv-gather-scatter-lowering #89833

Closed
patrick-rivos opened this issue Apr 23, 2024 · 4 comments · Fixed by #89874
Labels
backend:RISC-V LTO Link time optimization (regular/full LTO or ThinLTO) miscompilation

Comments

@patrick-rivos
Contributor

patrick-rivos commented Apr 23, 2024

Testcase:

short a;
long long b;
int c = 4;
signed char d[10], e[10];
int main() {
  for (long h = 0; h < 10; ++h)
    d[h] = 1;
  for (int h = 0; h < 10; h += c || a)
    e[h] = 256 / d[h];
  __builtin_printf("%d\n", e[1]);
}

Commands:

> /scratch/tc-testing/tc-apr-23/build-rv64gcv/build-llvm-linux/bin/clang -fwrapv -march=rv64gcv -flto -O2 red.c -o red.out
> /scratch/tc-testing/tc-apr-23/build-rv64gcv/bin/qemu-riscv64 red.out
1
> /scratch/tc-testing/tc-apr-23/build-rv64gcv/build-llvm-linux/bin/clang -fwrapv red.c -o red.out
> /scratch/tc-testing/tc-apr-23/build-rv64gcv/bin/qemu-riscv64 red.out
0

Discovered/tested using cebc960 (not bisected)

Found via fuzzer.

-Wl,-mllvm,-opt-bisect-limit points to pass (70) RISC-V gather/scatter lowering on function (main)

@EugeneZelenko EugeneZelenko added backend:RISC-V miscompilation LTO Link time optimization (regular/full LTO or ThinLTO) and removed new issue labels Apr 23, 2024
@llvmbot
Member

llvmbot commented Apr 23, 2024

@llvm/issue-subscribers-backend-risc-v

Author: Patrick O'Neill (patrick-rivos)

(Testcase and commands mirrored from the issue description above.)

@patrick-rivos
Contributor Author

Reduced LLVM IR:

target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "riscv64-unknown-linux-gnu"

@d = global [10 x i8] zeroinitializer
@e = global [10 x i8] zeroinitializer
@.str = constant [4 x i8] c"%d\0A\00"

; Function Attrs: nofree nounwind uwtable vscale_range(2,1024)
define noundef signext i32 @main() #0 {
  tail call void @llvm.memset.p0.i64(ptr noundef nonnull align 1 dereferenceable(10) @d, i8 1, i64 1, i1 false)
  %1 = tail call <vscale x 16 x i64> @llvm.experimental.stepvector.nxv16i64()
  %2 = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 1)
  %3 = getelementptr inbounds [10 x i8], ptr @d, i64 0, <vscale x 16 x i64> %1
  %4 = tail call <vscale x 16 x i8> @llvm.masked.gather.nxv16i8.nxv16p0(<vscale x 16 x ptr> %3, i32 0, <vscale x 16 x i1> shufflevector (<vscale x 16 x i1> insertelement (<vscale x 16 x i1> poison, i1 true, i64 0), <vscale x 16 x i1> poison, <vscale x 16 x i32> zeroinitializer), <vscale x 16 x i8> zeroinitializer)
  %5 = sext <vscale x 16 x i8> shufflevector (<vscale x 16 x i8> insertelement (<vscale x 16 x i8> poison, i8 1, i64 0), <vscale x 16 x i8> poison, <vscale x 16 x i32> zeroinitializer) to <vscale x 16 x i16>
  %6 = select <vscale x 16 x i1> shufflevector (<vscale x 16 x i1> insertelement (<vscale x 16 x i1> poison, i1 true, i64 0), <vscale x 16 x i1> poison, <vscale x 16 x i32> zeroinitializer), <vscale x 16 x i16> shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer), <vscale x 16 x i16> zeroinitializer
  %7 = sdiv <vscale x 16 x i16> shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 256, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer), %6
  %8 = trunc <vscale x 16 x i16> %7 to <vscale x 16 x i8>
  %9 = getelementptr inbounds [10 x i8], ptr @e, i64 0, <vscale x 16 x i64> %1
  tail call void @llvm.masked.scatter.nxv16i8.nxv16p0(<vscale x 16 x i8> %8, <vscale x 16 x ptr> %9, i32 1, <vscale x 16 x i1> %2)
  %10 = load i8, ptr getelementptr inbounds ([10 x i8], ptr @e, i64 0, i64 1), align 1
  %11 = sext i8 %10 to i32
  %12 = tail call signext i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str, i32 noundef signext %11) #6
  ret i32 0
}

; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #1

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <vscale x 16 x i64> @llvm.experimental.stepvector.nxv16i64() #2

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64, i64) #2

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(read)
declare <vscale x 16 x i8> @llvm.masked.gather.nxv16i8.nxv16p0(<vscale x 16 x ptr>, i32 immarg, <vscale x 16 x i1>, <vscale x 16 x i8>) #3

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(write)
declare void @llvm.masked.scatter.nxv16i8.nxv16p0(<vscale x 16 x i8>, <vscale x 16 x ptr>, i32 immarg, <vscale x 16 x i1>) #4

; Function Attrs: nofree nounwind
declare noundef signext i32 @printf(ptr nocapture noundef readonly, ...) #5

; uselistorder directives
uselistorder i8 1, { 1, 0 }
uselistorder i64 1, { 2, 0, 1 }
uselistorder i64 0, { 0, 1, 6, 7, 4, 5, 2, 3 }

attributes #0 = { nofree nounwind uwtable vscale_range(2,1024) "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic-rv64" "target-features"="+64bit,+v" }
attributes #1 = { nocallback nofree nounwind willreturn memory(argmem: write) }
attributes #2 = { nocallback nofree nosync nounwind willreturn memory(none) }
attributes #3 = { nocallback nofree nosync nounwind willreturn memory(read) }
attributes #4 = { nocallback nofree nosync nounwind willreturn memory(write) }
attributes #5 = { nofree nounwind "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic-rv64" "target-features"="+64bit,+a,+v" }
attributes #6 = { nounwind }

Commands:

> /scratch/tc-testing/tc-apr-23/build-rv64gcv/build-llvm-linux/bin/opt --riscv-gather-scatter-lowering  $1 > opt.bc

> /scratch/tc-testing/tc-apr-23/build-rv64gcv/build-llvm-linux/bin/clang -Xclang -disable-llvm-passes opt.bc -o test.o
> /scratch/tc-testing/tc-apr-23/build-rv64gcv/build-llvm-linux/bin/clang -Xclang -disable-llvm-passes $1 -o baseline.o

> QEMU_CPU=rv64,vlen=256,v=true,vext_spec=v1.0,zve32f=true,zve64f=true timeout --verbose -k 0.1 4 /scratch/tc-testing/tc-apr-23/build-rv64gcv/bin/qemu-riscv64 baseline.o
0
> QEMU_CPU=rv64,vlen=256,v=true,vext_spec=v1.0,zve32f=true,zve64f=true timeout --verbose -k 0.1 4 /scratch/tc-testing/tc-apr-23/build-rv64gcv/bin/qemu-riscv64 test.o
1

@patrick-rivos patrick-rivos changed the title [RISC-V] rv64gcv miscompile at -O2 -flto with pass gather/scatter lowering [RISC-V] rv64gcv miscompile at with pass --riscv-gather-scatter-lowering Apr 23, 2024
@patrick-rivos patrick-rivos changed the title [RISC-V] rv64gcv miscompile at with pass --riscv-gather-scatter-lowering [RISC-V] rv64gcv miscompile with pass --riscv-gather-scatter-lowering Apr 23, 2024
@wangpc-pp
Contributor

It seems we don't lower truncated masked_store correctly.

@wangpc-pp
Contributor

wangpc-pp commented Apr 24, 2024

It seems we don't lower truncated masked_store correctly.

This issue is caused by a wrong MemoryVT after combining Intrinsic::riscv_masked_strided_store into masked.store (#65674).

With the wrong MemoryVT, DAGCombiner will fold trunc + masked.store into a truncating masked.store because TLI.canCombineTruncStore returns true (here ValVT == nxv16i16 and MemVT == i8, which is Legal by default).

Combining: t41: ch = llvm.riscv.masked.strided.store<(store unknown-size into @e, align 1)> t23:1, TargetConstant:i64<9782>, t38, GlobalAddress:i64<ptr @e> 0, Constant:i64<1>, t14
Creating new node: t66: ch = masked_store<(store unknown-size into @e, align 1)> t23:1, t38, GlobalAddress:i64<ptr @e> 0, undef:i64, t14
 ... into: t66: ch = masked_store<(store unknown-size into @e, align 1)> t23:1, t38, GlobalAddress:i64<ptr @e> 0, undef:i64, t14

Combining: t66: ch = masked_store<(store unknown-size into @e, align 1)> t23:1, t38, GlobalAddress:i64<ptr @e> 0, undef:i64, t14
Creating new node: t67: ch = masked_store<(store unknown-size into @e, align 1), trunc to i8> t23:1, t37, GlobalAddress:i64<ptr @e> 0, undef:i64, t14
 ... into: t67: ch = masked_store<(store unknown-size into @e, align 1), trunc to i8> t23:1, t37, GlobalAddress:i64<ptr @e> 0, undef:i64, t14

https://github.com/llvm/llvm-project/blob/da1e3e8b9ab76e7dc6de1fa41116983cb7d0c510/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L11860-L11874

wangpc-pp added a commit to wangpc-pp/llvm-project that referenced this issue Apr 24, 2024
…v.masked.strided.store

According to `RISCVTargetLowering::getTgtMemIntrinsic`, the MemoryVT
is the scalar element VT for strided store and the MemoryVT is the
same as the store value's VT for unit-stride store.

After combining `riscv.masked.strided.store` to `masked.store`, we
just use the scalar element VT to construct `masked.store`, which is
wrong.

With wrong MemoryVT, the DAGCombiner will combine `trunc+masked.store`
to truncated `masked.store` because `TLI.canCombineTruncStore` returns
true.

So, we should use the store value's VT as the MemoryVT.

This fixes llvm#89833.
wangpc-pp added a commit that referenced this issue Apr 24, 2024
…v.masked.strided.store (#89874)

(Same commit message as above; this is the merged commit. It fixes #89833.)