[RISC-V] rv64gcv miscompile with pass --riscv-gather-scatter-lowering #89833

Closed
patrick-rivos opened this issue Apr 23, 2024 · 4 comments · Fixed by #89874
Labels
backend:RISC-V LTO Link time optimization (regular/full LTO or ThinLTO) miscompilation

Comments

@patrick-rivos
Contributor

patrick-rivos commented Apr 23, 2024

Testcase:

short a;
long long b;
int c = 4;
signed char d[10], e[10];
int main() {
  for (long h = 0; h < 10; ++h)
    d[h] = 1;
  for (int h = 0; h < 10; h += c || a)
    e[h] = 256 / d[h];
  __builtin_printf("%d\n", e[1]);
}

Commands:

> /scratch/tc-testing/tc-apr-23/build-rv64gcv/build-llvm-linux/bin/clang -fwrapv -march=rv64gcv -flto -O2 red.c -o red.out
> /scratch/tc-testing/tc-apr-23/build-rv64gcv/bin/qemu-riscv64 red.out
1
> /scratch/tc-testing/tc-apr-23/build-rv64gcv/build-llvm-linux/bin/clang -fwrapv red.c -o red.out
> /scratch/tc-testing/tc-apr-23/build-rv64gcv/bin/qemu-riscv64 red.out
0

Discovered/tested using cebc960 (not bisected)

Found via fuzzer.

-Wl,-mllvm,-opt-bisect-limit points to pass (70) RISC-V gather/scatter lowering on function (main)

@EugeneZelenko EugeneZelenko added backend:RISC-V miscompilation LTO Link time optimization (regular/full LTO or ThinLTO) and removed new issue labels Apr 23, 2024
@llvmbot
Member

llvmbot commented Apr 23, 2024

@llvm/issue-subscribers-backend-risc-v

Author: Patrick O'Neill (patrick-rivos)

(Testcase and commands mirrored from the issue description above.)

@patrick-rivos
Contributor Author

Reduced LLVM IR:

target datalayout = "e-m:e-p:64:64-i64:64-i128:128-n32:64-S128"
target triple = "riscv64-unknown-linux-gnu"

@d = global [10 x i8] zeroinitializer
@e = global [10 x i8] zeroinitializer
@.str = constant [4 x i8] c"%d\0A\00"

; Function Attrs: nofree nounwind uwtable vscale_range(2,1024)
define noundef signext i32 @main() #0 {
  tail call void @llvm.memset.p0.i64(ptr noundef nonnull align 1 dereferenceable(10) @d, i8 1, i64 1, i1 false)
  %1 = tail call <vscale x 16 x i64> @llvm.experimental.stepvector.nxv16i64()
  %2 = tail call <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64 0, i64 1)
  %3 = getelementptr inbounds [10 x i8], ptr @d, i64 0, <vscale x 16 x i64> %1
  %4 = tail call <vscale x 16 x i8> @llvm.masked.gather.nxv16i8.nxv16p0(<vscale x 16 x ptr> %3, i32 0, <vscale x 16 x i1> shufflevector (<vscale x 16 x i1> insertelement (<vscale x 16 x i1> poison, i1 true, i64 0), <vscale x 16 x i1> poison, <vscale x 16 x i32> zeroinitializer), <vscale x 16 x i8> zeroinitializer)
  %5 = sext <vscale x 16 x i8> shufflevector (<vscale x 16 x i8> insertelement (<vscale x 16 x i8> poison, i8 1, i64 0), <vscale x 16 x i8> poison, <vscale x 16 x i32> zeroinitializer) to <vscale x 16 x i16>
  %6 = select <vscale x 16 x i1> shufflevector (<vscale x 16 x i1> insertelement (<vscale x 16 x i1> poison, i1 true, i64 0), <vscale x 16 x i1> poison, <vscale x 16 x i32> zeroinitializer), <vscale x 16 x i16> shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 1, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer), <vscale x 16 x i16> zeroinitializer
  %7 = sdiv <vscale x 16 x i16> shufflevector (<vscale x 16 x i16> insertelement (<vscale x 16 x i16> poison, i16 256, i64 0), <vscale x 16 x i16> poison, <vscale x 16 x i32> zeroinitializer), %6
  %8 = trunc <vscale x 16 x i16> %7 to <vscale x 16 x i8>
  %9 = getelementptr inbounds [10 x i8], ptr @e, i64 0, <vscale x 16 x i64> %1
  tail call void @llvm.masked.scatter.nxv16i8.nxv16p0(<vscale x 16 x i8> %8, <vscale x 16 x ptr> %9, i32 1, <vscale x 16 x i1> %2)
  %10 = load i8, ptr getelementptr inbounds ([10 x i8], ptr @e, i64 0, i64 1), align 1
  %11 = sext i8 %10 to i32
  %12 = tail call signext i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str, i32 noundef signext %11) #6
  ret i32 0
}

; Function Attrs: nocallback nofree nounwind willreturn memory(argmem: write)
declare void @llvm.memset.p0.i64(ptr nocapture writeonly, i8, i64, i1 immarg) #1

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <vscale x 16 x i64> @llvm.experimental.stepvector.nxv16i64() #2

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(none)
declare <vscale x 16 x i1> @llvm.get.active.lane.mask.nxv16i1.i64(i64, i64) #2

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(read)
declare <vscale x 16 x i8> @llvm.masked.gather.nxv16i8.nxv16p0(<vscale x 16 x ptr>, i32 immarg, <vscale x 16 x i1>, <vscale x 16 x i8>) #3

; Function Attrs: nocallback nofree nosync nounwind willreturn memory(write)
declare void @llvm.masked.scatter.nxv16i8.nxv16p0(<vscale x 16 x i8>, <vscale x 16 x ptr>, i32 immarg, <vscale x 16 x i1>) #4

; Function Attrs: nofree nounwind
declare noundef signext i32 @printf(ptr nocapture noundef readonly, ...) #5

; uselistorder directives
uselistorder i8 1, { 1, 0 }
uselistorder i64 1, { 2, 0, 1 }
uselistorder i64 0, { 0, 1, 6, 7, 4, 5, 2, 3 }

attributes #0 = { nofree nounwind uwtable vscale_range(2,1024) "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic-rv64" "target-features"="+64bit,+v" }
attributes #1 = { nocallback nofree nounwind willreturn memory(argmem: write) }
attributes #2 = { nocallback nofree nosync nounwind willreturn memory(none) }
attributes #3 = { nocallback nofree nosync nounwind willreturn memory(read) }
attributes #4 = { nocallback nofree nosync nounwind willreturn memory(write) }
attributes #5 = { nofree nounwind "no-trapping-math"="true" "stack-protector-buffer-size"="8" "target-cpu"="generic-rv64" "target-features"="+64bit,+a,+v" }
attributes #6 = { nounwind }

Commands:

> /scratch/tc-testing/tc-apr-23/build-rv64gcv/build-llvm-linux/bin/opt --riscv-gather-scatter-lowering  $1 > opt.bc

> /scratch/tc-testing/tc-apr-23/build-rv64gcv/build-llvm-linux/bin/clang -Xclang -disable-llvm-passes opt.bc -o test.o
> /scratch/tc-testing/tc-apr-23/build-rv64gcv/build-llvm-linux/bin/clang -Xclang -disable-llvm-passes $1 -o baseline.o

> QEMU_CPU=rv64,vlen=256,v=true,vext_spec=v1.0,zve32f=true,zve64f=true timeout --verbose -k 0.1 4 /scratch/tc-testing/tc-apr-23/build-rv64gcv/bin/qemu-riscv64 baseline.o
0
> QEMU_CPU=rv64,vlen=256,v=true,vext_spec=v1.0,zve32f=true,zve64f=true timeout --verbose -k 0.1 4 /scratch/tc-testing/tc-apr-23/build-rv64gcv/bin/qemu-riscv64 test.o
1

@patrick-rivos patrick-rivos changed the title [RISC-V] rv64gcv miscompile at -O2 -flto with pass gather/scatter lowering [RISC-V] rv64gcv miscompile at with pass --riscv-gather-scatter-lowering Apr 23, 2024
@patrick-rivos patrick-rivos changed the title [RISC-V] rv64gcv miscompile at with pass --riscv-gather-scatter-lowering [RISC-V] rv64gcv miscompile with pass --riscv-gather-scatter-lowering Apr 23, 2024
@wangpc-pp
Contributor

It seems we don't lower truncated masked_store correctly.

@wangpc-pp
Contributor

wangpc-pp commented Apr 24, 2024

It seems we don't lower truncated masked_store correctly.

This issue is caused by a wrong MemoryVT after combining Intrinsic::riscv_masked_strided_store into masked.store (#65674).

With the wrong MemoryVT, DAGCombiner will fold trunc + masked.store into a truncating masked.store because TLI.canCombineTruncStore returns true (here ValVT == nxv16i16 and MemVT == i8, which is Legal by default).

Combining: t41: ch = llvm.riscv.masked.strided.store<(store unknown-size into @e, align 1)> t23:1, TargetConstant:i64<9782>, t38, GlobalAddress:i64<ptr @e> 0, Constant:i64<1>, t14
Creating new node: t66: ch = masked_store<(store unknown-size into @e, align 1)> t23:1, t38, GlobalAddress:i64<ptr @e> 0, undef:i64, t14
 ... into: t66: ch = masked_store<(store unknown-size into @e, align 1)> t23:1, t38, GlobalAddress:i64<ptr @e> 0, undef:i64, t14

Combining: t66: ch = masked_store<(store unknown-size into @e, align 1)> t23:1, t38, GlobalAddress:i64<ptr @e> 0, undef:i64, t14
Creating new node: t67: ch = masked_store<(store unknown-size into @e, align 1), trunc to i8> t23:1, t37, GlobalAddress:i64<ptr @e> 0, undef:i64, t14
 ... into: t67: ch = masked_store<(store unknown-size into @e, align 1), trunc to i8> t23:1, t37, GlobalAddress:i64<ptr @e> 0, undef:i64, t14

https://github.com/llvm/llvm-project/blob/da1e3e8b9ab76e7dc6de1fa41116983cb7d0c510/llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp#L11860-L11874

wangpc-pp added a commit to wangpc-pp/llvm-project that referenced this issue Apr 24, 2024
…v.masked.strided.store

According to `RISCVTargetLowering::getTgtMemIntrinsic`, the MemoryVT
is the scalar element VT for strided store and the MemoryVT is the
same as the store value's VT for unit-stride store.

After combining `riscv.masked.strided.store` to `masked.store`, we
just use the scalar element VT to construct `masked.store`, which is
wrong.

With wrong MemoryVT, the DAGCombiner will combine `trunc+masked.store`
to truncated `masked.store` because `TLI.canCombineTruncStore` returns
true.

So, we should use the store value's VT as the MemoryVT.

This fixes llvm#89833.
wangpc-pp added a commit that referenced this issue Apr 24, 2024
…v.masked.strided.store (#89874)

(Same commit message as above; this is the merged commit. It fixes #89833.)