Skip to content

[VPlan] Fix mutating whilst iterating over users in EVL transform #122885

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 14, 2025

Conversation

lukel97
Copy link
Contributor

@lukel97 lukel97 commented Jan 14, 2025

This fixes a miscompilation extracted from 525.x264_r, where we were failing to update the runtime VF of a VPReverseVectorPointerRecipe.

We were removing a use of VF whilst iterating over the users() iterator, which messed up the iterator in-flight and caused us to miss some recipes. This fixes it by copying the users into a SmallVector first. (make_early_inc_range doesn't work here either)

Fixes #122681
Fixes #122682

…sform

This fixes a miscompilation extracted from 525.x264_r, where we were failing to update the runtime VF of a VPReverseVectorPointerRecipe.

We were replacing a user of VF whilst iterating over the users() iterator, which meant that we ended up missing some recipes. This fixes it by storing it into a SmallVector first before iterating. (make_early_inc_range doesn't work here either)
@llvmbot
Copy link
Member

llvmbot commented Jan 14, 2025

@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Luke Lau (lukel97)

Changes

This fixes a miscompilation extracted from 525.x264_r, where we were failing to update the runtime VF of a VPReverseVectorPointerRecipe.

We were replacing a user of VF whilst iterating over the users() iterator, which meant that we ended up missing some recipes. This fixes it by storing it into a SmallVector first before iterating. (make_early_inc_range doesn't work here either)


Full diff: https://github.com/llvm/llvm-project/pull/122885.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+1-1)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reverse-load-store.ll (+118)
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 545d277d7aa018..33f6dc94b9e7a3 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -1603,7 +1603,7 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
   LLVMContext &Ctx = CanonicalIVType->getContext();
   VPValue *AllOneMask = Plan.getOrAddLiveIn(ConstantInt::getTrue(Ctx));
 
-  for (VPUser *U : Plan.getVF().users()) {
+  for (VPUser *U : SmallVector<VPUser *>(Plan.getVF().users())) {
     if (auto *R = dyn_cast<VPReverseVectorPointerRecipe>(U))
       R->setOperand(1, &EVL);
   }
diff --git a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reverse-load-store.ll b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reverse-load-store.ll
index f323231445aadc..c50e2adac8d7d6 100644
--- a/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reverse-load-store.ll
+++ b/llvm/test/Transforms/LoopVectorize/RISCV/vectorize-force-tail-with-evl-reverse-load-store.ll
@@ -249,3 +249,121 @@ loopend:
   ret void
 }
 
+define void @multiple_reverse_vector_pointer(ptr noalias %a, ptr noalias %b, ptr noalias %c, ptr noalias %d) {
+; IF-EVL-LABEL: @multiple_reverse_vector_pointer(
+; IF-EVL-NEXT:  entry:
+; IF-EVL-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; IF-EVL:       vector.ph:
+; IF-EVL-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
+; IF-EVL-NEXT:    [[TMP1:%.*]] = mul i64 [[TMP0]], 16
+; IF-EVL-NEXT:    [[TMP2:%.*]] = sub i64 [[TMP1]], 1
+; IF-EVL-NEXT:    [[N_RND_UP:%.*]] = add i64 1025, [[TMP2]]
+; IF-EVL-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[N_RND_UP]], [[TMP1]]
+; IF-EVL-NEXT:    [[N_VEC:%.*]] = sub i64 [[N_RND_UP]], [[N_MOD_VF]]
+; IF-EVL-NEXT:    [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
+; IF-EVL-NEXT:    [[TMP4:%.*]] = mul i64 [[TMP3]], 16
+; IF-EVL-NEXT:    [[TMP5:%.*]] = sub i64 1024, [[N_VEC]]
+; IF-EVL-NEXT:    br label [[VECTOR_BODY:%.*]]
+; IF-EVL:       vector.body:
+; IF-EVL-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], [[VECTOR_BODY]] ]
+; IF-EVL-NEXT:    [[EVL_BASED_IV:%.*]] = phi i64 [ 0, [[VECTOR_PH]] ], [ [[INDEX_EVL_NEXT:%.*]], [[VECTOR_BODY]] ]
+; IF-EVL-NEXT:    [[AVL:%.*]] = sub i64 1025, [[EVL_BASED_IV]]
+; IF-EVL-NEXT:    [[TMP6:%.*]] = call i32 @llvm.experimental.get.vector.length.i64(i64 [[AVL]], i32 16, i1 true)
+; IF-EVL-NEXT:    [[OFFSET_IDX:%.*]] = sub i64 1024, [[EVL_BASED_IV]]
+; IF-EVL-NEXT:    [[TMP7:%.*]] = add i64 [[OFFSET_IDX]], 0
+; IF-EVL-NEXT:    [[TMP8:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[TMP7]]
+; IF-EVL-NEXT:    [[TMP9:%.*]] = zext i32 [[TMP6]] to i64
+; IF-EVL-NEXT:    [[TMP10:%.*]] = mul i64 0, [[TMP9]]
+; IF-EVL-NEXT:    [[TMP11:%.*]] = sub i64 1, [[TMP9]]
+; IF-EVL-NEXT:    [[TMP12:%.*]] = getelementptr i8, ptr [[TMP8]], i64 [[TMP10]]
+; IF-EVL-NEXT:    [[TMP13:%.*]] = getelementptr i8, ptr [[TMP12]], i64 [[TMP11]]
+; IF-EVL-NEXT:    [[VP_OP_LOAD:%.*]] = call <vscale x 16 x i8> @llvm.vp.load.nxv16i8.p0(ptr align 1 [[TMP13]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    [[VP_REVERSE:%.*]] = call <vscale x 16 x i8> @llvm.experimental.vp.reverse.nxv16i8(<vscale x 16 x i8> [[VP_OP_LOAD]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    [[TMP14:%.*]] = getelementptr i8, ptr [[B:%.*]], <vscale x 16 x i8> [[VP_REVERSE]]
+; IF-EVL-NEXT:    [[WIDE_MASKED_GATHER:%.*]] = call <vscale x 16 x i8> @llvm.vp.gather.nxv16i8.nxv16p0(<vscale x 16 x ptr> align 1 [[TMP14]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    [[TMP15:%.*]] = getelementptr i8, ptr [[C:%.*]], i64 [[TMP7]]
+; IF-EVL-NEXT:    [[TMP16:%.*]] = zext i32 [[TMP6]] to i64
+; IF-EVL-NEXT:    [[TMP17:%.*]] = mul i64 0, [[TMP16]]
+; IF-EVL-NEXT:    [[TMP18:%.*]] = sub i64 1, [[TMP16]]
+; IF-EVL-NEXT:    [[TMP19:%.*]] = getelementptr i8, ptr [[TMP15]], i64 [[TMP17]]
+; IF-EVL-NEXT:    [[TMP20:%.*]] = getelementptr i8, ptr [[TMP19]], i64 [[TMP18]]
+; IF-EVL-NEXT:    [[VP_REVERSE1:%.*]] = call <vscale x 16 x i8> @llvm.experimental.vp.reverse.nxv16i8(<vscale x 16 x i8> [[WIDE_MASKED_GATHER]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    call void @llvm.vp.store.nxv16i8.p0(<vscale x 16 x i8> [[VP_REVERSE1]], ptr align 1 [[TMP20]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    [[TMP21:%.*]] = getelementptr i8, ptr [[D:%.*]], i64 [[TMP7]]
+; IF-EVL-NEXT:    [[TMP22:%.*]] = zext i32 [[TMP6]] to i64
+; IF-EVL-NEXT:    [[TMP23:%.*]] = mul i64 0, [[TMP22]]
+; IF-EVL-NEXT:    [[TMP24:%.*]] = sub i64 1, [[TMP22]]
+; IF-EVL-NEXT:    [[TMP25:%.*]] = getelementptr i8, ptr [[TMP21]], i64 [[TMP23]]
+; IF-EVL-NEXT:    [[TMP26:%.*]] = getelementptr i8, ptr [[TMP25]], i64 [[TMP24]]
+; IF-EVL-NEXT:    [[VP_REVERSE2:%.*]] = call <vscale x 16 x i8> @llvm.experimental.vp.reverse.nxv16i8(<vscale x 16 x i8> [[WIDE_MASKED_GATHER]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    call void @llvm.vp.store.nxv16i8.p0(<vscale x 16 x i8> [[VP_REVERSE2]], ptr align 1 [[TMP26]], <vscale x 16 x i1> splat (i1 true), i32 [[TMP6]])
+; IF-EVL-NEXT:    [[TMP27:%.*]] = zext i32 [[TMP6]] to i64
+; IF-EVL-NEXT:    [[INDEX_EVL_NEXT]] = add nuw i64 [[TMP27]], [[EVL_BASED_IV]]
+; IF-EVL-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP4]]
+; IF-EVL-NEXT:    [[TMP28:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; IF-EVL-NEXT:    br i1 [[TMP28]], label [[MIDDLE_BLOCK:%.*]], label [[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
+; IF-EVL:       middle.block:
+; IF-EVL-NEXT:    br i1 true, label [[EXIT:%.*]], label [[SCALAR_PH]]
+; IF-EVL:       scalar.ph:
+; IF-EVL-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[TMP5]], [[MIDDLE_BLOCK]] ], [ 1024, [[ENTRY:%.*]] ]
+; IF-EVL-NEXT:    br label [[LOOP:%.*]]
+; IF-EVL:       loop:
+; IF-EVL-NEXT:    [[IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], [[SCALAR_PH]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
+; IF-EVL-NEXT:    [[GEP_A:%.*]] = getelementptr i8, ptr [[A]], i64 [[IV]]
+; IF-EVL-NEXT:    [[X:%.*]] = load i8, ptr [[GEP_A]], align 1
+; IF-EVL-NEXT:    [[GEP_B:%.*]] = getelementptr i8, ptr [[B]], i8 [[X]]
+; IF-EVL-NEXT:    [[Y:%.*]] = load i8, ptr [[GEP_B]], align 1
+; IF-EVL-NEXT:    [[GEP_C:%.*]] = getelementptr i8, ptr [[C]], i64 [[IV]]
+; IF-EVL-NEXT:    store i8 [[Y]], ptr [[GEP_C]], align 1
+; IF-EVL-NEXT:    [[GEP_D:%.*]] = getelementptr i8, ptr [[D]], i64 [[IV]]
+; IF-EVL-NEXT:    store i8 [[Y]], ptr [[GEP_D]], align 1
+; IF-EVL-NEXT:    [[IV_NEXT]] = add i64 [[IV]], -1
+; IF-EVL-NEXT:    [[CMP_NOT:%.*]] = icmp eq i64 [[IV]], 0
+; IF-EVL-NEXT:    br i1 [[CMP_NOT]], label [[EXIT]], label [[LOOP]], !llvm.loop [[LOOP7:![0-9]+]]
+; IF-EVL:       exit:
+; IF-EVL-NEXT:    ret void
+;
+; NO-VP-LABEL: @multiple_reverse_vector_pointer(
+; NO-VP-NEXT:  entry:
+; NO-VP-NEXT:    br label [[LOOP:%.*]]
+; NO-VP:       loop:
+; NO-VP-NEXT:    [[IV:%.*]] = phi i64 [ 1024, [[ENTRY:%.*]] ], [ [[IV_NEXT:%.*]], [[LOOP]] ]
+; NO-VP-NEXT:    [[GEP_A:%.*]] = getelementptr i8, ptr [[A:%.*]], i64 [[IV]]
+; NO-VP-NEXT:    [[X:%.*]] = load i8, ptr [[GEP_A]], align 1
+; NO-VP-NEXT:    [[GEP_B:%.*]] = getelementptr i8, ptr [[B:%.*]], i8 [[X]]
+; NO-VP-NEXT:    [[Y:%.*]] = load i8, ptr [[GEP_B]], align 1
+; NO-VP-NEXT:    [[GEP_C:%.*]] = getelementptr i8, ptr [[C:%.*]], i64 [[IV]]
+; NO-VP-NEXT:    store i8 [[Y]], ptr [[GEP_C]], align 1
+; NO-VP-NEXT:    [[GEP_D:%.*]] = getelementptr i8, ptr [[D:%.*]], i64 [[IV]]
+; NO-VP-NEXT:    store i8 [[Y]], ptr [[GEP_D]], align 1
+; NO-VP-NEXT:    [[IV_NEXT]] = add i64 [[IV]], -1
+; NO-VP-NEXT:    [[CMP_NOT:%.*]] = icmp eq i64 [[IV]], 0
+; NO-VP-NEXT:    br i1 [[CMP_NOT]], label [[EXIT:%.*]], label [[LOOP]]
+; NO-VP:       exit:
+; NO-VP-NEXT:    ret void
+;
+entry:
+  br label %loop
+
+loop:
+  %iv = phi i64 [ 1024, %entry ], [ %iv.next, %loop ]
+
+  %gep.a = getelementptr i8, ptr %a, i64 %iv
+  %x = load i8, ptr %gep.a
+
+  %gep.b = getelementptr i8, ptr %b, i8 %x
+  %y = load i8, ptr %gep.b
+
+  %gep.c = getelementptr i8, ptr %c, i64 %iv
+  store i8 %y, ptr %gep.c
+
+  %gep.d = getelementptr i8, ptr %d, i64 %iv
+  store i8 %y, ptr %gep.d
+
+  %iv.next = add i64 %iv, -1
+  %cmp.not = icmp eq i64 %iv, 0
+  br i1 %cmp.not, label %exit, label %loop
+
+exit:
+  ret void
+}

@lukel97
Copy link
Contributor Author

lukel97 commented Jan 14, 2025

I think this should be enough to fix #122681, confirming now.

Edit, looks like this fixes it and #122682

@lukel97 lukel97 changed the title [VPlan] Fix mutating whilst iterating over users iterator in EVL transform [VPlan] Fix mutating whilst iterating over users in EVL transform Jan 14, 2025
Copy link
Contributor

@fhahn fhahn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@@ -1603,7 +1603,7 @@ static void transformRecipestoEVLRecipes(VPlan &Plan, VPValue &EVL) {
LLVMContext &Ctx = CanonicalIVType->getContext();
VPValue *AllOneMask = Plan.getOrAddLiveIn(ConstantInt::getTrue(Ctx));

for (VPUser *U : Plan.getVF().users()) {
for (VPUser *U : SmallVector<VPUser *>(Plan.getVF().users())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for (VPUser *U : SmallVector<VPUser *>(Plan.getVF().users())) {
for (VPUser *U : to_vector(Plan.getVF().users())) {

Should be possible to use to_vector I think

@@ -249,3 +249,121 @@ loopend:
ret void
}

define void @multiple_reverse_vector_pointer(ptr noalias %a, ptr noalias %b, ptr noalias %c, ptr noalias %d) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might be good to add a link to the GH issue as well here

@lukel97 lukel97 merged commit f925e54 into llvm:main Jan 14, 2025
5 of 7 checks passed
@Mel-Chen
Copy link
Contributor

Thanks!:)
I will try this patch tomorrow to see if it fixes the remaining fails in llvm test-suite.

@llvm-ci
Copy link
Collaborator

llvm-ci commented Jan 14, 2025

LLVM Buildbot has detected a new failure on builder llvm-clang-x86_64-expensive-checks-debian running on gribozavr4 while building llvm at step 6 "test-build-unified-tree-check-all".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/16/builds/11970

Here is the relevant piece of the build log for the reference
Step 6 (test-build-unified-tree-check-all) failure: test (failure)
******************** TEST 'LLVM :: tools/llvm-gsymutil/ARM_AArch64/macho-merged-funcs-dwarf.yaml' FAILED ********************
Exit Code: 1

Command Output (stdout):
--
Input file: /b/1/llvm-clang-x86_64-expensive-checks-debian/build/test/tools/llvm-gsymutil/ARM_AArch64/Output/macho-merged-funcs-dwarf.yaml.tmp.dSYM
Output file (aarch64): /b/1/llvm-clang-x86_64-expensive-checks-debian/build/test/tools/llvm-gsymutil/ARM_AArch64/Output/macho-merged-funcs-dwarf.yaml.tmp.default.gSYM
Loaded 3 functions from DWARF.
Loaded 3 functions from symbol table.
warning: same address range contains different debug info. Removing:
[0x0000000000000248 - 0x0000000000000270): Name=0x00000030
addr=0x0000000000000248, file=  2, line=  5
addr=0x0000000000000254, file=  2, line=  7
addr=0x0000000000000258, file=  2, line=  9
addr=0x000000000000025c, file=  2, line=  8
addr=0x0000000000000260, file=  2, line= 11
addr=0x0000000000000264, file=  2, line= 10
addr=0x0000000000000268, file=  2, line=  6


In favor of this one:
[0x0000000000000248 - 0x0000000000000270): Name=0x00000047
addr=0x0000000000000248, file=  3, line=  5
addr=0x0000000000000254, file=  3, line=  7
addr=0x0000000000000258, file=  3, line=  9
addr=0x000000000000025c, file=  3, line=  8
addr=0x0000000000000260, file=  3, line= 11
addr=0x0000000000000264, file=  3, line= 10
addr=0x0000000000000268, file=  3, line=  6


warning: same address range contains different debug info. Removing:
[0x0000000000000248 - 0x0000000000000270): Name=0x00000047
addr=0x0000000000000248, file=  3, line=  5
addr=0x0000000000000254, file=  3, line=  7
addr=0x0000000000000258, file=  3, line=  9
addr=0x000000000000025c, file=  3, line=  8
addr=0x0000000000000260, file=  3, line= 11
addr=0x0000000000000264, file=  3, line= 10
addr=0x0000000000000268, file=  3, line=  6


In favor of this one:
[0x0000000000000248 - 0x0000000000000270): Name=0x00000001
addr=0x0000000000000248, file=  1, line=  5
addr=0x0000000000000254, file=  1, line=  7
addr=0x0000000000000258, file=  1, line=  9
addr=0x000000000000025c, file=  1, line=  8
addr=0x0000000000000260, file=  1, line= 11
addr=0x0000000000000264, file=  1, line= 10
...

@LiqinWeng
Copy link
Contributor

LiqinWeng commented Jan 15, 2025

I'm more curious about how you extracted miscompilation extracted from 525.x264_r,. Can you teach me?:)(
I ran the spec2017 of llvm-test-suit but there was no prompt information)@luke197

@LiqinWeng
Copy link
Contributor

LGTM

@lukel97
Copy link
Contributor Author

lukel97 commented Jan 15, 2025

I'm more curious about how you extracted miscompilation extracted from 525.x264_r

I first started with 502.gcc_r because it exits much more quickly when there's a miscompile. Then I used llvm/utils/rsp_bisect.py comparing a non-EVL build and an EVL build:

#!/bin/bash

./build/bin/clang --target=riscv64-linux-gnu -march=rva22u64_v -O3 -fuse-ld=lld  -o 502.gcc_r @502.gcc_r.rsp

# hacky: a good build will timeout, a miscompile will exit within 15 seconds
timeout 15 qemu-riscv64 -cpu rv64,v=on,vext_spec=v1.0 ./502.gcc_r 200.c -O3 -finline-limit=50000 -o /tmp/200.c.o

if [ $? -eq 124 ]
then
        exit 0
else
        exit 1
fi

# bisect with ./llvm/utils/rsp_bisect.py --test=502.gcc_r-rsp-bisect.sh --rsp=502.gcc_r.rsp --other-rel-path=../build.rva22u64_v-evl-O3

This reduced the miscompile down to the changes in one file, reload1.c.

From there I diffed the output assembly and noticed that there were some loops with what looked like to be reversed pointer vectors, so I changed VPlanTransforms::tryAddExplicitVectorLength to bail if it encountered any recipes with VPReverseVectorPointerRecipe.

Removing VPReverseVectorPointerRecipe support fixed 502.gcc_r, so I knew it lied somewhere there.

But the diff was still quite big in 502.gcc_r, so I diffed 525.x264_r with and without VPReverseVectorPointerRecipes, which was much easier to follow. There was only one diff and it was in a loop in the function quant_trellis_cabac.

There I was able to extract the bit of C from the sources that generated that loop, and after staring at the output for a while I noticed the error.

It took me a while to find this, but I hope that helps! It's much easier to debug miscompiles if they can be git-bisected back, but this wasn't the case unfortunately :)

@Mel-Chen
Copy link
Contributor

Thanks!:) I will try this patch tomorrow to see if it fixes the remaining fails in llvm test-suite.

I confirmed the remaining 4 fails in llvm test-suite are fixed by this patch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

RISC-V EVL tail folding failure on SPEC CPU 2017 502.gcc_r RISC-V EVL tail folding failure on SPEC CPU 2017 525.x264_r
6 participants