
[LV][EVL] Incorrect behavior of fixed-order recurrence idiom with EVL tail folding #122461

Closed
Mel-Chen opened this issue Jan 10, 2025 · 0 comments · Fixed by #124093

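With EVL tail folding enabled, the llvm.splice operation emitted for a fixed-order recurrence can produce a wrong result in the final iteration, because the EVL of the second-to-last iteration is not guaranteed to equal VF * UF.
When that EVL is smaller, the trailing lanes of the previous vector are poison, yet llvm.splice still reads its last lane. For example: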

llvm.splice([A, B, C, poison], [D, E, poison, poison], -1) ==> [poison, D, E, poison]  
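
To make the example above concrete, here is a minimal Python sketch that models the lane selection of llvm.splice on plain lists. The `splice` helper and the `POISON` marker are illustrative names, not LLVM APIs, and the model assumes both inputs have the same fixed number of lanes.

```python
POISON = "poison"  # stand-in for a poison lane (illustrative only)

def splice(v1, v2, imm):
    """Model of llvm.splice: concatenate v1 and v2, then take len(v1) lanes.

    A non-negative imm is the start index into the concatenation; a negative
    imm selects the trailing -imm lanes of v1 followed by leading lanes of v2.
    """
    n = len(v1)
    start = imm if imm >= 0 else n + imm
    return (v1 + v2)[start:start + n]

# Previous iteration had EVL = 3, so lane 3 of the first operand is poison.
prev = ["A", "B", "C", POISON]
cur = ["D", "E", POISON, POISON]

# splice always reads the physically last lane of prev, which is poison here;
# the recurrence actually needs "C" in lane 0.
result = splice(prev, cur, -1)
print(result)
```

The result has poison in lane 0, which is an active lane in the final iteration, so the value stored for that element is wrong.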

This issue was identified by the LLVM test-suite in SingleSource/UnitTests/Vectorizer/recurrences.test.

Checking first_order_recurrence
Checking second_order_recurrence
Checking third_order_recurrence
Miscompare

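For now, the fixed-order recurrence idiom with EVL tail folding is temporarily disabled by #122458. It will be re-enabled once the following fix is implemented: record the EVL of the previous iteration and replace llvm.splice with llvm.experimental.vp.splice.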

vector.ph:                                        ; preds = %for.body.preheader.i.i.i
  ...
  %max.vf.1 = tail call i32 @llvm.vscale.i32()
  %max.vf = shl nuw nsw i32 %max.vf.1, 2 
  br label %vector.body

vector.body:                                      ; preds = %vector.body, %vector.ph
  %index = phi i64 [ 0, %vector.ph ], [ %index.next, %vector.body ]
  %evl.based.iv = phi i64 [ 0, %vector.ph ], [ %index.evl.next, %vector.body ]

  ; Record the EVL of the previous iteration. Initialized to VF (%max.vf).
  %prev.evl = phi i32 [ %max.vf, %vector.ph ], [ %17, %vector.body ]    

  %vector.recur = phi <vscale x 4 x i32> [ %vector.recur.init, %vector.ph ], [ %vp.op.load, %vector.body ]
  %vector.recur8 = phi <vscale x 4 x i32> [ %vector.recur.init7, %vector.ph ], [ %19, %vector.body ]
  %vector.recur10 = phi <vscale x 4 x i32> [ %vector.recur.init9, %vector.ph ], [ %20, %vector.body ]
  %avl = sub i64 %wide.trip.count.i.i.i, %evl.based.iv
  %17 = tail call i32 @llvm.experimental.get.vector.length.i64(i64 %avl, i32 4, i1 true)
  %18 = getelementptr inbounds nuw i32, ptr %__args.val, i64 %evl.based.iv
  %vp.op.load = tail call <vscale x 4 x i32> @llvm.vp.load.nxv4i32.p0(ptr align 4 %18, <vscale x 4 x i1> splat (i1 true), i32 %17), !tbaa !6

  ; Replace llvm.splice with llvm.experimental.vp.splice.
  %19 = tail call <vscale x 4 x i32> @llvm.experimental.vp.splice.nxv4i32(<vscale x 4 x i32> %vector.recur, <vscale x 4 x i32> %vp.op.load, i32 -1, <vscale x 4 x i1> splat (i1 true), i32 %prev.evl, i32 %17)
  %20 = tail call <vscale x 4 x i32> @llvm.experimental.vp.splice.nxv4i32(<vscale x 4 x i32> %vector.recur8, <vscale x 4 x i32> %19, i32 -1, <vscale x 4 x i1> splat (i1 true), i32 %prev.evl, i32 %17)
  %21 = tail call <vscale x 4 x i32> @llvm.experimental.vp.splice.nxv4i32(<vscale x 4 x i32> %vector.recur10, <vscale x 4 x i32> %20, i32 -1, <vscale x 4 x i1> splat (i1 true), i32 %prev.evl, i32 %17)

  %vp.op = tail call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> %20, <vscale x 4 x i32> %19, <vscale x 4 x i1> splat (i1 true), i32 %17)
  %vp.op11 = tail call <vscale x 4 x i32> @llvm.vp.add.nxv4i32(<vscale x 4 x i32> %vp.op, <vscale x 4 x i32> %21, <vscale x 4 x i1> splat (i1 true), i32 %17)
  %22 = getelementptr inbounds nuw i32, ptr %__args1.val, i64 %evl.based.iv
  tail call void @llvm.vp.store.nxv4i32.p0(<vscale x 4 x i32> %vp.op11, ptr align 4 %22, <vscale x 4 x i1> splat (i1 true), i32 %17), !tbaa !6
  %23 = zext i32 %17 to i64
  %index.evl.next = add nuw i64 %evl.based.iv, %23
  %index.next = add nuw i64 %index, %7
  %24 = icmp eq i64 %index.next, %n.vec
  br i1 %24, label %"_ZSt10__invoke_rIvRZ4mainE3$_1JPjS2_jEENSt9enable_ifIX16is_invocable_r_vIT_T0_DpT1_EES4_E4typeEOS5_DpOS6_.exit", label %vector.body, !llvm.loop !39
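
The effect of the proposed IR can be checked with a small Python model of a first-order recurrence. This is only a sketch under stated assumptions: `vp_splice` and `run_first_order_recurrence` are hypothetical helpers, the mask is assumed all-true as in the IR above, and the EVL sequence `[4, 3, 2]` is an assumed example in which the second-to-last EVL is smaller than VF.

```python
POISON = "poison"  # stand-in for a poison lane (illustrative only)

def vp_splice(v1, v2, imm, evl1, evl2):
    """Model of llvm.experimental.vp.splice with an all-true mask.

    Only the first evl1 lanes of v1 and the first evl2 lanes of v2 take part.
    The result has evl2 active lanes; the remaining lanes are poison.
    """
    n = len(v1)
    start = imm if imm >= 0 else evl1 + imm
    active = (v1[:evl1] + v2[:evl2])[start:start + evl2]
    return active + [POISON] * (n - len(active))

def run_first_order_recurrence(data, init, evls, vf):
    """Compute out[i] = data[i - 1] (out[0] = init) in EVL-sized chunks."""
    recur = [POISON] * (vf - 1) + [init]  # models %vector.recur.init
    prev_evl = vf                         # models %prev.evl, initialized to VF
    out, pos = [], 0
    for evl in evls:
        # Models vp.load: lanes at or beyond evl are poison.
        load = data[pos:pos + evl] + [POISON] * (vf - evl)
        shifted = vp_splice(recur, load, -1, prev_evl, evl)
        out.extend(shifted[:evl])         # models vp.store of evl lanes
        recur, prev_evl = load, evl
        pos += evl
    return out

data = list(range(1, 10))  # 9 elements: 1..9
out = run_first_order_recurrence(data, init=0, evls=[4, 3, 2], vf=4)
print(out)
```

Because the splice point is taken from `prev_evl` rather than from the physical vector length, the last active lane of the previous iteration is carried over even when that iteration ran with fewer than VF lanes.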