SimplifyCFG speculation transformation is blocking vectorization in LoopVectorize #104023

Open
@bjope

After commit 47d831f (#100514) I noticed a regression in a downstream benchmark: a loop is no longer vectorized. It seems the changed cost of "llvm.umin" (from 2 to 1, given that the operation is legal for the target) now lets SimplifyCFG flatten the control flow by speculating the umin and merging the result with a select instruction. Unfortunately, the code that SimplifyCFG now emits is for some reason not recognized by the loop vectorizer.

Here is an IR example cfg.ll to show what happens:

define i32 @foo(ptr %0, i32 %1) {
  br label %5

3:                                                ; preds = %14
  %4 = phi i32 [ %15, %14 ]
  ret i32 %4

5:                                                ; preds = %2, %14
  %6 = phi i64 [ 0, %2 ], [ %16, %14 ]
  %7 = phi i32 [ 128, %2 ], [ %15, %14 ]
  %8 = getelementptr inbounds i32, ptr %0, i64 %6
  %9 = load i32, ptr %8, align 4
  %10 = icmp sgt i32 %9, %1
  br i1 %10, label %11, label %14

11:                                               ; preds = %5
  %12 = trunc nuw nsw i64 %6 to i32
  %13 = tail call i32 @llvm.umin.i32(i32 %7, i32 %12)
  br label %14

14:                                               ; preds = %5, %11
  %15 = phi i32 [ %13, %11 ], [ %7, %5 ]
  %16 = add nuw nsw i64 %6, 1
  %17 = icmp eq i64 %16, 128
  br i1 %17, label %3, label %5
}

When running only the vectorizer, the loop is vectorized:

> opt -mtriple x86_64 -passes='loop-vectorize' cfg.ll -S -o - | grep umin
  %11 = call <4 x i32> @llvm.umin.v4i32(<4 x i32> %vec.phi, <4 x i32> %vec.ind)
  %12 = call <4 x i32> @llvm.umin.v4i32(<4 x i32> %vec.phi1, <4 x i32> %step.add)
  %rdx.minmax = call <4 x i32> @llvm.umin.v4i32(<4 x i32> %predphi, <4 x i32> %predphi4)
  %16 = call i32 @llvm.vector.reduce.umin.v4i32(<4 x i32> %rdx.minmax)
  %27 = tail call i32 @llvm.umin.i32(i32 %21, i32 %26)

But if simplifycfg runs first, there is no vectorization:

> build-all/bin/opt -mtriple x86_64 -passes='simplifycfg,loop-vectorize' cfg.ll -S -o - | grep umin
  %12 = tail call i32 @llvm.umin.i32(i32 %7, i32 %11)

The transform done by simplifycfg results in this IR:

define i32 @foo(ptr %0, i32 %1) {
  br label %5

3:                                                ; preds = %5
  %4 = phi i32 [ %13, %5 ]
  ret i32 %4

5:                                                ; preds = %5, %2
  %6 = phi i64 [ 0, %2 ], [ %14, %5 ]
  %7 = phi i32 [ 128, %2 ], [ %13, %5 ]
  %8 = getelementptr inbounds i32, ptr %0, i64 %6
  %9 = load i32, ptr %8, align 4
  %10 = icmp sgt i32 %9, %1
  %11 = trunc nuw nsw i64 %6 to i32
  %12 = tail call i32 @llvm.umin.i32(i32 %7, i32 %11)
  %13 = select i1 %10, i32 %12, i32 %7
  %14 = add nuw nsw i64 %6, 1
  %15 = icmp eq i64 %14, 128
  br i1 %15, label %3, label %5
}

And loop-vectorize complains like this with -debug:

LV: Checking a loop in 'foo' from cfg.ll
LV: Loop hints: force=? width=0 interleave=0
LV: Found a loop: 
LV: Found an induction variable.
LV: PHI is not a poly recurrence.
LV: PHI is not a poly recurrence.
LV: Not vectorizing: Found an unidentified PHI   %7 = phi i32 [ 128, %2 ], [ %13, %5 ]
LV: Interleaving disabled by the pass manager
LV: Can't vectorize the instructions or CFG
LV: Not vectorizing: Cannot prove legality.
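
For readers who prefer source level over IR, here is a rough C++ sketch of what the two IR versions of @foo compute. It is only an illustration: the function names (foo_branchy, foo_select), the parameter names, and the use of std::min in place of llvm.umin are assumptions, not the source that actually produced the IR above.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical source-level reading of the original IR: the umin is only
// executed when the loaded element is greater than the threshold.
uint32_t foo_branchy(const int32_t *a, int32_t threshold) {
    uint32_t result = 128;
    for (uint64_t i = 0; i < 128; ++i) {
        if (a[i] > threshold)
            result = std::min(result, static_cast<uint32_t>(i));
    }
    return result;
}

// Hypothetical reading of the IR after simplifycfg: the umin is computed
// unconditionally (speculated) and a select picks between the speculated
// value and the previous value of the reduction.
uint32_t foo_select(const int32_t *a, int32_t threshold) {
    uint32_t result = 128;
    for (uint64_t i = 0; i < 128; ++i) {
        uint32_t speculated = std::min(result, static_cast<uint32_t>(i));
        result = (a[i] > threshold) ? speculated : result;
    }
    return result;
}
```

Both functions return the same value for any input, so the transform itself is correct. The only difference is the shape of the reduction: a PHI merging two control-flow paths in the first form, and a select feeding the loop-carried PHI in the second.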

Is this some kind of limitation/bug in loop-vectorize? Or is it a phase ordering problem?
