Skip to content

[SelectionDAG][x86] Shuffle pyramid is not eliminated #144227

@sutajo

Description

@sutajo

https://godbolt.org/z/zEonMr7da

The above example illustrates the problem.

There are 2 minimum reductions in this snippet, but only the second is compiled into the efficient form:

vextracti128 xmm1, ymm0, 1
vpminuw xmm0, xmm0, xmm1
vphminposuw xmm0, xmm0

The first one is computed with shuffles and mins:

vextracti128    xmm2, ymm0, 1
vpminuw xmm2, xmm0, xmm2
vpshufd xmm3, xmm2, 238                 # xmm3 = xmm2[2,3,2,3]
vpminuw xmm3, xmm2, xmm3
vpshufd xmm4, xmm3, 85                  # xmm4 = xmm3[1,1,1,1]
vpminuw xmm3, xmm3, xmm4
vpsrld  xmm4, xmm3, 16
vphminposuw     xmm2, xmm2

Normally, llvm.vector.reduce.umin.v16i16 is converted into a series of vector_shuffle and umin operations in the SelectionDAG.

Initial selection DAG: %bb.0 'test_reduce_v16i16_with_sharing:start'
SelectionDAG has 47 nodes:
  t0: ch,glue = EntryToken
    t2: i64,ch = CopyFromReg t0, Register:i64 %0
  t4: v16i16,ch = load<(load (s256))> t0, t2, undef:i64
    t9: v16i16 = vector_shuffle<8,9,10,11,12,13,14,15,u,u,u,u,u,u,u,u> t4, poison:v16i16
  t10: v16i16 = umin t4, t9
    t11: v16i16 = vector_shuffle<4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u> t10, poison:v16i16
  t12: v16i16 = umin t10, t11
    t13: v16i16 = vector_shuffle<2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u> t12, poison:v16i16
  t14: v16i16 = umin t12, t13
  t17: i32 = Constant<0>
      t15: v16i16 = vector_shuffle<1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u> t14, poison:v16i16
    t16: v16i16 = umin t14, t15
  t19: i16 = extract_vector_elt t16, Constant:i64<0>
  t21: v1i16 = insert_vector_elt poison:v1i16, t19, Constant:i64<0>
      t22: v16i16 = concat_vectors t21, t21, t21, t21, t21, t21, t21, t21, t21, t21, t21, t21, t21, t21, t21, t21
    t24: v16i1 = setcc t4, t22, seteq:ch
      t6: i64,ch = CopyFromReg t0, Register:i64 %1
    t7: v16i16,ch = load<(load (s256))> t0, t6, undef:i64
    t26: v16i16 = BUILD_VECTOR Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>, Constant:i16<-1>
  t27: v16i16 = vselect t24, t7, t26
    t28: v16i16 = vector_shuffle<8,9,10,11,12,13,14,15,u,u,u,u,u,u,u,u> t27, poison:v16i16
  t29: v16i16 = umin t27, t28
    t30: v16i16 = vector_shuffle<4,5,6,7,u,u,u,u,u,u,u,u,u,u,u,u> t29, poison:v16i16
  t31: v16i16 = umin t29, t30
    t32: v16i16 = vector_shuffle<2,3,u,u,u,u,u,u,u,u,u,u,u,u,u,u> t31, poison:v16i16
  t33: v16i16 = umin t31, t32
  t38: i16,i16 = merge_values undef:i16, undef:i16
    t39: i16,i16 = merge_values t19, undef:i16
        t34: v16i16 = vector_shuffle<1,u,u,u,u,u,u,u,u,u,u,u,u,u,u,u> t33, poison:v16i16
      t35: v16i16 = umin t33, t34
    t36: i16 = extract_vector_elt t35, Constant:i64<0>
  t40: i16,i16 = merge_values t39, t36
  t43: ch,glue = CopyToReg t0, Register:i16 $ax, t40
  t45: ch,glue = CopyToReg t43, Register:i16 $dx, t40:1, t43:1
  t46: ch = X86ISD::RET_GLUE t45, TargetConstant:i32<0>, Register:i16 $ax, Register:i16 $dx, t45:1

In the x86 backend, the combineExtractVectorElt function is supposed detect binary operation reductions and replace these with a more efficient implementation. The problem is that combineExtractVectorElt only replaces the result of extract_vector_elt (in this case t19 and 36), but it does not replace t16 and t35.

During the combination steps, t22 is transformed to vector_shuffle<0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0> t16, undef:v16i16, so it depends on t16, and thus it keeps the reduction chain alive.

A possbile solution to this would be:
Let's say N = extract_vector_elt(V,0) and V comes from a reduction.
If N gets replaced with M, we could replace V with scalar_to_vector(M).
This would eliminate all references to the inefficient reduction.
This is safe to do, since:

  • All elements of V expect from the zeroth are known to be undefined
  • M does not depend on V, so there will be no cycles.

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions