
[LV] Maximum VF does not consider scaled reductions #141768

Open

Description

@preames

Reproducer: https://godbolt.org/z/4xf7c8GMM

It looks like the vectorizer has not yet been updated to consider scaled reductions (a.k.a. multiply-accumulate with extended operands) in the VF selection logic. In this case, if my tracing through the debug output is correct, we treat the widest type in the loop as i32 and pick the maximum VF to cost based on that type. The result is a loop that runs at 1/4 of the width it should. It's still more profitable than not using the zvqdotq (scaled reduction) lowering, but it isn't ideal either.
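
For intuition, the width arithmetic looks roughly like the sketch below (illustrative only, not the actual LoopVectorize code; a 128-bit register is assumed purely for the example):

// Illustrative sketch of widest-type-driven max VF selection, not the real
// LoopVectorize logic. With a scaled reduction, each i32 accumulator lane
// folds four i8 products, so bounding the VF by the 32-bit accumulator type
// leaves 3/4 of the datapath idle.
unsigned max_vf(unsigned reg_bits, unsigned widest_type_bits) {
  return reg_bits / widest_type_bits;
}
// max_vf(128, 32) == 4   /* today: the widest in-loop type is the i32 accumulator */
// max_vf(128, 8)  == 16  /* what the i8 inputs to the dot product would allow */

The reproducer: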

int doti32_i8_sext(char *a, char *b, int N) {
  int sum = 0;
  for (int i = 0; i < N; i++) {
    int a32 = a[i];
    int b32 = b[i];
    sum += a32 * b32;
  }
  return sum;
}

clang --target=riscv64 -march=rv64gcv_zvqdotq0p0 -menable-experimental-extensions dot.c -S -o - -O3

# Relevant Loop Only
.LBB0_5:
        vsetvli a5, zero, e8, mf2, ta, ma
        vle8.v  v9, (a3)
        vle8.v  v10, (a4)
        add     a4, a4, t0
        vsetvli a5, zero, e32, mf2, ta, ma
        vqdotu.vv       v8, v10, v9
        add     a3, a3, t0
        bne     a4, a7, .LBB0_5

A better result would be:

        vsetvli a3, zero, e32, m2, ta, ma
        ....
.LBB1_5:                                # %vector.body
                                        # =>This Inner Loop Header: Depth=1
        vl2r.v  v10, (a3)
        vl2r.v  v12, (a4)
        add     a4, a4, a5
        vqdotu.vv       v8, v12, v10
        add     a3, a3, a5
        bne     a4, a7, .LBB1_5
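
To make the width argument concrete, here is a scalar model of what I understand one vqdot-style accumulation step to do per 32-bit lane (a sketch of the semantics, not a reference implementation; the signed variant is shown, vqdotu is analogous with zero-extension; see the zvqdotq spec for the authoritative definition):

#include <stdint.h>

// Sketch: each 32-bit accumulator lane folds FOUR 8-bit x 8-bit products,
// so the loads consume i8 elements, and it is the i8 input width, not the
// i32 accumulator width, that should bound the vectorization factor.
static int32_t qdot_lane(int32_t acc, const int8_t a[4], const int8_t b[4]) {
  for (int k = 0; k < 4; k++)
    acc += (int32_t)a[k] * (int32_t)b[k];
  return acc;
}

Since the accumulator needs only one 32-bit lane per four input bytes, the e32/m2 loop above can consume four times as many bytes per iteration as the e32/mf2 one, which matches the 4x gap described above.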
