Vector<T> operations don't take advantage of memory operands

Since https://github.com/dotnet/coreclr/pull/22944, the JIT can fold a memory load into the SIMD instruction itself when compiling the raw hardware intrinsics.

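However, the same optimization was not applied to `Vector` and `Vector<T>`, even though they use nearly identical codegen under the covers. In the example below, `M` first loads `b` into a separate register and then compares, while `N` compares directly against the memory operand.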

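```cs
using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public static class C
{
    // Vector<T> path: b is loaded into a register before the compare.
    public static Vector<byte> M(Vector<byte> a, ref Vector<byte> b)
    {
        return Vector.Equals(a, b);
    }

    // Hardware intrinsic path: the load of b is folded into the compare.
    public static Vector256<byte> N(Vector256<byte> a, ref Vector256<byte> b)
    {
        return Avx2.CompareEqual(a, b);
    }
}
```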

```asm
; C.M(System.Numerics.Vector`1<Byte>, System.Numerics.Vector`1<Byte> ByRef)
    L0000: vzeroupper
    L0003: vmovupd ymm0, [rdx]
    L0007: vmovupd ymm1, [r8]   ; note the allocation of register ymm1
    L000c: vpcmpeqb xmm0, xmm0, xmm1
    L0010: vmovupd [rcx], ymm0
    L0014: mov rax, rcx
    L0017: vzeroupper
    L001a: ret

; C.N(System.Runtime.Intrinsics.Vector256`1<Byte>, System.Runtime.Intrinsics.Vector256`1<Byte> ByRef)
    L0000: vzeroupper
    L0003: vmovupd ymm0, [rdx]
    L0007: vpcmpeqb xmm0, xmm0, [r8]   ; operation doesn't touch register ymm1
    L000c: vmovupd [rcx], ymm0
    L0010: mov rax, rcx
    L0013: vzeroupper
    L0016: ret
```
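
For comparison, here is a sketch of what `C.M` might look like if the same folding were applied to `Vector<T>`. This is a hypothetical listing, not actual JIT output: it assumes the instruction sequence would simply match the one shown for `C.N` above, and the code offsets are left out because they are unknown.

```asm
; Hypothetical codegen for C.M, assuming the load of b is folded as it is in C.N
    vzeroupper
    vmovupd ymm0, [rdx]
    vpcmpeqb xmm0, xmm0, [r8]   ; b is read directly from memory; ymm1 is not needed
    vmovupd [rcx], ymm0
    mov rax, rcx
    vzeroupper
    ret
```

Under that assumption, the change saves one instruction and one register in `M`.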

category:cq
theme:vector-codegen
skill-level:intermediate
cost:medium

Vector<T> operations don't take advantage of memory operands #13798
