Open
Labels: area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), optimization
Description
With dotnet/coreclr#22944, the raw hardware intrinsics are able to take advantage of folding the memory load operation into the SIMD instruction itself.
However, the same optimization is not applied to Vector and Vector<T> more generally, even though they use nearly identical codegen under the covers.
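using System.Numerics;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;

public class C
{
    public static Vector<byte> M(Vector<byte> a, ref Vector<byte> b)
    {
        return Vector.Equals(a, b);
    }

    public static Vector256<byte> N(Vector256<byte> a, ref Vector256<byte> b)
    {
        return Avx2.CompareEqual(a, b);
    }
}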
; C.M(System.Numerics.Vector`1<Byte>, System.Numerics.Vector`1<Byte> ByRef)
L0000: vzeroupper
L0003: vmovupd ymm0, [rdx]
L0007: vmovupd ymm1, [r8] ; note the allocation of register ymm1
L000c: vpcmpeqb xmm0, xmm0, xmm1
L0010: vmovupd [rcx], ymm0
L0014: mov rax, rcx
L0017: vzeroupper
L001a: ret
; C.N(System.Runtime.Intrinsics.Vector256`1<Byte>, System.Runtime.Intrinsics.Vector256`1<Byte> ByRef)
L0000: vzeroupper
L0003: vmovupd ymm0, [rdx]
L0007: vpcmpeqb xmm0, xmm0, [r8] ; operation doesn't touch register ymm1
L000c: vmovupd [rcx], ymm0
L0010: mov rax, rcx
L0013: vzeroupper
L0016: ret
category:cq
theme:vector-codegen
skill-level:intermediate
cost:medium