HWIntrinsics: FMA suboptimal codegen #12212

@saucecontrol

This code:

av1 = Fma.MultiplyAdd(iv1, Sse.LoadVector128(mp + 4), av1);

currently compiles to:

lea         rbx,[rdi+10h]  
vfmadd132ps xmm4,xmm1,xmmword ptr [rbx]  
vmovaps     xmm1,xmm4  

Assuming dotnet/coreclr#22944 would eliminate the extra lea there, I believe this should be generating:

vfmadd231ps xmm1,xmm4,xmmword ptr [rdi+10h]  

It looks like the logic in genFMAIntrinsic misses the fact that the two non-contained operands could be swapped here, which would let the result be written directly to the accumulator register.
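To make the operand ordering concrete, here is a minimal scalar sketch (one lane only) of the three FMA forms and of the two instruction sequences above. The class and method names (`FmaForms`, `Fmadd132`, `Current`, `Proposed`) are made up for illustration and are not part of the JIT; the register assignment (iv1 in xmm4, av1 in xmm1) is read off the disassembly above.

```csharp
using System;

// Scalar model of the FMA operand orderings. The digits in the mnemonic say
// which operands are multiplied and which is added; operand 1 is always the
// destination and is overwritten.
public static class FmaForms
{
    // vfmadd132 op1, op2, op3 : op1 = op1 * op3 + op2
    public static void Fmadd132(ref float op1, float op2, float op3)
        => op1 = MathF.FusedMultiplyAdd(op1, op3, op2);

    // vfmadd213 op1, op2, op3 : op1 = op2 * op1 + op3
    public static void Fmadd213(ref float op1, float op2, float op3)
        => op1 = MathF.FusedMultiplyAdd(op2, op1, op3);

    // vfmadd231 op1, op2, op3 : op1 = op2 * op3 + op1
    public static void Fmadd231(ref float op1, float op2, float op3)
        => op1 = MathF.FusedMultiplyAdd(op2, op3, op1);

    // Current codegen: the result lands in xmm4 (overwriting iv1) and is
    // then copied into xmm1 with an extra move.
    public static (float xmm1, float xmm4) Current(float iv1, float mem, float av1)
    {
        float xmm4 = iv1, xmm1 = av1;
        Fmadd132(ref xmm4, xmm1, mem); // vfmadd132ps xmm4, xmm1, [mem]
        xmm1 = xmm4;                   // vmovaps xmm1, xmm4
        return (xmm1, xmm4);
    }

    // Proposed codegen: the 231 form accumulates directly into xmm1,
    // so no move is needed and xmm4 keeps iv1.
    public static (float xmm1, float xmm4) Proposed(float iv1, float mem, float av1)
    {
        float xmm4 = iv1, xmm1 = av1;
        Fmadd231(ref xmm1, xmm4, mem); // vfmadd231ps xmm1, xmm4, [mem]
        return (xmm1, xmm4);
    }
}
```

Both sequences leave the same value, iv1 * mem + av1, in xmm1. In this model they differ only in the extra move and in whether xmm4 still holds iv1 afterwards.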

cc @tannergooding

category:cq
theme:hardware-intrinsics
skill-level:expert
cost:medium

Labels

area-CodeGen-coreclr: CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
enhancement: Product code improvement that does NOT require public API changes/additions
