Skip to content

Conversation

EgorBo
Copy link
Member

@EgorBo EgorBo commented Jul 10, 2025

When we have "VN == CNS" assertion, we can replace ARR_LENGTH tree with that CNS if ARR_LENTH has that VN. Perhaps, we can replace other kinds of trees with CNS, but seems to be profitable only ARR_LENGTH now (there are lots of candidates for IND node as well, but with mixed diffs).

Normally, it's not needed when ARR_LENGTH is CSE'd so our existing logic knows how to do that replacement for LCL_VAR, but sometimes it's not CSE'd.

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 10, 2025
Copy link
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Copy link
Member

@tannergooding tannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Would the same propagation be valid for LCL_FLD that is tagged for span.Length? I think most other special fields we import and handle directly as constant, where relevant.

@EgorBo
Copy link
Member Author

EgorBo commented Jul 10, 2025

LGTM.

Would the same propagation be valid for LCL_FLD that is tagged for span.Length? I think most other special fields we import and handle directly as constant, where relevant.

Yeah we do this kind of folding for VAR/FLD today, the problem (and the SPMI diffs highlight that) is that it often leads to size regression, e.g. we have

if (x == 42)
{
    Foo(x)
}

here this transformation changes Foo(x) to Foo(42) and when we already had x loaded into a register, we now need an additional instruction to populate 42 arg. So we basically want to do this only when we know that x will unlock some other foldings..

So while this transformation does what I needed, namely, unlocks more late unrollings for memcpy/memmove, it brings too many size regressions :(

@tannergooding
Copy link
Member

So while this transformation does what I needed, namely, unlocks more late unrollings for memcpy/memmove, it brings too many size regressions :(

👍. It notably looks like this might be an instruction count reduction, even if it isn't a size reduction (based on Arm64 vs x64 diffs).

I think it would be beneficial if we could have extra information here showing Improvements (instructions) Regressions (instructions) as well, so we can see when it's just encoding differences:

Contexts with diffs	Improvements	Regressions	Same size	Improvements (bytes)	Regressions (bytes)

A number of the size regressions look to be a from inlining some call that uses SIMD. It isn't clear what call this is as the diff just lists [<unknown method>], but it's functionally doing (a ^ b) | (c ^ d) and then checking the result:

+       vmovups  zmm0, zmmword ptr [rax]
+       vmovups  zmm1, zmmword ptr [rax+0x16]
+       vmovups  zmm2, zmmword ptr [rcx]
+       vpxorq   zmm1, zmm1, zmmword ptr [rcx+0x16]
+       vpternlogq zmm1, zmm2, zmm0, -10
+       vptestmq k1, zmm1, zmm1
+       kortestb k1, k1
+       sete     al

when we already had x loaded into a register, we now need an additional instruction to populate 42 arg

Do you think this is another example of the LSRA not reusing existing constants issue?

I know we have a number of CSE considerations around integer constants and how that's special cased. We likewise have a few in almost the inverse direction for floating-point/simd.

Perhaps there's some simple heuristic we can use to help decide, such as whether it fits in byte (and so is likely to be contained) or for SIMD the existing CanBenefitFromConstantProp and ShouldConstantProp checks, to help identify where reuse is better vs where containment is likely to occur.

@EgorBo EgorBo closed this Jul 16, 2025
@github-actions github-actions bot locked and limited conversation to collaborators Aug 16, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants