EgorBo
Member

@EgorBo EgorBo commented Oct 29, 2021

Draft PR. The goal is to achieve this codegen: https://godbolt.org/z/dEerMoE6M
when we access a C# array by a variable index.
Currently the JIT emits the following for such an access:

            mov     w1, w1 ;; zero extend for index.
            lsl     x1, x1, #2
            add     x1, x1, #16
            ldr     w0, [x0, x1]
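
For readers following the arithmetic, the four instructions above can be modelled roughly as below. This is only an illustrative Python sketch, not JIT code: the function names are hypothetical, and the default shift (2) and data offset (16) are simply the immediates that appear in the listing.

```python
MASK64 = (1 << 64) - 1  # model 64-bit register wrap-around


def zero_extend32(w):
    """Model `mov w1, w1`: keep the low 32 bits and clear the upper 32."""
    return w & 0xFFFFFFFF


def current_address(base, index, shift=2, data_offset=16):
    """Effective address of the load in the four-instruction sequence."""
    x1 = zero_extend32(index)         # mov w1, w1
    x1 = (x1 << shift) & MASK64       # lsl x1, x1, #2
    x1 = (x1 + data_offset) & MASK64  # add x1, x1, #16
    return (base + x1) & MASK64       # ldr w0, [x0, x1]
```

For example, `current_address(0x1000, 3)` gives `0x1000 + 16 + 3 * 4`; three of the four instructions exist only to build the offset in `x1`.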

@ghost ghost added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 29, 2021

@BruceForstall
Contributor

Incorporating the godbolt data here, you suggest generating:

        add     x8, x0, w1, sxtw #2
        ldr     w0, [x8, #12]

One issue with this form is that we'll always need two instructions inside a loop; if w1 is the loop array index, neither instruction is hoistable.

#35618 suggested a form of generating array base + data offset (as a byref), which is hoistable out of a loop, and then generating ldr w0, [x0, w1, sxtw #2] in the loop body, so you only have one instruction in the loop.
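
To make the comparison concrete, here is a small Python sketch of the two address computations. It is an assumption-laden model rather than anything the JIT does: the function names are hypothetical, `data_offset` is passed in explicitly, and GC tracking of the intermediate byref is ignored.

```python
def sign_extend32(w):
    """Model `sxtw`: interpret the low 32 bits as a signed value."""
    w &= 0xFFFFFFFF
    return w - (1 << 32) if w & 0x80000000 else w


def two_instruction_form(base, index, data_offset, shift=2):
    """add x8, x0, w1, sxtw #2 ; ldr w0, [x8, #data_offset]"""
    x8 = base + (sign_extend32(index) << shift)
    return x8 + data_offset


def hoisted_form_addresses(base, indices, data_offset, shift=2):
    """Compute base + data_offset once, then one indexed access per iteration."""
    byref = base + data_offset  # loop-invariant, hoisted out of the loop
    # In the loop body: ldr w0, [byref, w1, sxtw #2]
    return [byref + (sign_extend32(i) << shift) for i in indices]
```

Both forms produce the same addresses; the difference is that the second keeps the invariant addition out of the loop body.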

@EgorBo
Member Author

EgorBo commented Nov 2, 2021

Thanks, Bruce, yeah, I'm keeping that in mind. Actually, it can easily be enabled even today in fgMorphArrayIndex. Currently we morph GT_INDEX into
(baseRef + ((indexRef * scaleCns) + dataOffsetCns))
while we could change it to
((baseRef + dataOffsetCns) + (indexRef * scaleCns)) if it's safe from the GC's point of view, and then (baseRef + dataOffsetCns) will be hoisted.

Alternatively, we can implement a more generic optimization for patterns like this:

"(invariantTree1 + X) + invariantTree2"  => "X + (invariantTree1  + invariantTree2)"
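
A minimal sketch of that rewrite on toy expression trees, written in Python purely for illustration. The tuple representation `(op, lhs, rhs)`, the `is_invariant` helper, and the `invariants` set are all hypothetical and unrelated to the JIT's actual GenTree API; GC safety and overflow are not modelled.

```python
def is_invariant(tree, invariants):
    """Hypothetical check: a tree is invariant if all its leaves are
    integer constants or names listed in `invariants`."""
    if isinstance(tree, tuple):
        _, lhs, rhs = tree
        return is_invariant(lhs, invariants) and is_invariant(rhs, invariants)
    return isinstance(tree, int) or tree in invariants


def reassociate(tree, invariants):
    """Rewrite (inv1 + X) + inv2 into X + (inv1 + inv2) when X varies."""
    if not (isinstance(tree, tuple) and tree[0] == "+"):
        return tree
    _, lhs, inv2 = tree
    if not (isinstance(lhs, tuple) and lhs[0] == "+"):
        return tree
    _, inv1, x = lhs
    if (is_invariant(inv1, invariants)
            and is_invariant(inv2, invariants)
            and not is_invariant(x, invariants)):
        # Group the two invariant operands so they can be hoisted together.
        return ("+", x, ("+", inv1, inv2))
    return tree  # pattern does not match; leave the tree unchanged
```

Applied to `("+", ("+", "baseRef", ("*", "indexRef", 4)), 16)` with `invariants = {"baseRef"}`, this groups `baseRef` and `16` into one invariant subtree and leaves only the scaled index outside it.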

@EgorBo EgorBo closed this Nov 6, 2021
@EgorBo EgorBo reopened this Nov 6, 2021
@EgorBo EgorBo closed this Nov 7, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2021