Description
JIT loop cloning creates a "fast path" and a "slow path", where a set of checks is done to choose between them. Currently, those checks determine whether array bounds checks can be statically removed, in which case the fast path does remove some number of bounds checks.
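As a rough source-level picture of the existing bounds-check cloning (the JIT does this on its internal IR, not on C# source), the hand-written sketch below shows one loop and an equivalent "cloned" form. The class and method names are hypothetical, and the exact conditions the JIT emits may differ.

```csharp
using System;

static class BoundsCheckCloningSketch
{
    // What the programmer writes: each a[i] carries an implicit bounds check.
    public static int Sum(int[] a, int n)
    {
        int sum = 0;
        for (int i = 0; i < n; i++)
        {
            sum += a[i];
        }
        return sum;
    }

    // Hand-written illustration of the cloned shape (not actual JIT output).
    public static int SumCloned(int[] a, int n)
    {
        int sum = 0;
        if (a != null && n <= a.Length)
        {
            // Fast path: the checks above imply 0 <= i < a.Length for every
            // iteration, so the per-access bounds checks are provably redundant.
            for (int i = 0; i < n; i++)
            {
                sum += a[i];
            }
        }
        else
        {
            // Slow path: the original loop with all checks (and exceptions) intact.
            for (int i = 0; i < n; i++)
            {
                sum += a[i];
            }
        }
        return sum;
    }
}
```

Both methods return the same result and throw the same exceptions for every input; only the fast path gives the JIT enough information to drop the checks.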
This issue proposes generalizing the cloning conditions and "fast path" optimizations to include type tests.
Guarded devirtualization (GDV) will introduce type tests. If we see one or more of those in a loop and the tests are loop invariant and biased, we should consider adding those as cloning preconditions.
The canonical example is a foreach over an IEnumerable<T>. This will create an iterator object and make repeated interface calls to MoveNext, get_Current, etc.
When there is a dominant collection type (e.g., List<int>), with GDV we will have multiple checks for the type of the enumerator that are highly biased towards that collection's enumerator type. This enumerator object's type is a loop invariant.
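To make the interface calls concrete, here is approximately how the C# compiler lowers such a foreach. The class and method names are hypothetical, and the comments only mark where GDV type tests would be expected to appear; they are not JIT output.

```csharp
using System.Collections.Generic;

static class ForeachLoweringSketch
{
    // Roughly equivalent to:
    //   int sum = 0; foreach (int x in e) sum += x; return sum;
    public static int Sum(IEnumerable<int> e)
    {
        int sum = 0;
        // The iterator object is created once, before the loop, and the local
        // is never reassigned, so its type is invariant across iterations.
        IEnumerator<int> it = e.GetEnumerator();
        try
        {
            while (it.MoveNext())      // interface call; a GDV type test on `it` would guard this
            {
                sum += it.Current;     // interface call (get_Current); another test on the same `it`
            }
        }
        finally
        {
            it?.Dispose();
        }
        return sum;
    }
}
```

Every iteration repeats the same type tests on the same object, which is what makes them candidates for hoisting into a cloning precondition.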
So a plausible implementation sketch is:
- Walk the loop body looking for loop-invariant type tests feeding branches that are highly biased (one successor cold). Collect all the unique instances.
- If there are a sufficient number and other things look good, clone the loop.
- Add those tests to the predicate chain for the hot loop. This may require a null check followed by a type test for each.
Initially we can likely rely on redundant branch optimizations to clean up the now-redundant tests in the fast path and slow path loops, but we could also try cleaning those up during cloning.
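The intended result of the steps above can be pictured at source level as follows. This is a hand-written sketch, not JIT output: HotEnumerator is a hypothetical sealed class standing in for the dominant enumerator type, and the real transformation operates on JIT IR (where the dominant type may be a boxed struct enumerator).

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical stand-in for the dominant enumerator type seen in profile data.
sealed class HotEnumerator : IEnumerator<int>
{
    private readonly int[] _items;
    private int _index = -1;

    public HotEnumerator(int[] items) { _items = items; }

    public int Current => _items[_index];
    object IEnumerator.Current => Current;
    public bool MoveNext() => ++_index < _items.Length;
    public void Reset() { _index = -1; }
    public void Dispose() { }
}

static class TypeTestCloningSketch
{
    public static int Sum(IEnumerator<int> it)
    {
        int sum = 0;
        // Cloning precondition: a null check followed by an exact type test,
        // evaluated once because `it` does not change inside the loop.
        if (it != null && it.GetType() == typeof(HotEnumerator))
        {
            // Fast path: the type is known, so the calls bind directly to
            // HotEnumerator's methods and the per-iteration tests are redundant.
            HotEnumerator hot = (HotEnumerator)it;
            while (hot.MoveNext())
            {
                sum += hot.Current;
            }
        }
        else
        {
            // Slow path: plain interface dispatch, as in the original loop.
            while (it.MoveNext())
            {
                sum += it.Current;
            }
        }
        return sum;
    }
}
```

A null argument fails the precondition and reaches the slow path, which throws exactly where the original loop would have.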
A canonical test case would be a no-inline method taking an IEnumerable<int>, say, adding up the elements via foreach and returning the sum.
If we feed that with List<int> we should see dramatically improved performance.
If we feed that with int[] we should also see improved performance, assuming #62457 is resolved.
Baseline would be the performance of the same code where the arg has the concrete collection type and/or a version that iterates manually via for.
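A minimal sketch of that test case and its two baselines might look like the following. The class and method names are hypothetical, and no timing harness is included. Since GDV relies on profile data, any measurement would presumably need PGO active and enough warm-up calls for the method to be rejitted.

```csharp
using System.Collections.Generic;
using System.Runtime.CompilerServices;

static class SumBenchmarkSketch
{
    // Test case: interface-typed argument, summed via foreach.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Sum(IEnumerable<int> e)
    {
        int sum = 0;
        foreach (int x in e)
        {
            sum += x;
        }
        return sum;
    }

    // Baseline 1: same code, but the argument has the concrete collection type.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int SumList(List<int> list)
    {
        int sum = 0;
        foreach (int x in list)
        {
            sum += x;
        }
        return sum;
    }

    // Baseline 2: manual iteration via for.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int SumListFor(List<int> list)
    {
        int sum = 0;
        for (int i = 0; i < list.Count; i++)
        {
            sum += list[i];
        }
        return sum;
    }
}
```

Calling Sum with a List<int> and then with an int[] covers the two inputs mentioned above; the gap between Sum and the baselines is the overhead that type-test cloning aims to close.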
Extra credit would be doing similar things if the operation is passed in via Func<T> or similar… (devirtualize and also profile driven delegate opt)
The basics are now implemented via the following PRs:
- JIT: defer some flow graph reordering until after loop recognition #69878
- JIT: fix invariant analysis for cloning #70126
- JIT: broaden cloning invariant checks #70232
- JIT: enable cloning based on loop invariant type tests #70377 (main PR)
- JIT: rework optCanonicalizeLoop #70792
- JIT: handle case where we are cloning adjacent sibling loops #70874
- JIT: avoid cloning mid-entry loops with multiple non-loop entry preds #70959
Areas for follow-up:
- better loop recognition. Note that we don't need counted/iterable loops for type test cloning; we just need to know that there is a single entry.
- better invariance analysis. Right now we do a simple IR walk that looks for local assignments.
- better profile maintenance.
- optimize the GDV predicates in the fast path loop directly
- "pessimize" the GDV predicate in the slow path loop directly (always execute the fallback)
category:cq
theme:loop-opt
skill-level:expert
cost:large
impact:large