LINQ GroupBy Grouping can use InlineArray for first elements to avoid allocations #108569
Conversation
I'm concerned about the scope and extensibility of this change. It's effectively recreating
This change does add some complexity to improve performance by implementing lower-level concepts, though one could argue that this is already happening elsewhere in LINQ. As for the point that small group sizes should be favored: interestingly, this change actually has the biggest improvement for medium-sized groups, because it avoids the allocations of arrays of size 1, 2 and 4. For small groups there is some unused memory in the inline array, and for large groups these three allocations get lost in the noise. Of course, if we wanted to keep complexity low and didn't mind extra unused memory for small group sizes, we could simply bump the initial array size.

Out of curiosity I also attempted an alternative algorithm which avoids the extra array allocations by first loading temporary data into a buffer (`SegmentedArrayBuilder`) in order to calculate the correct group sizes up front. It did reduce memory allocations, to 46% of the baseline, but suffered a ~13% performance hit, so I discounted this approach, at least in its current form:

```csharp
internal static Lookup<TKey, TElement> Create(IEnumerable<TElement> source, Func<TElement, TKey> keySelector, IEqualityComparer<TKey>? comparer)
{
    Debug.Assert(source is not null);
    Debug.Assert(keySelector is not null);

    SegmentedArrayBuilder<GroupingElementAssignment>.ScratchBuffer scratch = default;
    SegmentedArrayBuilder<GroupingElementAssignment> arrayBuilder = new(scratch);
    CollectionLookup<TKey, TElement> lookup = new(comparer);

    // First pass: count the elements per grouping while buffering the
    // (element, grouping) assignments.
    foreach (TElement item in source)
    {
        TKey key = keySelector(item);
        Grouping<TKey, TElement> grouping = lookup.GetGrouping(key, create: true)!;
        grouping._count++;
        arrayBuilder.Add(new GroupingElementAssignment
        {
            Element = item,
            Grouping = grouping
        });
    }

    // Allocate each grouping's array at exactly the right size.
    Grouping<TKey, TElement> g = lookup._lastGrouping!;
    if (g is not null)
    {
        do
        {
            g = g._next!;
            g._elements = new TElement[g._count];
            g._count = 0;
        } while (g != lookup._lastGrouping);
    }

    // Second pass: replay the buffered assignments into the arrays.
    int enumeratorState = 0;
    while (arrayBuilder.MoveNextSegment(ref enumeratorState, out bool moveNext) is ReadOnlySpan<GroupingElementAssignment> nextSegment && moveNext)
    {
        foreach (GroupingElementAssignment assignment in nextSegment)
        {
            assignment.Grouping._elements[assignment.Grouping._count++] = assignment.Element;
        }
    }

    arrayBuilder.Dispose();
    return lookup;
}
```
Hey @andrewjsaid, apologies for the delay in reviewing this. Is there a chance you could rebase your changes on top of the latest main? I don't have write permissions on your PR branch.
Hi @eiriktsarpalis, thank you for revisiting my PR. I have merged the main branch into my code. While reviewing this PR, I would emphasize reading the analysis before making a final judgement, especially the following part.
Thanks. I just went through the code and benchmarks, and from my perspective the performance improvements seem too marginal to justify the potential for code bloat that this change introduces. Increasing the starting array size seems like a reasonable enough compromise that doesn't involve a huge code change.
@eiriktsarpalis Thank you for your considered review. If you let me know which starting array size you prefer based on the benchmarks in the original commit message, I can create a follow-up PR and re-run benchmarks etc.
I leave it up to you. From my perspective picking 4 seems reasonable because we use that in other collections as well. |
LINQ's `GroupBy` operator adds elements from the `IEnumerable<T>` source into buckets called groupings (`Grouping<TKey, TElement> : IGrouping<TKey, TElement>`). It uses a hashtable to identify the right `Grouping` and then calls `Add()` on it. The current algorithm for a `Grouping` is to start with a single-element array and resize it (doubling) as elements are added.
This PR instead adds an `InlineArray` of size S directly into the grouping, so that no arrays are allocated until the (S + 1)th element is added to the grouping. This increases the memory footprint of each `Grouping`, but microbenchmarks from dotnet/performance show that overall it results in fewer bytes allocated and faster execution, both when 100 integer elements are divided into 10 buckets and when they are divided into 100 buckets. Below I have shown benchmarks where S = 2, 3 and 4. As S increases, the size of the `Grouping` class increases, but the arrays grow faster, leading to fewer arrays allocated for large buckets.
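As a rough sketch of the technique (the type, field names and spill policy below are illustrative, not the PR's actual implementation), an `[InlineArray]` buffer inside the grouping object holds the first S elements, and a heap array is only allocated once element S + 1 arrives:

```csharp
using System;
using System.Runtime.CompilerServices;

var group = new InlineFirstGroup<int>();
for (int i = 1; i <= 5; i++)
{
    group.Add(i * 10); // the first 3 adds allocate nothing
}
Console.WriteLine($"{group[0]}, {group[2]}, {group[4]}");

// The runtime expands this struct to 3 consecutive slots of type T.
[InlineArray(3)]
internal struct InlineBuffer3<T>
{
    private T _element0;
}

internal sealed class InlineFirstGroup<T>
{
    private InlineBuffer3<T> _inline; // S = 3 slots stored inside the object
    private T[]? _spilled;            // allocated only for the 4th element on
    private int _count;

    public int Count => _count;

    public T this[int index] => index < 3 ? _inline[index] : _spilled![index - 3];

    public void Add(T item)
    {
        if (_count < 3)
        {
            _inline[_count] = item; // no heap allocation
        }
        else
        {
            if (_spilled is null)
            {
                _spilled = new T[4]; // skip the size-1 and size-2 arrays entirely
            }
            else if (_count - 3 == _spilled.Length)
            {
                Array.Resize(ref _spilled, _spilled.Length * 2);
            }
            _spilled[_count - 3] = item;
        }
        _count++;
    }
}
```

The trade-off the PR discussion is about is visible here: each grouping object is S slots larger even when the group holds a single element, in exchange for skipping the smallest array allocations.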
I have also benchmarked a different change where, instead of using an inline array as in this PR, the grouping starts with an array larger than length 1. As the main advantage of that approach is fewer array resizes, those benchmarks favour larger bucket scenarios at the expense of smaller bucket scenarios.
One change that had to be made for `InlineArray` to work is removing the `Trim` method from `Grouping`. Up to now we have passed the `_elements` array into `resultSelector`, requiring it to be trimmed beforehand, which results in an extra allocation. An `InlineArray` can't be trimmed, and `resultSelector` accepts an `IEnumerable<TElement>` anyway, so instead of passing in the private `_elements` field I just pass in the grouping itself. This change should be reviewed carefully.

After reviewing the benchmarks, I propose this PR as is, with an `InlineArray` of size 3 added to `Grouping<TKey, TElement>`, which improves both larger and smaller bucket scenarios.
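Why dropping `Trim` is safe can be sketched in isolation (the type below is a hypothetical stand-in, not the PR's code): as long as the grouping's enumerator yields exactly its logical count of elements, a caller that only sees an `IEnumerable<TElement>` never observes the unused capacity, so nothing needs trimming:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

var group = new UntrimmedGroup<int>(new int[8] { 1, 2, 3, 0, 0, 0, 0, 0 }, count: 3);
Console.WriteLine(string.Join(",", group)); // the slack slots are invisible

// Hypothetical stand-in for a grouping whose backing storage has spare
// capacity: enumeration stops at _count, so the storage never needs to
// be resized down before being handed to a resultSelector.
internal sealed class UntrimmedGroup<T> : IEnumerable<T>
{
    private readonly T[] _elements;
    private readonly int _count;

    public UntrimmedGroup(T[] elements, int count) => (_elements, _count) = (elements, count);

    public IEnumerator<T> GetEnumerator()
    {
        for (int i = 0; i < _count; i++)
        {
            yield return _elements[i];
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```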
Benchmarks

I added a benchmark to `Perf.Enumerable` in my local dotnet/performance, as defined below. In this benchmark the source is `Enumerable.Range(0, 100)`. This creates a single element per group, which is different from the existing GroupBy benchmark that uses `% 10` to create 10 buckets of 10 elements.

Benchmarks run via:
Benchmark Results (Inline Array)
Size = 2
Size = 3
Size = 4
Benchmark Results (Larger Starting Size Array)
Size = 2
Size = 3
Size = 4