Skip to content

Conversation

andrewjsaid
Copy link
Contributor

@andrewjsaid andrewjsaid commented Oct 5, 2024

LINQ's GroupBy operator adds elements from the IEnumerable<T> source into buckets called Grouping (Grouping<TKey, TElement> : IGrouping<TKey, TElement>). It uses a hashtable to identify the Grouping and then calls Add() to the grouping.

The current algorithm for a Grouping is to start with an array with a single element and resize it (double) as elements are added.

This proposed PR instead adds an InlineArray of size S directly into the grouping so that no arrays are allocated until the S + 1th element is added to the grouping. This of course increases the memory footprint of each Grouping but microbenchmarks from dotnet/performance show that overall it results in fewer bytes allocated and faster execution in cases where 100 integer elements are divided into 10 buckets as well into 100 buckets.

  • In the case of 10 larger groups the savings come from fewer array resizes due to the larger starting size of the inline array.
  • In the case of 100 smaller groups the savings are due to avoiding the 20 bytes overhead of an array (syncblock, method table pointer, length). This is a fixed cost and may mean larger total memory allocated for larger TElement types e.g. Guids, however it still reduces total number of objects allocated and thus the edges on the object graph.

Below I have shown benchmarks where S = 2, 3 and 4. As S increases, the size of the Grouping class increases but the arrays grow faster leading to fewer arrays allocated for large buckets.

I have also benchmarked a different change where instead of using an inline array like the existing PR, the change is to start with a larger array than length 1. As the main advantage of this approach is fewer array resizes, these benchmarks favour larger bucket scenarios at the expense of smaller bucket scenarios.

One change that had to be made for Inline Array to work is to remove the Trim method from Grouping. Up to now we have passed in the _elements array into resultsSelector, requiring it to be trimmed beforehand which results in an extra allocation. InlineArray can't be trimmed and resultsSelector accepts an IEnumerable<TElement> anyway, so instead of passing in the private _elements field I just pass in the grouping. This change should be reviewed carefully.

After reviewing the benchmarks, I propose this PR as is, with an Inline Array of size 3 to be added to Grouping<TKey, TElement> which improves both larger and smaller bucket scenarios.

Benchmarks

I added a benchmark to Perf.Enumerable in my local dotnet/performance as defined below. In this benchmark the
source is Enumerable.Range(0, 100). This creates a single element per group which is different from the existing
GroupBy benchmark which uses % 10 to create 10 buckets of 10 elements

[Benchmark]
[ArgumentsSource(nameof(IEnumerableArgument))]
public void GroupBy_SingleElementPerGroup(LinqTestData input) => input.Collection.GroupBy(x => x).Consume(_consumer);

Benchmarks run via:

dotnet run -c Release -f net9.0 `
  --filter System.Linq.Tests.Perf_Enumerable*GroupBy* `
  --corerun `
    D:\Code\_forks\dotnet-runtime-base\artifacts\bin\testhost\net10.0-windows-Release-x64\shared\Microsoft.NETCore.App\10.0.0\corerun.exe `
    D:\Code\_forks\dotnet-runtime-pr\artifacts\bin\testhost\net10.0-windows-Release-x64\shared\Microsoft.NETCore.App\10.0.0\corerun.exe

Benchmark Results (Inline Array)

Size = 2

Method Toolchain input Mean Error StdDev Median Min Max Ratio RatioSD Gen0 Gen1 Allocated Alloc Ratio
GroupBy main IEnumerable 1.142 us 0.0212 us 0.0208 us 1.142 us 1.1025 us 1.189 us 1.00 0.03 0.2008 - 3.34 KB 1.00
GroupBy pr IEnumerable 1.012 us 0.0190 us 0.0178 us 1.011 us 0.9858 us 1.040 us 0.89 0.02 0.1697 - 2.79 KB 0.84
GroupBy_SingleElementPerGroup main IEnumerable 3.341 us 0.0568 us 0.0531 us 3.343 us 3.2334 us 3.445 us 1.00 0.02 0.6513 0.0136 10.76 KB 1.00
GroupBy_SingleElementPerGroup pr IEnumerable 2.890 us 0.0424 us 0.0376 us 2.877 us 2.8270 us 2.953 us 0.87 0.02 0.5055 0.0126 8.41 KB 0.78

Size = 3

Method Toolchain input Mean Error StdDev Median Min Max Ratio RatioSD Gen0 Gen1 Allocated Alloc Ratio
GroupBy main IEnumerable 1,160.3 ns 6.93 ns 5.79 ns 1,157.7 ns 1,153.9 ns 1,173.5 ns 1.00 0.01 0.1997 - 3.34 KB 1.00
GroupBy pr IEnumerable 956.5 ns 17.01 ns 15.08 ns 954.6 ns 936.0 ns 986.3 ns 0.82 0.01 0.1312 - 2.16 KB 0.65
GroupBy_SingleElementPerGroup main IEnumerable 3,314.4 ns 64.48 ns 68.99 ns 3,302.6 ns 3,216.6 ns 3,443.7 ns 1.00 0.03 0.6562 0.0134 10.76 KB 1.00
GroupBy_SingleElementPerGroup pr IEnumerable 2,969.2 ns 31.88 ns 29.82 ns 2,957.2 ns 2,938.3 ns 3,031.1 ns 0.90 0.02 0.5126 0.0125 8.41 KB 0.78

Size = 4

Method Toolchain input Mean Error StdDev Median Min Max Ratio RatioSD Gen0 Gen1 Allocated Alloc Ratio
GroupBy main IEnumerable 1,133.8 ns 14.43 ns 12.05 ns 1,132.6 ns 1,115.6 ns 1,158.6 ns 1.00 0.01 0.1997 - 3.34 KB 1.00
GroupBy pr IEnumerable 984.7 ns 5.43 ns 4.24 ns 985.4 ns 974.3 ns 989.1 ns 0.87 0.01 0.1488 - 2.48 KB 0.74
GroupBy_SingleElementPerGroup main IEnumerable 3,311.6 ns 56.52 ns 52.87 ns 3,305.4 ns 3,229.1 ns 3,411.9 ns 1.00 0.02 0.6505 0.0130 10.76 KB 1.00
GroupBy_SingleElementPerGroup pr IEnumerable 2,959.8 ns 38.26 ns 31.95 ns 2,954.0 ns 2,879.3 ns 3,012.5 ns 0.89 0.02 0.5554 0.0118 9.2 KB 0.85

Benchmark Results (Larger Starting Size Array)

Size = 2

Method Toolchain input Mean Error StdDev Median Min Max Ratio RatioSD Gen0 Gen1 Allocated Alloc Ratio
GroupBy main IEnumerable 1.144 us 0.0206 us 0.0193 us 1.146 us 1.112 us 1.187 us 1.00 0.02 0.2012 - 3.34 KB 1.00
GroupBy pr IEnumerable 1.065 us 0.0169 us 0.0150 us 1.063 us 1.040 us 1.089 us 0.93 0.02 0.1816 - 3.02 KB 0.91
GroupBy_SingleElementPerGroup main IEnumerable 3.352 us 0.0401 us 0.0375 us 3.356 us 3.293 us 3.438 us 1.00 0.02 0.6564 0.0153 10.76 KB 1.00
GroupBy_SingleElementPerGroup pr IEnumerable 3.310 us 0.0416 us 0.0369 us 3.308 us 3.259 us 3.369 us 0.99 0.02 0.6489 0.0130 10.76 KB 1.00

Size = 3

Method Toolchain input Mean Error StdDev Median Min Max Ratio RatioSD Gen0 Gen1 Allocated Alloc Ratio
GroupBy main IEnumerable 1,134.6 ns 17.87 ns 15.84 ns 1,133.5 ns 1,110.5 ns 1,171.0 ns 1.00 0.02 0.2015 - 3.34 KB 1.00
GroupBy pr IEnumerable 991.1 ns 7.34 ns 6.13 ns 990.7 ns 979.9 ns 1,000.0 ns 0.87 0.01 0.1482 - 2.48 KB 0.74
GroupBy_SingleElementPerGroup main IEnumerable 3,238.1 ns 19.92 ns 16.63 ns 3,231.4 ns 3,215.3 ns 3,271.7 ns 1.00 0.01 0.6571 0.0129 10.76 KB 1.00
GroupBy_SingleElementPerGroup pr IEnumerable 3,323.6 ns 58.17 ns 62.24 ns 3,304.9 ns 3,243.3 ns 3,437.1 ns 1.03 0.02 0.7048 0.0266 11.54 KB 1.07

Size = 4

Method Toolchain input Mean Error StdDev Median Min Max Ratio RatioSD Gen0 Gen1 Allocated Alloc Ratio
GroupBy main IEnumerable 1.165 us 0.0219 us 0.0205 us 1.168 us 1.1302 us 1.200 us 1.00 0.02 0.2013 - 3.34 KB 1.00
GroupBy pr IEnumerable 1.013 us 0.0137 us 0.0115 us 1.014 us 0.9895 us 1.030 us 0.87 0.02 0.1658 - 2.71 KB 0.81
GroupBy_SingleElementPerGroup main IEnumerable 3.355 us 0.0471 us 0.0504 us 3.356 us 3.2637 us 3.436 us 1.00 0.02 0.6446 0.0150 10.76 KB 1.00
GroupBy_SingleElementPerGroup pr IEnumerable 3.370 us 0.0674 us 0.0749 us 3.359 us 3.2599 us 3.544 us 1.00 0.03 0.6996 0.0264 11.54 KB 1.07

@ghost ghost added the area-System.Linq label Oct 5, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Oct 5, 2024
@huoyaoyuan
Copy link
Member

I'm concerned about the scope and extensibility of this change. It's effectively recreating small_vector<T> and use it specifically for GroupBy. My concerns are:

  • Why not creating a more generic small_vector<T>, but just for GroupBy?
  • Since the C# language incorporation is still poor and may evolve in the future, is it worthy to manually implement the operations verbosely?
  • Should small size of groups be favored at all?

@andrewjsaid
Copy link
Contributor Author

andrewjsaid commented Oct 6, 2024

This change does add some complexity to improve performance, by implementing lower level concepts. However one could argue that this is already happening as Grouping uses Array instead of List for exactly this trade-off. The decision then becomes one where the maintainers must decide.

Creating a more generic small_vector<T> could make sense but it's outside of my comfort zone and so I'll leave it to others to decide on what that would look like.

As for the point on small size groups be favored. It's interesting because this change actually has the biggest improvement for medium sized groups as it avoids the allocations of arrays size 1, 2 and 4. For small groups there's some unused memory in the inline array and for large groups these 3 allocations get lost in the noise.

Of course if we wanted to keep complexity low and didn't mind extra unused memory for small group sizes then we could just bump the initial array size to 4 in a single-character code change which according to the benchmarks above improves perf for the GroupBy with buckets of length 10 and doesn't slow down buckets of length 1 besides allocating extra memory.

Out of curiosity I also attempted to implement an alternative algorithm which allocates no extra memory allocations by loading temporary data into a buffer (SegmentedArrayBuilder) to calculate the correct group sizes. Interestingly enough we did reduce memory allocations but suffered a ~13% performance hit so I discounted this approach despite the reduction of memory allocation to 46% (at least in it's current form).

        internal static Lookup<TKey, TElement> Create(IEnumerable<TElement> source, Func<TElement, TKey> keySelector, IEqualityComparer<TKey>? comparer)
        {
            Debug.Assert(source is not null);
            Debug.Assert(keySelector is not null);

            SegmentedArrayBuilder<GroupingElementAssignment>.ScratchBuffer scratch = default;
            SegmentedArrayBuilder<GroupingElementAssignment> arrayBuilder = new(scratch);

            CollectionLookup<TKey, TElement> lookup = new (comparer);

            foreach (TElement item in source)
            {
                TKey key = keySelector(item);
                Grouping<TKey, TElement> grouping = lookup.GetGrouping(key, create: true)!;
                grouping._count++;

                arrayBuilder.Add(new GroupingElementAssignment
                {
                    Element = item,
                    Grouping = grouping
                });
            }

            Grouping<TKey, TElement> g = lookup._lastGrouping!;
            if (g is not null)
            {
                do
                {
                    g = g._next!;
                    g._elements = new TElement[g._count];
                    g._count = 0;
                } while (g != lookup._lastGrouping);
            }

            int enumeratorState = 0;
            while (arrayBuilder.MoveNextSegment(ref enumeratorState, out bool moveNext) is ReadOnlySpan<GroupingElementAssignment> nextSegment && moveNext)
            {
                foreach (GroupingElementAssignment assignment in nextSegment)
                {
                    assignment.Grouping._elements[assignment.Grouping._count++] = assignment.Element;
                }
            }

            arrayBuilder.Dispose();

            return lookup;
        }
Method Toolchain input Mean Error StdDev Median Min Max Ratio RatioSD Gen0 Gen1 Allocated Alloc Ratio
GroupBy main IEnumerable 1.134 us 0.0195 us 0.0162 us 1.128 us 1.111 us 1.173 us 1.00 0.02 0.2010 - 3.34 KB 1.00
GroupBy idea IEnumerable 1.269 us 0.0229 us 0.0235 us 1.268 us 1.240 us 1.320 us 1.12 0.03 0.0914 - 1.54 KB 0.46
GroupBy_SingleElementPerGroup main IEnumerable 3.308 us 0.0378 us 0.0335 us 3.297 us 3.273 us 3.388 us 1.00 0.01 0.6469 0.0132 10.76 KB 1.00
GroupBy_SingleElementPerGroup idea IEnumerable 3.781 us 0.0592 us 0.0554 us 3.785 us 3.678 us 3.878 us 1.14 0.02 0.6517 0.0152 10.76 KB 1.00

@eiriktsarpalis
Copy link
Member

Hey @andrewjsaid, apologies for the delay in reviewing this. Is there a chance you could rebase your changes on top of the latest main? I don't have write permissions on your PR branch.

@eiriktsarpalis eiriktsarpalis added the needs-author-action An issue or pull request that requires more info or actions from the author. label Jul 1, 2025
@dotnet-policy-service dotnet-policy-service bot removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Jul 1, 2025
@andrewjsaid
Copy link
Contributor Author

Hi @eiriktsarpalis thank you for revisiting my PR. I have merged the main branch into my code.

While reviewing this PR, I would emphasize reading the analysis before making final judgement, especially this following part.

Of course if we wanted to keep complexity low and didn't mind extra unused memory for small group sizes then we could just bump the initial array size to 4 in a single-character code change which according to the benchmarks above improves perf for the GroupBy with buckets of length 10 and doesn't slow down buckets of length 1 besides allocating extra memory.

@eiriktsarpalis
Copy link
Member

Thanks. I just went through the code and benchmarks and from my perspective the performance improvements seem too marginal to justify the potential for code bloat that this change introduces. Seems like increasing the starting array seems like a reasonable enough compromise that doesn't involve a huge code change.

@andrewjsaid
Copy link
Contributor Author

@eiriktsarpalis Thank you for your considered review.

If you let me know which starting array size you prefer based on the benchmarks in the original commit message, I can create a follow-up PR and re-run benchmarks etc.

@eiriktsarpalis
Copy link
Member

I leave it up to you. From my perspective picking 4 seems reasonable because we use that in other collections as well.

@andrewjsaid andrewjsaid closed this Jul 3, 2025
@github-actions github-actions bot locked and limited conversation to collaborators Aug 3, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Linq community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants