Skip to content

Continue work for QueryCollection #32829

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 17 commits into from
Jun 17, 2021
Merged

Continue work for QueryCollection #32829

merged 17 commits into from
Jun 17, 2021

Conversation

gfoidl
Copy link
Member

@gfoidl gfoidl commented May 18, 2021

Bring #31594 (from @jkotalik) to an end.
It's his PR with my feedback left in the other PR.

|            Method |     Categories |        Mean |     Error |      StdDev |        Median |         Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------ |--------------- |------------:|----------:|------------:|--------------:|-------------:|------:|------:|------:|----------:|
|          ParseNew |    QueryString | 7,064.13 ns | 462.74 ns | 1,305.15 ns | 7,000.0000 ns |    141,560.2 |     - |     - |     - |     656 B |
| QueryHelpersParse |    QueryString | 8,168.48 ns | 563.58 ns | 1,589.59 ns | 7,750.0000 ns |    122,421.8 |     - |     - |     - |     888 B |
|                   |                |             |           |             |               |              |       |       |       |           |
|          ParseNew |         Single | 4,889.25 ns | 315.91 ns |   896.18 ns | 4,600.0000 ns |    204,530.5 |     - |     - |     - |     448 B |
| QueryHelpersParse |         Single | 6,276.34 ns | 552.40 ns | 1,567.08 ns | 5,800.0000 ns |    159,328.4 |     - |     - |     - |     432 B |
|                   |                |             |           |             |               |              |       |       |       |           |
|          ParseNew | SingleWithPlus | 4,484.88 ns | 293.60 ns |   798.75 ns | 4,200.0000 ns |    222,971.2 |     - |     - |     - |     472 B |
| QueryHelpersParse | SingleWithPlus | 5,773.56 ns | 389.77 ns | 1,067.00 ns | 5,500.0000 ns |    173,203.3 |     - |     - |     - |     520 B |
|                   |                |             |           |             |               |              |       |       |       |           |
|       Constructor |    Constructor |    43.82 ns |  23.05 ns |    63.88 ns |     0.0000 ns | 22,820,512.8 |     - |     - |     - |     144 B |

@gfoidl gfoidl requested review from jkotalik and Tratcher as code owners May 18, 2021 20:47
@ghost ghost added area-runtime community-contribution Indicates that the PR has been added by a community member labels May 18, 2021

do
{
var vec = Sse2.LoadVector128(pVec + i);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this vectorization overkill?
Cf. #31594 (comment)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vectorized is faster:

|     Method |   Value |     Mean |    Error |   StdDev | Ratio | RatioSD |
|----------- |-------- |---------:|---------:|---------:|------:|--------:|
|    Default |  Len: 1 | 11.50 ns | 0.228 ns | 0.202 ns |  1.00 |    0.00 |
| Vectorized |  Len: 1 | 11.08 ns | 0.160 ns | 0.133 ns |  0.96 |    0.02 |
|            |         |          |          |          |       |         |
|    Default |  Len: 7 | 18.74 ns | 0.431 ns | 0.461 ns |  1.00 |    0.00 |
| Vectorized |  Len: 7 | 16.97 ns | 0.393 ns | 0.453 ns |  0.91 |    0.03 |
|            |         |          |          |          |       |         |
|    Default |  Len: 8 | 20.65 ns | 0.458 ns | 0.450 ns |  1.00 |    0.00 |
| Vectorized |  Len: 8 | 13.44 ns | 0.231 ns | 0.205 ns |  0.65 |    0.02 |
|            |         |          |          |          |       |         |
|    Default |  Len: 9 | 20.56 ns | 0.462 ns | 0.633 ns |  1.00 |    0.00 |
| Vectorized |  Len: 9 | 13.82 ns | 0.308 ns | 0.258 ns |  0.67 |    0.02 |
|            |         |          |          |          |       |         |
|    Default | Len: 15 | 28.33 ns | 0.520 ns | 0.694 ns |  1.00 |    0.00 |
| Vectorized | Len: 15 | 19.75 ns | 0.377 ns | 0.334 ns |  0.70 |    0.02 |
|            |         |          |          |          |       |         |
|    Default | Len: 16 | 29.66 ns | 0.479 ns | 0.448 ns |  1.00 |    0.00 |
| Vectorized | Len: 16 | 14.55 ns | 0.258 ns | 0.216 ns |  0.49 |    0.01 |
|            |         |          |          |          |       |         |
|    Default | Len: 17 | 29.92 ns | 0.619 ns | 0.579 ns |  1.00 |    0.00 |
| Vectorized | Len: 17 | 14.95 ns | 0.281 ns | 0.312 ns |  0.50 |    0.01 |
|            |         |          |          |          |       |         |
|    Default | Len: 31 | 48.54 ns | 0.703 ns | 0.623 ns |  1.00 |    0.00 |
| Vectorized | Len: 31 | 24.05 ns | 0.188 ns | 0.147 ns |  0.50 |    0.01 |
|            |         |          |          |          |       |         |
|    Default | Len: 32 | 49.41 ns | 1.054 ns | 1.370 ns |  1.00 |    0.00 |
| Vectorized | Len: 32 | 20.02 ns | 0.454 ns | 0.424 ns |  0.40 |    0.02 |
|            |         |          |          |          |       |         |
|    Default | Len: 33 | 49.79 ns | 0.980 ns | 0.869 ns |  1.00 |    0.00 |
| Vectorized | Len: 33 | 20.12 ns | 0.470 ns | 0.594 ns |  0.41 |    0.02 |

Why it's faster for <= 8 length, so the scalar code-path, I don't know. Maybe alignment?
But it's stable accross multiple runs of the benchmark.

Benchmark code
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Bench>();

public class Bench
{
    [ParamsSource(nameof(GetData))]
    public Data Value { get; set; }

    [Benchmark(Baseline = true)]
    public string Default() => SpanHelper.ReplacePlusWithSpace(Value.Value);

    [Benchmark]
    public string Vectorized() => SpanHelperVectorized.ReplacePlusWithSpace(Value.Value);

    public static IEnumerable<Data> GetData()
    {
        yield return new Data(1);
        yield return new Data(7);
        yield return new Data(8);
        yield return new Data(9);
        yield return new Data(15);
        yield return new Data(16);
        yield return new Data(17);
        yield return new Data(31);
        yield return new Data(32);
        yield return new Data(33);
    }

    public class Data
    {
        public string Value { get; }

        public Data(string value) => Value = value;
        public Data(int length) : this(new string('+', length)) { }

        public override string ToString() => $"Len: {Value.Length}";
    }
}

public static class SpanHelper
{
    private static readonly SpanAction<char, IntPtr> s_replacePlusWithSpace = ReplacePlusWithSpaceCore;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe string ReplacePlusWithSpace(ReadOnlySpan<char> span)
    {
        fixed (char* ptr = &MemoryMarshal.GetReference(span))
        {
            return string.Create(span.Length, (IntPtr)ptr, s_replacePlusWithSpace);
        }
    }

    private static unsafe void ReplacePlusWithSpaceCore(Span<char> buffer, IntPtr state)
    {
        fixed (char* ptr = &MemoryMarshal.GetReference(buffer))
        {
            var input = (ushort*)state.ToPointer();
            var output = (ushort*)ptr;

            var i = (nint)0;
            var n = (nint)(uint)buffer.Length;

            for (; i < n; ++i)
            {
                if (input[i] != '+')
                {
                    output[i] = input[i];
                }
                else
                {
                    output[i] = ' ';
                }
            }
        }
    }
}

public static class SpanHelperVectorized
{
    private static readonly SpanAction<char, IntPtr> s_replacePlusWithSpace = ReplacePlusWithSpaceCore;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe string ReplacePlusWithSpace(ReadOnlySpan<char> span)
    {
        fixed (char* ptr = &MemoryMarshal.GetReference(span))
        {
            return string.Create(span.Length, (IntPtr)ptr, s_replacePlusWithSpace);
        }
    }

    private static unsafe void ReplacePlusWithSpaceCore(Span<char> buffer, IntPtr state)
    {
        fixed (char* ptr = &MemoryMarshal.GetReference(buffer))
        {
            var input = (ushort*)state.ToPointer();
            var output = (ushort*)ptr;

            var i = (nint)0;
            var n = (nint)(uint)buffer.Length;

            if (Sse41.IsSupported && n >= Vector128<ushort>.Count)
            {
                var vecPlus = Vector128.Create((ushort)'+');
                var vecSpace = Vector128.Create((ushort)' ');

                do
                {
                    var vec = Sse2.LoadVector128(input + i);
                    var mask = Sse2.CompareEqual(vec, vecPlus);
                    var res = Sse41.BlendVariable(vec, vecSpace, mask);
                    Sse2.Store(output + i, res);
                    i += Vector128<ushort>.Count;
                } while (i <= n - Vector128<ushort>.Count);
            }

            for (; i < n; ++i)
            {
                if (input[i] != '+')
                {
                    output[i] = input[i];
                }
                else
                {
                    output[i] = ' ';
                }
            }
        }
    }
}

@gfoidl gfoidl requested a review from BrennanConroy as a code owner May 19, 2021 12:30
@gfoidl
Copy link
Member Author

gfoidl commented May 19, 2021

Last benchmark numbers:

Edit: updated after bbcda61

|            Method |     Categories |        Mean |       Error |     StdDev |        Op/s | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------ |--------------- |------------:|------------:|-----------:|------------:|------:|------:|------:|----------:|
|          ParseNew |    QueryString | 11,784.5 ns | 1,184.37 ns | 3,436.1 ns |    84,857.0 |     - |     - |     - |     560 B |
| QueryHelpersParse |    QueryString | 13,828.3 ns | 1,556.29 ns | 4,564.3 ns |    72,315.6 |     - |     - |     - |     792 B |
|                   |                |             |             |            |             |       |       |       |           |
|          ParseNew |         Single |  8,467.2 ns |   704.20 ns | 1,997.7 ns |   118,102.7 |     - |     - |     - |     352 B |
| QueryHelpersParse |         Single | 11,621.6 ns | 1,525.80 ns | 4,426.6 ns |    86,046.3 |     - |     - |     - |     336 B |
|                   |                |             |             |            |             |       |       |       |           |
|          ParseNew | SingleWithPlus |  9,111.0 ns |   871.45 ns | 2,443.7 ns |   109,757.6 |     - |     - |     - |     376 B |
| QueryHelpersParse | SingleWithPlus | 12,923.5 ns | 1,647.52 ns | 4,805.9 ns |    77,378.6 |     - |     - |     - |     424 B |
|                   |                |             |             |            |             |       |       |       |           |
|          ParseNew |        Encoded | 11,065.6 ns | 1,427.14 ns | 4,117.6 ns |    90,370.0 |     - |     - |     - |     392 B |
| QueryHelpersParse |        Encoded | 15,443.2 ns | 1,430.73 ns | 4,105.0 ns |    64,753.6 |     - |     - |     - |     376 B |
|                   |                |             |             |            |             |       |       |       |           |
|       Constructor |    Constructor |    359.7 ns |    99.61 ns |   282.6 ns | 2,780,269.1 |     - |     - |     - |      48 B |

I don't like that for the simple / short queries 16 bytes are allocted more (2 objects?), and at the moment I have no idea on how to avoid them, as these needs some touch to the adaptive dictionary maybe?

@gfoidl gfoidl changed the title Reduce allocations in QueryCollection (Pt. II) Continue work for QueryCollection May 19, 2021
Copy link
Member

@Tratcher Tratcher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the vectors, you can test with longer values to see if it really helps.

@gfoidl gfoidl requested a review from Pilchie as a code owner May 20, 2021 12:13
Copy link
Member

@Tratcher Tratcher left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/azp run

@Tratcher
Copy link
Member

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@Tratcher
Copy link
Member

@halter73 I think this is good to go, but please do take a look at the vectors code.

@Tratcher
Copy link
Member

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@Tratcher Tratcher self-assigned this Jun 15, 2021
@Tratcher
Copy link
Member

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@Tratcher Tratcher merged commit dc7e80b into dotnet:main Jun 17, 2021
@ghost ghost added this to the 6.0-preview7 milestone Jun 17, 2021
@Tratcher
Copy link
Member

Thanks

@gfoidl gfoidl deleted the dictMorePlaces branch June 18, 2021 13:45
}
}

private static class SpanHelper
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels like it should be in .NET proper.

jeffl8n added a commit to jeffl8n/aspnetcore that referenced this pull request Sep 18, 2021
* main:
  Handle more cases with the new entry point pattern (dotnet#33500)
  [main] Update dependencies from dotnet/runtime dotnet/efcore (dotnet#33560)
  Refactor LongPolling in Java to avoid stackoverflow (dotnet#33564)
  Optimize QueryCollection (dotnet#32829)
  Switch to in-org action (dotnet#33610)
  Improve Codespaces + C# extension interaction (dotnet#33614)
@amcasey amcasey added area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions and removed area-runtime labels Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-networking Includes servers, yarp, json patch, bedrock, websockets, http client factory, and http abstractions community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants