Continue work for QueryCollection #32829
```csharp
do
{
    var vec = Sse2.LoadVector128(pVec + i);
```
Is this vectorization overkill?
Cf. #31594 (comment)
Vectorized is faster:
| Method | Value | Mean | Error | StdDev | Ratio | RatioSD |
|----------- |-------- |---------:|---------:|---------:|------:|--------:|
| Default | Len: 1 | 11.50 ns | 0.228 ns | 0.202 ns | 1.00 | 0.00 |
| Vectorized | Len: 1 | 11.08 ns | 0.160 ns | 0.133 ns | 0.96 | 0.02 |
| | | | | | | |
| Default | Len: 7 | 18.74 ns | 0.431 ns | 0.461 ns | 1.00 | 0.00 |
| Vectorized | Len: 7 | 16.97 ns | 0.393 ns | 0.453 ns | 0.91 | 0.03 |
| | | | | | | |
| Default | Len: 8 | 20.65 ns | 0.458 ns | 0.450 ns | 1.00 | 0.00 |
| Vectorized | Len: 8 | 13.44 ns | 0.231 ns | 0.205 ns | 0.65 | 0.02 |
| | | | | | | |
| Default | Len: 9 | 20.56 ns | 0.462 ns | 0.633 ns | 1.00 | 0.00 |
| Vectorized | Len: 9 | 13.82 ns | 0.308 ns | 0.258 ns | 0.67 | 0.02 |
| | | | | | | |
| Default | Len: 15 | 28.33 ns | 0.520 ns | 0.694 ns | 1.00 | 0.00 |
| Vectorized | Len: 15 | 19.75 ns | 0.377 ns | 0.334 ns | 0.70 | 0.02 |
| | | | | | | |
| Default | Len: 16 | 29.66 ns | 0.479 ns | 0.448 ns | 1.00 | 0.00 |
| Vectorized | Len: 16 | 14.55 ns | 0.258 ns | 0.216 ns | 0.49 | 0.01 |
| | | | | | | |
| Default | Len: 17 | 29.92 ns | 0.619 ns | 0.579 ns | 1.00 | 0.00 |
| Vectorized | Len: 17 | 14.95 ns | 0.281 ns | 0.312 ns | 0.50 | 0.01 |
| | | | | | | |
| Default | Len: 31 | 48.54 ns | 0.703 ns | 0.623 ns | 1.00 | 0.00 |
| Vectorized | Len: 31 | 24.05 ns | 0.188 ns | 0.147 ns | 0.50 | 0.01 |
| | | | | | | |
| Default | Len: 32 | 49.41 ns | 1.054 ns | 1.370 ns | 1.00 | 0.00 |
| Vectorized | Len: 32 | 20.02 ns | 0.454 ns | 0.424 ns | 0.40 | 0.02 |
| | | | | | | |
| Default | Len: 33 | 49.79 ns | 0.980 ns | 0.869 ns | 1.00 | 0.00 |
| Vectorized | Len: 33 | 20.12 ns | 0.470 ns | 0.594 ns | 0.41 | 0.02 |
Why it's also faster for lengths below 8, where only the scalar code path runs, I don't know. Maybe alignment?
But it's stable across multiple runs of the benchmark.
Benchmark code
```csharp
using System;
using System.Buffers;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;
using System.Runtime.Intrinsics.X86;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<Bench>();

public class Bench
{
    [ParamsSource(nameof(GetData))]
    public Data Value { get; set; }

    [Benchmark(Baseline = true)]
    public string Default() => SpanHelper.ReplacePlusWithSpace(Value.Value);

    [Benchmark]
    public string Vectorized() => SpanHelperVectorized.ReplacePlusWithSpace(Value.Value);

    public static IEnumerable<Data> GetData()
    {
        yield return new Data(1);
        yield return new Data(7);
        yield return new Data(8);
        yield return new Data(9);
        yield return new Data(15);
        yield return new Data(16);
        yield return new Data(17);
        yield return new Data(31);
        yield return new Data(32);
        yield return new Data(33);
    }

    public class Data
    {
        public string Value { get; }
        public Data(string value) => Value = value;
        public Data(int length) : this(new string('+', length)) { }
        public override string ToString() => $"Len: {Value.Length}";
    }
}

public static class SpanHelper
{
    private static readonly SpanAction<char, IntPtr> s_replacePlusWithSpace = ReplacePlusWithSpaceCore;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe string ReplacePlusWithSpace(ReadOnlySpan<char> span)
    {
        fixed (char* ptr = &MemoryMarshal.GetReference(span))
        {
            return string.Create(span.Length, (IntPtr)ptr, s_replacePlusWithSpace);
        }
    }

    private static unsafe void ReplacePlusWithSpaceCore(Span<char> buffer, IntPtr state)
    {
        fixed (char* ptr = &MemoryMarshal.GetReference(buffer))
        {
            var input = (ushort*)state.ToPointer();
            var output = (ushort*)ptr;
            var i = (nint)0;
            var n = (nint)(uint)buffer.Length;

            for (; i < n; ++i)
            {
                if (input[i] != '+')
                {
                    output[i] = input[i];
                }
                else
                {
                    output[i] = ' ';
                }
            }
        }
    }
}

public static class SpanHelperVectorized
{
    private static readonly SpanAction<char, IntPtr> s_replacePlusWithSpace = ReplacePlusWithSpaceCore;

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static unsafe string ReplacePlusWithSpace(ReadOnlySpan<char> span)
    {
        fixed (char* ptr = &MemoryMarshal.GetReference(span))
        {
            return string.Create(span.Length, (IntPtr)ptr, s_replacePlusWithSpace);
        }
    }

    private static unsafe void ReplacePlusWithSpaceCore(Span<char> buffer, IntPtr state)
    {
        fixed (char* ptr = &MemoryMarshal.GetReference(buffer))
        {
            var input = (ushort*)state.ToPointer();
            var output = (ushort*)ptr;
            var i = (nint)0;
            var n = (nint)(uint)buffer.Length;

            if (Sse41.IsSupported && n >= Vector128<ushort>.Count)
            {
                var vecPlus = Vector128.Create((ushort)'+');
                var vecSpace = Vector128.Create((ushort)' ');

                do
                {
                    var vec = Sse2.LoadVector128(input + i);
                    var mask = Sse2.CompareEqual(vec, vecPlus);
                    var res = Sse41.BlendVariable(vec, vecSpace, mask);
                    Sse2.Store(output + i, res);
                    i += Vector128<ushort>.Count;
                } while (i <= n - Vector128<ushort>.Count);
            }

            for (; i < n; ++i)
            {
                if (input[i] != '+')
                {
                    output[i] = input[i];
                }
                else
                {
                    output[i] = ' ';
                }
            }
        }
    }
}
```
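For comparison, here is a rough sketch of the same compare-and-blend loop written against the cross-platform `Vector128` helpers instead of the `Sse2`/`Sse41` intrinsics. It is not the code from this PR: the class name `PortableSpanHelper` and the source/destination signature are hypothetical, and it assumes .NET 7 or later, where `Vector128.LoadUnsafe`, `Vector128.Equals`, `Vector128.ConditionalSelect` and `StoreUnsafe` are available. It has not been benchmarked here.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical helper, not part of the PR: same algorithm, portable Vector128 API.
public static class PortableSpanHelper
{
    public static void ReplacePlusWithSpace(ReadOnlySpan<char> source, Span<char> destination)
    {
        if (destination.Length < source.Length)
        {
            throw new ArgumentException("Destination is too short.", nameof(destination));
        }

        // Reinterpret the chars as ushorts so they can be loaded into Vector128<ushort>.
        ReadOnlySpan<ushort> src = MemoryMarshal.Cast<char, ushort>(source);
        Span<ushort> dst = MemoryMarshal.Cast<char, ushort>(destination);
        ref ushort srcRef = ref MemoryMarshal.GetReference(src);
        ref ushort dstRef = ref MemoryMarshal.GetReference(dst);

        nuint i = 0;
        nuint n = (nuint)src.Length;
        nuint width = (nuint)Vector128<ushort>.Count;

        if (Vector128.IsHardwareAccelerated && n >= width)
        {
            var vecPlus = Vector128.Create((ushort)'+');
            var vecSpace = Vector128.Create((ushort)' ');
            nuint lastBlock = n - width; // last offset at which a full vector still fits

            for (; i <= lastBlock; i += width)
            {
                var vec = Vector128.LoadUnsafe(ref srcRef, i);
                var mask = Vector128.Equals(vec, vecPlus);
                // Take the space where the mask is set, the original element otherwise.
                Vector128.ConditionalSelect(mask, vecSpace, vec).StoreUnsafe(ref dstRef, i);
            }
        }

        // Scalar tail (and the whole input when it is shorter than one vector).
        for (; i < n; i++)
        {
            ushort c = src[(int)i];
            dst[(int)i] = c == '+' ? (ushort)' ' : c;
        }
    }
}
```

Whether this matches the `Sse41.BlendVariable` version in speed would need to be measured with the benchmark above.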
Last benchmark numbers: Edit: updated after bbcda61
I don't like that the simple / short queries allocate 16 more bytes (2 objects?). At the moment I have no idea how to avoid that; maybe it needs some changes to the adaptive dictionary?
As for the vectors, you can test with longer values to see if it really helps.
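One way to follow up on this is to feed the benchmark longer inputs. The sketch below only generates such inputs; the class name `LongerInputs`, the chosen lengths, and the one-in-eight `'+'` density are all assumptions for illustration, not values from this PR. The strings could be wrapped in the `Data` type of the benchmark above and returned from `GetData`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical input generator for longer benchmark values.
public static class LongerInputs
{
    // Assumed lengths; pick whatever is representative of real query strings.
    public static readonly int[] Lengths = { 64, 128, 256, 1024, 4096 };

    public static IEnumerable<string> GetValues()
    {
        foreach (int length in Lengths)
        {
            // Mostly 'a' with a '+' in every eighth position, so both blend outcomes are exercised.
            yield return string.Create(length, 0, static (span, _) =>
            {
                for (int i = 0; i < span.Length; i++)
                {
                    span[i] = i % 8 == 7 ? '+' : 'a';
                }
            });
        }
    }
}
```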
/azp run
Azure Pipelines successfully started running 2 pipeline(s).
@halter73 I think this is good to go, but please do take a look at the vectors code.
Thanks
```csharp
    }
}

private static class SpanHelper
```
This feels like it should be in .NET proper.
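For context, later .NET versions do ship a span-based replace. The following is a minimal sketch assuming .NET 8 or later, where the `MemoryExtensions.Replace<T>(this Span<T>, T, T)` extension exists. The class name `BuiltInReplace` is hypothetical, this is not what the PR does, and it pays for an extra copy through a pooled buffer that the `string.Create` approach avoids.

```csharp
using System;
using System.Buffers;

// Hypothetical alternative built on the framework's Span<T>.Replace (assumes .NET 8+).
public static class BuiltInReplace
{
    public static string ReplacePlusWithSpace(ReadOnlySpan<char> span)
    {
        if (span.IsEmpty)
        {
            return string.Empty;
        }

        char[] rented = ArrayPool<char>.Shared.Rent(span.Length);
        try
        {
            // Work on a writable copy, because the input span is read-only.
            Span<char> buffer = rented.AsSpan(0, span.Length);
            span.CopyTo(buffer);
            buffer.Replace('+', ' ');
            return new string(buffer);
        }
        finally
        {
            ArrayPool<char>.Shared.Return(rented);
        }
    }
}
```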
* main:
  Handle more cases with the new entry point pattern (dotnet#33500)
  [main] Update dependencies from dotnet/runtime dotnet/efcore (dotnet#33560)
  Refactor LongPolling in Java to avoid stackoverflow (dotnet#33564)
  Optimize QueryCollection (dotnet#32829)
  Switch to in-org action (dotnet#33610)
  Improve Codespaces + C# extension interaction (dotnet#33614)
Bring #31594 (from @jkotalik) to an end.
It's his PR, with the feedback I left in the other PR applied.