Add a packed variant of single-value SearchValues<string> #118108
Tagging subscribers to this area: @dotnet/area-system-memory
@EgorBot -aws_sapphirelake -azure_cascadelake -azure_milano -azure_genoa -azure_cobalt100 -azure_ampere

using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using System.Buffers;

BenchmarkRunner.Run<SingleString>(args: args);

public class SingleString
{
    private static readonly SearchValues<string> s_values = SearchValues.Create([Needle], StringComparison.Ordinal);
    private static readonly SearchValues<string> s_valuesIC = SearchValues.Create([Needle], StringComparison.OrdinalIgnoreCase);
    private static readonly string s_text_noMatches = new('a', Length);
    private static readonly string s_text_falsePositives = string.Concat(Enumerable.Repeat("Sherlock Holm_s", Length / Needle.Length));

    public const int Length = 100_000;
    public const string Needle = "Sherlock Holmes";

    [Benchmark] public void Throughput() => s_text_noMatches.AsSpan().Contains(Needle, StringComparison.Ordinal);
    [Benchmark] public void SV_Throughput() => s_text_noMatches.AsSpan().ContainsAny(s_values);
    [Benchmark] public void SV_ThroughputIC() => s_text_noMatches.AsSpan().ContainsAny(s_valuesIC);
    [Benchmark] public void FalsePositives() => s_text_falsePositives.AsSpan().Contains(Needle, StringComparison.Ordinal);
    [Benchmark] public void SV_FalsePositives() => s_text_falsePositives.AsSpan().ContainsAny(s_values);
    [Benchmark] public void SV_FalsePositivesIC() => s_text_falsePositives.AsSpan().ContainsAny(s_valuesIC);
}
This reverts commit 70ff070.
Pull Request Overview
This PR adds a packed variant of single-value SearchValues<string>. Packing the input characters lets the vectorized search process twice as many characters per loop iteration.
Key changes:
- Introduces a new packed implementation that leverages vectorized operations to double processing throughput
- Refactors the creation logic to automatically choose between packed and non-packed variants based on character compatibility
- Updates the project structure to include the new implementation
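
To make the packing idea concrete, here is a minimal sketch that searches for a single char rather than a whole string. It is not the PR's code: `PackedSketch` and `IndexOfPacked` are hypothetical names, and the sketch assumes the searched char fits in one byte. It narrows two `Vector128<ushort>` blocks into one `Vector128<byte>` with `Vector128.Narrow`, which truncates, so every candidate is re-checked against the original text. The actual implementation may pack differently (for example with saturation) and restrict which needle chars qualify.

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static class PackedSketch
{
    // Hypothetical helper: returns the first index of `value` in `text`, or -1.
    // Assumes `value` fits in a byte (value <= 0xFF).
    public static int IndexOfPacked(ReadOnlySpan<char> text, char value)
    {
        ReadOnlySpan<ushort> data = MemoryMarshal.Cast<char, ushort>(text);
        Vector128<byte> needle = Vector128.Create((byte)value);
        int i = 0;

        if (Vector128.IsHardwareAccelerated)
        {
            // Each iteration covers 16 chars; an unpacked ushort compare covers 8.
            for (; i <= data.Length - 16; i += 16)
            {
                Vector128<ushort> lower = Vector128.Create(data.Slice(i, 8));
                Vector128<ushort> upper = Vector128.Create(data.Slice(i + 8, 8));

                // Keep only the low byte of every char: 16 chars in one vector.
                Vector128<byte> packed = Vector128.Narrow(lower, upper);
                uint mask = Vector128.Equals(packed, needle).ExtractMostSignificantBits();

                while (mask != 0)
                {
                    int candidate = i + BitOperations.TrailingZeroCount(mask);

                    // Truncation can yield false positives (e.g. U+0153 vs 'S'),
                    // so confirm against the full 16-bit char.
                    if (text[candidate] == value)
                    {
                        return candidate;
                    }

                    mask &= mask - 1; // clear the lowest set bit
                }
            }
        }

        // Scalar tail (and fallback when vectors are not accelerated).
        for (; i < text.Length; i++)
        {
            if (text[i] == value)
            {
                return i;
            }
        }

        return -1;
    }
}
```

For example, `PackedSketch.IndexOfPacked("aaaaaSaaaaaaaaaaaaaa", 'S')` finds the match at index 5 in the first packed block. The verification step is what makes a lossy pack safe: the packed compare only nominates candidates.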
Reviewed Changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.
File | Description |
---|---|
StringSearchValues.cs | Refactors single value creation logic to choose between packed and non-packed implementations |
SingleStringSearchValuesThreeChars.cs | Updates constructor to accept precomputed character offsets and fixes comment accuracy |
SingleStringSearchValuesPackedThreeChars.cs | New packed implementation using vectorized operations for improved performance |
StringSearchValuesHelper.cs | Enhances equality comparison helpers to support both packed and non-packed variants |
AsciiStringSearchValuesTeddyBase.cs | Fixes comment describing algorithm complexity |
System.Private.CoreLib.Shared.projitems | Adds new packed implementation file to build |
StringSearchValues.cs (tests) | Improves test coverage for edge cases with null chars and non-ASCII values |
@MihuBot benchmark Regex_Industry https://github.com/MihaZupan/performance/tree/compiled-regex-only -medium
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_SliceSlice
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Mariomkas
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_BoostDocs_Simple
Almost a copy-paste of SingleStringSearchValuesThreeChars, but with input packing to double the number of chars processed per loop iteration.
SliceSlice on my machine (Zen5) looks promising enough:
Throughput on 100k chars with no matches:
Decent numbers across different CPUs: EgorBot/runtime-utils#454