MihaZupan commented Jul 28, 2025

Almost a copy-paste of SingleStringSearchValuesThreeChars, but with input packing to double the number of chars processed per loop iteration.

SliceSlice on my machine (Zen5) looks promising enough:

| Method | Toolchain | Options | Mean | Error | Ratio |
|---|---|---|---:|---:|---:|
| Count | main-default | Compiled | 188.1 ms | 3.67 ms | 1.00 |
| Count | main-noAvx512 | Compiled | 159.8 ms | 3.10 ms | 0.85 |
| Count | pr | Compiled | 126.0 ms | 2.34 ms | 0.67 |
| Count | main-default | IgnoreCase, Compiled | 361.9 ms | 2.51 ms | 1.00 |
| Count | main-noAvx512 | IgnoreCase, Compiled | 331.6 ms | 0.93 ms | 0.92 |
| Count | pr | IgnoreCase, Compiled | 304.5 ms | 1.78 ms | 0.84 |

Throughput on 100k chars with no matches:

| Method | Toolchain | Mean | Error | Ratio |
|---|---|---:|---:|---:|
| SV_Throughput | main-noAvx512 | 2,683.332 ns | 51.5568 ns | 0.86 |
| SV_Throughput | main | 3,129.936 ns | 29.6291 ns | 1.00 |
| SV_Throughput | pr | 1,960.702 ns | 34.3938 ns | 0.63 |
| SV_ThroughputIC | main-noAvx512 | 2,682.815 ns | 32.0081 ns | 0.77 |
| SV_ThroughputIC | main | 3,464.058 ns | 68.5074 ns | 1.00 |
| SV_ThroughputIC | pr | 2,108.218 ns | 40.3087 ns | 0.61 |

Decent numbers across different CPUs: EgorBot/runtime-utils#454
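
For readers unfamiliar with "input packing", here is a minimal sketch of the idea, not the code from this PR. It narrows two vectors of 8 UTF-16 chars into one vector of 16 bytes, so a single byte-wise comparison covers twice as many chars. The names `PackedSketch` and `IndexOfInFirst16` are hypothetical, and `Vector128.Narrow` is an assumption made for portability; the real implementation may use different instructions with different narrowing behavior.

```cs
using System;
using System.Numerics;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

public static class PackedSketch
{
    // Hypothetical helper: index of the first lane among the first 16 chars of
    // `text` whose low byte equals `value`, or -1 if there is none.
    public static int IndexOfInFirst16(ReadOnlySpan<char> text, char value)
    {
        if (text.Length < 16) throw new ArgumentException("Need at least 16 chars.", nameof(text));
        if (value > 0xFF) throw new ArgumentOutOfRangeException(nameof(value));

        ReadOnlySpan<ushort> utf16 = MemoryMarshal.Cast<char, ushort>(text);

        // Two loads of 8 chars each (2 x 128 bits of input).
        Vector128<ushort> lower = Vector128.Create(utf16.Slice(0, 8));
        Vector128<ushort> upper = Vector128.Create(utf16.Slice(8, 8));

        // Pack: keep the low byte of every char, giving 16 lanes in one vector.
        // Narrow truncates, so chars above 0xFF can alias a byte value.
        Vector128<byte> packed = Vector128.Narrow(lower, upper);

        // One comparison now covers 16 chars instead of 8.
        Vector128<byte> matches = Vector128.Equals(packed, Vector128.Create((byte)value));
        uint mask = matches.ExtractMostSignificantBits();

        return mask == 0 ? -1 : BitOperations.TrailingZeroCount(mask);
    }
}

// Usage:
//   PackedSketch.IndexOfInFirst16("Dr. Watson and Sherlock Holmes", 'S')  // 15
//   PackedSketch.IndexOfInFirst16("Dr. Watson and Sherlock Holmes", 'W')  // 4
//   PackedSketch.IndexOfInFirst16("Dr. Watson and Sherlock Holmes", 'z')  // -1
```

Because packing discards the upper byte, a hit in the packed vector is only a candidate: a char such as U+0153 narrows to the same byte as 'S' in this sketch, so any real search still has to verify candidates against the unpacked input.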

MihaZupan added this to the 10.0.0 milestone Jul 28, 2025
MihaZupan self-assigned this Jul 28, 2025

Tagging subscribers to this area: @dotnet/area-system-memory
See info in area-owners.md if you want to be subscribed.

@MihaZupan

@EgorBot -aws_sapphirelake -azure_cascadelake -azure_milano -azure_genoa -azure_cobalt100 -azure_ampere

using System;
using System.Buffers;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

BenchmarkRunner.Run<SingleString>(args: args);

public class SingleString
{
    private static readonly SearchValues<string> s_values = SearchValues.Create([Needle], StringComparison.Ordinal);
    private static readonly SearchValues<string> s_valuesIC = SearchValues.Create([Needle], StringComparison.OrdinalIgnoreCase);
    private static readonly string s_text_noMatches = new('a', Length);
    private static readonly string s_text_falsePositives = string.Concat(Enumerable.Repeat("Sherlock Holm_s", Length / Needle.Length));

    public const int Length = 100_000;
    public const string Needle = "Sherlock Holmes";

    [Benchmark] public void Throughput() => s_text_noMatches.AsSpan().Contains(Needle, StringComparison.Ordinal);
    [Benchmark] public void SV_Throughput() => s_text_noMatches.AsSpan().ContainsAny(s_values);
    [Benchmark] public void SV_ThroughputIC() => s_text_noMatches.AsSpan().ContainsAny(s_valuesIC);

    [Benchmark] public void FalsePositives() => s_text_falsePositives.AsSpan().Contains(Needle, StringComparison.Ordinal);
    [Benchmark] public void SV_FalsePositives() => s_text_falsePositives.AsSpan().ContainsAny(s_values);
    [Benchmark] public void SV_FalsePositivesIC() => s_text_falsePositives.AsSpan().ContainsAny(s_valuesIC);
}

MihaZupan marked this pull request as ready for review July 28, 2025 21:05
Copilot AI review requested due to automatic review settings July 28, 2025 21:05

Copilot AI left a comment

Pull Request Overview

This PR adds a packed variant of single-value SearchValues to improve performance through vectorized string searching with input packing. The implementation processes double the number of characters per loop iteration by packing character inputs.

Key changes:

  • Introduces a new packed implementation that leverages vectorized operations to double processing throughput
  • Refactors the creation logic to automatically choose between packed and non-packed variants based on character compatibility
  • Updates the project structure to include the new implementation
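
The "character compatibility" check mentioned above can be pictured with a small sketch. This is an illustration under stated assumptions, not the PR's actual logic: it assumes a char is packable only when it lies strictly between 0 and 255, which would keep it from colliding with values produced by a saturating narrow, and that the packed variant is chosen only when all three anchor chars qualify. `PackedSelectionSketch`, `IsPackable` and `ChooseVariant` are hypothetical names.

```cs
using System;

public static class PackedSelectionSketch
{
    // Assumed rule (not confirmed by this PR): only chars in the range 1..254
    // are treated as safe to compare after narrowing to bytes.
    public static bool IsPackable(char c) => (uint)(c - 1) < 254u;

    // Hypothetical selection: use the packed variant only if every anchor
    // char is packable, otherwise fall back to the non-packed variant.
    public static string ChooseVariant(char ch1, char ch2, char ch3) =>
        IsPackable(ch1) && IsPackable(ch2) && IsPackable(ch3) ? "packed" : "non-packed";
}

// Usage:
//   PackedSelectionSketch.ChooseVariant('S', 'l', 's')       // "packed"
//   PackedSelectionSketch.ChooseVariant('S', '\0', 's')      // "non-packed"
//   PackedSelectionSketch.ChooseVariant('S', '\u0153', 's')  // "non-packed"
```

A rule of this shape would also explain why the test changes target null chars and non-ASCII values, since those are the inputs that sit on the boundary of such a check.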

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated no comments.

| File | Description |
|---|---|
| StringSearchValues.cs | Refactors single value creation logic to choose between packed and non-packed implementations |
| SingleStringSearchValuesThreeChars.cs | Updates constructor to accept precomputed character offsets and fixes comment accuracy |
| SingleStringSearchValuesPackedThreeChars.cs | New packed implementation using vectorized operations for improved performance |
| StringSearchValuesHelper.cs | Enhances equality comparison helpers to support both packed and non-packed variants |
| AsciiStringSearchValuesTeddyBase.cs | Fixes comment describing algorithm complexity |
| System.Private.CoreLib.Shared.projitems | Adds new packed implementation file to build |
| StringSearchValues.cs (tests) | Improves test coverage for edge cases with null chars and non-ASCII values |

MihaZupan requested a review from stephentoub July 28, 2025 21:06
@MihaZupan

@MihuBot
MihuBot commented Jul 29, 2025

System.Text.RegularExpressions.Tests.Perf_Regex_Industry_SliceSlice
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
MediumRun : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Job=MediumRun  OutlierMode=Default  IterationCount=15
LaunchCount=2  MemoryRandomization=Default  MinIterationCount=3
WarmupCount=10

| Method | Toolchain | Options | Mean | Error | Ratio | Allocated | Alloc Ratio |
|---|---|---|---:|---:|---:|---:|---:|
| Count | Main | Compiled | 374.3 ms | 0.28 ms | 1.00 | 736 B | 1.00 |
| Count | PR | Compiled | 292.9 ms | 0.11 ms | 0.78 | 536 B | 0.73 |
| Count | Main | IgnoreCase, Compiled | 424.1 ms | 2.05 ms | 1.00 | 1072 B | 1.00 |
| Count | PR | IgnoreCase, Compiled | 333.5 ms | 0.28 ms | 0.79 | 1072 B | 1.00 |

System.Text.RegularExpressions.Tests.Perf_Regex_Industry_RustLang_Sherlock
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
MediumRun : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Job=MediumRun  OutlierMode=DontRemove  IterationCount=15
LaunchCount=2  MemoryRandomization=True  WarmupCount=10
Method Toolchain Pattern Mean Error Ratio Allocated Alloc Ratio
Count Main .* 539,907.31 ns 1,709.843 ns 1.00 2 B 1.00
Count PR .* 539,128.38 ns 1,868.960 ns 1.00 2 B 1.00
Count Main (?i)Holmes 53,940.91 ns 66.817 ns 1.00 - NA
Count PR (?i)Holmes 41,778.79 ns 139.344 ns 0.77 - NA
Count Main (?i)Sher[a-z]+|Hol[a-z]+ 118,348.78 ns 23,851.644 ns 1.09 - NA
Count PR (?i)Sher[a-z]+|Hol[a-z]+ 138,673.27 ns 38,464.872 ns 1.28 - NA
Count Main (?i)Sherlock 45,348.01 ns 92.803 ns 1.00 - NA
Count PR (?i)Sherlock 34,005.60 ns 121.721 ns 0.75 - NA
Count Main (?i)Sherlock Holmes 45,532.63 ns 87.486 ns 1.00 - NA
Count PR (?i)Sherlock Holmes 33,881.03 ns 100.627 ns 0.74 - NA
Count Main (?i)Sherlock|Holmes|Watson 122,106.95 ns 23,733.683 ns 1.09 1 B 1.00
Count PR (?i)Sherlock|Holmes|Watson 121,691.03 ns 24,374.047 ns 1.08 1 B 1.00
Count Main (?i)Sherlock|(...)er|John|Baker [49] 205,502.14 ns 33,977.542 ns 1.06 1 B 1.00
Count PR (?i)Sherlock|(...)er|John|Baker [49] 189,729.36 ns 21,068.613 ns 0.98 1 B 1.00
Count Main (?i)the 201,547.41 ns 3,735.179 ns 1.00 - NA
Count PR (?i)the 207,257.99 ns 4,122.475 ns 1.03 1 B NA
Count Main (?m)^Sherlock(...)rlock Holmes$ [37] 41,816.97 ns 82.351 ns 1.00 - NA
Count PR (?m)^Sherlock(...)rlock Holmes$ [37] 32,225.94 ns 92.261 ns 0.77 - NA
Count Main (?s).* 32.86 ns 0.090 ns 1.00 - NA
Count PR (?s).* 33.59 ns 0.586 ns 1.02 - NA
Count Main [^\\n]* 540,420.18 ns 2,504.041 ns 1.00 2 B 1.00
Count PR [^\\n]* 552,061.55 ns 9,382.229 ns 1.02 2 B 1.00
Count Main [a-q][^u-z]{13}x 23,209.67 ns 54.260 ns 1.00 - NA
Count PR [a-q][^u-z]{13}x 23,153.17 ns 75.146 ns 1.00 - NA
Count Main [a-zA-Z]+ing 3,335,613.03 ns 4,927.671 ns 1.00 11 B 1.00
Count PR [a-zA-Z]+ing 3,335,885.09 ns 6,215.490 ns 1.00 11 B 1.00
Count Main \b\w+n\b 6,321,686.90 ns 10,605.651 ns 1.00 22 B 1.00
Count PR \b\w+n\b 6,343,399.76 ns 27,467.705 ns 1.00 22 B 1.00
Count Main \p{L} 8,796,832.78 ns 17,509.674 ns 1.00 35 B 1.00
Count PR \p{L} 8,797,197.59 ns 27,362.624 ns 1.00 35 B 1.00
Count Main \p{Ll} 8,771,265.06 ns 5,176.800 ns 1.00 35 B 1.00
Count PR \p{Ll} 8,498,432.46 ns 10,116.231 ns 0.97 35 B 1.00
Count Main \p{Lu} 349,700.21 ns 14,572.458 ns 1.00 1 B 1.00
Count PR \p{Lu} 366,755.97 ns 9,147.756 ns 1.05 1 B 1.00
Count Main \s[a-zA-Z]{0,12}ing\s 3,448,458.45 ns 6,982.924 ns 1.00 11 B 1.00
Count PR \s[a-zA-Z]{0,12}ing\s 3,450,268.45 ns 2,538.315 ns 1.00 12 B 1.09
Count Main \w+ 4,061,743.91 ns 12,537.892 ns 1.00 21 B 1.00
Count PR \w+ 4,088,171.65 ns 5,799.888 ns 1.01 21 B 1.00
Count Main \w+\s+Holmes 2,793,928.42 ns 7,601.329 ns 1.00 11 B 1.00
Count PR \w+\s+Holmes 2,796,245.44 ns 6,758.263 ns 1.00 11 B 1.00
Count Main \w+\s+Holmes\s+\w+ 3,232,889.00 ns 54,367.837 ns 1.00 10 B 1.00
Count PR \w+\s+Holmes\s+\w+ 3,161,984.03 ns 18,354.471 ns 0.98 10 B 1.00
Count Main aei 38,908.27 ns 543.173 ns 1.00 - NA
Count PR aei 28,627.22 ns 342.810 ns 0.74 - NA
Count Main aqj 38,927.37 ns 524.873 ns 1.00 - NA
Count PR aqj 28,699.19 ns 348.472 ns 0.74 - NA
Count Main Holmes 50,322.60 ns 305.291 ns 1.00 - NA
Count PR Holmes 39,633.26 ns 300.353 ns 0.79 - NA
Count Main Holmes.{0,25}(...).{0,25}Holmes [39] 47,225.62 ns 72.458 ns 1.00 - NA
Count PR Holmes.{0,25}(...).{0,25}Holmes [39] 47,159.51 ns 85.498 ns 1.00 - NA
Count Main Sher[a-z]+|Hol[a-z]+ 48,386.05 ns 104.483 ns 1.00 - NA
Count PR Sher[a-z]+|Hol[a-z]+ 48,463.13 ns 125.895 ns 1.00 - NA
Count Main Sherlock 42,264.36 ns 197.596 ns 1.00 - NA
Count PR Sherlock 32,691.55 ns 78.898 ns 0.77 - NA
Count Main Sherlock Holmes 42,323.38 ns 89.958 ns 1.00 - NA
Count PR Sherlock Holmes 32,748.43 ns 65.522 ns 0.77 - NA
Count Main Sherlock\s+Holmes 42,616.51 ns 259.635 ns 1.00 - NA
Count PR Sherlock\s+Holmes 33,181.25 ns 86.274 ns 0.78 - NA
Count Main Sherlock|Holmes 46,322.69 ns 998.372 ns 1.00 - NA
Count PR Sherlock|Holmes 44,906.56 ns 56.498 ns 0.97 - NA
Count Main Sherlock|Holmes|Watson 58,716.31 ns 86.098 ns 1.00 - NA
Count PR Sherlock|Holmes|Watson 58,875.88 ns 526.804 ns 1.00 - NA
Count Main Sherlock|Holm(...)er|John|Baker [45] 88,140.26 ns 88.103 ns 1.00 - NA
Count PR Sherlock|Holm(...)er|John|Baker [45] 88,382.19 ns 205.548 ns 1.00 - NA
Count Main Sherlock|Street 25,051.13 ns 54.048 ns 1.00 - NA
Count PR Sherlock|Street 25,192.78 ns 69.049 ns 1.01 - NA
Count Main the 168,158.59 ns 1,113.835 ns 1.00 - NA
Count PR the 168,974.79 ns 503.872 ns 1.00 1 B NA
Count Main The 54,302.17 ns 278.874 ns 1.00 - NA
Count PR The 43,983.94 ns 67.218 ns 0.81 - NA
Count Main the\s+\w+ 241,719.92 ns 1,889.313 ns 1.00 1 B 1.00
Count PR the\s+\w+ 263,616.36 ns 3,124.466 ns 1.09 1 B 1.00
Count Main zqj 39,011.86 ns 543.390 ns 1.00 - NA
Count PR zqj 28,803.96 ns 302.455 ns 0.74 - NA
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Mariomkas
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
MediumRun : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Job=MediumRun  IterationCount=15  LaunchCount=2
WarmupCount=10

| Method | Toolchain | Pattern | Mean | Error | Ratio | Allocated | Alloc Ratio |
|---|---|---|---:|---:|---:|---:|---:|
| Ctor | Main | `(?:(?:250-5]?[0-9][0-9]) [87]` | 19.31 μs | 0.076 μs | 1.00 | 30552 B | 1.00 |
| Ctor | PR | `(?:(?:250-5]?[0-9][0-9]) [87]` | 19.48 μs | 0.092 μs | 1.01 | 30552 B | 1.00 |
| Count | Main | `(?:(?:250-5]?[0-9][0-9]) [87]` | 2,676.67 μs | 10.543 μs | 1.00 | 15 B | 1.00 |
| Count | PR | `(?:(?:250-5]?[0-9][0-9]) [87]` | 2,703.01 μs | 72.154 μs | 1.01 | 15 B | 1.00 |
| Ctor | Main | `[\w]+://[^/\s(...)?(?:#[^\\s]*)? [51]` | 15.90 μs | 0.147 μs | 1.00 | 23216 B | 1.00 |
| Ctor | PR | `[\w]+://[^/\s(...)?(?:#[^\\s]*)? [51]` | 15.61 μs | 0.051 μs | 0.98 | 23216 B | 1.00 |
| Count | Main | `[\w]+://[^/\s(...)?(?:#[^\\s]*)? [51]` | 766.02 μs | 1.555 μs | 1.00 | 4 B | 1.00 |
| Count | PR | `[\w]+://[^/\s(...)?(?:#[^\\s]*)? [51]` | 767.49 μs | 1.561 μs | 1.00 | 3 B | 0.75 |
| Ctor | Main | `[\w\.+-]+@[\w\.-]+\.[\w\.-]+` | 12.19 μs | 0.038 μs | 1.00 | 13888 B | 1.00 |
| Ctor | PR | `[\w\.+-]+@[\w\.-]+\.[\w\.-]+` | 12.22 μs | 0.025 μs | 1.00 | 13888 B | 1.00 |
| Count | Main | `[\w\.+-]+@[\w\.-]+\.[\w\.-]+` | 184.52 μs | 0.208 μs | 1.00 | 1 B | 1.00 |
| Count | PR | `[\w\.+-]+@[\w\.-]+\.[\w\.-]+` | 184.14 μs | 0.248 μs | 1.00 | 1 B | 1.00 |

System.Text.RegularExpressions.Tests.Perf_Regex_Industry_Leipzig
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
MediumRun : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Job=MediumRun  OutlierMode=DontRemove  IterationCount=15
LaunchCount=2  MemoryRandomization=True  WarmupCount=10
Method Toolchain Pattern Mean Error Ratio Allocated Alloc Ratio
Count Main .{0,2}(Tom|Sawyer|Huckleberry|Finn) 186,760.1 μs 49.00 μs 1.00 979 B 1.00
Count PR .{0,2}(Tom|Sawyer|Huckleberry|Finn) 186,593.8 μs 250.95 μs 1.00 979 B 1.00
Count Main .{2,4}(Tom|Sawyer|Huckleberry|Finn) 192,195.4 μs 159.10 μs 1.00 984 B 1.00
Count PR .{2,4}(Tom|Sawyer|Huckleberry|Finn) 192,244.4 μs 148.35 μs 1.00 984 B 1.00
Count Main (?i)Tom|Sawyer|Huckleberry|Finn 2,933.5 μs 660.56 μs 1.13 7 B 1.00
Count PR (?i)Tom|Sawyer|Huckleberry|Finn 2,813.2 μs 671.94 μs 1.08 14 B 2.00
Count Main (?i)Twain 1,181.3 μs 10.01 μs 1.00 5 B 1.00
Count PR (?i)Twain 912.8 μs 5.04 μs 0.77 2 B 0.40
Count Main ([A-Za-z]awyer|[A-Za-z]inn)\s 12,778.9 μs 23.38 μs 1.00 45 B 1.00
Count PR ([A-Za-z]awyer|[A-Za-z]inn)\s 12,756.1 μs 4.75 μs 1.00 45 B 1.00
Count Main [a-z]shing 1,148.2 μs 4.20 μs 1.00 5 B 1.00
Count PR [a-z]shing 910.7 μs 7.56 μs 0.79 2 B 0.40
Count Main \p{Sm} 638.7 μs 9.43 μs 1.00 2 B 1.00
Count PR \p{Sm} 632.3 μs 4.55 μs 0.99 2 B 1.00
Count Main Huck[a-zA-Z]+|Saw[a-zA-Z]+ 1,555.1 μs 4.89 μs 1.00 6 B 1.00
Count PR Huck[a-zA-Z]+|Saw[a-zA-Z]+ 1,553.2 μs 1.11 μs 1.00 6 B 1.00
Count Main Tom.{10,25}river|river.{10,25}Tom 6,366.3 μs 23.64 μs 1.00 24 B 1.00
Count PR Tom.{10,25}river|river.{10,25}Tom 6,333.8 μs 1.94 μs 0.99 19 B 0.79
Count Main Tom|Sawyer|Huckleberry|Finn 2,625.4 μs 2.21 μs 1.00 11 B 1.00
Count PR Tom|Sawyer|Huckleberry|Finn 2,641.8 μs 11.55 μs 1.01 10 B 0.91
Count Main Twain 1,093.2 μs 1.26 μs 1.00 4 B 1.00
Count PR Twain 840.8 μs 4.40 μs 0.77 2 B 0.50
System.Text.RegularExpressions.Tests.Perf_Regex_Industry_BoostDocs_Simple
BenchmarkDotNet v0.14.1-nightly.20250107.205, Linux Ubuntu 22.04.5 LTS (Jammy Jellyfish)
AMD EPYC 9V74, 1 CPU, 8 logical and 4 physical cores
MediumRun : .NET 10.0.0 (42.42.42.42424), X64 RyuJIT AVX-512F+CD+BW+DQ+VL+VBMI
Job=MediumRun  OutlierMode=DontRemove  IterationCount=15
LaunchCount=2  MemoryRandomization=True  WarmupCount=10

| Method | Toolchain | Id | Mean | Error | Ratio | Allocated | Alloc Ratio |
|---|---|---:|---:|---:|---:|---:|---:|
| IsMatch | Main | 0 | 20.30 ns | 0.140 ns | 1.00 | - | NA |
| IsMatch | PR | 0 | 20.58 ns | 0.075 ns | 1.01 | - | NA |
| IsMatch | Main | 1 | 43.84 ns | 0.042 ns | 1.00 | - | NA |
| IsMatch | PR | 1 | 44.46 ns | 0.350 ns | 1.01 | - | NA |
| IsMatch | Main | 2 | 48.68 ns | 0.223 ns | 1.00 | - | NA |
| IsMatch | PR | 2 | 49.06 ns | 0.137 ns | 1.01 | - | NA |
| IsMatch | Main | 3 | 83.45 ns | 1.160 ns | 1.00 | - | NA |
| IsMatch | PR | 3 | 86.39 ns | 3.957 ns | 1.04 | - | NA |
| IsMatch | Main | 4 | 72.59 ns | 0.365 ns | 1.00 | - | NA |
| IsMatch | PR | 4 | 72.58 ns | 0.362 ns | 1.00 | - | NA |
| IsMatch | Main | 5 | 71.35 ns | 0.297 ns | 1.00 | - | NA |
| IsMatch | PR | 5 | 71.21 ns | 0.061 ns | 1.00 | - | NA |
| IsMatch | Main | 6 | 21.73 ns | 0.127 ns | 1.00 | - | NA |
| IsMatch | PR | 6 | 22.63 ns | 1.279 ns | 1.04 | - | NA |
| IsMatch | Main | 7 | 21.36 ns | 0.053 ns | 1.00 | - | NA |
| IsMatch | PR | 7 | 21.32 ns | 0.043 ns | 1.00 | - | NA |
| IsMatch | Main | 8 | 21.37 ns | 0.029 ns | 1.00 | - | NA |
| IsMatch | PR | 8 | 21.65 ns | 0.233 ns | 1.01 | - | NA |
| IsMatch | Main | 9 | 23.34 ns | 0.055 ns | 1.00 | - | NA |
| IsMatch | PR | 9 | 23.34 ns | 0.099 ns | 1.00 | - | NA |
| IsMatch | Main | 10 | 23.57 ns | 0.016 ns | 1.00 | - | NA |
| IsMatch | PR | 10 | 23.56 ns | 0.019 ns | 1.00 | - | NA |
| IsMatch | Main | 11 | 22.53 ns | 0.019 ns | 1.00 | - | NA |
| IsMatch | PR | 11 | 22.73 ns | 0.182 ns | 1.01 | - | NA |
| IsMatch | Main | 12 | 25.97 ns | 0.024 ns | 1.00 | - | NA |
| IsMatch | PR | 12 | 26.26 ns | 0.239 ns | 1.01 | - | NA |
| IsMatch | Main | 13 | 26.08 ns | 0.125 ns | 1.00 | - | NA |
| IsMatch | PR | 13 | 25.97 ns | 0.020 ns | 1.00 | - | NA |

MihaZupan merged commit c37e685 into dotnet:main Jul 31, 2025
139 of 143 checks passed