.NET 8 Per-Preview Performance report on WASM, Mono AOT, and Interpreter

This report provides an overview of the major performance improvements and regressions in WASM, Mono AOT, and the Interpreter across the .NET 8 preview releases. It focuses on notable improvements and on regressions that are either being fixed or still under investigation; these are tracked separately. Issues https://github.com/dotnet/runtime/issues/77490 and https://github.com/dotnet/runtime/issues/79288 track active speed and size regressions, respectively.

A full benchmark report will be available in a form similar to https://github.com/dotnet/runtime/issues/79245 and https://devblogs.microsoft.com/dotnet/performance_improvements_in_net_7/ when .NET 8 is released.

## Setup

According to https://github.com/dotnet/perf-autofiling-issues, the following configurations are used.

| Operating System    | Architecture | Processor Name                                |
| ------------------- | ------------ | --------------------------------------------- |
| macOS 13.0          | Arm64        | Apple M1                                      |
| Ubuntu 18.04        | X64          | Intel Xeon CPU E5-1650 v4 3.60GHz             |

More details on .NET performance benchmarking are available at https://github.com/dotnet/performance. 

# Preview 7

The following section presents only major improvements with high-level analysis. The analysis should be treated with caution, and readers are encouraged to examine the benchmark reports for a thorough analysis and to call out major improvements not mentioned in this report.

## Mono AOT compiler

The performance regressions and improvements are analyzed separately in https://github.com/dotnet/runtime/issues/89238.

## Mono Interpreter

The following sections present improvements and regressions introduced in the Mono Interpreter in Preview 7.

### Improvements

Here is a list of the top 20 microbenchmark improvements in Preview 7.

Name | Baseline Value | Compare Value | % Difference
-|-|-|-
PerfLabTests.EnumPerf.EnumEquals | 646.25 | 229.29 | -64.52
System.Tests.Perf_Enum.ToString_NonFlags_Small(value: TopDirectoryOnly) | 633.28 | 235.90 | -62.74
System.Tests.Perf_Enum.ToString_Format_Flags_Large(value: All, format: "g") | 667.24 | 271.04 | -59.37
System.Reflection.Attributes.IsDefinedClassHitInherit | 1315.59 | 562.93 | -57.21
System.Reflection.Activator\<EmptyStruct>.CreateInstanceGeneric | 721.39 | 330.82 | -54.14
System.Numerics.Tests.Perf_Vector4.SubtractOperatorBenchmark | 20.82 | 9.59 | -53.92
System.Reflection.Invoke.Method0_NoParms | 853.86 | 399.59 | -53.20
System.Numerics.Tests.Perf_Matrix4x4.CreateRotationZBenchmark | 78.54 | 40.02 | -49.03
System.Reflection.Attributes.IsDefinedMethodBaseMissInherit | 2512.81 | 1431.26 | -43.04
System.Numerics.Tests.Perf_Matrix4x4.MultiplyByScalarBenchmark | 183.31 | 106.83 | -41.71
System.Tests.Perf_Enum.InterpolateIntoStringBuilder_Flags(value: 32) | 7501.15 | 4383.76 | -41.55
System.Numerics.Tests.Perf_Vector3.TransformNormalByMatrix4x4Benchmark | 189.92 | 111.79 | -41.13
System.IO.Tests.Perf_RandomAccess.ReadScatter(fileSize: 1048576, buffersSize: 16384, options: None) | 400115.22 | 265189.08 | -33.72
System.Numerics.Tests.Perf_Matrix4x4.CreateRotationXWithCenterBenchmark | 90.04 | 60.34 | -32.98
System.Globalization.Tests.StringSearch.IsSuffix_DifferentLastChar(Options: (en-US, IgnoreCase, True)) | 1024.28 | 714.93 | -30.20
System.Tests.Perf_Enum.StringFormat(value: Red, Green) | 7002.80 | 4942.10 | -29.42
System.Tests.Perf_Enum.ToString_Flags(value: Red, Orange, Yellow, Green, Blue) | 1272.44 | 922.39 | -27.50
System.Numerics.Tests.Perf_VectorOf\<Byte>.AddBenchmark | 11.28 | 8.19 | -27.44
System.Numerics.Tests.Perf_Vector4.DivideByScalarBenchmark | 30.25 | 21.97 | -27.36
System.Numerics.Tests.Perf_Vector2.EqualsBenchmark | 35.85 | 27.68 | -22.78

Vectorization of Vector4 in https://github.com/dotnet/runtime/pull/87822 improved over 100 microbenchmarks in https://github.com/dotnet/perf-autofiling-issues/issues/19758 and https://github.com/dotnet/perf-autofiling-issues/issues/19760.

A fix to the path for empty partitions in `Enumerable.Select` in https://github.com/dotnet/runtime/pull/88425 improved the EmptyTakeSelectToArray microbenchmarks, as reported in https://github.com/dotnet/perf-autofiling-issues/issues/19761.

Improvements to the BigInteger operators `+`, `-`, and `*` for trivial cases in https://github.com/dotnet/runtime/pull/84733 improved some of the BigInteger microbenchmarks reported in https://github.com/dotnet/perf-autofiling-issues/issues/19762.

Precomputing the CallInfo structure in https://github.com/dotnet/runtime/pull/88369 improved about 200 microbenchmarks.

The BCL change https://github.com/dotnet/runtime/pull/86287 and vectorization of Vector128 in https://github.com/dotnet/runtime/pull/88064 improved a dozen Equals microbenchmarks.
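
As a reading aid for the tables in this report, the `% Difference` column appears to be the relative change from the baseline value to the compare value, so negative numbers indicate an improvement. The following Python sketch reproduces two rows of the Preview 7 interpreter improvements table under that assumption; the function name `percent_difference` is illustrative and is not part of any .NET tooling.

```python
def percent_difference(baseline: float, compare: float) -> float:
    """Relative change from baseline to compare, in percent.

    Assumed formula (not stated in the report): (compare - baseline) / baseline * 100.
    Negative results mean the compare run was faster than the baseline.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (compare - baseline) / baseline * 100.0


# Two rows copied from the Preview 7 interpreter improvements table.
rows = [
    ("PerfLabTests.EnumPerf.EnumEquals", 646.25, 229.29),
    ("System.Reflection.Attributes.IsDefinedClassHitInherit", 1315.59, 562.93),
]

for name, baseline, compare in rows:
    print(f"{name}: {percent_difference(baseline, compare):.2f}")
```

Under this assumption the two rows evaluate to -64.52 and -57.21, matching the values listed in the table.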

### Regressions

Here is a list of the top 20 regressed microbenchmarks in Preview 7.

Name | Baseline Value | Compare Value     | % Difference
-|-|-|-
System.Collections.CtorFromCollection\<String>.FrozenDictionary(Size: 512) | 44266.49 | 396363.53 | 795.40
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int32>.EqualsAllBenchmark | 6.90 | 9.58 | 38.82
Microsoft.Extensions.DependencyInjection.TimeToFirstService.Scoped(Mode: "Expressions") | 49567.25 | 65031.35 | 31.19
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<UInt64>.BitwiseOrOperatorBenchmark | 9.62 | 12.45 | 29.41
System.Numerics.Tests.Perf_VectorOf\<SByte>.OnesComplementOperatorBenchmark | 6.04 | 7.80 | 29.23
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Single>.AllBitsSetBenchmark | 2.04 | 2.61 | 28.32
System.Tests.Perf_GC\<Byte>.NewOperator_Array(length: 10000) | 4495.94 | 5733.46 | 27.52
System.Memory.Span\<Char>.SequenceEqual(Size: 33) | 85.83 | 108.56 | 26.49
System.Numerics.Tests.Perf_VectorOf\<Single>.AddOperatorBenchmark | 7.67 | 9.58 | 24.98
Microsoft.Extensions.DependencyInjection.TimeToFirstService.Scoped(Mode: "ILEmit") | 49928.88 | 62377.01 | 24.93
System.Memory.Constructors\<String>.SpanFromArray | 15.59 | 19.40 | 24.46
Microsoft.Extensions.DependencyInjection.ScopeValidation.TransientWithScopeValidation | 1815.08 | 2227.85 | 22.74
System.Numerics.Tests.Perf_VectorOf\<Int64>.EqualityOperatorBenchmark | 6.56 | 7.77 | 18.48
System.IO.Tests.Perf_File.CopyToOverwrite(size: 4096) | 47118.52 | 55507.12 | 17.80
System.Tests.Perf_Decimal.TryParse(value: "123456.789") | 895.48 | 1023.98 | 14.34
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Double>.AllBitsSetBenchmark | 1.48 | 1.69 | 14.11
System.Numerics.Tests.Perf_VectorOf\<UInt16>.AndNotBenchmark | 9.16 | 10.44 | 13.96
System.Memory.Span\<Byte>.IndexOfValue(Size: 33) | 58.20 | 65.95 | 13.31
System.Runtime.Intrinsics.Tests.Perf_Vector128Int.BitwiseOrOperatorBenchmark | 7.62 | 8.61 | 12.96
System.Tests.Perf_Int32.ParseSpan(value: "2147483647") | 206.91 | 233.69 | 12.94


# Preview 6

The following section presents only major improvements with high-level analysis. The analysis should be treated with caution, and readers are encouraged to examine the benchmark reports for a thorough analysis and to call out major improvements not mentioned in this report.

## Mono AOT WASM

The following sections present improvements and regressions introduced in Mono AOT WASM in Preview 6.

### Improvements

Here is a list of the top 20 microbenchmark improvements in Preview 6.

Name | Baseline Value | Compare Value | % Difference
-|-|-|-
System.Numerics.Tests.Perf_Quaternion.LengthBenchmark|0.38|0.00|-100
System.Numerics.Tests.Perf_Quaternion.NegationOperatorBenchmark|1.87|0.00|-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Int.CountBenchmark|0.34|0.00|-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int64>.CountBenchmark|0.22|0.00|-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<UInt16>.InequalityOperatorBenchmark|0.97|0.00|-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<UInt64>.CountBenchmark|0.29|0.00|-100
System.Tests.Perf_Enum.HasFlag|1.35|0.00|-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<UInt16>.EqualityOperatorBenchmark|2.28|0.01|-99.62
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int64>.CountBenchmark|0.22|0.00|-99.57
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Double>.GreaterThanAllBenchmark|2.50|0.02|-99.35
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<SByte>.UnaryNegateOperatorBenchmark|85.94|2.58|-97.00
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Byte>.UnaryNegateOperatorBenchmark|85.93|4.27|-95.02
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<SByte>.UnaryNegateOperatorBenchmark|85.94|4.30|-94.99
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Byte>.UnaryNegateOperatorBenchmark|85.93|4.35|-94.94
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int64>.LessThanOrEqualBenchmark|2.91|0.26|-91.04
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int16>.EqualityOperatorBenchmark|2.26|0.25|-88.80
System.Numerics.Tests.Perf_Vector3.UnitZBenchmark|3.84|0.54|-85.93
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Byte>.BitwiseAndBenchmark|4.07|0.69|-83.07
System.Runtime.Intrinsics.Tests.Perf_Vector128.FloorFloatBenchmark|20.82|3.59|-82.73
System.Net.Primitives.Tests.IPAddressPerformanceTests.TryWriteBytes(address: 1020:3040:5060:7080:9010:1112:1314:1516)|78.86|13.78|-82.52

### Regressions

Here is a list of the top 20 regressed microbenchmarks in Preview 6.

Name | Baseline Value | Compare Value	 | % Difference
-|-|-|-
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Single>.CountBenchmark|0.00|0.14|26004.19
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Single>.CountBenchmark|0.00|0.07|12106.45
System.Numerics.Tests.Perf_VectorOf\<Double>.CountBenchmark|0.09|3.36|3767.73
System.Numerics.Tests.Perf_VectorOf\<Single>.CountBenchmark|0.00|0.06|2106.86
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<UInt64>.AllBitsSetBenchmark|1.95|10.77|452.08
System.Numerics.Tests.Perf_VectorOf\<Single>.CountBenchmark|0.00|0.01|405.57
System.Numerics.Tests.Perf_VectorOf\<UInt16>.MaxBenchmark|0.75|3.50|365.24
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int32>.DotBenchmark|0.87|3.58|312.42
System.Runtime.Intrinsics.Tests.Perf_Vector128Int.GreaterThanOrEqualBenchmark|0.92|3.67|300.46
System.Runtime.Intrinsics.Tests.Perf_Vector128Int.GreaterThanOrEqualBenchmark|0.92|3.55|286.90
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Double>.DotBenchmark|0.78|2.61|236.42
System.Numerics.Tests.Perf_VectorOf\<SByte>.OnesComplementOperatorBenchmark|0.75|2.51|236.33
System.Numerics.Tests.Perf_VectorOf\<SByte>.BitwiseOrBenchmark|2.62|8.52|225.70
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<UInt64>.ZeroBenchmark|2.00|5.96|198.55
System.Numerics.Tests.Perf_VectorOf\<Int64>.ZeroBenchmark|1.98|5.88|196.21
System.Numerics.Tests.Perf_VectorOf\<UInt16>.MultiplyBenchmark|3.10|9.12|194.26
System.Runtime.Intrinsics.Tests.Perf_Vector128Int.EqualsBenchmark|0.98|2.75|180.71
System.Runtime.Intrinsics.Tests.Perf_Vector128Int.EqualsBenchmark|0.98|2.69|174.16
System.Numerics.Tests.Perf_VectorOf\<SByte>.UnaryNegateOperatorBenchmark|1.08|2.80|159.06
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<SByte>.MinBenchmark|2.70|6.92|156.32

## Mono AOT compiler

The performance regressions and improvements are analyzed separately in https://github.com/dotnet/runtime/issues/89238.

## Mono Interpreter

The following sections present improvements and regressions introduced in the Mono Interpreter in Preview 6.

### Improvements

Here is a list of the top 20 microbenchmark improvements in Preview 6.

Name | Baseline Value | Compare Value	 | % Difference
-|-|-|-
System.Numerics.Tests.Perf_VectorOf\<Double>.CountBenchmark|0.00|0.00|-100
System.Numerics.Tests.Perf_VectorOf\<Int32>.CountBenchmark|0.02|0.00|-100
System.Numerics.Tests.Perf_VectorOf\<UInt32>.CountBenchmark|0.00|0.00|-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Single>.CountBenchmark|0.40|0.00|-100
System.Numerics.Tests.Perf_VectorOf\<SByte>.OneBenchmark|76.06|1.57|-97.93
System.Numerics.Tests.Perf_VectorOf\<Byte>.OneBenchmark|76.01|1.87|-97.53
System.Numerics.Tests.Perf_VectorOf\<SByte>.NegateBenchmark|221.32|6.26|-97.16
System.Numerics.Tests.Perf_VectorOf\<SByte>.UnaryNegateOperatorBenchmark|221.61|6.27|-97.16
System.Numerics.Tests.Perf_VectorOf\<Byte>.UnaryNegateOperatorBenchmark|214.44|6.20|-97.10
System.Numerics.Tests.Perf_VectorOf\<Byte>.NegateBenchmark|214.55|6.37|-97.02
System.Numerics.Tests.Perf_VectorOf\<SByte>.SubtractBenchmark|231.29|7.90|-96.58
System.Numerics.Tests.Perf_VectorOf\<SByte>.SubtractionOperatorBenchmark|221.04|7.90|-96.42
System.Numerics.Tests.Perf_VectorOf\<UInt16>.OneBenchmark|50.92|1.83|-96.41
System.Numerics.Tests.Perf_VectorOf\<Byte>.AddBenchmark|216.21|7.83|-96.37
System.Numerics.Tests.Perf_VectorOf\<Byte>.SubtractBenchmark|214.79|7.79|-96.37
System.Numerics.Tests.Perf_VectorOf\<Byte>.SubtractionOperatorBenchmark|215.60|7.92|-96.32
System.Numerics.Tests.Perf_VectorOf\<SByte>.MultiplyOperatorBenchmark|225.86|8.35|-96.30
System.Numerics.Tests.Perf_VectorOf\<Byte>.AddOperatorBenchmark|209.41|7.95|-96.20
System.Numerics.Tests.Perf_VectorOf\<SByte>.MultiplyBenchmark|217.21|8.39|-96.13
System.Numerics.Tests.Perf_VectorOf\<SByte>.AddOperatorBenchmark|214.44|8.33|-96.11

Vectorization of `Vector<T>` operators improved over 200 microbenchmarks, as reported in https://github.com/dotnet/perf-autofiling-issues/issues/18537.

Changes in https://github.com/dotnet/runtime/pull/87219 introduced `Math.BigMul` in the `NextUInt64` random method and improved several microbenchmarks reported in https://github.com/dotnet/perf-autofiling-issues/issues/18690.

About 120 microbenchmarks improved, as reported in https://github.com/dotnet/perf-autofiling-issues/issues/19027, potentially due to https://github.com/dotnet/runtime/pull/87555 or other interpreter and BCL changes.

Frozen dictionary creation improved by 72% with https://github.com/dotnet/runtime/pull/87510.
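
Several `CountBenchmark` rows in this report show `0.00` for both values alongside a large `% Difference`. One plausible explanation, which is an assumption here and is not confirmed by the source, is that the percentage is computed from unrounded measurements while the tables display only two decimals. The sketch below uses hypothetical raw values and an illustrative helper `table_row` to show how such a row can arise.

```python
def percent_difference(baseline: float, compare: float) -> float:
    """Assumed formula: relative change from baseline to compare, in percent."""
    return (compare - baseline) / baseline * 100.0


def table_row(baseline: float, compare: float) -> str:
    """Format values at two decimals, as the tables in this report appear to do."""
    pct = percent_difference(baseline, compare)
    return f"{baseline:.2f} | {compare:.2f} | {pct:.2f}"


# Hypothetical raw measurements, both below the two-decimal display resolution.
print(table_row(0.004, 0.001))  # improvement hidden by rounding
print(table_row(0.001, 0.004))  # regression hidden by rounding
```

If this explanation holds, percentages on rows whose values are near the display resolution mostly reflect measurement noise and are best read with extra caution.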

### Regressions

Here is a list of the top 20 regressed microbenchmarks in Preview 6.

Name | Baseline Value | Compare Value	 | % Difference
-|-|-|-
System.Numerics.Tests.Perf_VectorOf\<Int64>.CountBenchmark|0.01|0.23|2775.54
System.Numerics.Tests.Perf_VectorOf\<UInt64>.CountBenchmark|0.01|0.17|2177.17
System.Numerics.Tests.Perf_VectorOf\<UInt16>.ZeroBenchmark|2.24|4.95|121.29
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int16>.EqualityOperatorBenchmark|7.65|16.63|117.46
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<UInt16>.OnesComplementOperatorBenchmark|3.03|6.11|101.75
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int64>.CountBenchmark|0.04|0.08|86.25
System.Numerics.Tests.Perf_VectorOf\<UInt64>.GreaterThanAllBenchmark|18.37|33.12|80.26
System.Net.Http.Tests.SocketsHttpHandlerPerfTest.Get_EnumerateHeaders_Validated(ssl: True, chunkedResponse: False, responseLength: 100000)|2230622.93|3965252.94|77.76
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int32>.CountBenchmark|0.12|0.20|69.81
System.Net.Http.Tests.SocketsHttpHandlerPerfTest.Get(ssl: True, chunkedResponse: False, responseLength: 100000)|2181340.94|3635706.61|66.67
System.Numerics.Tests.Perf_VectorOf\<Byte>.LessThanOrEqualAnyBenchmark|18.27|30.07|64.56
System.Numerics.Tests.Perf_Vector4.ZeroBenchmark|1.36|2.10|55.23
HardwareIntrinsics.RayTracer.SoA.Render|1.15|1.76|52.81
System.Numerics.Tests.Perf_Vector2.DivideByScalarBenchmark|13.77|20.17|46.46
System.Net.Http.Tests.SocketsHttpHandlerPerfTest.Get(ssl: True, chunkedResponse: True, responseLength: 100000)|2621801.93|3807493.79|45.22
System.Runtime.Intrinsics.Tests.Perf_Vector128.ConvertDoubleToLongBenchmark|64.48|89.74|39.17
System.Linq.Tests.Perf_Enumerable.WhereSingleOrDefault_LastElementMatches(input: Array)|2714.67|3708.23|36.59
System.Memory.Constructors_ValueTypesOnly\<Byte>.SpanFromPointerLength|6.95|9.47|36.28
Span.IndexerBench.CoveredIndex3(length: 1024)|16595.22|22106.92|33.21
System.Buffers.Tests.RentReturnArrayPoolTests\<Object>.ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: True, UseSharedPool: False)|867.68|1154.02|33.00

# Preview 5

Preview 5 introduced a number of improvements worth calling out individually. The following section presents only major improvements with high-level analysis. The analysis should be treated with caution, and readers are encouraged to examine the benchmark reports for a thorough analysis and to call out major improvements not mentioned in this report.

## Mono AOT compiler

The performance regressions and improvements are analyzed separately in https://github.com/dotnet/runtime/issues/89238.

## Mono Interpreter

The following sections present improvements and regressions introduced in the Mono Interpreter in Preview 5.

### Improvements

Here is a list of the top 20 microbenchmark improvements in Preview 5.

Name | Baseline Value | Compare Value	 | % Difference
-|-|-|-
System.Numerics.Tests.Perf_VectorOf\<Single>.CountBenchmark|	0.18 | 	0.00 |  	-100
System.Numerics.Tests.Perf_VectorOf\<UInt16>.CountBenchmark|	0.10 | 	0.00 |  	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Float.CountBenchmark | 	0.01 | 	0.00	 |	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Byte>.CountBenchmark | 	0.03 | 	0.00	 | 	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int16>.CountBenchmark | 	1.12 | 	0.00	 | 	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<UInt16>.CountBenchmark | 	0.22 | 	0.00	 | 	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<UInt64>.CountBenchmark | 	0.08 | 	0.00	 | 	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Int.CountBenchmark | 	0.48 |	0.00  	 | 	-99.74
System.Numerics.Tests.Perf_VectorOf\<UInt32>.CountBenchmark | 	0.14 |	0.00 	 | 	-99.30
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int32>.CountBenchmark | 	2.36 | 	0.12	 | 	-95.07
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Single>.DivideBenchmark | 	127.11 | 	7.82	 | 	-93.85
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Single>.MultiplyOperatorBenchmark | 	123.89 | 	7.68	 | 	-93.80
System.Runtime.Intrinsics.Tests.Perf_Vector128Float.MultiplyBenchmark | 	126.45|	7.94  | 	-93.71
System.Runtime.Intrinsics.Tests.Perf_Vector128Float.MultiplyOperatorBenchmark | 	125.08 | 	7.87	 | 	-93.70
System.Runtime.Intrinsics.Tests.Perf_Vector128Float.DivisionOperatorBenchmark | 	123.79 | 	7.83	 | 	-93.67
System.Runtime.Intrinsics.Tests.Perf_Vector128Float.DivideBenchmark | 	126.19 | 	8.05	 | 	-93.62
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Single>.MultiplyBenchmark | 	127.05 | 	8.23	 | 	-93.52
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Single>.DivisionOperatorBenchmark | 	123.95 | 	8.22	 | 	-93.37
System.Numerics.Tests.Perf_VectorOf\<UInt64>.CountBenchmark | 	0.06 | 	0.01	 | 	-86.49
System.Collections.Tests.Perf_Dictionary.ContainsValue(Items: 3000) | 	483385521.57 | 	66414495.75	 | 	-86.26

Vectorization of IndexOf in https://github.com/dotnet/runtime/pull/85437 improved `System.Text.RegularExpressions` microbenchmarks reported in https://github.com/dotnet/perf-autofiling-issues/issues/17517. Addition of Vector128 and PackedSimd in https://github.com/dotnet/runtime/pull/82773 improved about 70 microbenchmarks reported in https://github.com/dotnet/perf-autofiling-issues/issues/17563 and https://github.com/dotnet/perf-autofiling-issues/issues/17819.

A change in [Plane and Quaternion](https://github.com/dotnet/runtime/pull/86481) improved several microbenchmarks reported in https://github.com/dotnet/perf-autofiling-issues/issues/18043.

A change in https://github.com/dotnet/runtime/pull/85528 addressed performance problems with code like `EqualityComparer<T>.Default.Equals()`, which improved over 200 microbenchmarks reported in https://github.com/dotnet/perf-autofiling-issues/issues/18349. Implementation of the `float32 Vector128.Equals` intrinsic improved `System.Numerics.Tests` microbenchmarks.

### Regressions

Here is a list of the top 20 regressed microbenchmarks in Preview 5.

Name | Baseline Value | Compare Value	 | % Difference
-|-|-|-
System.Numerics.Tests.Perf_Vector2.ZeroBenchmark	 | 0.03	 | 1.05	 | 3076.49
System.Numerics.Tests.Perf_VectorOf\<Double>.ZeroBenchmark	 | 2.96	 | 9.10	 | 207.86
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Single>.BitwiseOrOperatorBenchmark | 	8.51	 | 21.64 | 	154.37
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<UInt32>.GreaterThanOrEqualAnyBenchmark | 	24.29 | 	47.23 | 	94.44
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<SByte>.InequalityOperatorBenchmark | 	3.94 | 	7.15 | 	81.24
System.Numerics.Tests.Perf_Plane.CreateFromVerticesBenchmark | 	76.92	 | 132.40 | 	72.12
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int32>.ConditionalSelectBenchmark | 	11.14	 | 17.45 | 	56.64
System.Buffers.Tests.RentReturnArrayPoolTests\<Byte>.ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: False, UseSharedPool: False)	 | 1877.78	 | 2918.99	 | 55.44
System.Diagnostics.Perf_Process.StartAndWaitForExit	 | 1286337.51 | 	1968645.19 | 	53.04
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Byte>.LessThanAllBenchmark | 	24.23	 | 36.78	 | 51.79
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int64>.ZeroBenchmark	 | 2.99	 | 4.47	 | 49.41
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int32>.SubtractionOperatorBenchmark | 	7.62	 | 11.13	 | 45.99
System.Memory.Span\<Char>.Reverse(Size: 512)	 | 789.89	 | 1116.00 | 	41.28
System.Buffers.Tests.RentReturnArrayPoolTests\<Object>.ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: False, UseSharedPool: False) | 	1963.38 | 	2745.38 | 	39.82
System.Numerics.Tests.Perf_VectorOf\<Single>.LessThanAllBenchmark	 | 59.72 | 	82.75	 | 38.57
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Double>.EqualityOperatorBenchmark	 | 27.40 | 	37.64	 | 37.35
System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (, None, False)) | 	6382.39 | 	8678.93	 | 35.98
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int64>.OnesComplementBenchmark | 	6.38 | 	8.61 | 	34.98
System.Numerics.Tests.Perf_VectorOf\<Int64>.ZeroBenchmark	 | 2.81 | 	3.78	 | 34.72
System.Runtime.Intrinsics.Tests.Perf_Vector128Float.LessThanOrEqualAllBenchmark	 | 26.61 | 	35.79	 | 34.51

# Preview 4

Preview 4 introduced a number of improvements worth calling out individually. The following section presents only major improvements with high-level analysis. The analysis should be treated with caution, and readers are encouraged to examine the benchmark reports for a thorough analysis and to call out major improvements not mentioned in this report.

## Mono AOT compiler

The following sections present improvements and regressions introduced in the Mono AOT compiler in Preview 4.

### Improvements

Here is a list of the top 20 microbenchmark improvements in Preview 4.

Name | Baseline Value | Compare Value | % Difference
-|-|-|-
System.Numerics.Tests.Perf_VectorOf\<SByte>.CountBenchmark | 	0.01 | 	0.00 | 	-100
System.Numerics.Tests.Perf_VectorOf\<UInt16>.CountBenchmark | 	0.01 | 	0.00 | 	-100
System.Numerics.Tests.Perf_VectorOf\<UInt32>.CountBenchmark | 	0.01 | 	0.00 | 	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Double>.CountBenchmark | 	0.01 | 	0.00 | 	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int64>.CountBenchmark | 	0.01 | 	0.00 | 	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<SByte>.CountBenchmark | 	0.01 | 	0.00 | 	-100
System.Tests.Perf_DateTime.ToString(format: "s") | 	417.41 | 	103.88 | 	-75.11
System.Tests.Perf_DateTimeOffset.ToString(format: "s") | 	431.57 | 	114.37 | 	-73.49
System.IO.MemoryMappedFiles.Tests.Perf_MemoryMappedFile.CreateNew(capacity: 100000) | 	25903.87 | 	7803.06 | 	-69.87
System.IO.MemoryMappedFiles.Tests.Perf_MemoryMappedFile.CreateNew(capacity: 10000)	 | 	25653.57	 | 	7923.08 | 	-69.11
System.IO.MemoryMappedFiles.Tests.Perf_MemoryMappedFile.CreateNew(capacity: 10000000) | 	24916.24 | 	7700.13 | 	-69.09
System.IO.MemoryMappedFiles.Tests.Perf_MemoryMappedFile.CreateNew(capacity: 1000000) | 	25328.88 | 	7962.83 | 	-68.56
System.Collections.Tests.Add_Remove_SteadyState\<Int32>.Queue(Count: 512) | 	18.37 | 	8.31 | 	-54.78
System.Threading.Tests.Perf_Volatile.Read_double | 	0.26 | 	0.12 | 	-53.92
System.Numerics.Tests.Perf_VectorOf\<Byte>.ZeroBenchmark | 	5.66 | 	2.67 | 	-52.77
System.Net.Primitives.Tests.IPAddressPerformanceTests.TryFormat(address: 1020:3040:5060:7080:9010:1112:1314:1516) | 	243.27 | 	128.93 | 	-46.99
System.Numerics.Tests.Perf_Vector3.DistanceSquaredBenchmark | 	16.92 | 	9.15 | 	-45.90
System.Numerics.Tests.Perf_Vector3.DistanceBenchmark	 | 23.13 | 	13.70 | 	-40.79
PerfLabTests.EnumPerf.ObjectGetType	 | 0.03	 | 0.02 | 	-38.31
System.Numerics.Tests.Perf_Vector3.DivideByVector3OperatorBenchmark	| 17.44	| 10.91	|	-37.47

BCL changes in https://github.com/dotnet/runtime/pull/84210 and https://github.com/dotnet/runtime/pull/84210 improved `Guid.Parse` and vectorized all sets in `Regex`, as reported in https://github.com/dotnet/perf-autofiling-issues/issues/15183 and https://github.com/dotnet/perf-autofiling-issues/issues/15177.

Implementation of a fast path for `mini_init_method_rgctx` in https://github.com/dotnet/runtime/pull/84226 improved over 50 microbenchmarks reported in https://github.com/dotnet/perf-autofiling-issues/issues/15717, https://github.com/dotnet/perf-autofiling-issues/issues/15796, and https://github.com/dotnet/perf-autofiling-issues/issues/15799.

Intrinsics `get_Count` and `get_AllBitsSet` on arm64 improved around 400 microbenchmarks, as reported in https://github.com/dotnet/perf-autofiling-issues/issues/15800, https://github.com/dotnet/perf-autofiling-issues/issues/15718, and https://github.com/dotnet/perf-autofiling-issues/issues/15797.

Allowing inlining of methods containing constructor calls and intrinsifying additional calls to `Type:op_Equality` improved over 100 microbenchmarks reported in https://github.com/dotnet/perf-autofiling-issues/issues/16371 and https://github.com/dotnet/perf-autofiling-issues/issues/16509.

V128 SIMD intrinsics on Arm64 across all codegen engines in https://github.com/dotnet/runtime/pull/84289 improved over 400 microbenchmarks reported in https://github.com/dotnet/perf-autofiling-issues/issues/16460, https://github.com/dotnet/perf-autofiling-issues/issues/16621, and https://github.com/dotnet/perf-autofiling-issues/issues/16660. Adding Vector128.ConvertXX and Vector128.Create as intrinsics on arm64 improved 48 microbenchmarks reported in https://github.com/dotnet/perf-autofiling-issues/issues/17314 and in https://github.com/dotnet/perf-autofiling-issues/issues/17315.

Making `Guid.HexsToChars` aggressively inlined in https://github.com/dotnet/runtime/pull/85322 improved a couple of microbenchmarks.

### Regressions

Here is a list of the top 20 regressed microbenchmarks in Preview 4.

Name | Baseline Value | Compare Value	 | % Difference
-|-|-|-
System.Tests.Perf_String.Substring_IntInt(s: "dzsdzsDDZSDZSDZSddsz", i1: 7, i2: 4)	| 23.92 |	42.38	| 	77.13
System.Buffers.Text.Tests.Utf8FormatterTests.FormatterUInt64(value: 0)		| 14.05		| 23.66		| 68.37
System.Buffers.Text.Tests.Utf8FormatterTests.FormatterInt32(value: 4)		| 13.98	| 	22.92		| 64.00
Benchstone.BenchI.IniArray.Test		| 186909527.87	| 	304502098.85	| 	62.91
Span.IndexerBench.Ref(length: 1024)	| 	686.54		| 1110.42		| 61.74
System.Tests.Perf_Int64.TryParse(value: "9223372036854775807")	| 	58.15	| 	93.40	| 	60.60
System.Runtime.Intrinsics.Tests.Perf_Vector128Int.DivideBenchmark		| 23.30	| 	37.16		| 59.44
System.Tests.Perf_Int64.TryParse(value: "-9223372036854775808")		| 59.06		| 93.58		| 58.45
System.Tests.Perf_Int64.TryParseSpan(value: "9223372036854775807")		| 59.71	| 	93.89	| 	57.26
System.Buffers.Binary.Tests.BinaryReadAndWriteTests.MeasureReverseUsingNtoH		| 1432.42		| 2191.50		| 52.99
System.Tests.Perf_Int64.TryParseSpan(value: "-9223372036854775808")	| 	61.80		| 94.18		| 52.39
System.Threading.Tests.Perf_Volatile.Write_double		| 0.23		| 0.35		| 52.13
System.Numerics.Tests.Perf_VectorOf\<Int32>.EqualsBenchmark		| 0.81	| 	1.23	| 	50.47
System.Tests.Perf_String.Trim(s: "Test ")		| 76.12	| 	113.79	| 	49.48
System.Tests.Perf_UInt16.Parse(value: "12345")	| 	35.63	| 	52.72	| 	47.98
System.Tests.Perf_Int64.Parse(value: "-9223372036854775808")	| 	62.30		| 91.72	| 	47.22
System.Tests.Perf_UInt64.Parse(value: "18446744073709551615")	| 	70.51		| 103.27	| 	46.44
System.Tests.Perf_Int64.Parse(value: "9223372036854775807")		| 61.62		| 90.17	| 	46.34
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Double>.SumBenchmark		| 2.76	| 	3.99	| 	44.34
System.Collections.Tests.Perf_BitArray.BitArrayGet(Size: 512)		| 8039.61		| 11602.79	| 	44.32

## Mono Interpreter

The following sections present improvements and regressions introduced in the Mono Interpreter in Preview 4.

### Improvements

Here is a list of the top 20 microbenchmark improvements in Preview 4.

Name | Baseline Value | Compare Value	 | % Difference
-|-|-|-
System.Numerics.Tests.Perf_VectorOf\<Byte>.CountBenchmark  |	0.00 |	0.00 |	-100
System.Numerics.Tests.Perf_VectorOf\<Int16>.CountBenchmark |	0.18	| 0.00	|	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Double>.CountBenchmark |	0.16 |	0.00	|	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int16>.CountBenchmark |	1.29 |	0.00	|	-100
System.Numerics.Tests.Perf_VectorOf\<SByte>.CountBenchmark |	0.20 |	0.00	|	-99.20
System.Runtime.Intrinsics.Tests.Perf_Vector128Float.CountBenchmark |	0.07 |	0.00	|	-95.73
System.Tests.Perf_DateTime.ToString(format: "s") |	2233.23	| 281.76	|	-87.38
System.Text.Json.Serialization.Tests.ColdStartSerialization\<SimpleStructWithProperties>.NewJsonSerializerContext |	185975.98	| 28969.63	|	-84.42
System.Tests.Perf_DateTimeOffset.ToString(format: "s") |	2311.74	| 385.39	|	-83.32
System.Numerics.Tests.Perf_VectorOf\<Int32>.CountBenchmark	| 0.44	| 0.10	|	-77.43
System.IO.MemoryMappedFiles.Tests.Perf_MemoryMappedFile.CreateNew(capacity: 10000000)	| 45039.52 |	12494.67	|	-72.25
System.IO.MemoryMappedFiles.Tests.Perf_MemoryMappedFile.CreateNew(capacity: 10000) |	44649.63	| 12502.98	|	-71.99
System.IO.MemoryMappedFiles.Tests.Perf_MemoryMappedFile.CreateNew(capacity: 1000000) |	45124.15| 	13007.76	|	-71.17
System.IO.MemoryMappedFiles.Tests.Perf_MemoryMappedFile.CreateNew(capacity: 100000)|	44604.36 |	13258.02	|	-70.27
System.Reflection.Invoke.Ctor0_NoParams	| 393.98 |	123.35	|	-68.69
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int32>.CountBenchmark |	0.00 |	0.00	|	-68.38
System.Tests.Perf_DateTimeOffset.ToString(format: null) |	6639.43	| 2509.03	|	-62.21
System.Reflection.Activator\<EmptyClass>.CreateInstanceGeneric |	575.27 |	221.73	|	-61.45
System.Tests.Perf_DateTimeOffset.ToString(value: 12/30/2017 3:45:22 AM -08:00) |	6959.23 |	2746.69	|	-60.53
System.Memory.ReadOnlySpan.Trim(input: "")	| 49.19	| 	19.80	| 	-59.73

Implementation of `IUtf8SpanFormattable` in https://github.com/dotnet/runtime/pull/84469 caused both improvements and regressions, as reported in https://github.com/dotnet/perf-autofiling-issues/issues/15630 and https://github.com/dotnet/perf-autofiling-issues/issues/15626. `DateTime{Offset}` formatting changes improved about 120 microbenchmarks reported in https://github.com/dotnet/perf-autofiling-issues/issues/17009. PR https://github.com/dotnet/runtime/pull/85288 improved about 30 microbenchmarks reported in https://github.com/dotnet/perf-autofiling-issues/issues/17245. Handling `Utf8Formatter.TryFormat` and then delegating to the relevant helpers in https://github.com/dotnet/runtime/pull/85277 improved about 30 microbenchmarks.

### Regressions

Here is a list of the top 20 regressed microbenchmarks in Preview 4.

Name | Baseline Value | Compare Value	 | % Difference
-|-|-|-
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Byte>.CountBenchmark|	0.00|	0.23|	9893.94
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int64>.CountBenchmark|	0.02|	0.75|	4216.78
System.Numerics.Tests.Perf_VectorOf\<UInt32>.CountBenchmark|	0.00|	0.12|	3988.20
Microsoft.Extensions.DependencyInjection.ActivatorUtilitiesBenchmark.Factory|	276.60|	852.40|	208.17
System.Numerics.Tests.Perf_VectorOf\<UInt64>.AbsBenchmark	|2.32	|4.51|	94.06
System.Numerics.Tests.Perf_VectorOf\<UInt16>.AbsBenchmark	|2.37	|4.34|	83.29
System.Numerics.Tests.Perf_Vector2.ZeroBenchmark	|0.44|	0.78	|78.01
System.Memory.Constructors\<Byte>.ArrayAsSpan	|12.20|	21.63	|77.34
Microsoft.Extensions.Primitives.Performance.StringValuesBenchmark.Indexer_FirstElement_String	|8.60	|14.85|	72.68
System.Net.Http.Tests.SocketsHttpHandlerPerfTest.Get(ssl: True, chunkedResponse: False, responseLength: 100000)|	1903905.78|	3227992.49|	69.54
System.Runtime.Intrinsics.Tests.Perf_Vector128Int.OnesComplementBenchmark|	6.62|	10.83|	63.43
System.Buffers.Text.Tests.Utf8FormatterTests.FormatterDecimal(value: 123456.789)	|491.42	|801.06	|63.00
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<UInt64>.OnesComplementOperatorBenchmark	|6.29	|10.12|	60.75
Microsoft.AspNetCore.Server.Kestrel.Performance.PipeThroughputBenchmark.Parse_ParallelAsync(Length: 4096, Chunks: 1)	|8112.10|	12805.61|	57.85
System.Memory.Constructors\<Byte>.MemoryMarshalCreateReadOnlySpan|	7.75	|12.19|	57.15
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<SByte>.CountBenchmark	|0.12|	0.19	|54.21
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int16>.BitwiseAndBenchmark	|8.47	|12.73	|50.32
System.Numerics.Tests.Constructor.ConstructorBenchmark_Int16|	29.48|	43.17|	46.45
System.Numerics.Tests.Perf_VectorOf\<UInt16>.InequalityOperatorBenchmark	|19.53|	27.98	|43.23
System.Numerics.Tests.Perf_VectorOf\<UInt64>.BitwiseOrBenchmark |	39.39	 |55.74	 |	41.51

# Preview 3

The following section gives an overview of only the major improvements, with high-level analysis. The analysis should be treated with caution; readers are encouraged to examine the benchmark reports for a thorough analysis and to call out major improvements not mentioned in this report.

## Mono AOT compiler

The following sections present improvements and regressions introduced in the Mono AOT compiler in Preview 3.

### Improvements

Here is a list of the top 20 microbenchmark improvements in Preview 3.

Name | Baseline Value | Compare Value	 | % Difference
-|-|-|-
System.Numerics.Tests.Perf_VectorOf\<Byte>.CountBenchmark|	0.01	|0.00|	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int32>.CountBenchmark|	0.01|	0.00	|-100
System.Tests.Perf_Boolean.ToString(value: True)	|0.23	|0.00	|-100
System.Numerics.Tests.Perf_Vector4.EqualityOperatorBenchmark	|1.96	|0.80	|-59.04
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<UInt32>.SumBenchmark|	6.65	|3.26	|-50.93
System.Numerics.Tests.Perf_Vector4.InequalityOperatorBenchmark	|1.39|	0.74|	-46.53
System.Tests.Perf_Enum.HasFlag|	0.23|	0.13	|-44.47
System.Numerics.Tests.Perf_BitOperations.LeadingZeroCount_uint	|1096.23|	667.83|	-39.07
System.Numerics.Tests.Perf_BitOperations.LeadingZeroCount_ulong|	1102.75	|746.09	|-32.34
System.Numerics.Tests.Perf_BitOperations.Log2_ulong	|1320.59	|895.14	|-32.21
System.Tests.Perf_String.IndexerCheckLengthHoisting|88.84	|60.29	|-32.13
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Single>.LessThanOrEqualAllBenchmark	|4.44|	3.03	|-31.65
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Double>.SumBenchmark|	4.02|	2.76	|-31.25
System.Numerics.Tests.Perf_VectorOf\<SByte>.MinBenchmark	|48.27	|33.34	|-30.93
Inlining.InlineGCStruct.WithFormat	|2.86	|1.99	|-30.52
PerfLabTests.CastingPerf.ObjScalarValueType	|108762.72	|76497.64	|-29.66
System.Numerics.Tests.Perf_VectorOf\<Byte>.InequalityOperatorBenchmark	|0.55	|0.39|	-29.07
Microsoft.Extensions.Primitives.StringSegmentBenchmark.Equals_Object_Invalid	|2.86|	2.04|	-28.66
System.Numerics.Tests.Perf_VectorOf\<UInt64>.EqualityOperatorBenchmark	|0.52	|0.37|	-28.49
System.Numerics.Tests.Perf_VectorOf\<UInt64>.InequalityOperatorBenchmark|	0.62	|0.45|	-28.32

The most improved benchmark grouping is `System.Numerics`, as outlined in https://github.com/dotnet/perf-autofiling-issues/issues/14023, https://github.com/dotnet/perf-autofiling-issues/issues/14224, https://github.com/dotnet/perf-autofiling-issues/issues/14573, and https://github.com/dotnet/perf-autofiling-issues/issues/14322. The changes implemented in https://github.com/dotnet/runtime/pull/82420, https://github.com/dotnet/runtime/pull/83337, and https://github.com/dotnet/runtime/pull/83094 introduced Arm64 SIMD operations and improved about 1000 microbenchmarks.

### Regressions

Here is a list of top 20 regressed microbenchmarks in Preview 3.

Name | Baseline Value | Compare Value	 | % Difference
-|-|-|-
System.Numerics.Tests.Perf_VectorOf\<Byte>.ZeroBenchmark	| 2.65| 	5.66	| 	113.78
System.Numerics.Tests.Perf_BitOperations.Log2_uint| 	791.53| 	1539.09	| 	94.44
System.Collections.Tests.Add_Remove_SteadyState\<Int32>.Queue(Count: 512)| 	9.64	| 18.37	| 	90.64
System.Text.Json.Reader.Tests.Perf_Base64.ReadBase64EncodedByteArray_HeavyEscaping(NumberOfBytes: 1000)| 	2769.97	| 5142.05	| 	85.63
System.Text.Json.Reader.Tests.Perf_Base64.ReadBase64EncodedByteArray_NoEscaping(NumberOfBytes: 1000)| 	2771.03	| 5139.62	| 	85.47
System.Text.Json.Reader.Tests.Perf_Base64.ReadBase64EncodedByteArray_HeavyEscaping(NumberOfBytes: 100)| 	377.30| 	646.53	| 71.35
System.Numerics.Tests.Perf_BitOperations.PopCount_uint| 	668.42	| 1104.04	| 	65.17
System.Text.Json.Reader.Tests.Perf_Base64.ReadBase64EncodedByteArray_NoEscaping(NumberOfBytes: 100)	| 377.61	| 598.53	| 	58.50
System.Threading.Tests.Perf_Volatile.Read_double| 	0.16| 	0.26	| 	57.96
System.Memory.Span\<Char>.Reverse(Size: 512)	| 258.69| 	407.47	| 	57.51
PerfLabTests.LowLevelPerf.StructWithInterfaceInterfaceMethod| 	154024.04| 	239168.34	| 	55.27
System.Text.Json.Tests.Perf_Segment.ReadSingleSegmentSequenceByN(numberOfBytes: 8192, TestCase: Json4KB)| 	13635.35| 	20935.97	| 	53.54
System.Text.Json.Tests.Perf_Reader.ReadSpanEmptyLoop(IsDataCompact: True, TestCase: Json4KB)| 	10415.86	| 15732.85	| 	51.04
System.Text.Json.Tests.Perf_Reader.ReadSingleSpanSequenceEmptyLoop(IsDataCompact: True, TestCase: Json4KB)	| 10436.16	| 15712.23	| 	50.55
System.Numerics.Tests.Perf_VectorOf\<Int32>.EqualityOperatorBenchmark| 	0.24| 	0.36	| 	50.01
System.Collections.IndexerSetReverse\<Int32>.Array(Size: 512)	| 456.86| 	681.13	| 	49.08
System.Collections.IndexerSet\<Int32>.Span(Size: 512)	| 458.27	| 682.26	| 	48.87
System.Numerics.Tests.Perf_VectorOf\<Int64>.EqualityOperatorBenchmark| 	0.27| 	0.40	| 	48.57
System.Numerics.Tests.Perf_BitOperations.PopCount_ulong	| 745.13| 	1102.84	| 	48.00
System.Text.Json.Tests.Perf_Reader.ReadReturnBytes(IsDataCompact: False, TestCase: Json40KB)| 	158074.36| 	231420.75| 		46.39

## Mono Interpreter

The following sections present improvements and regressions introduced in the Mono Interpreter in Preview 3.

### Improvements

Here is a list of the top 20 microbenchmark improvements in Preview 3.

Name | Baseline Value | Compare Value	 | % Difference
-|-|-|-
System.Numerics.Tests.Perf_VectorOf\<Single>.CountBenchmark | 	0.16 | 	0.00	 | 	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Single>.CountBenchmark | 	0.01	 | 0.00	 | 	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<UInt16>.CountBenchmark | 	0.11	 | 0.00	 | 	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<UInt32>.CountBenchmark | 	0.43 | 	0.00	 | 	-100
System.Tests.Perf_String.Replace_Char(text: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab2wei1kxfbvsbpzwhanjczcqa2psra3aacxb67qnwbnfp2tok6v0a58lzfdql1fehvs91yzkt9xam7ahjbhvpd9edll13ab46i74ktwwgkgbi792e5gkuuzevo5qm8qt83edag7zovoe686gmtw730kms2i5xgji4xcp25287q68fvhwszd3mszht2uh7bchlgkj5qnq1x9m4lg7vwn8cq5l756akua6oyx9k71bmxbysnmhvxvlxde4k9maumfgxd8gxhxx4mwpph2ttyox9zilt3ylv1q9s4bopfuoa8qlrzodg2q67sh85wx4slcd6w7ufnendaxai633ove2ktbaxdt2sz6y6mo42473xd274gz833p6hj3mu77c4m4od9e5s8btxleh0efqnu9zj9rwtbk5758lio35b3q426j5fwwq1qyknfedrsmqyfw1m38mkkotdf7n0vr6p3erhy8dkzntr9fwjrslxjgrbegih0n6bpb5bfuy55bu65ce9kejcfifxwpcs05umrsb8kvd64q2iwugbbi7vd35g5ho0rff9rhombgzzaniyq7bbjbqr88jyw4ccgnoyl31of3a5thv0vg08gnrqzxas800hewtw8tnwgw5pav81ntdpdd62689x3iqpc317y82b3e2trbpdzieoxldaz009tz37gqmh4bdp1bv9lnl5s58udb11z0h7i2sdl5nbyhjyfzxwzezmp4qx0i3eyvsd3fg8sryq9jhlvkonnfcvb4snl4mcbimdzg49tzdhqjmfxfcq3p1st6b9x2xyevo17evpqp4yc4f2rm0f26ivr3t2f5m0boc44vituxaovcqy1jrkcs6im2kdu3jvcexx2k76egve63aon5a6nbxss4rcke90npmqp35qluf571ms160y2nhaqef835wah41qru8tauu362v0r8konl8", oldChar: 'b', newChar: '+')	 | 99861.87 | 	2074.68	 | 	-97.92
System.Runtime.Intrinsics.Tests.Perf_Vector128Float.CountBenchmark	 | 2.79	 | 0.07	 | 	-97.41
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<SByte>.UnaryNegateOperatorBenchmark	 | 234.80 | 	6.26 | 	-97.33
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Byte>.UnaryNegateOperatorBenchmark	 | 246.33 | 	6.63	 | 	-97.30
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<SByte>.NegateBenchmark	 | 235.81	 | 6.49	 | 	-97.24
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Byte>.NegateBenchmark	 | 235.54	 | 6.56	 | 	-97.21
System.Numerics.Tests.Perf_VectorOf\<UInt64>.CountBenchmark	 | 3.10	 | 0.09	 | 	-97.00
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<SByte>.LessThanBenchmark | 	273.32 | 	8.63	 | 	-96.84
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Byte>.LessThanBenchmark | 	273.20	 | 8.91	 | 	-96.73
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Byte>.EqualsStaticBenchmark | 	273.84	 | 9.19	 | 	-96.64
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Byte>.SubtractBenchmark	 | 247.26	 | 8.65	 | 	-96.50
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Byte>.GreaterThanBenchmark	 | 250.97 | 	8.85	 | 	-96.47
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<SByte>.SubtractBenchmark	 | 244.27 | 	8.76	 | 	-96.41
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<SByte>.MultiplyOperatorBenchmark	 | 249.17	 | 8.97	 | 	-96.40
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Byte>.AddBenchmark	 | 238.40 | 	8.67	 | 	-96.36
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<SByte>.AddOperatorBenchmark	 | 236.35 | 	8.68	 | 	-96.32

The most improved benchmark groupings are `System.Buffers`, `System.Collections`, `System.Memory`, and `System.Text`, as outlined in https://github.com/dotnet/perf-autofiling-issues/issues/14324, https://github.com/dotnet/perf-autofiling-issues/issues/14325, https://github.com/dotnet/perf-autofiling-issues/issues/14326, https://github.com/dotnet/perf-autofiling-issues/issues/14355, https://github.com/dotnet/perf-autofiling-issues/issues/14359, and https://github.com/dotnet/perf-autofiling-issues/issues/14361. The changes implemented in https://github.com/dotnet/runtime/pull/83498 and https://github.com/dotnet/runtime/pull/83490 increased the inlining length limit from 20 to 30 and implemented `shr.un.imm`, which together improved over 1000 microbenchmarks.

Adding vector horizontal sums on Arm64 in https://github.com/dotnet/runtime/pull/83675 improved about 20 microbenchmarks, as detailed in https://github.com/dotnet/perf-autofiling-issues/issues/14531.

Changes in https://github.com/dotnet/runtime/pull/83512 caused both improvements and regressions as reported in https://github.com/dotnet/perf-autofiling-issues/issues/15008 and https://github.com/dotnet/perf-autofiling-issues/issues/15154. 

### Regressions

Here is a list of top 20 regressed microbenchmarks in Preview 3.

Name | Baseline Value | Compare Value	 | % Difference
-|-|-|-
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<SByte>.CountBenchmark	| 0.00	| 	0.12		| 	661187.94
System.Numerics.Tests.Perf_VectorOf\<Int16>.CountBenchmark	| 	0.01	| 0.18		| 	2061.26
System.Numerics.Tests.Perf_Vector3.EqualsBenchmark		| 23.78	| 	443.27		| 	1764.35
System.Numerics.Tests.Perf_Vector4.EqualsBenchmark		| 24.01	| 406.03		| 	1590.83
System.Numerics.Tests.Perf_Vector2.EqualsBenchmark		| 33.71	| 	435.39		| 	1191.71
System.Numerics.Tests.Perf_Matrix3x2.EqualsBenchmark	| 	162.13	| 	1346.77		| 	730.69
System.Numerics.Tests.Perf_Plane.EqualsBenchmark	| 	57.84	| 	411.46		| 	611.36
System.Numerics.Tests.Perf_Quaternion.EqualsBenchmark		| 80.35	| 436.94		| 	443.80
System.Numerics.Tests.Perf_VectorOf\<SByte>.CountBenchmark		| 0.04	| 	0.20		| 	431.24
System.Numerics.Tests.Perf_Matrix4x4.EqualsBenchmark		| 376.19	| 	1808.21		| 	380.66
System.Numerics.Tests.Perf_Vector4.ZeroBenchmark		| 0.99	| 	2.52		| 	154.02
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Double>.EqualsBenchmark	| 	124.90	| 	305.09		| 	144.27
System.Numerics.Tests.Perf_VectorOf\<Int32>.CountBenchmark	| 	0.19		| 0.44		| 	127.07
System.Runtime.Intrinsics.Tests.Perf_Vector128Float.EqualsBenchmark		| 191.86		| 410.58		| 	113.99
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Single>.EqualsBenchmark		| 199.71	| 	410.56		| 	105.57
System.Threading.Tests.Perf_Thread.CurrentThread		| 3.50		| 6.37		| 	81.95
System.Net.Http.Tests.SocketsHttpHandlerPerfTest.Get_EnumerateHeaders_Unvalidated(ssl: True, chunkedResponse: True, responseLength: 100000)		| 1951914.28	| 	3529445.53		| 	80.81
System.Text.Json.Serialization.Tests.ReadJson\<BinaryData>.DeserializeFromReader(Mode: SourceGen)		| 33011.31		| 59326.04		| 	79.71
System.Globalization.Tests.StringSearch.IsSuffix_DifferentLastChar(Options: (en-US, OrdinalIgnoreCase, False))	| 	913.26		| 1618.90		| 	77.26
System.Text.Json.Serialization.Tests.ReadJson\<BinaryData>.DeserializeFromReader(Mode: Reflection)		| 32968.66		| 58440.45		| 	77.26

# Preview 2

Preview 2 introduced a number of improvements worth calling out individually. The following section presents only the major improvements, with high-level analysis. The analysis should be treated with caution; readers are encouraged to examine the benchmark reports for a thorough analysis and to call out major improvements not mentioned in this report.

## Mono AOT compiler

The following sections present improvements and regressions introduced in the Mono AOT compiler in Preview 2.

### Improvements

Here is a list of the top 20 microbenchmark improvements in Preview 2. The full report is available [here](https://pvscmdupload.blob.core.windows.net/parkertest/8.0prev1vsprev2/report_Daily_ca%3Dx64_cb%3Drefs-heads-release-8.0-preview2_co%3DUbuntu1804_cr%3Ddotnetruntime_cc%3DCompliationMode%3Dtiered-LLVM%3Dfalse-MonoAOT%3Dtrue-MonoInterpreter%3Dfalse-RunKind%3Dmicro_mono_Baseline_bb%3Drefs-heads-release-8.0-preview1_2023-03-20.html).

Name | Baseline Value | Compare Value	 | Difference	 | % Difference
-|-|-|-|-
System.Collections.Concurrent.Count\<Int32>.Dictionary(Size: 512) | 34.07 μs | 310.43 ns | -33756.76 ns | 99%
System.Collections.Concurrent.Count\<String>.Dictionary(Size: 512) | 17.32 μs | 314.25 ns | -17007.28 ns | 98%
System.Tests.Perf_Decimal.Floor | 81.17 ns | 16.81 ns | -64.36 ns | 79%
System.Tests.Perf_Decimal.Round | 82.24 ns | 18.69 ns | -63.55 ns | 77%
System.Tests.Perf_UInt32.TryFormat(value: 0) | 78.23 ns | 20.05 ns | -58.18 ns | 74%
System.Tests.Perf_Int32.TryFormat(value: 4) | 78.02 ns | 20.47 ns | -57.55 ns | 74%
System.Collections.TryGetValueFalse\<String, String>.ConcurrentDictionary(Size: 512) | 44.69 μs | 12.92 μs | -31.77 μs | 71%
System.Tests.Perf_Decimal.Divide | 346.08 ns | 102.16 ns | -243.92 ns | 70%
System.Collections.ContainsKeyFalse\<String, String>.ConcurrentDictionary(Size: 512) | 45.29 μs | 13.50 μs | -31.79 μs | 70%
System.Text.Json.Reader.Tests.Perf_Base64.ReadBase64EncodedByteArray_HeavyEscaping(NumberOfBytes: 1000) | 8.93 μs | 2.77 μs | -6.16 μs | 69%
System.Text.Json.Reader.Tests.Perf_Base64.ReadBase64EncodedByteArray_NoEscaping(NumberOfBytes: 1000) | 8.83 μs | 2.77 μs | -6.06 μs | 69%
System.Tests.Perf_UInt64.TryFormat(value: 0) | 84.40 ns | 26.53 ns | -57.87 ns | 69%
System.Tests.Perf_Byte.ToString(value: 255) | 91.65 ns | 29.95 ns | -61.69 ns | 67%
System.Tests.Perf_Version.TryFormat3 | 265.42 ns | 88.04 ns | -177.38 ns | 67%
System.Tests.Perf_Version.TryFormat4 | 345.05 ns | 115.05 ns | -230.00 ns | 67%
System.Collections.TryGetValueTrue\<String, String>.ConcurrentDictionary(Size: 512) | 49.50 μs | 16.53 μs | -32.97 μs | 67%
System.Tests.Perf_Version.TryFormat2 | 176.63 ns | 59.61 ns | -117.02 ns | 66%
System.Collections.ContainsKeyTrue\<String, String>.ConcurrentDictionary(Size: 512) | 50.43 μs | 17.54 μs | -32.89 μs | 65%
LinqBenchmarks.Where01ForX | 1.57 secs | 548.00 ms | -1022.61 ms | 65%
LinqBenchmarks.Where01LinqMethodX | 1.68 secs | 588.39 ms | -1095.38 ms | 65%

The most improved benchmark groupings are `System.Collections`, `System.Decimal`, `System.Int`, and `System.Text`, as outlined in https://github.com/dotnet/perf-autofiling-issues/issues/12996, https://github.com/dotnet/perf-autofiling-issues/issues/13006, https://github.com/dotnet/perf-autofiling-issues/issues/13217, and https://github.com/dotnet/perf-autofiling-issues/issues/13264. The changes implemented in https://github.com/dotnet/runtime/pull/81695 intrinsified `RuntimeHelpers.CreateSpan<T>`, which is widely used in the BCL, and replaced the `icall` path.
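
The Preview 2 tables report values with units and use the opposite sign from the Preview 3 tables: improvements are positive and regressions are negative. The sketch below shows one way to reproduce the `% Difference` column from the displayed cells. The formula is inferred from the table values, and the helper names `to_ns` and `improvement_percent` are illustrative only.

```python
# Multipliers to nanoseconds for the unit suffixes used in the tables.
UNIT_TO_NS = {"ns": 1.0, "μs": 1e3, "ms": 1e6, "secs": 1e9}


def to_ns(value: str) -> float:
    """Convert a table cell such as '34.07 μs' to nanoseconds."""
    number, unit = value.split()
    return float(number) * UNIT_TO_NS[unit]


def improvement_percent(baseline: str, compare: str) -> float:
    """Relative time saved against the baseline, in percent.

    Assumed convention (inferred from the Preview 2 tables): positive
    values are improvements, negative values are regressions.
    """
    base_ns = to_ns(baseline)
    return (base_ns - to_ns(compare)) / base_ns * 100.0


# Count<Int32>.Dictionary(Size: 512): 34.07 μs -> 310.43 ns
print(round(improvement_percent("34.07 μs", "310.43 ns")))  # 99

# Perf_Random.Next_long_unseeded: 10.17 ns -> 28.84 ns
print(round(improvement_percent("10.17 ns", "28.84 ns")))  # -184
```

Because the displayed cells are rounded, the `Difference` column may differ slightly from a value recomputed this way.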

Arm64 SIMD operations implemented in https://github.com/dotnet/runtime/pull/83094 and https://github.com/dotnet/runtime/pull/82420 improved over 1000 microbenchmarks according to https://github.com/dotnet/perf-autofiling-issues/issues/13808, https://github.com/dotnet/perf-autofiling-issues/issues/13807, https://github.com/dotnet/perf-autofiling-issues/issues/14023, and https://github.com/dotnet/perf-autofiling-issues/issues/13990.

The `System.Collections` benchmark grouping was improved by the changes made in https://github.com/dotnet/runtime/pull/81902, as outlined in https://github.com/dotnet/perf-autofiling-issues/issues/13220. The changes added support for v128 constants and improved performance in about 75 microbenchmarks.

The `System.Text` benchmark grouping was improved by the addition of S.R.I Vectors in JsonReaderHelper, introduced in https://github.com/dotnet/runtime/pull/81758 and outlined in https://github.com/dotnet/perf-autofiling-issues/issues/12993. Furthermore, improved handling of the `ldtoken+ldtoken+Type::op_Equality` optimization, implemented in https://github.com/dotnet/runtime/pull/81277, has significantly improved the `System.Text` benchmark grouping, as detailed in https://github.com/dotnet/perf-autofiling-issues/issues/12313.

The changes introduced in https://github.com/dotnet/runtime/pull/81306, which removed types deriving from `JsonTypeInfo<T>`, have had a positive impact on the benchmark groupings of both `System.Numerics` and `System.Collections`, as reported in https://github.com/dotnet/perf-autofiling-issues/issues/12488 and https://github.com/dotnet/perf-autofiling-issues/issues/12550.

All of the changes mentioned above are speed-related improvements of microbenchmarks. There was also a significant size improvement on WASM and iOS from enabling deduplication of generics. Issue https://github.com/dotnet/runtime/issues/80419 contains references to changes that reduced size on disk (SOD) by about 11% and 3%, respectively.

### Regressions

Here is a list of the top 20 microbenchmark regressions in Preview 2. The full report is available [here](https://pvscmdupload.blob.core.windows.net/parkertest/8.0prev1vsprev2/report_Daily_ca%3Dx64_cb%3Drefs-heads-release-8.0-preview2_co%3DUbuntu1804_cr%3Ddotnetruntime_cc%3DCompliationMode%3Dtiered-LLVM%3Dfalse-MonoAOT%3Dtrue-MonoInterpreter%3Dfalse-RunKind%3Dmicro_mono_Baseline_bb%3Drefs-heads-release-8.0-preview1_2023-03-20.html).

Name | Baseline Value | Compare Value	 | Difference	 | % Difference
-|-|-|-|-
System.Tests.Perf_Random.Next_long_unseeded | 10.17 ns | 28.84 ns | 18.67 ns | -184%
System.Numerics.Tests.Perf_Vector4.EqualityOperatorBenchmark | 0.79 ns | 1.96 ns | 1.17 ns | -148%
System.Numerics.Tests.Perf_Vector3.TransformByMatrix4x4Benchmark | 60.14 ns | 140.30 ns | 80.17 ns | -133%
System.Numerics.Tests.Perf_Vector3.TransformNormalByMatrix4x4Benchmark | 60.73 ns | 132.19 ns | 71.46 ns | -118%
System.Numerics.Tests.Perf_Vector4.TransformVector3ByMatrix4x4Benchmark | 62.72 ns | 131.48 ns | 68.76 ns | -110%
System.Numerics.Tests.Perf_Vector4.TransformByMatrix4x4Benchmark | 63.09 ns | 131.10 ns | 68.00 ns | -108%
System.Numerics.Tests.Perf_Vector2.TransformByMatrix4x4Benchmark | 56.47 ns | 112.12 ns | 55.65 ns | -99%
System.Numerics.Tests.Perf_Quaternion.LengthSquaredBenchmark | 7.76 ns | 14.35 ns | 6.59 ns | -85%
System.Numerics.Tests.Perf_Vector2.TransformNormalByMatrix4x4Benchmark | 56.66 ns | 103.10 ns | 46.44 ns | -82%
System.Numerics.Tests.Perf_Vector4.TransformVector2ByMatrix4x4Benchmark | 61.08 ns | 103.66 ns | 42.58 ns | -70%
System.Numerics.Tests.Perf_Vector2.TransformByMatrix3x2Benchmark | 20.85 ns | 35.00 ns | 14.15 ns | -68%
System.Numerics.Tests.Perf_BitOperations.LeadingZeroCount_uint | 667.85 ns | 1.10 μs | 428.39 ns | -64%
System.Tests.Perf_Random.Next_long_long_unseeded | 14.28 ns | 22.44 ns | 8.15 ns | -57%
System.Numerics.Tests.Perf_Quaternion.ConjugateBenchmark | 18.32 ns | 28.76 ns | 10.44 ns | -57%
System.Numerics.Tests.Perf_Quaternion.InverseBenchmark | 26.70 ns | 41.60 ns | 14.89 ns | -56%
System.Numerics.Tests.Perf_Quaternion.LengthBenchmark | 13.45 ns | 20.35 ns | 6.90 ns | -51%
System.Numerics.Tests.Perf_BitOperations.LeadingZeroCount_ulong | 745.74 ns | 1.10 μs | 357.01 ns | -48%
System.Numerics.Tests.Perf_BitOperations.Log2_ulong | 894.61 ns | 1.32 μs | 425.98 ns | -48%
System.Numerics.Tests.Perf_Vector2.TransformNormalByMatrix3x2Benchmark | 21.03 ns | 30.87 ns | 9.85 ns | -47%
System.Numerics.Tests.Perf_Vector3.ReflectBenchmark | 37.23 ns | 54.13 ns | 16.90 ns | -45%

Here is a list of ongoing regressions in the Preview 2 snapshot, each with a short description.

| Issue report    | Description                                |
| --------------- | ------------------------------------------ |
| https://github.com/dotnet/perf-autofiling-issues/issues/12546 | Quaternion and Plane SIMD intrinsics |
| https://github.com/dotnet/perf-autofiling-issues/issues/12957 | Improve `ConcurrentDictionary` performance for strings |
| https://github.com/dotnet/perf-autofiling-issues/issues/12660 | Improved codegen of the vector accelerated `System.Numerics.*` types |
| https://github.com/dotnet/perf-autofiling-issues/issues/13187 | Implementation of Lemire's nearly divisionless method |
| https://github.com/dotnet/perf-autofiling-issues/issues/13500 | Use of `Array.Reverse<T>` in `ImmutableArray<T>.Builder.Reverse` |
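
One row above refers to Lemire's nearly divisionless method, a technique for mapping a random machine word to a bounded range without a division on the common path. The sketch below illustrates the general technique only. It is not the .NET implementation, and `lemire_bounded` and `next_word` are hypothetical names chosen for this example.

```python
import random


def lemire_bounded(next_word, s: int, bits: int = 32) -> int:
    """Return an unbiased integer in [0, s) from `bits`-bit random words.

    `next_word` is a hypothetical callable returning a uniform integer in
    [0, 2**bits). The modulo runs only on the rare slow path.
    """
    if not 0 < s <= 1 << bits:
        raise ValueError("s must be in (0, 2**bits]")
    mask = (1 << bits) - 1
    m = next_word() * s              # double-width product
    low = m & mask                   # low `bits` bits of the product
    if low < s:                      # only here can the draw be biased
        threshold = (1 << bits) % s  # same as (2**bits - s) % s
        while low < threshold:       # reject and redraw
            m = next_word() * s
            low = m & mask
    return m >> bits                 # high bits are the result


# Usage with a seeded generator as the word source.
rng = random.Random(42)
value = lemire_bounded(lambda: rng.getrandbits(32), 10)
print(0 <= value < 10)  # True
```

The multiplication and shift replace the usual modulo reduction, and the rejection loop removes the bias that a plain multiply-and-shift would have when `s` does not divide `2**bits`.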

## Mono Interpreter

The following sections present improvements and regressions introduced in the Mono Interpreter in Preview 2.

### Improvements

Here is a list of the top 20 microbenchmark improvements in Preview 2. The full report is available [here](https://pvscmdupload.blob.core.windows.net/parkertest/8.0prev1vsprev2/report_Daily_ca%3Dx64_cb%3Drefs-heads-release-8.0-preview2_co%3DUbuntu1804_cr%3Ddotnetruntime_cc%3DCompliationMode%3Dtiered-LLVM%3Dfalse-MonoAOT%3Dfalse-MonoInterpreter%3Dtrue-RunKind%3Dmicro_mono_Baseline_bb%3Drefs-heads-release-8.0-preview1_2023-03-20.html).

Name | Baseline Value | Compare Value	 | Difference	 | % Difference
-|-|-|-|-
System.Collections.Concurrent.Count\<Int32>.Dictionary(Size: 512) |  140.03 μs |  1.76 μs |  -138.26 μs |  99%
System.Collections.Concurrent.Count\<String>.Dictionary(Size: 512) |  136.03 μs |  1.86 μs |  -134.17 μs |  99%
System.Threading.Tests.Perf_Interlocked.CompareExchange_long |  37.56 ns |  6.66 ns |  -30.90 ns |  82%
System.Threading.Tests.Perf_Interlocked.CompareExchange_int |  34.18 ns |  8.33 ns |  -25.85 ns |  76%
System.Buffers.Tests.RentReturnArrayPoolTests\<Byte>.ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: True, UseSharedPool: False) |  3.81 μs |  1.09 μs |  -2.72 μs |  71%
System.Numerics.Tests.Perf_Vector4.ZeroBenchmark |  3.21 ns |  0.99 ns |  -2.22 ns |  69%
System.Buffers.Tests.RentReturnArrayPoolTests\<Object>.ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: True, UseSharedPool: False) |  3.42 μs |  1.06 μs |  -2.36 μs |  69%
System.Tests.Perf_Decimal.Floor |  175.25 ns |  65.77 ns |  -109.48 ns |  62%
System.Numerics.Tests.Perf_Quaternion.LengthBenchmark |  63.64 ns |  24.08 ns |  -39.56 ns |  62%
System.Numerics.Tests.Perf_Quaternion.InequalityOperatorBenchmark |  89.74 ns |  34.82 ns |  -54.93 ns |  61%
System.Buffers.Tests.RentReturnArrayPoolTests\<Byte>.ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: False, UseSharedPool: False) |  4.34 μs |  1.70 μs |  -2.64 μs |  61%
System.Tests.Perf_Decimal.Round |  191.52 ns |  75.77 ns |  -115.76 ns |  60%
System.Numerics.Tests.Perf_Quaternion.DotBenchmark |  77.60 ns |  31.33 ns |  -46.27 ns |  60%
System.Numerics.Tests.Perf_Quaternion.DivideBenchmark |  88.55 ns |  36.47 ns |  -52.07 ns |  59%
System.Tests.Perf_Random.Next_int_int_unseeded |  154.47 ns |  65.37 ns |  -89.11 ns |  58%
System.Numerics.Tests.Perf_Quaternion.IsIdentityBenchmark |  81.52 ns |  35.06 ns |  -46.46 ns |  57%
System.Numerics.Tests.Perf_Quaternion.SubtractionOperatorBenchmark |  83.75 ns |  36.09 ns |  -47.67 ns |  57%
System.Numerics.Tests.Perf_Quaternion.SubtractBenchmark |  84.49 ns |  36.50 ns |  -47.99 ns |  57%
System.Collections.CtorFromCollection\<Int32>.ConcurrentDictionary(Size: 512) |  461.77 μs |  200.10 μs |  -261.67 μs |  57%
System.Tests.Perf_UInt64.TryFormat(value: 0) |  250.12 ns |  109.72 ns |  -140.40 ns | 56%

The most improved benchmark groupings are `System.Collections`, `System.Numerics`, and `System.Decimal`, as outlined in https://github.com/dotnet/perf-autofiling-issues/issues/12504, https://github.com/dotnet/perf-autofiling-issues/issues/12544, https://github.com/dotnet/perf-autofiling-issues/issues/13303, https://github.com/dotnet/perf-autofiling-issues/issues/13247, https://github.com/dotnet/perf-autofiling-issues/issues/13752, https://github.com/dotnet/perf-autofiling-issues/issues/13761, and https://github.com/dotnet/perf-autofiling-issues/issues/12744. Three changes together improved over 1000 microbenchmarks: https://github.com/dotnet/runtime/pull/81335, which intrinsified `System.Numerics.*` types; https://github.com/dotnet/runtime/pull/82093, which intrinsified `CreateSpan`; and https://github.com/dotnet/runtime/pull/81782, which introduced common Vector128 SIMD operations widely used in the BCL.

The implementation of synch block fast paths (https://github.com/dotnet/runtime/pull/81380) caused a regression in the Mono AOT compiler, but led to an improvement of about 100 microbenchmarks in the Mono Interpreter, as detailed in https://github.com/dotnet/perf-autofiling-issues/issues/13245.

Similar to the change in the AOT compiler, the changes introduced in https://github.com/dotnet/runtime/pull/81306, which removed types deriving from `JsonTypeInfo<T>`, improved several microbenchmarks in the Mono Interpreter. Improving `ConcurrentDictionary` performance for strings in https://github.com/dotnet/runtime/pull/81557 led to the improvements reported in https://github.com/dotnet/perf-autofiling-issues/issues/13003. Also, code refactors led to several improvements presented in https://github.com/dotnet/perf-autofiling-issues/issues/12301.

### Regressions

Here is a list of the top 20 microbenchmark regressions in Preview 2. The full report is available [here](https://pvscmdupload.blob.core.windows.net/parkertest/8.0prev1vsprev2/report_Daily_ca%3Dx64_cb%3Drefs-heads-release-8.0-preview2_co%3DUbuntu1804_cr%3Ddotnetruntime_cc%3DCompliationMode%3Dtiered-LLVM%3Dfalse-MonoAOT%3Dfalse-MonoInterpreter%3Dtrue-RunKind%3Dmicro_mono_Baseline_bb%3Drefs-heads-release-8.0-preview1_2023-03-20.html).

Name | Baseline Value | Compare Value	 | Difference	 | % Difference
-|-|-|-|-
System.Numerics.Tests.Perf_VectorOf\<UInt64>.CountBenchmark |  0.06 ns |  3.10 ns |  3.04 ns |  -5,059%
System.Runtime.Intrinsics.Tests.Perf_Vector128Of\<Int16>.CountBenchmark |  0.36 ns |  1.75 ns |  1.39 ns |  -391%
System.Collections.TryAddDefaultSize\<String>.ConcurrentDictionary(Count: 512) |  297.96 μs |  574.34 μs |  276.38 μs |  -93%
System.Numerics.Tests.Perf_Vector2.UnitYBenchmark |  7.38 ns |  13.69 ns |  6.31 ns |  -85%
HardwareIntrinsics.RayTracer.SoA.Render |  2.41 ns |  4.38 ns |  1.97 ns |  -82%
System.Numerics.Tests.Perf_Vector2.TransformByMatrix3x2Benchmark |  48.06 ns |  86.28 ns |  38.22 ns |  -80%
System.IO.Compression.Brotli.Compress_WithoutState(level: Fastest, file: "TestDocument.pdf") |  291.36 μs |  522.83 μs |  231.47 μs |  -79%
System.IO.Compression.Brotli.Compress_WithState(level: Fastest, file: "TestDocument.pdf") |  296.93 μs |  525.99 μs |  229.06 μs |  -77%
System.Numerics.Tests.Perf_Vector2.TransformNormalByMatrix3x2Benchmark |  44.65 ns |  75.61 ns |  30.96 ns |  -69%
System.Memory.Constructors_ValueTypesOnly\<Byte>.ReadOnlyFromPointerLength |  6.33 ns |  10.49 ns |  4.16 ns |  -66%
PerfLabTests.EnumPerf.ObjectGetTypeNoBoxing |  3.87 ns |  6.20 ns |  2.32 ns |  -60%
System.Numerics.Tests.Perf_Vector3.SquareRootBenchmark |  23.34 ns |  37.02 ns |  13.68 ns |  -59%
System.Numerics.Tests.Perf_Vector3.TransformNormalByMatrix4x4Benchmark |  124.53 ns |  196.66 ns |  72.12 ns |  -58%
System.Diagnostics.Perf_Process.StartAndWaitForExit |  871.51 μs |  1.35 ms |  474.57 μs |  -54%
System.Numerics.Tests.Perf_Vector3.TransformByMatrix4x4Benchmark |  144.68 ns |  217.99 ns |  73.31 ns |  -51%
System.Collections.AddGivenSize\<String>.List(Size: 512) |  12.21 μs |  18.32 μs |  6.11 μs |  -50%
System.IO.Tests.BinaryWriterExtendedTests.WriteAsciiCharArray(StringLengthInChars: 2000000) |  8.14 ms |  12.20 ms |  4.06 ms |  -50%
System.Numerics.Tests.Perf_VectorOf\<Int32>.ZeroBenchmark |  3.20 ns |  4.80 ns |  1.59 ns |  -50%
System.Buffers.Tests.RentReturnArrayPoolTests\<Byte>.ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: True, UseSharedPool: True) |  5.73 μs |  8.56 μs |  2.83 μs |  -49%
System.Buffers.Tests.RentReturnArrayPoolTests\<Object>.ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: True, UseSharedPool: True) |  5.62 μs |  8.37 μs |  2.75 μs |  -49%

Here is a list of ongoing regressions in the Preview 2 snapshot, each with a short description.

| Issue report    | Description                                |
| --------------- | ------------------------------------------ |
| https://github.com/dotnet/perf-autofiling-issues/issues/12707 | Use of unimplemented `Vector` operations |
| https://github.com/dotnet/perf-autofiling-issues/issues/13747 | Intrinsified common `Vector128` operations |

---

# Preview 1

This section presents an overview of the major performance improvements and regressions in the Mono Interpreter in .NET 8 Preview 1.

## Improvements

Here is a list of the top 20 microbenchmark improvements in Preview 1.

Name | Baseline Value | Compare Value	 | Difference	 | % Difference
-|-|-|-|-
System.Numerics.Tests.Perf_VectorOf\<Byte>.LessThanAnyBenchmark  | 292.17 ns	 | 18.88 ns	 | -273.29 ns	 | 94%
System.Numerics.Tests.Perf_VectorOf\<Byte>.LessThanOrEqualAnyBenchmark | 	298.08 ns	 | 20.47 ns	 | -277.61 ns	 | 93%
System.Numerics.Tests.Perf_VectorOf\<SByte>.LessThanOrEqualAnyBenchmark | 	294.38 ns | 	20.33 ns	 | -274.05 ns	 | 93%
System.Numerics.Tests.Perf_VectorOf\<SByte>.LessThanAnyBenchmark | 	298.45 ns | 	20.63 ns	 | -277.82 ns	 | 93%
System.Numerics.Tests.Perf_VectorOf\<Byte>.GreaterThanOrEqualAllBenchmark | 	331.73 ns | 	24.25 ns	 | -307.48 ns	 | 93%
System.Numerics.Tests.Perf_VectorOf\<UInt16>.GreaterThanOrEqualAllBenchmark | 	218.05 ns | 	20.58 ns	 | -197.47 ns	 | 91%
System.Numerics.Tests.Perf_VectorOf\<Int16>.GreaterThanAllBenchmark | 	209.57 ns | 	20.48 ns	 | -189.08 ns	 | 90%
System.Numerics.Tests.Perf_VectorOf\<Int16>.GreaterThanOrEqualAllBenchmark | 	231.47 ns | 	23.03 ns	 | -208.44 ns	 | 90%
System.Numerics.Tests.Perf_VectorOf\<Int16>.LessThanOrEqualAnyBenchmark | 	188.87 ns | 	20.02 ns	 | -168.84 ns	 | 89%
System.Numerics.Tests.Perf_VectorOf\<Int16>.LessThanAnyBenchmark | 	186.21 ns | 	20.05 ns	 | -166.16 ns	 | 89%
System.Numerics.Tests.Perf_VectorOf\<UInt16>.LessThanOrEqualAnyBenchmark | 	189.87 ns | 	20.76 ns	 | -169.11 ns	 | 89%
System.Numerics.Tests.Perf_VectorOf\<UInt16>.LessThanAnyBenchmark | 	186.54 ns | 	21.38 ns	 | -165.15 ns	 | 89%
System.Memory.Span\<Byte>.IndexOfAnyFourValues(Size: 512) | 	11.82 μs | 	1.60 μs	 | -10.23 μs	 | 87%
System.Memory.Span\<Byte>.IndexOfAnyFiveValues(Size: 512) | 	14.32 μs | 	2.42 μs	 | -11.90 μs	 | 83%
System.Numerics.Tests.Perf_VectorOf\<Int32>.GreaterThanAllBenchmark | 	120.71 ns | 	20.59 ns	 | -100.11 ns	 | 83%
System.Numerics.Tests.Perf_VectorOf\<UInt32>.GreaterThanAllBenchmark	 | 124.72 ns | 	21.39 ns	 | -103.32 ns | 	83%
System.Numerics.Tests.Perf_VectorOf\<Single>.GreaterThanOrEqualAllBenchmark | 	136.11 ns | 	24.20 ns	 | -111.91 ns	 | 82%
System.Numerics.Tests.Perf_VectorOf\<Single>.GreaterThanAllBenchmark	 | 128.50 ns | 	24.30 ns	 | -104.20 ns	 | 81%
System.Numerics.Tests.Perf_VectorOf\<UInt64>.GreaterThanAllBenchmark | 	105.81 ns | 	20.48 ns	 | -85.33 ns	 | 81%
System.Numerics.Tests.Perf_VectorOf\<Int64>.GreaterThanAllBenchmark | 	105.16 ns | 	20.57 ns	 | -84.60 ns	 | 80%

Several improvements introduced in Preview 1 are worth calling out individually. The following section presents only the major improvements, with high-level analysis.
The analysis should be treated with caution, and readers are encouraged to examine the benchmark reports for a thorough analysis.

The most improved benchmark groupings are `System.Runtime.Vectors`, `System.Runtime.Intrinsics`, and `System.Collections`, as outlined [here](https://pvscmdupload.blob.core.windows.net/parkertest/02_15_2023/report_Daily_ca=x64_cb=refs-heads-release-8.0-preview1_co=Ubuntu1804_cr=dotnetruntime_cc=CompliationMode=wasm-RunKind=micro_Baseline_bb=refs-heads-release-7.0_2023-02-15.html.) and in https://github.com/dotnet/perf-autofiling-issues/issues/10468.
Adding a [`stobj.vt.noref` version for the no-reference case](https://github.com/dotnet/runtime/pull/79165), which is twice as fast as `stobj.v`, improved over 400 microbenchmarks, as outlined in https://github.com/dotnet/perf-autofiling-issues/issues/10468 and https://github.com/dotnet/perf-autofiling-issues/issues/10464.

SpanHelpers are widely used in the BCL, so improvements to them can significantly improve performance. Changes in https://github.com/dotnet/runtime/commit/200a90aae4905567e79aafe49380a899a8f2b0c7, https://github.com/dotnet/runtime/commit/7fa0d5b5942f138c362f2753398b4a8d4f71eb73, and https://github.com/dotnet/runtime/commit/c0447bcbbaf4510b5788ccc2e75504900e5261f1 removed Mono-specific SpanHelpers, replaced branch patterns with super-instructions, and improved detection of dead bblocks. Over 300 microbenchmarks improved, as outlined in https://github.com/dotnet/perf-autofiling-issues/issues/10989 and https://github.com/dotnet/perf-autofiling-issues/issues/11155.
Change https://github.com/dotnet/runtime/pull/77331 simplified the `getitem.span` opcode and avoided the typical use of `ldloca` with it, which improved over 50 microbenchmarks.
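
As a general illustration of what replacing a branch pattern with a super-instruction means, here is a toy peephole pass in Python. It is only a sketch: the opcode names (`clt`, `brtrue`, `blt`, ...), the tuple encoding, and the `fuse_compare_branch` helper are hypothetical and do not reflect the Mono interpreter's actual IR or the code in the commits above.

```python
# Toy peephole pass: fuse "compare" + "conditional branch" into one
# super-instruction. All opcode names are made up for this sketch.
FUSIBLE = {
    ("ceq", "brtrue"): "beq",
    ("clt", "brtrue"): "blt",
    ("cgt", "brtrue"): "bgt",
}


def fuse_compare_branch(code):
    """Return a new instruction list with adjacent compare+branch pairs fused.

    Instructions are tuples of the form (opcode, *operands). Branch targets
    are symbolic labels, so dropping an instruction does not invalidate them.
    A ("label", name) pseudo-instruction between the compare and the branch
    breaks adjacency, so a pair that is a jump target is never fused.
    """
    out = []
    i = 0
    while i < len(code):
        cur = code[i]
        nxt = code[i + 1] if i + 1 < len(code) else None
        fused = FUSIBLE.get((cur[0], nxt[0])) if nxt is not None else None
        if fused is not None:
            out.append((fused, *nxt[1:]))  # keep the branch target label
            i += 2
        else:
            out.append(cur)
            i += 1
    return out


program = [
    ("ldloc", 0),
    ("ldloc", 1),
    ("clt",),
    ("brtrue", "loop"),
    ("ret",),
]
print(fuse_compare_branch(program))
# [('ldloc', 0), ('ldloc', 1), ('blt', 'loop'), ('ret',)]
```

The fused form needs one opcode dispatch where the original needed two, which is the usual motivation for super-instructions in an interpreter.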

Allowing vtypes with a single scalar field to be passed to native code through the faster code path improved the `System.Text` and `System.Collections` benchmark groupings, as outlined in https://github.com/dotnet/perf-autofiling-issues/issues/10987 and https://github.com/dotnet/perf-autofiling-issues/issues/10938. The assumption is that those libraries rely on [ObjectHandleOnStack types](https://github.com/dotnet/runtime/pull/79686).

The `newstr` intrinsic for string allocation, added in https://github.com/dotnet/runtime/pull/79392, improved various microbenchmarks, as outlined in https://github.com/dotnet/perf-autofiling-issues/issues/10694 and https://github.com/dotnet/perf-autofiling-issues/issues/10670.

https://github.com/dotnet/runtime/commit/9a651097fddbae42fd2aa68faee9b990b160d596 contributed to https://github.com/dotnet/perf-autofiling-issues/issues/10695 and https://github.com/dotnet/perf-autofiling-issues/issues/10671.

All of the changes mentioned above are speed improvements in microbenchmarks. There was also a significant size improvement in WebAssembly: https://github.com/dotnet/runtime/pull/79672 reduced the size on disk (SOD) of the Blazor template application by ~270 KB by trimming the `S.N.Vector` class in non-SIMD cases. [Deduplication of symbols](https://github.com/dotnet/runtime/pull/80260) in WebAssembly achieves additional size savings.

## Regressions

Here is a list of the top 20 microbenchmark regressions in Preview 1.

Name | Baseline Value | Compare Value	 | Difference	 | % Difference
-|-|-|-|-
System.Numerics.Tests.Perf_VectorOf\<Byte>.CountBenchmark | 	0.10 ns | 	1.10 ns | 1.00 ns | 	-969%
System.Tests.Perf_String.Replace_Char(text: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab2wei1kxfbvsbpzwhanjczcqa2psra3aacxb67qnwbnfp2tok6v0a58lzfdql | 	11.63 μs	 | 101.96 μs | 	90.33 μs | 	-777%
System.Tests.Perf_String.Replace_Char(text: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab2wei1kxfbvsbpzwhanjczcqa2psra3aacxb67qnwbnfp2tok6v0a58l", ol | 	1.30 μs	 | 8.82 μs	 | 7.52 μs	 | -578%
System.Tests.Perf_Byte.ToString(value: 255)	 | 38.31 ns | 	257.96 ns | 	219.65 ns | 	-573%
System.Tests.Perf_String.Replace_String(text: "This is a very nice sentence. This is another very nice sentence.", oldValue: "a", newValue: "b")	 | 962.59 ns | 	6.30 μs | 	5335.40 ns | 	-554%
PerfLabTests.LowLevelPerf.IntegerFormatting	 | 6.08 ms | 	34.30 ms	 | 28.21 ms | 	-464%
System.Tests.Perf_Int32.ToString(value: 2147483647) | 	59.17 ns	 | 332.19 ns | 	273.01 ns | 	-461%
System.Tests.Perf_Int16.ToString(value: 32767) | 	53.24 ns | 	297.84 ns | 	244.60 ns | -459%
System.Tests.Perf_Int32.ToString(value: 12345) | 	52.90 ns | 	293.56 ns | 	240.66 ns | 	-455%
System.Tests.Perf_String.Replace_Char(text: "This is a very nice sentence", oldChar: 'i', newChar: 'I') | 	531.46 ns	 | 2.89 μs	 | 2355.30 ns	 | -443%
System.Tests.Perf_SByte.ToString(value: 127) | 	52.62 ns	 | 276.41 ns | 	223.79 ns | 	-425%
System.Numerics.Tests.Perf_Vector2.TransformNormalByMatrix4x4Benchmark | 	21.70 ns | 	108.97 ns	 | 87.28 ns	 | -402%
System.Numerics.Tests.Perf_Vector2.TransformByMatrix4x4Benchmark	 | 26.37 ns | 	114.02 ns | 	87.65 ns | 	-332%
System.Numerics.Tests.Perf_Matrix4x4.MultiplyByMatrixOperatorBenchmark	 | 246.08 ns | 	1.04 μs | 	797.11 ns | 	-324%
System.Numerics.Tests.Perf_Matrix4x4.MultiplyByMatrixBenchmark	 | 243.24 ns	 | 1.02 μs | 	779.98 ns | 	-321%
System.Tests.Perf_Byte.ToString(value: 0) | 	7.06 ns | 	27.18 ns | 	20.11 ns | 	-285%
System.Numerics.Tests.Perf_Matrix4x4.CreateTranslationFromScalarXYZ | 	25.27 ns | 	91.61 ns | 	66.34 ns | 	-263%
System.Numerics.Tests.Perf_Matrix4x4.AddBenchmark	 | 90.93 ns | 	304.20 ns | 	213.27 ns | 	-235%
System.Numerics.Tests.Perf_Matrix4x4.LerpBenchmark | 	141.51 ns | 	443.45 ns | 	301.94 ns | 	-213%
System.Numerics.Tests.Perf_Matrix4x4.SubtractOperatorBenchmark | 	100.31 ns | 	307.60 ns | 	207.29 ns	 | -207%
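
The `% Difference` column in the Preview 1 tables appears to follow `(baseline - compare) / baseline`, so improvements are positive and regressions are negative. The Python sketch below works under that assumption and also normalizes the mixed `ns`/`μs`/`ms` units. The helper names `to_ns` and `pct_difference` are hypothetical and are not part of the benchmark tooling.

```python
# Units as they appear in the tables, expressed in nanoseconds.
# Both the Greek mu (U+03BC) and the micro sign (U+00B5) are accepted.
UNIT_TO_NS = {"ns": 1.0, "\u03bcs": 1e3, "\u00b5s": 1e3, "ms": 1e6}


def to_ns(cell: str) -> float:
    """Convert a table cell such as '6.30 μs' to nanoseconds."""
    value, unit = cell.split()
    return float(value) * UNIT_TO_NS[unit]


def pct_difference(baseline: str, compare: str) -> int:
    """Signed percentage: positive means faster than baseline, negative slower."""
    b, c = to_ns(baseline), to_ns(compare)
    return round(100.0 * (b - c) / b)


print(pct_difference("292.17 ns", "18.88 ns"))  # 94
print(pct_difference("962.59 ns", "6.30 μs"))   # -554
```

Because the table cells are already rounded, recomputing the percentage for very small values (for example the `CountBenchmark` row at 0.10 ns) can differ noticeably from the reported figure.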

Here is a list of ongoing regressions in the Preview 1 snapshot, each with a short description.

| Issue report    | Description                                |
| --------------- | ------------------------------------------ |
| https://github.com/dotnet/perf-autofiling-issues/issues/12299 | Extracted code outside of interp main loop |
| https://github.com/dotnet/perf-autofiling-issues/issues/11449 | Investigating |
| https://github.com/dotnet/perf-autofiling-issues/issues/11453 | Redundant `ldloca` and `stfld` opcodes in the new `Matrix4x4` implementation |
| https://github.com/dotnet/perf-autofiling-issues/issues/11147 | New ASCII APIs |
| https://github.com/dotnet/runtime/issues/79973 | Dependencies update |
| https://github.com/dotnet/runtime/issues/79336 | Managed implementation of `UInt32ToDecStr` |
| https://github.com/dotnet/runtime/issues/79876 | Unoptimized pattern `ldstr; if (uncommon) throw ex (string)` |




Name	Baseline Value	Compare Value	% Difference
PerfLabTests.EnumPerf.EnumEquals	646.25	229.29	-64.52
System.Tests.Perf_Enum.ToString_NonFlags_Small(value: TopDirectoryOnly)	633.28	235.90	-62.74
"System.Tests.Perf_Enum.ToString_Format_Flags_Large(value: All	format: ""g"")"	667.24	271.04
System.Reflection.Attributes.IsDefinedClassHitInherit	1315.59	562.93	-57.21
System.Reflection.Activator<EmptyStruct>.CreateInstanceGeneric	721.39	330.82	-54.14
System.Numerics.Tests.Perf_Vector4.SubtractOperatorBenchmark	20.82	9.59	-53.92
System.Reflection.Invoke.Method0_NoParms	853.86	399.59	-53.20
System.Numerics.Tests.Perf_Matrix4x4.CreateRotationZBenchmark	78.54	40.02	-49.03
System.Reflection.Attributes.IsDefinedMethodBaseMissInherit	2512.81	1431.26	-43.04
System.Numerics.Tests.Perf_Matrix4x4.MultiplyByScalarBenchmark	183.31	106.83	-41.71
System.Tests.Perf_Enum.InterpolateIntoStringBuilder_Flags(value: 32)	7501.15	4383.76	-41.55
System.Numerics.Tests.Perf_Vector3.TransformNormalByMatrix4x4Benchmark	189.92	111.79	-41.13
"System.IO.Tests.Perf_RandomAccess.ReadScatter(fileSize: 1048576	buffersSize: 16384	options: None)"	400115.22
System.Numerics.Tests.Perf_Matrix4x4.CreateRotationXWithCenterBenchmark	90.04	60.34	-32.98
"System.Globalization.Tests.StringSearch.IsSuffix_DifferentLastChar(Options: (en-US	IgnoreCase	True))"	1024.28
"System.Tests.Perf_Enum.StringFormat(value: Red	Green)"	7002.80	4942.10
"System.Tests.Perf_Enum.ToString_Flags(value: Red	Orange	Yellow	Green
System.Numerics.Tests.Perf_VectorOf<Byte>.AddBenchmark	11.28	8.19	-27.44
System.Numerics.Tests.Perf_Vector4.DivideByScalarBenchmark	30.25	21.97	-27.36
System.Numerics.Tests.Perf_Vector2.EqualsBenchmark	35.85	27.68	-22.78

Name	Baseline Value	Compare Value	% Difference
System.Collections.CtorFromCollection<String>.FrozenDictionary(Size: 512)	44266.49	396363.53	795.40
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.EqualsAllBenchmark	6.90	9.58	38.82
"Microsoft.Extensions.DependencyInjection.TimeToFirstService.Scoped(Mode: ""Expressions"")"	49567.25	65031.35	31.19
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.BitwiseOrOperatorBenchmark	9.62	12.45	29.41
System.Numerics.Tests.Perf_VectorOf<SByte>.OnesComplementOperatorBenchmark	6.04	7.80	29.23
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.AllBitsSetBenchmark	2.04	2.61	28.32
System.Tests.Perf_GC<Byte>.NewOperator_Array(length: 10000)	4495.94	5733.46	27.52
System.Memory.Span<Char>.SequenceEqual(Size: 33)	85.83	108.56	26.49
System.Numerics.Tests.Perf_VectorOf<Single>.AddOperatorBenchmark	7.67	9.58	24.98
"Microsoft.Extensions.DependencyInjection.TimeToFirstService.Scoped(Mode: ""ILEmit"")"	49928.88	62377.01	24.93
System.Memory.Constructors<String>.SpanFromArray	15.59	19.40	24.46
Microsoft.Extensions.DependencyInjection.ScopeValidation.TransientWithScopeValidation	1815.08	2227.85	22.74
System.Numerics.Tests.Perf_VectorOf<Int64>.EqualityOperatorBenchmark	6.56	7.77	18.48
System.IO.Tests.Perf_File.CopyToOverwrite(size: 4096)	47118.52	55507.12	17.80
"System.Tests.Perf_Decimal.TryParse(value: ""123456.789"")"	895.48	1023.98	14.34
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.AllBitsSetBenchmark	1.48	1.69	14.11
System.Numerics.Tests.Perf_VectorOf<UInt16>.AndNotBenchmark	9.16	10.44	13.96
System.Memory.Span<Byte>.IndexOfValue(Size: 33)	58.20	65.95	13.31
System.Runtime.Intrinsics.Tests.Perf_Vector128Int.BitwiseOrOperatorBenchmark	7.62	8.61	12.96
"System.Tests.Perf_Int32.ParseSpan(value: ""2147483647"")"	206.91	233.69	12.94

Name	Baseline Value	Compare Value	% Difference
System.Numerics.Tests.Perf_Quaternion.LengthBenchmark	0.38	0.00	-100
System.Numerics.Tests.Perf_Quaternion.NegationOperatorBenchmark	1.87	0.00	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Int.CountBenchmark	0.34	0.00	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.CountBenchmark	0.22	0.00	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.InequalityOperatorBenchmark	0.97	0.00	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.CountBenchmark	0.29	0.00	-100
System.Tests.Perf_Enum.HasFlag	1.35	0.00	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.EqualityOperatorBenchmark	2.28	0.01	<
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.CountBenchmark	0.22	0.00	-99.57
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.GreaterThanAllBenchmark	2.50	0.02	-99.35
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.UnaryNegateOperatorBenchmark	85.94	2.58	-97.00
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.UnaryNegateOperatorBenchmark	85.93	4.27	-95.02
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.UnaryNegateOperatorBenchmark	85.94	4.30	-94.99
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.UnaryNegateOperatorBenchmark	85.93	4.35	-94.94
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.LessThanOrEqualBenchmark	2.91	0.26	-91.04
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.EqualityOperatorBenchmark	2.26	0.25	-88.80
System.Numerics.Tests.Perf_Vector3.UnitZBenchmark	3.84	0.54	-85.93
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.BitwiseAndBenchmark	4.07	0.69	-83.07
System.Runtime.Intrinsics.Tests.Perf_Vector128.FloorFloatBenchmark	20.82	3.59	-82.73
System.Net.Primitives.Tests.IPAddressPerformanceTests.TryWriteBytes(address: 1020:3040:5060:7080:9010:1112:1314:1516)	78.86	13.78	-82.52

Name	Baseline Value	Compare Value	% Difference
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.CountBenchmark	0.00	0.14	26004.19
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.CountBenchmark	0.00	0.07	12106.45
System.Numerics.Tests.Perf_VectorOf<Double>.CountBenchmark	0.09	3.36	3767.73
System.Numerics.Tests.Perf_VectorOf<Single>.CountBenchmark	0.00	0.06	2106.86
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.AllBitsSetBenchmark	1.95	10.77	452.08
System.Numerics.Tests.Perf_VectorOf<Single>.CountBenchmark	0.00	0.01	405.57
System.Numerics.Tests.Perf_VectorOf<UInt16>.MaxBenchmark	0.75	3.50	365.24
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.DotBenchmark	0.87	3.58	312.42
System.Runtime.Intrinsics.Tests.Perf_Vector128Int.GreaterThanOrEqualBenchmark	0.92	3.67	300.46
System.Runtime.Intrinsics.Tests.Perf_Vector128Int.GreaterThanOrEqualBenchmark	0.92	3.55	286.90
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.DotBenchmark	0.78	2.61	236.42
System.Numerics.Tests.Perf_VectorOf<SByte>.OnesComplementOperatorBenchmark	0.75	2.51	236.33
System.Numerics.Tests.Perf_VectorOf<SByte>.BitwiseOrBenchmark	2.62	8.52	225.70
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.ZeroBenchmark	2.00	5.96	198.55
System.Numerics.Tests.Perf_VectorOf<Int64>.ZeroBenchmark	1.98	5.88	196.21
System.Numerics.Tests.Perf_VectorOf<UInt16>.MultiplyBenchmark	3.10	9.12	194.26
System.Runtime.Intrinsics.Tests.Perf_Vector128Int.EqualsBenchmark	0.98	2.75	180.71
System.Runtime.Intrinsics.Tests.Perf_Vector128Int.EqualsBenchmark	0.98	2.69	174.16
System.Numerics.Tests.Perf_VectorOf<SByte>.UnaryNegateOperatorBenchmark	1.08	2.80	159.06
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.MinBenchmark	2.70	6.92	156.32

Name	Baseline Value	Compare Value	% Difference
System.Numerics.Tests.Perf_VectorOf<Int64>.CountBenchmark	0.01	0.23	2775.54
System.Numerics.Tests.Perf_VectorOf<UInt64>.CountBenchmark	0.01	0.17	2177.17
System.Numerics.Tests.Perf_VectorOf<UInt16>.ZeroBenchmark	2.24	4.95	121.29
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.EqualityOperatorBenchmark	7.65	16.63	117.46
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.OnesComplementOperatorBenchmark	3.03	6.11	101.75
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.CountBenchmark	0.04	0.08	86.25
System.Numerics.Tests.Perf_VectorOf<UInt64>.GreaterThanAllBenchmark	18.37	33.12	80.26
"System.Net.Http.Tests.SocketsHttpHandlerPerfTest.Get_EnumerateHeaders_Validated(ssl: True, chunkedResponse: False, responseLength: 100000)"	2230622.93	3965252.94	77.76
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.CountBenchmark	0.12	0.20	69.81
"System.Net.Http.Tests.SocketsHttpHandlerPerfTest.Get(ssl: True, chunkedResponse: False, responseLength: 100000)"	2181340.94	3635706.61	66.67
System.Numerics.Tests.Perf_VectorOf<Byte>.LessThanOrEqualAnyBenchmark	18.27	30.07	64.56
System.Numerics.Tests.Perf_Vector4.ZeroBenchmark	1.36	2.10	55.23
HardwareIntrinsics.RayTracer.SoA.Render	1.15	1.76	52.81
System.Numerics.Tests.Perf_Vector2.DivideByScalarBenchmark	13.77	20.17	46.46
"System.Net.Http.Tests.SocketsHttpHandlerPerfTest.Get(ssl: True, chunkedResponse: True, responseLength: 100000)"	`2621801`.93	`3807493`.79	45.22
System.Runtime.Intrinsics.Tests.Perf_Vector128.ConvertDoubleToLongBenchmark	64.48	89.74	39.17
System.Linq.Tests.Perf_Enumerable.WhereSingleOrDefault_LastElementMatches(input: Array)	2714.67	3708.23	36.59
System.Memory.Constructors_ValueTypesOnly<Byte>.SpanFromPointerLength	6.95	9.47	36.28
Span.IndexerBench.CoveredIndex3(length: 1024)	16595.22	22106.92	33.21
"System.Buffers.Tests.RentReturnArrayPoolTests<Object>.ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: True, UseSharedPool: False)"	867.68	1154.02	33.00

Name	Baseline Value	Compare Value	% Difference
System.Numerics.Tests.Perf_VectorOf<Single>.CountBenchmark	0.18	0.00	-100
System.Numerics.Tests.Perf_VectorOf<UInt16>.CountBenchmark	0.10	0.00	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Float.CountBenchmark	0.01	0.00	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.CountBenchmark	0.03	0.00	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.CountBenchmark	1.12	0.00	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt16>.CountBenchmark	0.22	0.00	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt64>.CountBenchmark	0.08	0.00	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Int.CountBenchmark	0.48	0.00	-99.74
System.Numerics.Tests.Perf_VectorOf<UInt32>.CountBenchmark	0.14	0.00	-99.30
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.CountBenchmark	2.36	0.12	-95.07
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.DivideBenchmark	127.11	7.82	-93.85
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.MultiplyOperatorBenchmark	123.89	7.68	-93.80
System.Runtime.Intrinsics.Tests.Perf_Vector128Float.MultiplyBenchmark	126.45	7.94	-93.71
System.Runtime.Intrinsics.Tests.Perf_Vector128Float.MultiplyOperatorBenchmark	125.08	7.87	-93.70
System.Runtime.Intrinsics.Tests.Perf_Vector128Float.DivisionOperatorBenchmark	123.79	7.83	-93.67
System.Runtime.Intrinsics.Tests.Perf_Vector128Float.DivideBenchmark	126.19	8.05	-93.62
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.MultiplyBenchmark	127.05	8.23	-93.52
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.DivisionOperatorBenchmark	123.95	8.22	-93.37
System.Numerics.Tests.Perf_VectorOf<UInt64>.CountBenchmark	0.06	0.01	-86.49
System.Collections.Tests.Perf_Dictionary.ContainsValue(Items: 3000)	483385521.57	66414495.75	-86.26

Name	Baseline Value	Compare Value	% Difference
System.Numerics.Tests.Perf_Vector2.ZeroBenchmark	0.03	1.05	3076.49
System.Numerics.Tests.Perf_VectorOf<Double>.ZeroBenchmark	2.96	9.10	207.86
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.BitwiseOrOperatorBenchmark	8.51	21.64	154.37
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<UInt32>.GreaterThanOrEqualAnyBenchmark	24.29	47.23	94.44
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<SByte>.InequalityOperatorBenchmark	3.94	7.15	81.24
System.Numerics.Tests.Perf_Plane.CreateFromVerticesBenchmark	76.92	132.40	72.12
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.ConditionalSelectBenchmark	11.14	17.45	56.64
System.Buffers.Tests.RentReturnArrayPoolTests<Byte>.ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: False, UseSharedPool: False)	1877.78	2918.99	55.44
System.Diagnostics.Perf_Process.StartAndWaitForExit	1286337.51	1968645.19	53.04
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Byte>.LessThanAllBenchmark	24.23	36.78	51.79
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.ZeroBenchmark	2.99	4.47	49.41
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int32>.SubtractionOperatorBenchmark	7.62	11.13	45.99
System.Memory.Span<Char>.Reverse(Size: 512)	789.89	1116.00	41.28
System.Buffers.Tests.RentReturnArrayPoolTests<Object>.ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: False, UseSharedPool: False)	1963.38	2745.38	39.82
System.Numerics.Tests.Perf_VectorOf<Single>.LessThanAllBenchmark	59.72	82.75	38.57
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.EqualityOperatorBenchmark	27.40	37.64	37.35
System.Globalization.Tests.StringSearch.IndexOf_Word_NotFound(Options: (, None, False))	6382.39	8678.93	35.98
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int64>.OnesComplementBenchmark	6.38	8.61	34.98
System.Numerics.Tests.Perf_VectorOf<Int64>.ZeroBenchmark	2.81	3.78	34.72
System.Runtime.Intrinsics.Tests.Perf_Vector128Float.LessThanOrEqualAllBenchmark	26.61	35.79	34.51

Name	Baseline Value	Compare Value	% Difference
System.Tests.Perf_String.Substring_IntInt(s: "dzsdzsDDZSDZSDZSddsz", i1: 7, i2: 4)	23.92	42.38	77.13
System.Buffers.Text.Tests.Utf8FormatterTests.FormatterUInt64(value: 0)	14.05	23.66	68.37
System.Buffers.Text.Tests.Utf8FormatterTests.FormatterInt32(value: 4)	13.98	22.92	64.00
Benchstone.BenchI.IniArray.Test	186909527.87	304502098.85	62.91
Span.IndexerBench.Ref(length: 1024)	686.54	1110.42	61.74
System.Tests.Perf_Int64.TryParse(value: "9223372036854775807")	58.15	93.40	60.60
System.Runtime.Intrinsics.Tests.Perf_Vector128Int.DivideBenchmark	23.30	37.16	59.44
System.Tests.Perf_Int64.TryParse(value: "-9223372036854775808")	59.06	93.58	58.45
System.Tests.Perf_Int64.TryParseSpan(value: "9223372036854775807")	59.71	93.89	57.26
System.Buffers.Binary.Tests.BinaryReadAndWriteTests.MeasureReverseUsingNtoH	1432.42	2191.50	52.99
System.Tests.Perf_Int64.TryParseSpan(value: "-9223372036854775808")	61.80	94.18	52.39
System.Threading.Tests.Perf_Volatile.Write_double	0.23	0.35	52.13
System.Numerics.Tests.Perf_VectorOf<Int32>.EqualsBenchmark	0.81	1.23	50.47
System.Tests.Perf_String.Trim(s: "Test ")	76.12	113.79	49.48
System.Tests.Perf_UInt16.Parse(value: "12345")	35.63	52.72	47.98
System.Tests.Perf_Int64.Parse(value: "-9223372036854775808")	62.30	91.72	47.22
System.Tests.Perf_UInt64.Parse(value: "18446744073709551615")	70.51	103.27	46.44
System.Tests.Perf_Int64.Parse(value: "9223372036854775807")	61.62	90.17	46.34
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Double>.SumBenchmark	2.76	3.99	44.34
System.Collections.Tests.Perf_BitArray.BitArrayGet(Size: 512)	8039.61	11602.79	44.32

Name	Baseline Value	Compare Value	% Difference
System.Numerics.Tests.Perf_VectorOf<Byte>.ZeroBenchmark	2.65	5.66	113.78
System.Numerics.Tests.Perf_BitOperations.Log2_uint	791.53	1539.09	94.44
System.Collections.Tests.Add_Remove_SteadyState<Int32>.Queue(Count: 512)	9.64	18.37	90.64
System.Text.Json.Reader.Tests.Perf_Base64.ReadBase64EncodedByteArray_HeavyEscaping(NumberOfBytes: 1000)	2769.97	5142.05	85.63
System.Text.Json.Reader.Tests.Perf_Base64.ReadBase64EncodedByteArray_NoEscaping(NumberOfBytes: 1000)	2771.03	5139.62	85.47
System.Text.Json.Reader.Tests.Perf_Base64.ReadBase64EncodedByteArray_HeavyEscaping(NumberOfBytes: 100)	377.30	646.53	71.35
System.Numerics.Tests.Perf_BitOperations.PopCount_uint	668.42	1104.04	65.17
System.Text.Json.Reader.Tests.Perf_Base64.ReadBase64EncodedByteArray_NoEscaping(NumberOfBytes: 100)	377.61	598.53	58.50
System.Threading.Tests.Perf_Volatile.Read_double	0.16	0.26	57.96
System.Memory.Span<Char>.Reverse(Size: 512)	258.69	407.47	57.51
PerfLabTests.LowLevelPerf.StructWithInterfaceInterfaceMethod	154024.04	239168.34	55.27
System.Text.Json.Tests.Perf_Segment.ReadSingleSegmentSequenceByN(numberOfBytes: 8192, TestCase: Json4KB)	13635.35	20935.97	53.54
System.Text.Json.Tests.Perf_Reader.ReadSpanEmptyLoop(IsDataCompact: True, TestCase: Json4KB)	10415.86	15732.85	51.04
System.Text.Json.Tests.Perf_Reader.ReadSingleSpanSequenceEmptyLoop(IsDataCompact: True, TestCase: Json4KB)	10436.16	15712.23	50.55
System.Numerics.Tests.Perf_VectorOf<Int32>.EqualityOperatorBenchmark	0.24	0.36	50.01
System.Collections.IndexerSetReverse.Array(Size: 512)	456.86	681.13	49.08
System.Collections.IndexerSet<Int32>.Span(Size: 512)	458.27	682.26	48.87
System.Numerics.Tests.Perf_VectorOf<Int64>.EqualityOperatorBenchmark	0.27	0.40	48.57
System.Numerics.Tests.Perf_BitOperations.PopCount_ulong	745.13	1102.84	48.00
System.Text.Json.Tests.Perf_Reader.ReadReturnBytes(IsDataCompact: False, TestCase: Json40KB)	158074.36	231420.75	46.39

Name	Baseline Value	Compare Value	% Difference
System.Numerics.Tests.Perf_VectorOf<Double>.CountBenchmark	0.00	0.00	-100
System.Numerics.Tests.Perf_VectorOf<Int32>.CountBenchmark	0.02	0.00	-100
System.Numerics.Tests.Perf_VectorOf<UInt32>.CountBenchmark	0.00	0.00	-100
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Single>.CountBenchmark	0.40	0.00	-100
System.Numerics.Tests.Perf_VectorOf<SByte>.OneBenchmark	76.06	1.57	-97.93
System.Numerics.Tests.Perf_VectorOf<Byte>.OneBenchmark	76.01	1.87	-97.53
System.Numerics.Tests.Perf_VectorOf<SByte>.NegateBenchmark	221.32	6.26	-97.16
System.Numerics.Tests.Perf_VectorOf<SByte>.UnaryNegateOperatorBenchmark	221.61	6.27	-97.16
System.Numerics.Tests.Perf_VectorOf<Byte>.UnaryNegateOperatorBenchmark	214.44	6.20	-97.10
System.Numerics.Tests.Perf_VectorOf<Byte>.NegateBenchmark	214.55	6.37	-97.02
System.Numerics.Tests.Perf_VectorOf<SByte>.SubtractBenchmark	231.29	7.90	-96.58
System.Numerics.Tests.Perf_VectorOf<SByte>.SubtractionOperatorBenchmark	221.04	7.90	-96.42
System.Numerics.Tests.Perf_VectorOf<UInt16>.OneBenchmark	50.92	1.83	-96.41
System.Numerics.Tests.Perf_VectorOf<Byte>.AddBenchmark	216.21	7.83	-96.37
System.Numerics.Tests.Perf_VectorOf<Byte>.SubtractBenchmark	214.79	7.79	-96.37
System.Numerics.Tests.Perf_VectorOf<Byte>.SubtractionOperatorBenchmark	215.60	7.92	-96.32
System.Numerics.Tests.Perf_VectorOf<SByte>.MultiplyOperatorBenchmark	225.86	8.35	-96.30
System.Numerics.Tests.Perf_VectorOf<Byte>.AddOperatorBenchmark	209.41	7.95	-96.20
System.Numerics.Tests.Perf_VectorOf<SByte>.MultiplyBenchmark	217.21	8.39	-96.13
System.Numerics.Tests.Perf_VectorOf<SByte>.AddOperatorBenchmark	214.44	8.33	-96.11

Name	Baseline Value	Compare Value	Difference	% Difference
System.Collections.Concurrent.Count<Int32>.Dictionary(Size: 512)	34.07 μs	310.43 ns	-33756.76 ns	99%
System.Collections.Concurrent.Count<String>.Dictionary(Size: 512)	17.32 μs	314.25 ns	-17007.28 ns	98%
System.Tests.Perf_Decimal.Floor	81.17 ns	16.81 ns	-64.36 ns	79%
System.Tests.Perf_Decimal.Round	82.24 ns	18.69 ns	-63.55 ns	77%
System.Tests.Perf_UInt32.TryFormat(value: 0)	78.23 ns	20.05 ns	-58.18 ns	74%
System.Tests.Perf_Int32.TryFormat(value: 4)	78.02 ns	20.47 ns	-57.55 ns	74%
System.Collections.TryGetValueFalse<String, String>.ConcurrentDictionary(Size: 512)	44.69 μs	12.92 μs	-31.77 μs	71%
System.Tests.Perf_Decimal.Divide	346.08 ns	102.16 ns	-243.92 ns	70%
System.Collections.ContainsKeyFalse<String, String>.ConcurrentDictionary(Size: 512)	45.29 μs	13.50 μs	-31.79 μs	70%
System.Text.Json.Reader.Tests.Perf_Base64.ReadBase64EncodedByteArray_HeavyEscaping(NumberOfBytes: 1000)	8.93 μs	2.77 μs	-6.16 μs	69%
System.Text.Json.Reader.Tests.Perf_Base64.ReadBase64EncodedByteArray_NoEscaping(NumberOfBytes: 1000)	8.83 μs	2.77 μs	-6.06 μs	69%
System.Tests.Perf_UInt64.TryFormat(value: 0)	84.40 ns	26.53 ns	-57.87 ns	69%
System.Tests.Perf_Byte.ToString(value: 255)	91.65 ns	29.95 ns	-61.69 ns	67%
System.Tests.Perf_Version.TryFormat3	265.42 ns	88.04 ns	-177.38 ns	67%
System.Tests.Perf_Version.TryFormat4	345.05 ns	115.05 ns	-230.00 ns	67%
System.Collections.TryGetValueTrue<String, String>.ConcurrentDictionary(Size: 512)	49.50 μs	16.53 μs	-32.97 μs	67%
System.Tests.Perf_Version.TryFormat2	176.63 ns	59.61 ns	-117.02 ns	66%
System.Collections.ContainsKeyTrue<String, String>.ConcurrentDictionary(Size: 512)	50.43 μs	17.54 μs	-32.89 μs	65%
LinqBenchmarks.Where01ForX	1.57 secs	548.00 ms	-1022.61 ms	65%
LinqBenchmarks.Where01LinqMethodX	1.68 secs	588.39 ms	-1095.38 ms	65%

Name	Baseline Value	Compare Value	Difference	% Difference
System.Tests.Perf_Random.Next_long_unseeded	10.17 ns	28.84 ns	18.67 ns	-184%
System.Numerics.Tests.Perf_Vector4.EqualityOperatorBenchmark	0.79 ns	1.96 ns	1.17 ns	-148%
System.Numerics.Tests.Perf_Vector3.TransformByMatrix4x4Benchmark	60.14 ns	140.30 ns	80.17 ns	-133%
System.Numerics.Tests.Perf_Vector3.TransformNormalByMatrix4x4Benchmark	60.73 ns	132.19 ns	71.46 ns	-118%
System.Numerics.Tests.Perf_Vector4.TransformVector3ByMatrix4x4Benchmark	62.72 ns	131.48 ns	68.76 ns	-110%
System.Numerics.Tests.Perf_Vector4.TransformByMatrix4x4Benchmark	63.09 ns	131.10 ns	68.00 ns	-108%
System.Numerics.Tests.Perf_Vector2.TransformByMatrix4x4Benchmark	56.47 ns	112.12 ns	55.65 ns	-99%
System.Numerics.Tests.Perf_Quaternion.LengthSquaredBenchmark	7.76 ns	14.35 ns	6.59 ns	-85%
System.Numerics.Tests.Perf_Vector2.TransformNormalByMatrix4x4Benchmark	56.66 ns	103.10 ns	46.44 ns	-82%
System.Numerics.Tests.Perf_Vector4.TransformVector2ByMatrix4x4Benchmark	61.08 ns	103.66 ns	42.58 ns	-70%
System.Numerics.Tests.Perf_Vector2.TransformByMatrix3x2Benchmark	20.85 ns	35.00 ns	14.15 ns	-68%
System.Numerics.Tests.Perf_BitOperations.LeadingZeroCount_uint	667.85 ns	1.10 μs	428.39 ns	-64%
System.Tests.Perf_Random.Next_long_long_unseeded	14.28 ns	22.44 ns	8.15 ns	-57%
System.Numerics.Tests.Perf_Quaternion.ConjugateBenchmark	18.32 ns	28.76 ns	10.44 ns	-57%
System.Numerics.Tests.Perf_Quaternion.InverseBenchmark	26.70 ns	41.60 ns	14.89 ns	-56%
System.Numerics.Tests.Perf_Quaternion.LengthBenchmark	13.45 ns	20.35 ns	6.90 ns	-51%
System.Numerics.Tests.Perf_BitOperations.LeadingZeroCount_ulong	745.74 ns	1.10 μs	357.01 ns	-48%
System.Numerics.Tests.Perf_BitOperations.Log2_ulong	894.61 ns	1.32 μs	425.98 ns	-48%
System.Numerics.Tests.Perf_Vector2.TransformNormalByMatrix3x2Benchmark	21.03 ns	30.87 ns	9.85 ns	-47%
System.Numerics.Tests.Perf_Vector3.ReflectBenchmark	37.23 ns	54.13 ns	16.90 ns	-45%

Name	Baseline Value	Compare Value	Difference	% Difference
System.Collections.Concurrent.Count<Int32>.Dictionary(Size: 512)	140.03 μs	1.76 μs	-138.26 μs	99%
System.Collections.Concurrent.Count<String>.Dictionary(Size: 512)	136.03 μs	1.86 μs	-134.17 μs	99%
System.Threading.Tests.Perf_Interlocked.CompareExchange_long	37.56 ns	6.66 ns	-30.90 ns	82%
System.Threading.Tests.Perf_Interlocked.CompareExchange_int	34.18 ns	8.33 ns	-25.85 ns	76%
System.Buffers.Tests.RentReturnArrayPoolTests<Byte>.ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: True, UseSharedPool: False)	3.81 μs	1.09 μs	-2.72 μs	71%
System.Numerics.Tests.Perf_Vector4.ZeroBenchmark	3.21 ns	0.99 ns	-2.22 ns	69%
System.Buffers.Tests.RentReturnArrayPoolTests<Object>.ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: True, UseSharedPool: False)	3.42 μs	1.06 μs	-2.36 μs	69%
System.Tests.Perf_Decimal.Floor	175.25 ns	65.77 ns	-109.48 ns	62%
System.Numerics.Tests.Perf_Quaternion.LengthBenchmark	63.64 ns	24.08 ns	-39.56 ns	62%
System.Numerics.Tests.Perf_Quaternion.InequalityOperatorBenchmark	89.74 ns	34.82 ns	-54.93 ns	61%
System.Buffers.Tests.RentReturnArrayPoolTests<Byte>.ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: False, UseSharedPool: False)	4.34 μs	1.70 μs	-2.64 μs	61%
System.Tests.Perf_Decimal.Round	191.52 ns	75.77 ns	-115.76 ns	60%
System.Numerics.Tests.Perf_Quaternion.DotBenchmark	77.60 ns	31.33 ns	-46.27 ns	60%
System.Numerics.Tests.Perf_Quaternion.DivideBenchmark	88.55 ns	36.47 ns	-52.07 ns	59%
System.Tests.Perf_Random.Next_int_int_unseeded	154.47 ns	65.37 ns	-89.11 ns	58%
System.Numerics.Tests.Perf_Quaternion.IsIdentityBenchmark	81.52 ns	35.06 ns	-46.46 ns	57%
System.Numerics.Tests.Perf_Quaternion.SubtractionOperatorBenchmark	83.75 ns	36.09 ns	-47.67 ns	57%
System.Numerics.Tests.Perf_Quaternion.SubtractBenchmark	84.49 ns	36.50 ns	-47.99 ns	57%
System.Collections.CtorFromCollection<Int32>.ConcurrentDictionary(Size: 512)	461.77 μs	200.10 μs	-261.67 μs	57%
System.Tests.Perf_UInt64.TryFormat(value: 0)	250.12 ns	109.72 ns	-140.40 ns	56%
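
The report does not state how the `Difference` and `% Difference` columns are derived. The rows are consistent with the convention sketched below, which is inferred from the table values and is an assumption: `Difference` is compare minus baseline, and `% Difference` is the change relative to the baseline, positive for an improvement and negative for a regression. The helper names are hypothetical. Because the published values are rounded, recomputing from them can drift by a few points when the baseline is very small.

```python
def difference(baseline: float, compare: float) -> float:
    """Absolute change in the measured time (same unit as the inputs)."""
    return compare - baseline


def percent_difference(baseline: float, compare: float) -> float:
    """Relative change versus the baseline, in percent.

    Assumed sign convention: positive means the compare run is faster
    (improvement), negative means it is slower (regression).
    """
    return (baseline - compare) / baseline * 100.0


# Improvement row: Perf_Interlocked.CompareExchange_long, 37.56 ns -> 6.66 ns
print(round(difference(37.56, 6.66), 2))        # -30.9
print(round(percent_difference(37.56, 6.66)))   # 82

# Regression row: Perf_Quaternion.LengthSquaredBenchmark, 7.76 ns -> 14.35 ns
print(round(difference(7.76, 14.35), 2))        # 6.59
print(round(percent_difference(7.76, 14.35)))   # -85
```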

Name	Baseline Value	Compare Value	Difference	% Difference
System.Numerics.Tests.Perf_VectorOf<UInt64>.CountBenchmark	0.06 ns	3.10 ns	3.04 ns	-5,059%
System.Runtime.Intrinsics.Tests.Perf_Vector128Of<Int16>.CountBenchmark	0.36 ns	1.75 ns	1.39 ns	-391%
System.Collections.TryAddDefaultSize<String>.ConcurrentDictionary(Count: 512)	297.96 μs	574.34 μs	276.38 μs	-93%
System.Numerics.Tests.Perf_Vector2.UnitYBenchmark	7.38 ns	13.69 ns	6.31 ns	-85%
HardwareIntrinsics.RayTracer.SoA.Render	2.41 ns	4.38 ns	1.97 ns	-82%
System.Numerics.Tests.Perf_Vector2.TransformByMatrix3x2Benchmark	48.06 ns	86.28 ns	38.22 ns	-80%
System.IO.Compression.Brotli.Compress_WithoutState(level: Fastest, file: "TestDocument.pdf")	291.36 μs	522.83 μs	231.47 μs	-79%
System.IO.Compression.Brotli.Compress_WithState(level: Fastest, file: "TestDocument.pdf")	296.93 μs	525.99 μs	229.06 μs	-77%
System.Numerics.Tests.Perf_Vector2.TransformNormalByMatrix3x2Benchmark	44.65 ns	75.61 ns	30.96 ns	-69%
System.Memory.Constructors_ValueTypesOnly<Byte>.ReadOnlyFromPointerLength	6.33 ns	10.49 ns	4.16 ns	-66%
PerfLabTests.EnumPerf.ObjectGetTypeNoBoxing	3.87 ns	6.20 ns	2.32 ns	-60%
System.Numerics.Tests.Perf_Vector3.SquareRootBenchmark	23.34 ns	37.02 ns	13.68 ns	-59%
System.Numerics.Tests.Perf_Vector3.TransformNormalByMatrix4x4Benchmark	124.53 ns	196.66 ns	72.12 ns	-58%
System.Diagnostics.Perf_Process.StartAndWaitForExit	871.51 μs	1.35 ms	474.57 μs	-54%
System.Numerics.Tests.Perf_Vector3.TransformByMatrix4x4Benchmark	144.68 ns	217.99 ns	73.31 ns	-51%
System.Collections.AddGivenSize<String>.List(Size: 512)	12.21 μs	18.32 μs	6.11 μs	-50%
System.IO.Tests.BinaryWriterExtendedTests.WriteAsciiCharArray(StringLengthInChars: 2000000)	8.14 ms	12.20 ms	4.06 ms	-50%
System.Numerics.Tests.Perf_VectorOf<Int32>.ZeroBenchmark	3.20 ns	4.80 ns	1.59 ns	-50%
System.Buffers.Tests.RentReturnArrayPoolTests<Byte>.ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: True, UseSharedPool: True)	5.73 μs	8.56 μs	2.83 μs	-49%
System.Buffers.Tests.RentReturnArrayPoolTests<Object>.ProducerConsumer(RentalSize: 4096, ManipulateArray: False, Async: True, UseSharedPool: True)	5.62 μs	8.37 μs	2.75 μs	-49%

Issue report	Description
dotnet/perf-autofiling-issues#12707	Use of unimplemented Vector operations
dotnet/perf-autofiling-issues#13747	Intrinsified common `Vector128` operations

Name	Baseline Value	Compare Value	Difference	% Difference
System.Numerics.Tests.Perf_VectorOf<Byte>.LessThanAnyBenchmark	292.17 ns	18.88 ns	-273.29 ns	94%
System.Numerics.Tests.Perf_VectorOf<Byte>.LessThanOrEqualAnyBenchmark	298.08 ns	20.47 ns	-277.61 ns	93%
System.Numerics.Tests.Perf_VectorOf<SByte>.LessThanOrEqualAnyBenchmark	294.38 ns	20.33 ns	-274.05 ns	93%
System.Numerics.Tests.Perf_VectorOf<SByte>.LessThanAnyBenchmark	298.45 ns	20.63 ns	-277.82 ns	93%
System.Numerics.Tests.Perf_VectorOf<Byte>.GreaterThanOrEqualAllBenchmark	331.73 ns	24.25 ns	-307.48 ns	93%
System.Numerics.Tests.Perf_VectorOf<UInt16>.GreaterThanOrEqualAllBenchmark	218.05 ns	20.58 ns	-197.47 ns	91%
System.Numerics.Tests.Perf_VectorOf<Int16>.GreaterThanAllBenchmark	209.57 ns	20.48 ns	-189.08 ns	90%
System.Numerics.Tests.Perf_VectorOf<Int16>.GreaterThanOrEqualAllBenchmark	231.47 ns	23.03 ns	-208.44 ns	90%
System.Numerics.Tests.Perf_VectorOf<Int16>.LessThanOrEqualAnyBenchmark	188.87 ns	20.02 ns	-168.84 ns	89%
System.Numerics.Tests.Perf_VectorOf<Int16>.LessThanAnyBenchmark	186.21 ns	20.05 ns	-166.16 ns	89%
System.Numerics.Tests.Perf_VectorOf<UInt16>.LessThanOrEqualAnyBenchmark	189.87 ns	20.76 ns	-169.11 ns	89%
System.Numerics.Tests.Perf_VectorOf<UInt16>.LessThanAnyBenchmark	186.54 ns	21.38 ns	-165.15 ns	89%
System.Memory.Span<Byte>.IndexOfAnyFourValues(Size: 512)	11.82 μs	1.60 μs	-10.23 μs	87%
System.Memory.Span<Byte>.IndexOfAnyFiveValues(Size: 512)	14.32 μs	2.42 μs	-11.90 μs	83%
System.Numerics.Tests.Perf_VectorOf<Int32>.GreaterThanAllBenchmark	120.71 ns	20.59 ns	-100.11 ns	83%
System.Numerics.Tests.Perf_VectorOf<UInt32>.GreaterThanAllBenchmark	124.72 ns	21.39 ns	-103.32 ns	83%
System.Numerics.Tests.Perf_VectorOf<Single>.GreaterThanOrEqualAllBenchmark	136.11 ns	24.20 ns	-111.91 ns	82%
System.Numerics.Tests.Perf_VectorOf<Single>.GreaterThanAllBenchmark	128.50 ns	24.30 ns	-104.20 ns	81%
System.Numerics.Tests.Perf_VectorOf<UInt64>.GreaterThanAllBenchmark	105.81 ns	20.48 ns	-85.33 ns	81%
System.Numerics.Tests.Perf_VectorOf<Int64>.GreaterThanAllBenchmark	105.16 ns	20.57 ns	-84.60 ns	80%

Name	Baseline Value	Compare Value	Difference	% Difference
System.Numerics.Tests.Perf_VectorOf<Byte>.CountBenchmark	0.10 ns	1.10 ns	1.00 ns	-969%
System.Tests.Perf_String.Replace_Char(text: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab2wei1kxfbvsbpzwhanjczcqa2psra3aacxb67qnwbnfp2tok6v0a58lzfdql	11.63 μs	101.96 μs	90.33 μs	-777%
System.Tests.Perf_String.Replace_Char(text: "yfesgj0sg1ijslnjsb3uofdz3tbzf6ysgblu3at20nfab2wei1kxfbvsbpzwhanjczcqa2psra3aacxb67qnwbnfp2tok6v0a58l", ol	1.30 μs	8.82 μs	7.52 μs	-578%
System.Tests.Perf_Byte.ToString(value: 255)	38.31 ns	257.96 ns	219.65 ns	-573%
System.Tests.Perf_String.Replace_String(text: "This is a very nice sentence. This is another very nice sentence.", oldValue: "a", newValue: "b")	962.59 ns	6.30 μs	5.34 μs	-554%
PerfLabTests.LowLevelPerf.IntegerFormatting	6.08 ms	34.30 ms	28.21 ms	-464%
System.Tests.Perf_Int32.ToString(value: 2147483647)	59.17 ns	332.19 ns	273.01 ns	-461%
System.Tests.Perf_Int16.ToString(value: 32767)	53.24 ns	297.84 ns	244.60 ns	-459%
System.Tests.Perf_Int32.ToString(value: 12345)	52.90 ns	293.56 ns	240.66 ns	-455%
System.Tests.Perf_String.Replace_Char(text: "This is a very nice sentence", oldChar: 'i', newChar: 'I')	531.46 ns	2.89 μs	2.36 μs	-443%
System.Tests.Perf_SByte.ToString(value: 127)	52.62 ns	276.41 ns	223.79 ns	-425%
System.Numerics.Tests.Perf_Vector2.TransformNormalByMatrix4x4Benchmark	21.70 ns	108.97 ns	87.28 ns	-402%
System.Numerics.Tests.Perf_Vector2.TransformByMatrix4x4Benchmark	26.37 ns	114.02 ns	87.65 ns	-332%
System.Numerics.Tests.Perf_Matrix4x4.MultiplyByMatrixOperatorBenchmark	246.08 ns	1.04 μs	797.11 ns	-324%
System.Numerics.Tests.Perf_Matrix4x4.MultiplyByMatrixBenchmark	243.24 ns	1.02 μs	779.98 ns	-321%
System.Tests.Perf_Byte.ToString(value: 0)	7.06 ns	27.18 ns	20.11 ns	-285%
System.Numerics.Tests.Perf_Matrix4x4.CreateTranslationFromScalarXYZ	25.27 ns	91.61 ns	66.34 ns	-263%
System.Numerics.Tests.Perf_Matrix4x4.AddBenchmark	90.93 ns	304.20 ns	213.27 ns	-235%
System.Numerics.Tests.Perf_Matrix4x4.LerpBenchmark	141.51 ns	443.45 ns	301.94 ns	-213%
System.Numerics.Tests.Perf_Matrix4x4.SubtractOperatorBenchmark	100.31 ns	307.60 ns	207.29 ns	-207%
