adamsitnik
Member

@adamsitnik adamsitnik commented Dec 7, 2020

Contributes to #45315

using System.Threading.Tasks;
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Hosting;
using Microsoft.Extensions.Configuration;

namespace Template
{
    public class Startup
    {
        public Startup(IConfiguration configuration) => Configuration = configuration;

        public IConfiguration Configuration { get; }

        // This method gets called by the runtime. Use this method to configure the HTTP request pipeline.
        public void Configure(IApplicationBuilder app, IWebHostEnvironment env)
        {
            app.UseRouting();

            app.UseEndpoints(routeBuilder =>
            {
                routeBuilder.Map("GetProcesses", context =>
                {
                    foreach (var process in System.Diagnostics.Process.GetProcesses())
                    {
                        process.Dispose();
                    }

                    return Task.CompletedTask;
                });
            });
        }
    }
}

Citrine (28 cores):

| load | before | after |
|------|-------:|------:|
| CPU Usage (%) | 3 | 4 |
| Cores usage (%) | 85 | 102 |
| Working Set (MB) | 48 | 48 |
| Build Time (ms) | 5,057 | 4,123 |
| Start Time (ms) | 0 | 0 |
| Published Size (KB) | 76,401 | 76,401 |
| .NET Core SDK Version | 3.1.404 | 3.1.404 |
| First Request (ms) | 101 | 106 |
| Requests/sec | 14,392 | 16,706 |
| Requests | 216,986 | 251,968 |
| Mean latency (ms) | 35.53 | 30.61 |
| Max latency (ms) | 301.49 | 310.73 |
| Bad responses | 0 | 0 |
| Socket errors | 0 | 0 |
| Read throughput (MB/s) | 1.26 | 1.47 |
| Latency 50th (ms) | 31.94 | 29.67 |
| Latency 75th (ms) | 40.05 | 31.19 |
| Latency 90th (ms) | 49.31 | 36.68 |
| Latency 99th (ms) | 69.83 | 48.34 |

Perf (12 cores):

| load | before | after |
|------|-------:|------:|
| CPU Usage (%) | 10 | 12 |
| Cores usage (%) | 118 | 142 |
| Working Set (MB) | 49 | 49 |
| Build Time (ms) | 5,877 | 5,898 |
| Start Time (ms) | 0 | 0 |
| Published Size (KB) | 76,401 | 76,401 |
| First Request (ms) | 103 | 106 |
| Requests/sec | 13,499 | 17,852 |
| Requests | 203,683 | 269,349 |
| Mean latency (ms) | 38.06 | 28.65 |
| Max latency (ms) | 240.17 | 168.62 |
| Bad responses | 0 | 0 |
| Socket errors | 0 | 0 |
| Read throughput (MB/s) | 1.18 | 1.57 |
| Latency 50th (ms) | 38.57 | 26.50 |
| Latency 75th (ms) | 46.44 | 36.76 |
| Latency 90th (ms) | 57.80 | 40.75 |
| Latency 99th (ms) | 84.34 | 57.51 |

No difference for micro-benchmarks (this was expected):

|             Method |           Toolchain |     Mean | Ratio |   Gen 0 |   Gen 1 | Gen 2 | Allocated |
|------------------- |-------------------- |---------:|------:|--------:|--------:|------:|----------:|
|       GetProcesses |  \after\CoreRun.exe | 4.948 ms |  1.00 | 78.4314 | 19.6078 |     - |    621 KB |
|       GetProcesses | \before\CoreRun.exe | 4.916 ms |  1.00 | 80.0000 | 20.0000 |     - |    619 KB |
|                    |                     |          |       |         |         |       |           |
| GetProcessesByName |  \after\CoreRun.exe | 4.912 ms |  1.00 | 80.0000 | 20.0000 |     - |    619 KB |
| GetProcessesByName | \before\CoreRun.exe | 4.903 ms |  1.00 | 80.0000 | 20.0000 |     - |    619 KB |

@adamsitnik adamsitnik added this to the 6.0.0 milestone Dec 7, 2020
@ghost

ghost commented Dec 7, 2020

Tagging subscribers to this area: @eiriktsarpalis
See info in area-owners.md if you want to be subscribed.

Issue Details

Contributes to #45315

Author: adamsitnik
Assignees: -
Labels:

area-System.Diagnostics.Process, tenet-performance

Milestone: 6.0.0

@adamsitnik adamsitnik closed this Mar 25, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Apr 24, 2021
@jkotas
Member

jkotas commented Aug 13, 2021

@adamsitnik Now that ArrayPool was fixed to pool unlimited size arrays, I think this PR can be resurrected and updated to use ArrayPool unconditionally.

@adamsitnik adamsitnik reopened this Aug 16, 2021
@adamsitnik adamsitnik requested a review from jkotas August 16, 2021 09:19
@adamsitnik
Member Author

Now that ArrayPool was fixed to pool unlimited size arrays, I think this PR can be resurrected and updated to use ArrayPool unconditionally.

@jkotas done, PTAL

@adamsitnik adamsitnik modified the milestones: 6.0.0, 7.0.0 Aug 16, 2021
@adamsitnik adamsitnik force-pushed the NtProcessInfoHelperRemoveCache branch from 5e56129 to fc4378f Compare August 17, 2021 18:23
@adamsitnik adamsitnik requested a review from jkotas August 17, 2021 18:27
@jkotas
Member

jkotas commented Aug 17, 2021

It looks good to me. I think it would be useful to verify that there is an advantage in using the array pool instead of a simpler NativeMemory.Alloc in this situation - #45690 (comment).

@adamsitnik
Member Author

I think it would be useful to verify that there is an advantage in using the array pool instead of a simpler NativeMemory.Alloc

I did some measurements using modified benchmarks from the performance repo:

[Benchmark]
public void GetProcessesByName()
{
    foreach (var process in Process.GetProcessesByName(_nonExistingName))
    {
        process.Dispose();
    }
}


[Benchmark(OperationsPerInvoke = 10 * 24)]
public void GetProcesses_Parallel()
{
    Parallel.For(0, 24, _ => // my PC has 24 logical cores
    {
        for (int i = 0; i < 10; i++)
        {
            foreach (var process in Process.GetProcesses())
            {
                process.Dispose();
            }
        }
    });
}

[Benchmark(OperationsPerInvoke = 10 * 24)]
public void GetProcessesByName_Parallel()
{
    Parallel.For(0, 24, _ =>
    {
        for (int i = 0; i < 10; i++)
        {
            foreach (var process in Process.GetProcessesByName(_nonExistingName))
            {
                process.Dispose();
            }
        }
    });
}

The results were as follows:

BenchmarkDotNet=v0.13.0.1559-nightly, OS=Windows 10.0.19043.1165 (21H1/May2021Update)
AMD Ryzen Threadripper PRO 3945WX 12-Cores, 1 CPU, 24 logical and 12 physical cores
.NET SDK=6.0.100-rc.1.21417.19

|                      Method |                 Toolchain |     Mean | Ratio |   Gen 0 |   Gen 1 |  Gen 2 | Allocated |
|---------------------------- |-------------------------- |---------:|------:|--------:|--------:|-------:|----------:|
|                GetProcesses | \alignedAlloc\corerun.exe | 4.791 ms |  1.05 |       - |       - |      - |    477 KB |
|                GetProcesses |    \arrayPool\corerun.exe | 4.508 ms |  1.00 | 55.5556 | 18.5185 |      - |    475 KB |
|          GetProcessesByName | \alignedAlloc\corerun.exe | 4.658 ms |  1.05 | 46.8750 | 15.6250 |      - |    475 KB |
|          GetProcessesByName |    \arrayPool\corerun.exe | 4.456 ms |  1.00 | 46.8750 | 15.6250 |      - |    473 KB |
|       GetProcesses_Parallel | \alignedAlloc\corerun.exe | 1.154 ms |  0.66 | 58.3333 | 16.6667 | 4.1667 |    476 KB |
|       GetProcesses_Parallel |    \arrayPool\corerun.exe | 1.763 ms |  1.00 | 62.5000 | 20.8333 | 4.1667 |    484 KB |
| GetProcessesByName_Parallel | \alignedAlloc\corerun.exe | 1.098 ms |  0.60 | 58.3333 | 16.6667 | 4.1667 |    475 KB |
| GetProcessesByName_Parallel |    \arrayPool\corerun.exe | 1.843 ms |  1.00 | 62.5000 | 16.6667 | 4.1667 |    484 KB |

AlignedAlloc seems to perform about 5% worse for single-threaded usage, but 34-40% better for parallel usage.
With AlignedAlloc the code is also simpler.

@stephentoub @jkotas I don't have a strong opinion here. What are your thoughts on this?

@jkotas
Member

jkotas commented Aug 18, 2021

AlignedAlloc

Nit: It can be just NativeMemory.Alloc. It guarantees sufficient alignment. AlignedAlloc is unnecessary.

What are your thoughts on this?

I would lean towards using NativeMemory.Alloc. I think it will give a lower high-memory watermark for some common usage patterns of this API. I think minimizing working set is more important than saving cycles for this API.

I do not have a strong opinion on this either. Thank you for collecting the numbers!

@adamsitnik
Member Author

NativeMemory.Alloc

I've switched to NativeMemory.Alloc and ensured that perf characteristics don't look worse than AlignedAlloc.

|                      Method |                 Toolchain |       Mean | Ratio |   Gen 0 |   Gen 1 |  Gen 2 | Allocated |
|---------------------------- |-------------------------- |-----------:|------:|--------:|--------:|-------:|----------:|
|                GetProcesses | \alignedAlloc\corerun.exe | 2,991.6 us |  1.08 |       - |       - |      - |    405 KB |
|                GetProcesses |        \alloc\corerun.exe | 2,903.3 us |  1.04 |       - |       - |      - |    397 KB |
|                GetProcesses |    \arrayPool\corerun.exe | 2,776.7 us |  1.00 | 47.0588 | 11.7647 |      - |    395 KB |
|          GetProcessesByName | \alignedAlloc\corerun.exe | 2,933.2 us |  1.06 | 41.6667 | 10.4167 |      - |    403 KB |
|          GetProcessesByName |        \alloc\corerun.exe | 2,805.3 us |  1.01 | 37.5000 | 12.5000 |      - |    395 KB |
|          GetProcessesByName |    \arrayPool\corerun.exe | 2,776.5 us |  1.00 | 41.6667 | 10.4167 |      - |    394 KB |
|       GetProcesses_Parallel | \alignedAlloc\corerun.exe | 1,049.0 us |  0.70 | 54.1667 | 12.5000 | 4.1667 |    404 KB |
|       GetProcesses_Parallel |        \alloc\corerun.exe |   963.5 us |  0.64 | 45.8333 | 12.5000 | 4.1667 |    397 KB |
|       GetProcesses_Parallel |    \arrayPool\corerun.exe | 1,521.2 us |  1.00 | 50.0000 | 12.5000 | 4.1667 |    395 KB |
| GetProcessesByName_Parallel | \alignedAlloc\corerun.exe |   982.3 us |  0.64 | 50.0000 |  8.3333 | 4.1667 |    402 KB |
| GetProcessesByName_Parallel |        \alloc\corerun.exe | 1,035.7 us |  0.68 | 54.1667 | 20.8333 | 4.1667 |    400 KB |
| GetProcessesByName_Parallel |    \arrayPool\corerun.exe | 1,544.2 us |  1.00 | 50.0000 | 12.5000 | 4.1667 |    393 KB |
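
For readers comparing the three toolchains above, here is a minimal sketch of the three ways a temporary buffer could be obtained. It is not the code from this PR: `BufferSketch` and `FillAndSum` are hypothetical names, and `FillAndSum` only stands in for whatever native call would fill the buffer. It assumes .NET 6 or later (for `NativeMemory`) and a project compiled with unsafe code enabled.

```csharp
using System;
using System.Buffers;
using System.Runtime.InteropServices;

public static unsafe class BufferSketch
{
    // Hypothetical stand-in for the native call that fills the buffer:
    // writes 1 into every byte and returns the sum of all bytes.
    private static int FillAndSum(byte* buffer, int length)
    {
        var span = new Span<byte>(buffer, length);
        span.Fill(1);
        int sum = 0;
        foreach (byte b in span) sum += b;
        return sum;
    }

    // Pooled managed array: must be pinned before handing it to native code,
    // and returned arrays stay alive inside the pool.
    public static int WithArrayPool(int size)
    {
        byte[] rented = ArrayPool<byte>.Shared.Rent(size); // may be longer than size
        try
        {
            fixed (byte* p = rented)
            {
                return FillAndSum(p, size);
            }
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }

    // Native allocation with an explicit alignment (16 is an arbitrary power of two).
    public static int WithAlignedAlloc(int size)
    {
        void* p = NativeMemory.AlignedAlloc((nuint)size, 16);
        try
        {
            return FillAndSum((byte*)p, size);
        }
        finally
        {
            NativeMemory.AlignedFree(p);
        }
    }

    // Plain native allocation: no pinning, and the memory is released right away.
    public static int WithAlloc(int size)
    {
        void* p = NativeMemory.Alloc((nuint)size);
        try
        {
            return FillAndSum((byte*)p, size);
        }
        finally
        {
            NativeMemory.Free(p);
        }
    }
}
```

Calling `BufferSketch.WithAlloc(1024)` allocates, uses, and frees the buffer within a single call, so nothing is retained between calls. That is in line with the working-set argument made earlier in the thread, while the pooled variant trades retained memory for fewer allocations.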

@jkotas thank you for a lot of good hints and great discussion!

Member

@jkotas jkotas left a comment

Thanks

@adamsitnik adamsitnik merged commit 17e92db into dotnet:main Aug 19, 2021
@adamsitnik adamsitnik deleted the NtProcessInfoHelperRemoveCache branch August 19, 2021 17:24