Contention of Project Evaluation in parallel builds #7625

Open
yuehuang010 opened this issue May 14, 2022 · 7 comments

Comments

@yuehuang010
Contributor

yuehuang010 commented May 14, 2022

Issue Description

Project evaluation in parallel builds suffers from contention: evaluations that normally take 20-30 ms take over 1000 ms.

Steps to Reproduce

Create a solution with lots of small projects, enough to saturate your CPU. I used four times as many projects as CPU threads. The contents of each project are not relevant, as I used the "Clean" target to do the least amount of work. I used nearly identical projects to remove variables. The projects have no P2P (project-to-project) references, to maximize throughput. Node reuse was disabled (nodeReuse:false) in all cases.

Case 1:
msbuild /t:clean /bl /v:q

Case 2:
msbuild /t:clean /bl /v:q /m

I used a binlog (/bl) to record the results and set verbosity to quiet (/v:q) to avoid console output noise. Observe the project evaluation times of all projects.
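
The project setup described in the steps above can be scripted. This is only a sketch under assumptions: the issue does not say which project type was used, so the minimal SDK-style C# project, the `Proj000`-style names, and the `generate_projects` helper are all hypothetical. The default count follows the "four times CPU threads" rule from the description.

```python
import os
from pathlib import Path

# Assumption: a minimal SDK-style C# project. The issue does not state the
# project type, so treat this template as a placeholder.
PROJECT_XML = """<Project Sdk="Microsoft.NET.Sdk">
  <PropertyGroup>
    <TargetFramework>net6.0</TargetFramework>
  </PropertyGroup>
</Project>
"""


def generate_projects(root, count=None):
    """Write `count` identical projects under `root` and return their paths."""
    if count is None:
        # Four projects per logical CPU, as in the repro description.
        count = 4 * (os.cpu_count() or 1)
    paths = []
    for i in range(count):
        name = f"Proj{i:03d}"  # hypothetical naming scheme
        project_dir = Path(root) / name
        project_dir.mkdir(parents=True, exist_ok=True)
        project_file = project_dir / f"{name}.csproj"
        project_file.write_text(PROJECT_XML, encoding="utf-8")
        paths.append(project_file)
    return paths
```

Calling `generate_projects("repro")` only writes the project files. They still have to be added to a solution (for example with `dotnet new sln` followed by `dotnet sln add`) before running the two cases.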

Data & Analysis

This image is the trace of a single-node build (case 1). Observe that each evaluation took 20-30 ms, except for the initial project.
image

This image is the trace of a multi-node build (case 2). Observe that the first evaluation took the same time as in case 1. Once the parallel nodes started, the first evaluation on each of them takes seconds. Subsequent projects evaluate faster. Notice that node 1 also slows down.
image

Theory

I theorize there is a single-threaded file cache service that handles file I/O. The file cache probably serializes the data back to the nodes while holding a lock, thus blocking other nodes from using it. Node 0 is also affected by the contention, which rules out the cost of starting a "new" node as the cause.
An alternative is an evaluation cache whose lock is held for the entire duration of an evaluation.
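
To illustrate why a lock held during slow work would produce this pattern, here is a toy model. It uses threads in one process rather than MSBuild nodes, and `time.sleep` stands in for whatever the lock holder actually does, so it demonstrates the theory rather than MSBuild's real behavior. The function name `measure_latencies` is made up for this sketch.

```python
import threading
import time


def measure_latencies(workers, hold_seconds):
    """Return sorted per-worker latencies when all work runs under one lock."""
    lock = threading.Lock()
    start_gate = threading.Barrier(workers)
    latencies = [0.0] * workers

    def worker(index):
        start_gate.wait()             # release all workers at the same moment
        begin = time.perf_counter()
        with lock:                    # the slow work happens while holding the lock
            time.sleep(hold_seconds)  # stand-in for serializing data to a node
        latencies[index] = time.perf_counter() - begin

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(latencies)
```

In this model the fastest worker sees roughly `hold_seconds`, while the slowest waits for every other holder first, so its latency grows roughly in proportion to the number of workers.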

@yuehuang010 yuehuang010 added needs-triage Have yet to determine what bucket this goes in. Area: Performance labels May 14, 2022
@AR-May AR-May added backlog needs-investigation and removed needs-triage Have yet to determine what bucket this goes in. labels May 16, 2022
@yuehuang010 yuehuang010 changed the title Contention of Project Evaluation in parallel builds 📈 Contention of Project Evaluation in parallel builds May 19, 2022
@rokonec rokonec self-assigned this Jun 6, 2022
@yuehuang010
Contributor Author

Thanks to "msbuild /profileEvaluation", I got a few more hints.

image

$([Microsoft.Build.Utilities.ToolLocationHelper]::GetLatestSDKTargetPlatformVersion($(SDKIdentifier), $(SDKVersion))) takes 3-4 ms warm and 180 ms cold. While the results are cached, there is a lock in RetrieveTargetPlatformList().
The same applies to ToolLocationHelper::GetPlatformSDKLocation(), as it also calls RetrieveTargetPlatformList().
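
The actual locking inside RetrieveTargetPlatformList() is not shown in this thread. As a generic illustration of a cache that avoids taking a lock on hits, the sketch below uses a double-checked lookup. `PlatformListCache` and its `loader` argument are hypothetical names, and the pattern only helps callers within one process.

```python
import threading


class PlatformListCache:
    """Hypothetical cache: the lock is taken only on a miss, never on a hit."""

    def __init__(self, loader):
        self._loader = loader        # expensive lookup, e.g. scanning SDK folders
        self._values = {}
        self._lock = threading.Lock()
        self.load_count = 0          # how many times the loader actually ran

    def get(self, key):
        try:
            return self._values[key]  # fast path: cache hit, no lock
        except KeyError:
            pass
        with self._lock:
            # Re-check under the lock: another thread may have filled the entry.
            if key not in self._values:
                self._values[key] = self._loader(key)
                self.load_count += 1
            return self._values[key]
```

Only the first caller for a given key pays the cold cost. Later callers return from the fast path without touching the lock.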

There are also a few instances of "exists" conditions that take 4-6 ms. Hopefully those results are cached.

@Therzok
Contributor

Therzok commented Jul 21, 2022

Disclaimer: not a maintainer, but afaik the CachingFileSystemWrapper is used for Exists evaluation.

@yuehuang010
Contributor Author

yuehuang010 commented Jul 21, 2022

Just thinking out loud, if the main MSBuild node could copy over its caches to the child nodes, then that would save load time.

@rokonec
Member

rokonec commented Jan 10, 2023

@yuehuang010 Is this still active? How serious do you think it is? What priority would you give it?

@yuehuang010
Contributor Author

yuehuang010 commented Jan 10, 2023 via email

@danmoseley
Member

Just thinking out loud, if the main MSBuild node could copy over its caches to the child nodes

Off topic, but is there still discussion of the possibility of moving some nodes into the same process, where tasks are known not to assume their own current directory and environment block? Although there would still be serialization costs without more rearchitecture, there would be other savings.

@yuehuang010
Contributor Author

Without going too crazy, I think focusing on the simple problem of GetLatestSDKTargetPlatformVersion() is good enough. Have only the initial node hold the ToolLocationHelper data, and have the other nodes request it from that node.

On the other hand, I hear that the multithreaded MSBuild is making progress; perhaps that is good enough.
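
The idea of having only the initial node hold the ToolLocationHelper data could look roughly like the sketch below. Everything here is hypothetical: `OwnerNode`, `WorkerNode`, and the `send_request` callable are illustration names, and a plain function call stands in for whatever inter-node channel would really carry the request.

```python
class OwnerNode:
    """Hypothetical initial node: resolves SDK data once and answers requests."""

    def __init__(self, resolve):
        self._resolve = resolve   # expensive lookup, stand-in for ToolLocationHelper
        self._answers = {}
        self.resolve_calls = 0    # how many times the expensive lookup ran

    def handle_request(self, sdk_identifier, sdk_version):
        key = (sdk_identifier, sdk_version)
        if key not in self._answers:
            self._answers[key] = self._resolve(sdk_identifier, sdk_version)
            self.resolve_calls += 1
        return self._answers[key]


class WorkerNode:
    """Hypothetical child node: never resolves locally, always asks the owner."""

    def __init__(self, send_request):
        self._send_request = send_request  # stand-in for the inter-node channel

    def latest_platform_version(self, sdk_identifier, sdk_version):
        return self._send_request(sdk_identifier, sdk_version)
```

With several `WorkerNode` instances wired to one `OwnerNode.handle_request`, the expensive lookup runs once per distinct (identifier, version) pair. In a real multi-process build each request would still pay a serialization cost, as noted earlier in the thread.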

@rokonec rokonec removed their assignment Feb 8, 2023