Skip to content

Conversation

ccastanedaucf
Copy link
Contributor

@ccastanedaucf ccastanedaucf commented Feb 4, 2025

Context

This implements the full node lifecycle and execution flow for an out-of-proc RAR node via an additional nodemode. This does not yet execute the RAR task itself, or serialize anything other than empty dummy payloads - enabling the feature will only run a test message and then fall back to in-proc execution.

Changes Made

New RAR nodemode

  • Setting $env:MSBuildRarNode=1 conditions the RarNodeLauncher to create a new instance of msbuild.exe /nodemode:3. This runs a persistent out-of-proc RAR node which can receive serialized requests from a RAR task client in another process, execute the task, and serialize back the outputs.
  • IPC and serialization logic reuses existing MSBuild node infrastructure.
  • Currently, this does not actually invoke the RAR task, and the request/response objects are empty data structures - but it does implement the full execution and communication flow between the client and server. The only component missing for the feature to be functional (aka executing the RAR task) is populating the request/response with the real RAR inputs/outputs.

Conditioning execution

  • EngineServices.IsOutOfProcRarNodeEnabled tells the RAR task whether to run out-of-proc. This is necessary since we want the task to avoid trying to connect to the node if MSBuild was launched without the flag enabled, which can differ between builds. This is not easily achieved via environment variables due to node reuse.
  • MSBuild property ResolveAssemblyReferencesOutOfProcess can be used to exclude a specific target from running out-of-proc.

RAR Out-Of-Proc Node

  • The out-of-proc node manages two pipe servers - a single-instance server for handling the lifecycle of the node and preventing duplicate nodes from running, and a multi-instance server for executing multiple RAR tasks concurrently.

RAR Out-Of-Proc Client

  • A per-build-node client is shared across all RAR invocations within a build by registering the instance to the host BuildEngine. This ensures that we don't pay overhead cost of reconnecting / repeating handshakes for every task, while still disposing of the client between builds to prevent an idle reused MSBuild node from holding on to a server instance.

Testing

You can validate the feature is functional by setting $env:MSBuildRarNode=1 and checking tracing logs + active processes. Untested code is mostly glue and orchestrating pipe connections. The most brittle code (IPC / serialization) already has test coverage from other MSBuild node implementations.

Notes

  • Env var should be replaced by an actual MSBuild flag in the future, but holding off as the feature is still WIP
  • Need feedback for consistent naming scheme (e.g. is this the RAR out-of-proc service or RAR node? Should RAR be abbreviated everywhere in the node-related types?)

@JanKrivanek
Copy link
Member

Would you be able to separate the pure code moves and actual code changes? It's fine to keep all in a single PR, but separate commits would be great.

It'll help speed up and focus the reviews.

@ccastanedaucf ccastanedaucf force-pushed the dev/chcasta/rar-pr-1 branch 2 times, most recently from e41a063 to 4ba973c Compare February 6, 2025 07:36
@ccastanedaucf
Copy link
Contributor Author

ccastanedaucf commented Feb 21, 2025

FYI I'm separating the code moves (named pipe / IPC stuff) into another PR since I've been further consolidating the IPC code that's duplicated across all node implementations, and that's a bit more risk given the different code paths for the APM and TPL-based versions due to .NET 3.5 compat for the TaskHost. NodeProviderOutOfProcBase, NodeEndpointOutOfProcBase, MSBuildClient, and MSBuildClientPacketPump all use effectively the same logic for setting up the client and server, handshake, sending and receiving packets, serialization, ect., in a way that isn't really reusable via inheritance. Cuts out a pretty big chunk of code from the RAR node implementation too.

Will be updating the main work item with the overall design / ordering of PRs so it helps to see where this is going, since there's still ~5 or so main components after this. But that should be the only real code move / refactoring required for the whole project.

Copy link
Member

@SimaTian SimaTian left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So to sum up, we're moving the communication stuff into a common section for communicating nodes and then creating a new type of node.
Looks fine to me.

@SimaTian
Copy link
Member

SimaTian commented Mar 4, 2025

There are some minor conflicts, probably due to the perf work from @Erarndt we recently merged.
You have more context than me @ccastanedaucf , can you take a look please?

@ccastanedaucf
Copy link
Contributor Author

Relevant changes are in second commit - I just rebased on the active IPC pr since this depends on it.

@ccastanedaucf
Copy link
Contributor Author

ccastanedaucf commented Mar 19, 2025

Updated PR description to accurately represent changes since all of the IPC code was extracted into #11546

Can ignore the first commit and merge conflicts since it's just a result of copying over #11546 (will rebase once that's merged)

@ccastanedaucf
Copy link
Contributor Author

ccastanedaucf commented Mar 27, 2025

Kk since the IPC stuff got merged, this is also good to go other than this compile error I hit after rebasing:

D:\src\msbuild\rawr-pr-1\src\Shared\NodePipeBase.cs(238,23): error CS0246: The type or namespace name 'ValueTask<>' could not be found (are you missing a using directive or an assembly reference?) [D:\src\msb
uild\rawr-pr-1\src\Tasks\Microsoft.Build.Tasks.csproj::TargetFramework=netstandard2.0]

I think this is from the change which swapped Task for ValueTask in some hot paths.

It seems like referencing System.Threading.Channels pulls in ValueTask correctly (since we don't seem have a ref for System.Threading.Tasks.Extensions), but then I get the additional pre-built error in CI:

.packages/microsoft.dotnet.arcade.sdk/9.0.0-beta.25164.2/tools/SourceBuild/AfterSourceBuild.proj#L81

.packages/microsoft.dotnet.arcade.sdk/9.0.0-beta.25164.2/tools/SourceBuild/AfterSourceBuild.proj(81,5): error : (NETCORE_ENGINEERING_TELEMETRY=AfterSourceBuild) 1 new pre-builts discovered! Detailed usage report can be found at /__w/1/s/artifacts/sb/prebuilt-report/baseline-comparison.xml.
See https://aka.ms/dotnet/prebuilts for guidance on what pre-builts are and how to eliminate them.
Package IDs are:
Microsoft.Bcl.AsyncInterfaces.9.0.0

Anything that can be done to resolve this? Ideally I would have access to Channels here anyways since a future PR is planning to use it for queueing log messages.

Otherwise the simplest thing would to swap NodePipeBase back to using Task to get this unblocked.

@ccastanedaucf ccastanedaucf requested a review from a team as a code owner April 4, 2025 20:28
@ccastanedaucf ccastanedaucf force-pushed the dev/chcasta/rar-pr-1 branch 5 times, most recently from 0c16727 to d46b7c8 Compare April 5, 2025 01:40
JanProvaznik added a commit that referenced this pull request Apr 7, 2025
### Context

Minimum change to get shared IPC added in #11546 recompiling again for
#11383 (since the original PR was reverted).

### Changes Made

- Splits up `DeserializeAndRoutePacket()` in `PacketFactoryRecord`
(private class)
- Adds `DeserializePacket()` to `INodePacketFactory` and all
implementations. Does not remove old method.
- Adds ifdef's to shared pipe classes for compilation purposes

No behavior changes to MSBuild.
@ccastanedaucf ccastanedaucf force-pushed the dev/chcasta/rar-pr-1 branch from d46b7c8 to 3d9010d Compare April 7, 2025 18:30
Copy link
Member

@JanProvaznik JanProvaznik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO solid and we want to move forward with this effort. Some questions in code.

also
Is the node supposed to timeout ever?

@ccastanedaucf ccastanedaucf force-pushed the dev/chcasta/rar-pr-1 branch from 3d9010d to a37ff7f Compare May 7, 2025 23:44
@ccastanedaucf
Copy link
Contributor Author

ccastanedaucf commented May 8, 2025

Had to force push to test off main, just small changes starting from ~6 Rename / flip out-of-proc override - ab8eb4b

@SimaTian
Copy link
Member

cc @rainersigwald please final look and we're good to go?

Copy link
Member

@rainersigwald rainersigwald left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you flip the default does this cause VS perf/functionality tests to fail? Would love to know that as early as possible.

@SimaTian SimaTian merged commit edd6baf into dotnet:main May 26, 2025
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants