Allow non-SwiftPM build systems to have larger indexing batch sizes #2238
Oh interesting, thanks for looking into this @rockbruno! Have you been able to run any experiments with this to see if it helps much? I'm mostly wondering whether half the processor count actually makes sense here: it would seem to imply that the build system itself isn't very parallel.
Another question is whether we should hook this up to `maxCoresPercentageToUseForBackgroundIndexing`. It's currently marked as internal and right now controls the number of background tasks. It seems like a nice way to easily control this as well: it could default to 0.5 instead of the 1 it does today, though that would also reduce the number of background tasks 🤔. Similar to above, I wonder what the interaction between these two ends up being.
EDIT: Thinking about it further, I'm not sure this should be split according to processor count at all. Really this seems more like a trade-off in the granularity of when we can start indexing. At one end (e.g. a single batch) we allow the build system to be as parallel as it can be, but we can't start indexing until everything is finished. At the other (one target per preparation) the build system doesn't have much opportunity to run in parallel (unless it's a high-level target), but we can start indexing as soon as each target comes back. Maybe the better option here is to just add an option to control it so that users can tweak it as they desire?
@bnbarham Having it customizable sounds better, yeah. For Bazel specifically the bottleneck is that starting Bazel takes some time, so individual requests are crazy slow. Having one request that builds tons of targets is much better.
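The fixed-size batching discussed above can be illustrated with a simple chunking helper. This is a hypothetical sketch, not the actual SourceKit-LSP code; `batches(of:size:)` is an illustrative name:

```swift
// Hypothetical sketch of fixed-size preparation batching.
func batches<T>(of elements: [T], size: Int) -> [[T]] {
    // Guard against a non-positive batch size by falling back to a single batch.
    guard size > 0 else { return elements.isEmpty ? [] : [elements] }
    return stride(from: 0, to: elements.count, by: size).map { start in
        Array(elements[start..<min(start + size, elements.count)])
    }
}

// With 10 targets and a batch size of 3, this yields batches of 3, 3, 3, and 1
// targets, matching the shape asserted in the test further down this thread.
let targetNames = (0..<10).map { "dummy://dummy-\($0)" }
let result = batches(of: targetNames, size: 3)
```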
Force-pushed from 19831b7 to 9fbdeeb.
Very cool stuff 🤩 Chiming in with some thoughts on how to determine the batch size.
Review comment on `Sources/BuildServerProtocol/Messages/InitializeBuildRequest.swift` (outdated; resolved).
XCTAssertEqual(preparedTargetBatches[0].count, 3)
XCTAssertEqual(preparedTargetBatches[1].count, 3)
XCTAssertEqual(preparedTargetBatches[2].count, 3)
XCTAssertEqual(preparedTargetBatches[3].count, 1)
This test will crash if `preparationTasks.value.count < 4`. We should early-exit the test if `preparationTasks.value.count != 4`.
@ahoppen I like it, sounds like a mix between the two approaches (I force-pushed over the previous code, but it was using processors / 2 as the number). I'm going to run some tests and see what seems to work better here for Bazel, to get an idea of how high this can be.
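For reference, the processor-count-based sizing from the earlier revision might have looked something like this. This is a sketch of the described idea ("processors / 2"), not the actual force-pushed code:

```swift
import Foundation

// Sketch: derive the preparation batch size from the machine's core count,
// clamped so it never drops below one target per batch.
let batchSize = max(1, ProcessInfo.processInfo.activeProcessorCount / 2)
```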
Force-pushed from 36658d6 to f15b4da.
Out of interest, is it possible to run Bazel as e.g. a daemon in order to avoid that startup time? And how are requests for compiler arguments handled? Is Bazel run once to get all the commands, which are then cached?
@bnbarham @ahoppen I gave this another run. I ran some quick benchmarks with our example project and this is what I got:
Important to note that the times vary heavily based on the order in which SourceKit-LSP builds targets internally (which is not always the same), so there's some luck involved for the batch size 1 case. But for my test project at least, it seemed that the more parallelism the better (since Bazel is good at it). Based on this, however, I thought it would be better to make a couple of changes:
@bnbarham For your Bazel questions:
I am wondering whether the customization option should come from the build system or the user. My feeling is that the ideal value depends more on the project that the user is opening, which means that the optimal preparation batch size should also be customizable on a per-project level in SourceKit-LSP.
I think we should continue to default to
Have you had a chance to try the file-count-based approach I mentioned in #2238 (comment)? The more I think about it, the more I believe that file count is a better measure here than target count, because targets can vary drastically in size. It would also make it easier to interpret the results of your measurements, since I don’t know how big the targets you are testing are, but I do have a rough idea of how big a typical source file is.
I think the file count has the same issue of a potentially different sweet spot per project (generated protobuf sources are huge yet are "just" one file, for example), but like you said, in the majority of projects this is probably relatively predictable. Personally I find the number-of-targets option easier to reason about as a user, but using file counts also feels like a valid choice. For either choice, though, I think it would be valuable to let those with a more unusual project structure fine-tune it if the default becomes a bottleneck for some structural reason. Let me know what you prefer for this PR! Will make the change to make
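The file-count-based strategy under discussion could be sketched as follows. The `Target` type and `fileCountBatches` function are illustrative names, not SourceKit-LSP API:

```swift
// Hypothetical sketch of file-count-based batching: accumulate targets into a
// batch until the batch contains more source files than the threshold, then
// start a new batch.
struct Target {
    let name: String
    let fileCount: Int
}

func fileCountBatches(_ targets: [Target], threshold: Int) -> [[Target]] {
    var result: [[Target]] = []
    var current: [Target] = []
    var filesInCurrent = 0
    for target in targets {
        current.append(target)
        filesInCurrent += target.fileCount
        if filesInCurrent > threshold {
            // Batch exceeded the threshold; close it and start a new one.
            result.append(current)
            current = []
            filesInCurrent = 0
        }
    }
    if !current.isEmpty {
        result.append(current)
    }
    return result
}

// A single huge target (e.g. generated protobuf code) closes its batch
// immediately, while small targets get grouped together.
let sample = [
    Target(name: "Proto", fileCount: 1_000),
    Target(name: "A", fileCount: 5),
    Target(name: "B", fileCount: 5),
]
let grouped = fileCountBatches(sample, threshold: 100)
```

This illustrates the sweet-spot concern from the comment above: one protobuf-style target dominates a batch on its own, regardless of the threshold.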
This reverts commit 19491c9.
Force-pushed from 1453170 to a61079c.
Sorry for the delayed response, I needed to think about how best to specify the target batching strategy. Here are my thoughts: I want to allow ourselves to evolve the target batching strategy in the future and thus don’t want to include any batch-size-related options in our BSP extensions, since we want to keep those stable. Instead, the target batching strategy should be configured in the configuration file, which we can evolve more easily. This also has the advantage that users can adjust the strategy to find the optimal values for their project instead of the BSP server deciding for them. I would propose a configuration schema similar to the following inside the configuration file:
{
  "preparationBatchingStrategy": {
    "description": "If the BSP server supports preparation of multiple targets in a batch, controls the size of the batches. Note that SwiftPM currently does not support batched target preparation, so this has no effect on Swift packages. Adjusting these options might improve the performance of background preparation depending on the BSP server in use. The available batching strategies may change in the future.",
    "oneOf": [
      {
        "type": "object",
        "description": "Prepare a fixed number of targets in a single batch",
        "properties": {
          "strategy": {
            "const": "target"
          },
          "batchSize": {
            "type": "integer",
            "description": "Defines how many targets should be prepared in a single batch"
          }
        },
        "required": [
          "strategy",
          "batchSize"
        ]
      },
      {
        "type": "object",
        "description": "Accumulate targets into a single batch until the batch contains more than the specified number of source files",
        "properties": {
          "strategy": {
            "const": "files"
          },
          "files": {
            "type": "integer",
            "description": "Accumulate targets in a target batch until they contain more files than specified by this property."
          }
        },
        "required": [
          "strategy",
          "files"
        ]
      }
    ]
  }
}

As for finding a default strategy: you have the most experience and should pick a reasonable value. If you can, I would appreciate it if you could back the choice of that value up with some kind of measurement in a doc comment so it’s less of a magic number; nothing too sophisticated, just something we can read in a year and follow the steps that led to this value.

We can make the file-based batching strategy a follow-up PR, but I’d really like to see how it behaves. As you noted, files may also differ in size, but in my experience the file count is a more accurate representation of compile time than the number of targets.

As a side note, allowing us to generate the

@bnbarham, does this make sense to you as well? Do you have anything to add?
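A model for the proposed schema could be decoded along these lines. This is a sketch of the proposal only; the Swift type and key names are illustrative, not SourceKit-LSP’s actual options code:

```swift
import Foundation

// Hypothetical model for the proposed "preparationBatchingStrategy" schema.
enum PreparationBatchingStrategy: Decodable, Equatable {
    case target(batchSize: Int)
    case files(Int)

    private enum CodingKeys: String, CodingKey {
        case strategy, batchSize, files
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        switch try container.decode(String.self, forKey: .strategy) {
        case "target":
            self = .target(batchSize: try container.decode(Int.self, forKey: .batchSize))
        case "files":
            self = .files(try container.decode(Int.self, forKey: .files))
        case let other:
            // Unknown strategies fail decoding, which keeps room to evolve
            // the set of strategies later.
            throw DecodingError.dataCorruptedError(
                forKey: .strategy,
                in: container,
                debugDescription: "Unknown batching strategy: \(other)"
            )
        }
    }
}

let json = Data(#"{"strategy": "target", "batchSize": 4}"#.utf8)
let decoded = try JSONDecoder().decode(PreparationBatchingStrategy.self, from: json)
```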
I have a couple comments on the test case but its shape looks really good.
private let projectRoot: URL
private var testFileURL: URL { projectRoot.appendingPathComponent("test.swift").standardized }

nonisolated(unsafe) var preparedTargetBatches = [[BuildTargetIdentifier]]()
If you make `BuildServer` an `actor`, you don’t need `nonisolated(unsafe)`.
final class BuildServer: CustomBuildServer {
  let inProgressRequestsTracker = CustomBuildServerInProgressRequestTracker()
  private let projectRoot: URL
  private var testFileURL: URL { projectRoot.appendingPathComponent("test.swift").standardized }
Do we need `.standardized` here? I suppose you copied this from `testBuildServerUsesStandardizedFileUrlsInsteadOfRealpath`, which needed to use `.standardized` because it specifically tested behavior around `standardized` vs `realpath`.
func buildTargetSourcesRequest(_ request: BuildTargetSourcesRequest) async throws -> BuildTargetSourcesResponse {
  var dummyTargets = [BuildTargetIdentifier]()
  for i in 0..<10 {
    dummyTargets.append(BuildTargetIdentifier(uri: try! URI(string: "dummy://dummy-\(i)")))
I think this `try!` could just be a `try`. It just removes one possibility that might lead to the test crashing, and resolves my allergic reaction to `try!` 😉

Same below.
for i in 0..<10 {
  dummyTargets.append(BuildTargetIdentifier(uri: try! URI(string: "dummy://dummy-\(i)")))
}
return BuildTargetSourcesResponse(items: dummyTargets.map {
Oh, interesting that this even works. The BSP server only reports a single target here, and we return more targets than were included in the `BuildTargetSourcesRequest`, which is actually a bug in SourceKit-LSP. We should implement `workspaceBuildTargetsRequest` and return all 10 targets from it, and ideally filter to only return the requested targets here (or at least add an assertion that the request contains all 10 targets).
try await project.testClient.send(SynchronizeRequest(index: true))

let buildServer = try project.buildServer()
let preparedBatches = buildServer.preparedTargetBatches.sorted { $0[0].uri.stringValue < $1[0].uri.stringValue }
Not that I would expect to receive this, but this will crash if a preparation batch is empty. It might be easier to just make this `let preparedBatches = Set(buildServer.preparedTargetBatches)` to make it order-invariant. Or just make `preparedTargetBatches` inside the `BuildServer` a `Set<Set<BuildTargetIdentifier>>` to start with.
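The suggested order-invariant comparison could look like this, with `String` standing in for `BuildTargetIdentifier` (which would need to be `Hashable`) purely for illustration:

```swift
// Sketch: comparing preparation batches as a Set of Sets makes the assertion
// independent of both the batch order and the target order within each batch.
let preparedTargetBatches: Set<Set<String>> = [
    ["dummy://dummy-0", "dummy://dummy-1"],
    ["dummy://dummy-2"],
]
let expected: Set<Set<String>> = [
    ["dummy://dummy-2"],
    ["dummy://dummy-1", "dummy://dummy-0"],
]
let matches = preparedTargetBatches == expected
```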
@@ -222,6 +213,9 @@ package final actor SemanticIndexManager {
  /// The parameter is the number of files that were scheduled to be indexed.
  private let indexTasksWereScheduled: @Sendable (_ numberOfFileScheduled: Int) -> Void

  /// The size of the batches in which the `SemanticIndexManager` should dispatch preparation tasks.
  private let indexTaskBatchSize: Int
Let’s name this `preparationBatchSize`.
Makes sense to me; it being in the LSP configuration is what I was thinking of in my:

Limiting it to just the `target` strategy in this PR also sounds good to me.
Interesting, good to know, thanks.
I noticed that sourcekit-lsp forces the batch size to be 1 for SwiftPM reasons, but nowadays there are other ways to run it. On the Bazel BSP, for example, we have no issues building tons of targets at the same time, so it seemed feasible to keep the existing limitation for SwiftPM but allow other build systems to parallelize more indexing tasks.