Allow non-SwiftPM build systems to have larger indexing batch sizes #2238

Open · wants to merge 8 commits into main from rockbruno/batchSize

Conversation

@rockbruno (Contributor) opened this pull request:

I noticed that sourcekit-lsp forces the batch size to be 1 for SwiftPM reasons, but nowadays there are other ways to run it. On the Bazel BSP for example we have no issues building tons of targets at the same time, so it seemed feasible to keep the existing limitation for SwiftPM but allow other build systems to parallelize more index tasks.

@bnbarham (Contributor) left a comment:

Oh interesting, thanks for looking into this @rockbruno! Have you been able to run any experiments with this to see if it helps much? I'm mostly wondering whether half the processor count actually makes sense here - it would seem to imply that the build system itself isn't very parallel.

Another question is whether we hook this up to `maxCoresPercentageToUseForBackgroundIndexing`. It's currently marked as internal and right now controls the number of background tasks. It seems like a nice way to easily control this as well - it could default to 0.5 instead of the 1 it does today, though that would also reduce the number of background tasks 🤔. Similar to above, I wonder what the interaction between these two ends up being.

EDIT: Thinking about it further, I'm not convinced this should be split according to processor count at all. Really this seems more like a trade-off in the granularity of when we can start indexing. At one end (e.g. a single batch) we allow the build system to be as parallel as it can be, but we can't start indexing until everything is finished. At the other (one target per preparation) the build system doesn't have much opportunity to run in parallel (unless it's a high-level target), but we can start indexing as soon as each preparation comes back. Maybe the better option here is to just add an option to control it so that users can tweak it as they desire?
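
To make that trade-off concrete, here is a minimal, hypothetical chunking helper (not code from this PR) showing the two extremes a batch size moves between:

```swift
// Sketch only: fixed-size batching of preparation targets.
// size == targets.count -> one batch: the build system gets maximal
//                          parallelism, but indexing waits on everything.
// size == 1             -> indexing can start after each target, but each
//                          build invocation is a tiny unit of work.
func batches<T>(of targets: [T], size: Int) -> [[T]] {
  precondition(size > 0, "batch size must be positive")
  return stride(from: 0, to: targets.count, by: size).map {
    Array(targets[$0..<min($0 + size, targets.count)])
  }
}
```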

@rockbruno (Contributor, Author):

@bnbarham Having it customizable sounds better, yeah. For Bazel specifically the bottleneck is that starting Bazel takes some time, so individual requests are crazy slow. So having one request that builds tons of targets is much better.

@rockbruno force-pushed the rockbruno/batchSize branch from 19831b7 to 9fbdeeb on August 14, 2025 at 08:12
@ahoppen (Member) left a comment:

Very cool stuff 🤩 Chiming in with some thoughts on how to determine the batch size.

```swift
XCTAssertEqual(preparedTargetBatches[0].count, 3)
XCTAssertEqual(preparedTargetBatches[1].count, 3)
XCTAssertEqual(preparedTargetBatches[2].count, 3)
XCTAssertEqual(preparedTargetBatches[3].count, 1)
```
@ahoppen (Member) commented:

This test will crash if preparationTasks.value.count < 4. We should early exit the test if preparationTasks.value.count != 4.
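
A minimal sketch of that early exit, assuming an XCTest context and the batch array under test:

```swift
// Sketch: fail with a message instead of crashing on an
// out-of-bounds subscript.
let batches = buildServer.preparedTargetBatches
guard batches.count == 4 else {
  XCTFail("Expected 4 preparation batches, got \(batches.count)")
  return
}
XCTAssertEqual(batches[0].count, 3)
// ...remaining assertions as above
```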

@rockbruno (Contributor, Author):

@ahoppen I like it - it sounds like a mix between the two approaches (I force-pushed the previous code, but it was using processors / 2 as the number). I'm gonna run some tests and see what seems to work better here for Bazel, to get an idea of how high this can be.

@rockbruno changed the title from "Allow non-SwiftPM build systems to have larger indexing batch sizes" to "(WIP) Allow non-SwiftPM build systems to have larger indexing batch sizes" on Aug 14, 2025
@rockbruno force-pushed the rockbruno/batchSize branch 2 times, most recently from 36658d6 to f15b4da on August 14, 2025 at 14:01
@bnbarham (Contributor) commented on Aug 14, 2025:

> For Bazel specifically the bottleneck is that starting Bazel takes some time, so individual requests are crazy slow

Out of interest, is it possible to run Bazel as e.g. a daemon in order to avoid that startup time? How are requests for compiler arguments handled - is Bazel run once to get all the commands, which are then cached?

@rockbruno (Contributor, Author) commented on Aug 15, 2025:

@bnbarham @ahoppen Gave this another run. I ran some quick benchmarks with our example project and this is what I got:

  • Batch size 1: 1 minute 30 seconds
  • Batch size 4: 30 seconds
  • Batch size 8: 20 seconds

Important to note: the times vary heavily based on the order in which SK builds targets internally (which is not always the same), so there's some luck involved in the batch size 1 case. But for my test project at least, it seemed that the more parallelism the better (since Bazel is good at it).

Based on this however I thought it would be better to make a couple of changes:

  • I would suggest defaulting to false instead of the other way around, mostly to avoid accidentally breaking something on the other build systems that we didn't test.
  • While we can have SK have a default batch size, I think it will be good to still allow this to be customized for two reasons:
    • I think the "sweet spot" depends on the build system capabilities and project complexity. For my test project for example it seems that building as much as possible is the best choice, but I'm not sure that would be the best choice for the main Spotify project (larger targets, so longer downtime / more cancellations issued from the LSP side). So I think that each project will have its own sweet spot that is not necessarily something like cpu / 2.
    • (less relevant) I wasn't sure how to properly unit test this without allowing it to be changed :)

@bnbarham For your Bazel questions:

  • Bazel uses a daemon already, but we have some overhead to 1) establish a connection to the server, and 2) determine whether the invocation will result in Bazel needing to re-calculate some info internally, which can take a while depending on the size of the repo. So doing something like bazel build targetA targetB targetC will always be faster than individual calls.

@rockbruno changed the title from "(WIP) Allow non-SwiftPM build systems to have larger indexing batch sizes" to "Allow non-SwiftPM build systems to have larger indexing batch sizes" on Aug 15, 2025
@ahoppen (Member) commented on Aug 15, 2025:

> While we can have SK have a default batch size, I think it will be good to still allow this to be customized for two reasons:

I am wondering whether the customization option should come from the build system or the user. My feeling is that the ideal value depends more on the project the user is opening, which means that the optimal preparation batch size should also be customizable on a per-project level in SourceKit-LSP.

> I would suggest defaulting to false instead of the other way around, mostly to avoid accidentally breaking something on the other build systems that we didn't test.

I think we should continue to default to true because:

  • The BSP extension was always explicitly shaped to allow parallel preparation of multiple tasks, so if a BSP server can’t handle multiple targets, that’s definitely a bug on their side.
  • I am not aware of any BSP servers besides the built-in SwiftPM server and your Bazel BSP server now that support background preparation.
  • I don’t want to be stuck with a historically motivated default for the future. If a new BSP server gets implemented, it should support parallel preparation of targets.

I think the "sweet spot" depends on the build system capabilities and project complexity. For my test project for example it seems that building as much as possible is the best choice, but I'm not sure that would be the best choice for the main Spotify project (larger targets, so longer downtime / more cancellations issued from the LSP side). So I think that each project will have its own sweet spot that is not necessarily something like cpu / 2.

Have you had a chance to try the file count-based approach I mentioned in #2238 (comment)? The more I think about it, the more I believe that file count is a better measure here than target count, because targets can vary drastically in size. It would also make it easier to interpret the results of your measurements: I don't know how big the targets you are testing are, but I do have a rough idea of how big a typical source file is.
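
For illustration, the file count-based accumulation could look roughly like this (hypothetical names, not SourceKit-LSP code): targets are appended to the current batch until the batch's total source file count crosses a threshold.

```swift
// Sketch of file count-based batching: batch boundaries are drawn by
// accumulated file count rather than by a fixed number of targets.
func batchesByFileCount(
  targets: [(id: String, fileCount: Int)],
  threshold: Int
) -> [[String]] {
  var batches: [[String]] = []
  var current: [String] = []
  var filesInCurrent = 0
  for target in targets {
    current.append(target.id)
    filesInCurrent += target.fileCount
    if filesInCurrent >= threshold {
      batches.append(current)
      current = []
      filesInCurrent = 0
    }
  }
  if !current.isEmpty {
    batches.append(current)
  }
  return batches
}
```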

@rockbruno (Contributor, Author):

I think the file count has the same issue of having a potentially different sweet spot per project (protobuf-generated specs are huge yet are "just" one file, for example), but like you said, you could probably assume that in the majority of projects this would be relatively predictable. Personally I find the number-of-targets approach easier to reason about as a user, but using file counts also feels like a valid choice. Either way, I think it would be valuable to let those with a more unusual project structure fine-tune the value if the default causes things to be too bottleneck-y for some structural reason. Let me know what you prefer for this PR!

Will make the change to default to true shortly.

@rockbruno force-pushed the rockbruno/batchSize branch from 1453170 to a61079c on August 18, 2025 at 07:58
@ahoppen (Member) commented on Aug 20, 2025:

Sorry for the delayed response, I needed to think about how best to specify the target batching strategy. Here are my thoughts:

I want to allow ourselves to evolve the target batching strategy in the future and thus don’t want to include any batch size-related options in our BSP extensions, since we want to keep those stable. Instead, the target batching strategy should be configured in the configuration file, which we can evolve more easily. This also has the advantage that users can adjust the strategy to find the optimal values for their project instead of the BSP server deciding for them.

I would propose a configuration schema similar to the following inside the `index` key of the configuration file:

```json
{
  "preparationBatchingStrategy": {
    "description": "If the BSP server supports preparation of multiple targets in a batch, controls the size of the batches. Note that SwiftPM currently does not support batched target preparation, so this has no effect on Swift packages. Adjusting these options might improve performance of background preparation based on the used BSP server. The available batching strategies may change in the future.",
    "oneOf": [
      {
        "type": "object",
        "description": "Prepare a fixed number of targets in a single batch",
        "properties": {
          "strategy": {
            "const": "target"
          },
          "batchSize": {
            "type": "integer",
            "description": "Defines how many targets should be prepared in a single batch"
          }
        },
        "required": [
          "strategy",
          "batchSize"
        ]
      },
      {
        "type": "object",
        "description": "Accumulate targets into a single batch until they contain more than the specified number of source files",
        "properties": {
          "strategy": {
            "const": "files"
          },
          "files": {
            "type": "integer",
            "description": "Accumulate targets in a target batch until they contain more files than specified by this property."
          }
        },
        "required": [
          "strategy",
          "files"
        ]
      }
    ]
  }
}
```
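
For a sense of how this `oneOf` might surface in Swift, a hypothetical Codable model (not the actual SourceKit-LSP options code) could use `"strategy"` as the discriminator:

```swift
// Hypothetical model for the proposed schema; the real implementation
// would live in SourceKit-LSP's options/ConfigSchemaGen machinery.
enum PreparationBatchingStrategy: Codable {
  case target(batchSize: Int)
  case files(files: Int)

  private enum CodingKeys: String, CodingKey {
    case strategy, batchSize, files
  }

  init(from decoder: Decoder) throws {
    let container = try decoder.container(keyedBy: CodingKeys.self)
    switch try container.decode(String.self, forKey: .strategy) {
    case "target":
      self = .target(batchSize: try container.decode(Int.self, forKey: .batchSize))
    case "files":
      self = .files(files: try container.decode(Int.self, forKey: .files))
    case let other:
      throw DecodingError.dataCorruptedError(
        forKey: .strategy, in: container,
        debugDescription: "Unknown batching strategy: \(other)"
      )
    }
  }

  func encode(to encoder: Encoder) throws {
    var container = encoder.container(keyedBy: CodingKeys.self)
    switch self {
    case .target(let batchSize):
      try container.encode("target", forKey: .strategy)
      try container.encode(batchSize, forKey: .batchSize)
    case .files(let files):
      try container.encode("files", forKey: .strategy)
      try container.encode(files, forKey: .files)
    }
  }
}
```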

As for finding a default strategy: you have the most experience here, so you should pick a reasonable value. If you can, I would appreciate it if you could back that choice up with some kind of measurement in a doc comment so it’s less of a magic number - nothing too sophisticated, just something we can read in a year and use to follow the steps that led to this value.

We can make the file-based batching strategy a follow-up PR, but I’d really like to see how it behaves. As you noted, files may also differ in size, but in my experience file count is a more accurate representation of compile time than the number of targets. As a side note, generating the `oneOf` in the schema above will likely need quite a bit of new functionality in ConfigSchemaGen. If we only stick to the target-based strategy, we should only need support for the `const` key in the JSON schema, which should be a lot easier to accomplish.

@bnbarham, does this make sense to you as well? Do you have anything to add?

@ahoppen (Member) left a comment:

I have a couple of comments on the test case, but its shape looks really good.

```swift
private let projectRoot: URL
private var testFileURL: URL { projectRoot.appendingPathComponent("test.swift").standardized }

nonisolated(unsafe) var preparedTargetBatches = [[BuildTargetIdentifier]]()
```
@ahoppen (Member) commented:

If you make `BuildServer` an actor, you don’t need `nonisolated(unsafe)`.
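
A rough sketch of that change, assuming the `CustomBuildServer` test protocol admits an actor conformance:

```swift
// Sketch: actor isolation makes the mutable array safe to share,
// so nonisolated(unsafe) is no longer needed.
actor BuildServer: CustomBuildServer {
  var preparedTargetBatches = [[BuildTargetIdentifier]]()

  func recordPreparedBatch(_ targets: [BuildTargetIdentifier]) {
    preparedTargetBatches.append(targets)
  }
  // ...remaining CustomBuildServer requirements as in the test
}
```

Callers would then `await` reads of `preparedTargetBatches`.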

```swift
final class BuildServer: CustomBuildServer {
  let inProgressRequestsTracker = CustomBuildServerInProgressRequestTracker()
  private let projectRoot: URL
  private var testFileURL: URL { projectRoot.appendingPathComponent("test.swift").standardized }
```
@ahoppen (Member) commented:

Do we need `.standardized` here? I suppose you copied this from `testBuildServerUsesStandardizedFileUrlsInsteadOfRealpath`, which needed to use `.standardized` because it specifically tested behavior around standardized vs. realpath URLs.

```swift
func buildTargetSourcesRequest(_ request: BuildTargetSourcesRequest) async throws -> BuildTargetSourcesResponse {
  var dummyTargets = [BuildTargetIdentifier]()
  for i in 0..<10 {
    dummyTargets.append(BuildTargetIdentifier(uri: try! URI(string: "dummy://dummy-\(i)")))
```
@ahoppen (Member) commented:

I think this `try!` could just be a `try`. It just removes one possibility that might lead to the test crashing, and resolves my allergic reaction to `try!` 😉

Same below
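
That is, keeping everything else the same - the enclosing function is already `async throws`, so the error propagates and fails the test instead of crashing:

```swift
dummyTargets.append(BuildTargetIdentifier(uri: try URI(string: "dummy://dummy-\(i)")))
```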

```swift
for i in 0..<10 {
  dummyTargets.append(BuildTargetIdentifier(uri: try! URI(string: "dummy://dummy-\(i)")))
}
return BuildTargetSourcesResponse(items: dummyTargets.map {
```
@ahoppen (Member) commented:

Oh, interesting that this even works. The BSP server only reports a single target here, yet we return more targets than were included in the BuildTargetSourcesRequest, which is actually a bug in SourceKit-LSP. We should implement `workspaceBuildTargetsRequest` and return all 10 targets from it, and ideally filter here to only return the requested targets (or at least add an assertion that the request contains all 10 targets).
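
A sketch of the suggested shape - the exact `BuildTargetSourcesResponse`/`SourcesItem` initializers here are assumptions, not verified API:

```swift
// Sketch: answer only for the targets SourceKit-LSP actually requested,
// rather than reporting targets the workspace never advertised.
func buildTargetSourcesRequest(_ request: BuildTargetSourcesRequest) async throws -> BuildTargetSourcesResponse {
  // Optionally: assert that all 10 dummy targets are in request.targets.
  return BuildTargetSourcesResponse(
    items: request.targets.map { SourcesItem(target: $0, sources: []) }
  )
}
```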

```swift
try await project.testClient.send(SynchronizeRequest(index: true))

let buildServer = try project.buildServer()
let preparedBatches = buildServer.preparedTargetBatches.sorted { $0[0].uri.stringValue < $1[0].uri.stringValue }
```
@ahoppen (Member) commented:

Not that I would expect to receive one, but this will crash if a preparation batch is empty. It might be easier to just make this `let preparedBatches = Set(buildServer.preparedTargetBatches)` to make it order-invariant. Or just make `preparedTargetBatches` inside the `BuildServer` a `Set<Set<BuildTargetIdentifier>>` to start with.
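
A sketch of the order-invariant comparison (arrays and sets of `Hashable` elements are themselves `Hashable`, so both variants work):

```swift
// Sketch: compare batches as sets so neither batch order, within-batch
// order, nor an empty batch can crash the test.
let preparedBatches = Set(buildServer.preparedTargetBatches.map(Set.init))
XCTAssertEqual(preparedBatches.filter { $0.count == 3 }.count, 3)
XCTAssertEqual(preparedBatches.filter { $0.count == 1 }.count, 1)
```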

```diff
@@ -222,6 +213,9 @@ package final actor SemanticIndexManager {
   /// The parameter is the number of files that were scheduled to be indexed.
   private let indexTasksWereScheduled: @Sendable (_ numberOfFileScheduled: Int) -> Void
 
+  /// The size of the batches in which the `SemanticIndexManager` should dispatch preparation tasks.
+  private let indexTaskBatchSize: Int
```
@ahoppen (Member) commented:

Let’s name this `preparationBatchSize`.

@bnbarham (Contributor):

> @bnbarham, does this make sense to you as well? Do you have anything to add?

Makes sense to me - it being in the LSP configuration is what I was thinking of in my earlier comment:

> Maybe the better option here is to just add an option to control it so that users can tweak it as they desire?

Limiting it to just target in this PR also sounds good to me.

> @bnbarham For your Bazel questions:
> Bazel uses a daemon already, but we have some overhead to 1) establish a connection to the server, and 2) determine whether the invocation will result in Bazel needing to re-calculate some info internally, which can take a while depending on the size of the repo. So doing something like `bazel build targetA targetB targetC` will always be faster than individual calls.

Interesting, good to know, thanks.
