Skip to content

Cache container names in CheckedIndex #1909

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jan 8, 2025

Conversation

ahoppen
Copy link
Member

@ahoppen ahoppen commented Jan 3, 2025

It is important that we cache this because we might find a lot of symbols in the same container for eg. workspace symbols (eg. consider many symbols in the same C++ namespace). If we didn't cache this value, then we would need to perform a primaryDefinitionOrDeclarationOccurrence lookup for all of these containers, which is expensive.

For example, searching for only in SourceKit-LSP’s codebase used to not return results in more than 20s (after which I gave up) and now returns >8000 results in <2s.

It is important that we cache this because we might find a lot of symbols in the same container for eg. workspace symbols (eg. consider many symbols in the same C++ namespace). If we didn't cache this value, then we would need to perform a `primaryDefinitionOrDeclarationOccurrence` lookup for all of these containers, which is expensive.

For example, searching for `only` in SourceKit-LSP’s codebase used to not return results in more than 20s (after which I gave up) and now returns >8000 results in <2s.

rdar://141412138
@ahoppen ahoppen requested a review from bnbarham January 3, 2025 15:00
@ahoppen
Copy link
Member Author

ahoppen commented Jan 3, 2025

@swift-ci Please test

Comment on lines 63 to 64
/// Maps the USR of a symbol’s container the name of that container as well as the name of all containers the
/// container might itself be contained in.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Maps the USR of a symbol’s container the name of that container as well as the name of all containers the
/// container might itself be contained in.
/// Maps the USR of a symbol to its name and the name of all its containers, from outermost to innermost.

return cached
}
let result: [String]
if let containerDefinition = primaryDefinitionOrDeclarationOccurrence(ofUSR: container.symbol.usr) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still going to be fairly slow for decls with many occurrences. It's at least limited by def/decl, but even then for something like llvm that would still end up reading many records (eg. any file with namespace llvm { in it). Instead we could just use forEachSymbolOccurrence and stop at the first one, since we don't actually care about getting a consistent decl here.

Also, could we add tests for extensions? Eg.

struct Foo {
  struct Bar {}
}

extension Foo.Bar {
  func bar() {}
}

IIUC today this will just give Bar.bar rather than Foo.Bar.bar since bar has childOf to the Foo.Bar extension (which has name Bar) but that extension doesn't have any childOf to Foo.

(I'm fine with all that in a follow up/an issue, since this PR is mostly just moving code around)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still going to be fairly slow for decls with many occurrences. It's at least limited by def/decl, but even then for something like llvm that would still end up reading many records (eg. any file with namespace llvm { in it). Instead we could just use forEachSymbolOccurrence and stop at the first one, since we don't actually care about getting a consistent decl here.

I just measured the performance impact of containerNames. When searching for only in SouceKit-LSP (roughly 8000 hits), we need ~800ms if we don’t compute the container names. Computing container names takes another ~800ms. If we use forEachSymbolOccurrence instead of primaryDefinitionOrDeclarationOccurrence we can save about 150ms of that container name computation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget about the exact measurements because I was using a debug build to gather them. But from a Release sample, container name computation still roughly doubles the time to search for only, which now takes ~800ms in total with your suggestion to use forEachSymbolOccurrence.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks!

This improves performance when searching for `only` in SourceKit-LSP by ~30%.
@ahoppen
Copy link
Member Author

ahoppen commented Jan 7, 2025

@swift-ci Please test

@ahoppen ahoppen merged commit 17e2db0 into swiftlang:main Jan 8, 2025
3 checks passed
@ahoppen ahoppen deleted the cache-container-names branch January 8, 2025 08:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants