Setup incremental indexing on file changes #323

Merged
merged 4 commits into from
Jan 20, 2022

Conversation

daplf
Contributor

@daplf daplf commented Dec 5, 2021

Fixes #272

This seems to work reasonably well for most cases. I only identified a couple of issues so far. I'll describe those at the end.

Up until now, indexing was done when the first file was compiled. The indexing covered the entire project including all the dependencies. After that, no more indexing was performed.

With this new approach, the first time a file is compiled, we still index every symbol (including dependencies). The difference is that on subsequent compilations, the changed files' declarations are re-indexed. Currently, this removes all the declarations of the changed files and then adds the new ones. This might not be the most performant approach (especially since most declarations are removed just to be added again right after), but I think it'll do for now (I didn't test with very large files).
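
The remove-then-add strategy described above can be sketched with a small in-memory index. This is only an illustration: `IndexedSymbol`, `InMemorySymbolIndex` and `reindexFile` are hypothetical names, and the real index is database-backed rather than a map.

```kotlin
// Hypothetical, simplified stand-ins; the real index is stored in a database.
data class IndexedSymbol(val fqName: String, val fileUri: String)

class InMemorySymbolIndex {
    // Declarations grouped by the file that declares them.
    private val symbolsByFile = mutableMapOf<String, List<IndexedSymbol>>()

    /** Drops every declaration previously recorded for [fileUri], then stores the new ones. */
    fun reindexFile(fileUri: String, declarations: List<String>) {
        symbolsByFile.remove(fileUri)
        if (declarations.isNotEmpty()) {
            symbolsByFile[fileUri] = declarations.map { IndexedSymbol(it, fileUri) }
        }
    }

    fun allSymbols(): List<IndexedSymbol> = symbolsByFile.values.flatten()
}
```

Because all entries for a file are replaced as a unit, re-indexing the same file twice leaves no stale or duplicate rows for it, at the cost of rewriting declarations that did not change.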

I initially planned to include the locations of the declarations in the index. Unfortunately, performance can take a big hit when indexing the dependencies because we're fetching locations for all the declarations available. We can probably still have them, but we'll need to come up with a better approach to populate them. I nonetheless kept the DB tables for the locations so we can use them in the future. I can remove these tables if you'd like.

As mentioned at the beginning, I have only found two issues so far:

  • When a project is opened, only the currently open file is compiled. All other files are parsed but not compiled. Even though only the currently open file is compiled, the other files' declarations are also added to the index. When those files are later opened, they are compiled, and that compilation adds their declarations again. The result is duplicate index entries for every file that was not compiled at the beginning. I don't see this as being too annoying for now, so I think we can solve it later. The solution is unclear to me, because I'm still wondering whether it even makes sense for the files not to be compiled on startup. For example, if a file has an error when I open the project, the error is only shown once I actually open that file (this seems a bit strange to me). In any case, there are multiple ways to solve this problem, but I think compiling everything at the beginning would be the best fix.
  • Nested classes are not included in the index yet. This can be fixed later on.
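
For the nested-class gap in the second item, one possible direction is a recursive walk over class declarations. The sketch below assumes a made-up `ClassDecl` model and a hypothetical `qualifiedNames` helper; it does not use the Kotlin compiler's descriptor API and is not what this PR implements.

```kotlin
// Hypothetical declaration model, not the Kotlin compiler's descriptor API.
data class ClassDecl(val name: String, val nested: List<ClassDecl> = emptyList())

/** Returns the qualified names of [decl] and of all classes nested inside it, depth-first. */
fun qualifiedNames(decl: ClassDecl, prefix: String = ""): List<String> {
    val fqName = if (prefix.isEmpty()) decl.name else "$prefix.${decl.name}"
    // Recurse with the current class as the prefix for its nested classes.
    return listOf(fqName) + decl.nested.flatMap { qualifiedNames(it, fqName) }
}
```

Each returned name could then be fed to the indexer in the same way as a top-level declaration.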

Note that this only re-indexes declarations on file changes. If a new dependency is added, we're not re-indexing yet. This can be done in a later PR.

@fwcd fwcd added code completion Auto completion enhancement New feature or request index Related to the symbol indexer labels Dec 6, 2021
@fwcd
Owner

fwcd commented Dec 6, 2021

Wow, many thanks for digging into this, looks really good already! Very excited to take a closer look at this, I hope to find some time to do so soon.

Owner

@fwcd fwcd left a comment

Nice work, I have tested it locally and it seems to work pretty well. Just a few comments (mostly for understanding).

Comment on lines 85 to 87
var progressFactory: Progress.Factory = object: Progress.Factory {
    override fun create(label: String): CompletableFuture<Progress> = CompletableFuture.supplyAsync { Progress.None }
}
Owner

Any reason for not using Progress.Factory.None? If it's important that the future completes asynchronously, we may want to update Progress.Factory.None itself since I don't think we use it elsewhere.

Contributor Author

Yeah, Progress.Factory.None makes more sense 👍
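
As a rough illustration of the suggestion, a shared no-op factory might look like the sketch below. The interfaces are trimmed-down, hypothetical versions of the ones under review (the project's real `Progress` API may differ), and `Indexer` is a placeholder for the class that holds `progressFactory`.

```kotlin
import java.util.concurrent.CompletableFuture

// Hypothetical, trimmed-down versions of the interfaces discussed in the review.
interface Progress {
    fun update(message: String? = null, percent: Int? = null)

    // No-op progress reporter.
    object None : Progress {
        override fun update(message: String?, percent: Int?) {}
    }

    interface Factory {
        fun create(label: String): CompletableFuture<Progress>

        // Shared no-op factory. It returns an already-completed future,
        // so no extra thread hop happens for a reporter that does nothing.
        object None : Factory {
            override fun create(label: String): CompletableFuture<Progress> =
                CompletableFuture.completedFuture<Progress>(Progress.None)
        }
    }
}

// Placeholder owner of the property; the default replaces the anonymous object.
class Indexer(var progressFactory: Progress.Factory = Progress.Factory.None)
```

If asynchronous completion turned out to matter, the body of `create` could use `CompletableFuture.supplyAsync` instead, which is the trade-off raised in the review comment.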

@@ -125,6 +124,9 @@ class SourcePath(
if (isTemporary) (all().asSequence() + sequenceOf(parsed!!)).toList()
else all()
}

// Creates a shallow copy
fun clone(): SourceFile = SourceFile(uri, content, path, parsed, compiledFile, compiledContext, compiledContainer, language, isTemporary)
Owner

Just curious, could we make SourceFile a data class to get this implementation for free? Or isn't that possible because it is an inner class?

Contributor Author

Yeah, data classes cannot be inner at the same time.
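
The restriction can be seen in a small standalone sketch. `SourceRegistry`, `Entry` and `DetachedEntry` are hypothetical names chosen for illustration; only the language rule (a `data` class cannot also be `inner`) is taken as given.

```kotlin
// Hypothetical names; they only mirror the shape of an outer class with an inner one.
class SourceRegistry(val rootUri: String) {
    // `inner` gives access to the outer instance, so `data` is not allowed here
    // and the shallow copy has to be written by hand.
    inner class Entry(val uri: String, var content: String) {
        val isUnderRoot: Boolean get() = uri.startsWith(rootUri)

        // Manual shallow copy, bound to the same outer instance.
        fun clone(): Entry = Entry(uri, content)
    }
}

// A non-inner data class gets copy() generated by the compiler,
// but any outer state has to be passed in explicitly.
data class DetachedEntry(val rootUri: String, val uri: String, val content: String)
```

Moving to a data class would therefore mean giving up implicit access to the enclosing instance, which is a larger refactoring than adding a hand-written `clone()`.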

private const val MAX_SHORT_NAME_LENGTH = 80
private const val MAX_URI_LENGTH = 511

private object Symbols : IntIdTable() {
Owner

Just to understand, we are now using integers (instead of fully-qualified names) to identify symbols. Can we still be sure that uniqueness is maintained (perhaps by using .uniqueIndex() instead of .index())?

Contributor Author

Right, I didn't use a unique index because I think it would complicate things when a user creates multiple classes with the same fully qualified name. Even though it's illegal (and will lead to compilation errors), it's still technically possible for the user to do this. Therefore, I chose to include the duplicate entries in the index.

We could certainly make it unique, but I think we would need to add some more logic to prevent duplicates. And if we ever start using the index for other things (like definitions, for example), duplicated symbols would be left out of those features.
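
The trade-off between a unique and a non-unique index on the fully qualified name can be illustrated without a database. The sketch below is a plain in-memory stand-in, not the ORM table from the PR; `SymbolRow`, `SymbolTable`, `insert` and `findByFqName` are all hypothetical names.

```kotlin
// In-memory stand-in for the database table; all names are hypothetical.
data class SymbolRow(val id: Int, val fqName: String, val fileUri: String)

class SymbolTable {
    private val rows = mutableListOf<SymbolRow>()
    private var nextId = 1

    /** Inserts a row with a fresh integer id; duplicate fully qualified names are allowed. */
    fun insert(fqName: String, fileUri: String): SymbolRow =
        SymbolRow(nextId++, fqName, fileUri).also { rows += it }

    /** Non-unique lookup: may return several rows for one fully qualified name. */
    fun findByFqName(fqName: String): List<SymbolRow> = rows.filter { it.fqName == fqName }
}
```

With integer ids as the identity, two classes that illegally share a fully qualified name both stay visible. A unique constraint would instead require deciding which of the two rows to keep.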

}
}

private fun Transaction.addDeclarations(declarations: Sequence<DeclarationDescriptor>) =
Owner

Do we use anything from Transaction's this in this block?

Contributor Author

Nice catch. Removed

@fwcd
Owner

fwcd commented Jan 4, 2022

Also yes, we should probably move to compiling all files on startup. The question would mostly be how to do so in a performant manner (the indexing already blocks many LSP operations despite being asynchronous, so I assume the compiler holds some locks).

@daplf
Contributor Author

daplf commented Jan 5, 2022

> Also yes, we should probably move to compiling all files on startup, the question would mostly be how to do so in a performant manner (the indexing already blocks many LSP operations despite being asynchronous, so I assume the compiler holds some locks).

Cool, I can look at this later and introduce it in a following PR if I find a performant way to do it.

@daplf daplf requested a review from fwcd January 14, 2022 17:06
@daplf
Contributor Author

daplf commented Jan 14, 2022

@fwcd Hey. I dived a bit into it and changed some things.

Now all files are compiled on startup and we only index the dependencies on startup. This solves the issue I initially highlighted.

The dependency indexing is also more isolated now. This should make #273 easier to implement as well.

Let me know what you think

@fwcd
Owner

fwcd commented Jan 20, 2022

Looks good, thanks!

@fwcd fwcd merged commit d5b0aab into fwcd:main Jan 20, 2022
@daplf daplf deleted the setup-incremental-indexing branch January 20, 2022 17:26
Development

Successfully merging this pull request may close these issues.

Reindex project incrementally as local files change