Skip to content

Proposal: add several fields to SymbolInformation #154

Closed
@olafurpg

Description

@olafurpg
Member

This issue is an umbrella for several proposed additions to SCIP, based on discussions with @donsbot on Mastodon .

SymbolInformation.display_name

This would be the name of the symbol, which is both helpful for local variables and avoids parsing the name from the symbol. The field could be name instead of display_name, we use display_name in SemanticDB to emphasize that this name is meant to be displayed (and should therefore not have special encoding for non-ASCII characters like emojis.

SymbolInformation.owner

Alternative name parent. The thinking with this field is that it avoids parsing the owner from the symbol, and it allows us to emit an owner for local symbols.

SymbolInformation.kind

An enum that specifies what kind of symbol this is (enum/interface/method/...). Currently, Descriptor.Suffix doesn't encode enough fine-grained information (and it's intentionally named "suffix" to emphasize that it's primarily related to the syntax of the symbol.

SymbolInformation.signature_documentation

A string-formatted rendering of the signature. Currently, indexers emit this information in the documentation field as markdown-formatted code blocks. Having a separate field makes it cleaner to extract only the signature. I propose we reserve the field SymbolInformation.signature for fully typed/structured signatures (not string-formatted signatures).

Activity

changed the title [-]Proposall: add several fields to `SymbolInformation`[/-] [+]Proposal: add several fields to `SymbolInformation`[/+] on May 17, 2023
donsbot

donsbot commented on May 23, 2023

@donsbot
Contributor

SymbolInformation.kind

For symbol kinds, a variant of the lsif/vscode spec. E.g. currently used by Glean : https://github.com/facebookincubator/Glean/blob/main/glean/schema/source/codemarkup.types.angle#L38 ; these are used for search and outline views: to e.g. constraint a search to methods or classes only, or when searching by name, to show the kind of each result.

SymbolInformation.signature_documentation

this is somewhat important, as the scip index conflates "Documentation". e.g. in rust-analyzer the type signature and the documentation markdown are combined. They should ideally be separate "tables" in the output, keyed by scip.Symbol, as depending on the scenario you might only want to show the signature (e.g. a search page), or docs (an API page).

SymbolInformation.display_name

This is another search related application. Imagine taking the scip data and efficiently listing all symbols called 'vec'. We'd need a way to extrac the local name. Another scenario: search for foo::bar::vec , where 'bar' might be a subclass of something. In each case we need to know the precise local name and parent name ("qualified name"). These can be extracted by parsing the scip.Symbol but that's something the protocol could already do,

olafurpg

olafurpg commented on May 23, 2023

@olafurpg
MemberAuthor

@donsbot Agreed! We have these pieces of information in SemanticDB for similar reasons. If we add these fields to SCIP, would you be interested in contributing the changes to rust-analyzer? I estimate it will be easy to update scip-java to emit the new fields. We can create a tracking issue for the remaining languages.

donsbot

donsbot commented on May 23, 2023

@donsbot
Contributor

Yep we would likely want to extend rust-analyzer immediately to support these.

added a commit that references this issue on May 23, 2023
olafurpg

olafurpg commented on May 23, 2023

@olafurpg
MemberAuthor

I opened a PR adding SymbolInformation.kind as a starting point. Will follow up with separate PRs adding the remaining fields.

added a commit that references this issue on May 25, 2023
olafurpg

olafurpg commented on May 25, 2023

@olafurpg
MemberAuthor

@donsbot can you elaborate on your motivation for SymbolInformation.owner? I am OK with adding this field on the condition it should only be used for locals. My concern with adding this field for global symbols is that indexers will start to emit malformed global symbols. I think it's a very desirable property that clients can implement a lot of functionality with an array of symbols (string[]) without having to load a global symbol table (record<string, SymbolInformation>). If we limit SymbolInformation.owner to locals then this problem is alleviated since it's cheap to load a local-only symbol table.

I think it should be unavoidable that clients need to implement a symbol parser to walk up the symbol hierarchy. The Java logic for the symbol parser is ~150 lines of code and we could aim to provide similar APIs for all language bindings.

donsbot

donsbot commented on May 25, 2023

@donsbot
Contributor

I was imagining most indexers usually know the "containing" symbol.

  • method -> class/interface
  • param -> method
  • class -> module (e.g. python)

which lets us build navigation by containing scope.

There's an equivalent relationship for parent by "inheritance" (e.g. A extends B).

So In this case I was just interested in the relationship between a symbol and its parent (at the scip.Symbol level, not the textual names).

Obviously if we have something like a/b/c() , we can figure out that a/b might be the containing parent. But then I need to search for scip.Symbol that match? or is this existent relationship present somewhere else?

donsbot

donsbot commented on May 25, 2023

@donsbot
Contributor

The specific product use case is not just symbol outlines on. page, but API listings. E.g. generating cargo-like docs for a language: E.g. looking up "class C" to see its API:

Class C

- method 1
- method 2
- method 3

Inherits:
- A::method4
- B::method5
olafurpg

olafurpg commented on May 25, 2023

@olafurpg
MemberAuthor

@donsbot You can parse the owner from the symbol fields to support those use-cases (assuming they're global symbols). The Java symbol parser is 150 lines of code and we have had a decent experience porting this logic to other languages with ChatGPT https://github.com/sourcegraph/scip-java/blob/main/scip-semanticdb/src/main/java/com/sourcegraph/scip_semanticdb/SymbolDescriptor.java#L21

I am open to add SymbolInformation.owner (or SymbolInformation.enclosing_symbol) to support the same functionality for local symbols.

Would that be a satisfiable solution? I'm not 100% against allowing owner/enclosing_symbol for global symbols but I am concerned it increases the risk that indexers stop emitting well-structured symbols as I mentioned in my last comment.

donsbot

donsbot commented on May 26, 2023

@donsbot
Contributor

Ok sounds good. Yes I will need it for locals (and can derive it for globals).

olafurpg

olafurpg commented on Jun 4, 2023

@olafurpg
MemberAuthor

Closing this as fixed after #164 since all the proposed fields have been added to the spec now. However(!), none of the existing SCIP indexers have been updated yet to emit the new information. Feel free to open an issue in this repository or each SCIP indexer repository if you'd like the indexer to add support for these new fields.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @donsbot@olafurpg

        Issue actions

          Proposal: add several fields to `SymbolInformation` · Issue #154 · sourcegraph/scip