
Feature Request / Discussion - Allow multiple compile_commands.json / setting the target path to the generated compile_commands.json #48


Closed
molar opened this issue Apr 25, 2022 · 12 comments · Fixed by #44

Comments

@molar

molar commented Apr 25, 2022

We have a rather large repo, which means generating the compile commands takes a very long time.
We would like to split the compile_commands.json into multiple files, allowing us to re-generate only a subset of it. As far as I can tell, multiple compile_commands.json files are supported by clangd.

# packageA/BUILD
refresh_compile_commands(
    name = "refresh_compile_commands",
    targets = {
        "//packageA/...": "",
    },
)

This will currently overwrite the top-level compile_commands.json, so the suggestion is to allow specifying a different path for the generated file.
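For concreteness, something like this is the interface we have in mind. The out_path attribute is purely illustrative, just to show the idea; it doesn't exist today:

# packageA/BUILD
load("@hedron_compile_commands//:refresh_compile_commands.bzl", "refresh_compile_commands")

refresh_compile_commands(
    name = "refresh_compile_commands",
    targets = {
        "//packageA/...": "",
    },
    # Hypothetical: write the result here instead of overwriting the workspace-root file.
    out_path = "packageA/compile_commands.json",
)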

Thanks for your time on this.

@cpsauer
Contributor

cpsauer commented Apr 25, 2022

Hey, Morten! Thanks for your continued interest--and for your thoughtfulness about supporting these broader cases you're encountering.

Sure. Shouldn't be a problem. We can definitely, e.g., let people specify the path.

But before we do, I want to trace things back to the root problem in case there's a less manual way we can solve things for you.

It sounds like the root issue is speed. Is that right? That is, if it ran way faster, you wouldn't need (or want) multiple compile_commands.json? (I have some ideas about how we might be able to speed things up a whole lot by caching the include information between runs, and getting a chunk of it from Bazel. [More here, if curious.])

For my additional understanding/context, could I ask about how long it's taking? (minutes?) And what typically prompts you to rerun?

If you're down, could you also try adding return set() to the top of _get_headers in external/hedron_compile_commands/refresh.template.py and let me know whether that runtime would be fast enough for you? (That gives us an upper bound on the gains from optimizing the include finding.) Might also save a round trip :)
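Concretely, something like this at the top of that function; the real parameter list is elided here, so keep whatever signature your copy has:

def _get_headers(*args, **kwargs):  # existing function in refresh.template.py; real parameters elided
    return set()  # short-circuit: skip header finding entirely, just to measure the upper bound on speedup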

Cheers,
Chris


As self notes for later, if we do end up doing this with multiple compile_commands.json files:

  • We'll need to check that we can easily specify multiple compile_commands.json files via, e.g., --compile-commands-dir. (You need to manually specify them to properly get out-of-tree headers, like system headers.) I'm not super up to speed on the multiple compile_commands.json interface, since it was added after I started using clangd and I haven't needed it yet; my current understanding of the lookup is sketched after this list.
  • We should talk about the interface. Would overriding the path be the best option, vs, like, generating it in a subdirectory? ^ Relates to the above, and depends on whether we'd be able to use files with names besides compile_commands.json.
    • Another option would be partial regeneration and concatenation, but this is starting to look an awful lot like caching--which we should probably just do directly.
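For reference, my current understanding of how clangd finds these files (may well be out of date, so treat it as a sketch): without --compile-commands-dir, clangd searches the edited file's directory and its ancestors for a compile_commands.json, so per-package files would be picked up like this:

workspace/
├── compile_commands.json          # fallback for everything else
├── packageA/
│   ├── compile_commands.json      # used for files under packageA/
│   └── src/foo.cc
└── packageB/
    ├── compile_commands.json      # used for files under packageB/
    └── src/bar.cc

--compile-commands-dir, by contrast, points clangd at a single directory, which is why the out-of-tree-headers case above needs checking.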

@cpsauer
Contributor

cpsauer commented May 4, 2022

@molar, any more thoughts on this?

@molar
Author

molar commented May 11, 2022

Thanks for your patience.

I have collected some metrics on the two cases, one with header scanning and one without, as you suggested. With header scanning it takes 25 minutes; without, it takes 20 seconds. With header scanning there are around 22,000 entries in the compile_commands.json, and without there are around 10,000 entries.

We generate the compile_commands.json on every CI run, for use with SonarQube, and in that case 20 seconds is acceptable for us.

So as you also write, the root issue is actually the time it takes to generate the compile_commands.json.

For now I have a flag that allows me to "disable" the header scanning, and we can upstream that if you are interested.

Another solution could be to allow a pattern for the source files where include scanning is required. That would let us pay the cost only in the cases where clangd would otherwise use the wrong settings.
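Roughly what I'm picturing; the header_scanning_targets attribute is just a straw man to illustrate the idea and doesn't exist:

load("@hedron_compile_commands//:refresh_compile_commands.bzl", "refresh_compile_commands")

refresh_compile_commands(
    name = "refresh_compile_commands",
    targets = {
        "//...": "",
    },
    # Hypothetical: only run the (slow) header scanning for sources matching these patterns.
    header_scanning_targets = [
        "//packageA/...",
    ],
)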

@cpsauer
Contributor

cpsauer commented May 12, 2022

Yeah! No problem. Thanks for testing--and for a great answer. I appreciate your thoroughness.

Does SonarQube need/want headers?

If SonarQube wants headers, I think the caching described above could get times way down towards that 20s mark, if you're otherwise building or have a cache. LMK if we're in this case.

If SonarQube doesn't want headers, #44 will add an exclude_headers toggle shortly, among some other goodies! Rather than kicking off a separate upstreaming effort, I'd love it if you'd take a quick peek at that PR and weigh in if you have ideas on how we could make things even better! Or just tell Dave that you're excited to use his work :)

@cpsauer
Contributor

cpsauer commented May 20, 2022

And @alexander-born, I see your 👀, which I assume means you're watching :)
Are you having a similar need for speed?

[@molar, still curious about the above!]

@alexander-born

I'll be happy once PR #44 gets merged. Dave's PR reduces the time to regenerate the compile commands by a lot (ca. 10-15 min down to 30 s).

@cpsauer
Contributor

cpsauer commented May 20, 2022

Sweet. You've probably already read this, but heads up that you'll likely run into some (very annoying) issues with clangd when you edit headers if you turn header finding off. Are you using this for clangd or for some other tool?

[Current version of #44 doesn't speed things up, I think, but I'll make sure it gets back there.]

I'm trying to feel out whether you guys want headers and therefore the fast/cached header finding--and whether either of you might be willing to help build it.

@cpsauer
Contributor

cpsauer commented May 24, 2022

@alexander-born?
(Also, thanks for testing and weighing in on the PR!)

@alexander-born

alexander-born commented May 25, 2022

I am using clangd and haven't encountered any issues with PR #44 and

    exclude_headers = "all",
    exclude_external_sources = True,

but that's not to say there are no issues without header finding; I just didn't encounter any.
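For context, the rule I'm running looks roughly like this (the targets value is simplified; exclude_headers and exclude_external_sources are the attributes from #44):

load("@hedron_compile_commands//:refresh_compile_commands.bzl", "refresh_compile_commands")

refresh_compile_commands(
    name = "refresh_compile_commands",
    targets = {
        "//...": "",
    },
    exclude_headers = "all",
    exclude_external_sources = True,
)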

@cpsauer
Contributor

cpsauer commented May 26, 2022

Cool. Thanks for letting me know. Working on getting #44 landed.

Just so you're prepared and will recognize the cause if you see it: heads up that you may run into weirdness if/when clangd infers the wrong compile command for a header, since it doesn't yet traverse the include graph (see clangd/clangd#123). Hence all the header finding efforts here. But if it works for you, I'm delighted, and it's really good for me to know that it's usable in some cases without header finding. Could I ask you to let me know if you ever do run into issues there, just so I get a sense?

Also, I'll mark #44 as resolving this for now, since #5 tracks the speed/caching issue. LMK @molar if that seems wrong to you and we can reverse course.

@cpsauer
Contributor

cpsauer commented May 28, 2022

Closed by #44. @molar, hopefully that solves things for you. If not--or if you still need the compiler unwrapping we discussed in #33--please let me know!

@cpsauer
Contributor

cpsauer commented Jun 2, 2022

Hey all! Heads up that I just landed code that makes header finding way faster. We now have caching for header finding, and for clang/gcc we piggyback on Bazel's dependency cache--in addition to the options to control header finding merged previously!

That should be a huge improvement in speed. I don't think this can get much faster; it's now quite good.
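To give a flavor of the Bazel-piggybacking part: the .d dependency files that clang/gcc emit during a build list every header a translation unit pulled in, so header info can be read back from those instead of re-running the preprocessor. A minimal sketch of that idea (not the code that actually landed, and ignoring escaped spaces in paths):

def headers_from_depfile(depfile_path, source_path):
    """Parse a Makefile-style .d file into the set of headers it lists."""
    with open(depfile_path) as f:
        text = f.read()
    # A .d file is one Makefile rule, e.g. "foo.o: foo.cc bar.h baz.h \" with line continuations.
    _, _, deps = text.partition(':')
    entries = deps.replace('\\\n', ' ').split()
    return {e for e in entries if e != source_path}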

(Though if there's a Windows user reading this: we might be able to piggyback on Bazel's cache with MSVC, too. See #49 (comment) if you're interested in helping.)
