
WIP/RFC: Parse .cabal files to a source code representation #6621


Draft, wants to merge 5 commits into base: master

m-renaud
Collaborator

NOTE: This is a work in progress, looking for input on direction/structuring (see Open Questions below).

TL;DR: Don't perform all simplification steps during parsing; do them as a follow-up transformation.

Why

How

  • Create a source representation in the AST (PackageSourceDescription) for common stanza sections (CommonStanza) and for import directives within sections (CommonStanzaImports).

  • Don't perform inlining of sections during parsing; leave that to another step. The transformation from PackageSourceDescription to GenericPackageDescription is not yet implemented, but should be pretty trivial.
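As a rough illustration of the two-step idea, here is a minimal sketch on a toy model. All names below (`SourceSection`, `SourceDescription`, `Deps`, `inlineImports`) are hypothetical stand-ins, not the types in this PR or in Cabal; the real payload would be something like `BuildInfo` rather than a list of dependency names. The parser would produce the source form with imports intact, and a separate pass would fold common stanzas in, following imports transitively.

```haskell
type StanzaName = String

-- | Hypothetical stand-in for a section's payload.
newtype Deps = Deps [String] deriving (Eq, Show)

instance Semigroup Deps where
  Deps a <> Deps b = Deps (a ++ b)

instance Monoid Deps where
  mempty = Deps []

-- | A section as written in the source: imports are kept, not inlined.
data SourceSection = SourceSection
  { sectionImports :: [StanzaName]
  , sectionDeps    :: Deps
  } deriving (Eq, Show)

-- | Source-level description: common stanzas are first-class values.
data SourceDescription = SourceDescription
  { commonStanzas :: [(StanzaName, SourceSection)]
  , components    :: [(String, SourceSection)]
  } deriving (Eq, Show)

-- | Fold a section's imports into it, transitively.
-- 'seen' is the current import path, used to reject cycles.
resolve :: [(StanzaName, SourceSection)] -> [StanzaName] -> SourceSection
        -> Either String Deps
resolve stanzas seen (SourceSection imps deps) = do
  imported <- traverse lookupStanza imps
  pure (mconcat imported <> deps)
  where
    lookupStanza name
      | name `elem` seen = Left ("cyclic import: " ++ name)
      | otherwise =
          case lookup name stanzas of
            Nothing -> Left ("unknown common stanza: " ++ name)
            Just s  -> resolve stanzas (name : seen) s

-- | The follow-up transformation: inline imports in every component.
inlineImports :: SourceDescription -> Either String [(String, Deps)]
inlineImports (SourceDescription stanzas comps) =
  traverse (\(name, s) -> (,) name <$> resolve stanzas [] s) comps
```

A formatter would print the `SourceDescription` directly, while anything that needs the flattened view would call `inlineImports` first.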

Another alternative is to update GenericPackageDescription in place, but I thought it may be useful (albeit more code) to have a separate representation. It may be too complicated, but we could also look at the "trees that grow" approach that is used in GHC.

NOTE: For simplicity I forked the code from GenericPackageDescription instead of modifying in place to keep things separate, so there's a lot of duplication right now. We'll likely want to refactor several modules as pre-steps to this.

NOTE: Ignore the HasCommonStanzaImports typeclass, that's an artifact from another approach I took and isn't used currently.

Testing

I've updated the cabal format command to round trip through PackageSourceDescription instead of GenericPackageDescription and it appears to work for all the common cases I've tested (including common stanzas that import other common stanzas).

Open Questions

  1. What's up with PackageDescription and GenericPackageDescription? Why do they both exist? Historical reasons or separate concerns?

  2. Currently I've added a CommonStanzaImports field to each of the section types that can import a common stanza. This results in having to insert mempty values in places where these sections are created manually (and thus will never have imports). A more principled way would be to create separate types (ExecutableSource, LibrarySource, etc.) which have the CommonStanzaImports fields, and leave the existing section types used in GenericPackageDescription unchanged.

  3. As a first step should we keep this separate from the GenericPackageDescription parser and just use it for cabal format?

  4. Is there good test coverage for parsing to GenericPackageDescription? In other words, if I replace the String -> GenericPackageDescription parser with a String -> PackageSourceDescription parser and then a separate PackageSourceDescription -> GenericPackageDescription transform, how confident can I be that it works?
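To make the trade-off in question 2 concrete, here is a minimal sketch of the parallel-types option. It uses one generic wrapper instead of a hand-written type per section, which is a variation on the idea rather than what this PR implements. `Executable` below is a two-field stand-in for the real section type, and `WithImports`, `stripImports` and `defaultExe` are hypothetical names.

```haskell
-- | Hypothetical stand-in for an existing section type.
data Executable = Executable
  { exeName :: String
  , exeMain :: FilePath
  } deriving (Eq, Show)

-- | Source-level wrapper: pairs a section with the imports as written.
data WithImports a = WithImports
  { imports :: [String]
  , section :: a
  } deriving (Eq, Show)

instance Functor WithImports where
  fmap f (WithImports is a) = WithImports is (f a)

-- | The source-level counterpart of 'Executable'.
type ExecutableSource = WithImports Executable

-- | Forget the source-only information, e.g. once imports are inlined.
stripImports :: WithImports a -> a
stripImports = section

-- | Code that builds sections by hand keeps using the plain type,
-- so no empty placeholder for imports is needed here.
defaultExe :: String -> Executable
defaultExe name = Executable name "Main.hs"
```

With this shape the existing section types stay untouched, and only the parser and the formatter need to know about `WithImports`.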

/cc @phadej @gbaz


Please include the following checklist in your PR:

  • Patches conform to the coding conventions.
  • Any changes that could be relevant to users have been recorded in the changelog.
  • The documentation has been updated, if necessary.
  • If the change is docs-only, [ci skip] is used to avoid triggering the build bots.

Please also shortly describe how you tested your change. Bonus points for added tests!

@phadej
Collaborator

phadej commented Mar 29, 2020 via email

@m-renaud
Collaborator Author

m-renaud commented Mar 29, 2020

We already have Field structure which can be used to format .cabal files. See how cabal-fmt is implemented.

Yeah, that's exactly what I'm using. cabal-fmt round trips through GenericPackageDescription via readGenericPackageDescription verbosity path >>= writeGenericPackageDescription path (see formatAction code).

The problem is that GenericPackageDescription and the concrete Field instances that are used do not have any representation for imports, and the parser used to back readGenericPackageDescription never actually parses common stanzas or imports into any structure, it keeps them as internal state to fold into parsing of other sections (see processImports). Here I've created a new section type and updated the FieldGrammar definition to understand common stanzas and import directives.

The open questions outline some various trade-offs we can make regarding embedding the source structure within the existing field types (Executable, Library, Benchmark, etc) or creating a parallel structure.

I imagine that this source representation will grow to include comments as well for preservation, and this could also be used to provide an intermediary form of the package description which could have processing steps performed on them (maybe via "cabal plugins" or similar). This could provide a possible place to address things like automatic module discovery (#5343) in a more generic way and also integrate with cabal format (since you won't want the *s expanded during formatting, only during interpretation).

@phadej
Collaborator

phadej commented Mar 30, 2020

To clarify, https://hackage.haskell.org/package/cabal-fmt is not the cabal format. cabal-fmt vs cabal format. So

Yeah, that's exactly what I'm using. cabal-fmt round trips through GenericPackageDescription via readGenericPackageDescription verbosity path >>= writeGenericPackageDescription path (see formatAction code).

is incorrect.

@m-renaud
Collaborator Author

Ahhhh I see, I wasn't aware of that package, I misread that as cabal format. Is there any reason to not fold the functionality into cabal proper? Or is the plan to deprecate cabal format and just move all formatting functionality to cabal-fmt?

It seems like it would be beneficial to have this functionality as part of the cabal itself, this way as the Cabal format changes the formatting can be kept in sync. It also removes the discoverability barrier of having to find and install another package.

@phadej
Collaborator

phadej commented Mar 30, 2020

It seems like it would be beneficial to have this functionality as part of the cabal itself, this way as the Cabal format changes the formatting can be kept in sync.

I agree. Yet, I'm not ready for bikeshedding. In cabal-fmt I can end it by saying that the tool is opinionated. cabal-install has to be configurable. I'm optimistic that I get to the rework of cabal.config and cabal.project parsers this year, so there could be a way to add configuration for formatting too.

Otherwise there will be issues like phadej/cabal-fmt#11, in a sense the fact no-one uses cabal format is a blessing. Syntax and style is too easy to tell opinion about.

@m-renaud
Collaborator Author

In cabal-fmt I can end it by saying that the tool is opinionated. cabal-install has to be configurable

That's fair, but I would also imagine that a package named cabal-fmt would be a general tool and cabal-fmt-for-phadej would be your personal flavour 😜

I'm optimistic that I get to the rework of cabal.config and cabal.project parsers this year, so there could be a way to add configuration for formatting too.

I don't think it's reasonable to wait until some unspecified point in the future where a parser rewrite happens before making incremental improvements now. It's true that we'll eventually want some degree of configuration when formatting Cabal files (and some small subset of the community may have incredibly specific needs), but I imagine there is a pretty large group of the community that would be happy with cabal coming with a formatter with a couple of knobs (even just indentation). It doesn't make sense to me to tie creating a better internal representation to rewriting the parser for it.

Also, I'd be willing to build out the configuration infrastructure as well, it's not a particularly difficult problem to solve in the grand scheme of things. There are some improvements that I think would be useful as well such as separating the folding in of common stanza data into the other sections, imho that should be a separate step from parsing.

in a sense the fact no-one uses cabal format is a blessing. Syntax and style is too easy to tell opinion about

As a generally available tool that we're going out and saying "everyone start using this now!" I agree. But, at the same time I disagree because it fills a real hole in the current tooling for those who want it (even if it's not feature complete or fully configurable now). We could get it to a point where it handles most use cases (really just the fact that it inlines common stanzas right now is probably the only reason more folks aren't using it), and then make an announcement that it is available as an early access tool, and to expect changes to come in the future until a point at which all the configurability is available.

So, what I'm looking for are concrete next steps here, I don't want to block all improvements in this space on waiting for the perfect end state to be defined, and as you said if no one is using the tool we should feel free to make changes and see what we like and don't like. Are there specific implementation details that you don't like here? Would you prefer if I do more prefactoring work to clean up the code before making these changes (to make it more clear how much is actually changing)? Would you like to see a write-up of a longer term plan for formatting in cabal? Can I help with the parser changes you mentioned above? Is there a group of people I should reach out to for wider input on this? I'd be happy to do any and all of these, I'm just looking for guidance and actionable input to move this forward.

@phadej
Collaborator

phadej commented Mar 30, 2020

I don't think it's reasonable to wait until some unspecified point in the future where a parser rewrite happens before making incremental improvements now.

Most functionality of cabal-fmt is already present in Cabal. cabal-fmt starts to be opinionated in how it renders individual fields. A better version of readGPD >>= writeGPD can already be done with parseFields >>= showFields.

The internal representation, i.e. Field is already there.

.. and to expect changes to come in the future until a point at which all the configurability is available.

Please take the lead, champion the functionality and be responsive when follow-up issues and requests are reported and created. Progress is not forbidden, but there has to be a plan.

It's true that since readGPD >>= writeGPD was fixed (#4719, check out who fixed that) there are new opportunities. I have also since been extending the parsing functionality of Cabal based on the lessons from writing various experiments for formatting. cabal-fmt is the latest (but not the only one).

E.g. comments were added so

Yet, I don't have time to deal with formatting related functionality in cabal-install.


Some alternative to the GPD representation is required. With Field (which already exists) you are able to preserve the ordering of fields, and (some) comments (as illustrated by cabal-fmt). With a GPD-like variant which doesn't inline common stanzas, you'll lose ordering information and combine fields defined in multiple steps.
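The ordering point can be illustrated with a toy syntax tree. This is only loosely modelled on the idea behind Cabal's Field type and is not its actual definition; the `Field` and `Section` constructors, `render` and `example` below are all assumptions made for the sketch. Because the tree is just an ordered list, `import` stays an ordinary field and everything is printed in the order it was written.

```haskell
-- | Toy syntax tree: a field is "name: value", a section has a
-- keyword, an optional argument and an ordered body.
data Field
  = Field String String
  | Section String String [Field]
  deriving (Eq, Show)

-- | Print fields in source order, indenting section bodies by two spaces.
render :: [Field] -> String
render = unlines . concatMap (go 0)
  where
    go n (Field name val) = [indent n ++ name ++ ": " ++ val]
    go n (Section kw arg body) =
      (indent n ++ kw ++ (if null arg then "" else " " ++ arg))
        : concatMap (go (n + 2)) body
    indent n = replicate n ' '

-- | A small package description with a common stanza and an import.
example :: [Field]
example =
  [ Field "cabal-version" "2.2"
  , Section "common" "warnings" [Field "ghc-options" "-Wall"]
  , Section "library" ""
      [ Field "import" "warnings"
      , Field "build-depends" "base"
      ]
  ]
```

A record-shaped description, by contrast, has one slot per known field, so the order in which the author wrote them is gone once parsing finishes.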

Importantly, GPD is an input structure for cabal-install's solver (or rather CondTree BuildInfo like structures). Thus there is a huge gap between source code and the first (or second) internal representation.

I don't think the latter (GPD-like structure losing ordering and comments) is what people would want, even as incremental "update".

Some people would also like to have a ghc-exactprint-like level of control over the output, i.e. that you can round trip the .cabal file completely, without destroying any formatting if you don't want to. This would allow writing general refactoring tools. Field is too coarse for that. It would require a completely new parser. It might make sense to have that as a separate code path, where one could deal with ill-formed input too; in .cabal files error recovery is relatively easy.


I'd suggest you look into #6187 and #5555; the annotations support I mention in #5555 is now in Cabal. You are the cabal init lead now.

@mergify mergify bot added the "merge delay passed" label Sep 1, 2022
@ulysses4ever ulysses4ever removed the "merge delay passed" label Sep 3, 2022
@Kleidukos Kleidukos marked this pull request as draft May 17, 2023 06:35
@Kleidukos
Member

Marking this PR as draft 🙂

@ffaf1
Collaborator

ffaf1 commented May 8, 2025

Hello, I am going through old PRs to check whether they are stale.

If this PR is still “live”, write a comment and I will remove the consider closing label. Otherwise in ≃ 2 weeks this PR will be closed.

6 participants