WIP/RFC: Parse .cabal files to a source code representation #6621

m-renaud · 2020-03-29T08:49:20Z

NOTE: This is a work in progress, looking for input on direction/structuring (see Open Questions below).

TL;DR; Don't perform all simplification steps during parsing, do them as a follow-up transformation.

Why

Allows cabal format to maintain key pieces of source code (common stanzas for example cabal format inlines and removes common stanzas #5734)
Separation of concerns; simplifies parser logic (no longer need to strange FromBuildInfo typeclass while parsing)

How

Create a source representation in the AST (PackageSourceDescription) for common stanza sections (CommonStanza) and for import directives within sections (CommonStanzaImports).
Don't perform inlining of sections during parsing, leave that to another step. The transformation from PackageSourceDescription is not yet implemented, but should be pretty trivial.

Another alternative is to update GenericPackageDescription in place, but I thought it may be useful (if not more code) to have a separate representation. May be too complicated but we could look at the "trees that grow" approach that is used in GHC as well.

NOTE: For simplicity I forked the code from GenericPackageDescription instead of modifying in place to keep things separate, so there's a lot of duplication right now. We'll likely want to refactor several modules as pre-steps to this.

NOTE: Ignore the HasCommonStanzaImports typeclass, that's an artifact from another approach I took and isn't used currently.

Testing

I've updated the cabal format command to round trip thorough PackageSourceDescription instead of GenericPackageDescription and it appears to work for all the common cases I've tested (including common stanzas that import other common stanzas).

Open Questions

What's up with PackageDescription and GenericPackageDescription? Why do they both exist? Historical reasons or separate concerns?
Currently I've added a CommonStanzaImports field to each of the section types that can have a common stanza, this results in having to insert memptys in places where these sections are created manually (and thus will never have imports). A more principled way would be to create separate types for ExecutableSource, LibrarySource etc. which has the CommonStanzaImports fields, and leave the existing section types used in GenericPackageDescription unchanged.
As a first step should we keep this separate from the GenericPackageDescription parser and just use it for cabal format?
Is there good test coverage for parsing to GenericPackageDescription? In other words, if I replace the String -> GenericPackageDescription parser with a String -> PackageSourceDescription parser and then a separate PackageSourceDescription -> GenericPackageDescription transform, how confident can I be that it works?

/cc @phadej @gbaz

Please include the following checklist in your PR:

Patches conform to the coding conventions.
Any changes that could be relevant to users have been recorded in the changelog.
The documentation has been updated, if necessary.
If the change is docs-only, [ci skip] is used to avoid triggering the build bots.

Please also shortly describe how you tested your change. Bonus points for added tests!

phadej · 2020-03-29T08:59:45Z

We already have `Field` structure which can be used to format .cabal files. See how `cabal-fmt` is implemented.

…

Sent from my iPhone

On 29. Mar 2020, at 11.49, Matt Renaud ***@***.***> wrote: NOTE: This is a work in progress, looking for input on direction/structuring (see Open Questions below). TL;DR; Don't perform all simplification steps during parsing, do them as a follow-up transformation. Why Allows cabal format to maintain key pieces of source code (common stanzas for example #5734) Separation of concerns; simplifies parser logic (no longer need to strange FromBuildInfo typeclass while parsing) How Create a source representation in the AST (PackageSourceDescription) for common stanza sections (CommonStanza) and for import directives within sections (CommonStanzaImports). Don't perform inlining of sections during parsing, leave that to another step. The transformation from PackageSourceDescription is not yet implemented, but should be pretty trivial. Another alternative is to update GenericPackageDescription in place, but I thought it may be useful (if not more code) to have a separate representation. May be too complicated but we could look at the "trees that grow" approach that is used in GHC as well. NOTE: For simplicity I forked the code from GenericPackageDescription instead of modifying in place to keep things separate, so there's a lot of duplication right now. We'll likely want to refactor several modules as pre-steps to this. NOTE: Ignore the HasCommonStanzaImports typeclass, that's an artifact from another approach I took and isn't used currently. Testing I've updated the cabal format command to round trip thorough PackageSourceDescription instead of GenericPackageDescription and it appears to work for all the common cases I've tested (including common stanzas that import other common stanzas). Open Questions What's up with PackageDescription and GenericPackageDescription? Why do they both exist? Historical reasons or separate concerns? Currently I've added a CommonStanzaImports field to each of the section types that can have a common stanza, this results in having to insert memptys in places where these sections are created manually (and thus will never have imports). A more principled way would be to create separate types for ExecutableSource, LibrarySource etc. which has the CommonStanzaImports fields, and leave the existing section types used in GenericPackageDescription unchanged. As a first step should we keep this separate from the GenericPackageDescription parser and just use it for cabal format? Is there good test coverage for parsing to GenericPackageDescription? In other words, if I replace the String -> GenericPackageDescription parser with a String -> PackageSourceDescription parser and then a separate PackageSourceDescription -> GenericPackageDescription transform, how confident can I be that it works? /cc @phadej @gbaz Please include the following checklist in your PR: Patches conform to the coding conventions. Any changes that could be relevant to users have been recorded in the changelog. The documentation has been updated, if necessary. If the change is docs-only, [ci skip] is used to avoid triggering the build bots. Please also shortly describe how you tested your change. Bonus points for added tests! You can view, comment on, or merge this pull request online at: #6621 Commit Summary Initial check-in of PackageSourceDescription. Checkpoint: common stanzas definitions maintained. Checkpoint: Maintain common stanza imports within sections. Simplify parser. Clean up code, update cabal format command. File Changes M Cabal/Cabal.cabal (8) M Cabal/Distribution/PackageDescription/FieldGrammar.hs (59) A Cabal/Distribution/PackageDescription/PackageSourceDescriptionParser.hs (794) A Cabal/Distribution/PackageDescription/PackageSourceDescriptionPrettyPrint.hs (219) M Cabal/Distribution/PackageDescription/Parsec.hs (5) M Cabal/Distribution/Simple/Build.hs (4) M Cabal/Distribution/Simple/Haddock.hs (2) M Cabal/Distribution/Types/Benchmark.hs (10) M Cabal/Distribution/Types/Benchmark/Lens.hs (5) A Cabal/Distribution/Types/CommonStanza.hs (55) A Cabal/Distribution/Types/CommonStanza/Lens.hs (28) A Cabal/Distribution/Types/CommonStanzaImports.hs (56) A Cabal/Distribution/Types/CommonStanzaImports/Lens.hs (18) M Cabal/Distribution/Types/Executable.hs (7) M Cabal/Distribution/Types/Executable/Lens.hs (4) M Cabal/Distribution/Types/ForeignLib.hs (9) M Cabal/Distribution/Types/ForeignLib/Lens.hs (5) A Cabal/Distribution/Types/Import.hs (34) M Cabal/Distribution/Types/Lens.hs (4) M Cabal/Distribution/Types/Library.hs (8) M Cabal/Distribution/Types/Library/Lens.hs (5) A Cabal/Distribution/Types/PackageSourceDescription.hs (82) A Cabal/Distribution/Types/PackageSourceDescription/Lens.hs (132) M Cabal/Distribution/Types/TestSuite.hs (8) M Cabal/Distribution/Types/TestSuite/Lens.hs (5) M Cabal/tests/Instances/TreeDiff.hs (4) M cabal-install/main/Main.hs (8) Patch Links: https://github.com/haskell/cabal/pull/6621.patch https://github.com/haskell/cabal/pull/6621.diff — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub, or unsubscribe.

m-renaud · 2020-03-29T16:46:51Z

We already have Field structure which can be used to format .cabal files. See how cabal-fmt is implemented.

Yeah, that's exactly what I'm using. cabal-fmt round trips through GenericPackageDescription via readGenericPackageDescription verbosity path >>= writeGenericPackageDescription path (see formatAction code).

The problem is that GenericPackageDescription and the concrete Field instances that are used do not have any representation for imports, and the parser used to back readGenericPackageDescription never actually parses common stanzas or imports into any structure, it keeps them as internal state to fold into parsing of other sections (see processImports). Here I've created a new section type and updated the FieldGrammar definition to understand common stanzas and import directives.

The open questions outline some various trade-offs we can make regarding embedding the source structure within the existing field types (Executable, Library, Benchmark, etc) or creating a parallel structure.

I imagine that this source representation will grow to include comments as well for preservation, and this could also be used to provide an intermediary form of the package description which could have processing steps performed on them (maybe via "cabal plugins" or similar). This could provide a possible place to address things like automatic module discovery (#5343) in a more generic way and also integrate with cabal format (since you won't want the *s expanded during formatting, only during interpretation).

phadej · 2020-03-30T00:27:04Z

To clarify, https://hackage.haskell.org/package/cabal-fmt is not the cabal format. cabal-fmt vs cabal format. So

Yeah, that's exactly what I'm using. cabal-fmt round trips through GenericPackageDescription via readGenericPackageDescription verbosity path >>= writeGenericPackageDescription path (see formatAction code).

is incorrect.

m-renaud · 2020-03-30T01:11:01Z

Ahhhh I see, I wasn't aware of that package, I misread that that as cabal format. Is there any reason to not fold the functionality into cabal proper? Or is the plan to deprecate cabal format and just move all formatting functionality to cabal-fmt?

It seems like it would be beneficial to have this functionality as part of the cabal itself, this way as the Cabal format changes the formatting can be kept in sync. It also removes the discover-ability barrier of having to find and install another package.

phadej · 2020-03-30T10:11:08Z

It seems like it would be beneficial to have this functionality as part of the cabal itself, this way as the Cabal format changes the formatting can be kept in sync.

I agree. Yet, I'm not ready for bikeshedding. In cabal-fmt I can end it by saying that the tool is opinionated. cabal-install have to be configurable. I'm optimistic that I get to the rework of cabal.config and cabal.project parsers this year, so there could be a way to add configuration for formatting too.

Otherwise there will be issues like phadej/cabal-fmt#11, in a sense the fact no-one uses cabal format is a blessing. Syntax and style is too easy to tell opinion about.

m-renaud · 2020-03-30T16:26:30Z

In cabal-fmt I can end it by saying that the tool is opinionated. cabal-install have to be configurable

That's fair, but I would also imagine that a package named cabal-fmt would be a general tool and cabal-fmt-for-phadej would be your personal flavour 😜

I'm optimistic that I get to the rework of cabal.config and cabal.project parsers this year, so there could be a way to add configuration for formatting too.

I don't think it's reasonable to wait until some unspecified point in the future where a parser rewrite happens before making incremental improvements now. It's true that we'll eventually want some degree of configuration when formatting Cabal files (and some small subset of the community may have incredibly specific needs), but I imagine there is a pretty large group of the community that would be happy with cabal coming with a formatter with a couple of nobs (even just indentation). It doesn't make sense to me to tie creating a better internal representation to rewriting the parser for it.

Also, I'd be willing to build out the configuration infrastructure as well, it's not a particularly difficult problem to solve in the grand scheme of things. There are some improvements that I think would be useful as well such as separating the folding in of common stanza data into the other sections, imho that should be a separate step from parsing.

in a sense the fact no-one uses cabal format is a blessing. Syntax and style is too easy to tell opinion about

As a generally available tool that we're going out and saying "everyone start using this now!" I agree. But, at the same time I disagree because it fills a real hole in the current tooling for those who want it (even if its not feature complete or fully configurable now). We could get it to a point where it handles most use cases (really just the fact that it inlines common stanzas right now is probably the only reason more folks aren't using it), and then make an announcement that it is available as an early access tool, and to expect changes to come in the future until a point at which all the configurability is available.

So, what I'm looking for are concrete next steps here, I don't want to block all improvements in this space on waiting for the perfect end state to be defined, and as you said if no one is using the tool we should feel free to make changes and see what we like and don't like. Are there specific implementation details that you don't like here? Would you prefer if I do more prefactoring work to clean up the code before making these changes (to make it more clear how much is actually changing)? Would you like to see a write-up of a longer term plan for formatting in cabal? Can I help with the parser changes you mentioned above? Is there a group of people I should reach out to for wider input on this? I'd be happy to do any and all of these, I'm just looking for guidance and actionable input to move this forward.

phadej · 2020-03-30T17:30:31Z

I don't think it's reasonable to wait until some unspecified point in the future where a parser rewrite happens before making incremental improvements now.

Most functionality of cabal-fmt is already present in Cabal. cabal-fmt starts to be opinionated in how it renders individual fields. Better version of readGPD >>= writeGPD can be already done with parseFields >>= showFields.

The internal representation, i.e. Field is already there.

.. and to expect changes to come in the future until a point at which all the configurability is available.

Please take a lead, champion the functionality and be responsive when follow up issues and requests are reported and created. The progress is not forbidden. But there have to be a plan.

It's true that since readGPD >>= writeGPD was fixed (#4719, check out who fixed that) there are new opportunities,. I have also since extending the parsing functionality of Cabal based on the lessons from writing various experiments for formatting. cabal-fmt the latest (but not the only one).

E.g. comments were added so

Yet, I don't have time to deal with formatting related functionality in cabal-install.

Some alternative to GPD representation is required. With Field (which already exists) you are be able to preserve ordering of fields, and (some) comments (as illustrated by cabal-fmt). With GPD-like variant which don't inline common stanzas, you'll loose ordering information and combine fields defined in multiple steps.

Importantly, GPD is an input structure for cabal-install's solver (or rather CondTree BuildInfo like structures). Thus there a huge gap between source code and ~~first~~ second internal representation.

I don't think the latter (GPD-like structure losing ordering and comments) is what people would want, even as incremental "update".

Some people would also like to have ghc-exactprint like level of control of the output, i.e. that you can round trip the .cabal file completely, without destroying any formatting if you don't want to. To allow writing general refactoring tools. Field is too coarse for that. That would require completely new parser. It might make sense to have that as separate codepath, where one could deal with ill-formed input too, in .cabal files error-recovery is relatively easy.

I'd suggest you to look into #6187 and #5555, the annotations support I mention in #5555 is now in Cabal. You are the cabal init lead now.

Kleidukos · 2023-05-17T06:36:00Z

Marking this PR as draft 🙂

ffaf1 · 2025-05-08T18:22:53Z

Hello, I am going through old PRs to check whether they are stale.

If this PR is still “live”, write a comment and I will remove the consider closing label. Otherwise in ≃ 2 weeks this PR will be closed.

m-renaud added 5 commits March 28, 2020 12:49

Initial check-in of PackageSourceDescription.

7b459cb

Checkpoint: common stanzas definitions maintained.

58b748b

Checkpoint: Maintain common stanza imports within sections.

f968f9a

Simplify parser.

cc40b4c

Clean up code, update cabal format command.

3b4e80c

Mikolaj added the exact-print label Jun 26, 2021

emilypi mentioned this pull request Aug 14, 2021

Meta: Exact-printer Mega-issue #7544

Open

mergify bot added the merge delay passed Applied (usually by Mergify) when PR approved and received no updates for 2 days label Sep 1, 2022

ulysses4ever removed the merge delay passed Applied (usually by Mergify) when PR approved and received no updates for 2 days label Sep 3, 2022

Kleidukos marked this pull request as draft May 17, 2023 06:35

ulysses4ever added attention: needs-help Help wanted with this issue/PR historical labels May 19, 2023

ffaf1 added the status: consider closing label May 8, 2025

ffaf1 closed this Jun 5, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

WIP/RFC: Parse .cabal files to a source code representation #6621

WIP/RFC: Parse .cabal files to a source code representation #6621

Uh oh!

m-renaud commented Mar 29, 2020

Uh oh!

phadej commented Mar 29, 2020 via email

Uh oh!

m-renaud commented Mar 29, 2020 •

edited

Loading

Uh oh!

phadej commented Mar 30, 2020 •

edited

Loading

Uh oh!

m-renaud commented Mar 30, 2020

Uh oh!

phadej commented Mar 30, 2020 •

edited

Loading

Uh oh!

m-renaud commented Mar 30, 2020

Uh oh!

phadej commented Mar 30, 2020

Uh oh!

Kleidukos commented May 17, 2023

Uh oh!

ffaf1 commented May 8, 2025

Uh oh!

Uh oh!

WIP/RFC: Parse .cabal files to a source code representation #6621

WIP/RFC: Parse .cabal files to a source code representation #6621

Uh oh!

Conversation

m-renaud commented Mar 29, 2020

Why

How

Testing

Open Questions

Uh oh!

phadej commented Mar 29, 2020 via email

Uh oh!

m-renaud commented Mar 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

phadej commented Mar 30, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

m-renaud commented Mar 30, 2020

Uh oh!

phadej commented Mar 30, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

m-renaud commented Mar 30, 2020

Uh oh!

phadej commented Mar 30, 2020

Uh oh!

Kleidukos commented May 17, 2023

Uh oh!

ffaf1 commented May 8, 2025

Uh oh!

Uh oh!

m-renaud commented Mar 29, 2020 •

edited

Loading

phadej commented Mar 30, 2020 •

edited

Loading

phadej commented Mar 30, 2020 •

edited

Loading