Skip to content

RFC: new intermediate cabal file representation. #3614

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
phadej opened this issue Jul 25, 2016 · 11 comments
Open

RFC: new intermediate cabal file representation. #3614

phadej opened this issue Jul 25, 2016 · 11 comments

Comments

@phadej
Copy link
Collaborator

phadej commented Jul 25, 2016

Problem: Ultimately we want formatting-preserving programmagic refactorings of .cabal files.

Current situation:

  • The parsec parser's Field structure is annotated with source position (parametrised over ann), but
    • values are ByteStrings
    • Field could represent any cabal-file-like structure (e.g. nested sections)
  • GenericPackageDescription
    • doesn't have annotations
    • some preprocessing already done, like instances of the same field (e.g. build-depends) on the same level are merged.

Solution:

  • Introduce a new structure (CabalAst ann) with source annotations, which represents valid (to some degree) cabal files.
  • Change parsing pipeline from Field ann → GenericPackageDescription to Field ann → CabalAst ann → GenericPackageDescription
  • As a proof-of-concept, change cabal gen-bounds to work on CabalAst ann

This RFC doesn't propose how CabalAst would look like specifically, as some exprerimentation on implementation is needed. ghc-exactprint.

ping @alanz @hvr @dcoutts

@23Skidoo
Copy link
Member

Would be nice to have something like that for config files as well.

@phadej
Copy link
Collaborator Author

phadej commented Jul 25, 2016

@23Skidoo, isn't config files flat: i.e. no sections? We could experiment on them first, as they are simpler.

@phadej
Copy link
Collaborator Author

phadej commented Jul 25, 2016

From IRC discussion: starting with ~/.cabal/config would be also easier as the field types are much simpler. I have no good idea how to represent build-depends in editable yet formatting-preservable way.

Also: @dcoutts' experiment: http://code.haskell.org/~duncan/cabal-ast-experiment/

@phadej
Copy link
Collaborator Author

phadej commented Jul 25, 2016

@dcoutts also proposed to change GenericPackageDescription into CabalAst i.e. into something which supports refactoring.

@23Skidoo
Copy link
Member

@phadej No, they have sections: repository, install-dirs, program-locations, etc.

@phadej
Copy link
Collaborator Author

phadej commented Jul 25, 2016

@23Skidoo: yes, but sections are predefined and used to group fields. i.e. the file could be flat.

@dcoutts
Copy link
Contributor

dcoutts commented Jul 25, 2016

@phadej so when I thought about it some time ago I concluded that there were only a few kinds of fields. Lets see if I can remember what they are:

  • free-form text like description etc
  • single atom, like category, license
  • list of atoms, like module lists, source files, ghc options
  • single compound expression, cabal-version
  • multiple comma-separated compound expressions, like build-depends

The point is, it may be possible to pre-split the list fields and then only keep pos info at that level, not within the AST of the field entries.

@phadej
Copy link
Collaborator Author

phadej commented Jul 25, 2016

@dcoutts makes sense. Compound expressions are still tricky.

@phadej
Copy link
Collaborator Author

phadej commented Jul 25, 2016

Also to remember: https://github.com/alanz/cabal-ast-play

@alanz
Copy link
Collaborator

alanz commented Jul 26, 2016

Here is a brain dump from me

Based on my GHC / ghc-exactprint experiences I would suggest using an initial (from the parser) AST that has a clear mapping to the underlying cabal file, and generally keeps things in order. So resist putting all like things together, especially if they can be interleaved in the file.

Then have an ann parameter, which can have location info initially, and delta info later to be pretty printed.

If the initial AST gets processed for use, it makes sense to track in some clear way how each part is derived from the initial one, even if that is done via an annotation at that level too.

So the annotations are never part of the main line of processing, only ever used for round-tripping (with possible modification the cabal file).

@phadej
Copy link
Collaborator Author

phadej commented Jun 23, 2019

I updated the description. Also to complement @dcoutts comment, current parsec approach parses ByteString to whatever field type is. There is no intermediate step, parsing e.g. to a list of tokens first. Maybe there should be (I even think it could help with the performance! or at least not ruin it).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants