CS2 Discussion: Project: Revisiting implementation of a new parser #77

Hello. I'm a long-time lurker here and would like to (re-)raise a proposal.
Sorry for the long post 😄 

### The What:

I'd like to re-raise the issue of **implementing a new parser for CoffeeScript**.

Some previous related discussions:
* https://github.com/coffeescript6/discuss/issues/25#issuecomment-247904501
* https://github.com/coffeescript6/discuss/issues/21#issuecomment-237980538
* https://github.com/coffeescript6/discuss/issues/33
* https://github.com/coffeescript6/discuss/issues/76

#### Scope:

The scope is **intentionally** limited to **only** creating a new parser.
There is no intent to touch the lexer & rewriter, nor to modify the code generation parts.


### The why:

As previously discussed, the existing CS compiler infrastructure is a **limiting
factor** for CoffeeScript in the long term:
- String-based code generation
- Incompatibility with Babel AST structures
- Difficulty in expanding Jison's capabilities
  * See the [pulse](https://github.com/zaach/jison/pulse/monthly) of Jison.
  * See the [commit graph](https://github.com/zaach/jison/graphs/contributors) of Jison.
  * Particularly in the context of language-services capabilities such as error recovery / partial parsing.
  
Replacing the whole pipeline **at once** requires more resources than are available to this project,
and even if those resources were available, it would still be a **very risky** approach.

Therefore, an **incremental** approach is needed.


### Architecture:

I propose to create a **separation** between the syntactic analysis and the AST creation.
This means that the logic that creates the AST must not be embedded inside the parser.
Instead, the parser should create a lower-level structure, a Parse Tree / Concrete Syntax Tree (CST),
which could be transformed afterwards to serve different needs, for example:

- Transformation to the existing CS AST, to support the existing compiler backend.
- Transformation to a Babel AST, to support a new experimental compiler backend.
- Transformation to an enriched AST structure that represents the entire syntactic information, to support
  language services tools such as formatting & refactoring.
  
This proposed **separation of concerns** will help to future-proof the CoffeeScript compiler
by enabling future **incremental** changes, such as replacing the compiler backend without
modifying (or diverging from) the compiler frontend (parser).
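To make the proposed separation more concrete, here is a minimal TypeScript sketch of one CST feeding three independent transformations. All shapes and names here (`CstNode`, `CstToken`, `toCsAst`, `toBabelAst`, `reprint`, the `additionExpression` rule) are hypothetical illustrations: they are not Chevrotain's actual CST format, nor CoffeeScript's real `nodes` classes, and the Babel-style output covers only two node types.

```typescript
// Hypothetical CST shapes: each node keeps its grammar rule and ordered
// children, and each token keeps its exact text and source offset.
interface CstToken {
  kind: "token";
  type: string;   // token type, e.g. "NUMBER" or "PLUS"
  image: string;  // exact source text of the token
  offset: number; // start offset in the source
}

interface CstNode {
  kind: "node";
  rule: string;   // grammar rule that produced this node
  children: (CstNode | CstToken)[];
}

// Simplified stand-in for the existing CS AST (hypothetical shape).
type CsAst =
  | { type: "Op"; operator: string; first: CsAst; second: CsAst }
  | { type: "NumberLiteral"; value: string };

// Babel-style AST fragment (only the two node types needed here).
type BabelAst =
  | { type: "BinaryExpression"; operator: string; left: BabelAst; right: BabelAst }
  | { type: "NumericLiteral"; value: number };

// Hypothetical CST for the source `1 + 2`.
const cst: CstNode = {
  kind: "node",
  rule: "additionExpression",
  children: [
    { kind: "token", type: "NUMBER", image: "1", offset: 0 },
    { kind: "token", type: "PLUS", image: "+", offset: 2 },
    { kind: "token", type: "NUMBER", image: "2", offset: 4 },
  ],
};

function tokensOf(node: CstNode): CstToken[] {
  return node.children.filter((c): c is CstToken => c.kind === "token");
}

// Transformation 1: CST -> AST shaped for the existing compiler backend.
function toCsAst(node: CstNode): CsAst {
  const [left, op, right] = tokensOf(node);
  return {
    type: "Op",
    operator: op.image,
    first: { type: "NumberLiteral", value: left.image },
    second: { type: "NumberLiteral", value: right.image },
  };
}

// Transformation 2: the same CST -> Babel-style AST for a new backend.
function toBabelAst(node: CstNode): BabelAst {
  const [left, op, right] = tokensOf(node);
  return {
    type: "BinaryExpression",
    operator: op.image,
    left: { type: "NumericLiteral", value: Number(left.image) },
    right: { type: "NumericLiteral", value: Number(right.image) },
  };
}

// Transformation 3 (language services): every token and its offset is
// still in the CST, so the original layout can be reproduced exactly.
function reprint(node: CstNode): string {
  let out = "";
  for (const tok of tokensOf(node)) {
    while (out.length < tok.offset) out += " "; // restore original spacing
    out += tok.image;
  }
  return out;
}
```

For brevity the transformations only handle a single flat binary rule; a real version would dispatch on `rule` and recurse into child nodes. The key point is that none of the three consumers needs any change inside the parser itself.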


### The How:

**Warning: sales pitch incoming**

The standard approach to writing a parser for a compiler is normally to write one "by hand".
* See this [quote](https://github.com/antlr/antlr4/blob/master/doc/faq/general.md) from Terence Parr (the creator of ANTLR):
  >In my experience, almost no one uses parser generators to build commercial compilers.

The problem with this approach is that it can be repetitive and error-prone work,
and implementing more advanced capabilities such as fault tolerance can be complex.
Fortunately, the last time I needed to write a hand-built parser I was too lazy 😸 and instead
created a library that makes it easier to hand-build parsers in JavaScript without any code generation:
[Chevrotain](https://github.com/SAP/chevrotain).
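As an illustration of why fault tolerance is extra work in a hand-written parser, here is a minimal TypeScript sketch of two classic single-token recovery strategies, single-token deletion and single-token insertion, applied whenever the parser consumes an expected token. The `Parser` class, its `consume` method, the token types and the `call` rule are all hypothetical; this is not Chevrotain's implementation, just the kind of bookkeeping that automatic fault tolerance saves you from writing by hand.

```typescript
// Hypothetical token shape for this sketch.
interface Tok {
  type: string;
  image: string;
}

class Parser {
  private pos = 0;
  private readonly tokens: Tok[];
  readonly errors: string[] = [];

  constructor(tokens: Tok[]) {
    this.tokens = tokens;
  }

  private peek(k = 0): Tok | undefined {
    return this.tokens[this.pos + k];
  }

  // Consume a token of type `expected`. `follow` is the token type the
  // grammar expects right after it; it enables insertion recovery.
  private consume(expected: string, follow?: string): Tok {
    const cur = this.peek();
    if (cur !== undefined && cur.type === expected) {
      this.pos++;
      return cur;
    }
    // Single-token deletion: the current token is unexpected, but the next
    // one is the expected token, so report and skip the extra token.
    const next = this.peek(1);
    if (cur !== undefined && next !== undefined && next.type === expected) {
      this.errors.push(`unexpected '${cur.image}' deleted`);
      this.pos += 2;
      return next;
    }
    // Single-token insertion: the expected token is missing, but the current
    // token is what should follow it, so report and act as if it were present.
    if (cur !== undefined && follow !== undefined && cur.type === follow) {
      this.errors.push(`missing ${expected} inserted`);
      return { type: expected, image: "" }; // virtual token, nothing consumed
    }
    throw new Error(`expected ${expected}, found ${cur === undefined ? "end of input" : cur.image}`);
  }

  // Hypothetical grammar rule:  call := IDENT "(" ")"
  parseCall(): string {
    const name = this.consume("IDENT", "LPAREN");
    this.consume("LPAREN", "RPAREN");
    this.consume("RPAREN");
    return name.image;
  }
}
```

For example, `f )` parses with the error "missing LPAREN inserted", and `f ( , )` parses with "unexpected ',' deleted". A production parser also needs re-synchronization for cases where neither single-token repair applies, which is where the hand-written complexity really grows.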

Relevant highlights:
* [JSON Grammar example in CoffeeScript.](https://github.com/SAP/chevrotain/blob/master/examples/implementation_languages/coffeescript/coffeescript_json.coffee)
* [Performance Benchmark (Jison included).](http://sap.github.io/chevrotain/performance/)
  - About **one order of magnitude** faster than Jison on Chrome 57.
* [Automatic CST output creation.](http://sap.github.io/chevrotain/playground/)
  - This is an online playground; the first example also creates a CST output.
* [Automatic fault tolerance capabilities.](http://sap.github.io/chevrotain/playground/?example=tutorial%20fault%20tolerance)
  
The proposal is to write the new CoffeeScript parser **in CoffeeScript** (no code generation),
using the Chevrotain parsing library.
  
  
### The who:  

I can contribute enough time to try implementing this.
I obviously can't make any promises, but this won't be the first parser I've written, so I have a decent
chance of success.


### Risks & Issues:

* Factoring away left recursion (required for an LL(k) parser) may result in uglier parse trees.

* Do CoffeeScript's tokens contain full position information?
  - In the worst case this may require changes to the rewriter, or even replacing the whole
    lexer -> rewriter -> parser flow, but that would be a less incremental approach.

* My CoffeeScript skills are lacking, so I may require assistance in getting the code to a decent quality.

* The contents and structure of error messages for invalid inputs will change.

* Testing that the AST output is the same requires a large amount of valid CS source code.

* Additional abstraction and separation will add some performance overhead.
  - This should be mitigated by the higher base performance of Chevrotain vs Jison.
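To illustrate the first risk above: an LL(k) parser cannot handle a left-recursive rule such as `additive := additive "+" NUMBER | NUMBER` directly, because parsing `additive` would immediately recurse into `additive` without consuming any input. The usual factoring is `additive := NUMBER ("+" NUMBER)*`, which produces a flat parse tree; the left-associative structure then has to be rebuilt in the CST -> AST transformation. The TypeScript sketch below uses a hypothetical toy grammar (numbers with `+`/`-` only) and hypothetical names (`parseAdditive`, `toAst`, `evaluate`) to show both steps.

```typescript
// Factored grammar (no left recursion):  additive := NUMBER (("+" | "-") NUMBER)*
type FlatCst = { operands: number[]; operators: string[] };

type Ast =
  | { type: "Num"; value: number }
  | { type: "Binary"; op: string; left: Ast; right: Ast };

// Recursive-descent parse of the factored rule: a loop replaces the left
// recursion, so the resulting CST is a flat list rather than a nested tree.
function parseAdditive(tokens: string[]): FlatCst {
  let pos = 0;
  const expectNumber = (): number => {
    const tok = tokens[pos++];
    if (tok === undefined || !/^\d+$/.test(tok)) {
      throw new Error(`expected a number at token ${pos - 1}`);
    }
    return Number(tok);
  };
  const cst: FlatCst = { operands: [expectNumber()], operators: [] };
  while (tokens[pos] === "+" || tokens[pos] === "-") {
    cst.operators.push(tokens[pos++]);
    cst.operands.push(expectNumber());
  }
  if (pos !== tokens.length) throw new Error(`unexpected token ${tokens[pos]}`);
  return cst;
}

// CST -> AST: fold from the left so `1 - 2 - 3` becomes ((1 - 2) - 3),
// the associativity the original left-recursive rule expressed.
function toAst(cst: FlatCst): Ast {
  let ast: Ast = { type: "Num", value: cst.operands[0] };
  for (let i = 0; i < cst.operators.length; i++) {
    ast = {
      type: "Binary",
      op: cst.operators[i],
      left: ast,
      right: { type: "Num", value: cst.operands[i + 1] },
    };
  }
  return ast;
}

function evaluate(ast: Ast): number {
  if (ast.type === "Num") return ast.value;
  const l = evaluate(ast.left);
  const r = evaluate(ast.right);
  return ast.op === "+" ? l + r : l - r;
}
```

With this approach `1 - 2 - 3` evaluates to `-4`; a naive right-nested tree would give `2`. The "uglier" flat parse tree is therefore not a correctness problem, but every left-associative construct needs an explicit fold like `toAst` in the transformation layer.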


### Questions:

* **Any feedback / suggestions?**

* Am I missing some blocker or potential showstopper here?

* Is this approach acceptable/approved by the project leaders?

* If a POC succeeds, will there be assistance in integrating this into the CoffeeScript code base?

* What percentage of the CoffeeScript running time is spent parsing?
  - I'm trying to figure out the potential performance benefits for an E2E compilation flow.
