Charter, group mission, observations/assumptions #4
Description
I am going to try to articulate in detail what my initial plans and goals are for this group. These things are of course open to discussion, but I want to stress one thing at the beginning:
This is not design by committee -- those types of efforts usually fail miserably. This is not democratic -- those things usually just divide the constituency and never lead to consensus. I am going to lead this group and effort because it needs to happen and, since no one has done it before, I stepped up. I am going to do the best I can to balance the input people give with the overall big picture and goals.
And I am going to actively seek your feedback to make sure we do the best we can.
Mission Statement
To develop a CST (concrete syntax tree) format which preserves/represents concrete syntax elements (whitespace, comments, etc) in a data structure (tree) "alongside" the AST (abstract syntax tree) elements (VariableDeclaration, BinaryExpression, etc), and to evangelize this new format to all tools in the JS tooling ecosystem (parsers, analyzers, transformers, code generators) to gain substantial/complete adoption as a new IR (intermediate representation) standard.
Charter
This ad hoc group of volunteer members seeks to develop a single, new standard (which we're currently calling "CST" -- Concrete Syntax Tree) for the IR (intermediate representation) of JS source code as it passes through various tools in the JS tooling ecosystem, from parsing to analysis to transformation to code (re)generation.
This new standard will replace/augment the existing standard (AST -- Abstract Syntax Tree as standardized by the SpiderMonkey Parser API). Note: That does not mean ASTs won't exist, but it means the preferred tree format for exchange will be the CST, and that AST will become a reduction of the CST for certain use-cases that only need/care about AST information.
The goal of the new standard tree format is to provide a standard and reliable way to represent all "concrete syntax" elements which are normally discarded by the AST, such as "unnecessary" whitespace, comments, grammar-redundant ( ) or { } pairs, etc. These elements are needed for a variety of use-cases which cannot tolerate "information loss" in the round-trip from source-code to IR back to source-code.
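To make the "information loss" concrete, here is a minimal sketch. The AST is hand-written in the style of the SpiderMonkey Parser API, and the `generate` function is a hypothetical toy that only handles the node types shown; neither comes from any real tool. Regenerating code from the AST cannot reproduce the extra spaces, the redundant parentheses, or the comment, because the tree never recorded them.

```javascript
// Original source, with extra spaces, redundant parentheses, and a comment.
const source = "var a  =  ( 1 + 2 ); // sum";

// Hand-written AST in the style of the SpiderMonkey Parser API.
// Nothing in it records the spacing, the parentheses, or the comment.
const ast = {
  type: "VariableDeclaration",
  kind: "var",
  declarations: [
    {
      type: "VariableDeclarator",
      id: { type: "Identifier", name: "a" },
      init: {
        type: "BinaryExpression",
        operator: "+",
        left: { type: "Literal", value: 1 },
        right: { type: "Literal", value: 2 },
      },
    },
  ],
};

// Hypothetical toy code generator: supports only the node types used above.
function generate(node) {
  switch (node.type) {
    case "VariableDeclaration":
      return node.kind + " " + node.declarations.map(generate).join(", ") + ";";
    case "VariableDeclarator":
      return generate(node.id) + (node.init ? " = " + generate(node.init) : "");
    case "BinaryExpression":
      return generate(node.left) + " " + node.operator + " " + generate(node.right);
    case "Identifier":
      return node.name;
    case "Literal":
      return String(node.value);
    default:
      throw new Error("unsupported node type: " + node.type);
  }
}

const regenerated = generate(ast);
console.log(regenerated);            // var a = 1 + 2;
console.log(regenerated === source); // false
```

The regenerated program is semantically equivalent, which is why ASTs serve many use-cases well, but the original text is unrecoverable.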
This group seeks to discuss several existing proposals for a CST format, hammer out any problems with them, and find one that can gain the most support. We will publish a detailed specification/documentation for this new format, and evangelize and work with all the JS tooling ecosystem members to gain adoption and implementation as quickly as possible.
It is a success condition of this group if many or all JS tools agree, even in principle, to eventually move to this new standard IR format, even if they can only agree to support both CST and AST rather than replace AST. Furthermore, there must be at least one parser that produces CSTs from source code, and at least one code generator which takes a CST and produces code output, as this round-trip is inherent to nearly all use-cases this group concerns itself with.
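The round-trip requirement can be stated as a simple check: generating code from the parsed tree must give back the original source exactly. The sketch below is only an illustration of that check. `toyParse` and `toyGenerate` are hypothetical names, and the "tree" is a flat list of tokens under a `Program` node, not a real syntax tree and not either of the proposed CST formats.

```javascript
// Token pattern: line comments, block comments, whitespace, words, numbers,
// or any other single character. The final alternative matches anything,
// so every character of the input ends up in exactly one token.
const TOKEN = /\/\/[^\n]*|\/\*[\s\S]*?\*\/|\s+|[A-Za-z_$][\w$]*|\d+|[\s\S]/g;

// Hypothetical toy "parser": keeps whitespace and comments as nodes.
function toyParse(text) {
  const children = [];
  for (const match of text.matchAll(TOKEN)) {
    const value = match[0];
    let type = "Punctuator";
    if (value.startsWith("//") || value.startsWith("/*")) type = "Comment";
    else if (/^\s+$/.test(value)) type = "Whitespace";
    else if (/^[A-Za-z_$]/.test(value)) type = "Word";
    else if (/^\d/.test(value)) type = "Number";
    children.push({ type, value });
  }
  return { type: "Program", children };
}

// Hypothetical toy "code generator": concatenates the preserved text.
function toyGenerate(tree) {
  return tree.children.map((child) => child.value).join("");
}

const original = "var a  =  ( 1 + 2 ); // sum\n/* block */ b = a ;";
const tokenTree = toyParse(original);

console.log(toyGenerate(tokenTree) === original); // true
```

This toy misclassifies some input (for example, `//` inside a string literal), but because no character is ever dropped, the round trip still holds. A real CST would need the same guarantee while also carrying the AST structure.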
Assumptions/Observations
-
AST is a lossy format, in that concrete syntax information is lost when a parser takes a program and outputs an AST. This lossy format has served many common use-cases well, but it has not served at all those use-cases which need to retain (and/or use!) this information.
As such, the new CST format that will retain this information must be seen as the primary format, as you can always strip out information from a CST representation to get only an AST, but you cannot go the other direction and restore information which was lost. Note: Some use-cases do call for adding in new "concrete syntax information", such as default whitespace, etc, but that's different from preserving (while parsing) the original information.
We will pay close attention to the tradeoffs in complexity/performance that this implies, and be sensitive to that in what we propose. We will not grossly degrade the performance of existing tools by forcing them to do things like tracking concrete-syntax which they have no concern with, except as it is minimally required to support the rest of the JS tooling ecosystem and use-cases.
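The one-way relationship between the two formats can be sketched as follows. This assumes, purely for illustration, a CST in which each AST node may carry an `extras` property holding concrete syntax; that property name and shape are hypothetical and are not taken from any proposal. Reducing such a tree to an AST is a simple recursive copy that skips `extras`; there is no corresponding function that could rebuild `extras` from the reduced tree.

```javascript
// Hypothetical CST: AST nodes plus an "extras" property (an assumption of this sketch).
const cst = {
  type: "BinaryExpression",
  operator: "+",
  left: {
    type: "Literal",
    value: 1,
    extras: { after: [{ type: "Whitespace", value: " " }] },
  },
  right: {
    type: "Literal",
    value: 2,
    extras: { before: [{ type: "Whitespace", value: " " }] },
  },
  extras: {
    before: [{ type: "Punctuator", value: "(" }],
    after: [{ type: "Punctuator", value: ")" }, { type: "Comment", value: "// sum" }],
  },
};

// Reduce a CST to an AST by copying every property except "extras".
// The input tree is not modified.
function toAST(node) {
  if (Array.isArray(node)) return node.map(toAST);
  if (node === null || typeof node !== "object") return node;
  const out = {};
  for (const [key, value] of Object.entries(node)) {
    if (key === "extras") continue; // drop concrete syntax
    out[key] = toAST(value);
  }
  return out;
}

const reduced = toAST(cst);
console.log(JSON.stringify(reduced));
// {"type":"BinaryExpression","operator":"+","left":{"type":"Literal","value":1},"right":{"type":"Literal","value":2}}
```

A tool that only cares about AST information could apply a reduction like this at its input boundary and otherwise stay unchanged.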
-
This group is not an open-ended, unstructured exploration. It will be guided and informed by prior work, and will keep to the narrowest scope and process necessary to get to a proposed solution with the widest adoption/consensus.
There have been extensive discussions about various approaches to CST tracking over the past 6+ months in various places. Two main proposals surfaced in that discussion, as detailed in the main README of this repo. It is my goal that we first validate both of these proposals, or invalidate them (with proof, not opinion).
We will entertain (and indeed, seek out!) discussion about concrete deficiencies in these proposals, but we will not entertain bikeshedding on opinions of taste on any proposal. If there are unresolvable deficiencies in current proposals, we will entertain alternate proposals. Again, though, we will not get mired in bikeshedding; we will instead seek to solve whatever problems exist in any given attempted solution.
-
Since AST is the current standard IR format with these tools, whatever form the CST settles on must present the least possible friction for existing tools in augmenting or replacing their current AST handling.
-
Throwing out the entire AST format and producing a new CST format that is wholly unlike the current AST is likely to produce a lot of friction to implementation with existing tools, even if it can be demonstrated that it would be superior (for some definition of "superior").
As such, a CST that augments AST in some way is generally preferable, as it would generally lead to less friction in implementation (fewer changes to existing tools' code). We should prefer incremental evolution of the current standard rather than reinvention.
-
The form that the CST takes (and how it co-exists with AST elements) matters, because it directly affects how easily the IR of code can pass through multiple tools in a chain of processing. A single tree (that can be textualized as JSON) is the current norm, and it is preferred (again, to minimize friction) that the process not become significantly harder, for example by requiring multiple different streams of data to be passed around.
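As a rough illustration of why a single JSON-serializable tree keeps tool chains simple, the sketch below passes one tree between two hypothetical tools as a JSON string. The `extras` property and both tool functions are assumptions made for this example only. The first tool knows nothing about concrete syntax, yet the comment still reaches the second tool because it travels inside the same tree.

```javascript
// Hypothetical tree: AST nodes with concrete syntax under "extras" (an assumption).
const sharedTree = {
  type: "VariableDeclarator",
  id: { type: "Identifier", name: "a" },
  init: { type: "Literal", value: 1 },
  extras: { after: [{ type: "Comment", value: "// keep me" }] },
};

// Visit every object in the tree, depth first.
function walk(node, visitor) {
  if (Array.isArray(node)) {
    node.forEach((item) => walk(item, visitor));
    return;
  }
  if (node === null || typeof node !== "object") return;
  visitor(node);
  Object.values(node).forEach((value) => walk(value, visitor));
}

// Tool 1 (hypothetical): renames identifiers; unaware of "extras".
function renameTool(json, from, to) {
  const tree = JSON.parse(json);
  walk(tree, (node) => {
    if (node.type === "Identifier" && node.name === from) node.name = to;
  });
  return JSON.stringify(tree);
}

// Tool 2 (hypothetical): collects the text of every comment in the tree.
function collectComments(json) {
  const found = [];
  walk(JSON.parse(json), (node) => {
    if (node.type === "Comment") found.push(node.value);
  });
  return found;
}

const wire = JSON.stringify(sharedTree);          // the single artifact passed between tools
const renamedWire = renameTool(wire, "a", "total");
const comments = collectComments(renamedWire);

console.log(JSON.parse(renamedWire).id.name); // total
console.log(comments.length);                 // 1
```

With separate side-channel streams for comments or whitespace, each tool in the chain would have to forward those streams explicitly, even if it had no use for them.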
-
There have already been a lot of ad hoc explorations by various tools into tracking whitespace and/or comments, but each tool has done it differently, and none of them has handled all concrete-syntax preservation. All these previous attempts inform our current effort, but they are explicitly considered insufficient, as the mission is to preserve all concrete-syntax in a standard and agreed-upon way across most/all tools.
As such, the CST effort seeks to replace all of those previous non-standard approaches, eventually if not immediately. We want to solve problems, not create more problems for future users by having multiple competing ways to do things and no consensus on how to do it properly.