Implement Syntax 0.7 #287
Merged
* Indentation/Whitespace in 0.7
* Apply feedback
* Apply 2nd round of feedback
This is a re-write of the runtime parser. It supports Fluent Syntax 0.7, runs against the reference fixtures, has half the lines of code, and is as fast in SpiderMonkey as the old one (and slightly faster in V8).

## Goals

1. Support 100% of Fluent Syntax 0.7. This includes the indentation relaxation, dropping tabs and CR as syntax whitespace, normalizing new lines to LF, and only allowing numbers and identifiers as variant keys.

2. Maintain good performance. The parser is used in performance-critical code paths. Back in the days of Firefox OS it had to be both fast _and_ produce tightly packed results so that translations don't take up too much space on the device. I think the storage requirements can be relaxed these days.

3. Write code which will be easy to maintain in the future. The parser was first written even before Fluent branched off from L20n. It's seen many changes and additions over the last two years. As new features accrued, it became hard to maintain and to keep track of all known bugs. My goal for the re-write was not only to clean it up but also to define the conformance story for the future and to improve the testing infrastructure.

## Design

The parser focuses on minimizing the number of false negatives at the expense of increasing the risk of false positives. In other words, it aims at parsing _valid_ Fluent messages with a success rate of 100%, but it may also parse some invalid messages which the reference parser would reject. The parser doesn't perform any validation and may produce entries which wouldn't make sense in the real world. For best results, users are advised to validate translations with the fluent-syntax parser before runtime.

The main parser loop iterates over the beginnings of messages and terms. This makes it possible to efficiently skip over comments (which have no use at runtime) and to recover from errors. When a fatal error is encountered, the parser instantly bails out of the currently-parsed message and moves on to the next one.
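The loop-and-bail-out strategy described above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `RE_ENTRY_START`, `ParseError`, `parseValue` and `parseResource` are hypothetical names, and the toy value parser only handles single-line values.

```javascript
// Hypothetical sketch; none of these names are taken from the PR itself.

// Matches the start of a message or term at the beginning of a line.
// Comment lines never match, so they are skipped without being parsed.
const RE_ENTRY_START = /^(-?[a-zA-Z][\w-]*) *= */gm;

class ParseError extends Error {}

// Toy value parser: accepts a single-line value and rejects an empty one.
function parseValue(source, start) {
  let end = source.indexOf("\n", start);
  if (end === -1) {
    end = source.length;
  }
  const value = source.slice(start, end);
  if (value.length === 0) {
    throw new ParseError("Expected a value");
  }
  return value;
}

function parseResource(source) {
  const entries = new Map();
  RE_ENTRY_START.lastIndex = 0;
  let match;
  // Each iteration jumps straight to the next entry start.
  while ((match = RE_ENTRY_START.exec(source)) !== null) {
    try {
      entries.set(match[1], parseValue(source, RE_ENTRY_START.lastIndex));
    } catch (err) {
      if (err instanceof ParseError) {
        // Bail out of the broken entry; the loop resumes at the next start.
        continue;
      }
      throw err;
    }
  }
  return entries;
}
```

In this sketch a resource with one comment, two valid entries and one entry without a value yields a map with just the two valid entries; the broken one is dropped silently.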
Errors are discarded and are not visible to the users of `FluentResource`. They do carry a minimal description of what went wrong, though, which may be useful when reading the code and for debugging.

The parser makes extensive use of sticky regexes, which can be anchored to any offset of the source string without slicing it. In some places it's easier to just check the character currently at the cursor, so it does a fair share of that, too.

## Conformance

My original plan was to base the parser on the EBNF and only parse well-formed syntax. In this PR, I went for something a bit wider than that: a superset of well-formed syntax.

The main deviation from the EBNF is related to parsing `VariantExpressions` and `CallExpressions`. The EBNF verifies that they are called on `Terms` and `Functions` respectively. The optimistic parser doesn't differentiate between `Messages`, `Terms` and `Functions`. I decided to implement it this way because this code might soon change anyway (see projectfluent/fluent#176).

Another deviation is that the parser treats commas in argument lists as whitespace, similar to how Clojure treats them in sequence lists. I might suggest we upstream this in the spec, too, because it makes the implementation of args lists _much_ simpler.

I based this PR on top of the `zeroseven` branch. The `fluent-syntax` parser already supports Syntax 0.7 and passes the [reference fixtures](https://github.com/projectfluent/fluent/tree/master/test/fixtures). This made it possible to turn on the reference testing in the runtime parser, too. `make fixtures` creates the parsed results for all reference fixtures; for now they must be verified manually before they're committed. `make test` can be used in development to assert that the output of the runtime parser still matches the committed one.
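To illustrate the sticky-regex technique and the comma-as-whitespace deviation mentioned above, here is a minimal sketch. It is not the parser from this PR: `Cursor`, `take`, `parseArguments` and the three regexes are hypothetical, the choice of space and LF as argument whitespace is an assumption, and only numbers and identifiers are accepted as arguments.

```javascript
// Hypothetical sketch; names and grammar details are assumptions.

// The `y` (sticky) flag makes a regex match only at `lastIndex`,
// so the source string never needs to be sliced.
const RE_NUMBER = /-?[0-9]+(?:\.[0-9]+)?/y;
const RE_IDENTIFIER = /[a-zA-Z][\w-]*/y;
// Commas are folded into whitespace, so "a, b", "a b" and "a,b,"
// all describe the same argument list.
const RE_ARG_SEPARATOR = /[ \n,]*/y;

class Cursor {
  constructor(source, offset = 0) {
    this.source = source;
    this.offset = offset;
  }

  // Try `re` at the current offset. On success, advance past the match
  // and return the matched text; otherwise return null and stay put.
  take(re) {
    re.lastIndex = this.offset;
    const match = re.exec(this.source);
    if (match === null) {
      return null;
    }
    this.offset = re.lastIndex;
    return match[0];
  }
}

function parseArguments(source, offset) {
  const cursor = new Cursor(source, offset);
  // Checking a single character directly is simpler than a regex here.
  if (source[cursor.offset] !== "(") {
    throw new SyntaxError("Expected (");
  }
  cursor.offset++;
  const args = [];
  while (true) {
    cursor.take(RE_ARG_SEPARATOR);
    if (source[cursor.offset] === ")") {
      cursor.offset++;
      return { args, end: cursor.offset };
    }
    const number = cursor.take(RE_NUMBER);
    if (number !== null) {
      args.push(Number(number));
      continue;
    }
    const identifier = cursor.take(RE_IDENTIFIER);
    if (identifier !== null) {
      args.push(identifier);
      continue;
    }
    throw new SyntaxError("Expected an argument or )");
  }
}
```

Because separators are consumed in one step before every argument, the loop needs no special cases for missing, repeated or trailing commas, which is the simplification the description alludes to.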
Rewind index to improve error recovery
This was referenced Oct 23, 2018