Parse Comments as standalone if they're attached to Junk #235

stasm · 2018-06-28T18:17:44Z

Syntax 0.6 changes the parsing logic of Comments. Comments attached to what ends up being Junk are not part of Junk anymore. Instead, they end up as standalone Comments in the final Resource.

This PR also changes the behavior of FluentParser.parseEntry() to ignore all valid comments and only start parsing when a Message or a Term is encountered. Comments with syntax errors are not skipped, however.

stasm · 2018-06-29T07:42:18Z

@zbraniecki, @Pike - This implements a change in Syntax 0.6 to how comments are parsed in case of syntax errors in messages and terms they're attached to. Previously, the comment would be parsed as part of the junk; the new behavior is to salvage the comment as standalone and limit the junk to the message.

This has implications for FluentParser.parseEntry() which is used in Pontoon to parse the contents of the source editor. See https://github.com/mozilla/pontoon/blob/master/pontoon/base/static/js/fluent_interface.js. It's an interesting use-case: we're really only interested in parsing Messages and Terms. If the source editor contains a group comment, I guess parseEntry should fail? What about a comment which has a syntax error?

## A group comment
foo = Foo

#Junk comment
foo = Foo

Such questions make changing parseEntry challenging. We never spec'ed it. Right now it just returns the first entry it parses (successfully or not: a GroupComment in the first example and a Junk in the second one), which might not be what Pontoon needs.

I made changes to parseEntry in the PR which I'd like to get your input on.

Right now all of the uses of parseEntry in Pontoon ignore comments, either implicitly or explicitly. I chose to make this behavior explicit in this PR: all comments are ignored, unless they contain syntax errors themselves, in which case the Junk they produced is returned. Thanks to this logic, the following snippet returns junk rather the comment as standalone, ignoring junk.

# A comment
junk

We will need to revisit the design of parseEntry when we allow localizers to save their own comments. This will require many other changes in Pontoon, too, so I'm not too worried about it right now.

Pike

This should work fine.

One question, in the reference parser, we support three line endings,

https://github.com/projectfluent/fluent/blob/d2a5363f413d9b963f9b2299329760156e6a7ba9/syntax/grammar.mjs#L463-L468

Seems that the js parser only handles unix line endings?

Another purely code-style comment is about the outer loop and the re-entry into getEntryOrJunk. I'm using a lastComment pattern in compare-locales, but I have no idea if that'd be simpler here, or if it would be in 0.7. Also something that'd be perfectly fine for a separate issue to try out.

Pike · 2018-07-02T09:47:07Z

fluent-syntax/src/parser.js

+      // Messages or Terms if they are followed immediately by them. However
+      // they should parse as standalone when they're followed by Junk.
+      // Consequently, we only attach them once we know that the Message or the
+      // Term parsed successfully.


Not sure if a logic based on lastComment would be easier or not. I.e., if you hit a Comment without blankLines, stash it. If you hit a Message or Term, use lastComment if given, otherwise push lastComment and entry.

Once we add top-level whitespace with projectfluent/fluent#116, does one of the two become more straightforward? Would we fold that top-level entry into getEntryOrJunk?

I think I like the lastComment approach better. I'll implement it in a separate commit so that we can draw inspiration from the entries.push(entry, next) approach in the future, if needed.

Pike · 2018-07-02T10:05:31Z

On the question which API tools should use:

Right now, pontoon uses a single API for four use-cases (AFAICT): One is to find out details about the reference entry. Given that the message/term came from the parser, it's probably OK to assume that there are no errors.

Also, it parses reference and l10n to create the short display version. One would hope that there are no parser errors left here?

The other is to sanitize the serialization. It creates a serialized message, parses that, and serializes again. That code puzzles me a bit. https://github.com/mozilla/pontoon/blob/26811442d45a0a7be6ee7cc8703a7f218e866860/pontoon/base/static/js/fluent_interface.js#L810

Last but not least, to sanitize the fluent text are. I'd think we can do better here in making errors visible. For this case in particular, I'd expect that anything that's not the actual message, we should show an error, and refuse to save. Right now, trailing Junk is just dropped, which isn't what I'd expect?

I wonder if we could make that API even take the expected ID explicitly.

Anywho, I think the changes you made to parseEntry go in the right direction, so I think we can tweak this in a follow-up.

Syntax 0.6 changes the parsing logic of Comments. Comments attached to what ends up being Junk are not part of Junk anymore. Instead, they end up as standalone Comments in the final Resource. This PR also changes the behavior of FluentParser.parseEntry() to ignore all valid comments and only start parsing when a Message or a Term is encountered. Comments with syntax errors are not skipped, however.

stasm · 2018-07-02T13:53:59Z

Thanks for the review, @Pike!

Seems that the js parser only handles unix line endings?

There's #228 which fixes this. I haven't yet had time to reply to @zbraniecki in it.

Another purely code-style comment is about the outer loop and the re-entry into getEntryOrJunk. I'm using a lastComment pattern in compare-locales, but I have no idea if that'd be simpler here, or if it would be in 0.7. Also something that'd be perfectly fine for a separate issue to try out.

I actually like the lastComment approach better. Thanks for suggesting it. I'll try to implement it now.

stasm · 2018-07-02T14:02:00Z

Right now, pontoon uses a single API for four use-cases (AFAICT): One is to find out details about the reference entry. Given that the message/term came from the parser, it's probably OK to assume that there are no errors.

Correct, that was my assumption as well.

Also, it parses reference and l10n to create the short display version. One would hope that there are no parser errors left here?

Agreed here as well.

The other is to sanitize the serialization. It creates a serialized message, parses that, and serializes again. That code puzzles me a bit. https://github.com/mozilla/pontoon/blob/26811442d45a0a7be6ee7cc8703a7f218e866860/pontoon/base/static/js/fluent_interface.js#L810

Hmm, interesting indeed. Perhaps this serves as a prettifier?

Last but not least, to sanitize the fluent text area. I'd think we can do better here in making errors visible. For this case in particular, I'd expect that anything that's not the actual message, we should show an error, and refuse to save.

I agree that we should improve the source editing experience. Ideally I'd see us using a proper web IDE solution for this, which would likelu want to use the full tooling parser.

Right now, trailing Junk is just dropped, which isn't what I'd expect?

The current implementation of parseEntry will happily return the first Entry it's able to parse, be it a Comment, GroupComment, ResourceComment, Message, Term or Junk. Anything after the first one is ignored.

I wonder if we could make that API even take the expected ID explicitly.

In one iteration of this PR I changed parseEntry(source) to parseEntry(id, source) :) I reverted this change to limit the impact on the API changes. I still think it would be worth following up with. At the point where we change the signature, we might also rename the method. I think we're learning through Pontoon that this is a very special use-case. A general parseEntry doesn't satisfy it.

Anywho, I think the changes you made to parseEntry go in the right direction, so I think we can tweak this in a follow-up.

Cool. Let's see how the upgrade to Syntax 0.6 in Pontoon goes and then we can research the parseEntry use-cases in detail and change the API as needed.

stasm requested a review from zbraniecki June 29, 2018 07:42

stasm mentioned this pull request Jun 29, 2018

fluent-syntax: Implement Syntax 0.6 #206

Closed

12 tasks

Pike approved these changes Jul 2, 2018

View reviewed changes

Use lastComment to stash attached Comments

11b96ed

stasm force-pushed the comment-junk branch from 5469213 to 11b96ed Compare July 2, 2018 14:12

stasm merged commit 6e46458 into projectfluent:feature/syntax-0.6 Jul 2, 2018

stasm deleted the comment-junk branch July 2, 2018 15:05

stasm mentioned this pull request Jul 2, 2018

Reference parser stores Message Comment as Standalone Comment if Message has errors. projectfluent/fluent#143

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Parse Comments as standalone if they're attached to Junk #235

Parse Comments as standalone if they're attached to Junk #235

Uh oh!

stasm commented Jun 28, 2018 •

edited

Loading

Uh oh!

stasm commented Jun 29, 2018

Uh oh!

Pike left a comment

Uh oh!

Pike Jul 2, 2018

Uh oh!

stasm Jul 2, 2018

Uh oh!

Pike commented Jul 2, 2018

Uh oh!

stasm commented Jul 2, 2018

Uh oh!

stasm commented Jul 2, 2018

Uh oh!

Uh oh!

Parse Comments as standalone if they're attached to Junk #235

Parse Comments as standalone if they're attached to Junk #235

Uh oh!

Conversation

stasm commented Jun 28, 2018 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

stasm commented Jun 29, 2018

Uh oh!

Pike left a comment

Choose a reason for hiding this comment

Uh oh!

Pike Jul 2, 2018

Choose a reason for hiding this comment

Uh oh!

stasm Jul 2, 2018

Choose a reason for hiding this comment

Uh oh!

Pike commented Jul 2, 2018

Uh oh!

stasm commented Jul 2, 2018

Uh oh!

stasm commented Jul 2, 2018

Uh oh!

Uh oh!

stasm commented Jun 28, 2018 •

edited

Loading