Skip to content

Parse Comments as standalone if they're attached to Junk #235

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 2, 2018

Conversation

stasm
Copy link
Contributor

@stasm stasm commented Jun 28, 2018

Syntax 0.6 changes the parsing logic of Comments. Comments attached to what ends up being Junk are not part of Junk anymore. Instead, they end up as standalone Comments in the final Resource.

This PR also changes the behavior of FluentParser.parseEntry() to ignore all valid comments and only start parsing when a Message or a Term is encountered. Comments with syntax errors are not skipped, however.

@stasm
Copy link
Contributor Author

stasm commented Jun 29, 2018

@zbraniecki, @Pike - This implements a change in Syntax 0.6 to how comments are parsed in case of syntax errors in messages and terms they're attached to. Previously, the comment would be parsed as part of the junk; the new behavior is to salvage the comment as standalone and limit the junk to the message.

This has implications for FluentParser.parseEntry() which is used in Pontoon to parse the contents of the source editor. See https://github.com/mozilla/pontoon/blob/master/pontoon/base/static/js/fluent_interface.js. It's an interesting use-case: we're really only interested in parsing Messages and Terms. If the source editor contains a group comment, I guess parseEntry should fail? What about a comment which has a syntax error?

## A group comment
foo = Foo
#Junk comment
foo = Foo

Such questions make changing parseEntry challenging. We never spec'ed it. Right now it just returns the first entry it parses (successfully or not: a GroupComment in the first example and a Junk in the second one), which might not be what Pontoon needs.

I made changes to parseEntry in the PR which I'd like to get your input on.

Right now all of the uses of parseEntry in Pontoon ignore comments, either implicitly or explicitly. I chose to make this behavior explicit in this PR: all comments are ignored, unless they contain syntax errors themselves, in which case the Junk they produced is returned. Thanks to this logic, the following snippet returns junk rather the comment as standalone, ignoring junk.

# A comment
junk

We will need to revisit the design of parseEntry when we allow localizers to save their own comments. This will require many other changes in Pontoon, too, so I'm not too worried about it right now.

@stasm stasm requested a review from zbraniecki June 29, 2018 07:42
@stasm stasm mentioned this pull request Jun 29, 2018
12 tasks
Copy link
Contributor

@Pike Pike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should work fine.

One question, in the reference parser, we support three line endings,

https://github.com/projectfluent/fluent/blob/d2a5363f413d9b963f9b2299329760156e6a7ba9/syntax/grammar.mjs#L463-L468

Seems that the js parser only handles unix line endings?

Another purely code-style comment is about the outer loop and the re-entry into getEntryOrJunk. I'm using a lastComment pattern in compare-locales, but I have no idea if that'd be simpler here, or if it would be in 0.7. Also something that'd be perfectly fine for a separate issue to try out.

// Messages or Terms if they are followed immediately by them. However
// they should parse as standalone when they're followed by Junk.
// Consequently, we only attach them once we know that the Message or the
// Term parsed successfully.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if a logic based on lastComment would be easier or not. I.e., if you hit a Comment without blankLines, stash it. If you hit a Message or Term, use lastComment if given, otherwise push lastComment and entry.

Once we add top-level whitespace with projectfluent/fluent#116, does one of the two become more straightforward? Would we fold that top-level entry into getEntryOrJunk?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I like the lastComment approach better. I'll implement it in a separate commit so that we can draw inspiration from the entries.push(entry, next) approach in the future, if needed.

@Pike
Copy link
Contributor

Pike commented Jul 2, 2018

On the question which API tools should use:

Right now, pontoon uses a single API for four use-cases (AFAICT): One is to find out details about the reference entry. Given that the message/term came from the parser, it's probably OK to assume that there are no errors.

Also, it parses reference and l10n to create the short display version. One would hope that there are no parser errors left here?

The other is to sanitize the serialization. It creates a serialized message, parses that, and serializes again. That code puzzles me a bit. https://github.com/mozilla/pontoon/blob/26811442d45a0a7be6ee7cc8703a7f218e866860/pontoon/base/static/js/fluent_interface.js#L810

Last but not least, to sanitize the fluent text are. I'd think we can do better here in making errors visible. For this case in particular, I'd expect that anything that's not the actual message, we should show an error, and refuse to save. Right now, trailing Junk is just dropped, which isn't what I'd expect?

I wonder if we could make that API even take the expected ID explicitly.

Anywho, I think the changes you made to parseEntry go in the right direction, so I think we can tweak this in a follow-up.

Syntax 0.6 changes the parsing logic of Comments. Comments attached to what
ends up being Junk are not part of Junk anymore. Instead, they end up as
standalone Comments in the final Resource.

This PR also changes the behavior of FluentParser.parseEntry() to ignore all
valid comments and only start parsing when a Message or a Term is encountered.
Comments with syntax errors are not skipped, however.
@stasm
Copy link
Contributor Author

stasm commented Jul 2, 2018

Thanks for the review, @Pike!

Seems that the js parser only handles unix line endings?

There's #228 which fixes this. I haven't yet had time to reply to @zbraniecki in it.

Another purely code-style comment is about the outer loop and the re-entry into getEntryOrJunk. I'm using a lastComment pattern in compare-locales, but I have no idea if that'd be simpler here, or if it would be in 0.7. Also something that'd be perfectly fine for a separate issue to try out.

I actually like the lastComment approach better. Thanks for suggesting it. I'll try to implement it now.

@stasm
Copy link
Contributor Author

stasm commented Jul 2, 2018

Right now, pontoon uses a single API for four use-cases (AFAICT): One is to find out details about the reference entry. Given that the message/term came from the parser, it's probably OK to assume that there are no errors.

Correct, that was my assumption as well.

Also, it parses reference and l10n to create the short display version. One would hope that there are no parser errors left here?

Agreed here as well.

The other is to sanitize the serialization. It creates a serialized message, parses that, and serializes again. That code puzzles me a bit. https://github.com/mozilla/pontoon/blob/26811442d45a0a7be6ee7cc8703a7f218e866860/pontoon/base/static/js/fluent_interface.js#L810

Hmm, interesting indeed. Perhaps this serves as a prettifier?

Last but not least, to sanitize the fluent text area. I'd think we can do better here in making errors visible. For this case in particular, I'd expect that anything that's not the actual message, we should show an error, and refuse to save.

I agree that we should improve the source editing experience. Ideally I'd see us using a proper web IDE solution for this, which would likelu want to use the full tooling parser.

Right now, trailing Junk is just dropped, which isn't what I'd expect?

The current implementation of parseEntry will happily return the first Entry it's able to parse, be it a Comment, GroupComment, ResourceComment, Message, Term or Junk. Anything after the first one is ignored.

I wonder if we could make that API even take the expected ID explicitly.

In one iteration of this PR I changed parseEntry(source) to parseEntry(id, source) :) I reverted this change to limit the impact on the API changes. I still think it would be worth following up with. At the point where we change the signature, we might also rename the method. I think we're learning through Pontoon that this is a very special use-case. A general parseEntry doesn't satisfy it.

Anywho, I think the changes you made to parseEntry go in the right direction, so I think we can tweak this in a follow-up.

Cool. Let's see how the upgrade to Syntax 0.6 in Pontoon goes and then we can research the parseEntry use-cases in detail and change the API as needed.

@stasm stasm merged commit 6e46458 into projectfluent:feature/syntax-0.6 Jul 2, 2018
@stasm stasm deleted the comment-junk branch July 2, 2018 15:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants