Skip to content

Fallbacks for variables in syntactically invalid declarations #703

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
catamorphism opened this issue Mar 4, 2024 · 9 comments
Closed

Fallbacks for variables in syntactically invalid declarations #703

catamorphism opened this issue Mar 4, 2024 · 9 comments
Labels
errors Issues related to the errors section of the spec formatting Issue pertains to the formatting section of the spec normative Issue affects normative text in the specification resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. specification Issue affects the specification test-suite Issue pertains to tests

Comments

@catamorphism
Copy link
Collaborator

Consider the following syntactically incorrect message:

.local $bar {|foo|} {{{$bar}}}

This is incorrect because the "=" between $bar and {|foo|} is missing.

The spec requires the implementation to signal a syntax error, but also provide a partial result.

Should the result of formatting be foo since the right-hand side of the declaration is syntactically correct, although the declaration itself isn't? Or should it be {$bar} because no valid declaration for $bar exists?

There is a similar test case in https://github.com/unicode-org/message-format-wg/blob/main/test/syntax-errors.json#L50 , but this file doesn't specify formatting output for syntactically invalid messages.

The basic issue is that this text from formatting.md:

In a declaration, the resolved value of the expression is bound to a variable, which is available for use by later expressions. Since a variable can be referenced in different ways later, implementations SHOULD NOT immediately fully format the value for output.

doesn't say whether to bind a variable to a resolved value if an error is encountered while processing the declaration.

I'll note that a previous version of the messageformat package included a test file that included the following test:

{
    "src": "let $bar {|foo|} {{$bar}}",
    "exp": "foo",
    "errors": [{ "type": "missing-char" }]
  },

which implies that $bar should be bound to a value in the environment even though processing its declaration should cause the formatter to emit a syntax error. However, I don't see a basis for this (or the other interpretation) in the spec.

@catamorphism catamorphism added normative Issue affects normative text in the specification specification Issue affects the specification formatting Issue pertains to the formatting section of the spec test-suite Issue pertains to tests errors Issues related to the errors section of the spec labels Mar 4, 2024
@catamorphism catamorphism changed the title Fallbacks for variables in syntactically invalid declarations are invalid Fallbacks for variables in syntactically invalid declarations Mar 4, 2024
@aphillips
Copy link
Member

It's a syntax error. We say this about syntax errors:

Syntax Errors and Data Model Errors apply to all message processors, and MUST be emitted as soon as possible. The other error categories are only emitted during formatting, but it might be possible to detect them with validation tools.

You'll also find buried in the formatting spec this instruction:

If the message being formatted has any Syntax Errors or Data Model Errors, the result of pattern selection MUST be a pattern resolving to a single fallback value using the message's fallback string defined in the formatting context or if this is not available or empty, the U+FFFD REPLACEMENT CHARACTER �.

That is, in this case, we emit the logo. While a human might intuit that the rest of the example message is intact and it's only missing an equals sign, there's no way for the parser to know for sure. It's also not possible for the parser to know if other Very Bad Things might happen if $bar were given some fallback value.

I do find that it's confusing that the above quoted paragraph (the "If the message..." one) is not more prominent.

@catamorphism
Copy link
Collaborator Author

You'll also find buried in the formatting spec this instruction:

Since that text is under the "Pattern Selection" header, it could be read as only applying to a selectors message. (It could also be read the other way, since selecting a pattern from a set containing 1 pattern is also pattern selection, but I think there's no obvious reading of it.)

@aphillips
Copy link
Member

I agree. A syntax error can occur in a simple message:

This has a {$syntax :error

... so we should clarify that the quoted paragraph applies to all syntax problems.

@eemeli
Copy link
Collaborator

eemeli commented Mar 5, 2024

Probably the right prominent place for this is the formatting intro, where we need to replace this MAY with MUST:

If this construction has encountered any _Syntax Errors_ or _Data Model Errors_,
an appropriate error MUST be emitted and a _fallback value_ MAY be used as the formatting result.

@aphillips
Copy link
Member

... or we could just get rid of "MAY be" (or convert to SHOULD).

Some syntax errors are recoverable (at least to the point of not requiring the logo be emitted for the whole message)--if the error is restricted to the inside of an expression, the expression would fall back without escaping to obliterate the message. With lazy evaluation, the message as a whole might even work for some inputs. It's probably better to blow up so that the message gets fixed in pre-prod, but we don't have to require it. Should we allow that?


Anyway, should we attempt to clarify pre-preview?

@catamorphism do you agree that this is the right behavior?

aphillips added a commit that referenced this issue Mar 5, 2024
Fixes #703 

This fix moves the syntax/data model error resolution text verbatim from the bottom of the pattern selection section to the formatting intro (except to replace the words 'pattern selection' with 'formatting', making it more general).

In addition, this changes uses US vs. UK spelling in the first sentence of the intro (an editorial nit) and removes the existing similar instruction about fallbacks. This is a normative change because the previous text had "MAY" for the fallback.

I also rephrased the "To start..." paragraph to be less chatty by using an imperative (this is an editorial change).
@catamorphism
Copy link
Collaborator Author

@catamorphism do you agree that this is the right behavior?

I'm not sure what "this" is :)

I can read the current spec in several different ways (let's just focus on syntax errors, for simplicity):

  1. Any syntax error anywhere in the message means the whole message should format to the single fallback value.
  2. Same as (1), except that a syntax error encountered while parsing an expression MAY be recovered from (by using a fallback for the syntactically incorrect expression, but not for the entire message).
  3. Same as (2), but with MUST instead of MAY.
  4. (1) but allowing lazy evaluation: syntax errors in expressions, declarations or patterns that are not needed for the formatting of the chosen pattern MAY be ignored.
  5. (2) but allowing lazy evaluation: "ignored if the expression's value is never demanded" instead of "recovered from"

(5) is what allows an implementation to make the set of error-free messages as large as possible.

@aphillips
Copy link
Member

Sorry I wasn't clear. The proposal is that syntax/data model errors fail the message outright.

I agree that expression-internal syntax errors are sometimes recoverable and that we can allow these to be evaluated lazily (only parsed when needed) or that we can allow expression internal error to fallback appropriately or do something implementation defined. This sort of fallback is described in the expression resolution section, btw.

(5) is what allows an implementation to make the set of error-free messages as large as possible.

That isn't quite accurate. It allows an implementation to make the set of error-free formatted messages as large as possible, at the cost of potentially hiding errors in a way that can appear to be "transient". I think it is worth careful consideration, since most parsers will run over the whole message, not skipping patterns until needed. We have a separate error (Invalid Expression) for problems inside of an expression.

@catamorphism
Copy link
Collaborator Author

I agree that hiding errors isn't necessarily a good thing! Option (1) is the most consistent with all programming languages I'm aware of (all syntax errors fail the program). And I personally support it; parsing the syntax is complicated enough without the requirement to recover from syntax errors. If there are good error diagnostics, I don't think providing partial output adds much.

(That being said, ICU4C makes it hard to give good error diagnostics beyond line and column numbers, and adding better diagnostics is unlikely to happen for the tech preview.)

@catamorphism
Copy link
Collaborator Author

I think this has been addressed by subsequent spec changes to the effect that syntax errors are non-recoverable.

@catamorphism catamorphism added the resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. label Oct 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
errors Issues related to the errors section of the spec formatting Issue pertains to the formatting section of the spec normative Issue affects normative text in the specification resolve-candidate This issue appears to have been answered or resolved, and may be closed soon. specification Issue affects the specification test-suite Issue pertains to tests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants