diff --git a/spec/message.abnf b/spec/message.abnf index 1adfe7a3c9..0f31bbf3de 100644 --- a/spec/message.abnf +++ b/spec/message.abnf @@ -2,21 +2,24 @@ message = [s] *(declaration [s]) body [s] declaration = let s variable [s] "=" [s] expression body = pattern - / (selectors 1*([s] variant)) + / (matcher) + +pattern = "{" *(text / expression) "}" + +matcher = match 1*([s] selector) 1*([s] variant) +selector = expression +variant = when 1*(s key) [s] pattern -pattern = "{" *(text / expression) "}" -selectors = match 1*([s] expression) -variant = when 1*(s key) [s] pattern key = literal / "*" expression = "{" [s] ((operand [s annotation]) / annotation) [s] "}" -operand = literal / variable +operand = literal / variable annotation = (function *(s option)) / reserved -literal = quoted / unquoted +literal = quoted / unquoted variable = "$" name function = (":" / "+" / "-") name -option = name [s] "=" [s] (literal / variable) +option = name [s] "=" [s] (literal / variable) ; reserved keywords are always lowercase let = %x6C.65.74 ; "let" diff --git a/spec/syntax.md b/spec/syntax.md index 1cedae38f4..bc4291c504 100644 --- a/spec/syntax.md +++ b/spec/syntax.md @@ -87,285 +87,253 @@ The syntax specification takes into account the following design restrictions: private-use code points (U+E000 through U+F8FF, U+F0000 through U+FFFFD, and U+100000 through U+10FFFD), unassigned code points, and other potentially confusing content. 
-## Overview & Examples +## Messages and Syntax -_This section is non-normative._ - -### Messages - -All messages, including simple ones, are enclosed in `{…}` delimiters: - - {Hello, world!} - -The same message defined in a `.properties` file: - -```properties -app.greetings.hello = {Hello, world!} -``` - -The same message defined inline in JavaScript: - -```js -let hello = new MessageFormat('{Hello, world!}') -hello.format() -``` - -### Expression - -An _expression_ represents a part of a message that will be determined -during the message's formatting. - -An _expression_ always uses `{…}` delimiters. -An _expression_ can appear as a local variable value, as a _selector_, and within a _pattern_. - -A simple _expression_ is a bare variable name: - - {Hello, {$userName}!} - -### Formatting Functions - -A _function_ is named functionality, possibly with _options_, that format, -process, or operate on an _operand_ which may be either a _literal_ or a _variable_. -For example, a _message_ with a `$date` _variable_ formatted with the `:datetime` _function_: - {Today is {$date :datetime weekday=long}.} - -A _message_ with a `$userName` _variable_ formatted with -the custom `:person` _function_ capable of -declension (using either a fixed dictionary, algorithmic declension, ML, etc.): - - {Hello, {$userName :person case=vocative}!} - -A _message_ with a `$userObj` _variable_ formatted with -the custom `:person` _function_ capable of -plucking the first name from the object representing a person: +### Messages - {Hello, {$userObj :person firstName=long}!} +A **_message_** is the complete template for a specific message formatting request. -A message with two markup-like _functions_, `button` and `link`, -which the runtime can use to construct a document tree structure for a UI framework: +The complete syntax of a _message_ is described by the ABNF. 
- {{+button}Submit{-button} or {+link}cancel{-link}.} +> **Note** +> +> This syntax is designed to be embeddable into many different programming languages +> and formats. As such, it avoids constructs, such as character escapes, that are +> specific to a given file format or processor. In particular, it avoids using +> quote characters common to many file formats and programming languages, such that +> these do not need to be escaped in the body of a _message_. -An opening element MAY be present in a message without a corresponding closing element, -and vice versa. - -### Selection +> **Note** +> +> In general (and except where required by the syntax), whitespace carries no meaning +> in the structure of a _message_. While many of the examples in this spec are written +> on multiple lines, the formatting shown in this spec is primarily for readability. +> +>> Example. +>> This _message_: +>> ``` +>> let $foo = {|horse|} +>> {You have a {$foo}!} +>> ``` +>> +>> Can also be written as: +>> ``` +>> let $foo={|horse|}{You have a {$foo}!} +>> ``` +> +> An exception to this is: whitespace inside a _pattern_ is _always_ significant. -A _selector_ selects a specific _pattern_ from a list of available _patterns_ -in a _message_ based on the value of its _expression_. -A message can have multiple selectors. +A _message_ consists of two parts: +1. an optional list of _declarations_, followed by +2. a _body_ -A message with a single _selector_: +All _messages_ MUST contain a _body_. +An empty string is not a valid _message_. 
- match {$count :number} - when 1 {You have one notification.} - when * {You have {$count} notifications.} +> A simple message: +>``` +>{Hello, world!} +>``` +>The same message defined in a `.properties` file: +> +>```properties +>app.greetings.hello = {Hello, world!} +>``` +> +>The same message defined inline in JavaScript: +> +>```js +>let hello = new MessageFormat('{Hello, world!}') +>hello.format() +>``` -A message with a single _selector_ which is an invocation of -a custom function `:platform`, formatted on a single line: +A _message_ satisfying all rules of the grammar is considered _well-formed_. - match {:platform} when windows {Settings} when * {Preferences} +Furthermore, a _well-formed_ _message_ is considered _valid_ +if it meets additional semantic requirements about its structure, defined below. -A message with a single _selector_ and a custom `:hasCase` function -which allows the message to query for presence of grammatical cases required for each variant: +### Declarations - match {$userName :hasCase} - when vocative {Hello, {$userName :person case=vocative}!} - when accusative {Please welcome {$userName :person case=accusative}!} - when * {Hello!} +A **_declaration_** binds a variable identifier +to the value of an _expression_ within the scope of a _message_. +This local variable can then be used in other _expressions_ within the same _message_. 
-A message with 2 _selectors_: +```abnf +declaration = let s variable [s] "=" [s] expression +``` - match {$photoCount :number} {$userGender :equals} - when 1 masculine {{$userName} added a new photo to his album.} - when 1 feminine {{$userName} added a new photo to her album.} - when 1 * {{$userName} added a new photo to their album.} - when * masculine {{$userName} added {$photoCount} photos to his album.} - when * feminine {{$userName} added {$photoCount} photos to her album.} - when * * {{$userName} added {$photoCount} photos to their album.} +### Body -### Local Variables +The **_body_** of a message consists of either a _pattern_ or a _match statement_. -A _message_ can define local variables, -such as might be needed for transforming input -or providing additional data to an _expression_. -Local variables appear in a _declaration_, -which defines the value of a named local variable. +### Pattern -A _message_ containing a _declaration_ defining a local variable `$whom` which is then used twice inside the pattern: +A **_pattern_** is a combination of _text_ and _placeholders_ +to be formatted as a unit. All _patterns_, including simple ones, begin with U+007B LEFT CURLY BRACKET `{` +and end with U+007D RIGHT CURLY BRACKET `}`. - let $whom = {$monster :noun case=accusative} - {You see {$quality :adjective article=indefinite accord=$whom} {$whom}!} +A _pattern_ MAY be empty. -A message defining two local variables: -`$itemAcc` and `$countInt`, and using `$countInt` as a selector: +A _pattern_ MAY contain an arbitrary number of _expressions_ to be evaluated +during the formatting process. 
- let $countInt = {$count :number maximumFractionDigits=0} - let $itemAcc = {$item :noun count=$count case=accusative} - match {$countInt} - when one {You bought {$color :adjective article=indefinite accord=$itemAcc} {$itemAcc}.} - when * {You bought {$countInt} {$color :adjective accord=$itemAcc} {$itemAcc}.} +Whitespace in a _pattern_, including tabs, spaces, and newlines, is significant and +MUST be preserved during formatting. -### Complex Messages +```abnf +pattern = "{" *(text / expression) "}" +``` -The various features can be used to produce arbitrarily complex messages by combining -_declarations_, _selectors_, _functions_, and more. +Embedding a _pattern_ in brackets ensures that simple _messages_ can be embedded into +various formats regardless of the container's whitespace trimming rules. -A complex message with 2 _selectors_ and 3 local variable _declarations_: +> Example. In a Java `.properties` file, the message `hello` has exactly three spaces before and after +> the word "Hello": +> ```properties +> hello = { Hello } +> ``` - let $hostName = {$host :person firstName=long} - let $guestName = {$guest :person firstName=long} - let $guestsOther = {$guestCount :number offset=1} +### Text - match {$host :gender} {$guestOther :number} +**_text_** is the translatable content of a _pattern_. +Any Unicode code point is allowed, +except for surrogate code points U+D800 through U+DFFF. +The characters `\`, `{`, and `}` MUST be escaped as `\\`, `\{`, and `\}`. - when female 0 {{$hostName} does not give a party.} - when female 1 {{$hostName} invites {$guestName} to her party.} - when female 2 {{$hostName} invites {$guestName} and one other person to her party.} - when female * {{$hostName} invites {$guestName} and {$guestsOther} other people to her party.} +All code points are preserved. Whitespace in text is significant. 
- when male 0 {{$hostName} does not give a party.} - when male 1 {{$hostName} invites {$guestName} to his party.} - when male 2 {{$hostName} invites {$guestName} and one other person to his party.} - when male * {{$hostName} invites {$guestName} and {$guestsOther} other people to his party.} +```abnf +text = 1*(text-char / text-escape) +text-char = %x0-5B ; omit \ + / %x5D-7A ; omit { + / %x7C ; omit } + / %x7E-D7FF ; omit surrogates + / %xE000-10FFFF +``` - when * 0 {{$hostName} does not give a party.} - when * 1 {{$hostName} invites {$guestName} to their party.} - when * 2 {{$hostName} invites {$guestName} and one other person to their party.} - when * * {{$hostName} invites {$guestName} and {$guestsOther} other people to their party.} +### Placeholder -## Productions +A **_placeholder_** is another word for an _expression_ that appears inside of a _pattern_ +and which will be replaced during the formatting of the _message_. -The specification defines the following grammar productions. +### Matcher -A message satisfying all rules of the grammar is considered _well-formed_. +A **_matcher_** is a _message_ _body_ that allows the _pattern_ to vary +in content or form depending on values determined at runtime. +A _matcher_ selects a specific _pattern_ from a list of available +_variants_ in a _message_. -Furthermore, a well-formed message is considered _valid_ -if it meets additional semantic requirements about its structure, defined below. +A _matcher_ consists of the keyword `match` followed by at least one _selector_ and +at least one _variant_. -### Message +When the _matcher_ is processed, the result will be a single _pattern_ that serves +as the template for the formatting process. -A **_message_** is a (possibly empty) list of _declarations_ followed by either a single _pattern_, -or a `match` statement followed by one or more _variants_ which represent the translatable body of the message. 
+A _well-formed_ _message_ can only be considered _valid_ if the following requirements are satisfied: -A _message_ MUST be delimited with `{` at the start, and `}` at the end. Whitespace MAY -appear outside the delimiters; such whitespace is ignored. No other content is permitted -outside the delimiters. +* The number of _keys_ on each _variant_ MUST be equal to the number of _selectors_. +* At least one _variant_ MUST exist whose _keys_ are all equal to the catch-all key (`*`). ```abnf -message = [s] *(declaration [s]) body [s] -body = pattern - / (selectors 1*([s] variant)) +matcher = match 1*([s] selector) 1*([s] variant) ``` -### Variable Declarations +>A _message_ containing a _selector_: +> +>``` +>match {$count :number} +>when 1 {You have one notification.} +>when * {You have {$count} notifications.} +>``` -A **_declaration_** is an expression binding a variable identifier -within the scope of the message to the value of an expression. -This local variable can then be used in other expressions within the same message. +>A _message_ containing a _selector_ formatted on a single line: +> +>``` +>match {:platform} when windows {Settings} when * {Preferences} +>``` -```abnf -declaration = let s variable [s] "=" [s] expression -``` -### Selectors +### Selector + -A `match` statement contains one or more **_selectors_** -which will be used to choose one of the _variants_ during formatting. +A **_selector_** is an _expression_ that +determines how a given _message_ will select the most appropriate _pattern_. + +There MUST be at least one _selector_ in a _matcher_. +There MAY be any number of additional _selectors_. +An _implementation_ MAY limit the total number of _selectors_: when it does so +it MUST support at least 5 _selectors_ to be considered conformant. +Limiting the number of _selectors_ is NOT RECOMMENDED. 
```abnf -selectors = match 1*([s] expression) +selector = expression ``` -> Examples: +>A _matcher_ with a single _selector_ that uses a custom `:hasCase` _function_ +>which allows the _selector_ to choose a _pattern_ based on grammatical case: > -> ``` -> match {$count :plural} -> when 1 {One apple} -> when * {{$count} apples} -> ``` +>``` +>match {$userName :hasCase} +>when vocative {Hello, {$userName :person case=vocative}!} +>when accusative {Please welcome {$userName :person case=accusative}!} +>when * {Hello!} +>``` + +>A _matcher_ with two _selectors_: > -> ``` -> let $frac = {$count: number minFractionDigits=2} -> match {$frac} -> when 1 {One apple} -> when * {{$frac} apples} -> ``` - -### Variants - -A **_variant_** is a keyed _pattern_. -The keys are used to match against the _selectors_ defined in the `match` statement. -The key `*` is a "catch-all" key, matching all selector values. +>``` +>match {$photoCount :number} {$userGender :equals} +>when 1 masculine {{$userName} added a new photo to his album.} +>when 1 feminine {{$userName} added a new photo to her album.} +>when 1 * {{$userName} added a new photo to their album.} +>when * masculine {{$userName} added {$photoCount} photos to his album.} +>when * feminine {{$userName} added {$photoCount} photos to her album.} +>when * * {{$userName} added {$photoCount} photos to their album.} +>``` + +### Variant + +A **_variant_** is a _pattern_ associated with a set of _keys_. +Each _variant_ MUST begin with the keyword `when`, be followed by a sequence of _keys_, +and terminate with a valid _pattern_. +The key `*` is a "catch-all" key, matching all values from a _selector_. +The number of _keys_ in the _variant_ MUST match the number of _selectors_ in the +_matcher_. 
```abnf variant = when 1*(s key) [s] pattern -key = literal / "*" +key = literal / "*" ``` -A _well-formed_ message is considered _valid_ if the following requirements are satisfied: - -- The number of keys on each _variant_ MUST be equal to the number of _selectors_. -- At least one _variant's_ keys MUST all be equal to the catch-all key (`*`). - -### Patterns +### Key -A **_pattern_** is a sequence of translatable elements. -Patterns MUST be delimited with `{` at the start, and `}` at the end. -This serves 3 purposes: +A **_key_** is a value in a _variant_ for use by a _selector_ when selecting the _pattern_ +at runtime. +A _key_ can be either a _literal_ value or the catch-all key `*`. -- The message can be unambiguously embeddable in various container formats - regardless of the container's whitespace trimming rules. - E.g. in Java `.properties` files, - `hello = {Hello}` will unambiguously define the `Hello` message without the space in front of it. -- The message can be conveniently embeddable in various programming languages - without the need to escape characters commonly related to strings, e.g. `"` and `'`. - Such need might still occur when a single or double quote is - used in the translatable content. -- The syntax needs to make it as clear as possible which parts of the message body - are translatable and which ones are part of the formatting logic definition. - -```abnf -pattern = "{" *(text / expression) "}" -``` - -> Example: -> -> ``` -> {Hello, world!} -> ``` -Whitespace within a _pattern_ is meaningful and MUST be preserved. +### Expression -### Expressions +An **_expression_** is a part of a _message_ that will be determined +during the _message_'s formatting. -**_Expressions_** MUST start with an _operand_ or an _annotation_. +An _expression_ MUST begin with a U+007B LEFT CURLY BRACKET `{` +and end with a U+007D RIGHT CURLY BRACKET `}`. An _expression_ MUST NOT be empty. 
+An _expression_ can contain an _operand_, an _annotation_, or an _operand_ followed by +an _annotation_. -An **_operand_** is either a _literal_ or a _variable_. -An _operand_ MAY be optionally followed by an _annotation_. - -An **_annotation_** consists of a _function_ and its named _options_, -or consists of a _reserved_ sequence. - -_Functions_ do not accept any positional arguments -other than the _operand_ in front of them. - -_Functions_ use one of the following prefix sigils: - -- `:` for standalone content -- `+` for starting or opening _expressions_ -- `-` for ending or closing _expressions_ +An _expression_ can appear as the value portion of a _declaration_, as a _selector_, and within a _pattern_. ```abnf expression = "{" [s] ((operand [s annotation]) / annotation) [s] "}" operand = literal / variable annotation = (function *(s option)) / reserved -option = name [s] "=" [s] (literal / variable) ``` + > Expression examples: > > ``` @@ -410,12 +378,85 @@ option = name [s] "=" [s] (literal / variable) > {{+h1 name=above-and-beyond}Above And Beyond{-h1}} > ``` -#### Reserved +### Operand + +An **_operand_** is a _literal_ or a _variable_ to be evaluated in an _expression_. +An _operand_ MAY be optionally followed by an _annotation_. + +### Annotation + +An **_annotation_** consists of either a _function_ plus any optional named _options_, +or it consists of a _reserved_ sequence. + +### Function + +A **_function_** is functionality used to evaluate, format, select, or otherwise +process an _operand_, or, if lacking an _operand_, its _annotation_. + +_Functions_ do not accept any positional arguments +other than the _operand_ in front of them. 
+ +_Functions_ use one of the following prefix sigils: + +- `:` for standalone content +- `+` for starting or opening _expressions_ +- `-` for ending or closing _expressions_ + +```abnf +expression = "{" [s] ((operand [s annotation]) / annotation) [s] "}" +operand = literal / variable +annotation = (function *(s option)) / reserved +option = name [s] "=" [s] (literal / variable) +``` + +A _function_ is a named modifier in an _expression_. +A _function_ MAY be followed by one or more _options_. + +>For example, a _message_ with a `$date` _variable_ formatted with the `:datetime` _function_: +> +>``` +>{Today is {$date :datetime weekday=long}.} +>``` + +>A _message_ with a `$userName` _variable_ formatted with +>the custom `:person` _function_ capable of +>declension (using either a fixed dictionary, algorithmic declension, ML, etc.): +> +>``` +>{Hello, {$userName :person case=vocative}!} +>``` + +>A _message_ with a `$userObj` _variable_ formatted with +>the custom `:person` _function_ capable of +>plucking the first name from the object representing a person: +> +>``` +>{Hello, {$userObj :person firstName=long}!} +>``` + +_Functions_ can be _standalone_, or can be an _opening element_ or _closing element_. + +A **_standalone_** _function_ is not expected to be paired with another _function_. +An **_opening element_** is a _function_ that SHOULD be paired with a _closing element_. +A **_closing element_** is a _function_ that SHOULD be paired with an _opening element_. -**_Reserved_** annotations start with a reserved character -and are intended for future standardization -as well as private implementation use. -A _reserved_ _annotation_ MAY be empty or contain arbitrary text. +An _opening element_ MAY be present in a message without a corresponding _closing element_, +and vice versa. 
+ +>A message with two markup-like _functions_, `button` and `link`, +>which the runtime can use to construct a document tree structure for a UI framework: +> +>``` +>{{+button}Submit{-button} or {+link}cancel{-link}.} +>``` + +### Reserved + +A **_reserved_** _annotation_ is an _annotation_ whose syntax is reserved +for future standardization. + +A _reserved_ _annotation_ starts with a reserved character. +A _reserved_ _annotation_ MAY be empty or contain arbitrary text after its first character. This allows maximum flexibility in future standardization, as future definitions are expected to define additional semantics and constraints on the contents of these _annotations_. @@ -446,9 +487,7 @@ reserved-char = %x00-08 ; omit HTAB and LF / %xE000-10FFFF ``` -## Tokens -The grammar defines the following tokens for the purpose of the lexical analysis. ### Keywords @@ -461,27 +500,10 @@ match = %x6D.61.74.63.68 ; "match" when = %x77.68.65.6E ; "when" ``` -### Text - -**_text_** is the translatable content of a _pattern_. -Any Unicode code point is allowed, -except for surrogate code points U+D800 through U+DFFF. -The characters `\`, `{`, and `}` MUST be escaped as `\\`, `\{`, and `\}`. - -All code points are preserved. - -```abnf -text = 1*(text-char / text-escape) -text-char = %x0-5B ; omit \ - / %x5D-7A ; omit { - / %x7C ; omit } - / %x7E-D7FF ; omit surrogates - / %xE000-10FFFF -``` - ### Literals -**_Literal_** is used for matching variants and providing input to _expressions_. +A **_literal_** is a discrete value in a _message_. +_Literals_ can appear as _operands_, _keys_, or the value of an _option_. **_Quoted_** literals may include content with any Unicode code point, except for surrogate code points U+D800 through U+DFFF. @@ -510,13 +532,20 @@ unquoted-start = name-start / DIGIT / "." ### Names -The **_name_** token is used for variable names (prefixed with `$`), -function names (prefixed with `:`, `+` or `-`), -as well as option names. 
-It is based on XML's [Name](https://www.w3.org/TR/xml/#NT-Name), -with the restriction that it MUST NOT start with `:`, -as that would conflict with _function_ start characters. -Otherwise, the set of characters allowed in names is large. +A **_name_** is the identifier of an external variable, declared local variable, +_function_, or _option_. +When used for variable names it is prefixed with `$`. +When used for a _function_ name it is prefixed with `:`, `+` or `-`. +When used for an _option_ the _name_ has no prefix. + +A _name_ MUST start with a `name-start` character, which MAY be followed by additional +`name-char` characters. The permitted characters in these productions are based on XML's +[Name](https://www.w3.org/TR/xml/#NT-Name), with the additional restriction that a _name_ MUST NOT start with `:`. +The character `:` is reserved as an identifier for _functions_ in the syntax. + +> **Note** +> +> Names are sensitive to the character sequence used to encode them. ```abnf variable = "$" name