diff --git a/spec/syntax.md b/spec/syntax.md
index 3d0bf06298..29f2100c62 100644
--- a/spec/syntax.md
+++ b/spec/syntax.md
@@ -6,9 +6,9 @@
    1. [Design Goals](#design-goals)
    1. [Design Restrictions](#design-restrictions)
 1. [Overview & Examples](#overview--examples)
-   1. [Messages](#messages)
+   1. [Messages and Patterns](#messages-and-patterns)
    1. [Expressions](#expression)
-   1. [Formatting Functions](#formatting-functions)
+   1. [Formatting Functions](#function)
    1. [Selection](#selection)
    1. [Local Variables](#local-variables)
    1. [Complex Messages](#complex-messages)
@@ -90,161 +90,210 @@ The syntax specification takes into account the following design restrictions:

 ## Overview & Examples

-_This section is non-normative._
+### Messages and Patterns

-### Messages
+A **_message_** is the complete template for a specific message formatting request.

-All messages, including simple ones, are enclosed in `{…}` delimiters:
+All _messages_ MUST contain a _body_.
+The _body_ of a _message_ consists of either a _pattern_ or of _selectors_.
+An empty string is not a _well-formed_ _message_.

-    {Hello, world!}
+A _message_ MAY also contain one or more _declarations_ before the _body_.

-The same message defined in a `.properties` file:
+A **_pattern_** is a sequence of _text_ and _placeholders_
+to be formatted as a unit.
+All _patterns_, including simple ones, begin with U+007B LEFT CURLY BRACKET `{`
+and end with U+007D RIGHT CURLY BRACKET `}`.

-```properties
-app.greetings.hello = {Hello, world!}
-```
+> A _message_ consisting of a simple _pattern_:
+>```
+>{Hello, world!}
+>```

-The same message defined inline in JavaScript:
+>The same _message_ defined in a `.properties` file:
+>
+>```properties
+>app.greetings.hello = {Hello, world!}
+>```

-```js
-let hello = new MessageFormat('{Hello, world!}')
-hello.format()
-```
+>The same _message_ defined inline in JavaScript:
+>
+>```js
+>let hello = new MessageFormat('{Hello, world!}')
+>hello.format()
+>```

 ### Expression

-An _expression_ represents a part of a message that will be determined
-during the message's formatting.
+An **_expression_** is a part of a _message_ that will be determined
+during the _message_'s formatting.

-An _expression_ always uses `{…}` delimiters.
-An _expression_ can appear as a local variable value, as a _selector_, and within a _pattern_.
+A **_placeholder_** is an _expression_ that appears inside of a _pattern_
+and which will be replaced during the formatting of the _message_.

-A simple _expression_ is a bare variable name:
+An _expression_ begins with U+007B LEFT CURLY BRACKET `{`
+and ends with U+007D RIGHT CURLY BRACKET `}`.

-    {Hello, {$userName}!}
+An _expression_ can appear as a local variable value, as a _selector_, and within a _pattern_.

-### Formatting Functions
+> A simple _expression_ containing a variable:
+>
+>```
+>{Hello, {$userName}!}
+>```

-A _function_ is named functionality, possibly with _options_, that format,
-process, or operate on an _operand_ which may be either a _literal_ or a _variable_.
+### Function

-For example, a _message_ with a `$date` _variable_ formatted with the `:datetime` _function_:
+A **_function_** is a named modifier in an _expression_.
+A _function_ MAY be followed by zero or more _options_.

-    {Today is {$date :datetime weekday=long}.}
+>For example, a _message_ with a `$date` _variable_ formatted with the `:datetime` _function_:
+>
+>```
+>{Today is {$date :datetime weekday=long}.}
+>```

-A _message_ with a `$userName` _variable_ formatted with
-the custom `:person` _function_ capable of
-declension (using either a fixed dictionary, algorithmic declension, ML, etc.):
+>A _message_ with a `$userName` _variable_ formatted with
+>the custom `:person` _function_ capable of
+>declension (using either a fixed dictionary, algorithmic declension, ML, etc.):
+>
+>```
+>{Hello, {$userName :person case=vocative}!}
+>```

-    {Hello, {$userName :person case=vocative}!}
+>A _message_ with a `$userObj` _variable_ formatted with
+>the custom `:person` _function_ capable of
+>plucking the first name from the object representing a person:
+>
+>```
+>{Hello, {$userObj :person firstName=long}!}
+>```

-A _message_ with a `$userObj` _variable_ formatted with
-the custom `:person` _function_ capable of
-plucking the first name from the object representing a person:
+_Functions_ can be _standalone_, or can be an _opening element_ or _closing element_.

-    {Hello, {$userObj :person firstName=long}!}
+A **_standalone_** _function_ is not expected to be paired with another _function_.
+An **_opening element_** is a _function_ that SHOULD be paired with a _closing element_.
+A **_closing element_** is a _function_ that SHOULD be paired with an _opening element_.

-A message with two markup-like _functions_, `button` and `link`,
-which the runtime can use to construct a document tree structure for a UI framework:
+An _opening element_ MAY be present in a message without a corresponding _closing element_,
+and vice versa.

-    {{+button}Submit{-button} or {+link}cancel{-link}.}
+>A message with two markup-like _functions_, `button` and `link`,
+>which the runtime can use to construct a document tree structure for a UI framework:
+>
+>```
+>{{+button}Submit{-button} or {+link}cancel{-link}.}
+>```

-An opening element MAY be present in a message without a corresponding closing element,
-and vice versa.

 ### Selection

-A _selector_ selects a specific _pattern_ from a list of available _patterns_
+A **_selector_** selects a specific _pattern_ from a list of available _patterns_
 in a _message_ based on the value of its _expression_.
 A message can have multiple selectors.

-A message with a single _selector_:
-
-    match {$count :number}
-    when 1 {You have one notification.}
-    when * {You have {$count} notifications.}
-
-A message with a single _selector_ which is an invocation of
-a custom function `:platform`, formatted on a single line:
-
-    match {:platform} when windows {Settings} when * {Preferences}
-
-A message with a single _selector_ and a custom `:hasCase` function
-which allows the message to query for presence of grammatical cases required for each variant:
+>A message with a single _selector_, `{$count :number}`. `:number` is a built-in function.
+>
+>```
+>match {$count :number}
+>when 1 {You have one notification.}
+>when * {You have {$count} notifications.}
+>```

-    match {$userName :hasCase}
-    when vocative {Hello, {$userName :person case=vocative}!}
-    when accusative {Please welcome {$userName :person case=accusative}!}
-    when * {Hello!}
+>A message with a single _selector_ which is an invocation of
+>a custom function `:platform`, formatted on a single line:
+>
+>```
+>match {:platform} when windows {Settings} when * {Preferences}
+>```

-A message with 2 _selectors_:
+>A message with a single _selector_ and a custom `:hasCase` function
+>which allows the message to query for presence of grammatical cases required for each variant:
+>
+>```
+>match {$userName :hasCase}
+>when vocative {Hello, {$userName :person case=vocative}!}
+>when accusative {Please welcome {$userName :person case=accusative}!}
+>when * {Hello!}
+>```

-    match {$photoCount :number} {$userGender :equals}
-    when 1 masculine {{$userName} added a new photo to his album.}
-    when 1 feminine {{$userName} added a new photo to her album.}
-    when 1 * {{$userName} added a new photo to their album.}
-    when * masculine {{$userName} added {$photoCount} photos to his album.}
-    when * feminine {{$userName} added {$photoCount} photos to her album.}
-    when * * {{$userName} added {$photoCount} photos to their album.}
+>A message with two _selectors_:
+>
+>```
+>match {$photoCount :number} {$userGender :equals}
+>when 1 masculine {{$userName} added a new photo to his album.}
+>when 1 feminine {{$userName} added a new photo to her album.}
+>when 1 * {{$userName} added a new photo to their album.}
+>when * masculine {{$userName} added {$photoCount} photos to his album.}
+>when * feminine {{$userName} added {$photoCount} photos to her album.}
+>when * * {{$userName} added {$photoCount} photos to their album.}
+>```

 ### Local Variables

-A _message_ can define local variables,
-such as might be needed for transforming input
+A _message_ can define local variables using a _declaration_.
+A local variable might be needed for transforming input
 or providing additional data to an _expression_.
 Local variables appear in a _declaration_,
 which defines the value of a named local variable.

-A _message_ containing a _declaration_ defining a local variable `$whom` which is then used twice inside the pattern:
-
-    let $whom = {$monster :noun case=accusative}
-    {You see {$quality :adjective article=indefinite accord=$whom} {$whom}!}
-
-A message defining two local variables:
-`$itemAcc` and `$countInt`, and using `$countInt` as a selector:
+>A _message_ containing a _declaration_ defining a local variable `$whom`
+>which is then used twice inside the pattern:
+>
+>```
+>let $whom = {$monster :noun case=accusative}
+>{You see {$quality :adjective article=indefinite accord=$whom} {$whom}!}
+>```

-    let $countInt = {$count :number maximumFractionDigits=0}
-    let $itemAcc = {$item :noun count=$count case=accusative}
-    match {$countInt}
-    when one {You bought {$color :adjective article=indefinite accord=$itemAcc} {$itemAcc}.}
-    when * {You bought {$countInt} {$color :adjective accord=$itemAcc} {$itemAcc}.}
+>A _message_ defining two local variables:
+>`$itemAcc` and `$countInt`, and using `$countInt` as a selector:
+>
+>```
+>let $countInt = {$count :number maximumFractionDigits=0}
+>let $itemAcc = {$item :noun count=$count case=accusative}
+>match {$countInt}
+>when one {You bought {$color :adjective article=indefinite accord=$itemAcc} {$itemAcc}.}
+>when * {You bought {$countInt} {$color :adjective accord=$itemAcc} {$itemAcc}.}
+>```

 ### Complex Messages

-The various features can be used to produce arbitrarily complex messages by combining
+The various features can be used to produce arbitrarily complex _messages_ by combining
 _declarations_, _selectors_, _functions_, and more.

-A complex message with 2 _selectors_ and 3 local variable _declarations_:
-
-    let $hostName = {$host :person firstName=long}
-    let $guestName = {$guest :person firstName=long}
-    let $guestsOther = {$guestCount :number offset=1}
-
-    match {$host :gender} {$guestOther :number}
-
-    when female 0 {{$hostName} does not give a party.}
-    when female 1 {{$hostName} invites {$guestName} to her party.}
-    when female 2 {{$hostName} invites {$guestName} and one other person to her party.}
-    when female * {{$hostName} invites {$guestName} and {$guestsOther} other people to her party.}
-
-    when male 0 {{$hostName} does not give a party.}
-    when male 1 {{$hostName} invites {$guestName} to his party.}
-    when male 2 {{$hostName} invites {$guestName} and one other person to his party.}
-    when male * {{$hostName} invites {$guestName} and {$guestsOther} other people to his party.}
-
-    when * 0 {{$hostName} does not give a party.}
-    when * 1 {{$hostName} invites {$guestName} to their party.}
-    when * 2 {{$hostName} invites {$guestName} and one other person to their party.}
-    when * * {{$hostName} invites {$guestName} and {$guestsOther} other people to their party.}
+>A complex message with 2 _selectors_ and 3 local variable _declarations_:
+>
+>```
+>let $hostName = {$host :person firstName=long}
+>let $guestName = {$guest :person firstName=long}
+>let $guestsOther = {$guestCount :number offset=1}
+>
+>match {$host :gender} {$guestOther :number}
+>
+>when female 0 {{$hostName} does not give a party.}
+>when female 1 {{$hostName} invites {$guestName} to her party.}
+>when female 2 {{$hostName} invites {$guestName} and one other person to her party.}
+>when female * {{$hostName} invites {$guestName} and {$guestsOther} other people to her party.}
+>
+>when male 0 {{$hostName} does not give a party.}
+>when male 1 {{$hostName} invites {$guestName} to his party.}
+>when male 2 {{$hostName} invites {$guestName} and one other person to his party.}
+>when male * {{$hostName} invites {$guestName} and {$guestsOther} other people to his party.}
+>
+>when * 0 {{$hostName} does not give a party.}
+>when * 1 {{$hostName} invites {$guestName} to their party.}
+>when * 2 {{$hostName} invites {$guestName} and one other person to their party.}
+>when * * {{$hostName} invites {$guestName} and {$guestsOther} other people to their party.}
+>```

 ## Productions

 The specification defines the following grammar productions.

-A message satisfying all rules of the grammar is considered _well-formed_.
+A **_well-formed_** message satisfies all of the rules of the grammar.

-Furthermore, a well-formed message is considered _valid_
-if it meets additional semantic requirements about its structure, defined below.
+A **_valid_** message meets the additional semantic requirements about
+the structure and functionality defined below.

 ### Message

@@ -273,8 +322,18 @@ declaration = let s variable [s] "=" [s] expression

 ### Selectors

-A `match` statement contains one or more **_selectors_**
-which will be used to choose one of the _variants_ during formatting.
+**_Selectors_** are a _match statement_ followed by one or more _variants_.
+_Selectors_ provide the ability for a _message_ to use a _pattern_
+that varies in content or form depending on values determined at runtime.
+
+A **_selector expression_** is an _expression_ that will be used as part
+of the selection process.
+
+A **_match statement_** indicates that the _message_ contains at least one
+_variant_ that can potentially be used to produce the formatted output.
+A _match statement_ MUST begin with the keyword `match`.
+A _match statement_ MUST contain one or more _selector expressions_.
+A _match statement_ MUST be followed by at least one _variant_.

 ```abnf
 selectors = match 1*([s] expression)
@@ -297,8 +356,10 @@ selectors = match 1*([s] expression)

 ### Variants

-A **_variant_** is a keyed _pattern_.
-The keys are used to match against the _selectors_ defined in the `match` statement.
+A **_variant_** is a _pattern_ associated with a set of _keys_.
+Each _variant_ MUST begin with the _keyword_ `when`,
+be followed by a non-empty sequence of _keys_,
+and terminate with a valid _pattern_.
 The key `*` is a "catch-all" key, matching all selector values.

 ```abnf
@@ -317,6 +378,8 @@ A _well-formed_ message is considered _valid_ if the following requirements are

 A **_pattern_** is a sequence of translatable elements.
 Patterns MUST be delimited with `{` at the start, and `}` at the end.
+A _pattern_'s contents MAY be empty.
+Whitespace within a _pattern_ is meaningful and MUST be preserved.
 This serves 3 purposes:

 - The message can be unambiguously embeddable in various container formats
@@ -334,25 +397,45 @@ This serves 3 purposes:
 pattern = "{" *(text / expression) "}"
 ```

-> Example:
->
+> **Example**
+>
+> A simple _pattern_ containing _text_:
 > ```
 > {Hello, world!}
 > ```
-
-Whitespace within a _pattern_ is meaningful and MUST be preserved.
+>
+> An empty _pattern_:
+> ```
+> {}
+> ```
+>
+> Some _patterns_ with _expressions_:
+> ```
+> {{$foo}}
+> {Hello {$user}!}
+> {You sent {$count :number maxFractionDigits=0} notifications to {$numFriends :number type=spellout} friends.}
+> ```
+>
+> A _pattern_ containing three spaces:
+> ```
+> {   }
+> ```

 ### Expressions

-**_Expressions_** MUST start with an _operand_ or an _annotation_.
+An _expression_ can appear as a local variable value, as a _selector_, and within a _pattern_.
+The contents of each _expression_ MUST start with an _operand_ or an _annotation_.
 An _expression_ MUST NOT be empty.

-An **_operand_** is either a _literal_ or a _variable_.
+An **_operand_** is a _literal_ or a _variable_ to be evaluated in an _expression_.
 An _operand_ MAY be optionally followed by an _annotation_.

 An **_annotation_** consists of a _function_ and its named _options_,
 or consists of a _reserved_ sequence.

+A **_function_** is functionality used to evaluate, format, select, or otherwise
+process an _operand_, or, if lacking an _operand_, its _annotation_.
+A **_function_** is functionality used to evaluate, format, select, or otherwise +process an _operand_, or, if lacking an _operand_, its _annotation_. + _Functions_ do not accept any positional arguments other than the _operand_ in front of them. @@ -455,9 +538,12 @@ private-start = "&" / "^" #### Reserved -**_Reserved_** annotations start with a reserved character -and are intended for future standardization. -A _reserved_ _annotation_ MAY be empty or contain arbitrary text. +A **_reserved_** _annotation_ is an _annotation_ whose syntax is reserved +for future standardization. + +A _reserved_ _annotation_ starts with a reserved character. +A _reserved_ _annotation_ MAY be empty or contain arbitrary text after its first character. + This allows maximum flexibility in future standardization, as future definitions MAY define additional semantics and constraints on the contents of these _annotations_. @@ -490,6 +576,8 @@ The grammar defines the following tokens for the purpose of the lexical analysis ### Keywords +A **_keyword_** is a reserved token that has a unique meaning in the _message_ syntax. + The following three keywords are reserved: `let`, `match`, and `when`. Reserved keywords are always lowercase. @@ -519,19 +607,32 @@ text-char = %x0-5B ; omit \ ### Literals -**_Literal_** is used for matching variants and providing input to _expressions_. - -**_Quoted_** literals may include content with any Unicode code point, +A **_literal_** is a character sequence that appears outside +of _text_ in various parts of a _message_. +A _literal_ can appear in a _declaration_, as a _key_ value, +as an _operand_, or in the value of an _option_. +A _literal_ MAY include any Unicode code point except for surrogate code points U+D800 through U+DFFF. -The characters `\` and `|` MUST be escaped as `\\` and `\|`. -**_Unquoted_** literals have a much more restricted range that +All code points are preserved. + +A **_quoted_** literal begins and ends with U+005E VERTICAL BAR `|`. 
+The characters `\` and `|` within a _quoted_ literal MUST be
+escaped as `\\` and `\|`.
+
+An **_unquoted_** literal is a _literal_ that does not require the `|`
+quotes around it to be distinct from the rest of the _message_ syntax.
+An _unquoted_ literal MAY be used when the content of the _literal_
+contains no whitespace and otherwise matches the `unquoted` production.
+Any _unquoted_ literal MAY be _quoted_.
+Implementations MUST NOT distinguish between _quoted_ and _unquoted_ literals
+that have the same sequence of code points.
+
+_Unquoted_ literals have a much more restricted range that
 is intentionally close to the XML's
 [Nmtoken](https://www.w3.org/TR/xml/#NT-Nmtoken),
 with the restriction that it MUST NOT start with `-` or `:`,
 as those would conflict with _function_ start characters.

-All code points are preserved.
-
 ```abnf
 literal = quoted / unquoted

@@ -572,8 +673,10 @@ name-char = name-start / DIGIT / "-" / "." / ":"

 ### Escape Sequences

-Escape sequences are introduced by U+005C REVERSE SOLIDUS `\`
-and allow the appearance of lexically meaningful characters
+An **_escape sequence_** is a two-character sequence starting with
+U+005C REVERSE SOLIDUS `\`.
+
+An _escape sequence_ allows the appearance of lexically meaningful characters
 in the body of `text`, `quoted`, or `reserved` sequences respectively:

 ```abnf
@@ -587,8 +690,8 @@ backslash = %x5C ; U+005C REVERSE SOLIDUS "\"

 **_Whitespace_** is defined as tab, carriage return, line feed, or the space character.

-Inside _patterns_,
-whitespace is part of the translatable content and is recorded and stored verbatim.
+Inside _patterns_ and _quoted literals_,
+whitespace is part of the content and is recorded and stored verbatim.
 Whitespace is not significant outside translatable text, except where required by the syntax.

 ```abnf
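The quoting and escaping rules in the _Literals_ hunk lend themselves to a quick executable check. The JavaScript below is a minimal sketch, not part of any MessageFormat implementation: `quoteLiteral`, `unquoteLiteral`, and `assertNoSurrogates` are hypothetical helpers, and because the `quoted` ABNF production is not shown in this diff, the sketch assumes that `\\` and `\|` are the only escapes permitted inside a _quoted_ literal.

```javascript
// Hypothetical helpers sketching the quoted-literal rules in this diff.
// Assumption: only \\ and \| are valid escapes inside a quoted literal.

// Reject surrogate code points U+D800..U+DFFF. Iterating a JS string by
// code point yields a lone surrogate as its own value, so it is caught here.
function assertNoSurrogates(value) {
  for (const ch of value) {
    const cp = ch.codePointAt(0);
    if (cp >= 0xd800 && cp <= 0xdfff) {
      throw new RangeError(`surrogate code point U+${cp.toString(16).toUpperCase()}`);
    }
  }
}

// Wrap a value in | delimiters, escaping \ as \\ and | as \|.
function quoteLiteral(value) {
  assertNoSurrogates(value);
  return '|' + value.replace(/[\\|]/g, (c) => '\\' + c) + '|';
}

// Recover the code points of a quoted literal such as |a\|b|.
function unquoteLiteral(source) {
  if (source.length < 2 || !source.startsWith('|') || !source.endsWith('|')) {
    throw new SyntaxError('a quoted literal must begin and end with |');
  }
  const end = source.length - 1; // index of the closing |
  let out = '';
  for (let i = 1; i < end; i++) {
    const c = source[i];
    if (c === '\\') {
      const next = source[i + 1];
      // The escaped character must be \ or | and must lie before the closing |.
      if (i + 1 >= end || (next !== '\\' && next !== '|')) {
        throw new SyntaxError(`invalid escape at index ${i}`);
      }
      out += next;
      i++; // skip the escaped character
    } else if (c === '|') {
      throw new SyntaxError(`unescaped | at index ${i}`);
    } else {
      out += c;
    }
  }
  assertNoSurrogates(out);
  return out;
}

console.log(quoteLiteral('a|b\\c'));                // |a\|b\\c|
console.log(unquoteLiteral('|hello|') === 'hello'); // true
```

Because `unquoteLiteral('|hello|')` yields the same code points as the unquoted literal `hello`, a parser built on helpers like these would treat the two spellings identically, which is what the requirement that implementations MUST NOT distinguish between them asks for. The surrogate check runs per code point, so a well-formed astral character such as U+1F600 passes while a lone U+D800 is rejected.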