From 8933d221623acdeec6f8283e091a5237c66988b2 Mon Sep 17 00:00:00 2001 From: Eemeli Aro Date: Tue, 29 Aug 2023 16:00:14 +0100 Subject: [PATCH 01/11] Add design doc for formatted parts --- exploration/0003-formatted-parts.md | 187 ++++++++++++++++++++++++++++ 1 file changed, 187 insertions(+) create mode 100644 exploration/0003-formatted-parts.md diff --git a/exploration/0003-formatted-parts.md b/exploration/0003-formatted-parts.md new file mode 100644 index 0000000000..4b4cf13b95 --- /dev/null +++ b/exploration/0003-formatted-parts.md @@ -0,0 +1,187 @@ +# Formatted Parts + +Status: **Proposed** + +
+ Metadata +
+
Contributors
+
@eemeli
+
First proposed
+
2023-08-29
+
Pull Request
+
#000
+
+
+ +## Objective + +Messages often include structures and values +that are not necessarily best represented by a plain concatenated string. + +It would be useful to define a formatted-parts target for MessageFormat 2. + +## Background + +Past examples have shown us that if we don't provide a formatter to parts, +the string output will be re-parsed and re-processed by users. + +## Use-Cases + +- Markup elements +- Non-string values +- Message post-processors + +## Requirements + +- Define an iterable sequence of formatted part objects. +- Include metadata for each part, such as type, source, direction, and locale. +- Allow the representation of non-string values. +- Allow the representation of values that consist of an iterable sequence of formatted parts. +- Be able to represent each resolved value of a pattern with any number of formatted parts, including none. +- Define the formatted parts in a manner that allows synonymous but appropriate implementations in different programming languages. + +## Constraints + +- The JS Intl formatters already include formatted-parts representations for each supported data type. + The JS implementation of the MF2 formatted-parts representation should be able to match their structure, + at least as far as that's possible and appropriate. + +## Proposed Design + +The formatted-parts API is included in the spec as an optional but recommended formatting target. + +The shape of the formatted-parts output is defined in a manner similar to the data model, +which includes TypeScript, JSON Schema, and XML DTD definitions of the same data structure. + +At the top level, the formatted-parts result is an iterable sequence of parts. +Parts corresponding to each _text_ can be simpler than those of _expressions_, +as they do not have a `source` other than their `value`, +or set any of the other possible metadata fields. 
+ +```ts +type MessageParts = Iterable< + MessageTextPart | MessageExpressionPart | MessageBiDiIsolationPart +> + +interface MessageTextPart { + type: "text" + value: string +} +``` + +For MessageExpressionPart, the `source` corresponds to the expression's fallback value. +The parts' `dir` and `locale` may be defined if overridden by an expression attribute or set otherwise. +Each part should have at most one of `value` or `parts` defined; +some may have none. + +```ts +interface MessageExpressionPart { + type: string + source: string + parts?: Iterable<{ type: string; value: unknown }> + value?: unknown + dir?: "ltr" | "rtl" | "auto" + locale?: string +} +``` + +The bidi isolation strategies included in the spec may require +the insertion of MessageBiDiIsolationParts in the formatted-parts output. + +```ts +interface MessageBiDiIsolationPart { + type: "bidiIsolation" + value: "\u2066" | "\u2067" | "\u2068" | "\u2069" // LRI | RLI | FSI | PDI +} +``` + +Some of the MessageExpressionPart instances may be further defined +without reference to the function registry. + +Unannotated expressions with a _literal_ operand +are represented by MessageStringPart. +As with MessageTextPart, +the `value` of MessageStringPart is always a string. + +```ts +interface MessageStringPart { + type: "string" + source: string + value: string + dir?: "ltr" | "rtl" | "auto" + locale?: string +} +``` + +Unannotated expressions with a _variable_ operand +that is not formatted according to its type +are represented by MessageUnknownPart. + +```ts +interface MessageUnknownPart { + type: "unknown" + source: string + value: unknown +} +``` + +When the resolution or formatting of a placeholder fails, +it is represented in the output by MessageFallbackPart. +No `value` is provided; when formatting to a string, +the part's representation would be `'{' + source + '}'`. 
+ +```ts +interface MessageFallbackPart { + type: "fallback" + source: string +} +``` + +Formatting functions defined in the registry +will need accompanying formatted-parts representations. +Where available, such a formatted value should itself be represented by `parts` +rather than a unitary string `value`. +These sub-parts should not need fields beyond their `type` and `value`, +and in most cases it's presumed that the sub-part `value` would be a string. + +```ts +interface MessageDateTimePart { + type: "datetime" + source: string + parts: Iterable<{ type: string; value: unknown }> + dir?: "ltr" | "rtl" | "auto" + locale?: string +} + +interface MessageNumberPart { + type: "number" + source: string + parts: Iterable<{ type: string; value: unknown }> + dir?: "ltr" | "rtl" | "auto" + locale?: string +} +``` + +## Alternatives Considered + +### Not Defining a Formatted-Parts Output + +Leave it to implementations. +They will each come up with something a bit different, +but each will mostly work. + +They will not be interoperable, though. + +### Different Parts Shapes + +See issue #41 for details. + +They can be considered as precursors of the current proposal, +into which they've developed due to evolutionary pressure. + +### Annotated String Output + +Format to a string, but separately define metadata or other values. + +This gets really clunky for parts that are not reasonably stringifiable. 
From e973b3b94226510fbfcd4e90864fb846cd36658b Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" <41898282+github-actions[bot]@users.noreply.github.com> Date: Tue, 29 Aug 2023 15:00:45 +0000 Subject: [PATCH 02/11] style: Apply Prettier --- exploration/0003-formatted-parts.md | 62 ++++++++++++++--------------- 1 file changed, 31 insertions(+), 31 deletions(-) diff --git a/exploration/0003-formatted-parts.md b/exploration/0003-formatted-parts.md index 4b4cf13b95..44f7c8c6a3 100644 --- a/exploration/0003-formatted-parts.md +++ b/exploration/0003-formatted-parts.md @@ -62,11 +62,11 @@ or set any of the other possible metadata fields. ```ts type MessageParts = Iterable< MessageTextPart | MessageExpressionPart | MessageBiDiIsolationPart -> +>; interface MessageTextPart { - type: "text" - value: string + type: "text"; + value: string; } ``` @@ -77,12 +77,12 @@ some may have none. ```ts interface MessageExpressionPart { - type: string - source: string - parts?: Iterable<{ type: string; value: unknown }> - value?: unknown - dir?: "ltr" | "rtl" | "auto" - locale?: string + type: string; + source: string; + parts?: Iterable<{ type: string; value: unknown }>; + value?: unknown; + dir?: "ltr" | "rtl" | "auto"; + locale?: string; } ``` @@ -91,8 +91,8 @@ the insertion of MessageBiDiIsolationParts in the formatted-parts output. ```ts interface MessageBiDiIsolationPart { - type: "bidiIsolation" - value: "\u2066" | "\u2067" | "\u2068" | "\u2069" // LRI | RLI | FSI | PDI + type: "bidiIsolation"; + value: "\u2066" | "\u2067" | "\u2068" | "\u2069"; // LRI | RLI | FSI | PDI } ``` @@ -106,11 +106,11 @@ the `value` of MessageStringPart is always a string. ```ts interface MessageStringPart { - type: "string" - source: string - value: string - dir?: "ltr" | "rtl" | "auto" - locale?: string + type: "string"; + source: string; + value: string; + dir?: "ltr" | "rtl" | "auto"; + locale?: string; } ``` @@ -120,9 +120,9 @@ are represented by MessageUnknownPart. 
```ts interface MessageUnknownPart { - type: "unknown" - source: string - value: unknown + type: "unknown"; + source: string; + value: unknown; } ``` @@ -133,8 +133,8 @@ the part's representation would be `'{' + source + '}'`. ```ts interface MessageFallbackPart { - type: "fallback" - source: string + type: "fallback"; + source: string; } ``` @@ -147,19 +147,19 @@ and in most cases it's presumed that the sub-part `value` would be a string. ```ts interface MessageDateTimePart { - type: "datetime" - source: string - parts: Iterable<{ type: string; value: unknown }> - dir?: "ltr" | "rtl" | "auto" - locale?: string + type: "datetime"; + source: string; + parts: Iterable<{ type: string; value: unknown }>; + dir?: "ltr" | "rtl" | "auto"; + locale?: string; } interface MessageNumberPart { - type: "number" - source: string - parts: Iterable<{ type: string; value: unknown }> - dir?: "ltr" | "rtl" | "auto" - locale?: string + type: "number"; + source: string; + parts: Iterable<{ type: string; value: unknown }>; + dir?: "ltr" | "rtl" | "auto"; + locale?: string; } ``` From a3e6b8c9cbbc9b67e64af5d87c813cd4b7c21a3f Mon Sep 17 00:00:00 2001 From: Eemeli Aro Date: Thu, 31 Aug 2023 14:52:03 +0300 Subject: [PATCH 03/11] Apply suggestions from code review Co-authored-by: Addison Phillips --- exploration/0003-formatted-parts.md | 30 +++++++++++++++++++++++------ 1 file changed, 24 insertions(+), 6 deletions(-) diff --git a/exploration/0003-formatted-parts.md b/exploration/0003-formatted-parts.md index 44f7c8c6a3..51e8cc5066 100644 --- a/exploration/0003-formatted-parts.md +++ b/exploration/0003-formatted-parts.md @@ -16,10 +16,12 @@ Status: **Proposed** ## Objective -Messages often include structures and values -that are not necessarily best represented by a plain concatenated string. +Messages often include placeholders that, +when formatted, contain internal structure ("parts") +that the caller might want access to +for the purposes of styling, presentation, or manipulation. 
-It would be useful to define a formatted-parts target for MessageFormat 2. +This proposal defines a formatted-parts target for MessageFormat 2. ## Background @@ -31,6 +33,15 @@ the string output will be re-parsed and re-processed by users. - Markup elements - Non-string values - Message post-processors +- Decoration of placeholder interior parts. + For example, identifying the separate fields in these two currency values + (notice that the symbol, number, and fraction fields + are not in the same order and that the separator has been omitted): + ![image](https://github.com/unicode-org/message-format-wg/assets/69082/cb68c87f-9c0c-4bc6-b9a0-b1f97b2b789a) + ![image](https://github.com/unicode-org/message-format-wg/assets/69082/aedd4e66-7d47-4026-8b93-4ba061bb4d84) +- Supplying bidirectional isolation of placeholders, + such as by using HTML's `span` element with a `dir` attribute + based on the direction of the placeholder. ## Requirements @@ -71,7 +82,10 @@ interface MessageTextPart { ``` For MessageExpressionPart, the `source` corresponds to the expression's fallback value. -The parts' `dir` and `locale` may be defined if overridden by an expression attribute or set otherwise. +The `dir` and `locale` attributes of a part may be inherited from the message +or from the operand (if present), +or overridden by an expression attribute or formatting function, +or otherwise set by the implementation. Each part should have at most one of `value` or `parts` defined; some may have none. @@ -115,7 +129,8 @@ interface MessageStringPart { ``` Unannotated expressions with a _variable_ operand -that is not formatted according to its type +whose type is not recognized by the implementation +or for which no default formatter is available are represented by MessageUnknownPart. ```ts @@ -139,7 +154,10 @@ interface MessageFallbackPart { ``` Formatting functions defined in the registry -will need accompanying formatted-parts representations. 
+Each function defined in the registry MUST define its "formatted-parts" representation. +A function can define either a unitary string `value` or a `parts` representation. +Where possible, a function SHOULD provide a `parts` representation +if its output might reasonably consist of multiple fields. Where available, such a formatted value should itself be represented by `parts` rather than a unitary string `value`. These sub-parts should not need fields beyond their `type` and `value`, From ac84d25be0a25102469909d493517e6d42040dea Mon Sep 17 00:00:00 2001 From: Eemeli Aro Date: Wed, 1 Nov 2023 01:52:59 +0200 Subject: [PATCH 04/11] Apply suggestions from code review Co-authored-by: Tim Chevalier --- exploration/0003-formatted-parts.md | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/exploration/0003-formatted-parts.md b/exploration/0003-formatted-parts.md index 51e8cc5066..10a1f049af 100644 --- a/exploration/0003-formatted-parts.md +++ b/exploration/0003-formatted-parts.md @@ -17,9 +17,13 @@ Status: **Proposed** ## Objective Messages often include placeholders that, -when formatted, contain internal structure ("parts") -that the caller might want access to -for the purposes of styling, presentation, or manipulation. +when formatted, contain internal structure ("parts"). +Preserving this structure in a formatted message +may be helpful to the caller, +who can then manipulate the parts. +For example, a caller may want to style or present +messages with the same content differently +if those messages have different internal structure. This proposal defines a formatted-parts target for MessageFormat 2. @@ -27,6 +31,9 @@ This proposal defines a formatted-parts target for MessageFormat 2. Past examples have shown us that if we don't provide a formatter to parts, the string output will be re-parsed and re-processed by users. 
+Recent examples of web browsers needing to account for such user behaviour are available from +[June 2022](https://github.com/WebKit/WebKit/commit/1dc01f753d89a85ee19df8e8bd75f4aece80c594) and +[November 2022](https://bugs.chromium.org/p/v8/issues/detail?id=13494). ## Use-Cases From 0ff1a33e2941c247f438cb8aa777d809b2ce7df8 Mon Sep 17 00:00:00 2001 From: Eemeli Aro Date: Mon, 13 Nov 2023 13:20:55 +0200 Subject: [PATCH 05/11] Rename exploration/0003-formatted-parts.md -> exploration/formatted-parts.md --- exploration/{0003-formatted-parts.md => formatted-parts.md} | 0 1 file changed, 0 insertions(+), 0 deletions(-) rename exploration/{0003-formatted-parts.md => formatted-parts.md} (100%) diff --git a/exploration/0003-formatted-parts.md b/exploration/formatted-parts.md similarity index 100% rename from exploration/0003-formatted-parts.md rename to exploration/formatted-parts.md From 2d96e77495afa8b31fcf139f6cf5a7ab69100f94 Mon Sep 17 00:00:00 2001 From: Eemeli Aro Date: Mon, 20 Nov 2023 17:25:55 +0200 Subject: [PATCH 06/11] Apply suggestions from code review Co-authored-by: Addison Phillips --- exploration/formatted-parts.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/exploration/formatted-parts.md b/exploration/formatted-parts.md index 10a1f049af..58fd11bccf 100644 --- a/exploration/formatted-parts.md +++ b/exploration/formatted-parts.md @@ -10,7 +10,7 @@ Status: **Proposed**
First proposed
2023-08-29
Pull Request
-
#000
+
#463
@@ -100,7 +100,7 @@ some may have none. interface MessageExpressionPart { type: string; source: string; - parts?: Iterable<{ type: string; value: unknown }>; + parts?: Iterable<{ type: string; value: unknown; source?: string }>; value?: unknown; dir?: "ltr" | "rtl" | "auto"; locale?: string; @@ -167,8 +167,8 @@ Where possible, a function SHOULD provide a `parts` representation if its output might reasonably consist of multiple fields. Where available, such a formatted value should itself be represented by `parts` rather than a unitary string `value`. -These sub-parts should not need fields beyond their `type` and `value`, -and in most cases it's presumed that the sub-part `value` would be a string. +In most cases, these sub-parts should not need fields beyond their `type` and a string `value`, +Where necessary, other `value` types may be used and other fields such as a `source` included. ```ts interface MessageDateTimePart { From dbd626aac756a6a36cb74d1ef7366253417b4592 Mon Sep 17 00:00:00 2001 From: Eemeli Aro Date: Mon, 27 Nov 2023 19:15:30 +0200 Subject: [PATCH 07/11] Add "Registry definition of formatted parts" section MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Co-authored-by: Stanisław Małolepszy --- exploration/formatted-parts.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/exploration/formatted-parts.md b/exploration/formatted-parts.md index 58fd11bccf..792fd46771 100644 --- a/exploration/formatted-parts.md +++ b/exploration/formatted-parts.md @@ -160,7 +160,8 @@ interface MessageFallbackPart { } ``` -Formatting functions defined in the registry +### Registry definition of formatted parts + Each function defined in the registry MUST define its "formatted-parts" representation. A function can define either a unitary string `value` or a `parts` representation. 
Where possible, a function SHOULD provide a `parts` representation From a8d966f83ca02b118e412b87abee70994a842734 Mon Sep 17 00:00:00 2001 From: Eemeli Aro Date: Mon, 27 Nov 2023 19:19:03 +0200 Subject: [PATCH 08/11] Drop extraneous sentence --- exploration/formatted-parts.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/exploration/formatted-parts.md b/exploration/formatted-parts.md index 792fd46771..e9d0d87cc7 100644 --- a/exploration/formatted-parts.md +++ b/exploration/formatted-parts.md @@ -166,8 +166,6 @@ Each function defined in the registry MUST define its "formatted-parts" represen A function can define either a unitary string `value` or a `parts` representation. Where possible, a function SHOULD provide a `parts` representation if its output might reasonably consist of multiple fields. -Where available, such a formatted value should itself be represented by `parts` -rather than a unitary string `value`. In most cases, these sub-parts should not need fields beyond their `type` and a string `value`, Where necessary, other `value` types may be used and other fields such as a `source` included. From 1645aeafeebb0e46b076499f988bd6347d288777 Mon Sep 17 00:00:00 2001 From: Eemeli Aro Date: Wed, 29 Nov 2023 13:57:15 +0200 Subject: [PATCH 09/11] Add MessageSingleValuePart & MessageMultiValuePart definitions; other minor fixes --- exploration/formatted-parts.md | 29 ++++++++++++++++++++++++----- 1 file changed, 24 insertions(+), 5 deletions(-) diff --git a/exploration/formatted-parts.md b/exploration/formatted-parts.md index e9d0d87cc7..727da5851a 100644 --- a/exploration/formatted-parts.md +++ b/exploration/formatted-parts.md @@ -10,7 +10,7 @@ Status: **Proposed**
First proposed
2023-08-29
Pull Request
-
#463
+
#463
@@ -97,13 +97,24 @@ Each part should have at most one of `value` or `parts` defined; some may have none. ```ts -interface MessageExpressionPart { - type: string; +type MessageExpressionPart = + | MessageSingleValuePart<string, unknown> + | MessageMultiValuePart<string, unknown>; + +interface MessageSingleValuePart<T extends string, V> { + type: T; source: string; - parts?: Iterable<{ type: string; value: unknown; source?: string }>; - value?: unknown; dir?: "ltr" | "rtl" | "auto"; locale?: string; + value?: V; +} + +interface MessageMultiValuePart<T extends string, V> { + type: T; + source: string; + dir?: "ltr" | "rtl" | "auto"; + locale?: string; + parts: Iterable<{ type: string; value: V; source?: string }>; } ``` @@ -127,6 +138,7 @@ the `value` of MessageStringPart is always a string. ```ts interface MessageStringPart { + // MessageSingleValuePart<"string", string> type: "string"; source: string; value: string; @@ -142,6 +154,7 @@ are represented by MessageUnknownPart. ```ts interface MessageUnknownPart { + // MessageSingleValuePart<"unknown", unknown> type: "unknown"; source: string; value: unknown; @@ -155,6 +168,7 @@ the part's representation would be `'{' + source + '}'`. ```ts interface MessageFallbackPart { + // MessageSingleValuePart<"fallback", never> type: "fallback"; source: string; } @@ -169,8 +183,12 @@ if its output might reasonably consist of multiple fields. In most cases, these sub-parts should not need fields beyond their `type` and a string `value`, Where necessary, other `value` types may be used and other fields such as a `source` included. +For example, `:datetime` and `:number` formatters could use the following formatted-parts representations. +In many implementations, these could be further narrowed to only use `string` values.
+ ```ts interface MessageDateTimePart { + // MessageMultiValuePart<"datetime", unknown> type: "datetime"; source: string; parts: Iterable<{ type: string; value: unknown }>; @@ -179,6 +197,7 @@ interface MessageDateTimePart { } interface MessageNumberPart { + // MessageMultiValuePart<"number", unknown> type: "number"; source: string; parts: Iterable<{ type: string; value: unknown }>; From 0b5debd7146544cfb364367e77f5428c5de5beb9 Mon Sep 17 00:00:00 2001 From: Eemeli Aro Date: Fri, 1 Dec 2023 12:17:42 +0200 Subject: [PATCH 10/11] Add note about custom expression fields --- exploration/formatted-parts.md | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/exploration/formatted-parts.md b/exploration/formatted-parts.md index 727da5851a..93fc9864f7 100644 --- a/exploration/formatted-parts.md +++ b/exploration/formatted-parts.md @@ -180,8 +180,10 @@ Each function defined in the registry MUST define its "formatted-parts" represen A function can define either a unitary string `value` or a `parts` representation. Where possible, a function SHOULD provide a `parts` representation if its output might reasonably consist of multiple fields. -In most cases, these sub-parts should not need fields beyond their `type` and a string `value`, -Where necessary, other `value` types may be used and other fields such as a `source` included. +In most cases, these sub-parts should not need fields beyond their `type` and a string `value`. +Where necessary, other `value` types may be used +and other fields such as a `source` included in the sub-parts, +and additional fields included in the `MessageExpressionPart`. For example, `:datetime` and `:number` formatters could use the following formatted-parts representations. In many implementations, these could be further narrowed to only use `string` values. 
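The following sketch shows how a `:number` formatter in JS might produce a `MessageNumberPart` narrowed to `string` values. It wraps the output of `Intl.NumberFormat.prototype.formatToParts()`, in line with the constraint that the JS representation should match the existing Intl parts structure. The helper `formatNumberToPart` is hypothetical. The generic parameters of `MessageMultiValuePart` are written out here as `<T extends string, V>`, which is an assumption.

```typescript
// Generic multi-value part; the <T extends string, V> parameters are assumed.
interface MessageMultiValuePart<T extends string, V> {
  type: T;
  source: string;
  dir?: "ltr" | "rtl" | "auto";
  locale?: string;
  parts: Iterable<{ type: string; value: V; source?: string }>;
}

// A :number part whose sub-part values are narrowed to strings.
type MessageNumberPart = MessageMultiValuePart<"number", string>;

// Hypothetical helper: adapts Intl.NumberFormat's formatToParts() output.
function formatNumberToPart(
  source: string,
  value: number,
  locale: string,
  options?: Intl.NumberFormatOptions,
): MessageNumberPart {
  const nf = new Intl.NumberFormat(locale, options);
  return {
    type: "number",
    source,
    locale: nf.resolvedOptions().locale,
    // Intl's { type, value } objects already fit the sub-part shape.
    parts: nf.formatToParts(value),
  };
}

const price = formatNumberToPart("$price", 1234.5, "en-US", {
  style: "currency",
  currency: "EUR",
});
// Concatenating the sub-part values yields the plain string output.
const priceText = Array.from(price.parts, (p) => p.value).join("");
```

Because the sub-parts keep Intl's own `type` names (`currency`, `integer`, `group`, `fraction`, and so on), a caller can style individual fields without re-parsing the formatted string.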
From 9c59af945d3a45e5e922bb15d4bb01f6b9f899b6 Mon Sep 17 00:00:00 2001 From: Eemeli Aro Date: Fri, 1 Dec 2023 12:18:51 +0200 Subject: [PATCH 11/11] Fix typo --- exploration/formatted-parts.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/exploration/formatted-parts.md b/exploration/formatted-parts.md index 93fc9864f7..bd13c11841 100644 --- a/exploration/formatted-parts.md +++ b/exploration/formatted-parts.md @@ -183,7 +183,7 @@ if its output might reasonably consist of multiple fields. In most cases, these sub-parts should not need fields beyond their `type` and a string `value`. Where necessary, other `value` types may be used and other fields such as a `source` included in the sub-parts, -and additional fields included in the `MessageExpressionPart`. +and additional fields may be included in the `MessageExpressionPart`. For example, `:datetime` and `:number` formatters could use the following formatted-parts representations. In many implementations, these could be further narrowed to only use `string` values.
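The use-case of isolating placeholders with HTML `span` elements that carry a `dir` attribute could be served by the final part shapes roughly as follows. This is a hypothetical renderer: `toHtml`, `expressionText`, and `escapeHtml` are illustrative names, the `<T extends string, V>` generic parameters are assumed, and defaulting a missing `dir` to `"auto"` is an assumption rather than something the proposal specifies.

```typescript
interface MessageTextPart {
  type: "text";
  value: string;
}

// Generic shapes from the final revision; the <T extends string, V> parameters are assumed.
interface MessageSingleValuePart<T extends string, V> {
  type: T;
  source: string;
  dir?: "ltr" | "rtl" | "auto";
  locale?: string;
  value?: V;
}

interface MessageMultiValuePart<T extends string, V> {
  type: T;
  source: string;
  dir?: "ltr" | "rtl" | "auto";
  locale?: string;
  parts: Iterable<{ type: string; value: V; source?: string }>;
}

type MessageExpressionPart =
  | MessageSingleValuePart<string, unknown>
  | MessageMultiValuePart<string, unknown>;

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Plain-text content of a placeholder; fallback parts render as "{source}".
function expressionText(part: MessageExpressionPart): string {
  if (part.type === "fallback") return `{${part.source}}`;
  if ("parts" in part) return Array.from(part.parts, (p) => String(p.value)).join("");
  return part.value === undefined ? "" : String(part.value);
}

// Hypothetical renderer: each placeholder becomes a <span> with its direction
// (defaulting to "auto") and, when set, its locale as a lang attribute.
function toHtml(parts: Iterable<MessageTextPart | MessageExpressionPart>): string {
  let html = "";
  for (const part of parts) {
    if (!("source" in part)) {
      html += escapeHtml(part.value);
      continue;
    }
    const lang = part.locale ? ` lang="${escapeHtml(part.locale)}"` : "";
    html += `<span dir="${part.dir ?? "auto"}"${lang}>${escapeHtml(expressionText(part))}</span>`;
  }
  return html;
}

const message: Array<MessageTextPart | MessageExpressionPart> = [
  { type: "text", value: "Hi " },
  { type: "string", source: "|Dana|", value: "Dana", dir: "ltr", locale: "en" },
  { type: "text", value: ", you owe " },
  { type: "number", source: "$amount", parts: [{ type: "integer", value: "5" }] },
  { type: "text", value: " & " },
  { type: "fallback", source: "$extra" },
];
const html = toHtml(message);
```

Since text and placeholder parts stay distinct, the renderer never has to guess where a placeholder starts or ends, which is the re-parsing problem described in the Background section.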