Add metadata & frontmatter #15

Merged
merged 1 commit into from
Dec 13, 2023
eemeli
Owner

@eemeli eemeli commented Nov 30, 2023

Closes #14
CC @zbraniecki and @flodolo, if you've any comments

This adds syntax for @ prefixed key-value metadata, and for --- as a separator between resource-level comments & metadata (aka the "frontmatter") and the resource body. Together, they look like this (using ini highlighting, which mostly works):

@locale en-US
---

one = A message with no properties

@version 3
@since 2023-11-30
two = A message with some properties

# Freeform comments must come before properties
@param $foobar - An input argument
                 with a multiline value
three = Some {$foobar} message

# Metadata also attaches to section-heads
@deprecated
[section]

four = Foo

Like comments, metadata attaches to the "next thing" in the syntax. To allow for resource-level attachment, a --- frontmatter separator is included in the syntax. This allows using the same syntax and concepts for resources as for sections and entries. As detailed in #14, it's also a common pattern used in other formats. Many other formats with frontmatter do not themselves have something like @metadata, and so need to pick a separately defined format for their frontmatter content (often YAML). By contrast, YAML itself uses %directives in its frontmatter.
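As a rough illustration of the separator (not part of this PR), a consumer could split a resource into frontmatter and body along these lines. The function name `split_frontmatter` is hypothetical, and the sketch assumes that the separator is a line consisting solely of `---` with no leading whitespace; the actual grammar may be stricter or looser than that.

```python
def split_frontmatter(source: str) -> tuple[list[str], list[str]]:
    """Split a resource into (frontmatter lines, body lines).

    Assumption: the separator is a line that is exactly '---',
    ignoring trailing whitespace. Without a separator, the whole
    resource is treated as body.
    """
    lines = source.split("\n")
    for index, line in enumerate(lines):
        if line.rstrip() == "---":
            # Everything before the separator is frontmatter.
            return lines[:index], lines[index + 1:]
    return [], lines


front, body = split_frontmatter("@locale en-US\n---\n\none = A message\n")
print(front)
print(body)
```

Indented continuation lines cannot be mistaken for the separator here, since they start with whitespace.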

Metadata values use the same value construct as message entries, which means that they may be multiline if indented, and that their inner syntax (beyond the keyword) will need to be defined separately. The @ prefix is chosen intentionally to match Javadoc/JSDoc/TSDoc syntax, which is relatively well known.
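A minimal sketch of reading one such entry, under stated assumptions: the key runs up to the first space, a continuation line is any following line that starts with a space or a tab, and continuation lines are joined with newlines after trimming. The helper name `parse_metadata` and the joining behaviour are illustrative choices, not something this PR defines.

```python
def parse_metadata(lines: list[str], start: int) -> tuple[str, str, int]:
    """Parse one metadata entry beginning at lines[start].

    Returns (key, value, index of the first line after the entry).
    Assumptions: the key ends at the first space, and continuation
    lines begin with a space or a tab.
    """
    first = lines[start]
    if not first.startswith("@"):
        raise ValueError(f"not a metadata line: {first!r}")
    key, _, value = first[1:].partition(" ")
    parts = [value.strip()]
    index = start + 1
    # Collect indented continuation lines; an empty line ends the entry.
    while index < len(lines) and lines[index][:1] in (" ", "\t"):
        parts.append(lines[index].strip())
        index += 1
    return key, "\n".join(parts), index


example = [
    "@param $foobar - An input argument",
    "                 with a multiline value",
    "three = Some {$foobar} message",
]
print(parse_metadata(example, 0))
```

The inner structure of the value (here, `$foobar - ...`) is deliberately left unparsed, since that is to be defined separately per keyword.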

It's likely that not all consumers of a resource will care about all or any of the metadata. For example, while something like @version could be important to a system tracking which messages need re-translation, it probably would not have any effect during the message's formatting. On the other hand, a resource-level @locale could end up significant for all consumers.

At this level, the syntax does not differentiate between metadata fields depending on e.g. their relevance to formatting; that should be done separately. One purely mechanical way to allow for some formatting-relevant metadata would be to support it only in the frontmatter. Another alternative would be to introduce another sigil besides @ to mark such fields. Or we could define an explicit list of keywords with a formatting impact.
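To make the last option concrete, an explicit keyword list could be applied as a simple filter. This is only a sketch: the set `FORMATTING_KEYS` and its single member are hypothetical, as nothing here decides which keywords would have a formatting impact.

```python
# Hypothetical list of keywords with a formatting impact.
# This PR does not define such a list; "locale" is only an example.
FORMATTING_KEYS = frozenset({"locale"})


def formatting_metadata(metadata: dict[str, str]) -> dict[str, str]:
    """Keep only the metadata fields a formatter would look at."""
    return {key: value for key, value in metadata.items() if key in FORMATTING_KEYS}


print(formatting_metadata({"locale": "en-US", "version": "3"}))
```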

Comments or empty lines are not allowed between metadata lines and the line they're attaching to. This is intentional, and meant to ensure that they stay together.
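The attachment rule could be checked roughly as follows. This is a sketch under assumptions: comments start with `#`, metadata with `@`, continuation lines with a space or a tab, and the function name `find_detached_metadata` is made up for illustration.

```python
def find_detached_metadata(lines: list[str]) -> list[int]:
    """Return 0-based indices of lines that wrongly separate metadata
    from the line it should attach to.

    Assumptions: '#' starts a comment, '@' starts metadata, and a
    leading space or tab marks a continuation line.
    """
    errors = []
    in_metadata = False
    for index, line in enumerate(lines):
        if line.startswith("@"):
            in_metadata = True
        elif in_metadata and line[:1] in (" ", "\t"):
            continue  # continuation of the current metadata value
        elif in_metadata and (line.startswith("#") or line == ""):
            errors.append(index)  # comment or empty line after metadata
            in_metadata = False
        else:
            in_metadata = False
    return errors


print(find_detached_metadata(["@version 3", "", "two = A message"]))
```

A comment placed before the metadata, as in the `three` entry above, passes this check; only a comment or empty line after the metadata is flagged.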

The id rule needs to be narrowed a bit as part of this change, as an id can't start with ---.

@eemeli eemeli requested a review from stasm November 30, 2023 10:32
@flodolo
Collaborator

flodolo commented Nov 30, 2023

The proposal makes sense to me.

My initial reaction was that putting metadata within the comment would be a better approach, since we likely need to display both to users (e.g. in Pontoon). The main complication is figuring out how to manage multiline values. But, on second thoughts, I can see a world where we interpolate metadata via scripts, and having them separate from comments makes things a lot easier.

@eemeli
Owner Author

eemeli commented Nov 30, 2023

My thought was that in some contexts (like formatting), the metadata could be effectively ignored just as comments are.

A parser of that sort can skip a comment by recognising it from its leading # character and then skipping to the next \n. Metadata is recognised from its leading @, after which the parser can skip to the next \n and check whether the following character is a space or a tab. If so, it's a metadata continuation line, and the parser can again skip to the next \n, repeating until it finds something different.

So it should be just as easy, and this way metadata lines don't need a double sigil like # @ at their start.
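The skipping described above can be sketched like this. It assumes that `pos` points at the start of a line and that a continuation line begins with a space or a tab; the function name is hypothetical, and empty lines are not skipped.

```python
def skip_comments_and_metadata(source: str, pos: int = 0) -> int:
    """Return the offset of the first line at or after pos that is
    neither a comment nor metadata. Assumes pos is at a line start."""

    def next_line(i: int) -> int:
        # Offset just past the next "\n", or end of input if none.
        end = source.find("\n", i)
        return len(source) if end == -1 else end + 1

    while pos < len(source):
        char = source[pos]
        if char == "#":
            pos = next_line(pos)
        elif char == "@":
            pos = next_line(pos)
            # Continuation lines start with a space or a tab.
            while pos < len(source) and source[pos] in (" ", "\t"):
                pos = next_line(pos)
        else:
            break
    return pos


text = (
    "# Freeform comment\n"
    "@param $foobar - An input argument\n"
    "   with a multiline value\n"
    "three = Some {$foobar} message\n"
)
print(text[skip_comments_and_metadata(text):])
```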

@nordzilla
I definitely like and support the idea of having separate tags for comments and attributes.

The distinction in the syntax feels much more readable to me overall.

I also personally don't like it when comments have semantic meaning that is tied to the code/config, with perhaps the exception of ``` code blocks ``` in Rust documentation comments being runnable as tests.

Attributes are a common practice in most languages that I'm familiar with, and I think this solution feels natural.

@zbraniecki

I'm ambivalent on this design.

I'm used to thinking of attributes as part of comments, maybe due to @jsdoc, maybe due to the long time spent settling on Semantic Comments for Fluent, but I can see the argument @nordzilla made.

As a result, I feel comfortable supporting this proposal as is.

I looked at the syntax from the perspective of error recovery heuristics, and it seems that the general level remains the same and that multiline attributes do not introduce new vectors.

@eemeli
Owner Author

eemeli commented Dec 13, 2023

Merging, as this seems like a good next step and lets us start working on actual metadata fields.

Successfully merging this pull request may close these issues.

Including resource-level metadata in the syntax
4 participants