New syntax for meta-data #7

Closed
stasm opened this issue Jan 4, 2017 · 32 comments

@stasm
Contributor

stasm commented Jan 4, 2017

Goal

Provide a simple means for defining private meta-data for messages.

Description

Currently, meta-data can be added to messages by using traits. Traits without namespaces are considered private.

brand-name =Firefox
  [gender] masculine

#5 and #6 will simplify traits and we'll need a new way to encode meta-data.

The proposal is to use binary tags attached to the value:

#masculine
brand-name = Firefox

The benefit of the binary approach is that there's usually no need to name the property in question (gender).

Discussion

https://groups.google.com/forum/#!topic/mozilla.tools.l10n/dhWfBXHzuZI

@stasm stasm modified the milestone: 0.2 Jan 19, 2017
@Pike
Contributor

Pike commented Jan 19, 2017

One of the big wins of FTL was that everything about the message was in the value part of the syntax.

There is some level of beauty that you start the message with the ID. Right now, only comments break that. For tooling that is a boost.

Also, does the

#masculine vs
# masculine

create challenges in error recovery? The ' ' would also be a typo that would be really hard to debug, as

# masculine
#masculine
bob = Bob

would be totally legal, right?
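
To make the ambiguity concrete, here is a minimal Python sketch of a line classifier. The rule it encodes (a `#` followed directly by an identifier is a tag, any other line starting with `#` is a comment) and the name `classify_line` are assumptions made for illustration; they are not taken from any FTL parser.

```python
import re

# Assumed rule: '#' immediately followed by an identifier is a tag.
TAG_RE = re.compile(r"#[A-Za-z][\w-]*\s*$")


def classify_line(line):
    """Classify one line as 'tag', 'comment', or 'other'."""
    if TAG_RE.match(line):
        return "tag"
    if line.startswith("#"):
        return "comment"
    return "other"


lines = ["# masculine", "#masculine", "bob = Bob"]
print([classify_line(line) for line in lines])
# ['comment', 'tag', 'other']
```

Both `#` lines are accepted under this rule, so a stray space silently turns a tag into a comment and there is no error to report.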

@zbraniecki
Collaborator

I think my vote would go for semantic comments for meta-data.

@stasm
Contributor Author

stasm commented Jan 20, 2017

I think we used the # sigil as an example and then I copied it here without realizing we already use # for comments!

I'd love to discuss semantic comments more. JSDoc-style @param clauses would certainly help tooling. And it might be possible to encode language-specific meta-information in the comments as well (@meta foo?).

@Pike
Contributor

Pike commented Jan 25, 2017

@phlax, @ta2-1, does prefixing messages with metadata impact how pootle implements l20n support?

@ta2-1

ta2-1 commented Jan 25, 2017

Hi @Pike, thanks for the heads-up. I'll check.

@ta2-1

ta2-1 commented Jan 27, 2017

@Pike, I’d say that we use l20n libraries so as long as that is consistent we’re pretty much unimpacted by syntax changes. I believe that @mathjazz is in the same situation.

@stasm
Contributor Author

stasm commented Jan 27, 2017

@Pike suggested that we separate the meta-information from semantic comments. The reason for this is that he sees semantic comments as relating to the toolchain and the process (@param, @rev), while the meta-information is strictly language-specific and private.

He suggested the following syntax:

# The short name of the app. 
brand-name = {
       *[nominative] Firefox
        [genitive] Firefox's
    }

    [masculine, inanimate]

The reason to use the brackets is that it closely resembles the way this information will be used. This in turn improves the copy&paste-ability of the syntax:

has-crashed = { META(brand-name) ->
       *[masculine] { brand-name } has crashed.
    }

Pike also likes the idea that everything defined below the identifier belongs to the message and is editable by the localizer.

@stasm
Contributor Author

stasm commented Jan 27, 2017

I like @Pike's proposal and I think it's sound. I'd like to hear @zbraniecki's thoughts, of course. I have some reservations, too: I was hoping we could piggy-back on #16 to implement this. Meta-information should be rare enough that maybe it shouldn't get its own syntax. OTOH, it is also what makes Fluent and FTL very powerful.

On the note of being rare enough, @Pike and I also discussed not allowing meta-information on messages which have attributes. Such messages are meant to localize UI widgets and should not carry grammatical information. In fact, maybe we should rename meta-information to grammatical data or something similar.

@zbraniecki
Collaborator

He suggested the following syntax:

My initial reaction is that this syntax seems confusing.

Meta-information should be rare enough that maybe it shouldn't get its own syntax.

This is my thinking too. As we said in the beginning - in all our work with L20n/FTL so far, we failed to find another example of the use case beyond gender.
Since we raised this a month ago, we still haven't found a single other use case.

For that reason I find the idea of adding a specific syntax excessive. It adds a new source of potential bugs and errors in malformed content, in order to serve a single use case.

It will work, and as we said, it's more important how users will retrieve that bit because it'll be way more common, but I'm not sure if we should be adding a whole new data type on Message to serve this individual goal.

On the other hand, I agree that semantic comments as we're thinking about them are functionally different from meta-information like gender.
My thought experiment is that I can't see a reason for a localizer to call META(rev).

So, I'll probably be reluctantly ok with this proposal, but I have another:

# @rev: 2
# @meta: masculine
brandName = {
 *[nominative] Firefox
  [possessive] Firefoksa
}

caller = { META(brandName) ->
 *[masculine] Foo
  [feminine] Faa
}

I recognize that it doesn't play with @Pike 's "all localizable info below identifier", but I guess I just don't share this concern.

@Pike
Contributor

Pike commented Jan 30, 2017

The idea behind keeping the localizer data beneath the ID is one of incremental tool support:

It allows l10n tools to have the most rudimentary support, like we currently do for pontoon. You get a text area, and anything in that area is to be edited by localizers.
With localizer-editable semantic comments, that's a lot more complex.
You can also more easily allow to switch to a text editor if a localizer needs a feature which your tool doesn't support yet.

The other part about using the [] mark-up (to avoid the word syntax) is that [] denotes the option definition and reference for variants. Meta is the same thing in the reverse direction, and there's beauty in keeping [] as an easy to copy-n-paste markup on both source and target of the reference.

I can see us explaining [] as references between messages, and you never need to translate one markup into another.

@stasm
Contributor Author

stasm commented Jan 30, 2017

You can also more easily allow to switch to a text editor if a localizer needs a feature which your tool doesn't support yet.

Wouldn't it be easier for a tool to gracefully downgrade to a text editor if the whole message, including the comments can be parsed and serialized?

The other part about using the [] mark-up (to avoid the word syntax) is that [] denotes the option definition and reference for variants. Meta is the same thing in the reverse direction,

I'm still not completely sold on this reverse direction thing. Grammatical information defined as meta-data has little to do with variants, doesn't it?

I can see us explaining [] as references between messages, and you never need to translate one markup into another.

There's some beauty in using [ ] as well as some confusion. When you define a variant of a select-expression with brackets you're saying: match this thing inside. [other] Other means match 'other' and return 'Other'. So, at least for me, the brackets mean match. Here OTOH, the brackets define a piece of grammatical information and I'm still struggling with this inconsistency.

I don't have any better ideas right now and I see the values of everything previously suggested here. I'm tempted to postpone this issue until a later milestone.

@zbraniecki
Collaborator

When you define a variant of a select-expression with brackets you're saying: match this thing inside. [other] Other means match 'other' and return 'Other'. So, at least for me, the brackets mean match. Here OTOH, the brackets define a piece of grammatical information and I'm still struggling with this inconsistency.

This sentence describes my sentiment very well.

@stasm
Contributor Author

stasm commented Jan 30, 2017

I don't want to rush a design decision here. Let's move this out of the scope of 0.2. This means that temporarily the syntax will not give any dedicated way of defining language-specific grammatical data.

(As a workaround, it's still possible to create entirely new local messages containing that data and refer to them, e.g. gender-of-brand-name = masculine. This is not recommended though.)

@stasm stasm removed this from the 0.2 milestone Jan 30, 2017
@Pike
Contributor

Pike commented Jan 30, 2017

Do we get a good baseline to, say, ship L20n on Android without coming to a conclusion here?

To the actual conversation, let me try to depict my thinking:

brand.ftl:

brandName = {
    *[nominative] Firefox
     [possessive] Firefoksa
}
[gender] masculine

updates.ftl:

should_restart = { META(brandName) ->
    *[feminine] I would like her { brandName[possessive] } to be restarted
     [masculine] I would like his { brandName[possessive] } to be restarted
}

(omg, butchering some other language's grammar here)

My point is that when I resolve the variants of brandName, I use [] on both sides.

I think it's a good idea for the reverse direction to also use [] on both sides. If not that, then at least to use the same mark-up on both sides. The pre-ID comment proposals use different markup on one side compared to the other, and that makes life hard.

@zbraniecki
Collaborator

Do we get a good baseline to, say, ship L20n on Android without coming to a conclusion here?

I believe we should reach a solution here before we release L20n on Android.

@stasm
Contributor Author

stasm commented Jan 31, 2017

I believe we should reach a solution here before we release L20n on Android.

+1 to that. I just don't want to lower the quality of 0.2 by rushing this decision right now.

@stasm
Contributor Author

stasm commented Jan 31, 2017

[gender] masculine

@Pike, did you mean [masculine]?

The pre-ID comment proposals use different markup on one side compared to the other, and that makes life hard.

I see what you mean: in a selector-less list of variants, we also use brackets to define variants and we match them from the outside. In case of variants, however, the symmetry is between the definition and the reference. Both use [key]:

brand-name = {
       *[nominative] Firefox
        [locative] Firefoksie
    }
about = O { brand-name[locative] }

You'll never find yourself trying to match locative in another select-expression.

This is not true for grammatical information. Once it's defined, it's meant to be matched in other select-expressions. If you somehow define brand-name to be feminine, you can then match the gender elsewhere:

has-been-updated = { brand-name } { META(brand-name) ->
       *[masculine] został zaktualizowany.
        [feminine] została zaktualizowana.
    }

Furthermore, you must not assume that you can reference feminine in any other way than by using META. In particular, brand-name[feminine] will break.


...unless it doesn't. What if we used variants for all grammatical information? Variants are private and can be accessed from other messages. Grammatical information will be only added to messages which already may have other grammatical variants. We wouldn't be adding any new syntax. The word "variant" may not be the best one here, but in general, the construct seems to lend itself well to the use-case.

In English:

brand-name = Firefox
about = About { brand-name }
updated = { brand-name } has been updated.

In French:

brand-name = {
       *[nom] Firefox
        [genre] masculin
    }
about = À propos de { brand-name }
updated = { brand-name } { brand-name[genre] ->
       *[masculin] a été mis à jour.
        [féminin] a été mise à jour.
    }

In Polish:

brand-name = {
       *[mianownik] Firefox
        [miejscownik] Firefoksie
        [rodzaj] męski
    }
about = O { brand-name[miejscownik] }
updated = { brand-name } { brand-name[rodzaj] ->
       *[męski] został zaktualizowany.
        [żeński] została zaktualizowana.
    }

Semantically, gender isn't a facet of the string value of brand-name but maybe that's okay for now. We can still choose to add an explicit syntax for this later.

@Pike
Contributor

Pike commented Jan 31, 2017

Groundhog Day. That's the train of thought that led us to traits.

@Pike
Contributor

Pike commented Jan 31, 2017

On a less snarky note, putting meta data into the variants would

  • allow to return the value to the program (good? bad?)
  • not allow partial matching for something like [masculine, inanimate]
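
The partial-matching point can be sketched in Python. The semantics assumed here (a variant key matches when every descriptor it names is attached to the message) are only one possible reading of `[masculine, inanimate]`, and `matches` is a hypothetical helper, not part of any Fluent implementation.

```python
def matches(variant_key, descriptors):
    """Assumed rule: a key matches if every descriptor it names is present."""
    return set(variant_key) <= set(descriptors)


# Descriptors attached to a message, as in `[masculine, inanimate]`.
brand_descriptors = {"masculine", "inanimate"}

print(matches({"masculine"}, brand_descriptors))             # True (partial match)
print(matches({"masculine", "animate"}, brand_descriptors))  # False

# With a single variant such as `[gender] masculine`, the only available test
# is equality against one string, so a second descriptor cannot be matched.
```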

@stasm
Contributor Author

stasm commented Jan 31, 2017

Groundhog Day. That's the train of thought that led us to traits.

Yes, I know. I'm looking for solutions everywhere I can find them :)

@stasm
Contributor Author

stasm commented Jan 31, 2017

That's the train of thought that led us to traits.

Also, I feel like this is related but not accurate. We've always had three types of data: variants, grammatical descriptors and attributes. With traits, we lumped all of them together. Previously (L20n 1.0) descriptors and attributes were expressed with the same syntax. Even earlier (your designs from a long time ago) attributes and variants were together, while descriptors were separate.

I feel like we're going in circles.

@stasm
Contributor Author

stasm commented Jan 31, 2017

allow to return the value to the program (good? bad?)

Probably bad, or at least unintended. That would only happen if the meta data variant has the * prefix, right?

not allow partial matching for something like [masculine, inanimate]

That would be possible with nested select-expressions or with list-selectors (#4).

What I really dislike about my proposal is that it forces localizers to find names for the meta-data: gender, animacy, etc. I'd much prefer a solution with binary descriptors, like "masculine". I'll come back to this issue next week and try to get some perspective this week.

@stasm
Contributor Author

stasm commented Feb 6, 2017

After a short break the idea of putting the grammatical information into variants seems bad, I admit. Perhaps it was a necessary step back for me to consider other options :)

Over the weekend I did some small-scale user-testing. I presented two FTL files, one in English and another one in Polish to a few friends and asked them to complete the Polish translation. The only thing they knew about FTL beforehand was that translations had unique identifiers. The Polish file also already featured some grammar-sensitive syntax.

After completing the task (which went very well) I asked a few follow-up questions. Below is a bullet-point summary of the conclusions:

  • Content below the identifier belongs to the localizer.
  • Square brackets ([]) mean 'match' and should be followed by a value.
  • Variants are just facets of the same value; don't put extra data there, even if it's also grammatical.
  • The use of * for the default variant is clear.
  • No problem with an additional sigil for grammatical data, especially if it's rare.
  • Big preference towards binary grammatical data which doesn't need an explicit name (like 'gender').
  • Proposed sigil: +. Rationale: = means 'this is the value' and + means 'and this is additional information about it'.
  • Q: What should we call them? A: I don't know.. if they used # then tags. But with + they're more like traits.

Based on that, here is my newest proposal:

[[ English ]]

# A short name of the app.
brand-name = Firefox
about-app = About { brand-name }
has-updated = { brand-name } has been updated.


[[ French ]]

# A short name of the app.
brand-name = Firefox
    +masculin

about-app = À propos de { brand-name }
has-updated = { brand-name ->
       *[+masculin] { brand-name } a été mis à jour.
        [+feminin] { brand-name } a été mise à jour.
    }


[[ Polish ]]

# A short name of the app.
brand-name = {
       *[mianownik] Aurora
        [miejscownik] Aurorze
    } 
    +żeński

about-app = O { brand-name[miejscownik] }
has-updated = { brand-name ->
       *[+męski] { brand-name } został zaktualizowany.
        [+żeński] { brand-name } została zaktualizowana.
    }

@flodolo
Contributor

flodolo commented Feb 10, 2017

I wonder if "classes" would be confusing as name for these definitions (e.g. gender).

Based on that, here is my newest proposal:

How do you associate two or more classes to a string?

+masculin
+something_else

vs

+masculin,something_else

There is one thing that I find confusing though:

  • I associate a class to a string by saying +masculine. It's intuitive: I'm adding a class and declaring to the world that this is a masculine name (but not +masculine).
  • [+masculin] { brand-name } a été mis à jour., on the other hand, sounds counterintuitive. I would expect to be able to use [masculin], since I'm defining the masculine version of this string.

@stasm
Contributor Author

stasm commented Feb 13, 2017

(I'm going to use the # sigil in the snippets below, since #28 is close to landing.)

How do you associate two or more classes to a string?

foo = The Foo
    #feminine
    #something_else

I'd like to think of them as tags, and actually just call them that: tags. I'm sure traits, classes or properties would make sense here too. Given the syntax, I'd like to piggy-back on the fact that people know what hashtags are.

I would expect to be able to use [masculin], since I'm defining the masculine version of this string.

I understand the rationale. I think there are two ways to go forward and they're not mutually exclusive:

has-updated = { TAG(brand-name) ->
       *[masculin] { brand-name } a été mis à jour.
        [feminin] { brand-name } a été mise à jour.
    }
has-updated = { brand-name ->
       *[#masculin] { brand-name } a été mis à jour.
        [#feminin] { brand-name } a été mise à jour.
    }

We could start with the first one and add the second one as syntax sugar later on.

@Pike
Contributor

Pike commented Feb 13, 2017

What would

has-updated = { brand-name ->
 *[masculin] { brand-name } a été mis à jour.
   [feminin] { brand-name } a été mise à jour. }

do? I'm concerned that adding two variants with a subtle difference would create more confusion than help.

@stasm
Contributor Author

stasm commented Feb 13, 2017

It would try to match masculin then feminin against the value of brand-name, fail, and fall back to the variant marked with *.
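
A minimal Python sketch of that fallback, assuming the selector resolves to the string value of brand-name and that variant keys are compared to it by simple equality. The `select` function and its signature are hypothetical.

```python
def select(selector_value, variants, default_key):
    """Return the variant whose key equals the selector value, else the default."""
    return variants.get(selector_value, variants[default_key])


variants = {
    "masculin": "{ brand-name } a été mis à jour.",
    "feminin": "{ brand-name } a été mise à jour.",
}

# The selector is the *value* of brand-name ("Firefox"), not its tags,
# so neither key matches and the default (the variant marked with *) is used.
print(select("Firefox", variants, default_key="masculin"))
# { brand-name } a été mis à jour.
```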

@stasm stasm added the syntax label Feb 16, 2017
@stasm
Contributor Author

stasm commented Feb 23, 2017

After a lot more thinking: I like @Pike's proposal in #7 (comment) the most. I realized that I don't see a use-case for matching against the values of messages. Doing so would make the translation not portable. If a language has special rules for nouns starting with a vowel, it's much better to match a hashtag vowel than the literal value Aurora. The latter breaks for any other brand name.
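
The portability argument can be illustrated with a small Python sketch. The `Message` class, both selector functions, the brand name "Example" and the placeholder strings are all hypothetical; the sketch only contrasts matching on a tag with matching on a literal value.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """Hypothetical in-memory model of a message carrying binary tags."""
    value: str
    tags: frozenset = frozenset()


def select_by_tag(message, variants, default_key):
    """Return the first variant whose key is one of the message's tags."""
    for key, text in variants.items():
        if key in message.tags:
            return text
    return variants[default_key]


def select_by_value(message, variants, default_key):
    """Return the variant whose key equals the message's literal value."""
    return variants.get(message.value, variants[default_key])


aurora = Message("Aurora", frozenset({"feminine", "vowel"}))
example = Message("Example", frozenset({"vowel"}))  # made-up second brand name

by_tag = {"vowel": "vowel form", "other": "default form"}
by_value = {"Aurora": "vowel form", "other": "default form"}

print(select_by_tag(aurora, by_tag, "other"))       # vowel form
print(select_by_tag(example, by_tag, "other"))      # vowel form
print(select_by_value(example, by_value, "other"))  # default form
```

The translation keyed on the tag keeps working when the brand name changes, while the one keyed on the literal value falls back to the default.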

stasm added a commit to stasm/fluent that referenced this issue Feb 23, 2017
Tags are binary values attached to messages.  They are language-specific and
can be used to describe grammatical characteristics of the message.

    brand-name = Firefox
        #masculine

    brand-name = Aurora
        #feminine
        #vowel

Tags can be used in select expressions by matching a hashtag name to the
message:

    has-updated = { brand-name ->
        [masculine] …
        [feminine] …
       *[other] …
    }

Tags can only be defined on messages which have a value and don't have any
attributes.
@zbraniecki
Collaborator

Just as a mental check, does it mean that we're 100% sure that we will never want to match against the value?

It seems to me like we won't, but I want us all to think it through explicitly, because if we implement what :stas is proposing, we will never have an intuitive way to do that :)

@stasm
Contributor Author

stasm commented Feb 23, 2017

Thanks, @zbraniecki, for asking. If we ever want to change our mind, we can implement a new approach inside of how variant keys match against the selector. If it's a Message, we can first look into its tags and then fall back onto its value. Or we can provide functions that allow the user to be more specific: VALUE(brand-name) or similar.
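
A rough Python sketch of the "tags first, then value" lookup described above. The function `smart_select` and its parameters are hypothetical; the order of checks simply follows this comment and is not a confirmed design.

```python
def smart_select(value, tags, variants, default_key):
    """Look for a variant key among the tags first, then try the value."""
    for key in variants:
        if key in tags:
            return variants[key]
    return variants.get(value, variants[default_key])


gender = {"masculine": "m-form", "feminine": "f-form", "other": "default"}
print(smart_select("Aurora", {"feminine", "vowel"}, gender, "other"))  # f-form

# With no matching tag, the literal value is tried next.
literal = {"Aurora": "special", "other": "default"}
print(smart_select("Aurora", set(), literal, "other"))   # special
print(smart_select("Firefox", set(), literal, "other"))  # default
```

Explicit helpers such as the VALUE(brand-name) mentioned above would let a localizer skip the tag lookup on purpose.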

That said, I doubt that we'll want or need to do that. Famous last words?

@zbraniecki
Collaborator

zbraniecki commented Feb 23, 2017

The last proposal is my concern. If we'll end up having a use case, and if that use case will end up being more common than this one, we'll end up having the API that makes the wrong thing easy.

If we'll try to have a smart API (check for tags, check for values), then it sounds like it'll work well.

I assume we won't allow attributes and tags on the same Message, right?

@stasm
Contributor Author

stasm commented Feb 23, 2017

The last proposal is my concern. If we'll end up having a use case, and if that use case will end up being more common than this one, we'll end up having the API that makes the wrong thing easy.

I see what you mean. I think we could make the no-syntax variant be a smart one, and then expose VALUE and TAGS helpers. But only if we see a need for that.

If we'll try to have a smart API (check for tags, check for values), then it sounds like it'll work well.

+1

I assume we won't allow attributes and tags on the same Message, right?

Yes, correct. The rationale is that messages with tags are supposed to be interpolated into other messages. If they need to be displayed in the UI which requires an attribute, a new message can be created for that purpose and it can reference the message with tags.
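
As a sketch of how a tool might enforce this, the Python check below combines the rule just stated with the requirement from the commit message that tagged messages have a value. The function name, its arguments and the error strings are assumptions for illustration only.

```python
def validate_message(value, attributes, tags):
    """Return a list of problems under the rules discussed in this thread."""
    errors = []
    if tags and value is None:
        errors.append("tags require a value")
    if tags and attributes:
        errors.append("tags and attributes cannot be combined")
    return errors


print(validate_message("Aurora", {}, {"feminine", "vowel"}))
# []
print(validate_message("Aurora", {"title": "Aurora"}, {"feminine"}))
# ['tags and attributes cannot be combined']
```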

stasm added a commit to stasm/fluent that referenced this issue Feb 24, 2017
Tags are binary values attached to messages.  They are language-specific and
can be used to describe grammatical characteristics of the message.

    brand-name = Firefox
        #masculine

    brand-name = Aurora
        #feminine
        #vowel

Tags can be used in select expressions by matching a hashtag name to the
message:

    has-updated = { brand-name ->
            [masculine] …
            [feminine] …
           *[other] …
        }

Tags can only be defined on messages which have a value and don't have any
attributes.
@stasm stasm closed this as completed in f3a29f6 Feb 24, 2017
@stasm stasm mentioned this issue Oct 23, 2017