Semantic Comments Proposal: String Versions #141
We have 41 translations for the new string: 16 unchanged, 25 changed.
I've always been a fan of this idea, but over time I've started wondering what problem we're trying to solve, and whether it's worth the effort. How common is case 2? It's hard to tell, but my feeling is that it's not very common, at least not as common as case 3. Probably the largest change of this type was removing periods from strings in preferences, but I consider that an exception, caused by a lack of copy/UX reviews accumulating over the years. Hopefully that should not happen these days. We know developers hate having to change string IDs, because it requires code changes. But most of those cases fall into case 3, and that wouldn't improve. So we're probably not going to make developers much happier. Supporting string revisions also requires non-trivial tooling changes:
My perception of the status quo is that we have a versioning scheme which relies on developers making decisions on behalf of 100+ other teams. Right now they're given two options, and I don't see how adding a third option is going to make that decision easier for them. Instead, I'm afraid that they'll use the revision option in cases that are clearly major, and thereby punt the decision from themselves to 100 localizers. That would create a bad UX on Nightly, and possibly a worse UX on Beta/Release in a cross-channel environment. Now, we can hope to educate developers not to do that. I just don't have high hopes :-/ Then, this would be a small fraction of our localization work, as flod said. At that point, this feature is only going to be as good as its implementation in Pontoon: VCS sync, dashboards, editing support. If I have a rev on a complex string, can I highlight whether it's in the label or the tooltip? How do I keep track of rev 3 and 4? Getting this right will be a significant amount of work. Looking at 2018 in Pontoon, I think that 2019 is already pretty booked.
That's a great question. Being able to answer it would help us evaluate the value of this proposal.
I tried to address that in my initial statement. I don't see you relating this claim to that comment of mine, so I'll reiterate it.
This is not my experience. This is much closer to the (in)famous 5 Monkeys and a Banana Experiment, which describes a bad project culture with a negative impact on how people see l10n overall.
I understand that concern and I don't treat it lightly. I believe we can mitigate that risk with tooling, and the worst-case scenario seems analogous to the current situation where a developer changes a message's social contract without updating its ID.
Do you have any data on that, or an idea of how we could collect data on it? I'm also suggesting that this change would trigger a shift in the culture of how strings are treated, in the hope of increasing the number of tier-2 cases over time.
Those are valid questions that need to be answered, but they seem answerable and solvable. I'd like to avoid treating solvable implementation questions as a reason to drop an idea that can bring value.
Noting that "safely" is very subjective, and that I might have messed up numbers along the way (I was adding data while translating), here's what I got.
Total strings: 444
Safely tagged = the old string can still be used without introducing errors or showing a string whose meaning is too far from the new value.
This is part of the series of proposals spun out of the meta issue #16.
String Versions
All localization systems have to accommodate string changes as part of the project life cycle. While adding and removing strings is fairly well understood and covered by Fluent's l10n-id model, updating strings is more complicated.
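We identified three states of invalidation that can happen to a message:
1) The change does not affect translations (for example, a spelling fix in the source locale), so existing translations stay valid as-is.
2) The change is subtle enough that existing translations remain valid for production, but localizers may want to review and update them.
3) The change invalidates existing translations and requires a new message ID.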
At the moment, Mozilla supports (1) and (3). For (2), we usually lean towards (3), and if the change is really minor, we treat it as (1).
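To make the mapping concrete, here is a minimal Python sketch of the three states and of how today's two-option scheme has to fold (2) into either (1) or (3). The enum names and the `really_minor` flag are hypothetical labels for illustration, not part of Fluent or Pontoon.

```python
from enum import Enum


class Change(Enum):
    """Hypothetical labels for the three invalidation states."""
    NO_INVALIDATION = 1  # e.g. a spelling fix in en-US; translations stay as-is
    SOFT_UPDATE = 2      # old translations remain valid, but are worth reviewing
    INVALIDATING = 3     # meaning changed; a new message ID is required


def current_handling(change: Change, really_minor: bool = False) -> Change:
    """Map a change onto today's two-option scheme, which has no state (2)."""
    if change is Change.SOFT_UPDATE:
        return Change.NO_INVALIDATION if really_minor else Change.INVALIDATING
    return change


def effect_on_translations(handling: Change) -> str:
    """Describe what happens to existing translations for a given handling."""
    return {
        Change.NO_INVALIDATION: "kept; localizers are not notified",
        Change.SOFT_UPDATE: "kept; localizers are notified",
        Change.INVALIDATING: "dropped until re-translated",
    }[handling]


# A soft update that is not "really minor" costs every locale its translation today.
print(effect_on_translations(current_handling(Change.SOFT_UPDATE)))  # prints dropped until re-translated
```

With string versions, `SOFT_UPDATE` would simply map to itself, which is the only outcome that both keeps translations and notifies localizers.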
Limitations
That model works quite well, but has several limitations:
1) Any change to the message, even if the message does not lose its meaning, invalidates all translations.
That means that a change of tone in en-US forces l10n-drivers to decide whether to invalidate the work of 100 people just to inform them about an en-US-specific update.
2) If we deem the change small enough, we have no way to inform localizers of an "optional" update.
If we decide to go with (1) for a particular change, we have no way to communicate to localizers that there's anything to look at.
Solution
Semantic comments create an opportunity to change that by separating out (2) as a soft-fuzzy mode. It would only apply to cases where the string change is subtle enough that the old translation remains valid for production, while still allowing localizers to learn about the update and consider updating their translation.
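A rough sketch of how tooling might classify a translation under the soft-fuzzy mode, assuming each message carries an integer revision (defaulting to 1 when none is declared) and each translation records the revision it was made against. `Message`, `translation_status`, and `ships` are hypothetical names, and the status strings are illustrative rather than anything Pontoon actually uses.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_REV = 1  # assumption: a message without an explicit revision counts as rev 1


@dataclass
class Message:
    id: str
    value: str
    rev: int = DEFAULT_REV


def translation_status(source: Message, translation: Optional[Message]) -> str:
    """Classify a locale's translation against the current source message."""
    if translation is None:
        return "missing"      # e.g. after an ID change; users see the fallback locale
    if translation.rev < source.rev:
        return "soft-fuzzy"   # still valid for production, flagged for review
    return "up-to-date"


def ships(status: str) -> bool:
    """Revisions are tooling-only metadata: soft-fuzzy translations still ship."""
    return status != "missing"


source = Message("example-message", "(updated en-US copy)", rev=2)
old = Message("example-message", "(existing translation)")  # implicitly rev 1
status = translation_status(source, old)
print(status, ships(status))  # prints soft-fuzzy True
```

The design point is that `ships` never looks at revision numbers, so the soft-fuzzy state only changes what localizers see in their tools, not what users see.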
This fits quite well into the feature scope of #139 because it doesn't affect runtime, and in practice it mostly allows us to separate out (1) from (2).
But I believe that this feature can have a more subtle impact on the Fluent ecosystem by nurturing a culture of thinking about the social contract. Instead of a culture where developers perceive every change to a string as requiring an ID update, developers would evaluate their changes in terms of the social contract with localizers.
In most cases they'd still bump the ID, but they would understand why they are doing it, while at the same time being incentivized to minimize copy changes in order to preserve the social contract and the work of 100 localizers.
It is my hope that the latter will also increase the value of the Fluent system by making it better at salvaging useful translations.
Case study
To illustrate the latter, I'm going to present an example. Two weeks ago we landed this change:
This change is useful and goes beyond fixing a spelling in the source locale, thus clearly not qualifying for (1). On the other hand, while updating the string will likely be useful for many locales, many others will probably not have to update their translation as a result of this change, since the subject in such a sentence is implicit, or already not present at all.
For example, in Polish the exact translation would be "History of browsing and file downloads", and hence no change is required.
But today, we had to update the ID and as a result invalidate all localizations of the message, because otherwise we would have no way to notify those who do want to update their translation about the change. It means that either all 100 localizers update their string in time, or users will see the message in the en-US locale, just because we wanted to flag this string as potentially worth updating.
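The fallback problem can be illustrated with a simplified lookup that walks a chain of locales in order. This is only a sketch: the bundles are plain dictionaries rather than Fluent's runtime API, and `example-message` / `example-message-2` are hypothetical IDs standing in for a message before and after an ID bump.

```python
from typing import Dict, List


def resolve(msg_id: str, bundles: List[Dict[str, str]]) -> str:
    """Return the first translation of msg_id, walking locales in fallback order."""
    for bundle in bundles:
        if msg_id in bundle:
            return bundle[msg_id]
    raise KeyError(msg_id)


# Hypothetical IDs: the Polish bundle still only knows the old ID.
pl = {"example-message": "(Polish translation of the old copy)"}
en_us = {"example-message-2": "(updated en-US copy)"}

print(resolve("example-message-2", [pl, en_us]))  # prints (updated en-US copy)
```

Until the Polish localizer translates the new ID, Polish users get the en-US text, even though the old Polish translation would have been perfectly usable.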
With semantic comments and string versions, such a change would look like this:
As a result, all 100 localizations of this message would remain valid, but localizers would be notified in their toolchain that their translation of `history-remember-option` is outdated (implicitly set to `@rev 1`). They would then have a choice: either just mark it as valid (and apply `@rev 2` in their localization), or fine-tune their translation. The end result is that we were able to notify the localizers, preserve the translations, and minimize the friction.
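The two choices a localizer would have can be sketched as operations on a translation record. This assumes a hypothetical convention where the revision is declared as `@rev N` inside the message's comment and an absent declaration means rev 1; `parse_rev`, `Translation`, `mark_as_valid`, and `update_translation` are illustrative names, not an existing API.

```python
import re
from dataclasses import dataclass, replace

# Hypothetical semantic-comment convention: "@rev N" in the message comment.
REV_PATTERN = re.compile(r"@rev\s+(\d+)")


def parse_rev(comment: str) -> int:
    """Return the revision declared in a comment, or 1 if none is declared."""
    match = REV_PATTERN.search(comment)
    return int(match.group(1)) if match else 1


@dataclass(frozen=True)
class Translation:
    id: str
    value: str
    rev: int  # source revision this translation was made against


def mark_as_valid(t: Translation, source_rev: int) -> Translation:
    """Keep the existing text and record that it matches source_rev."""
    return replace(t, rev=source_rev)


def update_translation(t: Translation, new_value: str, source_rev: int) -> Translation:
    """Fine-tune the text and record the revision it was written against."""
    return replace(t, value=new_value, rev=source_rev)


source_rev = parse_rev("# @rev 2")
pl = Translation("history-remember-option", "(existing Polish translation)", parse_rev(""))
pl = mark_as_valid(pl, source_rev)
print(pl.rev, pl.value)  # prints 2 (existing Polish translation)
```

Both operations end with the translation recording the current source revision; the only difference is whether the text changes, which is exactly the distinction the ID-bump model cannot express.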
P.S. I think an interesting exercise would be to see how many locales changed the string as a result of this, and for how many the invalidation was just friction in the system.