[Proposal] Use ICU message for i18n & l10n #23863

Open
@wxiaoguang

Description

@wxiaoguang
Contributor

To avoid reinventing the wheel, it's better to use the ICU message format for i18n/l10n.

Steps:

  1. Fix the buggy ini package
  2. Clean up all translation strings
  3. Introduce ICU message parser
  4. Convert legacy plural-related strings to ICU format
  5. Translate on Crowdin https://support.crowdin.com/icu-message-syntax/
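For step 4, here is a minimal sketch of what converting a legacy plural string into an ICU plural message could look like. The function name toICUPlural and its arguments are hypothetical and not part of Gitea; the sketch also assumes the translated forms contain no characters that need ICU escaping (such as {, }, ' or #).

```go
package main

import (
	"fmt"
	"strings"
)

// toICUPlural builds an ICU plural message from parallel lists of CLDR
// plural categories and their translated forms. "#" is the ICU placeholder
// for the formatted number. Hypothetical helper, for illustration only.
func toICUPlural(arg string, categories, forms []string) (string, error) {
	if len(categories) != len(forms) {
		return "", fmt.Errorf("got %d categories but %d forms", len(categories), len(forms))
	}
	var sb strings.Builder
	sb.WriteString("{" + arg + ", plural,")
	for i, c := range categories {
		// Each branch looks like: one {# pull request}
		sb.WriteString(" " + c + " {# " + forms[i] + "}")
	}
	sb.WriteString("}")
	return sb.String(), nil
}

func main() {
	msg, err := toICUPlural("count",
		[]string{"one", "other"},
		[]string{"pull request", "pull requests"})
	if err != nil {
		panic(err)
	}
	// Prints: there are {count, plural, one {# pull request} other {# pull requests}}
	fmt.Println("there are " + msg)
}
```

The resulting string is the kind of message Crowdin can validate, since it follows the ICU plural syntax.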

The description below is outdated: the old idea was to use a customized message format (a simple syntax similar to ICU message, but not supported by Crowdin, so Crowdin can't help to check for mistakes).

The official package's design seems clear and will resolve Gitea's i18n/l10n problems fundamentally.

https://pkg.go.dev/golang.org/x/text/message

https://pkg.go.dev/golang.org/x/text/feature/plural

https://github.com/unicode-org/cldr/blob/main/common/supplemental/ordinals.xml

https://github.com/unicode-org/cldr/blob/main/common/supplemental/plurals.xml

I think a translator-friendly syntax is very important, because there are already a lot of broken translations. If we make the system more complex, there will be more errors.

The syntax should also be designed for the frontend (JS/Vue).

As the first step, we should refactor the locale package to make it stable, see the problems

A brief idea about how to maintain the translation strings:

<!-- 1: other -->  {%d $[text]}

<!-- 2: one,other --> {%d $[text,texts]}

<!-- 3: zero,one,other --> {%d $zero[0,1,o]}
<!-- 3: one,two,other --> {%d $two[1,2,o]}
<!-- 3: one,few,other --> {%d $few[1,f,o]}
<!-- 3: one,many,other --> {%d $many[1,m,o]}

<!-- 4: one,two,few,other --> {%d $two-few[1,2,f,o]}
<!-- 4: one,two,many,other --> {%d $two-many[1,2,m,o]}
<!-- 4: one,few,many,other --> {%d $few-many[1,f,m,o]}

<!-- 5: one,two,few,many,other --> {%d $[1,2,f,m,o]}

<!-- 6: zero,one,two,few,many,other --> {%d $[0,1,2,f,m,o]}
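The category names above (zero, one, two, few, many, other) come from the CLDR plural rules. As a rough sketch, this is how a count could be mapped to a category for the three languages used in the examples. It only handles non-negative integers and only en, lv and ar; the function name pluralCategory is hypothetical, and a real implementation would rather rely on the CLDR data or golang.org/x/text/feature/plural.

```go
package main

import "fmt"

// pluralCategory returns the CLDR cardinal plural category for a
// non-negative integer n. Only "en", "lv" and "ar" are covered; any
// other language falls back to "other". Hypothetical helper.
func pluralCategory(lang string, n int) string {
	switch lang {
	case "en":
		if n == 1 {
			return "one"
		}
	case "lv":
		if n%10 == 0 || (n%100 >= 11 && n%100 <= 19) {
			return "zero"
		}
		if n%10 == 1 { // n%100 == 11 was already handled above
			return "one"
		}
	case "ar":
		switch {
		case n == 0:
			return "zero"
		case n == 1:
			return "one"
		case n == 2:
			return "two"
		case n%100 >= 3 && n%100 <= 10:
			return "few"
		case n%100 >= 11: // i.e. 11..99
			return "many"
		}
	}
	return "other"
}

func main() {
	for _, lang := range []string{"en", "lv", "ar"} {
		for _, n := range []int{0, 1, 2, 5, 11, 21, 100} {
			fmt.Printf("%s %d -> %s\n", lang, n, pluralCategory(lang, n))
		}
	}
}
```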

Then use the syntax to support different languages:

en: msg = there are {%d $[pull request, pull requests]}
lv: msg = there are {%d $zero[for 0 pull request, pull request, pull requests]}
ar: msg = there are {%d $[for 0, for 1, for 2, few, many, other]}
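To make the idea concrete, here is a small sketch of how such a message could be rendered. It assumes the plural category for the number has already been determined elsewhere, supports only one placeholder per message, and uses hypothetical names (render, categoriesFor) that are not part of Gitea. The tag-to-category mapping follows the table above.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// pluralRe matches one placeholder such as {%d $zero[a, b, c]}.
var pluralRe = regexp.MustCompile(`\{%d \$([a-z-]*)\[([^\]]*)\]\}`)

// Ordered categories for tagged placeholders, e.g. $zero[...].
var tagged = map[string][]string{
	"zero": {"zero", "one", "other"}, "two": {"one", "two", "other"},
	"few": {"one", "few", "other"}, "many": {"one", "many", "other"},
	"two-few": {"one", "two", "few", "other"}, "two-many": {"one", "two", "many", "other"},
	"few-many": {"one", "few", "many", "other"},
}

// Ordered categories for untagged placeholders, keyed by form count.
var untagged = map[int][]string{
	1: {"other"}, 2: {"one", "other"},
	5: {"one", "two", "few", "many", "other"},
	6: {"zero", "one", "two", "few", "many", "other"},
}

func categoriesFor(tag string, count int) ([]string, error) {
	cats, ok := untagged[count]
	if tag != "" {
		cats, ok = tagged[tag]
	}
	if !ok || len(cats) != count {
		return nil, fmt.Errorf("unsupported tag %q with %d forms", tag, count)
	}
	return cats, nil
}

// render replaces the first placeholder in msg with "<n> <form>", where
// the form is picked by category. Unknown categories fall back to the
// last form, which is always "other".
func render(msg string, n int, category string) (string, error) {
	m := pluralRe.FindStringSubmatchIndex(msg)
	if m == nil {
		return msg, nil // nothing to replace
	}
	forms := strings.Split(msg[m[4]:m[5]], ",")
	cats, err := categoriesFor(msg[m[2]:m[3]], len(forms))
	if err != nil {
		return "", err
	}
	idx := len(cats) - 1
	for i, c := range cats {
		if c == category {
			idx = i
		}
	}
	return msg[:m[0]] + fmt.Sprintf("%d %s", n, strings.TrimSpace(forms[idx])) + msg[m[1]:], nil
}

func main() {
	out, err := render("there are {%d $[pull request, pull requests]}", 3, "other")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // there are 3 pull requests
}
```

Because the forms are split on commas, a translated form could not itself contain a comma in this sketch; a real syntax would need an escaping rule for that.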

Another possible approach, define all concepts ahead:

en: NumPR = {%d $[pull request, pull requests]}
lv: NumPR = {%d $zero[for 0 pull request, pull request, pull requests]}
ar: NumPR = {%d $[for 0, for 1, for 2, few, many, other]}

Then the NumPR could be reused:

en: msg = there are {$NumPR}
lv: msg = there are {$NumPR}
ar: msg = there are {$NumPR}
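A possible way to resolve such references is a single substitution pass over the message before the plural placeholder is rendered. The sketch below assumes concept names are plain ASCII identifiers and that definitions do not reference other concepts; expandRefs is a hypothetical name.

```go
package main

import (
	"fmt"
	"regexp"
)

// refRe matches a concept reference such as {$NumPR}. It does not match
// plural placeholders like {%d $[...]}, which start with "{%".
var refRe = regexp.MustCompile(`\{\$([A-Za-z][A-Za-z0-9]*)\}`)

// expandRefs replaces every {$Name} in msg with the definition of Name.
// Unknown names are left untouched so that mistakes stay visible.
func expandRefs(msg string, concepts map[string]string) string {
	return refRe.ReplaceAllStringFunc(msg, func(ref string) string {
		name := ref[2 : len(ref)-1] // strip the leading "{$" and trailing "}"
		if def, ok := concepts[name]; ok {
			return def
		}
		return ref
	})
}

func main() {
	en := map[string]string{
		"NumPR": "{%d $[pull request, pull requests]}",
	}
	// Prints: there are {%d $[pull request, pull requests]}
	fmt.Println(expandRefs("there are {$NumPR}", en))
}
```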

If we only need to support one %d, the syntax might be simplified, eg:

en: msg = there are %d $[pull request, pull requests]
lv: msg = there are %d $zero[for 0 pull request, pull request, pull requests]
ar: msg = there are %d $[for 0, for 1, for 2, few, many, other]

Activity

added labels type/feature (completely new functionality; can only be merged if feature freeze is not active) and type/proposal (the new feature has not been accepted yet but needs to be discussed first) on Apr 1, 2023
changed the title from "[Proposal] Use golang's x/text package for i18n & l10n" to "[Proposal] Use ICU message for i18n & l10n" on Apr 6, 2023

lunny commented on Apr 28, 2023

@lunny
Member

Is there any tool to convert the INI format to the ICU format? Or should we create one?

wxiaoguang commented on Apr 28, 2023

@wxiaoguang
Contributor, Author

I didn't get what you mean.

ICU is just a message format; there is no need to convert anything.

lunny commented on Apr 28, 2023

@lunny
Member

Maybe we should use a format other than INI files?

wxiaoguang commented on Apr 28, 2023

@wxiaoguang
Contributor, Author

Why?

silverwind commented on Apr 28, 2023

@silverwind
Member

YAML may be OK as it requires less escaping than INI. But one also needs to be aware of its pitfalls, like an unquoted no becoming the boolean false, because YAML is a typed language, which INI isn't.

wxiaoguang commented on Apr 28, 2023

@wxiaoguang
Contributor, Author

At the moment I don't see any real benefit that YAML would bring.

Actually, we do not need much "escaping" with INI; there are just some legacy bugs.

The only "escaping" requirements are:

  1. The comment character: YAML still needs to escape / quote #
  2. Leading/trailing spaces: YAML still needs to quote them with "
  3. Multi-line support: YAML's syntax is not as simple as INI's

I think INI still wins.

silverwind commented on Jun 2, 2023

@silverwind
Member

Found another use case where {placeholder} syntax would have been really useful:

https://github.com/go-gitea/gitea/pull/25050/files#r1214691116

Participants

@lunny, @silverwind, @wxiaoguang, @yardenshoham