
locale.windows_locale: Incorrect Windows locale for Cambodian #123853

Open
seanbudd opened this issue Sep 9, 2024 · 22 comments
Labels
OS-windows type-bug An unexpected behavior, bug, or error

Comments

@seanbudd

seanbudd commented Sep 9, 2024

Bug report

Bug description:

According to the Windows spec, the locale identifier for Cambodian (0x0453/1107) should be "km-KH"

Sources:

Currently, locale.windows_locale[1107] incorrectly evaluates to "kh_KH".
https://github.com/python/cpython/blob/3.12/Lib/locale.py#L1596
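
The lookup described above can be reproduced with a short check. This is only a sketch: `check_khmer_entry` is a hypothetical helper, and the expected value "km_KH" assumes the table keeps its underscore-separated form of the "km-KH" name from the report.

```python
import locale

KHMER_LCID = 0x0453  # 1107 in decimal


def check_khmer_entry(table=None):
    """Return (actual, ok) for the Khmer LCID in the given LCID table."""
    if table is None:
        # Undocumented module-level dictionary discussed in this issue.
        table = locale.windows_locale
    actual = table.get(KHMER_LCID)
    # Assumption: the corrected entry uses an underscore, like other entries.
    expected = "km_KH"
    return actual, actual == expected


if __name__ == "__main__":
    actual, ok = check_khmer_entry()
    status = "ok" if ok else "mismatch"
    print(f"windows_locale[{KHMER_LCID:#06x}] = {actual!r} ({status})")
```

The result depends on the Python version in use, so no particular output is claimed here.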

This mistake may come from an older version of the protocol, but according to the MS reference, Windows has used the current mapping since the earliest recorded version of the spec (8/8/2013).

If this issue is accepted I am happy to make a small PR to adjust this value.

CPython versions tested on:

3.11, 3.12, 3.13, CPython main branch

Operating systems tested on:

Windows

Linked PRs

@seanbudd seanbudd added the type-bug An unexpected behavior, bug, or error label Sep 9, 2024
@seanbudd
Author

seanbudd commented Sep 9, 2024

@rruuaanng - how did you fix it? how did you test it?

@Eclips4
Member

Eclips4 commented Sep 9, 2024

@rruuaanng If this issue is valid (which should be confirmed by locale experts, cc @malemburg), we need to send a PR with a fix. Applying changes locally affects only your local git repository.

rruuaanng added a commit to rruuaanng/cpython that referenced this issue Sep 9, 2024
@rruuaanng
Contributor

rruuaanng commented Sep 9, 2024

> @rruuaanng - how did you fix it? how did you test it?

I checked the results of the windows_locale.get(int(code, 0)) statement to see if it gets the corrected value now.

The windows_locale dictionary isn't used anywhere else, so I think fixing the source of the error should be enough.
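
For readers following along, the lookup being tested above can be sketched in isolation. `resolve_windows_code` is a hypothetical name, and the table is passed in explicitly so the sketch does not depend on the contents of `locale.windows_locale` in any particular Python version.

```python
def resolve_windows_code(code, table):
    """Map a hex LCID string such as '0x0453' through an LCID table.

    Codes that do not start with '0x' are assumed to already be
    locale names and are returned unchanged.
    """
    if code and code[:2] == "0x":
        # int(code, 0) honours the '0x' prefix, so '0x0453' becomes 1107.
        return table.get(int(code, 0))
    return code


if __name__ == "__main__":
    sample_table = {0x0453: "km_KH"}  # illustrative one-entry table
    print(resolve_windows_code("0x0453", sample_table))
    print(resolve_windows_code("km_KH", sample_table))
```

Only codes written as hex strings ever reach the table in this sketch, which is why a locale name passed in directly bypasses it.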

@vstinner vstinner changed the title Incorrect Windows locale for Cambodian locale.getdefaultlocale(): Incorrect Windows locale for Cambodian Sep 9, 2024
@vstinner
Member

vstinner commented Sep 9, 2024

I changed my locale to Khmer. Python gives me:

    >>> locale.getdefaultlocale()
    ('km_KH', 'cp1252')

The windows_locale dictionary is not used (so changing it would have no effect), since the underlying _locale function returns a string and not a code starting with 0x:

    >>> import _locale; _locale._getdefaultlocale()
    ('km_KH', 'cp1252')

So the string 'km_KH' comes directly from the Windows GetLocaleInfoA() function.


The windows_locale dictionary is only used by the locale.getdefaultlocale() function and this function is deprecated: it will be removed in Python 3.15. Why do you use locale.getdefaultlocale() instead of locale.setlocale(locale.LC_CTYPE, '') or locale.getlocale()? You can also use locale.getencoding() to get the locale encoding.

    >>> locale.setlocale(locale.LC_CTYPE, '')
    'Khmer_Cambodia.1252'
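
A minimal sketch of the suggested replacements, gathered into one helper. `current_locale_info` is a hypothetical name; the fallback to `getpreferredencoding()` is an assumption for interpreters older than 3.11, where `locale.getencoding()` is not available.

```python
import locale


def current_locale_info():
    """Collect locale details without calling getdefaultlocale()."""
    info = {"setlocale": None, "getlocale": None, "encoding": None}
    try:
        # Adopt the user's default LC_CTYPE locale; returns its name.
        info["setlocale"] = locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        pass  # the environment requests a locale the C library rejects
    try:
        info["getlocale"] = locale.getlocale(locale.LC_CTYPE)
    except ValueError:
        pass  # getlocale() could not parse the current locale name
    # getencoding() exists on Python 3.11+; fall back on older versions.
    get_encoding = getattr(locale, "getencoding", locale.getpreferredencoding)
    info["encoding"] = get_encoding()
    return info


if __name__ == "__main__":
    print(current_locale_info())
```

The values returned vary by platform and user settings.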

@seanbudd
Author

seanbudd commented Sep 9, 2024

@vstinner - I'm not sure I understand the point you are making about deprecation. We use locale.windows_locale directly. Is that deprecated too (or was it never officially supported)?

@vstinner
Member

vstinner commented Sep 9, 2024

> We use locale.windows_locale directly

Ah. This dictionary is not documented. How do you use it? Do you have an example?

@serhiy-storchaka
Member

I spent half a day today updating windows_locale to the latest official data, only to find that it is not being used. 😦 🤦‍♂️

@rruuaanng
Contributor

rruuaanng commented Sep 9, 2024

> > We use locale.windows_locale directly
>
> Ah. This dictionary is not documented. How do you use it? Do you have an example?

This means that no changes to the code are required, right?

@rruuaanng
Contributor

rruuaanng commented Sep 9, 2024

Yes! In the getdefaultlocale function, I found this:

    import warnings
    warnings._deprecated(
        "locale.getdefaultlocale",
        "{name!r} is deprecated and slated for removal in Python {remove}. "
        "Use setlocale(), getencoding() and getlocale() instead.",
        remove=(3, 15))

@seanbudd
Author

seanbudd commented Sep 9, 2024

@vstinner

> Ah. This dictionary is not documented. How do you use it? Do you have an example?

Does this mean it's not part of the supported API? Will it be removed when getdefaultlocale is removed?
We use the dictionary to convert Windows LCIDs to language strings.
For example, we need to determine the language code for SAPI5 synthesizers from their language attribute.
We also use it when getting language information through the UIA accessibility API, for the text attribute UIA_CultureAttributeId.
There are several other similar cases when using the Windows API where we need to convert LCIDs to language codes.
We could always create and maintain this dictionary ourselves, but it's not ideal. Alternatively, there is also LCIDToLocaleName, so dropping this dictionary is not a showstopper.
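
A rough sketch of the LCIDToLocaleName route mentioned above, using ctypes. `lcid_to_locale_name` is a hypothetical wrapper; it assumes the Win32 signature `int LCIDToLocaleName(LCID, LPWSTR, int, DWORD)` exported by kernel32 and returns None on non-Windows platforms or when the call fails.

```python
import ctypes
import sys

LOCALE_NAME_MAX_LENGTH = 85  # buffer size used by the Win32 locale-name APIs


def lcid_to_locale_name(lcid):
    """Return the Windows locale name for an LCID, or None if unavailable."""
    if sys.platform != "win32":
        return None  # the API only exists on Windows
    kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
    func = kernel32.LCIDToLocaleName
    func.argtypes = [ctypes.c_uint32, ctypes.c_wchar_p,
                     ctypes.c_int, ctypes.c_uint32]
    func.restype = ctypes.c_int
    buf = ctypes.create_unicode_buffer(LOCALE_NAME_MAX_LENGTH)
    # The call returns 0 on failure, otherwise the number of characters written.
    written = func(lcid, buf, LOCALE_NAME_MAX_LENGTH, 0)
    return buf.value if written else None


if __name__ == "__main__":
    print(lcid_to_locale_name(0x0453))
```

Note that names obtained this way are hyphenated Windows names, not the underscore form stored in locale.windows_locale, so callers would need to normalise one or the other.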

@vstinner
Member

> Does this mean it's not part of the supported API?

It means that you're in the gray area: maybe it's supported, maybe not :-)

> Will it be removed when getdefaultlocale is removed?

Good question. I didn't know that windows_locale was used directly. Maybe it should go through a regular PEP 387 deprecation first if we want to remove it.

@serhiy-storchaka: Maybe it's worth it to update windows_locale, since apparently, it's being used.

@hugovk hugovk changed the title locale.getdefaultlocale(): Incorrect Windows locale for Cambodian locale.windows_locale: Incorrect Windows locale for Cambodian Sep 10, 2024
@serhiy-storchaka
Member

There is a problem: the names of some Windows locales are incompatible with the gettext format. For example, "sr-Latn-RS" on Windows and "sr_RS@latin" on Linux. locale.setlocale() raises an error for "sr_RS@latin" on Windows, but if you set the "sr-Latn-RS" locale, locale.getlocale() will raise an error because it is unable to parse it. So what should we use? Another example: "ca-ES-valencia" on Windows and "ca_ES@valencia" on Linux.

The current table ignores modifiers, and _locale._getdefaultlocale() ignores them too, but this is wrong.
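
To make the mismatch concrete, here is one possible conversion from the Windows names quoted above to the gettext-style form. It is purely illustrative: `windows_name_to_posix` and the `SCRIPT_MODIFIERS` mapping are assumptions made for this sketch, not something CPython does, and the thread leaves open which form should be used.

```python
# Assumed mapping from script subtags to gettext-style modifiers.
SCRIPT_MODIFIERS = {"Latn": "latin", "Cyrl": "cyrillic"}


def windows_name_to_posix(name):
    """Convert e.g. 'sr-Latn-RS' to 'sr_RS@latin' (illustrative only)."""
    parts = name.split("-")
    language, region, modifier = parts[0].lower(), None, None
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            # Four-letter subtag: treat it as a script.
            modifier = SCRIPT_MODIFIERS.get(part.title(), part.lower())
        elif (len(part) == 2 and part.isalpha()) or (
            len(part) == 3 and part.isdigit()
        ):
            # Two letters or three digits: treat it as a region.
            region = part.upper()
        else:
            # Anything else is treated as a variant, e.g. 'valencia'.
            modifier = part.lower()
    result = language
    if region:
        result += "_" + region
    if modifier:
        result += "@" + modifier
    return result


if __name__ == "__main__":
    for name in ("sr-Latn-RS", "ca-ES-valencia", "km-KH"):
        print(name, "->", windows_name_to_posix(name))
```

A name carrying both a script and a variant would keep only the last one seen, which is one of several reasons this is a sketch rather than a proposal.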

serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Sep 10, 2024
Update the table of Windows language code identifiers (LCIDs) to
protocol version 16.0 (4/23/2024).
serhiy-storchaka added a commit to serhiy-storchaka/cpython that referenced this issue Sep 10, 2024
Update the table of Windows language code identifiers (LCIDs) to
protocol version 16.0 (4/23/2024).

6 participants