Skip to content

Conversation

4-rodrigo-salazar
Copy link
Contributor

@4-rodrigo-salazar 4-rodrigo-salazar commented Mar 5, 2024

libcxx tests which assume the thousands separator for fr_FR locale is x00A0 on Windows currently fail when run on newer versions of Windows (It seems to have been updated to the new correct value of 0x202F around windows 11. The exact windows version where it changed doesn't seem to be documented anywhere).

This PR updates the tests to not hard-code the separator but instead find the host's fr_FR separator to set the right expectation in the tests that assume this. It uses 2 new functions in test/support/locale_helpers.h to do this: get_locale_thousands_sep and get_locale_mon_thousands_sep.

This fixes 5 tests under windows.


Testing notes:

Tested with toolsets:

  • windows llvm-mingw-20231128-ucrt-x86_64 (note, msvcrt has dozens of pre-existing locale failures).
  • msvc 2022

For both toolsets above I see the 5 tests passing now, whereas they were failing previously (and no new failures in overall check-cxx).

  • Also ran check-cxx tests with C++03 on the above windows toolsets.

Individual test commands:

python ./bin/llvm-lit.py --filter std/localization/locale.categories/category.monetary/locale.moneypunct.byname/thousands_sep.pass.cpp libcxx/test -v -a

python ./bin/llvm-lit.py --filter std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp libcxx/test -v -a

python ./bin/llvm-lit.py --filter std/time/time.duration/time.duration.nonmember/ostream.pass.cpp libcxx/test -v -a

python ./bin/llvm-lit.py --filter std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp libcxx/test -v -a

python ./bin/llvm-lit.py --filter std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp libcxx/test -v -a

@4-rodrigo-salazar 4-rodrigo-salazar requested a review from a team as a code owner March 5, 2024 07:48
@4-rodrigo-salazar 4-rodrigo-salazar changed the title Handle changing of fr_FR locale thousand sep in Windows libcxx tests [libcxx] Handle changing of fr_FR locale thousand sep in Windows libcxx tests Mar 5, 2024
Copy link

github-actions bot commented Mar 5, 2024

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be
notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write
permissions for the repository. In which case you can instead tag reviewers by
name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review
by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate
is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Mar 5, 2024
@llvmbot
Copy link
Member

llvmbot commented Mar 5, 2024

@llvm/pr-subscribers-libcxx

Author: Rodrigo Salazar (4-rodrigo-salazar)

Changes

libcxx tests which assume the thousands separator for fr_FR locale is x00A0 on windows currently fail when run on newer versions of windows (It seems to have been updated to the new correct value from 0x00A0 to 0x202F around windows 11. The exact windows version where it changed doesn't seem to be documented anywhere.).

This PR updates the tests to not hard-code the separator but instead find the host's fr_FR separator to set the right expectation in the tests that assume this. It uses 2 new functions in test/support/locale_helpers.h to do this: get_locale_thousands_sep and get_locale_mon_thousands_sep.

This fixes 5 tests under windows.


Testing notes:

Tested with toolsets:

  • windows llvm-mingw-20231128-ucrt-x86_64 (note, msvcrt has dozens of pre-existing locale failures).
  • msvc 2022

For both toolsets above I see the 5 tests passing now, whereas they were failing previously (and no new failures in overall check-cxx).

  • Also ran check-cxx tests with C++03 on the above windows toolsets.

Individual test commands:

python ./bin/llvm-lit.py --filter std/localization/locale.categories/category.monetary/locale.moneypunct.byname/thousands_sep.pass.cpp libcxx/test -v -a

python ./bin/llvm-lit.py --filter std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp libcxx/test -v -a

python ./bin/llvm-lit.py --filter std/time/time.duration/time.duration.nonmember/ostream.pass.cpp libcxx/test -v -a

python ./bin/llvm-lit.py --filter std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp libcxx/test -v -a

python ./bin/llvm-lit.py --filter std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp libcxx/test -v -a

Full diff: https://github.com/llvm/llvm-project/pull/83967.diff

6 Files Affected:

  • (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp (+1-1)
  • (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp (+1-1)
  • (modified) libcxx/test/std/localization/locale.categories/category.monetary/locale.moneypunct.byname/thousands_sep.pass.cpp (+9-1)
  • (modified) libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp (+9-1)
  • (modified) libcxx/test/std/time/time.duration/time.duration.nonmember/ostream.pass.cpp (+10-4)
  • (modified) libcxx/test/support/locale_helpers.h (+86-3)
diff --git a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
index 3effa80e7d6f79..1b594d8c5b488a 100644
--- a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
+++ b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.get/locale.money.get.members/get_long_double_fr_FR.pass.cpp
@@ -54,7 +54,7 @@ class my_facetw
 };
 
 static std::wstring convert_thousands_sep(std::wstring const& in) {
-  return LocaleHelpers::convert_thousands_sep_fr_FR(in);
+  return LocaleHelpers::convert_mon_thousands_sep_fr_FR(in);
 }
 #endif // TEST_HAS_NO_WIDE_CHARACTERS
 
diff --git a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp
index 05b4ee474944af..02112e06c4a122 100644
--- a/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp
+++ b/libcxx/test/std/localization/locale.categories/category.monetary/locale.money.put/locale.money.put.members/put_long_double_fr_FR.pass.cpp
@@ -54,7 +54,7 @@ class my_facetw
 };
 
 static std::wstring convert_thousands_sep(std::wstring const& in) {
-  return LocaleHelpers::convert_thousands_sep_fr_FR(in);
+  return LocaleHelpers::convert_mon_thousands_sep_fr_FR(in);
 }
 #endif // TEST_HAS_NO_WIDE_CHARACTERS
 
diff --git a/libcxx/test/std/localization/locale.categories/category.monetary/locale.moneypunct.byname/thousands_sep.pass.cpp b/libcxx/test/std/localization/locale.categories/category.monetary/locale.moneypunct.byname/thousands_sep.pass.cpp
index 2a70741d2a0fa6..856598a5c45b66 100644
--- a/libcxx/test/std/localization/locale.categories/category.monetary/locale.moneypunct.byname/thousands_sep.pass.cpp
+++ b/libcxx/test/std/localization/locale.categories/category.monetary/locale.moneypunct.byname/thousands_sep.pass.cpp
@@ -29,6 +29,10 @@
 #include "test_macros.h"
 #include "platform_support.h" // locale name macros
 
+#if defined(_WIN32) && !defined(TEST_HAS_NO_WIDE_CHARACTERS)
+#include "locale_helpers.h"
+#endif
+
 class Fnf
     : public std::moneypunct_byname<char, false>
 {
@@ -115,7 +119,11 @@ int main(int, char**)
 #if defined(_CS_GNU_LIBC_VERSION)
     const wchar_t fr_sep = glibc_version_less_than("2.27") ? L' ' : L'\u202F';
 #elif defined(_WIN32)
-    const wchar_t fr_sep = L'\u00A0';
+    // Windows has changed it's fr thousands sep between releases.
+    // Fetch the host's separator in order to know what to expect from the test results.
+    const std::wstring fr_sep_s = LocaleHelpers::get_locale_mon_thousands_sep(LOCALE_fr_FR_UTF_8);
+    assert(fr_sep_s.size() == 1);
+    const wchar_t fr_sep = fr_sep_s[0];
 #elif defined(_AIX)
     const wchar_t fr_sep = L'\u202F';
 #else
diff --git a/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp b/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp
index d7e1178c92e041..0c0b04ab710067 100644
--- a/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp
+++ b/libcxx/test/std/localization/locale.categories/facet.numpunct/locale.numpunct.byname/thousands_sep.pass.cpp
@@ -28,6 +28,10 @@
 #include "test_macros.h"
 #include "platform_support.h" // locale name macros
 
+#if defined(_WIN32) && !defined(TEST_HAS_NO_WIDE_CHARACTERS)
+#include "locale_helpers.h"
+#endif
+
 int main(int, char**)
 {
     {
@@ -78,7 +82,11 @@ int main(int, char**)
 #if defined(_CS_GNU_LIBC_VERSION)
             const wchar_t wsep = glibc_version_less_than("2.27") ? L' ' : L'\u202f';
 #elif defined(_WIN32)
-            const wchar_t wsep = L'\u00A0';
+            // Windows has changed it's fr thousands sep between releases.
+            // Fetch the host's separator in order to know what to expect from the test results.
+            const std::wstring wsep_s = LocaleHelpers::get_locale_mon_thousands_sep(LOCALE_fr_FR_UTF_8);
+            assert(wsep_s.size() == 1);
+            const wchar_t wsep = wsep_s[0];
 #else
             const wchar_t wsep = L',';
 #endif
diff --git a/libcxx/test/std/time/time.duration/time.duration.nonmember/ostream.pass.cpp b/libcxx/test/std/time/time.duration/time.duration.nonmember/ostream.pass.cpp
index e5d11ab4672bdf..db318e3db4e84f 100644
--- a/libcxx/test/std/time/time.duration/time.duration.nonmember/ostream.pass.cpp
+++ b/libcxx/test/std/time/time.duration/time.duration.nonmember/ostream.pass.cpp
@@ -36,6 +36,10 @@
 #include "platform_support.h" // locale name macros
 #include "test_macros.h"
 
+#if defined(_WIN32) && !defined(TEST_HAS_NO_WIDE_CHARACTERS)
+#include "locale_helpers.h"
+#endif
+
 #define SV(S) MAKE_STRING_VIEW(CharT, S)
 
 template <class CharT, class Rep, class Period>
@@ -89,10 +93,12 @@ static void test_values() {
 #endif
   } else {
 #ifdef _WIN32
-    assert(stream_fr_FR_locale<CharT>(-1'000'000s) == SV("-1\u00A0000\u00A0000s"));
-    assert(stream_fr_FR_locale<CharT>(1'000'000s) == SV("1\u00A0000\u00A0000s"));
-    assert(stream_fr_FR_locale<CharT>(-1'000.123456s) == SV("-1\u00A0000,1235s"));
-    assert(stream_fr_FR_locale<CharT>(1'000.123456s) == SV("1\u00A0000,1235s"));
+    std::wstring expected_sep = LocaleHelpers::get_locale_thousands_sep(LOCALE_fr_FR_UTF_8);
+    assert(expected_sep.size() == 1);
+    assert(stream_fr_FR_locale<CharT>(-1'000'000s) == LocaleHelpers::convert_thousands_sep(L"-1 000 000s", expected_sep[0]));
+    assert(stream_fr_FR_locale<CharT>(1'000'000s) == LocaleHelpers::convert_thousands_sep(L"1 000 000s", expected_sep[0]));
+    assert(stream_fr_FR_locale<CharT>(-1'000.123456s) == LocaleHelpers::convert_thousands_sep(L"-1 000,1235s", expected_sep[0]));
+    assert(stream_fr_FR_locale<CharT>(1'000.123456s) == LocaleHelpers::convert_thousands_sep(L"1 000,1235s", expected_sep[0]));
 #elif defined(__APPLE__)
     assert(stream_fr_FR_locale<CharT>(-1'000'000s) == SV("-1000000s"));
     assert(stream_fr_FR_locale<CharT>(1'000'000s) == SV("1000000s"));
diff --git a/libcxx/test/support/locale_helpers.h b/libcxx/test/support/locale_helpers.h
index 3eb24ebf28f524..9b18acc13cbbd8 100644
--- a/libcxx/test/support/locale_helpers.h
+++ b/libcxx/test/support/locale_helpers.h
@@ -41,11 +41,90 @@ std::wstring convert_thousands_sep(std::wstring const& in, wchar_t sep) {
   return out;
 }
 
+#if defined(_WIN32)
+// This implementation is similar to the locale_guard in the private libcxx implementation headers
+// but exists here for usability from the libcxx/test/std conformance test suites.
+class LocaleGuard {
+public:
+  explicit LocaleGuard(const char* locale_in) : status_(_configthreadlocale(_ENABLE_PER_THREAD_LOCALE)) {
+    assert(status_ != -1);
+    // Setting the locale can be expensive even when the locale given is
+    // already the current locale, so do an explicit check to see if the
+    // current locale is already the one we want.
+    const char* curr_locale = set_locale_asserts(nullptr);
+    // If every category is the same, the locale string will simply be the
+    // locale name, otherwise it will be a semicolon-separated string listing
+    // each category.  In the second case, we know at least one category won't
+    // be what we want, so we only have to check the first case.
+    if (std::strcmp(locale_in, curr_locale) != 0) {
+      locale_all_ = _strdup(curr_locale);
+      assert(locale_all_ != nullptr);
+      set_locale_asserts(locale_in);
+    }
+  }
+
+  ~LocaleGuard() {
+    // The CRT documentation doesn't explicitly say, but setlocale() does the
+    // right thing when given a semicolon-separated list of locale settings
+    // for the different categories in the same format as returned by
+    // setlocale(LC_ALL, nullptr).
+    if (locale_all_ != nullptr) {
+      set_locale_asserts(locale_all_);
+      free(locale_all_);
+    }
+    _configthreadlocale(status_);
+  }
+
+private:
+  static const char* set_locale_asserts(const char* locale_in) {
+    const char* new_locale = setlocale(LC_ALL, locale_in);
+    assert(new_locale != nullptr);
+    return new_locale;
+  }
+
+  int status_;
+  char* locale_all_ = nullptr;
+};
+
+template <typename T>
+std::wstring get_locale_lconv_cstr_member(const char* locale, T lconv::*cstr_member) {
+  // Store and later restore current locale
+  LocaleGuard g(locale);
+
+  char* locale_set = setlocale(LC_ALL, locale);
+  assert(locale_set != nullptr);
+  lconv* lc            = localeconv();
+  const char* selected = lc->*cstr_member;
+  if (selected == nullptr) {
+    // member is empty string on the locale
+    return std::wstring();
+  }
+
+  std::size_t len = std::mbsrtowcs(nullptr, &selected, 0, nullptr);
+  assert(len != static_cast<std::size_t>(-1));
+
+  std::wstring ws_out(len, L'\0');
+  std::mbstate_t mb = {};
+  std::size_t ret   = std::mbsrtowcs(&ws_out[0], &selected, len, &mb);
+  assert(ret != static_cast<std::size_t>(-1));
+
+  return ws_out;
+}
+
+std::wstring get_locale_mon_thousands_sep(const char* locale) {
+  return get_locale_lconv_cstr_member(locale, &lconv::mon_thousands_sep);
+}
+
+std::wstring get_locale_thousands_sep(const char* locale) {
+  return get_locale_lconv_cstr_member(locale, &lconv::thousands_sep);
+}
+#endif // _WIN32
+
 // GLIBC 2.27 and newer use U+202F NARROW NO-BREAK SPACE as a thousands separator.
 // This function converts the spaces in string inputs to U+202F if need
 // be. FreeBSD's locale data also uses U+202F, since 2018.
-// Windows uses U+00A0 NO-BREAK SPACE.
-std::wstring convert_thousands_sep_fr_FR(std::wstring const& in) {
+// Windows may use U+00A0 NO-BREAK SPACE or U+0202F NARROW NO-BREAK SPACE.
+std::wstring convert_mon_thousands_sep_fr_FR(std::wstring const& in) {
 #if defined(_CS_GNU_LIBC_VERSION)
   if (glibc_version_less_than("2.27"))
     return in;
@@ -54,7 +133,11 @@ std::wstring convert_thousands_sep_fr_FR(std::wstring const& in) {
 #elif defined(__FreeBSD__)
   return convert_thousands_sep(in, L'\u202F');
 #elif defined(_WIN32)
-  return convert_thousands_sep(in, L'\u00A0');
+  // Windows has changed it's fr thousands sep between releases,
+  // so we find the host's separator instead of hard-coding it.
+  std::wstring fr_sep_s = get_locale_mon_thousands_sep(LOCALE_fr_FR_UTF_8);
+  assert(fr_sep_s.size() == 1);
+  return convert_thousands_sep(in, fr_sep_s[0]);
 #else
   return in;
 #endif

Copy link

github-actions bot commented Mar 5, 2024

✅ With the latest revision this PR passed the C/C++ code formatter.

@4-rodrigo-salazar
Copy link
Contributor Author

Another less complex approach I considered is to simply hardcode both the known separators and have the test assertions check whether either one of them works (that seems not great at all to me).

@4-rodrigo-salazar
Copy link
Contributor Author

4-rodrigo-salazar commented Mar 9, 2024

Perhaps a lit ConfigAction that provides this information about the environment to the tests that need it would be a better approach since it is more decoupled from what actually executes in the test executable itself (perhaps by plumbing the value to a define compilation flag).

I'll revisit this. Converting to draft for now.

@4-rodrigo-salazar 4-rodrigo-salazar marked this pull request as draft March 9, 2024 06:53
@mstorsjo
Copy link
Member

Thanks for looking into this! I've also run into this, and have a halfbaked incomplete fix that I've used locally - see mstorsjo@libcxx-test-win-locale-fr.

This uses std::moneypunct_byname for getting the current thousands separator, which is pretty short and concise. But I guess this somewhat weakens some of the tests, as some tests essentially just end up asserting that f.thousands_sep() == f.thousands_sep() - while your solution to use an independent codepath to fetch the current separator probably retains more test quality.

The idea of checking that the separator is either of the known ones probably would be nice - that possibly helps for some of the tests, but probably not for all of them.

@4-rodrigo-salazar
Copy link
Contributor Author

Thanks!

Here is my work in progress branch which gets the separators provided to tests using a lit substitution 4-rodrigo-salazar@0a1509c

@mstorsjo
Copy link
Member

Another alternative would be to check the version of windows (my guess is also that this changed from windows 10 to 11), but iirc checking the actual version of windows is a bit messy.

@4-rodrigo-salazar
Copy link
Contributor Author

4-rodrigo-salazar commented Mar 12, 2024

I'm thinking we can get rid of most of the cross platform ifdefs using the above approach (by providing the value from lconv through lit config actions), so each of the apple specific, gnu version specific, windows specific checks could be removed at once. Reaching for the values from the localeconv() c api shouldn't take away from the test quality either, as it shouldn't be hitting any of the locale c++ code that is being tested.

EDIT: Posted PR here #86649

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants