Unicode to_lower() and to_upper #9363
Comments
I'm interested in working on this. At the moment, it seems like the best option might be changing /src/etc/unicode.py to also read in CaseFolding.txt and add a corresponding table to /libstd/Unicode.rs. Then Does anyone know of a better way to do this, a reason this shouldn't be done, or a better place to put this? Edit: It looks like this data can also be parsed out of UnicodeData.txt, so CaseFolding.txt is unnecessary.
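To illustrate the UnicodeData.txt idea from the edit above, here is a minimal sketch in Rust of pulling the simple one-to-one case mappings out of a single line of that file. It relies on the published file layout (semicolon-separated fields, with the simple uppercase mapping in field 12 and the simple lowercase mapping in field 13). The names SimpleCaseMapping, parse_hex_char and parse_unicode_data_line are made up for this example and are not taken from unicode.py or the standard library.

```rust
/// Simple (one-to-one) case mappings for one code point.
/// Hypothetical type, used only for this illustration.
#[derive(Debug, PartialEq)]
struct SimpleCaseMapping {
    code_point: char,
    upper: Option<char>,
    lower: Option<char>,
}

/// Parse a hexadecimal code point field; empty or invalid fields give None.
fn parse_hex_char(field: &str) -> Option<char> {
    u32::from_str_radix(field, 16).ok().and_then(char::from_u32)
}

/// Parse one line of UnicodeData.txt.
/// Field 0 is the code point, field 12 the simple uppercase mapping,
/// field 13 the simple lowercase mapping.
fn parse_unicode_data_line(line: &str) -> Option<SimpleCaseMapping> {
    let fields: Vec<&str> = line.trim_end().split(';').collect();
    if fields.len() < 15 {
        return None; // not a well-formed UnicodeData.txt record
    }
    // Surrogate code points are not valid `char`s and are skipped here.
    let code_point = parse_hex_char(fields[0])?;
    Some(SimpleCaseMapping {
        code_point,
        upper: parse_hex_char(fields[12]),
        lower: parse_hex_char(fields[13]),
    })
}
```

A generator script would run something like this over every line and emit a sorted table. Note that this covers only the single-character mappings; one-to-many cases are listed in separate data files.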
@aaronlaursen The problem is that case folding depends on the locale. For example, in the Turkish locale I personally think we should support at least a best-effort default mapping, though that might not be the general consensus.
@Kimundi Good point. I hadn't thought of the locale issues... CaseFolding.txt does contain these locale-dependent mappings (at least for the Turkish locale), but using them would require a more complicated system for locale detection, etc. I still share your desire for a default mapping. UnicodeData.txt has only a single mapping built in and could serve as a sane default... I'm not sure how to go about finding the general consensus on these things...
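For readers unfamiliar with the locale problem, a well-known instance is the Turkish dotted and dotless i: in Turkish, capital I lowercases to dotless ı, while the locale-independent mapping gives i. The sketch below layers a Turkish override on top of a default mapping. The Locale enum and to_lower_locale function are hypothetical names invented for this example. The default branch uses char::to_lowercase from today's standard library, which did not exist in this form when the thread was written.

```rust
/// Hypothetical locale selector, for illustration only.
#[derive(Clone, Copy)]
enum Locale {
    Default,
    Turkish,
}

/// Lowercase one character, applying Turkish-specific rules when requested
/// and falling back to the locale-independent mapping otherwise.
fn to_lower_locale(c: char, locale: Locale) -> String {
    match (locale, c) {
        // U+0049 LATIN CAPITAL LETTER I -> U+0131 LATIN SMALL LETTER DOTLESS I
        (Locale::Turkish, 'I') => "\u{131}".to_string(),
        // U+0130 LATIN CAPITAL LETTER I WITH DOT ABOVE -> U+0069 'i'
        (Locale::Turkish, '\u{130}') => "i".to_string(),
        // Best-effort default: the standard library's locale-independent mapping.
        _ => c.to_lowercase().collect(),
    }
}
```

The point of the sketch is only that a default table can coexist with per-locale overrides, so shipping the default first does not rule out locale support later.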
Related: #9084, #12561. fn to_case_fold(c: char) -> &'static [char]; in the spirit of http://docs.python.org/dev/library/stdtypes.html#str.casefold
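A rough sketch of how the signature above could be backed by a static table follows. The four entries are real full case foldings from CaseFolding.txt, but the table is deliberately tiny; a real one would be generated from the data file. The names CASE_FOLD_TABLE and case_fold_str, and the convention of returning an empty slice for characters that fold to themselves, are assumptions made for this example, not part of the proposal.

```rust
/// Tiny hand-written excerpt of a case-folding table, sorted by code point.
/// Illustration only: a real table would be generated from CaseFolding.txt.
static CASE_FOLD_TABLE: &[(char, &[char])] = &[
    ('A', &['a']),            // U+0041
    ('\u{DF}', &['s', 's']),  // U+00DF sharp s folds to "ss"
    ('\u{17F}', &['s']),      // U+017F long s
    ('\u{212A}', &['k']),     // U+212A Kelvin sign
];

/// Look up the folding of `c`.
/// Assumption for this sketch: an empty slice means "folds to itself".
fn to_case_fold(c: char) -> &'static [char] {
    match CASE_FOLD_TABLE.binary_search_by_key(&c, |&(key, _)| key) {
        Ok(i) => CASE_FOLD_TABLE[i].1,
        Err(_) => &[],
    }
}

/// Fold a whole string using the table above.
fn case_fold_str(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        let folded = to_case_fold(c);
        if folded.is_empty() {
            out.push(c); // unmapped characters are kept unchanged
        } else {
            out.extend(folded.iter());
        }
    }
    out
}
```

Returning a slice lets one character expand to several without allocating, which is why the sharp s entry can yield two characters.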
This should live in a |
Shouldn't we move all unicode tables to libunicode then? |
Ideally, yes. On Wed, Feb 26, 2014 at 2:25 PM, Piotr Zolnierek
(It's probably more useful/efficient for case folding to yield |
This is now implemented as |
As mentioned by Josh Aas on IRC, Rust doesn't seem to have a function to convert Unicode strings to their upper- and lower-case equivalents, in a similar way to to_lower() and to_upper() for ASCII.