`Char::to_{lower,upper}case` should return `Option<&'static str>` instead of `char`
#20333
Labels
A-collections
Area: `std::collections`
These two methods should not be stabilized as-is. They should be changed to return a variable number of code points (between one and three), per Unicode’s `SpecialCasing.txt`.

Such results could be represented as `&'static str` or `&'static [char]` slices of a static table in libunicode. The former avoids re-encoding to UTF-8 when accumulating results in a `String`. To avoid having an entry in that table for every one of the 1,114,112 code points, the return type could be an `Option`, where `None` means that the code point is unchanged by the mapping. (This is by far the common case.) Or it could be a new special-purpose type like `enum CaseMappingResult { Unchanged, MappedTo(&'static str) }`.

Since the `Char` methods become less convenient to use, there should be `str::to_{lower,upper}case() -> String` wrappers.

`SpecialCasing.txt` also defines some language-sensitive mappings for Turkish and Lithuanian, but I suggest not including them, for a few reasons:

- Using the system’s locale is a very bad idea. Programs behaving differently on different systems is a source of countless bugs, and the system’s locale may not even be that of the end users (e.g. for server-side software).
- Forcing users to specify a language is counter-productive since it might often end up being hard-coded to English or something. There should be a default.
- Users who do care about language-specific tailoring may want to do more anyway. `SpecialCasing.txt` says:

Finally, there are conditional mappings that depend on the context of surrounding code points, but not on the language. They could be special cases in the `str` methods, but I don’t know if it’s worth the bother since there is currently only one such special case. (Greek capital sigma at the end of a word.)

More background on Unicode case mappings:
http://unicode.org/faq/casemap_charprop.html
http://www.unicode.org/reports/tr44/tr44-14.html#Casemapping
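
To make the proposal above more concrete, here is a minimal sketch of the `CaseMappingResult` variant together with a `str`-level wrapper. It assumes a tiny hand-written table standing in for the real libunicode data; the names `TO_UPPER_TABLE`, `to_uppercase`, and `str_to_uppercase` are hypothetical and only for illustration, not existing APIs.

```rust
/// Result type from the proposal: either the code point is unchanged,
/// or it maps to a static string of one to three code points.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CaseMappingResult {
    Unchanged,
    MappedTo(&'static str),
}

/// Hypothetical stand-in for the static table in libunicode.
/// Only a handful of entries, sorted by code point for binary search.
/// Code points missing from this table are treated as unchanged.
static TO_UPPER_TABLE: &[(char, &str)] = &[
    ('a', "A"),
    ('b', "B"),
    ('\u{DF}', "SS"),                       // LATIN SMALL LETTER SHARP S
    ('\u{149}', "\u{2BC}N"),                // LATIN SMALL LETTER N PRECEDED BY APOSTROPHE
    ('\u{390}', "\u{399}\u{308}\u{301}"),   // maps to three code points
    ('\u{FB01}', "FI"),                     // LATIN SMALL LIGATURE FI
];

/// Per-code-point mapping: looks the code point up in the table and
/// reports `Unchanged` when there is no entry.
pub fn to_uppercase(c: char) -> CaseMappingResult {
    match TO_UPPER_TABLE.binary_search_by_key(&c, |&(from, _)| from) {
        Ok(i) => CaseMappingResult::MappedTo(TO_UPPER_TABLE[i].1),
        Err(_) => CaseMappingResult::Unchanged,
    }
}

/// Convenience wrapper over a whole string. Mapped results are already
/// UTF-8, so they are appended without re-encoding.
pub fn str_to_uppercase(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match to_uppercase(c) {
            CaseMappingResult::Unchanged => out.push(c),
            CaseMappingResult::MappedTo(mapped) => out.push_str(mapped),
        }
    }
    out
}
```

With this toy table, `str_to_uppercase("ab\u{DF}!")` yields `"ABSS!"`: the result is longer than the input, which is exactly what a `char -> char` signature cannot express. The `Option<&'static str>` variant would look the same with `None` in place of `Unchanged`.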
CC @huonw, @aturon