Word-final sigma in str::to_lowercase
#26035
Comments
CC @aturon, @alexcrichton

Our current trend of conventions and such leads me to think that we should do the special behavior in
Untested first attempt:

    impl str {
        pub fn to_lowercase(&self) -> String {
            let mut s = String::with_capacity(self.len());
            for (i, c) in self.char_indices() {
                if c == 'Σ' && is_final_sigma(self, i) {
                    s.push('ς');
                } else {
                    s.extend(c.to_lowercase());
                }
            }
            return s;

            // On each side, only the first character after the run of
            // case-ignorable characters decides the condition; `any` would
            // keep scanning into neighbouring words.
            fn is_final_sigma(s: &str, i: usize) -> bool {
                debug_assert!('Σ'.len_utf8() == 2);
                s[..i].chars().rev().skip_while(is_case_ignorable).next().map_or(false, is_cased_letter) &&
                !s[i + 2..].chars().skip_while(is_case_ignorable).next().map_or(false, is_cased_letter)
            }
        }
    }

… where
In other words: since there is only one such mapping, I think it makes more sense to hard-code it than to try and generalize
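The hard-coded approach can be sketched as a standalone function. This is only an illustration under stated assumptions: `to_lowercase_final_sigma` is a hypothetical name, and `is_cased_letter` and `is_case_ignorable` are rough stand-ins for the Unicode `Cased` and `Case_Ignorable` properties, which a real implementation would take from generated tables. The sketch looks only at the first character that is not case-ignorable on each side of the Σ, so letters in neighbouring words do not affect the result.

```rust
/// Rough stand-in for Unicode's `Cased` property (hypothetical helper).
/// It ignores titlecase letters, which a table-driven version would include.
fn is_cased_letter(c: char) -> bool {
    c.is_lowercase() || c.is_uppercase()
}

/// Rough stand-in for `Case_Ignorable` (hypothetical helper): a handful of
/// punctuation marks plus the basic combining diacritics block.
fn is_case_ignorable(c: char) -> bool {
    matches!(
        c,
        '\'' | '.' | ':' | '\u{00AD}' | '\u{2019}' | '\u{0300}'..='\u{036F}'
    )
}

/// True when the Σ starting at byte offset `i` is preceded by a cased letter
/// and not followed by one, skipping case-ignorable characters on both sides.
fn is_final_sigma(s: &str, i: usize) -> bool {
    let sigma_len = 'Σ'.len_utf8();
    let preceded = s[..i]
        .chars()
        .rev()
        .find(|&c| !is_case_ignorable(c))
        .map_or(false, is_cased_letter);
    let followed = s[i + sigma_len..]
        .chars()
        .find(|&c| !is_case_ignorable(c))
        .map_or(false, is_cased_letter);
    preceded && !followed
}

/// Lowercases `s`, mapping Σ to ς in word-final position and to σ elsewhere.
fn to_lowercase_final_sigma(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for (i, c) in s.char_indices() {
        if c == 'Σ' && is_final_sigma(s, i) {
            out.push('ς');
        } else {
            out.extend(c.to_lowercase());
        }
    }
    out
}
```

With these approximations, `to_lowercase_final_sigma("ΑΣ ΒΣ")` gives `"ας βς"`, while a lone `"Σ"` stays `"σ"` because no cased letter precedes it.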
According to a Greek speaker, not doing this is "Very, very bad. Basically, bad enough that people won’t use it."
* Add “complex” mappings to `char::to_lowercase` and `char::to_uppercase`, making them sometimes yield more than one `char`: #25800. `str::to_lowercase` and `str::to_uppercase` are affected as well.
* Add `char::to_titlecase`, since it’s the same algorithm (just different data). However, this does **not** add `str::to_titlecase`, as that would require UAX#29 Unicode Text Segmentation, which we decided not to include in `std`: rust-lang/rfcs#1054

  I made `char::to_titlecase` immediately `#[stable]`, since it’s so similar to `char::to_uppercase`, which is already stable. Let me know if it should be `#[unstable]` for a while.
* Add a special case for upper-case Sigma in word-final position in `str::to_lowercase`: #26035. This is the only language-independent conditional mapping currently in `SpecialCasing.txt`.
* Stabilize `str::to_lowercase` and `str::to_uppercase`. The `&self -> String` signature on `str` seems straightforward enough, and the only relevant issue I’ve found is #24536 about naming. But `char` already has stable methods with the same name, and deprecating them for a rename doesn’t seem worth it.

r? @alexcrichton
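To illustrate what a “complex” mapping means in practice, here is a small sketch using the standard `char::to_uppercase` and `char::to_lowercase` iterators. The helper names `upper_chars` and `lower_chars` are hypothetical and exist only for this example.

```rust
/// Collects the uppercase mapping of a single `char` (hypothetical helper).
/// The result can hold more than one `char` for complex mappings.
fn upper_chars(c: char) -> Vec<char> {
    c.to_uppercase().collect()
}

/// Collects the lowercase mapping of a single `char` (hypothetical helper).
fn lower_chars(c: char) -> Vec<char> {
    c.to_lowercase().collect()
}
```

For example, `upper_chars('ß')` yields the two characters `S` and `S`, and `lower_chars('İ')` yields `i` followed by the combining dot above (U+0307), whereas `upper_chars('a')` yields the single character `A`.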
This `to_lowercase` approach converts Σ to σ instead of ς in word-final position. See rust-lang/rust#26035.
By design, `str::to_lowercase` and `str::to_uppercase` do not depend on the language of the text (which shouldn’t be assumed to be the same as the locale of the machine running the program).

Mostly, this means ignoring the conditional mappings in Unicode’s `SpecialCasing.txt`, with one exception: the Greek letter Sigma is Σ in upper-case and σ in lower-case except in word-final position, where it is ς. The corresponding mapping in `SpecialCasing.txt` is:

With `Final_Sigma` defined in the Unicode standard:

(cased letter and other terms have a precise definition given beforehand.)

Since `char::to_lowercase` doesn’t know context, I think it should just return σ for Σ. But `str::to_lowercase` does have context and could implement this conditional mapping.
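The difference between the two levels can be sketched as follows. `lower_each_char` is a hypothetical helper that lowercases a string one `char` at a time, so it never sees the surrounding text and always maps Σ to σ.

```rust
/// Lowercases `s` one `char` at a time, with no knowledge of context
/// (hypothetical helper, for illustration only).
fn lower_each_char(s: &str) -> String {
    s.chars().flat_map(char::to_lowercase).collect()
}
```

Under this context-free mapping, `lower_each_char("ΟΔΥΣΣΕΥΣ")` produces `"οδυσσευσ"`, with σ in the final position. In current releases of the standard library, `"ΟΔΥΣΣΕΥΣ".to_lowercase()` produces `"οδυσσευς"`, since the string-level method can inspect the neighbouring characters.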