-
Notifications
You must be signed in to change notification settings - Fork 60
Description
DerivedCombiningClass.txt contains the Canonical_Combining_Class field from UnicodeData.txt (see http://www.unicode.org/reports/tr44/#Canonical_Combining_Class_Values). This field is intended to be used for the collation algorithm.
wcwidth.py is currently assuming that characters are zero width combining characters if and only if they have a non-zero combining class. I think this is an invalid assumption. For example, characters that are enclosing marks (General Category = Me) all have a zero combining class, but they are also zero width combining characters.
I'm not sure what the standard way to determine zero width combining characters is. One possibility is to check for a General Category of Mn or Me, but I don't know if there are any exceptions to this. Also note that there are combining characters that do have a width (category Mc).