Using DerivedCombiningClass.txt to determine width is inappropriate #10

DerivedCombiningClass.txt contains the Canonical_Combining_Class field from UnicodeData.txt (see http://www.unicode.org/reports/tr44/#Canonical_Combining_Class_Values). As far as I can tell, this field is intended for the Canonical Ordering Algorithm used in normalization, not for determining display width.

wcwidth.py currently assumes that a character is a zero-width combining character if and only if it has a non-zero combining class. I think this assumption is invalid. For example, enclosing marks (General Category = Me) all have a combining class of zero, yet they are also zero-width combining characters.
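To make the mismatch concrete, here is a small check using Python's standard `unicodedata` module. The helper name `describe` and the two sample characters are my own choices for illustration, and the values depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

def describe(ch):
    """Return (General_Category, Canonical_Combining_Class) for one character."""
    return unicodedata.category(ch), unicodedata.combining(ch)

# Hypothetical sample set: one nonspacing mark and one enclosing mark.
samples = {
    "COMBINING ACUTE ACCENT": "\u0301",
    "COMBINING ENCLOSING CIRCLE": "\u20dd",
}

for name, ch in samples.items():
    category, ccc = describe(ch)
    print(f"U+{ord(ch):04X} {name}: category={category}, ccc={ccc}")
```

The enclosing circle reports a combining class of zero, so a test based only on a non-zero combining class would not treat it as zero width.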

I'm not sure what the standard way to determine zero-width combining characters is. One possibility is to check for a General Category of Mn (nonspacing mark) or Me (enclosing mark), but I don't know whether there are any exceptions to this. Also note that some combining characters do have a width (category Mc, spacing marks).
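A minimal sketch of the General Category approach, assuming Mn and Me are the right categories and ignoring any exceptions there might be. The names `ZERO_WIDTH_CATEGORIES` and `is_zero_width_mark` are hypothetical and are not part of wcwidth.py.

```python
import unicodedata

# Assumption: nonspacing (Mn) and enclosing (Me) marks occupy no columns.
# Spacing marks (Mc) are deliberately left out because they do have a width.
ZERO_WIDTH_CATEGORIES = {"Mn", "Me"}

def is_zero_width_mark(ch):
    """Guess whether ch is a zero-width combining character by category alone."""
    return unicodedata.category(ch) in ZERO_WIDTH_CATEGORIES
```

With this check, U+20DD (Me) counts as zero width even though its combining class is zero, while U+0903 DEVANAGARI SIGN VISARGA (Mc) does not.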

