Segmentation of combined emojis #42

RazrFalcon · 2018-05-20T16:33:03Z

for c in UnicodeSegmentation::graphemes("🏳️‍🌈", true) {
    println!("{}", c);
}

Outputs:

🏳️‍
🌈

But should output:

🏳️‍🌈

Another example: 👮‍♀.

Is this a UnicodeSegmentation bug, or am I doing something wrong? For my current task, this should be a single "character".

Manishearth · 2018-05-21T04:18:00Z

We're operating off an old Unicode version (9), where that's not in the tables.

https://www.unicode.org/Public/9.0.0/ucd/auxiliary/GraphemeBreakProperty.txt

Filed #43

That may take a while to fix, but it may be worth updating to Unicode 10 in the interim (an easier update than 10 to 11), which will also fix your issue.
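
For context, here is a minimal sketch that uses only the Rust standard library (not this crate) to print the code points behind the flag from the report. It shows that the emoji is several code points joined by ZERO WIDTH JOINER (U+200D), so whether it stays in one grapheme cluster depends on the break tables in use. The helper names `code_points` and `contains_zwj` are made up for illustration.

```rust
/// Format each Unicode scalar value in `s` as "U+XXXX".
/// (Hypothetical helper, not part of unicode-segmentation.)
fn code_points(s: &str) -> Vec<String> {
    s.chars().map(|c| format!("U+{:04X}", c as u32)).collect()
}

/// Return true if `s` contains a ZERO WIDTH JOINER (U+200D).
/// (Hypothetical helper, not part of unicode-segmentation.)
fn contains_zwj(s: &str) -> bool {
    s.contains('\u{200D}')
}

fn main() {
    // The rainbow flag from the report, written with escapes so the
    // individual code points are visible in the source.
    let rainbow_flag = "\u{1F3F3}\u{FE0F}\u{200D}\u{1F308}";
    println!("{}", code_points(rainbow_flag).join(" "));
    println!("contains ZWJ: {}", contains_zwj(rainbow_flag));
}
```

The split reported above falls right after the joiner: the first piece ends with U+200D and the second piece is U+1F308 on its own.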

Manishearth closed this as completed May 21, 2018
