-
Notifications
You must be signed in to change notification settings - Fork 308
content: implement support for unicode emojis in messages #245
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rajveermalviya! See review comments below.
We'll also want some unit tests for this code before merging it.
lib/model/content.dart
Outdated
|
||
final String text; | ||
final String emojiName; | ||
final String? emojiUnicode; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you describe the situation where this is expected to be null?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, after reading the surrounding code a bit more, it seems like it would be unreachable.
In previous revision, when tryParseEmojiCodeToUnicode
returned null, parser would set emojiUnicode
to null too. And in the widgets layer the emojiName
with the fallback container would be displayed.
But thinking about it a bit more; if tryParseEmojiCodeToUnicode
fails then we should just emit "unimplemented" from the parser rather than just adding a silent workaround. And also it will be pretty rare that tryParseEmojiCodeToUnicode
would fail as it is mostly unreachable because of the regexp check in parser and if it does fail it would be a bug in the server.
lib/model/content.dart
Outdated
@@ -568,7 +570,8 @@ class _ZulipContentParser { | |||
&& classes.length == 2 | |||
&& classes.contains('emoji') | |||
&& classes.every(_emojiClassRegexp.hasMatch)) { | |||
return UnicodeEmojiNode(text: element.text, debugHtmlNode: debugHtmlNode); | |||
final emojiUnicode = tryParseEmojiCodeToUnicode(classes.elementAt(1).replaceFirst('emoji-', '')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This elementAt(1)
implicitly assumes the emoji
class comes first, and the emoji-12345
class after that. But the conditions above it don't ensure that — they might come in the other order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, now uses .firstWhere()
lib/model/content.dart
Outdated
// Ported from https://github.com/zulip/zulip-mobile/tree/main/src/emoji/data.js | ||
// | ||
// Which was in turn ported from https://github.com/zulip/zulip/blob/main/zerver/models.py |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When linking to code on GitHub for reference, use a permalink. Links to main
can break when the file is renamed, or perhaps worse can silently become unhelpful when the relevant code moves elsewhere or otherwise changes.
GitHub has a handy keyboard shortcut y
to get a permalink when you're looking at a page like this.
(I guess we didn't do that in the code you're borrowing from in zulip-mobile, oops. Still, let's fix it here.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
lib/model/content.dart
Outdated
String? tryParseEmojiCodeToUnicode(String code) { | ||
try { | ||
return String.fromCharCodes(code.split('-').map((hex) => int.parse(hex, radix: 16))); | ||
} catch (_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, catch (_)
is a bit of a code smell :-)
Generally if there's some exception we expect to sometimes happen, and doesn't represent a bug in the code, we'll want to be explicit about what we expect and just catch that exception type specifically. That way if some unrelated exception is thrown as a result of a bug, the exception propagates just like it normally would.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack
@@ -12,6 +12,7 @@ class ZulipApp extends StatelessWidget { | |||
@override | |||
Widget build(BuildContext context) { | |||
final theme = ThemeData( | |||
fontFamilyFallback: const <String>['sans-serif', 'Noto Color Emoji'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you say more about what this line is doing and why?
The commit message says "correctly propogate emoji font fallback", but this doesn't appear to be propagating anything; instead it's introducing something new.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, added a comment explaining the reason and moved to a separate commit.
For the commit adding the Noto Color Emoji font to assets, here's a revision for the commit message:
The main difference is just to make it a bit more explicit exactly where these files come from, because the existing URL https://github.com/googlefonts/noto-emoji/releases/tag/v2.038 doesn't provide the individual font file as a download. It does lead to the whole source tree; but there's a lot of files in there (and one wouldn't necessarily expect the font file to be checked into the source tree anyway, as presumably it's the result of building the tree), so it's not obvious from that where to find these files. (And, mentioning this mostly for my own notes: I did just go and download those files myself, and verified that I get the same contents as what's in this PR.) |
3069f62
to
721c58a
Compare
Thanks for the review @gnprice, I have rebased and updated with the suggested changes. |
721c58a
to
b997135
Compare
70be18d
to
cca88c2
Compare
(Just rebased to main, and squashed some commits) |
cca88c2
to
8ef079b
Compare
(rebased again, to fix conflicts with main branch) |
312bcdb
to
19bf5b2
Compare
(rebased to main; with updated commit messages) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rajveermalviya for the revision! Comments below.
static final _emojiClassRegexp = RegExp(r"^emoji(-[0-9a-f]+)?$"); | ||
static final _emojiClassRegexp = RegExp(r"^emoji(-[0-9a-f]+)*$"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm interesting, good catch.
This is basically fixing a bug, right? We would have not produced a UnicodeEmojiNode when the emoji had multiple codepoints, and instead produced an UnimplementedNode.
Let's include a test that exercises that case, in the same commit that fixes the bug.
lib/model/content.dart
Outdated
final className = classes | ||
.firstWhere((className) => className.startsWith('emoji-'), orElse: () => '') | ||
.replaceFirst('emoji-', ''); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The replaceFirst
means this isn't really a "class name" anymore. A good name might be emojiCode
— I believe this string is now the same as the Zulip "emoji code" for the emoji.
lib/model/content.dart
Outdated
final className = classes | ||
.firstWhere((className) => className.startsWith('emoji-'), orElse: () => '') | ||
.replaceFirst('emoji-', ''); | ||
if (className.isEmpty) return unimplemented(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the orElse
/ className.isEmpty
case here actually happen?
I think it can't, even with malformed data from the server — that is, I think our own code, and in fact the few lines of code around this spot, ensure that it can't happen. In the condition above, we check that
- there are two classes
- one of them is just
emoji
- the other matches the regexp
And I think any string that matches the regexp and isn't emoji
has to start with emoji-
.
So at this spot we can just assume that we do find a class starting with emoji-
, and barrel ahead using the !
operator or equivalent. If the resulting logic seems unclear, we can cheerfully add an assert
to explicitly verify (but at zero cost when in release mode) that we did find such a class.
lib/model/content.dart
Outdated
@@ -582,7 +584,22 @@ class _ZulipContentParser { | |||
&& classes.length == 2 | |||
&& classes.contains('emoji') | |||
&& classes.every(_emojiClassRegexp.hasMatch)) { | |||
return UnicodeEmojiNode(text: element.text, debugHtmlNode: debugHtmlNode); | |||
// TODO: Check if font contains the glyph for the specific unicode codepoints; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, interesting thought.
I'm not sure we'll actually want to do this, though. The question is: if the answer is that the font doesn't seem to have the emoji, then what do we do instead?
One possible answer would be an UnimplementedNode. But in a UX for general release (i.e. post #194), I think the best option for handling that would likely be to just show the Unicode anyway. And I think this case wouldn't be real informative for #190. It'd tell us that the server has updated to a newer list of emoji than we have in our font, and we should update our emoji font. But updating emoji lists is going to be an annual housekeeping task for both Zulip server/web and us for the foreseeable future, and I think we can find a lower-effort way than this to get reminded of it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the answer is that the font doesn't seem to have the emoji, then what do we do instead?
We could fallback to the previous behavior (before this PR) - showing the emoji name: :wave:
But yeah the emoji list will have to be maintained accordingly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yeah we can't do that because if font doesn't have that emoji that will mean the "emoji list" won't have that new emoji the user typed too. Also if all the projects will update the emoji sets at same time - that will mean server doesn't know about the new emoji too and won't send us an emoji_name.
So, I guess we can't do this - removing the todo.
lib/model/content.dart
Outdated
// eg. unicode v15 set of emojis. Note that in both of these options parsing | ||
// each character of the whole message text will be required to make the | ||
// checks reliable, because emojis can be present in non encoded form in the | ||
// message text. (For example, when server hasn't been updated to account for | ||
// newer unicode versions.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And yeah, if there are emoji in the message as just straight-up Unicode emoji, encoded in the normal Unicode way, then I am definitely happy to pass them right through to the Text
widget and not worry about them.
If they're not in a font we bundled, they may still — or may not — be in a system font on the user's device. In that way they have the same status as if the message sender had used some script we didn't plan for, out of the 100+ scripts in Unicode.
lib/widgets/app.dart
Outdated
@@ -14,6 +14,14 @@ class ZulipApp extends StatelessWidget { | |||
@override | |||
Widget build(BuildContext context) { | |||
final theme = ThemeData( | |||
// This correctly sets up a the font fallback for normal text that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// This correctly sets up a the font fallback for normal text that | |
// This sets up a font fallback for normal text that |
test/model/content_test.dart
Outdated
// "🪿" | ||
'<p>🪿</p>', | ||
const TextNode('\u{1fabf}')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: clearest to read if these either all use the \u
notation, or all have the literal emoji in the source
(either way it's the same string, just two different ways in the source code of expressing it)
test/model/content_test.dart
Outdated
// ":thumbs_up:" | ||
'<p><span aria-label="thumbs up" class="emoji emoji-1f44d" role="img" title="thumbs up">:thumbs_up:</span></p>', | ||
const UnicodeEmojiNode(text: ':thumbs_up:')); | ||
const UnicodeEmojiNode(emojiName: ':thumbs_up:', codepoints: '\u{1f44d}')); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I guess this example shows that the thing labeled emojiName
isn't exactly the name — it's the name (which here is thumbs_up
) plus colons to decorate it.
One good name would be just colonsName
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or perhaps better yet, drop that field entirely, since we're no longer using it. (Easy to add it back later if we start using it again for something.)
That will mean squashing the commit that changes how widgets/content.dart
consumes UnicodeEmojiNode into the commit that changes what those nodes contain and how we parse them. I think squashing those is probably cleanest anyway.
@@ -620,23 +619,6 @@ class UserMention extends StatelessWidget { | |||
// borderRadius: BorderRadius.all(Radius.circular(3)))); | |||
} | |||
|
|||
class MessageUnicodeEmoji extends StatelessWidget { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's squash deleting this into the same commit that makes it no longer needed.
Meanwhile, from the commit message:
This widget displayed the emoji name as a fallback in a bordered
container. But now that we rely on TextSpan to display the emoji
codepoints text, we can't reliably determine the fallback condition
_for now_.
I don't understand the last bit here. What fallback condition are you thinking of; and what would need to change between "for now" and being able to determine it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This commits talks about the possibility to determine if the emojiUnicode is present in a supported emoji list. If not we could fallback to showing this container. See discussion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm, see comment
lib/model/content.dart
Outdated
// > "unified" in the following data, with "non_qualified" taking | ||
// > precedence when both present: | ||
// > https://github.com/raw/iamcal/emoji-data/master/emoji_pretty.json | ||
String? tryParseEmojiCodeToUnicode(String code) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this puts this function in amongst the functions for parsing block content; it's parsing inline content, so it should instead go amongst the parsing of inline content.
Specifically let's put it above parseInlineContent
. In general this parser class has the more general and more toward-the-root logic at the bottom, and the more specific and more toward-the-leaves logic at the top.
19bf5b2
to
6965a15
Compare
Thanks for the review @gnprice, revision pushed! |
6965a15
to
a67621b
Compare
(and rebased to main) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @rajveermalviya for all your work on this! Looks good with just a couple of nits (below); I'll fix and merge.
Happy to see that it works nicely — here's on my Pixel 5:
before | after |
---|---|
![]() |
![]() |
In fact, it even effectively fixes zulip/zulip#11767 ! We just yesterday merged a fix for that in the server's main branch, but old messages will remain affected. Here in this client implementation, though, because we translate the Unicode emoji right back into Unicode, even the old mojibake messages show up perfectly:
before | after |
---|---|
![]() |
![]() |
That basically comes as a bonus from our decision to use Unicode (and ship an emoji font for consistency across devices) instead of managing these glyphs as images.
lib/model/content.dart
Outdated
@@ -582,7 +604,14 @@ class _ZulipContentParser { | |||
&& classes.length == 2 | |||
&& classes.contains('emoji') | |||
&& classes.every(_emojiClassRegexp.hasMatch)) { | |||
return UnicodeEmojiNode(text: element.text, debugHtmlNode: debugHtmlNode); | |||
final emojiCode = classes | |||
.firstWhereOrNull((className) => className.startsWith('emoji-'))! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can simplify slightly:
.firstWhereOrNull((className) => className.startsWith('emoji-'))! | |
.firstWhere((className) => className.startsWith('emoji-')) |
@@ -460,8 +460,7 @@ class _InlineContentBuilder { | |||
return WidgetSpan(alignment: PlaceholderAlignment.middle, | |||
child: UserMention(node: node)); | |||
} else if (node is UnicodeEmojiNode) { | |||
return WidgetSpan(alignment: PlaceholderAlignment.middle, | |||
child: MessageUnicodeEmoji(node: node)); | |||
return TextSpan(text: node.emojiUnicode, recognizer: _recognizer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit in commit message:
- Generate a tappable TextSpan for each unicode emoji
It's true that this will be tappable if it's inside something tappable (i.e., a link). But this line in the commit message sounds like it's saying the span will be tappable in its own way, related somehow to its status as an emoji. Which is a plausible feature — you could imagine tapping (maybe more likely long-tapping, but possibly just tapping) to pull up the name of the emoji in a tooltip, or something like that.
So just s/tappable // clarifies it, I think.
Well, I ran into a small road bump: this PR was causing a test to break, once rebased atop main. The issue is the test's fault, though, not this PR's. Sent #285 to fix that, and meanwhile I'll push my rebased and nits-fixed version of this PR back to the PR branch. |
An emoji with multiple codepoints is encoded as `emoji-xxxxxx-xxxxxx-..` (where xxxxxx corresponds to a hex sequence). Current regexp matches strings with either zero or single occurence of subgroup, changing it to match string with either zero or infinitely repeating subgroup.
- Parse the zulip emoji code class names to unicode codepoints - Generate a TextSpan for each unicode emoji - Remove fallback emoji name container Fixes: zulip#58
OK, and merging! Thanks again @rajveermalviya for the contribution. |
tl;dr: When this Flutter issue is resolved: flutter/flutter#134897 try adding the Noto Color Emoji font back again (COLRv1 format) and set it as a fallback font behind some higher-priority font wherever we want to show text that might contain emojis. Before we do that, we're at the mercy of the system font, which might not be updated to handle newer emojis. This seems especially true of Android devices in the wild. ----- In zulip#245, we added a COLRv1 version of this font, probably because of a warning that made us doubt if other formats would work on iOS: https://github.com/googlefonts/noto-emoji#using-notocoloremoji > NotoColorEmoji uses the CBDT/CBLC color font format, which is > supported by Android and Chrome/Chromium OS. Windows supports it > starting with Windows 10 Anniversary Update in Chrome and Edge. On > macOS, only Chrome supports it, while on Linux it will support it > with some fontconfig tweaking, see issue zulip#36. Currently we do not > build other color font formats. (It seems like that part of the README wasn't updated when the COLRv1 format was added, and hasn't been updated since.) And I guess there wasn't a clear iOS compatibility problem with the COLRv1 format, so we used that. But, in zulip#245, it seems like we didn't do a good manual test that the COLRv1 version of this font could actually handle emojis in the app on iOS. That testing shows that it can't (or at least couldn't on my phone or in a simulator). When the font is in the effective list of fallbacks, it takes responsibility for rendering the emoji, before the system font gets a chance to, but then it fails to render it, and a blank space appears. We suspect this Flutter issue ("Rendering of COLRv1 fonts is broken"): flutter/flutter#134897 We don't know why that Flutter issue doesn't, or at least doesn't always, prevent emojis from successfully rendering in the font on *Android*. I followed the issue's reproduction recipe with the COLRv1 font they specified, and indeed the glyphs weren't showing up on Android (but they did on web). So apparently the specific COLRv1 font is a variable. Since we don't know what all the variables are, and we have other short-term priorities, just take the font out of the picture for now, with a plan to reinstate it once we're sure it won't cause emojis to fail to render. Discussion: https://chat.zulip.org/#narrow/stream/243-mobile-team/topic/Emoji.20rendering.20issue.20.28iOS.29/near/1683965
Fixes: #58
Context
Implement parsing of the unicode emojis encoded in
<span class="emoji emoji-*">
elements and then generating directTextSpan
s that contains the unicode codepoint for emoji and falling back to displayingemoji_name
in case the parsing had failed.This PR also vendors the "Noto Color Emoji" font in COLRv1 format (4.53 MB) that was downloaded from it's latest release.
Alternatives
As mentioned in #58 there are many alternative ways to handle the displaying of emojis. AFAIK using a font is the most efficient way to render emojis in Flutter (even for icons it prefers fonts). Using a font also provides us a way to render the emojis that are in the message content & other strings (topic/stream name). Especially on older servers which may fail to parse the newer emojis and fail to encode them in
<span class="emoji">
elements and rather including the unicode codepoints in the text message content as is. (Albeit we can use the spritesheet method there too, but not without parsing each character of the message content to check for emojis.)Now coming to different possible routes for the loading of the fonts:
google_fonts
that downloads the fonts fromfont.google.com
server and dynamically load the font in the app.google_fonts
ourselves that downloads & caches the preferred emoji font from the realm server (this will probably require that server hosts the font files) and then load the downloaded font in the app usingFontLoader
API.Testing
Updated the current ones.