Speed up utf8 decoding #7068
LUT code
const utf8_first_byte_lens = [256]u3{
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 0x1F
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 0x3F
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 0x5F
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, // 0x7F
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0x9F
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0xBF
0, 0, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, // 0xDF
3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, // 0xEF
4, 4, 4, 4, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // 0xFF
};
pub fn utf8ByteSequenceLength(first_byte: u8) !u3 {
const len = utf8_first_byte_lens[first_byte];
if (len == 0) return error.Utf8InvalidStartByte;
return len;
} |
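For readers following the table above, here is a minimal sketch of which byte ranges it encodes and how such a table could drive a codepoint count. It is not the code from this PR: `first_byte_lens_sketch`, `countCodepointsSketch`, and `error.Utf8TruncatedSequence` are hypothetical names introduced only for illustration, and the sketch assumes a reasonably recent Zig compiler. It checks start bytes and sequence lengths only; continuation bytes are not validated.

```zig
// Hypothetical helper: builds the same 256-entry table at compile time
// from the byte ranges it encodes, instead of writing it out by hand.
const first_byte_lens_sketch: [256]u3 = blk: {
    var table = [_]u3{0} ** 256;
    var b: usize = 0;
    while (b < 256) : (b += 1) {
        table[b] = switch (b) {
            0x00...0x7F => 1, // ASCII
            0xC2...0xDF => 2, // 0xC0 and 0xC1 would only start overlong forms
            0xE0...0xEF => 3,
            0xF0...0xF4 => 4, // 0xF5 and above would exceed U+10FFFF
            else => 0, // continuation bytes and invalid start bytes
        };
    }
    break :blk table;
};

// Hypothetical helper: counts codepoints by hopping from start byte to
// start byte. It does NOT check that the skipped bytes are valid
// continuation bytes.
fn countCodepointsSketch(bytes: []const u8) !usize {
    var count: usize = 0;
    var i: usize = 0;
    while (i < bytes.len) {
        const len = first_byte_lens_sketch[bytes[i]];
        if (len == 0) return error.Utf8InvalidStartByte;
        // i < bytes.len here, so the subtraction cannot underflow.
        if (len > bytes.len - i) return error.Utf8TruncatedSequence;
        i += len;
        count += 1;
    }
    return count;
}
```

Each loop iteration costs one table load and one compare, which is the main appeal of a lookup table here. Whether it beats a plain `switch` on the first byte depends on the target and the compiler, so it is worth measuring.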
Cool, you stripped my name out of this code and lumped everything into a single commit... A true class act. You couldn't really wait for #6390 to be reviewed? |
@data-man can you please explain what happened here? |
I referenced your PR:
I'm sorry if it hurt your feelings... :( |
I don't understand why you have not split your PR into several parts.
This comment is not a good reason. That is why I have given the real results.
I hope I have satisfied your request. |
You should've cherry-picked the commits you wanted (or apply the
Because it was not reviewed yet?
🤔 what?
I kept the original naming structure, the idea was to introduce some kind of namespacing a-la |
git is still a dark forest for me. Sorry if I don't know how to do something. |
lol I'm 48 years old, excuse the old man. :) |
Extracted from #6390 and added isValidCodepoint and utf8CountCodepoints.
Added unicode/benchmark.zig.
Results for me (x86_64):