Investigate possible performance wins with TextEncoder#encodeInto
#1313
I put a slightly polished restatement of my previous comment onto MDN. It would be nice to copy and paste the code that results from fixing this issue into the "Examples" section of the MDN article.
Oh, thanks for the link, @hsivonen!
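Note that another alternative that might be worth considering and measuring is calculating the UTF-8 byte length upfront. I've already added usage of …:

```js
let size = arg.length;
for (let i = 0; i < arg.length; i++) {
  const code = arg.charCodeAt(i);
  if (code > 0x7f) size++;  // 2+ bytes in UTF-8
  if (code > 0x7ff) size++; // 3+ bytes in UTF-8
  // High surrogate: only skip the next unit if it really is a low surrogate,
  // so a pair totals 4 bytes and a lone surrogate counts as 3 (U+FFFD).
  if (code >= 0xd800 && code <= 0xdbff && i + 1 < arg.length) {
    const next = arg.charCodeAt(i + 1);
    if (next >= 0xdc00 && next <= 0xdfff) i++;
  }
}
```

After that, we can allocate exactly the right size right away.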
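A minimal sketch of how the precomputed length could drive a single exact allocation followed by `encodeInto`. It assumes a plain `Uint8Array` stands in for a region of Wasm linear memory; `utf8ByteLength` and `encodeExact` are hypothetical names, and the surrogate handling only skips a genuine low surrogate so the count matches what `TextEncoder` emits for lone surrogates (U+FFFD, 3 bytes).

```js
// Hypothetical helper: exact UTF-8 byte length of a JS (UTF-16) string,
// matching TextEncoder output (lone surrogates become U+FFFD, 3 bytes).
function utf8ByteLength(str) {
  let size = str.length;
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i);
    if (code > 0x7f) size++;  // at least 2 bytes
    if (code > 0x7ff) size++; // at least 3 bytes
    if (code >= 0xd800 && code <= 0xdbff && i + 1 < str.length) {
      const next = str.charCodeAt(i + 1);
      // Valid pair: 2 units counted via str.length + 2 increments = 4 bytes.
      if (next >= 0xdc00 && next <= 0xdfff) i++;
    }
  }
  return size;
}

// Allocate exactly once, then encode straight into the buffer.
// A Uint8Array stands in for Wasm linear memory in this sketch.
function encodeExact(str) {
  const buf = new Uint8Array(utf8ByteLength(str));
  const { read, written } = new TextEncoder().encodeInto(str, buf);
  if (read !== str.length || written !== buf.length) {
    throw new Error("length precomputation disagreed with TextEncoder");
  }
  return buf;
}
```

The trade-off is a second pass over the string in exchange for never reallocating or over-allocating.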
Gecko used to do this for its internal UTF-16 to UTF-8 conversions, but imprecise allocations turned out to be better for performance. (With jemalloc, allocations are imprecise anyway. With a precise allocator, it might be more interesting to attempt precise allocations.)
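For contrast, a sketch of the imprecise approach: skip the pre-scan and allocate the worst case of 3 bytes per UTF-16 code unit (the same "3 * JS length" bound used later in this thread), then encode in one pass. `encodeUpperBound` is a hypothetical name, and a `Uint8Array` again stands in for an allocator-provided buffer.

```js
// Hypothetical helper: one encoding pass, no length pre-scan.
// Each UTF-16 code unit needs at most 3 UTF-8 bytes (a surrogate pair,
// 2 units, needs 4), so 3 * str.length always suffices.
function encodeUpperBound(str) {
  const capacity = str.length * 3;
  const buf = new Uint8Array(capacity);
  const { written } = new TextEncoder().encodeInto(str, buf);
  // Return a view of the used prefix; the slack stays allocated, which is
  // the price paid for skipping the pre-scan.
  return { bytes: buf.subarray(0, written), capacity };
}
```

Whether that slack costs anything in practice depends on how the allocator rounds requests, which is the question the following comments turn to.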
Does Rust's WASM target use jemalloc? AFAIK Rust switched from it to the system allocator by default, but now that I think about it, I'm not sure which allocator counts as "system" and is included in the WASM target. Although maybe most people use …
The wasm target does not use jemalloc; it uses a port of dlmalloc.
Good to know. So the point stands: it's probably worth optimising for dlmalloc and/or wee_alloc if we do try to take allocator characteristics into account. Alternatively, I'd rely on bucket allocators to already round up any allocations to the bucket size and then …
Instead of doubling the size on each iteration, use a precise upper limit (3 * JS length) if the string turns out not to be ASCII-only. This results in at most 1 reallocation instead of O(log N).

Some dummy examples of what this would change:

- 1000 ASCII chars: no change; allocates 1000 bytes and bails out.
- 1000 ASCII chars + 1 '😃': before, allocated 1000 bytes and reallocated to 2000; now allocates 1000 bytes and reallocates to 1006.
- 1000 '😃' chars: before, allocated 1000 bytes, reallocated to 2000, and finally reallocated again to 4000; now allocates 1000 bytes and reallocates to 4000 right away.

Related issue: rustwasm#1313
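A sketch of the strategy described above, run against a toy bump allocator instead of real Wasm memory. `ToyHeap`, its `malloc`/`realloc` methods, and `passString` are hypothetical stand-ins, not wasm-bindgen's generated glue. The ASCII prefix is copied byte by byte into a `str.length`-byte allocation; at the first non-ASCII code unit the allocation grows once, to `offset + 3 * remaining`, and `encodeInto` writes the rest.

```js
// Toy heap standing in for Wasm linear memory (hypothetical, not a real API).
class ToyHeap {
  constructor(size) {
    this.mem = new Uint8Array(size);
    this.top = 0;      // bump pointer
    this.reallocs = 0; // counts reallocations for illustration
  }
  malloc(len) {
    const ptr = this.top;
    this.top += len;
    if (this.top > this.mem.length) throw new Error("toy heap exhausted");
    return ptr;
  }
  realloc(ptr, oldLen, newLen) {
    this.reallocs++;
    const newPtr = this.malloc(newLen); // never frees; fine for a sketch
    this.mem.copyWithin(newPtr, ptr, ptr + oldLen);
    return newPtr;
  }
}

const encoder = new TextEncoder();

// Returns { ptr, len } describing the UTF-8 bytes written into heap.mem.
function passString(heap, str) {
  let cap = str.length;
  let ptr = heap.malloc(cap);
  let offset = 0;
  // Fast path: copy ASCII directly, one code unit -> one byte.
  for (; offset < str.length; offset++) {
    const code = str.charCodeAt(offset);
    if (code > 0x7f) break;
    heap.mem[ptr + offset] = code;
  }
  if (offset !== str.length) {
    const rest = str.slice(offset);
    // Single reallocation to the upper bound: 3 bytes per remaining unit.
    const newCap = offset + rest.length * 3;
    ptr = heap.realloc(ptr, cap, newCap);
    cap = newCap;
    const view = heap.mem.subarray(ptr + offset, ptr + cap);
    offset += encoder.encodeInto(rest, view).written;
  }
  return { ptr, len: offset };
}
```

For 1000 ASCII characters followed by '😃' (JS length 1002), this grows the allocation once, to 1000 + 2 × 3 = 1006 bytes, matching the second example above.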
This was done originally in #1414, and we can always follow up with further improvements if necessary!
There's some discussion starting here about how we can probably improve the current logic of using encodeInto through some more clever usage and possibly some magic numbers. We should look into this! Ideally we'd also measure actual performance numbers when doing so.
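A rough starting point for gathering performance numbers, assuming a JS runtime that provides `performance.now()` (browsers, Node 16+). `bench` and `compare` are hypothetical helpers, and figures from a loop like this are only indicative: JIT warm-up, GC pressure, and the mix of ASCII vs non-ASCII input all affect the result.

```js
// Hypothetical micro-benchmark harness; results are indicative only.
function bench(label, fn, iterations) {
  for (let i = 0; i < 1000; i++) fn(); // warm-up so the JIT can settle
  let sink = 0; // consume results so the work is not optimized away
  const start = performance.now();
  for (let i = 0; i < iterations; i++) sink += fn();
  const ms = performance.now() - start;
  return { label, ms, sink };
}

const encoder = new TextEncoder();

// Compare a fresh allocation per call (encode) against writing into a
// reused buffer sized to the 3 * length upper bound (encodeInto).
function compare(str, iterations = 10000) {
  const scratch = new Uint8Array(str.length * 3);
  return [
    bench("encode", () => encoder.encode(str).length, iterations),
    bench("encodeInto", () => encoder.encodeInto(str, scratch).written, iterations),
  ];
}
```

Usage: `for (const r of compare("a".repeat(1000) + "😃")) console.log(r.label, r.ms.toFixed(2), "ms");`, repeated over a few representative inputs before drawing conclusions.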