-
Notifications
You must be signed in to change notification settings - Fork 787
[OptimizeForJS] Optimize 64-bit div by constant #4055
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I am not sure I understand this. It uses code from V8 to optimize constant division at compile time - but wouldn't V8 and other engines do it at run time anyhow? Is there a benefit to doing it in the tools vs the VM? |
Yes, v8 uses this (btw only for 32-bits for now). The problem is that 64-bit division in JS (ASM.JS) turns into a rather expensive |
Fuzzed until |
I've updated the description. I hope the motivation is clearer now |
Oh, thanks, I missed that this was for JS specifically. Yes, that seems like it could be useful. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this does add a lot of complexity. But the speedup is significant as you said.
Do you have an idea of whether this happens in real-world code? Or is the worry theoretical for now?
To be clear, I think I can be convinced this is worth the complexity, but I'm not sure yet. |
Dividing by a constant is a fairly common operation. Perhaps even more frequent than the more common operation when the divisor is a more complex expression. The question, of course, is how often in real code we work with 64-bit values and use asmjs target for old browsers for example. I think it depends a lot on the application. Usually it is cryptography or compression/decompression algorithms. Also division is always involved in de-serializations (conv a number into a string). |
The reasoning makes sense. I don't like the complexity, but given the large speed I don't object. If there was something in the middle - much simpler, but not as fast - I'd prefer that, but I can't think of anything. |
@@ -21,7 +21,7 @@ function asmFunc(env) { | |||
} | |||
|
|||
function legalstub$2($0, $1, $2, $3, $4, $5) { | |||
return ($4 | 0) == ($0 - $2 | 0) & ($5 | 0) == ($1 - (($0 >>> 0 < $2 >>> 0) + $3 | 0) | 0); | |||
return ($0 - $2 | 0) == ($4 | 0) & ($1 - (($0 >>> 0 < $2 >>> 0) + $3 | 0) | 0) == ($5 | 0); | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you know why this changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also added extra optimization pass after optimize-for-js:
https://github.com/WebAssembly/binaryen/pull/4055/files#diff-503ab0ba884eaf72c574346fa45a6875959b6b1b3c90787a7d3ce4d7f6b464f6R348
this necessary due to this rule skip some cases (paths) which can be handled by "optimize-instructions" pass
(i64.div_u | ||
(local.get $x) | ||
(i64.const 0) | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't make sense to me. Division by zero should trap, so why is it turned into 0?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because in JavaScript it's valid and dones't trap. Basically I keep the same behaviour which already exists. You can check that __wasm_i64_udiv(10, 10, 0, 0)
will return 0
and i64toi32_i32$HIGH_BITS
also will be zero. The same, for 32-bits division: (10 / 0) | 0
-> 0
In another PR the discussion suggested that JS opts are not a priority and we should focus on wasm ones, and on getting AS to use the wasm build. I wanted to check if we are in agreement in that? If so then I think this PR is an example of something that I think may not be worth focusing on, given the complexity, and the low benefit if we switch AS to use wasm anyhow. |
Optimize only for division. (Opts for reminder will be in separate PR later).
JavaScript does not have 64-bit integer types (except BigInt which is not portable and slow), so most operations such as division must be replaced by emulated routine of the corresponding operation. In this case it is
__wasm_i64_udiv
and__wasm_i64_sdiv
which do quite a lot of work and contain loops. They are called even for such cases when the divisor is a constant (including zero). We can avoid this and use well-known algorithms in v8 or LLVM / GCC for that and replace expensive calls with cheaper multiplication and shift and add also some fast paths for special cases, e.g. for case when sign divisor is power of two value.I evaluated the acceleration that this transformation gives and for a divisible whose high part of word is not zero. This speedup unsigned division up to 7 times.
For unsigned:
=>
before was:
__wasm_i64_udiv(x, 3)
For signed (power of two):
For signed (non-power of two):
=>
Fix #4054