Skip to content

[OptimizeForJS] Optimize 64-bit div by constant #4055

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 42 commits into from

Conversation

MaxGraey
Copy link
Contributor

@MaxGraey MaxGraey commented Aug 4, 2021

Optimize only for division. (Opts for reminder will be in separate PR later).

JavaScript does not have 64-bit integer types (except BigInt which is not portable and slow), so most operations such as division must be replaced by emulated routine of the corresponding operation. In this case it is __wasm_i64_udiv and __wasm_i64_sdiv which do quite a lot of work and contain loops. They are called even for such cases when the divisor is a constant (including zero). We can avoid this and use well-known algorithms in v8 or LLVM / GCC for that and replace expensive calls with cheaper multiplication and shift and add also some fast paths for special cases, e.g. for case when sign divisor is power of two value.

I evaluated the acceleration that this transformation gives and for a divisible whose high part of word is not zero. This speedup unsigned division up to 7 times.

For unsigned:

u64(x) / 3   

=>

if (high_part(x) == 0) {
  return u64(u32(x) / 3) // fast path
} else {
  return mulh(x, -6148914691236517205) >> 1
}

before was: __wasm_i64_udiv(x, 3)

For signed (power of two):

i64(x) /  4    =>     ((x < 0 ? x + 3 : x) >> 2)
i64(x) / -8    =>    -((x < 0 ? x + 7 : x) >> 3)

For signed (non-power of two):

i64(x) /  3

=>

if (high_part(x) == 0) {
  return i64(i32(x) / 3) // fast path
} else {
  return mulh(x, 6148914691236517206) + (u64(x) >> 63)
}

Fix #4054

@MaxGraey MaxGraey changed the title [OptimizeForJS] Optimaze 64-bit div / rem by constant [OptimizeForJS] Optimize 64-bit div / rem by constant Aug 5, 2021
@MaxGraey MaxGraey changed the title [OptimizeForJS] Optimize 64-bit div / rem by constant [OptimizeForJS] Optimize 64-bit div by constant Aug 5, 2021
@kripken
Copy link
Member

kripken commented Aug 5, 2021

I am not sure I understand this. It uses code from V8 to optimize constant division at compile time - but wouldn't V8 and other engines do it at run time anyhow? Is there a benefit to doing it in the tools vs the VM?

@MaxGraey
Copy link
Contributor Author

MaxGraey commented Aug 5, 2021

Yes, v8 uses this (btw only for 32-bits for now). The problem is that 64-bit division in JS (ASM.JS) turns into a rather expensive __wasm_i64_udiv / __wasm_i64_sdiv call which can be avoided by applying the same practices used in js engines and LLVM (replacing division with multiplication and shift). I made benchmarks and found out that this speeds up division by a constant for 64-bit values up to 7 times for ASM.js code. And ofc it's apply only for ASM.js target

@MaxGraey
Copy link
Contributor Author

MaxGraey commented Aug 6, 2021

Fuzzed until ITERATION: 57567

@MaxGraey MaxGraey marked this pull request as ready for review August 6, 2021 12:52
@MaxGraey
Copy link
Contributor Author

MaxGraey commented Aug 6, 2021

I've updated the description. I hope the motivation is clearer now

@kripken
Copy link
Member

kripken commented Aug 9, 2021

Oh, thanks, I missed that this was for JS specifically. Yes, that seems like it could be useful.

Copy link
Member

@kripken kripken left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this does add a lot of complexity. But the speedup is significant as you said.

Do you have an idea of whether this happens in real-world code? Or is the worry theoretical for now?

@kripken
Copy link
Member

kripken commented Aug 9, 2021

To be clear, I think I can be convinced this is worth the complexity, but I'm not sure yet.

@MaxGraey
Copy link
Contributor Author

MaxGraey commented Aug 9, 2021

Do you have an idea of whether this happens in real-world code? Or is the worry theoretical for now?

Dividing by a constant is a fairly common operation. Perhaps even more frequent than the more common operation when the divisor is a more complex expression. The question, of course, is how often in real code we work with 64-bit values and use asmjs target for old browsers for example. I think it depends a lot on the application. Usually it is cryptography or compression/decompression algorithms. Also division is always involved in de-serializations (conv a number into a string).

@kripken
Copy link
Member

kripken commented Aug 11, 2021

The reasoning makes sense. I don't like the complexity, but given the large speed I don't object.

If there was something in the middle - much simpler, but not as fast - I'd prefer that, but I can't think of anything.

@@ -21,7 +21,7 @@ function asmFunc(env) {
}

function legalstub$2($0, $1, $2, $3, $4, $5) {
return ($4 | 0) == ($0 - $2 | 0) & ($5 | 0) == ($1 - (($0 >>> 0 < $2 >>> 0) + $3 | 0) | 0);
return ($0 - $2 | 0) == ($4 | 0) & ($1 - (($0 >>> 0 < $2 >>> 0) + $3 | 0) | 0) == ($5 | 0);
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you know why this changes?

Copy link
Contributor Author

@MaxGraey MaxGraey Aug 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also added extra optimization pass after optimize-for-js:
https://github.com/WebAssembly/binaryen/pull/4055/files#diff-503ab0ba884eaf72c574346fa45a6875959b6b1b3c90787a7d3ce4d7f6b464f6R348

this necessary due to this rule skip some cases (paths) which can be handled by "optimize-instructions" pass

(i64.div_u
(local.get $x)
(i64.const 0)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't make sense to me. Division by zero should trap, so why is it turned into 0?

Copy link
Contributor Author

@MaxGraey MaxGraey Aug 11, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because in JavaScript it's valid and dones't trap. Basically I keep the same behaviour which already exists. You can check that __wasm_i64_udiv(10, 10, 0, 0) will return 0 and i64toi32_i32$HIGH_BITS also will be zero. The same, for 32-bits division: (10 / 0) | 0 -> 0

@kripken
Copy link
Member

kripken commented Aug 23, 2021

In another PR the discussion suggested that JS opts are not a priority and we should focus on wasm ones, and on getting AS to use the wasm build. I wanted to check if we are in agreement in that?

If so then I think this PR is an example of something that I think may not be worth focusing on, given the complexity, and the low benefit if we switch AS to use wasm anyhow.

@MaxGraey
Copy link
Contributor Author

Sure. But how about more simpler PRs? Like #4078 and #4083?

@MaxGraey MaxGraey closed this Jul 21, 2022
@MaxGraey MaxGraey deleted the opt-for-js-div-by-const branch July 21, 2022 17:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Magic division by constant for OptimizeForJS. Does it make sense?
2 participants