[OptimizeForJS] Optimize 64-bit div by constant #4055

MaxGraey · 2021-08-04T18:39:13Z

Optimizes division only. (Optimizations for remainder will come in a separate PR later.)

JavaScript does not have 64-bit integer types (except BigInt, which is not portable and is slow), so most operations, such as division, must be replaced by an emulated routine. For division these are __wasm_i64_udiv and __wasm_i64_sdiv, which do quite a lot of work and contain loops. They are called even when the divisor is a constant (including zero). We can avoid this by using the well-known algorithms from v8 or LLVM / GCC to replace the expensive calls with a cheaper multiplication and shift, and also add fast paths for special cases, e.g. when the divisor of a signed division is a power of two.

I measured the speedup this transformation gives for a dividend whose high word is not zero: unsigned division becomes up to 7 times faster.

For unsigned:

u64(x) / 3

=>

if (high_part(x) == 0) {
  return u64(u32(x) / 3) // fast path
} else {
  return mulh(x, -6148914691236517205) >> 1
}

Before: __wasm_i64_udiv(x, 3)
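As a sanity check of the unsigned rule above, here is a minimal reference model written with BigInt. It is only a sketch for checking the arithmetic, not the code the pass emits; the names `mulhu`, `MAGIC_U3` and `udiv3` are made up for this illustration, and the constant is the one from the pseudocode reinterpreted as an unsigned 64-bit value.

```javascript
// Reference model using BigInt; not the code that wasm2js emits.

// High 64 bits of the 128-bit unsigned product a * b.
function mulhu(a, b) {
  return (BigInt.asUintN(64, a) * BigInt.asUintN(64, b)) >> 64n;
}

// The pseudocode's constant as an unsigned value: 0xAAAAAAAAAAAAAAAB,
// which equals (2^65 + 1) / 3.
const MAGIC_U3 = BigInt.asUintN(64, -6148914691236517205n);

// u64(x) / 3 without a 64-bit division on the slow path.
function udiv3(x) {
  x = BigInt.asUintN(64, x);
  if ((x >> 32n) === 0n) {
    // Fast path: the high word is zero, so a 32-bit division is enough.
    return BigInt(((Number(x) >>> 0) / 3) >>> 0);
  }
  // Slow path: multiply-high by the magic constant, then shift right by 1.
  return mulhu(x, MAGIC_U3) >> 1n;
}
```

The multiply path is exact because x * (2^65 + 1) / 3 / 2^65 overshoots x / 3 by less than 1/6 for any x below 2^64, which is too small to change the floor.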

For signed (power of two):

i64(x) /  4    =>     ((x < 0 ? x + 3 : x) >> 2)

i64(x) / -8    =>    -((x < 0 ? x + 7 : x) >> 3)
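The two power-of-two rules can be checked the same way. This is a small BigInt sketch under the assumption that `>>` is an arithmetic (sign-preserving) shift, as it is for BigInt; `i64`, `sdiv4` and `sdivNeg8` are hypothetical helper names used only here.

```javascript
// Wrap a BigInt to the signed 64-bit range, like an i64 value.
const i64 = (v) => BigInt.asIntN(64, v);

// i64(x) / 4: add (4 - 1) to negative inputs so that the arithmetic shift
// rounds toward zero instead of toward negative infinity.
function sdiv4(x) {
  x = i64(x);
  return i64((x < 0n ? x + 3n : x) >> 2n);
}

// i64(x) / -8: divide by 8 the same way, then negate the quotient.
function sdivNeg8(x) {
  x = i64(x);
  return i64(-((x < 0n ? x + 7n : x) >> 3n));
}
```

The bias is needed because a plain shift floors: -7 >> 2 is -2, while truncating division gives -7 / 4 = -1. Adding 3 first turns -7 into -4, and -4 >> 2 is -1.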

For signed (non-power of two):

i64(x) /  3

=>

if (high_part(x) == 0) {
  return i64(i32(x) / 3) // fast path
} else {
  return mulh(x, 6148914691236517206) + (u64(x) >> 63)
}
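A matching sketch for the signed multiply path, again as a BigInt reference model rather than the emitted code. It covers only the `mulh` branch (the fast path is left out), and `mulhs`, `MAGIC_S3` and `sdiv3` are names invented for this example.

```javascript
// High 64 bits of the 128-bit signed product a * b.
// BigInt's >> is an arithmetic shift, so the result keeps its sign.
function mulhs(a, b) {
  return (BigInt.asIntN(64, a) * BigInt.asIntN(64, b)) >> 64n;
}

// 0x5555555555555556, which equals (2^64 + 2) / 3.
const MAGIC_S3 = 6148914691236517206n;

// i64(x) / 3 via multiply-high plus a sign correction.
function sdiv3(x) {
  x = BigInt.asIntN(64, x);
  // u64(x) >> 63 is 1 for negative x and 0 otherwise. Adding it turns the
  // floored product into a quotient that is rounded toward zero.
  const signBit = BigInt.asUintN(64, x) >> 63n;
  return mulhs(x, MAGIC_S3) + signBit;
}
```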

Fix #4054

kripken · 2021-08-05T21:45:19Z

I am not sure I understand this. It uses code from V8 to optimize constant division at compile time - but wouldn't V8 and other engines do it at run time anyhow? Is there a benefit to doing it in the tools vs the VM?

MaxGraey · 2021-08-05T21:50:20Z

Yes, v8 uses this (btw, only for 32-bit for now). The problem is that 64-bit division in JS (asm.js) turns into a rather expensive __wasm_i64_udiv / __wasm_i64_sdiv call, which can be avoided by applying the same techniques used in JS engines and LLVM (replacing division with multiplication and shift). I ran benchmarks and found that this speeds up division by a constant for 64-bit values by up to 7 times for asm.js code. And of course it applies only to the asm.js target.
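For readers unfamiliar with the "multiplication and shift" technique, the following sketch shows one textbook way to derive a multiplier and shift for an unsigned 64-bit divisor. It is an illustration with BigInt under stated assumptions, not the algorithm from this PR's div-by-const.cpp; `findMagic` and `udivMagic` are hypothetical names. It relies on the classic sufficient condition m * d - 2^(64 + s) <= 2^s with m = ceil(2^(64 + s) / d).

```javascript
// Find m and s such that floor(x / d) === (x * m) >> (64 + s)
// for every unsigned 64-bit x. Assumes 1 <= d < 2^64.
function findMagic(d) {
  for (let s = 0n; s <= 64n; s++) {
    const pow = 1n << (64n + s);
    const m = (pow + d - 1n) / d; // ceil(pow / d)
    if (m * d - pow <= (1n << s)) return { m, s };
  }
  throw new Error("no magic constant found");
}

// Divide using only a multiplication and a shift.
function udivMagic(x, d) {
  const { m, s } = findMagic(d);
  return (BigInt.asUintN(64, x) * m) >> (64n + s);
}
```

For d = 3 this yields m = 0xAAAAAAAAAAAAAAAB and s = 1, the constant and shift used in the description. For some divisors (7, for example) m needs 65 bits, which BigInt hides here; an implementation limited to 64-bit words needs an extra fix-up step for those cases.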

MaxGraey · 2021-08-06T12:52:10Z

Fuzzed until ITERATION: 57567

MaxGraey · 2021-08-06T13:11:14Z

I've updated the description. I hope the motivation is clearer now.

kripken · 2021-08-09T20:40:18Z

Oh, thanks, I missed that this was for JS specifically. Yes, that seems like it could be useful.

kripken (review) · 2021-08-09

Overall this does add a lot of complexity. But the speedup is significant as you said.

Do you have an idea of whether this happens in real-world code? Or is the worry theoretical for now?

src/passes/OptimizeForJS.cpp

src/passes/wasm-intrinsics.wat

src/support/div-by-const.cpp

kripken · 2021-08-09T20:46:02Z

To be clear, I think I can be convinced this is worth the complexity, but I'm not sure yet.

MaxGraey · 2021-08-09T21:21:05Z

> Do you have an idea of whether this happens in real-world code? Or is the worry theoretical for now?

Dividing by a constant is a fairly common operation, perhaps even more common than division where the divisor is a more complex expression. The question, of course, is how often real code works with 64-bit values and targets asm.js, for old browsers for example. I think it depends a lot on the application. Usually it is cryptography or compression/decompression algorithms. Division is also always involved in serialization (converting a number into a string).

kripken · 2021-08-11T19:59:37Z

The reasoning makes sense. I don't like the complexity, but given the large speedup I don't object.

If there was something in the middle - much simpler, but not as fast - I'd prefer that, but I can't think of anything.

kripken · 2021-08-11T20:02:24Z

test/wasm2js/i64-add-sub.2asm.js.opt

@@ -21,7 +21,7 @@ function asmFunc(env) {
 }

 function legalstub$2($0, $1, $2, $3, $4, $5) {
-  return ($4 | 0) == ($0 - $2 | 0) & ($5 | 0) == ($1 - (($0 >>> 0 < $2 >>> 0) + $3 | 0) | 0);
+  return ($0 - $2 | 0) == ($4 | 0) & ($1 - (($0 >>> 0 < $2 >>> 0) + $3 | 0) | 0) == ($5 | 0);
 }


Do you know why this changes?

MaxGraey · 2021-08-11

I also added an extra optimization pass after optimize-for-js:
https://github.com/WebAssembly/binaryen/pull/4055/files#diff-503ab0ba884eaf72c574346fa45a6875959b6b1b3c90787a7d3ce4d7f6b464f6R348

This is necessary because this rule skips some cases (paths) that can be handled by the "optimize-instructions" pass.

kripken · 2021-08-11T20:03:29Z

test/lit/passes/optimize-for-js.wast

+  (i64.div_u
+   (local.get $x)
+   (i64.const 0)
+  )


This doesn't make sense to me. Division by zero should trap, so why is it turned into 0?

MaxGraey · 2021-08-11

Because in JavaScript it's valid and doesn't trap. Basically, I keep the same behaviour that already exists. You can check that __wasm_i64_udiv(10, 10, 0, 0) returns 0 and that i64toi32_i32$HIGH_BITS is also zero. The same holds for 32-bit division: (10 / 0) | 0 -> 0
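The 32-bit behaviour mentioned above follows from how JavaScript coerces numbers to integers: dividing by zero gives Infinity, -Infinity or NaN, and `| 0` and `>>> 0` convert each of those to 0. A minimal sketch, where `div32s` and `div32u` are illustrative helper names and not functions from wasm2js:

```javascript
// Signed 32-bit division written in the asm.js style.
function div32s(a, b) {
  return ((a | 0) / (b | 0)) | 0;
}

// Unsigned 32-bit division written in the asm.js style.
function div32u(a, b) {
  return ((a >>> 0) / (b >>> 0)) >>> 0;
}

// Division by zero does not throw; the non-finite result is coerced to 0.
const signedByZero = div32s(10, 0);   // Infinity | 0
const negativeByZero = div32s(-10, 0); // -Infinity | 0
const zeroByZero = div32s(0, 0);       // NaN | 0
const unsignedByZero = div32u(10, 0);  // Infinity >>> 0
```

All four constants are 0, whereas the corresponding wasm instructions would trap.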

kripken · 2021-08-23T22:59:45Z

In another PR, the discussion suggested that JS opts are not a priority and that we should focus on wasm ones, and on getting AS to use the wasm build. I wanted to check whether we are in agreement on that?

If so, then I think this PR is an example of something that may not be worth focusing on, given the complexity and the low benefit if we switch AS to use wasm anyhow.

MaxGraey · 2021-08-24T05:40:47Z

Sure. But how about simpler PRs, like #4078 and #4083?

MaxGraey added 6 commits August 4, 2021 18:11

init

bf6eb7e

fix

414a0eb

comments

c985cdc

fix (wip)

95f3472

wip

86a42b3

fix

1cb12f6

MaxGraey mentioned this pull request Aug 4, 2021

Magic division by constant for OptimizeForJS. Does it make sense? #4054

Closed

MaxGraey added 9 commits August 4, 2021 21:49

lint

d282466

lint

37a416f

more tests and fixes

860860c

lint

14c8039

cleanups

5b1790d

refactor

397919f

special case for signed Power of Two divisors

a3fca5f

fix

5de26a0

lint

c544a2d

MaxGraey changed the title ~~[OptimizeForJS] Optimaze 64-bit div / rem by constant~~ [OptimizeForJS] Optimize 64-bit div / rem by constant Aug 5, 2021

more tests

2167b6e

MaxGraey changed the title ~~[OptimizeForJS] Optimize 64-bit div / rem by constant~~ [OptimizeForJS] Optimize 64-bit div by constant Aug 5, 2021

MaxGraey added 11 commits August 5, 2021 11:25

comment reminder rules for now

2e789a7

minor refactoring

59092b3

fix

0b924ba

no skip

3de6a63

refactor

89eac9d

skip negative divisors for unsigned divs

984b530

skip const-by-const divs

49ec6dd

fix

69457d7

Merge branch 'main' into opt-for-js-div-by-const

88f1f5f

fixes

aa3e3cc

lint

cda2348

MaxGraey added 2 commits August 5, 2021 21:48

fix

19e63e5

lint

a8f78af

MaxGraey added 3 commits August 6, 2021 00:57

Merge branch 'main' into opt-for-js-div-by-const

448cfb8

Merge branch 'main' into opt-for-js-div-by-const

25ac50d

clarify comment

bda17ad

MaxGraey marked this pull request as ready for review August 6, 2021 12:52

MaxGraey added 4 commits August 6, 2021 16:40

add optimize instructions after optimize-for-js

2f5ea6a

update wasm2js fixtures

be3c454

add test for i64(x) / -4

6f3b04d

add div by smin test

d97849e

kripken reviewed Aug 9, 2021

src/passes/OptimizeForJS.cpp (outdated)

src/passes/OptimizeForJS.cpp

src/passes/wasm-intrinsics.wat

src/support/div-by-const.cpp (outdated)

embed full license content for header and cpp

bafc225

suggestions

014993c

kripken reviewed Aug 11, 2021

MaxGraey added 2 commits August 11, 2021 23:28

add comment for wasm-intrinsics.wat

1ce29d5

remove empry gap

6ae253f

MaxGraey mentioned this pull request Aug 14, 2021

[Wasm2JS] More optimal JS codegen for some special cases #4078

Open

MaxGraey closed this Jul 21, 2022

MaxGraey deleted the opt-for-js-div-by-const branch July 21, 2022 17:27
