Skip to content

Explicitly recognise asm_float_zero #5046

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Apr 7, 2017

Conversation

aidanhs
Copy link
Contributor

@aidanhs aidanhs commented Mar 17, 2017

With the bitcode I have, llvm (in -O3) generates (before any js opts) a return as follows:

  • return +(+0); with -s WASM=0
  • return Math_fround((Math_fround(+0))); with -s WASM=1 (because PRECISE_F32 is always enabled with wasm)

Emscripten then performs JS opts (presumably to prepare for the emterpreter) in three steps:

  • asmPreciseF32 asm eliminate simplifyExpressions emitJSON (native)
  • asmPreciseF32 asm receiveJSON localCSE safeLabelSetting emitJSON (non-native)
  • asmPreciseF32 asm receiveJSON optimizeFrounds registerizeHarder asmLastOpts (native)

+(+0) simplifies to +0, but, courtesy of optimizeFrounds, Math_fround((Math_fround(+0))); simplifies to the variable f0. Now the emterpretify step is unable to figure out what the return type is (when detecting return type, the asmData isn't ready yet so isn't passed to detectType and the variable type can't be detected). This PR explicitly matches against the global float zero.

So the report in #5031 is more accurately expressed as "-s EMTERPRETIFY=1 does not work with PRECISE_F32=1", it's just that wasm made it obvious. I'm somewhat surprised that no code you've tested has exhibited this issue, but maybe code that returns a Math_fround call is relatively rare and so is emterpreter testing with binaryen?

@kripken
Copy link
Member

kripken commented Mar 17, 2017

Thanks! Looks correct. We should just add a test for this. I suggest adding it to test_emterpretify in test_core.py (i.e. make a new test cpp file instead of the hello world we use, and add a return of a float - mark it no-inline to be sure).

I'm somewhat surprised that no code you've tested has exhibited this issue

I'm surprised fuzzing didn't catch it - perhaps the fuzzer doesn't return floats? Otherwise, PRECISE_F32 + emterpreter don't have much overlap in the suite, and we do have some wasm + emterpreter tests, which as you say, would catch this if we returned a float, but they happen to not have that...

@aidanhs
Copy link
Contributor Author

aidanhs commented Apr 6, 2017

It turns out my explanation for why this is happening is wrong. After I was unable to come up with a test, I looked again - optimizeFrounds deliberately does not optimize Math_fround(0) to f0 in returns (and even if it did, the optimization would not happen as I describe above because there is a leading + in front of the 0 which would inhibit it).

Instead, it turns out that this is an optimization performed by registerizeHarder! Here's a simplified piece of asm.js:

function _combine_soft_light_c($0,$1,$2,$3) {
 $0 = Math_fround($0);
 $1 = Math_fround($1);
 $2 = Math_fround($2);
 $3 = Math_fround($3);
 var $4 = 0, $5 = 0, $6 = 0, $7 = 0, $8 = 0, $9 = 0, $or$cond = 0, label = 0, sp = 0;
 var $10 = 0, $11 = 0, $12 = 0, $13 = 0, $20 = Math_fround(0), $21 = Math_fround(0), $22 = Math_fround(0), $23 = Math_fround(0), $24 = Math_fround(0);
 sp = STACKTOP;
 STACKTOP = STACKTOP + 48|0;
 $4 = sp + 36|0;
 $5 = sp + 32|0;
 $6 = sp + 28|0;
 $7 = sp + 24|0;
 $8 = sp + 20|0;
 $9 = sp + 16|0;
 $10 = sp + 12|0;
 $11 = sp + 8|0;
 $12 = sp + 4|0;
 $13 = sp;
 $20 = Math_fround(HEAPF32[$13>>2]);
 $21 = Math_fround(HEAPF32[$9>>2]);
 $22 = Math_fround(HEAPF32[$8>>2]);
 $23 = Math_fround(HEAPF32[$7>>2]);
 $24 = $23 <= $22;
 if ($24) {
  $90 = Math_fround($23);
  STACKTOP = sp;return (Math_fround($90));
 } else {
  $90 = Math_fround($22);
  STACKTOP = sp;return (Math_fround($90));
 }
 return Math_fround((Math_fround(+0)));
}

You can optimize this directly with ./optimizer x.js asmPreciseF32 asm eliminate simplifyExpressions optimizeFrounds registerizeHarder to get

function _combine_soft_light_c(f1, f2, f4, f5) {
 f1 = Math_fround(f1);
 f2 = Math_fround(f2);
 f4 = Math_fround(f4);
 f5 = Math_fround(f5);
 var i3 = 0;
 i3 = STACKTOP;
 STACKTOP = STACKTOP + 48 | 0;
 f1 = Math_fround(HEAPF32[i3 + 20 >> 2]);
 f2 = Math_fround(HEAPF32[i3 + 24 >> 2]);
 if (f2 <= f1) {
  $90 = Math_fround(f2);
  STACKTOP = i3;
  return Math_fround($90);
 } else {
  $90 = Math_fround(f1);
  STACKTOP = i3;
  return Math_fround($90);
 }
 return f0;
}

I guess registerizeHarder figures that the final return is dead, so it can replace it with whatever (possibly it shouldn't, given that optimizeFrounds tries so hard not to). So in fact, the combination of conditions required to see this issue is "have a float-returning function with an if/else at the end of the function with each branch returning something and being sufficiently complex that llvm does not attempt a simplification to just use the single dead return statement at the end of the function to return the value from the two branches (i.e. avoiding if (x) { $ret = fround(1) } else { $ret = fround(2) } return fround($ret);").

As you might imagine with the description above, it's actually pretty hard to come up with a test case here because llvm on O3 takes great pleasure in rearranging if conditions. Looking at the actual asm.js that tickles the bug, there's no reason I can see that LLVM hasn't done exactly that - I guess it's just exceeded some code size threshold.

Click to see unreduced asm.js for function

function _combine_soft_light_c($0,$1,$2,$3) {
 $0 = Math_fround($0);
 $1 = Math_fround($1);
 $2 = Math_fround($2);
 $3 = Math_fround($3);
 var $10 = 0, $11 = 0, $12 = 0, $13 = 0, $14 = Math_fround(0), $15 = Math_fround(0), $16 = Math_fround(0), $17 = Math_fround(0), $18 = Math_fround(0), $19 = Math_fround(0), $20 = Math_fround(0), $21 = Math_fround(0), $22 = Math_fround(0), $23 = Math_fround(0), $24 = Math_fround(0), $25 = Math_fround(0), $26 = Math_fround(0), $27 = Math_fround(0), $28 = Math_fround(0), $29 = Math_fround(0);
 var $30 = Math_fround(0), $31 = 0, $32 = Math_fround(0), $33 = 0, $34 = Math_fround(0), $35 = 0, $36 = Math_fround(0), $37 = Math_fround(0), $38 = Math_fround(0), $39 = Math_fround(0), $4 = 0, $40 = Math_fround(0), $41 = Math_fround(0), $42 = Math_fround(0), $43 = Math_fround(0), $44 = Math_fround(0), $45 = Math_fround(0), $46 = Math_fround(0), $47 = Math_fround(0), $48 = Math_fround(0);
 var $49 = Math_fround(0), $5 = 0, $50 = Math_fround(0), $51 = Math_fround(0), $52 = Math_fround(0), $53 = Math_fround(0), $54 = Math_fround(0), $55 = 0, $56 = Math_fround(0), $57 = Math_fround(0), $58 = Math_fround(0), $59 = Math_fround(0), $6 = 0, $60 = Math_fround(0), $61 = Math_fround(0), $62 = Math_fround(0), $63 = Math_fround(0), $64 = Math_fround(0), $65 = Math_fround(0), $66 = Math_fround(0);
 var $67 = Math_fround(0), $68 = Math_fround(0), $69 = Math_fround(0), $7 = 0, $70 = Math_fround(0), $71 = Math_fround(0), $72 = Math_fround(0), $73 = Math_fround(0), $74 = Math_fround(0), $75 = Math_fround(0), $76 = Math_fround(0), $77 = Math_fround(0), $78 = Math_fround(0), $79 = Math_fround(0), $8 = 0, $80 = Math_fround(0), $81 = Math_fround(0), $82 = Math_fround(0), $83 = Math_fround(0), $84 = Math_fround(0);
 var $85 = Math_fround(0), $86 = Math_fround(0), $87 = Math_fround(0), $88 = Math_fround(0), $89 = Math_fround(0), $9 = 0, $90 = Math_fround(0), $or$cond = 0, label = 0, sp = 0;
 sp = STACKTOP;
 STACKTOP = STACKTOP + 48|0;
 $4 = sp + 36|0;
 $5 = sp + 32|0;
 $6 = sp + 28|0;
 $7 = sp + 24|0;
 $8 = sp + 20|0;
 $9 = sp + 16|0;
 $10 = sp + 12|0;
 $11 = sp + 8|0;
 $12 = sp + 4|0;
 $13 = sp;
 HEAPF32[$9>>2] = $0;
 HEAPF32[$10>>2] = $1;
 HEAPF32[$11>>2] = $2;
 HEAPF32[$12>>2] = $3;
 $14 = Math_fround(HEAPF32[$9>>2]);
 $15 = Math_fround(Math_fround(+1) - $14);
 $16 = Math_fround(HEAPF32[$12>>2]);
 $17 = Math_fround($15 * $16);
 $18 = Math_fround(HEAPF32[$11>>2]);
 $19 = Math_fround(Math_fround(+1) - $18);
 $20 = Math_fround(HEAPF32[$10>>2]);
 $21 = Math_fround($19 * $20);
 $22 = Math_fround($17 + $21);
 HEAPF32[$13>>2] = $22;
 $23 = Math_fround(HEAPF32[$13>>2]);
 $24 = Math_fround(HEAPF32[$9>>2]);
 $25 = Math_fround(HEAPF32[$10>>2]);
 $26 = Math_fround(HEAPF32[$11>>2]);
 $27 = Math_fround(HEAPF32[$12>>2]);
 HEAPF32[$5>>2] = $24;
 HEAPF32[$6>>2] = $25;
 HEAPF32[$7>>2] = $26;
 HEAPF32[$8>>2] = $27;
 $28 = Math_fround(HEAPF32[$6>>2]);
 $29 = Math_fround(Math_fround(+2) * $28);
 $30 = Math_fround(HEAPF32[$5>>2]);
 $31 = $29 < $30;
 $32 = Math_fround(HEAPF32[$7>>2]);
 $33 = Math_fround(+-1.17549435E-38) < $32;
 $34 = Math_fround(HEAPF32[$7>>2]);
 $35 = $34 < Math_fround(+1.17549435E-38);
 $or$cond = $33 & $35;
 if ($31) {
  $36 = Math_fround(HEAPF32[$8>>2]);
  $37 = Math_fround(HEAPF32[$5>>2]);
  $38 = Math_fround($36 * $37);
  if ($or$cond) {
   HEAPF32[$4>>2] = $38;
   $89 = Math_fround(HEAPF32[$4>>2]);
   $90 = Math_fround($23 + $89);
   STACKTOP = sp;return (Math_fround($90));
  } else {
   $39 = Math_fround(HEAPF32[$8>>2]);
   $40 = Math_fround(HEAPF32[$7>>2]);
   $41 = Math_fround(HEAPF32[$8>>2]);
   $42 = Math_fround($40 - $41);
   $43 = Math_fround($39 * $42);
   $44 = Math_fround(HEAPF32[$5>>2]);
   $45 = Math_fround(HEAPF32[$6>>2]);
   $46 = Math_fround(Math_fround(+2) * $45);
   $47 = Math_fround($44 - $46);
   $48 = Math_fround($43 * $47);
   $49 = Math_fround(HEAPF32[$7>>2]);
   $50 = Math_fround($48 / $49);
   $51 = Math_fround($38 - $50);
   HEAPF32[$4>>2] = $51;
   $89 = Math_fround(HEAPF32[$4>>2]);
   $90 = Math_fround($23 + $89);
   STACKTOP = sp;return (Math_fround($90));
  }
 }
 if ($or$cond) {
  HEAPF32[$4>>2] = Math_fround(+0);
  $89 = Math_fround(HEAPF32[$4>>2]);
  $90 = Math_fround($23 + $89);
  STACKTOP = sp;return (Math_fround($90));
 }
 $52 = Math_fround(HEAPF32[$8>>2]);
 $53 = Math_fround(Math_fround(+4) * $52);
 $54 = Math_fround(HEAPF32[$7>>2]);
 $55 = $53 <= $54;
 $56 = Math_fround(HEAPF32[$8>>2]);
 $57 = Math_fround(HEAPF32[$5>>2]);
 $58 = Math_fround($56 * $57);
 if ($55) {
  $59 = Math_fround(HEAPF32[$6>>2]);
  $60 = Math_fround(Math_fround(+2) * $59);
  $61 = Math_fround(HEAPF32[$5>>2]);
  $62 = Math_fround($60 - $61);
  $63 = Math_fround(HEAPF32[$8>>2]);
  $64 = Math_fround($62 * $63);
  $65 = Math_fround(HEAPF32[$8>>2]);
  $66 = Math_fround(Math_fround(+16) * $65);
  $67 = Math_fround(HEAPF32[$7>>2]);
  $68 = Math_fround($66 / $67);
  $69 = Math_fround($68 - Math_fround(+12));
  $70 = Math_fround(HEAPF32[$8>>2]);
  $71 = Math_fround($69 * $70);
  $72 = Math_fround(HEAPF32[$7>>2]);
  $73 = Math_fround($71 / $72);
  $74 = Math_fround($73 + Math_fround(+3));
  $75 = Math_fround($64 * $74);
  $76 = Math_fround($58 + $75);
  HEAPF32[$4>>2] = $76;
  $89 = Math_fround(HEAPF32[$4>>2]);
  $90 = Math_fround($23 + $89);
  STACKTOP = sp;return (Math_fround($90));
 } else {
  $77 = Math_fround(HEAPF32[$8>>2]);
  $78 = Math_fround(HEAPF32[$7>>2]);
  $79 = Math_fround($77 * $78);
  $80 = (Math_fround(Math_sqrt((Math_fround($79)))));
  $81 = Math_fround(HEAPF32[$8>>2]);
  $82 = Math_fround($80 - $81);
  $83 = Math_fround(HEAPF32[$6>>2]);
  $84 = Math_fround(Math_fround(+2) * $83);
  $85 = Math_fround(HEAPF32[$5>>2]);
  $86 = Math_fround($84 - $85);
  $87 = Math_fround($82 * $86);
  $88 = Math_fround($58 + $87);
  HEAPF32[$4>>2] = $88;
  $89 = Math_fround(HEAPF32[$4>>2]);
  $90 = Math_fround($23 + $89);
  STACKTOP = sp;return (Math_fround($90));
 }
 return Math_fround((Math_fround(+0)));
}

So, thoughts? I'm not really sure how to create a test case that doesn't heavily depend on details of the LLVM optimizer.

@kripken
Copy link
Member

kripken commented Apr 6, 2017

Interesting. My guess is that there is special code in registerizeHarder to handle the final return in a function, which is special in asm.js, e.g. it must exist, even if right before it is an if-else which never returns, so sounds related to what you see there. One theory is it is removing it, then re-introducing a new return to ensure one exists, and it does that last part wrong.

For testing, I'd say just add to other.test_js_optimizer. Then you just provide the asm.js before and after the js optimizer is run, and the LLVM optimizer is never in the picture.

@aidanhs
Copy link
Contributor Author

aidanhs commented Apr 6, 2017

Ok, I've added a couple of tests. test-reduce-dead-float-return-output verifies that optimizeFrounds and registerizeHarder combined reduce a dead return Math_fround((Math_fround(+0))); to return f0;. This is unchanged by my patch.

test-no-reduce-dead-float-return-to-nothing-output verifies that a return f0; is left alone by registerizeHarder (i.e. makes sure that my patch fixes this bug).

@kripken
Copy link
Member

kripken commented Apr 7, 2017

Great, thanks!

We'll need to bump the version number so the c++ optimizer gets rebuilt. I'll do that later (til then the bots might show an error on that test).

@kripken kripken merged commit 36a62c6 into emscripten-core:incoming Apr 7, 2017
@aidanhs aidanhs deleted the aphs-detect-global-f0 branch April 7, 2017 01:43
@kripken
Copy link
Member

kripken commented Apr 11, 2017

Looks like this broke other.test_emterpreter. The error contains Assertion failed: [["binary","+",["name","f0"],["name","f0"]],2,0] which looks relevant as well.

@kripken
Copy link
Member

kripken commented Apr 11, 2017

Hmm, the test may be invalid, this is the right area,

    do_js_test('float', r'''
function _main() {
  var f = f0;
  f = f0 + f0;
  print(f);
}
''', [], '0\n', floaty=True)

Perhaps we are missing the declaration of f0?

@kripken
Copy link
Member

kripken commented Apr 11, 2017

Ah no, it's something else. PR coming soon.

kripken added a commit that referenced this pull request Apr 11, 2017
…w we detect f0 properly (since #5046), so we do notice float operations in more cases, which is why this went unnoticed before. we do all emterpreter math in doubles anyhow, so this just requires us to accept float ops and emit the same double ops
kripken added a commit that referenced this pull request May 3, 2017
…cessary as no… (#5129)

* support float versions of binary operations, which is necessary as now we detect f0 properly (since #5046), so we do notice float operations in more cases, which is why this went unnoticed before. we do all emterpreter math in doubles anyhow, so this just requires us to accept float ops and emit the same double ops

* improve emterpreter makeBinary assertion message
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants