Skip to content

VM number optimizations occasionally give wrong numerical result. #18143

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
lrhn opened this issue Apr 10, 2014 · 13 comments
Closed

VM number optimizations occasionally give wrong numerical result. #18143

lrhn opened this issue Apr 10, 2014 · 13 comments
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends.

Comments

@lrhn
Copy link
Member

lrhn commented Apr 10, 2014

While running the co19 test for Rectangle.boundingBox, the VM occasionally gives a failure that isn't reproducible when plugging the same values into a separate test. The problem goes away when optimizations are disabled.

Example:

    while ( out/ReleaseIA32/dart tests/co19/src/LibTest/math/Rectangle/boundingBox_A01_t01.dart ) ; do echo -n .; done

This usually errs after a few dozen tests run. Changing it to:

    while ( out/ReleaseIA32/dart --optimization-filter=xxx tests/co19/src/LibTest/math/Rectangle/boundingBox_A01_t01.dart ) ; do echo -n .; done

makes it not fail (at least for a couple of hundred rounds).

@lrhn
Copy link
Member Author

lrhn commented Apr 10, 2014

The bug isn't new. It seems to have existed at least since v1.0.


cc @whesse.

@iposva-google
Copy link
Contributor

Srdjan has investigated issues with this test in the past.


Set owner to @sgmitrovic.
Added Accepted label.

@ghost
Copy link

ghost commented Apr 10, 2014

Please specify what release you are running. I cannot reproduce failures neither on Mac nor on Linux.
Note that an issue with that test was fixed in r34855 (April 8th).


Added NeedsInfo label.

@ghost
Copy link

ghost commented Apr 10, 2014

Added CannotReproduce label.

@ghost
Copy link

ghost commented Apr 10, 2014

Added Accepted label.

@ghost
Copy link

ghost commented Apr 10, 2014

Updated status with co19 issue #679: Rectangle equality may not work with double coordinates. Maybe this should be a library bug?

expected: <Rectangle (0, 99.99999999999999) 0.9999999999999999 x 4503599627370495.0>, actual: <Rectangle (0, 99.99999999999999) 0.9999999999999999 x 4503599627370496.0>


cc @lrhn.
Added Invalid label.

@iposva-google
Copy link
Contributor

Exhaustively search through all possible combinations

/*
 * Copyright (c) 2014, the Dart project authors. Please see the AUTHORS file
 * for details. All rights reserved. Use of this source code is governed by a
 * BSD-style license that can be found in the LICENSE file.
 /
/**
 
@­assertion Rectangle<T> boundingBox(Rectangle<T> other)
 * Returns a new rectangle which completely contains this and other.
 * @­description checks that this operation is symmetric.
 * @­author kaigorodov
 */
import "dart:math";
import "../../../Utils/expect.dart";

const List<num> values = const [
   0, -1, 1, 10, -10, 100, -100,
   -4.5035996273704955E15, -4.294967296000001E9, -4.294967296E9, -4.2949672959999995E9, -6031769.5,
   -101.0, -100.00000000000001, -100.0, -99.99999999999999, -99.0, -10.0, -1.0000000000000002, -1.0,
   -.9999999999999999, -.7, .9999999999999999, 1.0, 1.0000000000000002, 10.0, 99.0, 99.99999999999999,
   100.0, 100.00000000000001, 101.0, 6031769.5, 4.2949672959999995E9, 4.294967296E9, 4.294967296000001E9,
   4.5035996273704955E15, 4.503599627370496E15, 4.503599627370497E15, 3.273390607896142,
 ];
 
Random rand = new Random();

num randomVal() {
  int k = rand.nextInt(values.length);
  return values[k];
}

var failed = false;

verify(i, x, y, z, w) {
  Rectangle r1=new Rectangle(x, y, z, w);
  Rectangle r2=new Rectangle(z, w, x, y);
  Rectangle bb1=r1.boundingBox(r2);
  Rectangle bb2=r2.boundingBox(r1);
  if (bb1 != bb2) {
    print("i=$i; x=$x, y=$y, z=$z, w=$w, bb1=$bb1, bb2=$bb2");
    failed = true;
  }
}

main() {
  // Warmup.
  for(int i = 0; i < 1000; i++) {
    num x = randomVal();
    num y = randomVal();
    num z = randomVal();
    num w = randomVal();
    verify(i, x, y, z, w);
  }
  var len = values.length;
  var ii = 0;
  for (var x = 0; x < len; x++) {
    for (var y = 0; y < len; y++) {
      for (var z = 0; z < len; z++) {
        for (var w = 0; w < len; w++) {
          verify(--ii, values[x], values[y], values[z], values[w]);
        }
      }
    }
  }
  Expect.equals(failed, false);
}

An interesting fact is the following (and this only happens on Linux): The occasional rounding error is only reported when we go through the initial loop. It is never reported in the exhaustive search.

I am starting to wonder whether the rounding mode in C (!) is inconsistent on Linux?

@lrhn
Copy link
Member Author

lrhn commented Apr 11, 2014

Running VM on Linux (ReleaseIA32) of tip of bleeding_edge (as of ~20 hours ago). Also tried running it with archived versions of the 1.0, 1.2 and 1.2 releases, and it exhibited the same behavior.

When the test fails, it prints the values used for x,y,z,w. Running simple code with just those values again does not fail, so the problem is not in Rectangle. Disabling optimization removes the problem too, so the problem does seem to be in the VM optimization code.

It seems all failing cases mismatch on a 4503599627370495.0 / 4503599627370496.0 pair. That is the least significant bit of an integer, but I'm not sure it's rounding.

Equality of Rectangle is defined by equality of its left/right/top/bottom edges - the rectangles are geometrically closed, so different edge values means they differ on which points they contain. The boundingBox operation only does +/- (to get the right edge from left edge+width, get width from new left and right edge) and min/max to find the new edges. Those operations should be exactly defined - I see no reason for rounding differently in some cases. Is it hitting x87 FP (80-bit precission) code because it's going to runtime? Or, as you say, not using the same rounding mode all the time?

@iposva-google
Copy link
Contributor

Lasse, I have seen other mismatches in my testing.

Srdjan, Here is the minimal non-Random code causing this:

<SNIP>
import "dart:math";

verify(i, r1, r2, expected) {
  var bb = r2.boundingBox(r1);
  print("##");
  if (bb != expected) {
    throw i;
  }
}

main() {
  var i = 0;
  // Make boundingBox code polymorphic.
  var r1 = new Rectangle(1, 2, 3, 4);
  var r2 = new Rectangle(4, 3, 2, 1);
  r1.boundingBox(r2);
  r1 = new Rectangle(1.0, 2.0, 3.0, 4.0);
  r2 = new Rectangle(4.0, 3.0, 2.0, 1.0);
  r1.boundingBox(r2);

  // Use values that will trigger bad rounding on Linux.
  var x = 10;
  var y = -4.5035996273704955E15;
  var z = -10.0;
  var w = 99.99999999999999;
  r1 = new Rectangle(x, y, z, w);
  r2 = new Rectangle(z, w, x, y);
  var bb = r1.boundingBox(r2);
  while (true) {
    print(i);
    verify(i++, r1, r2, bb);
  }
}
<SNIP>

Running it with ./out/ReleaseIA32/dart --no-use-osr --optimization-filter=boundingBox --trace-compiler --trace-deoptimization --print-inlining-tree BoundingBox.dart we see the following:
On iteration 622 we optimize the function RectangleBase.boundingBox the first time, but it is immediately deoptimized.

On iteration 1248 RectangleBase.boundingBox is optimized a second time. This time we do not deoptimize, but we fail the comparison afterwards. Looking at it in more details with Srdjan we realized that the optimized code calls out to C++ for the first time ever during execution to execute a Double subtraction (all other calls to the double operator - were intrinsified), leading us to strongly suspect that the subtraction from C++ is different ever so slightly.


Added Accepted label.

@ghost
Copy link

ghost commented Apr 11, 2014

Thank you Ivan for the test case.

While DartVM code uses SSE 64-bit precision operations, the Linux C++ code uses x87 80-bit precision doubles. The discrepancy occurs when we subtract once in 64-bit and once in 80-bit precision.

There are no issues on Mac OS X (C++ code uses SSE) and on Linux64 (C++ uses SSE as well).

@ghost
Copy link

ghost commented Apr 12, 2014

The fix is to use -mfpmath=sse in 32-bit mode for Linux. CL pending.

@ghost
Copy link

ghost commented Apr 18, 2014

Fixed in r35001

@ghost
Copy link

ghost commented Jun 10, 2014

Added Fixed label.

@lrhn lrhn added Type-Defect area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends. labels Jun 10, 2014
@lrhn lrhn assigned ghost Jun 10, 2014
This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-vm Use area-vm for VM related issues, including code coverage, and the AOT and JIT backends.
Projects
None yet
Development

No branches or pull requests

3 participants