Fix MatchNumberWithTolerance to better compare floating-point values #1145

tannergooding · 2018-10-04T02:53:44Z

This updates MatchNumberWithTolerance to better compare floating-point values and enables it on Windows.

The previous algorithm was not properly accounting for the distribution of binary floating-point values and would not allow a match for numbers that could have been reasonably considered as equivalent.

tannergooding · 2018-10-04T03:06:56Z

A good example of a place where the previous algorithm would have been less than ideal is:

static void Main(string[] args)
{
    int xi = 16777217;
    int yi = 16777219;
    int zi = yi - xi;

    float xf = xi;
    float yf = yi;
    float zf = yf - xf;

    // 16777219 - 16777217 = 2
    Console.WriteLine($"{yi} - {xi} = {zi}");

    // 16777220 - 16777216 = 4
    Console.WriteLine($"{yf:G9} - {xf:G9} = {zf:G9}");

    // 0x4B800002 - 0x4B800000 = 0x40800000
    Console.WriteLine($"0x{BitConverter.SingleToInt32Bits(yf):X8} - 0x{BitConverter.SingleToInt32Bits(xf):X8} = 0x{BitConverter.SingleToInt32Bits(zf):X8}");
}

As you can see, the inputs are 16777217 and 16777219. However, the nearest representable floats are 16777216 and 16777220, respectively. Looking at the bit representation of the values, the two bit patterns differ by only 2 (there is only one other representable value between them: 16777218). The previous algorithm would have allowed a variance of only 1.677722, which is not large enough to reach even the next representable value.
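To make the spacing argument concrete, here is a minimal sketch that steps to the next representable float by incrementing the bit pattern. The class and method names (FloatSpacing, NextUp, GapAbove) are hypothetical and are not part of this PR; the sketch assumes a positive, finite input below float.MaxValue.

```csharp
using System;

// Hypothetical helper, not part of the PR: inspects the spacing of
// System.Single values by stepping the raw bit pattern.
public static class FloatSpacing
{
    // Returns the next representable float above a positive, finite value.
    // Assumption: value > 0 and value < float.MaxValue.
    public static float NextUp(float value)
    {
        int bits = BitConverter.SingleToInt32Bits(value);
        return BitConverter.Int32BitsToSingle(bits + 1);
    }

    // Distance from value to the next representable float above it.
    public static float GapAbove(float value)
    {
        return NextUp(value) - value;
    }
}
```

With these helpers, FloatSpacing.NextUp(16777216f) is 16777218 and FloatSpacing.GapAbove(16777216f) is 2, which is larger than the 1.677722 variance the previous algorithm would have allowed at this magnitude.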

The new algorithm rounds each input (both expected and actual) to a given number of significant digits (currently defaulting to 7), gets the delta of the rounded numbers, and then ensures that the delta is within the tolerance (which is 10^-digits). This should properly account for the varying delta between representable values (for both large and small inputs).
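The steps described above can be sketched roughly as follows. This is an illustration written from the description only, not the code in the PR; the names ToleranceSketch, RoundToSignificantDigits, and MatchWithTolerance are hypothetical, and details such as the strictness of the comparison are assumptions.

```csharp
using System;

// Hypothetical sketch of "round to N significant digits, then compare the delta".
public static class ToleranceSketch
{
    // Rounds value to the given number of significant digits (assumed 0..15).
    public static double RoundToSignificantDigits(double value, int digits)
    {
        if (value == 0 || double.IsNaN(value) || double.IsInfinity(value))
            return value;

        // Number of digits before the decimal point (zero or negative when |value| < 1).
        double integralDigits = Math.Floor(Math.Log10(Math.Abs(value))) + 1;
        double scale = Math.Pow(10, integralDigits);

        // value / scale normally has magnitude in [0.1, 1), so rounding it to
        // `digits` decimal places keeps `digits` significant digits.
        return scale * Math.Round(value / scale, digits);
    }

    // Returns true when the rounded inputs differ by less than 10^-digits.
    public static bool MatchWithTolerance(double expected, double actual, int digits = 7)
    {
        double tolerance = Math.Pow(10, -digits);
        double delta = RoundToSignificantDigits(expected, digits)
                     - RoundToSignificantDigits(actual, digits);
        return Math.Abs(delta) < tolerance;
    }
}
```

Under this sketch, ToleranceSketch.MatchWithTolerance(16777216f, 16777220f) reports a match, because both values round to the same number at 7 significant digits, while MatchWithTolerance(1.0, 1.1) does not.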

sfilipi · 2018-10-04T03:36:29Z

test/Microsoft.ML.TestFramework/BaseTestBaseline.cs

@@ -23,7 +23,7 @@ namespace Microsoft.ML.Runtime.RunTests
    /// </summary>
    public abstract partial class BaseTestBaseline : BaseTestClass
    {
-        public const decimal Tolerance = 10_000_000;
+        public const int DigitsOfPrecision = 7;


nit: can it be internal?

Maybe, but the existing code already had the previous constant as public.

It looks like it is used by inherited classes. And this is "just tests" so internal vs. public may not make that much of a difference.

sfilipi · 2018-10-04T03:57:08Z

Thanks for the change, Tanner. Did you want to try enabling any tests with the PR, to see if it helps?

danmoseley · 2018-10-04T17:28:17Z

Will this allow you to remove the tolerance related disables on #1008 before you merge that?

Anipik · 2018-10-04T19:46:20Z

A couple of cases where this is failing (digitsOfPrecision = 1, variance = 0.1):
f1 = 12, f2 = 13 (MulticlassLRNonNegativeTest)
f1 = -2.49565363, f2 = -2.50574446: f1 should have been rounded to 3 but it gets rounded to 2; we need to correct this (DefaultCalibratorPerceptronTest)
f1 = 0.743881166, f2 = 0.7518015: delta comes out as -0.10000000000000009, whereas it should be -0.1 (RandomCalibratorPerceptronTest)

Anipik · 2018-10-04T19:53:28Z

Similarly, for f1 = 0.7099695, f2 = 0.6931915, the delta should be 0.02, but its value is 0.020000000000000018.

tannergooding · 2018-10-05T05:48:41Z

@Anipik, you have to decide which rounding behavior is the most desirable as each has its pros/cons. I've defaulted to the IEEE default rounding mode as that tends to have the best overall behavior for the binary floating-point format.
The delta differences are due to the IEEE floating-point format: 0.1 + 0.2 != 0.3; it equals 0.30000000000000004.
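The delta values reported above can be reproduced with a small sketch. The class and method names (DeltaRounding, RawDelta, RoundedDelta) are hypothetical; rounding the delta itself to the digits of interest is shown only as one possible way to discard the representation noise, not as the behavior of the PR.

```csharp
using System;

// Hypothetical helper: contrasts a raw double delta with one rounded
// to a fixed number of decimal digits.
public static class DeltaRounding
{
    // Plain subtraction; the result carries binary representation error.
    public static double RawDelta(double a, double b)
    {
        return a - b;
    }

    // Subtraction followed by rounding to `digits` decimal places.
    public static double RoundedDelta(double a, double b, int digits)
    {
        return Math.Round(a - b, digits);
    }
}
```

For example, DeltaRounding.RawDelta(0.7, 0.8) is not equal to -0.1, whereas DeltaRounding.RoundedDelta(0.7, 0.8, 1) is. Whether a delta that lands exactly on the tolerance should count as a match is a separate decision about the comparison operator.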

The algorithm should be generally sufficient for ML.NET, where I would expect that, when dealing with System.Single inputs, we will be getting results to within at least 4 significant digits of accuracy (but should ideally aim for 6-9). The exact error for any given scalar algorithm depends on the number of inputs and how they are ordered. Vectorized algorithms can have additional error based on the alignment of the inputs and how many elements are processed at a time.
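As a small illustration of how input ordering affects the error of a scalar algorithm, the following sketch sums the same System.Single values in two different orders. The class and method names (SummationOrder, SumLeftToRight) are hypothetical, and the extreme magnitudes are chosen only to make the effect obvious.

```csharp
// Hypothetical helper: a naive left-to-right single-precision sum.
public static class SummationOrder
{
    // Adds the values in array order, accumulating in a float.
    public static float SumLeftToRight(float[] values)
    {
        float sum = 0f;
        foreach (float v in values)
            sum += v;
        return sum;
    }
}
```

Summing { 1e20f, -1e20f, 1f } gives 1, while summing { 1e20f, 1f, -1e20f } gives 0, because the 1 is absorbed when it is added to 1e20f. A vectorized sum effectively reorders the additions in a similar way, which is one reason its result can differ from the scalar one.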

I still need to do some checking on some disabled tests, on the x86 tests, and when the CPU only supports 128-bit vectors to see if there need to be any small tweaks to the default.

sfilipi reviewed Oct 4, 2018

danmoseley requested a review from Anipik October 4, 2018 17:28

danmoseley approved these changes Oct 4, 2018

Anipik approved these changes Oct 4, 2018

eerhardt reviewed Oct 4, 2018

eerhardt approved these changes Oct 4, 2018

tannergooding added 2 commits October 5, 2018 10:18

Fix MatchNumberWithTolerance to better compare floating-point values

6a2975c

Updating CheckEqualityFromPathsCore to allow a tolerance match on Win…

8dacc8f

…dows

sfilipi approved these changes Oct 5, 2018

sfilipi merged commit 02e85cc into dotnet:master Oct 5, 2018

ghost locked as resolved and limited conversation to collaborators Mar 28, 2022
