Fix MatchNumberWithTolerance to better compare floating-point values #1145

Merged
merged 2 commits into from
Oct 5, 2018
Conversation

tannergooding
Member

This updates MatchNumberWithTolerance to better compare floating-point values and enables it on Windows.

The previous algorithm was not properly accounting for the distribution of binary floating-point values and would not allow a match for numbers that could have been reasonably considered as equivalent.

@tannergooding
Member Author

A good example of a place where the previous algorithm would have been less than ideal is:

static void Main(string[] args)
{
    int xi = 16777217;
    int yi = 16777219;
    int zi = yi - xi;

    float xf = xi;
    float yf = yi;
    float zf = yf - xf;

    // 16777219 - 16777217 = 2
    Console.WriteLine($"{yi} - {xi} = {zi}");

    // 16777220 - 16777216 = 4
    Console.WriteLine($"{yf:G9} - {xf:G9} = {zf:G9}");

    // 0x4B800002 - 0x4B800000 = 0x40800000
    Console.WriteLine($"0x{BitConverter.SingleToInt32Bits(yf):X8} - 0x{BitConverter.SingleToInt32Bits(xf):X8} = 0x{BitConverter.SingleToInt32Bits(zf):X8}");
}

As you can see, the inputs are 16777217 and 16777219. However, the nearest representable floats are 16777216 and 16777220, respectively. Looking at the bit representation of the values, these numbers differ by only 2 units in the last place (there is only one other representable value between them: 16777218). The previous algorithm would have allowed a variance of only 1.677722, which is not large enough to reach even the next representable value.
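To make the gap concrete, here is a minimal sketch that measures the spacing between adjacent floats near 16777216 and compares it with the old variance. `NextUp` and `OldVariance` are hypothetical helpers, and the formula `value / 10_000_000` is an assumption inferred from the 1.677722 figure above, not the confirmed previous implementation.

```csharp
using System;

public static class UlpSketch
{
    // Hypothetical helper: next representable float above a positive, finite value.
    // Incrementing the raw bit pattern by one moves to the adjacent float.
    public static float NextUp(float value)
    {
        int bits = BitConverter.SingleToInt32Bits(value);
        return BitConverter.Int32BitsToSingle(bits + 1);
    }

    // Assumption: the old variance scaled the value by 1 / 10_000_000.
    public static double OldVariance(double value) => value / 10_000_000;

    public static void Main()
    {
        float x = 16777216f;
        float gap = NextUp(x) - x;

        Console.WriteLine(gap);                          // 2
        Console.WriteLine(OldVariance(16777220.0) < gap); // True
    }
}
```

Under that assumption the allowed variance is smaller than the distance to the neighbouring float, so two values one step apart could never match.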

The new algorithm rounds each input (both expected and actual) to a given number of significant digits (currently defaulting to 7), gets the delta of the rounded numbers, and then ensures that is within the tolerance (which is 10^-digits). This should properly account for the varying delta between representable values (for both large and small inputs).
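The steps described above can be sketched roughly as follows. This is an illustration built only from that description, not the code in this PR: the names `RoundToSignificantDigits` and `MatchWithTolerance` are hypothetical, and the tolerance is applied as an absolute bound on the delta of the rounded values.

```csharp
using System;

public static class ToleranceSketch
{
    // Hypothetical helper: rounds a value to the given number of significant digits.
    public static double RoundToSignificantDigits(double value, int digits)
    {
        if (value == 0 || double.IsNaN(value) || double.IsInfinity(value))
            return value;

        // Count of digits before the decimal point (3 for 123.4, 0 for 0.5).
        double integralDigits = Math.Floor(Math.Log10(Math.Abs(value))) + 1;
        double scale = Math.Pow(10, integralDigits);

        // Normalize into [0.1, 1), round there, then scale back.
        // Math.Round uses MidpointRounding.ToEven by default.
        return scale * Math.Round(value / scale, digits);
    }

    // Hypothetical comparison: the delta of the rounded inputs must be within 10^-digits.
    public static bool MatchWithTolerance(double expected, double actual, int digits = 7)
    {
        double tolerance = Math.Pow(10, -digits);
        double delta = RoundToSignificantDigits(expected, digits)
                     - RoundToSignificantDigits(actual, digits);
        return Math.Abs(delta) <= tolerance;
    }

    public static void Main()
    {
        // The two floats from the example above agree to 7 significant digits.
        Console.WriteLine(MatchWithTolerance(16777216f, 16777220f)); // True
        Console.WriteLine(MatchWithTolerance(1.0, 1.1));             // False
    }
}
```

Both inputs go through the same rounding path, so values that agree to the requested number of significant digits produce identical rounded results and a delta of exactly zero.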

@@ -23,7 +23,7 @@ namespace Microsoft.ML.Runtime.RunTests
     /// </summary>
     public abstract partial class BaseTestBaseline : BaseTestClass
     {
-        public const decimal Tolerance = 10_000_000;
+        public const int DigitsOfPrecision = 7;
Member

public

nit: can it be internal?

Member Author

Maybe, but the existing code already had the previous constant as public.

Member

@eerhardt eerhardt Oct 4, 2018

It looks like it is used by inherited classes. And this is "just tests" so internal vs. public may not make that much of a difference.


@sfilipi
Member

sfilipi commented Oct 4, 2018

Thanks for the change, Tanner. Did you want to try enabling any tests with the PR, to see if it helps?


@danmoseley
Member

Will this allow you to remove the tolerance-related disables on #1008 before you merge that?

@danmoseley danmoseley requested a review from Anipik October 4, 2018 17:28
Member

@eerhardt eerhardt left a comment

:shipit:

@Anipik
Contributor

Anipik commented Oct 4, 2018

A couple of cases where this is failing (digitsOfPrecision = 1, variance = 0.1):
f1 = 12, f2 = 13 (MulticlassLRNonNegativeTest)
f1 = -2.49565363, f2 = -2.50574446: f1 should have been rounded to 3 but it gets rounded to 2; we need to correct this (DefaultCalibratorPerceptronTest)
f1 = 0.743881166, f2 = 0.7518015: the delta comes out as -0.10000000000000009, whereas it should be -0.1 (RandomCalibratorPerceptronTest)

@Anipik
Contributor

Anipik commented Oct 4, 2018

Similarly, for f1 = 0.7099695, f2 = 0.6931915 the delta should be 0.02, but its value is 0.020000000000000018.

@tannergooding
Member Author

@Anipik, you have to decide which rounding behavior is the most desirable as each has its pros/cons. I've defaulted to the IEEE default rounding mode as that tends to have the best overall behavior for the binary floating-point format.
The delta differences are due to the IEEE floating-point format: 0.1 + 0.2 != 0.3; it equals 0.30000000000000004.

The algorithm should be generally sufficient for ML.NET, where I would expect that, when dealing with System.Single inputs, we will be getting results to within at least 4 significant digits of accuracy (but should ideally aim for 6-9). The exact error for any given scalar algorithm depends on the number of inputs and how they are ordered. Vectorized algorithms can have additional error based on the alignment of the inputs and how many elements are processed at a time.

  • I still need to check some disabled tests, the x86 tests, and the case where the CPU only supports 128-bit vectors, to see whether any small tweaks to the default are needed
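The two effects mentioned in this reply, the choice of rounding mode and deltas that are not exactly the expected decimal value, can be reproduced with a small sketch. It uses only standard `System.Math` calls; the class name `RoundingSketch` is hypothetical and the sample values are chosen for illustration rather than taken from the failing tests.

```csharp
using System;
using System.Globalization;

public static class RoundingSketch
{
    public static void Main()
    {
        // Binary floating point cannot represent 0.1, 0.2 or 0.3 exactly,
        // so the sum carries a small representation error.
        double sum = 0.1 + 0.2;
        Console.WriteLine(sum == 0.3);                                        // False
        Console.WriteLine(sum.ToString("G17", CultureInfo.InvariantCulture)); // 0.30000000000000004

        // Math.Round defaults to round-half-to-even, the IEEE default mode.
        Console.WriteLine(Math.Round(2.5));                                // 2
        Console.WriteLine(Math.Round(-2.5));                               // -2

        // Round-half-away-from-zero is the behaviour many people expect instead.
        Console.WriteLine(Math.Round(2.5, MidpointRounding.AwayFromZero)); // 3
    }
}
```

The rounding mode only changes the result for exact midpoints, so which mode is preferable depends on how often the compared values land on one.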

Member

@sfilipi sfilipi left a comment

:shipit:

@sfilipi sfilipi merged commit 02e85cc into dotnet:master Oct 5, 2018
@ghost ghost locked as resolved and limited conversation to collaborators Mar 28, 2022