-
Notifications
You must be signed in to change notification settings - Fork 1.9k
Enabling FFM tests #1206
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enabling FFM tests #1206
Conversation
// see https://github.com/dotnet/machinelearning/issues/404 | ||
// in Linux, the clang sqrt() results vary highly from the ones in mac and Windows. | ||
if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux)) | ||
RunAllTests(binaryPredictors, binaryClassificationDatasets, digitsOfPrecision:4); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it mean that we only check "9487" in "9487.05"? The issue you opened said that the difference happens at the 17th decimal, so probably we can increate it to 7? #Pending
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the small subset that I explored, there were differences starting on the 4th decimal digit, for Linux. Should update the issue.
In reply to: 223888003 [](ancestors = 223888003)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
giving 5 another try after updating the comparison code.
In reply to: 224267254 [](ancestors = 224267254,223888003)
/// <paramref name="toCompare"/> objects are used for comparison only. | ||
/// </summary> | ||
/// <returns>Whether this test succeeded.</returns> | ||
protected bool TestCore(RunContextBase ctx, string cmdName, string args, int digitsOfPrecision, params PathArgument[] toCompare) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) a single space was added to the beginning of the method, messing up the alignment.
Can you also align the ///
comments as well? #Closed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -355,7 +360,14 @@ protected bool CheckTestOutputMatchesTrainTest(string trainTestOutPath, string t | |||
/// </summary> | |||
public abstract partial class TestDmCommandBase : TestCommandBase | |||
{ | |||
private bool TestCoreCore(RunContextBase ctx, string cmdName, string dataPath, PathArgument.Usage situation, OutputPath inModelPath, OutputPath outModelPath, string loaderArgs, string extraArgs, params PathArgument[] toCompare) | |||
private bool TestCoreCore(RunContextBase ctx, string cmdName, string dataPath, PathArgument.Usage situation, | |||
OutputPath inModelPath, OutputPath outModelPath, string loaderArgs, string extraArgs, params PathArgument[] toCompare){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) opening curly brace should go on a new line. #Resolved
@@ -407,6 +419,11 @@ protected bool TestCore(RunContextBase ctx, string cmdName, string dataPath, str | |||
return TestCoreCore(ctx, cmdName, dataPath, PathArgument.Usage.DataModel, null, ctx.ModelPath(), loaderArgs, extraArgs, toCompare); | |||
} | |||
|
|||
protected bool TestCore(RunContextBase ctx, string cmdName, string dataPath, string loaderArgs, string extraArgs, int digitsOfPrecision, params PathArgument[] toCompare) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
digitsOfPrecision [](start = 126, length = 17)
I don't see digitsOfPrecision
being used in the method. Am I missing something? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -1944,8 +1944,10 @@ public void BinaryClassifierFieldAwareFactorizationMachineTest() | |||
|
|||
// see https://github.com/dotnet/machinelearning/issues/404 | |||
// in Linux, the clang sqrt() results vary highly from the ones in mac and Windows. | |||
// goign for 3 digits of precision, because the range of search is (-0.0001 - 0.0001) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
goign? #Resolved
@@ -1944,8 +1944,10 @@ public void BinaryClassifierFieldAwareFactorizationMachineTest() | |||
|
|||
// see https://github.com/dotnet/machinelearning/issues/404 | |||
// in Linux, the clang sqrt() results vary highly from the ones in mac and Windows. | |||
// goign for 3 digits of precision, because the range of search is (-0.0001 - 0.0001) | |||
// for one of the values, and the actual value is 0.00099999999999989 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
0.00099999999999989 is close enough to 0.001; it looks like the difference is smaller than 10^-7. Why do we need a larger tolerance? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but it is expecting a number in the -0.0001 to 0.0001 range. Still looking into as the test artifacts are not saved.
In reply to: 224266472 [](ancestors = 224266472)
…as treating floats with scientific notation as strings; amending the regex to pick those up. Our custom Round method was rounding oen digit short than the digitsOfPrecision. Seems like Math.Round is not doing a bad job for the subset of tests i run. the delta calculated in the basetestbaseline, for some cases were passing the allowedVariance by a small fraction, outside of the digits we care to compare. Rounding that to truncate those digits before submitting it to the range comparison. Removing the digitsOfPrecision for a test drive on the CI, from some of the FFM tests, as it seems like they are doing ok without it for my local windows/linux runs.
@@ -521,26 +521,13 @@ private static void MatchNumberWithTolerance(MatchCollection firstCollection, Ma | |||
double f2 = double.Parse(secondCollection[i].ToString()); | |||
|
|||
double allowedVariance = Math.Pow(10, -digitsOfPrecision); | |||
double delta = Round(f1, digitsOfPrecision) - Round(f2, digitsOfPrecision); | |||
double delta = Math.Round(f1, digitsOfPrecision) - Math.Round(f2, digitsOfPrecision); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should not be using Math.Round
, it does not correctly handle significant digits that land on the left side of the decimal point (the integer part of the number). It only deals with the fractional half. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re-introduced Round, thanks for the feedback. I moved to using it only on the difference, as rounding was not helping on a few cases.
In reply to: 224629061 [](ancestors = 224629061)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be a good comment in the code - why we have a Round
method.
In reply to: 224671986 [](ancestors = 224671986,224629061)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
…the comparison to fail for certain cases.
@@ -354,7 +354,7 @@ public string GetLoaderTransformSettings(TestDataset dataset) | |||
string[] extraSettings = null, string extraTag = "", bool summary = false, int digitsOfPrecision = DigitsOfPrecision) | |||
{ | |||
Contracts.Assert(IsActive); | |||
Run_TrainTest(predictor, dataset, extraSettings, extraTag, summary: summary, digitsOfPrecision: digitsOfPrecision); | |||
// Run_TrainTest(predictor, dataset, extraSettings, extraTag, summary: summary, digitsOfPrecision: digitsOfPrecision); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this a mistake? #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} | ||
return all; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) you removed a space here. #Closed
This might resolve the fluctuations in precision for the MulticlassTreefeaturizedLR, and therefore not need to disable it like in #1185 . |
@ErcinDedeoglu - you mean the C# ‘decimal’ type? ML.NET doesn’t support that type. You would need to convert to a ‘float’, run it through ML.NET, and then convert it back to a decimal. |
Dear @eerhardt, What's your suggestion?, |
Yes, that is the downfall of using floating point numbers over a number that has a precise representation. In the machine learning world, usually the precision is not that important. I don't know of anything in the industry that uses a precise number representation. Everything I've seen uses floating point numbers. |
@eerhardt Stock market uses :) |
Resolves part of #404