Handle NaN optimization metric in AutoML #5031


Merged · 11 commits · Apr 24, 2020

Conversation

@najeeb-kazmi (Member) commented Apr 16, 2020

Fixes #4663
Fixes #5042

In AutoML, CrossValSummaryRunner is invoked if the dataset contains fewer than 15,000 rows. It runs 10-fold cross-validation and then returns the model from the fold with the best optimization metric. It does this by finding the index of the model with the best metric in the list of run results and returning the element at that index.

If the metric in all 10 folds is NaN, then this index is -1, resulting in an IndexOutOfRangeException.
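The failure mode follows from IEEE 754 comparison semantics. Below is a minimal sketch (not the actual `BestResultUtil.GetIndexOfBestScore` implementation, whose details are not shown here) of why an all-NaN score list yields index -1: every comparison against NaN is false, so the best index never advances past its initial value.

```csharp
using System;
using System.Collections.Generic;

class NaNIndexDemo
{
    // Simplified stand-in for BestResultUtil.GetIndexOfBestScore.
    // NaN > x and NaN < x are both false, so with all-NaN scores the
    // best index stays at its initial value of -1.
    static int GetIndexOfBestScore(IReadOnlyList<double> scores, bool isMaximizing)
    {
        int bestIndex = -1;
        double bestScore = isMaximizing ? double.NegativeInfinity : double.PositiveInfinity;
        for (int i = 0; i < scores.Count; i++)
        {
            bool better = isMaximizing ? scores[i] > bestScore : scores[i] < bestScore;
            if (better)
            {
                bestScore = scores[i];
                bestIndex = i;
            }
        }
        return bestIndex;
    }

    static void Main()
    {
        var allNaN = new[] { double.NaN, double.NaN, double.NaN };
        Console.WriteLine(GetIndexOfBestScore(allNaN, isMaximizing: true));  // -1
        // Indexing the run results with -1 then throws IndexOutOfRangeException.
    }
}
```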

@najeeb-kazmi najeeb-kazmi requested a review from a team as a code owner April 16, 2020 00:32
@@ -70,6 +71,9 @@ internal class CrossValSummaryRunner<TMetrics> : IRunner<RunDetail<TMetrics>>

// Get the model from the best fold
var bestFoldIndex = BestResultUtil.GetIndexOfBestScore(trainResults.Select(r => r.score), _optimizingMetricInfo.IsMaximizing);
// bestFoldIndex will be -1 if the optimization metric for all folds is NaN.
// In this case, return model from the first fold.
bestFoldIndex = bestFoldIndex != -1 ? bestFoldIndex : 0;
@justinormont (Contributor) commented Apr 16, 2020

I'd also look into places where the metric gets compared.

For instance below in GetIndexClosestToAverage():

private static int GetIndexClosestToAverage(IEnumerable<double> values, double average)
{
    int avgFoldIndex = -1;
    var smallestDistFromAvg = double.PositiveInfinity;
    for (var i = 0; i < values.Count(); i++)
    {
        var distFromAvg = Math.Abs(values.ElementAt(i) - average);
        if (distFromAvg < smallestDistFromAvg || smallestDistFromAvg == double.PositiveInfinity)
        {
            smallestDistFromAvg = distFromAvg;
            avgFoldIndex = i;
        }
    }
    return avgFoldIndex;
}
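The NaN hazard in this helper is concrete: if the first fold's value is NaN, `smallestDistFromAvg` becomes NaN on iteration 0 (via the `== double.PositiveInfinity` branch), every later comparison is then false, and fold 0 is returned regardless of the remaining folds. A small self-contained demo of that behavior, copying the helper as quoted above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ClosestToAverageDemo
{
    // Copy of the helper quoted above, to illustrate its NaN behavior.
    static int GetIndexClosestToAverage(IEnumerable<double> values, double average)
    {
        int avgFoldIndex = -1;
        var smallestDistFromAvg = double.PositiveInfinity;
        for (var i = 0; i < values.Count(); i++)
        {
            var distFromAvg = Math.Abs(values.ElementAt(i) - average);
            if (distFromAvg < smallestDistFromAvg || smallestDistFromAvg == double.PositiveInfinity)
            {
                smallestDistFromAvg = distFromAvg;
                avgFoldIndex = i;
            }
        }
        return avgFoldIndex;
    }

    static void Main()
    {
        // Leading NaN poisons the search: fold 0 wins despite being NaN.
        Console.WriteLine(GetIndexClosestToAverage(new[] { double.NaN, 0.5, 0.6 }, 0.55));  // 0
        // Without NaN, the genuinely closest fold is found.
        Console.WriteLine(GetIndexClosestToAverage(new[] { 0.5, 0.9, 0.6 }, 0.58));         // 2
    }
}
```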

While GetIndexClosestToAverage() could be adjusted to handle CV folds returning NaN, it would be better to remove the function and instead return a new instance of the metric class with the actual averages. The current function was created before the AutoML code had access to create a new instance of the metric classes, so it just returned the fold closest to the average.

I would also look a bit further, at where AutoML compares the returned metrics from each model in the sweep when choosing the best one. The comparison to NaN may also be there. #Resolved

@najeeb-kazmi (Member, Author) commented Apr 17, 2020

I have changed the calculation of the average to exclude any folds with NaN metrics. In finding the best run, there are no comparisons to NaN, but an index of -1 is returned if all the runs have a NaN metric. I have added a warning there checking for that case.

As for returning a Metrics object containing the averages of the metrics across the folds, I think this would be out of scope for this particular issue. Also, since we only have the type TMetrics here, creating the right metric would be fairly involved. #Resolved

Contributor commented:

Thanks. Looks good.

In the future, we can use the MetricsStatistics class to do the averaging so that we're not duplicating functionality. It has the benefit of handling the PerClassLogLoss, though I suspect it's not properly handling the class ordering differences.

/// <summary>
/// The <see cref="RegressionMetricsStatistics"/> class holds summary
/// statistics over multiple observations of <see cref="RegressionMetrics"/>.
/// </summary>
public sealed class RegressionMetricsStatistics : IMetricsStatistics<RegressionMetrics>

// TODO:
// Figure out whether class label ordering can be different across different folds.
// If yes, whether it is possible to get the information from the objects available.
perClassLogLoss: null);
@justinormont (Contributor) commented Apr 22, 2020

The classes can be different in ordering and some can be missing. The missing arises from small datasets or skewed labels where all given classes may not be represented in the training dataset.

I'm not sure what our log-loss code does for labels which are not in the test dataset (perhaps 0.0 or NaN). #Resolved

@justinormont (Contributor) commented Apr 22, 2020

Would it be possible to return the closest to average fold's perClassLogLoss? #Resolved

Contributor commented:

Thanks. Current method looks good for the moment.

For averaging the ConfusionMatrix in the future, it stores the class names within itself. The class names can be used to align the classes and allow for proper averaging.

/// <param name="labelNames">The predicted classes names, or the indexes of the classes, if the names are missing.</param>

It mentions the class index may also be stored, though I'm unsure how that could arise. Perhaps for binary classification there are no class names to store?


private static double GetAverageOfNonNaNScores(IEnumerable<double> results)
{
    var newResults = results.Where(r => !double.IsNaN(r));
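The quoted diff is truncated at the filter line. A hedged sketch of how such a helper might continue (the actual PR code may differ; in particular, the choice to return NaN for an all-NaN input is an assumption here, made so that `Average()` does not throw on an empty sequence):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AverageDemo
{
    // Sketch only: filters NaN folds before averaging, mirroring the
    // truncated helper quoted above. Returning NaN when every fold is
    // NaN is an assumption, not necessarily what the PR does.
    static double GetAverageOfNonNaNScores(IEnumerable<double> results)
    {
        var newResults = results.Where(r => !double.IsNaN(r));
        if (!newResults.Any())
            return double.NaN;  // nothing left to average
        return newResults.Average();
    }

    static void Main()
    {
        Console.WriteLine(GetAverageOfNonNaNScores(new[] { 1.0, double.NaN, 3.0 }));   // 2
        Console.WriteLine(GetAverageOfNonNaNScores(new[] { double.NaN, double.NaN })); // NaN
    }
}
```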
Contributor commented:

What are your thoughts on metrics that can be +/-Infinity? Should they be included in the average or not? Any trade-offs?

Current behavior:

  • If the input includes double.PositiveInfinity or double.NegativeInfinity (but not both), GetAverageOfNonNaNScores() returns average = +/-Inf
  • If the input includes both double.PositiveInfinity and double.NegativeInfinity, GetAverageOfNonNaNScores() returns average = NaN
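Both bullets follow directly from IEEE 754 arithmetic as implemented by LINQ's `Average`: a single infinity dominates the sum, and +Inf + -Inf is NaN. A quick standalone check:

```csharp
using System;
using System.Linq;

class InfinityAverageDemo
{
    static void Main()
    {
        // One infinity dominates the sum: average is +Inf.
        Console.WriteLine(double.IsPositiveInfinity(
            new[] { 1.0, double.PositiveInfinity }.Average()));                 // True

        // +Inf + -Inf is NaN under IEEE 754, so the average is NaN.
        Console.WriteLine(double.IsNaN(
            new[] { double.PositiveInfinity, double.NegativeInfinity }.Average()));  // True
    }
}
```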

@najeeb-kazmi (Member, Author) commented:

So I've looked at the calculation of LogLoss, which is the main metric I would be concerned about being infinity. It looks to me like it can only be Double.PositiveInfinity if the aggregation over all the rows in the evaluation set overflows Double.MaxValue.

Here's the code for binary classification:

return Double.IsNaN(_logLoss) ? Double.NaN : (_numLogLossPositives + _numLogLossNegatives > 0)
    ? _logLoss / (_numLogLossPositives + _numLogLossNegatives) : 0;

This is only Double.PositiveInfinity if _logLoss is, which will only be the case if the accumulated sum overflows, because the per-row logloss cannot be Double.PositiveInfinity: prob will never be 0 if the label is positive, and prob will never be 1 if the label is negative.

Double logloss;
if (!Single.IsNaN(prob))
{
    if (_label > 0)
    {
        // REVIEW: Should we bring back the option to use ln instead of log2?
        logloss = -Math.Log(prob, 2);
    }
    else
        logloss = -Math.Log(1.0 - prob, 2);
}
else
    logloss = Double.NaN;
UnweightedCounters.Update(_score, prob, _label, logloss, 1);

This does mean that if prob is NaN for any row, then logloss will be NaN for that row, and _logLoss for the entire evaluation set will be NaN.

@najeeb-kazmi (Member, Author) commented:

Similarly for multiclass classification:

LogLoss will only be Double.PositiveInfinity if _totalLogLoss is:

public double LogLoss { get { return _numInstances > 0 ? _totalLogLoss / _numInstances : 0; } }

and that will only be the case if the accumulated sum overflows, because the per-row logloss itself will never be Double.PositiveInfinity, as it can only be as large as -Math.Log(Epsilon), which is approximately 34.54:

double logloss;
if (intLabel < _scoresArr.Length)
{
    // REVIEW: This assumes that the predictions are probabilities, not just relative scores
    // for the classes. Is this a correct assumption?
    float p = Math.Min(1, Math.Max(Epsilon, _scoresArr[intLabel]));
    logloss = -Math.Log(p);
}
else
{
    // Penalize logloss if the label was not seen during training
    logloss = -Math.Log(Epsilon);
    _numUnknownClassInstances++;
}
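The "approximately 34.54" bound can be checked directly. Note the value of `Epsilon` is not shown in the snippet; `1e-15` is an assumption here, chosen because -ln(1e-15) matches the quoted 34.54:

```csharp
using System;

class LogLossBoundDemo
{
    static void Main()
    {
        // Assumed clamp value; not visible in the quoted snippet.
        const double Epsilon = 1e-15;

        // Because p is clamped to [Epsilon, 1], the per-row multiclass
        // log-loss -Math.Log(p) is bounded by -ln(Epsilon).
        Console.WriteLine(-Math.Log(Epsilon));  // ≈ 34.54
    }
}
```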

@najeeb-kazmi (Member, Author) commented:

I agree with the principle of reporting NaN log loss if even a single row has NaN probability and log loss: if there is a problem, the user needs to know via the metric. While it might be fine to suppress this for a handful of rows, that leads to questions about what to do when many or most rows have the problem. So I think it is better not to suppress NaNs in the calculation of total log loss.

By the same principle, we should not be suppressing infinities in the metrics for all the folds. If one fold has an infinity, the average returned should be infinity.

Contributor commented:

For log-loss, the Infinity occurs when the model is perfectly confident (p=0 or p=1) and wrong, for any of the predicted labels in the scoring dataset.

I created a dotnet fiddle to show this:
https://dotnetfiddle.net/DPIn5Y

It's demonstrating the code for binary log-loss:

Double logloss;
if (!Single.IsNaN(prob))
{
    if (_label > 0)
    {
        // REVIEW: Should we bring back the option to use ln instead of log2?
        logloss = -Math.Log(prob, 2);
    }
    else
        logloss = -Math.Log(1.0 - prob, 2);
}
else
    logloss = Double.NaN;
UnweightedCounters.Update(_score, prob, _label, logloss, 1);
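A minimal standalone reproduction of that code path (the helper name `RowLogLoss` is introduced here for illustration; the linked fiddle demonstrates the same point): a model that is perfectly confident and wrong, prob = 0 for a positive label or prob = 1 for a negative one, produces -log2(0) = +Infinity for that row.

```csharp
using System;

class BinaryLogLossDemo
{
    // Hypothetical wrapper around the quoted per-row binary log-loss logic.
    static double RowLogLoss(float label, float prob)
    {
        if (Single.IsNaN(prob))
            return Double.NaN;
        return label > 0 ? -Math.Log(prob, 2) : -Math.Log(1.0 - prob, 2);
    }

    static void Main()
    {
        // Confident and wrong: -log2(0) = +Infinity.
        Console.WriteLine(double.IsPositiveInfinity(RowLogLoss(1f, 0f)));  // True
        Console.WriteLine(double.IsPositiveInfinity(RowLogLoss(0f, 1f)));  // True

        // A 50/50 guess costs exactly one bit.
        Console.WriteLine(RowLogLoss(1f, 0.5f));                           // 1
    }
}
```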

@najeeb-kazmi (Member, Author) commented:

Ah yes, that's the true label, not the predicted label.

Contributor commented:

I filed an issue to investigate whether we should threshold the probability for log-loss in binary classification. This would cause the AutoML code (discussed above) to not receive an Infinity value.

Issue: #5055 Investigate thresholding binary log-loss

@najeeb-kazmi najeeb-kazmi requested a review from a team as a code owner April 23, 2020 03:46
@justinormont (Contributor) left a review:

LGTM; a few items in the comments can be filed as future work in issues.

We'll also need an mlnet-core approver, since we modified the multiclass and binary metrics classes to allow setting the confusion matrix.

@najeeb-kazmi najeeb-kazmi merged commit db84060 into dotnet:master Apr 24, 2020
Successfully merging this pull request may close these issues:

  • Return average metrics in AutoML CrossValSummaryRunner
  • NaN metric value handling in AutoML

3 participants