Handle NaN optimization metric in AutoML #5031


Merged · 11 commits · Apr 24, 2020

Conversation

@najeeb-kazmi (Member) commented Apr 16, 2020

Fixes #4663
Fixes #5042

In AutoML, CrossValSummaryRunner is invoked if the dataset contains fewer than 15,000 rows. It runs 10-fold cross-validation and then returns the model from the fold with the best optimization metric. It does this by finding the index of the model with the best metric in the list of run results and returning the element at that index.

If the metric in all 10 folds is NaN, then this index is -1, resulting in an IndexOutOfRangeException.
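The failure mode follows from IEEE 754 comparison semantics. Below is a minimal sketch (not the actual `BestResultUtil.GetIndexOfBestScore` implementation, whose details are not shown here) of why an all-NaN score list yields index -1: every comparison against NaN is false, so the best index never advances past its initial value.

```csharp
using System;
using System.Collections.Generic;

class NaNIndexDemo
{
    // Simplified stand-in for BestResultUtil.GetIndexOfBestScore.
    // NaN > x and NaN < x are both false, so with all-NaN scores the
    // best index stays at its initial value of -1.
    static int GetIndexOfBestScore(IReadOnlyList<double> scores, bool isMaximizing)
    {
        int bestIndex = -1;
        double bestScore = isMaximizing ? double.NegativeInfinity : double.PositiveInfinity;
        for (int i = 0; i < scores.Count; i++)
        {
            bool better = isMaximizing ? scores[i] > bestScore : scores[i] < bestScore;
            if (better)
            {
                bestScore = scores[i];
                bestIndex = i;
            }
        }
        return bestIndex;
    }

    static void Main()
    {
        var allNaN = new[] { double.NaN, double.NaN, double.NaN };
        Console.WriteLine(GetIndexOfBestScore(allNaN, isMaximizing: true));  // -1
        // Indexing the run results with -1 then throws IndexOutOfRangeException.
    }
}
```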

@najeeb-kazmi najeeb-kazmi requested a review from a team as a code owner April 16, 2020 00:32
@@ -70,6 +71,9 @@ internal class CrossValSummaryRunner<TMetrics> : IRunner<RunDetail<TMetrics>>

// Get the model from the best fold
var bestFoldIndex = BestResultUtil.GetIndexOfBestScore(trainResults.Select(r => r.score), _optimizingMetricInfo.IsMaximizing);
// bestFoldIndex will be -1 if the optimization metric for all folds is NaN.
// In this case, return model from the first fold.
bestFoldIndex = bestFoldIndex != -1 ? bestFoldIndex : 0;
@justinormont (Contributor) commented Apr 16, 2020

I'd also look into places where the metric gets compared.

For instance below in GetIndexClosestToAverage():

private static int GetIndexClosestToAverage(IEnumerable<double> values, double average)
{
    int avgFoldIndex = -1;
    var smallestDistFromAvg = double.PositiveInfinity;
    for (var i = 0; i < values.Count(); i++)
    {
        var distFromAvg = Math.Abs(values.ElementAt(i) - average);
        if (distFromAvg < smallestDistFromAvg || smallestDistFromAvg == double.PositiveInfinity)
        {
            smallestDistFromAvg = distFromAvg;
            avgFoldIndex = i;
        }
    }
    return avgFoldIndex;
}
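The NaN hazard in this helper is concrete: if the first fold's value is NaN, `smallestDistFromAvg` becomes NaN on iteration 0 (via the `== double.PositiveInfinity` branch), every later comparison is then false, and fold 0 is returned regardless of the remaining folds. A small self-contained demo of that behavior, copying the helper as quoted above:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ClosestToAverageDemo
{
    // Copy of the helper quoted above, to illustrate its NaN behavior.
    static int GetIndexClosestToAverage(IEnumerable<double> values, double average)
    {
        int avgFoldIndex = -1;
        var smallestDistFromAvg = double.PositiveInfinity;
        for (var i = 0; i < values.Count(); i++)
        {
            var distFromAvg = Math.Abs(values.ElementAt(i) - average);
            if (distFromAvg < smallestDistFromAvg || smallestDistFromAvg == double.PositiveInfinity)
            {
                smallestDistFromAvg = distFromAvg;
                avgFoldIndex = i;
            }
        }
        return avgFoldIndex;
    }

    static void Main()
    {
        // Leading NaN poisons the search: fold 0 wins despite being NaN.
        Console.WriteLine(GetIndexClosestToAverage(new[] { double.NaN, 0.5, 0.6 }, 0.55));  // 0
        // Without NaN, the genuinely closest fold is found.
        Console.WriteLine(GetIndexClosestToAverage(new[] { 0.5, 0.9, 0.6 }, 0.58));         // 2
    }
}
```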

While GetIndexClosestToAverage() could be adjusted to handle CV folds returning NaN, it would be better to remove the function and instead return a new instance of the metric class with the actual averages. The current function was created before the AutoML code had access to create a new instance of the metric classes, so it just returned the fold closest to the average.

I would also look a bit further, at where AutoML compares the returned metrics from each model in the sweep when choosing the best one. The comparison to NaN may also be there. #Resolved

@najeeb-kazmi (Member, Author) commented Apr 17, 2020

I have changed the calculation of the average to exclude any folds with NaN metrics. In finding the best run, there are no comparisons to NaN, but an index of -1 is returned if all the runs have a NaN metric. I have added a warning there checking for that case.

As for returning a Metrics object containing the averages of the metrics across the folds, I think this would be out of scope for this particular issue. Also, since we only have the type TMetrics here, creating the right metric would be fairly involved. #Resolved

Contributor commented:

Thanks. Looks good.

In the future, we can use the MetricsStatistics class to do the averaging so that we're not duplicating functionality. It has the benefit of handling the PerClassLogLoss, though I suspect it's not properly handling the class ordering differences.

/// <summary>
/// The <see cref="RegressionMetricsStatistics"/> class holds summary
/// statistics over multiple observations of <see cref="RegressionMetrics"/>.
/// </summary>
public sealed class RegressionMetricsStatistics : IMetricsStatistics<RegressionMetrics>

// TODO:
// Figure out whether class label ordering can be different across different folds.
// If yes, whether it is possible to get the information from the objects available.
perClassLogLoss: null);
@justinormont (Contributor) commented Apr 22, 2020

The classes can be different in ordering and some can be missing. The missing arises from small datasets or skewed labels where all given classes may not be represented in the training dataset.

I'm not sure what our log-loss code does for labels which are not in the test dataset (perhaps 0.0 or NaN). #Resolved

@justinormont (Contributor) commented Apr 22, 2020

Would it be possible to return the closest to average fold's perClassLogLoss? #Resolved

Contributor commented:

Thanks. Current method looks good for the moment.

For averaging the ConfusionMatrix in the future, it stores the class names within itself. The class names can be used to align the classes and allow for proper averaging.

/// <param name="labelNames">The predicted classes names, or the indexes of the classes, if the names are missing.</param>

It mentions the class index may also be stored, though I'm unsure how that could arise. Perhaps for binary classification there are no class names to store?


private static double GetAverageOfNonNaNScores(IEnumerable<double> results)
{
    var newResults = results.Where(r => !double.IsNaN(r));
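The quoted diff is truncated at the filter line. A hedged sketch of how such a helper might continue (the actual PR code may differ; in particular, the choice to return NaN for an all-NaN input is an assumption here, made so that `Average()` does not throw on an empty sequence):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class AverageDemo
{
    // Sketch only: filters NaN folds before averaging, mirroring the
    // truncated helper quoted above. Returning NaN when every fold is
    // NaN is an assumption, not necessarily what the PR does.
    static double GetAverageOfNonNaNScores(IEnumerable<double> results)
    {
        var newResults = results.Where(r => !double.IsNaN(r));
        if (!newResults.Any())
            return double.NaN;  // nothing left to average
        return newResults.Average();
    }

    static void Main()
    {
        Console.WriteLine(GetAverageOfNonNaNScores(new[] { 1.0, double.NaN, 3.0 }));   // 2
        Console.WriteLine(GetAverageOfNonNaNScores(new[] { double.NaN, double.NaN })); // NaN
    }
}
```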
Contributor commented:

What are your thoughts on metrics that can be +/-Infinity? Should they be included in the average or not? Any trade-offs?

Current behavior:

  • If the input includes double.PositiveInfinity or double.NegativeInfinity (but not both), GetAverageOfNonNaNScores() returns average = +/-Inf
  • If the input includes both double.PositiveInfinity and double.NegativeInfinity, GetAverageOfNonNaNScores() returns average = NaN
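Both bullets follow directly from IEEE 754 arithmetic as implemented by LINQ's `Average`: a single infinity dominates the sum, and +Inf + -Inf is NaN. A quick standalone check:

```csharp
using System;
using System.Linq;

class InfinityAverageDemo
{
    static void Main()
    {
        // One infinity dominates the sum: average is +Inf.
        Console.WriteLine(double.IsPositiveInfinity(
            new[] { 1.0, double.PositiveInfinity }.Average()));                 // True

        // +Inf + -Inf is NaN under IEEE 754, so the average is NaN.
        Console.WriteLine(double.IsNaN(
            new[] { double.PositiveInfinity, double.NegativeInfinity }.Average()));  // True
    }
}
```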

@najeeb-kazmi (Member, Author) commented:

So I've looked at the calculation of LogLoss, which is the main metric I would be concerned about being infinity. It looks to me like it can only be Double.PositiveInfinity if the aggregation over all the rows in the evaluation set overflows Double.MaxValue.

Here's the code for binary classification:

return Double.IsNaN(_logLoss) ? Double.NaN : (_numLogLossPositives + _numLogLossNegatives > 0)
    ? _logLoss / (_numLogLossPositives + _numLogLossNegatives) : 0;

This is only Double.PositiveInfinity if _logLoss is, which will only be the case if the accumulated sum overflows, because the per-row logloss cannot be Double.PositiveInfinity: prob will never be 0 if the label is positive, and prob will never be 1 if the label is negative.

Double logloss;
if (!Single.IsNaN(prob))
{
    if (_label > 0)
    {
        // REVIEW: Should we bring back the option to use ln instead of log2?
        logloss = -Math.Log(prob, 2);
    }
    else
        logloss = -Math.Log(1.0 - prob, 2);
}
else
    logloss = Double.NaN;
UnweightedCounters.Update(_score, prob, _label, logloss, 1);

This does mean that if prob is NaN for any row, then logloss will be NaN for that row, and _logLoss for the entire evaluation set will be NaN.

@najeeb-kazmi (Member, Author) commented:

Similarly for multiclass classification:

LogLoss will only be Double.PositiveInfinity if _totalLogLoss is:

public double LogLoss { get { return _numInstances > 0 ? _totalLogLoss / _numInstances : 0; } }

and that will only be the case if the accumulated sum overflows, because the per-row logloss itself will never be Double.PositiveInfinity, as it can only be as large as -Math.Log(Epsilon), which is approximately 34.54:

double logloss;
if (intLabel < _scoresArr.Length)
{
    // REVIEW: This assumes that the predictions are probabilities, not just relative scores
    // for the classes. Is this a correct assumption?
    float p = Math.Min(1, Math.Max(Epsilon, _scoresArr[intLabel]));
    logloss = -Math.Log(p);
}
else
{
    // Penalize logloss if the label was not seen during training
    logloss = -Math.Log(Epsilon);
    _numUnknownClassInstances++;
}
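The "approximately 34.54" bound can be checked directly. Note the value of `Epsilon` is not shown in the snippet; `1e-15` is an assumption here, chosen because -ln(1e-15) matches the quoted 34.54:

```csharp
using System;

class LogLossBoundDemo
{
    static void Main()
    {
        // Assumed clamp value; not visible in the quoted snippet.
        const double Epsilon = 1e-15;

        // Because p is clamped to [Epsilon, 1], the per-row multiclass
        // log-loss -Math.Log(p) is bounded by -ln(Epsilon).
        Console.WriteLine(-Math.Log(Epsilon));  // ≈ 34.54
    }
}
```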

@najeeb-kazmi (Member, Author) commented:

I agree with the principle of reporting NaN log loss if even a single row has NaN probability and log loss: if there is a problem, the user needs to know via the metric. While it might be fine to suppress this for a handful of rows, that leads to questions about what to do when many or most rows have the problem. So I think it is better not to suppress NaNs in the calculation of total log loss.

By the same principle, we should not be suppressing infinities in the metrics for all the folds. If one fold has an infinity, the average returned should be infinity.

Contributor commented:

For log-loss, the Infinity occurs when the model is perfectly confident (p=0 or p=1) and wrong, for any of the predicted labels in the scoring dataset.

I created a dotnet fiddle to show this:
https://dotnetfiddle.net/DPIn5Y

It's demonstrating the code for binary log-loss:

Double logloss;
if (!Single.IsNaN(prob))
{
    if (_label > 0)
    {
        // REVIEW: Should we bring back the option to use ln instead of log2?
        logloss = -Math.Log(prob, 2);
    }
    else
        logloss = -Math.Log(1.0 - prob, 2);
}
else
    logloss = Double.NaN;
UnweightedCounters.Update(_score, prob, _label, logloss, 1);
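A minimal standalone reproduction of that code path (the helper name `RowLogLoss` is introduced here for illustration; the linked fiddle demonstrates the same point): a model that is perfectly confident and wrong, prob = 0 for a positive label or prob = 1 for a negative one, produces -log2(0) = +Infinity for that row.

```csharp
using System;

class BinaryLogLossDemo
{
    // Hypothetical wrapper around the quoted per-row binary log-loss logic.
    static double RowLogLoss(float label, float prob)
    {
        if (Single.IsNaN(prob))
            return Double.NaN;
        return label > 0 ? -Math.Log(prob, 2) : -Math.Log(1.0 - prob, 2);
    }

    static void Main()
    {
        // Confident and wrong: -log2(0) = +Infinity.
        Console.WriteLine(double.IsPositiveInfinity(RowLogLoss(1f, 0f)));  // True
        Console.WriteLine(double.IsPositiveInfinity(RowLogLoss(0f, 1f)));  // True

        // A 50/50 guess costs exactly one bit.
        Console.WriteLine(RowLogLoss(1f, 0.5f));                           // 1
    }
}
```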

@najeeb-kazmi (Member, Author) commented:

Ah yes, that's the true label, not the predicted label.

Contributor commented:

I filed an issue to investigate whether we should threshold the probability for log-loss in binary classification. This would cause the AutoML code (discussed above) to not receive an Infinity value.

Issue: #5055 Investigate thresholding binary log-loss

@najeeb-kazmi najeeb-kazmi requested a review from a team as a code owner April 23, 2020 03:46
@justinormont (Contributor) left a review:

LGTM; a few items in the comments can be filed as future work in issues.

We'll also need an mlnet-core approver, since we modified the multiclass and binary metrics classes to allow setting the confusion matrix.

@najeeb-kazmi najeeb-kazmi merged commit db84060 into dotnet:master Apr 24, 2020
Successfully merging this pull request may close these issues:

  • Return average metrics in AutoML CrossValSummaryRunner
  • NaN metric value handling in AutoML

3 participants