Handle NaN optimization metric in AutoML #5031
@@ -70,6 +71,9 @@ internal class CrossValSummaryRunner<TMetrics> : IRunner<RunDetail<TMetrics>>

// Get the model from the best fold
var bestFoldIndex = BestResultUtil.GetIndexOfBestScore(trainResults.Select(r => r.score), _optimizingMetricInfo.IsMaximizing);
// bestFoldIndex will be -1 if the optimization metric for all folds is NaN.
// In this case, return model from the first fold.
bestFoldIndex = bestFoldIndex != -1 ? bestFoldIndex : 0;
I'd also look into places where the metric gets compared. For instance, below in `GetIndexClosestToAverage()`:
machinelearning/src/Microsoft.ML.AutoML/Experiment/Runners/CrossValSummaryRunner.cs
Lines 86 to 100 in 9726b41
private static int GetIndexClosestToAverage(IEnumerable<double> values, double average)
{
    int avgFoldIndex = -1;
    var smallestDistFromAvg = double.PositiveInfinity;
    for (var i = 0; i < values.Count(); i++)
    {
        var distFromAvg = Math.Abs(values.ElementAt(i) - average);
        if (distFromAvg < smallestDistFromAvg || smallestDistFromAvg == double.PositiveInfinity)
        {
            smallestDistFromAvg = distFromAvg;
            avgFoldIndex = i;
        }
    }
    return avgFoldIndex;
}
While `GetIndexClosestToAverage()` could be adjusted to handle CV folds returning NaN, it would be better to remove the function and instead return a new instance of the metric class with the actual averages. The current function was created before the AutoML code had access to create a new instance of the metric classes, so it just returned the fold closest to the average of the folds.
I would also look a bit further at where AutoML compares the metrics returned from each model in the sweep, where it chooses which is best. The comparison to NaN may also be there. #Resolved
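One concrete hazard in the function quoted above, shown with a standalone copy (illustrative repro, not the library's own test code): a NaN score in the first position "poisons" the search. The `smallestDistFromAvg == double.PositiveInfinity` clause admits the NaN distance on the first iteration, and afterwards neither branch of the condition can ever be true again, so the NaN fold wins.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class NaNPoisonRepro
{
    // Standalone copy of GetIndexClosestToAverage(), for illustration.
    static int GetIndexClosestToAverage(IEnumerable<double> values, double average)
    {
        int avgFoldIndex = -1;
        var smallestDistFromAvg = double.PositiveInfinity;
        for (var i = 0; i < values.Count(); i++)
        {
            var distFromAvg = Math.Abs(values.ElementAt(i) - average);
            if (distFromAvg < smallestDistFromAvg || smallestDistFromAvg == double.PositiveInfinity)
            {
                smallestDistFromAvg = distFromAvg;
                avgFoldIndex = i;
            }
        }
        return avgFoldIndex;
    }

    static void Main()
    {
        // Fold 1 has score NaN; the intended answer is index 1 (score 1.0,
        // exactly the average), but the NaN fold at index 0 is returned.
        var scores = new[] { double.NaN, 1.0, 2.0 };
        Console.WriteLine(GetIndexClosestToAverage(scores, 1.0)); // prints 0
    }
}
```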
I have changed the calculation of the average to exclude any folds with NaN metrics. In finding the best run, there are no comparisons to NaN, but an index of -1 is returned if all the runs have a NaN metric. I have added a warning there checking for that case.
As for returning a Metrics object containing the averages of the metrics across the folds, I think this would be out of scope for this particular issue. Also, since we only have the type `TMetrics` here, creating the right metric would be fairly involved. #Resolved
Thanks. Looks good.
In the future, we can use the `MetricsStatistics` class to do the averaging so that we're not duplicating functionality. It has the benefit of handling the `PerClassLogLoss`, though I suspect it's not properly handling the class ordering differences.
machinelearning/src/Microsoft.ML.Transforms/MetricStatistics.cs
Lines 84 to 88 in d849ba4
/// <summary>
/// The <see cref="RegressionMetricsStatistics"/> class holds summary
/// statistics over multiple observations of <see cref="RegressionMetrics"/>.
/// </summary>
public sealed class RegressionMetricsStatistics : IMetricsStatistics<RegressionMetrics>
// TODO:
// Figure out whether class label ordering can be different across different folds.
// If yes, whether it is possible to get the information from the objects available.
perClassLogLoss: null);
The classes can be different in ordering and some can be missing. Missing classes arise from small datasets or skewed labels, where all given classes may not be represented in the training dataset.
I'm not sure what our log-loss code does for labels which are not in the test dataset (perhaps `0.0` or `NaN`). #Resolved
Would it be possible to return the closest-to-average fold's `perClassLogLoss`? #Resolved
Thanks. The current method looks good for the moment.
For averaging the `ConfusionMatrix` in the future: it stores the class names within itself, and those names can be used to align the classes and allow for proper averaging.
/// <param name="labelNames">The predicted classes names, or the indexes of the classes, if the names are missing.</param>
It mentions the class index may also be stored, though I'm unsure how that could arise. Perhaps for binary classification there are no class names to store?
private static double GetAverageOfNonNaNScores(IEnumerable<double> results)
{
    var newResults = results.Where(r => !double.IsNaN(r));
    // If every fold's score is NaN, report the average as NaN as well.
    return !newResults.Any() ? double.NaN : newResults.Average();
}
What are your thoughts on metrics that can be +/-Infinity? Should they be included in the average or not? Any trade-offs?
Current behavior:
- If the input includes only `double.PositiveInfinity` xor `double.NegativeInfinity`, `GetAverageOfNonNaNScores()` returns an average of +/-Inf.
- If the input includes both `double.PositiveInfinity` and `double.NegativeInfinity`, `GetAverageOfNonNaNScores()` returns an average of NaN.
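The two cases can be sketched with a standalone reimplementation (illustrative, not the actual AutoML method):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class InfinityAverageDemo
{
    // Hypothetical standalone version of GetAverageOfNonNaNScores().
    static double GetAverageOfNonNaNScores(IEnumerable<double> results)
        => results.Where(r => !double.IsNaN(r)).Average();

    static void Main()
    {
        // Only +Inf present: the sum is +Inf, so the average is +Inf.
        var onlyPosInf = GetAverageOfNonNaNScores(new[] { 1.0, double.PositiveInfinity });
        Console.WriteLine(double.IsPositiveInfinity(onlyPosInf)); // True

        // Both infinities present: +Inf + -Inf = NaN, so the average is NaN.
        var bothInf = GetAverageOfNonNaNScores(
            new[] { double.PositiveInfinity, double.NegativeInfinity });
        Console.WriteLine(double.IsNaN(bothInf)); // True
    }
}
```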
So I've looked at the calculation of `LogLoss`, which is the main metric I would be concerned about being infinity. It looks to me that it can only be `Double.PositiveInfinity` if the aggregation over all the rows in the evaluation set overflows `Double.MaxValue`.
Here's the code for binary classification:
machinelearning/src/Microsoft.ML.Data/Evaluators/BinaryClassifierEvaluator.cs
Lines 465 to 466 in 062be28
return Double.IsNaN(_logLoss) ? Double.NaN : (_numLogLossPositives + _numLogLossNegatives > 0)
    ? _logLoss / (_numLogLossPositives + _numLogLossNegatives) : 0;
This is only `Double.PositiveInfinity` if `_logLoss` is, which will only be the case if it overflows here:
_logLoss += logloss * weight;
because `logloss` cannot be `Double.PositiveInfinity`: `prob` will never be 0 if the label is positive, and `prob` will never be 1 if the label is negative.
machinelearning/src/Microsoft.ML.Data/Evaluators/BinaryClassifierEvaluator.cs
Lines 663 to 677 in 062be28
Double logloss;
if (!Single.IsNaN(prob))
{
    if (_label > 0)
    {
        // REVIEW: Should we bring back the option to use ln instead of log2?
        logloss = -Math.Log(prob, 2);
    }
    else
        logloss = -Math.Log(1.0 - prob, 2);
}
else
    logloss = Double.NaN;
UnweightedCounters.Update(_score, prob, _label, logloss, 1);
This does mean that if `prob` is `NaN` for any row, then `logloss` will be `NaN` for that row, and `_logLoss` for the entire evaluation set will be `NaN`.
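That propagation follows directly from IEEE 754 arithmetic, as a tiny sketch shows:

```csharp
using System;

class NaNPropagation
{
    static void Main()
    {
        // Once one row contributes NaN, the running sum stays NaN, so an
        // aggregate like _logLoss is NaN for the whole evaluation set.
        double total = 0;
        foreach (var rowLogLoss in new[] { 0.5, double.NaN, 0.25 })
            total += rowLogLoss;
        Console.WriteLine(double.IsNaN(total)); // True
    }
}
```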
Similarly for multiclass classification: `LogLoss` will only be `Double.PositiveInfinity` if `_totalLogLoss` is
machinelearning/src/Microsoft.ML.Data/Evaluators/MulticlassClassificationEvaluator.cs
Line 276 in 062be28
public double LogLoss { get { return _numInstances > 0 ? _totalLogLoss / _numInstances : 0; } }
and that will only be the case if it overflows here
machinelearning/src/Microsoft.ML.Data/Evaluators/MulticlassClassificationEvaluator.cs
Line 332 in 062be28
_totalLogLoss += loglossCurr * weight;
because `loglossCurr` itself will never be `Double.PositiveInfinity`, as it can only be as large as `-Math.Log(Epsilon)`, which is approximately 34.54.
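A quick arithmetic check of that bound. The exact value of `Epsilon` is an assumption here (1e-15, inferred from the ~34.54 figure): -ln(1e-15) = 15 * ln(10) ≈ 34.54.

```csharp
using System;

class EpsilonBound
{
    static void Main()
    {
        // Assumed value of the evaluator's Epsilon constant (not confirmed
        // against the source; chosen to match the ~34.54 bound quoted above).
        const double Epsilon = 1e-15;
        // Maximum per-row multiclass log-loss: -ln(Epsilon).
        Console.WriteLine(-Math.Log(Epsilon)); // ≈ 34.5388
    }
}
```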
machinelearning/src/Microsoft.ML.Data/Evaluators/MulticlassClassificationEvaluator.cs
Lines 451 to 464 in 062be28
double logloss;
if (intLabel < _scoresArr.Length)
{
    // REVIEW: This assumes that the predictions are probabilities, not just relative scores
    // for the classes. Is this a correct assumption?
    float p = Math.Min(1, Math.Max(Epsilon, _scoresArr[intLabel]));
    logloss = -Math.Log(p);
}
else
{
    // Penalize logloss if the label was not seen during training
    logloss = -Math.Log(Epsilon);
    _numUnknownClassInstances++;
}
I agree with the principle of reporting `NaN` log loss if a single row has `NaN` probability and log loss, because if there is a problem, the user needs to know via the metric. While it may be fine to suppress this for a handful of rows, it leads to questions about what to do when many or most rows have this problem. So, I think it is better not to suppress `NaN`s in the calculation of total log loss.
By the same principle, we should not suppress infinities in the metrics across the folds. If one fold has an infinity, the average returned should be infinity.
For log-loss, `Infinity` occurs when the model is perfectly confident (p = 0 or p = 1) and wrong, for any of the predicted labels in the scoring dataset.
I created a dotnet fiddle to show this:
https://dotnetfiddle.net/DPIn5Y
It's demonstrating the code for binary log-loss:
machinelearning/src/Microsoft.ML.Data/Evaluators/BinaryClassifierEvaluator.cs
Lines 663 to 677 in 062be28
Double logloss;
if (!Single.IsNaN(prob))
{
    if (_label > 0)
    {
        // REVIEW: Should we bring back the option to use ln instead of log2?
        logloss = -Math.Log(prob, 2);
    }
    else
        logloss = -Math.Log(1.0 - prob, 2);
}
else
    logloss = Double.NaN;
UnweightedCounters.Update(_score, prob, _label, logloss, 1);
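The same point as the fiddle, in a minimal standalone sketch (variable names are illustrative):

```csharp
using System;

class ConfidentAndWrong
{
    static void Main()
    {
        // The model assigns probability 0 to the positive class...
        float prob = 0f;
        // ...but the true label is positive, so the row's log-loss is
        // -log2(0) = +Infinity, which then dominates the fold's aggregate.
        double logloss = -Math.Log(prob, 2);
        Console.WriteLine(double.IsPositiveInfinity(logloss)); // True
    }
}
```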
Ah yes, that's the true label, not the predicted label.
I filed an issue to investigate whether we should threshold the probability for log-loss in binary classification. This would cause the AutoML code (discussed above) to not receive an `Infinity` value.
Issue: #5055 Investigate thresholding binary log-loss
LGTM; a few items in the comments can be filed as future work in issues.
We'll also need an mlnet-core approver, since we modified the multiclass and binary metrics classes to allow setting the confusion matrix.
Fixes #4663
Fixes #5042
In AutoML, `CrossValSummaryRunner` is invoked if the dataset contains fewer than 15,000 rows. It runs 10-fold cross validation and then returns the model from the fold with the best optimization metric. It does this by looking for the index of the model with the best metric in the list of run results and returning the element at that index. If the metric in all 10 folds is `NaN`, then this index is `-1`, resulting in an `IndexOutOfRangeException`.
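A minimal sketch of the failure mode and the fix (the helper below is an illustrative stand-in, not the actual `BestResultUtil` implementation):

```csharp
using System;
using System.Collections.Generic;

class AllNaNRepro
{
    // Illustrative stand-in for a best-score index search.
    static int GetIndexOfBestScore(IReadOnlyList<double> scores, bool isMaximizing)
    {
        int bestIndex = -1;
        double best = isMaximizing ? double.NegativeInfinity : double.PositiveInfinity;
        for (int i = 0; i < scores.Count; i++)
        {
            // Every comparison with NaN is false, so NaN scores never win
            // and bestIndex can remain -1.
            if (isMaximizing ? scores[i] > best : scores[i] < best)
            {
                best = scores[i];
                bestIndex = i;
            }
        }
        return bestIndex;
    }

    static void Main()
    {
        var scores = new[] { double.NaN, double.NaN, double.NaN };
        int index = GetIndexOfBestScore(scores, isMaximizing: true);
        Console.WriteLine(index); // -1: indexing the fold results with this throws

        // The fix falls back to the first fold in that case:
        index = index != -1 ? index : 0;
        Console.WriteLine(index); // 0
    }
}
```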