Description
System information
- OS version/distro: macOS & Windows
- .NET Version (eg., dotnet --info): .NET Core
Issue: PredictedLabel is always true for Anomaly Detection
In my experience, and as demonstrated by this sample, predictions from models trained with the RandomizedPcaTrainer always set the value for PredictedLabel to true.
Note: I’m very new to machine learning, I am not a data scientist, nor am I very familiar with this code base, but I’ve taken a crack at figuring out why...
The BinaryClassifierScorer is used for scoring anomaly detection models, specifically those trained using the RandomizedPcaTrainer. I think this makes sense because, as with binary classification, the PredictedLabel in anomaly detection will be one of two values, true or false.
However, when using binary classification, PredictedLabel is set to true if the prediction's Score is a positive value and set to false if the Score is negative. This is one place it seems to break down with anomaly detection, as the Score is going to be a value between zero and one. So, the current implementation of BinaryClassifierScorer is going to return a value of true for any prediction that does not have a Score of zero or NaN.
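To make the failure mode concrete, here is a minimal, language-neutral sketch in Python (not ML.NET code) of a scorer that compares the Score against a fixed threshold of zero. The function name `predicted_label` is hypothetical, and the strict `>` comparison is my assumption, chosen because it is consistent with the zero and NaN cases described above.

```python
import math


def predicted_label(score: float, threshold: float = 0.0) -> bool:
    """Hypothetical stand-in for the scorer: True when score exceeds threshold."""
    # Any ordering comparison with NaN is False, so NaN scores map to False.
    return score > threshold


# Anomaly detection scores fall between zero and one, so nearly all are positive.
scores = [0.02, 0.35, 0.99, 0.0, math.nan]
labels = [predicted_label(s) for s in scores]
print(labels)  # [True, True, True, False, False]
```

With a threshold of zero, every strictly positive Score is labeled true regardless of how small it is, which matches what I see in my sample.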
Additionally, it’s my understanding that in anomaly detection it is up to the user to set the threshold that determines whether a Score is considered an anomaly or a normal value (or at least this is the case for supervised training). From what I can tell, the implementation of BinaryClassifierScorer used by anomaly detection does have a Threshold property, which it compares the Score value against to get the value for PredictedLabel. It would seem the BinaryClassifierScorer could be used for anomaly detection if the user were able to manually set a value for Threshold, or if the scorer could intelligently set the value based on the distribution of Scores. However, the Threshold property is set to zero by default, with no public way of changing its value.
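As a rough illustration of the two options above (a user-supplied threshold, or one derived from the distribution of Scores), here is a small Python sketch. It is not ML.NET code: `threshold_from_scores` and `predicted_label` are hypothetical helpers, the nearest-rank quantile rule is just one possible choice, and the sketch assumes that a higher Score means "more anomalous", which is what I observe in my sample.

```python
def threshold_from_scores(scores, quantile=0.95):
    """Hypothetical helper: return the score at the given quantile (nearest rank)."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    # Nearest-rank index, clamped to the valid range of positions.
    index = min(len(ordered) - 1, max(0, int(quantile * len(ordered))))
    return ordered[index]


def predicted_label(score, threshold):
    """True (anomaly) when the score exceeds the chosen threshold."""
    return score > threshold


# Made-up scores: mostly small values plus one clear outlier.
training_scores = [0.05, 0.07, 0.04, 0.06, 0.08, 0.05, 0.09, 0.06, 0.07, 0.90]

# Option 1: derive the threshold from the score distribution.
threshold = threshold_from_scores(training_scores, quantile=0.85)
labels = [predicted_label(s, threshold) for s in training_scores]

# Option 2: the user supplies a threshold directly.
manual_labels = [predicted_label(s, 0.5) for s in training_scores]
```

In both cases only the outlying Score of 0.90 is flagged, whereas a threshold of zero would flag all ten values.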
Thus, based on my understanding, the scorer compares the prediction’s Score to zero, and the value for PredictedLabel will always be set to true, with the exception of the edge case where the Score is zero or NaN.
During my research, I did find that BinaryClassificationCatalog has a method, ChangeModelThreshold, to manually override the value of the scorer’s Threshold property. Unfortunately, this functionality is not exposed on the AnomalyDetectionCatalog, so it can’t be used with anomaly detection.
Finally, and this may need to be moved to a separate issue, I've found contradictory information on how to interpret the Score value of an anomaly detection prediction. For example, this sample indicates that outliers (or anomalies) will have a smaller value for Score than normal values will. However, this documentation states "If the error is close to 0, the instance is considered normal (non-anomaly)." This matches the results I'm getting from my sample, where anomalies have a higher value for Score than normal values.
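My reading of the documentation's wording is that the Score is an error measuring how poorly a point is reconstructed from the principal components. The Python sketch below illustrates that idea in two dimensions only. It is an assumption-laden toy, not ML.NET's scoring code: `reconstruction_error` is a hypothetical helper, the principal direction is hard-coded rather than learned, and no normalization to the zero-to-one range is applied.

```python
import math


def reconstruction_error(point, direction):
    """Distance between a 2-D point and its projection onto a unit direction.

    Illustrative only; this is not ML.NET's scoring code.
    """
    dot = point[0] * direction[0] + point[1] * direction[1]
    projected = (dot * direction[0], dot * direction[1])
    return math.hypot(point[0] - projected[0], point[1] - projected[1])


# Assume the principal direction learned from normal data is the line y = x.
direction = (1 / math.sqrt(2), 1 / math.sqrt(2))

normal_point = (2.0, 2.1)    # close to the line y = x
anomaly_point = (2.0, -2.0)  # far from the line y = x

normal_error = reconstruction_error(normal_point, direction)
anomaly_error = reconstruction_error(anomaly_point, direction)
print(normal_error < anomaly_error)  # True
```

Under this interpretation, a point that fits the learned structure has an error close to zero and an anomaly has a larger one, which agrees with the documentation and with my results rather than with the sample's description.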