PredictedLabel is always true for Anomaly Detection #3990

Closed
@colbylwilliams

Description

System information

  • OS version/distro: macOS & Windows
  • .NET version (e.g., dotnet --info): .NET Core

Issue: PredictedLabel is always true for Anomaly Detection

In my experience, and as demonstrated by this sample, predictions from models trained with the RandomizedPcaTrainer always set the value for PredictedLabel to true.
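A minimal sketch of the kind of pipeline that reproduces this for me (the data and column names here are illustrative, not taken from the linked sample):

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

class Input
{
    [VectorType(3)]
    public float[] Features { get; set; }
}

class Prediction
{
    public bool PredictedLabel { get; set; }
    public float Score { get; set; }
}

static class Repro
{
    static void Main()
    {
        var ml = new MLContext(seed: 1);
        var data = ml.Data.LoadFromEnumerable(new List<Input>
        {
            new Input { Features = new float[] { 1, 1, 1 } },
            new Input { Features = new float[] { 1, 1, 1 } },
            new Input { Features = new float[] { 9, 9, 9 } }, // intended anomaly
        });

        var model = ml.AnomalyDetection.Trainers
            .RandomizedPca(featureColumnName: "Features", rank: 2)
            .Fit(data);

        var engine = ml.Model.CreatePredictionEngine<Input, Prediction>(model);
        foreach (var row in new[]
        {
            new Input { Features = new float[] { 1, 1, 1 } },
            new Input { Features = new float[] { 9, 9, 9 } },
        })
        {
            var p = engine.Predict(row);
            // In my runs, PredictedLabel comes back true for both rows,
            // no matter how far the Score is from the normal cluster.
            Console.WriteLine($"Score={p.Score}, PredictedLabel={p.PredictedLabel}");
        }
    }
}
```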

Note: I’m very new to machine learning, I am not a data scientist, nor am I very familiar with this code base, but I’ve taken a crack at figuring out why...

The BinaryClassifierScorer is used for scoring anomaly detection models, specifically those trained using the RandomizedPcaTrainer, which I think makes sense: as with binary classification, the PredictedLabel in anomaly detection will be one of two values, true or false.

However, when used for binary classification, PredictedLabel is set to true if the prediction's Score is positive and to false if the Score is negative. This is one place the approach seems to break down for anomaly detection, where the Score is a value between zero and one. So the current implementation of BinaryClassifierScorer will return true for any prediction whose Score is not zero or NaN.
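In other words, the decision rule as I understand it (simplified here; the real logic lives in ML.NET's scorer code) looks roughly like this:

```csharp
// Simplified sketch of BinaryClassifierScorer's labeling rule as I read it.
float threshold = 0f; // the default Threshold, with no public setter on the anomaly path

bool Predict(float score) => score > threshold;

// RandomizedPcaTrainer scores fall in [0, 1], so:
//   Predict(0.02f) -> true   (a "normal" point is still flagged)
//   Predict(0.97f) -> true   (and so is an anomaly)
//   Predict(0f)    -> false  (only the edge case comes back false)
```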

Additionally, it’s my understanding that in anomaly detection it is up to the user to set the threshold that indicates whether a Score is considered an anomaly or a normal value. (Or at least this is the case for supervised training.) From what I can tell, the implementation of BinaryClassifierScorer used by anomaly detection does have a Threshold property that it compares the Score value against to get the value for PredictedLabel. It would seem the BinaryClassifierScorer could work for anomaly detection if the user were able to manually set a value for Threshold, or if the scorer could intelligently set the value based on the distribution of Scores. However, the Threshold property defaults to zero, with no public way of changing its value.

Thus, based on my understanding, the scorer compares the prediction’s Score to zero, and PredictedLabel will always be set to true, except in the edge case where the Score is zero or NaN.

During my research, I did find that BinaryClassificationCatalog has a method, ChangeModelThreshold, to manually override the value of the scorer’s Threshold property. Unfortunately, this functionality is not exposed on the AnomalyDetectionCatalog, so it can’t be used with anomaly detection.
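To illustrate the gap (the second call below is hypothetical, showing the kind of API I'd expect; `binaryModel`, `pcaModel`, and the threshold values are just placeholders):

```csharp
// What already exists for binary classification: rewrap the prediction
// transformer with a new decision threshold.
var recalibrated = mlContext.BinaryClassification
    .ChangeModelThreshold(binaryModel, threshold: 0.35f);

// What seems to be missing: an equivalent on AnomalyDetectionCatalog,
// something like this HYPOTHETICAL call:
// var tuned = mlContext.AnomalyDetection.ChangeModelThreshold(pcaModel, threshold: 0.5f);
```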


Finally (this may need to be moved to a separate issue), I've found contradictory information on how to interpret the Score value of an anomaly detection prediction. For example, this sample indicates that outliers (or anomalies) will have a smaller value for Score than normal values. However, this documentation states "If the error is close to 0, the instance is considered normal (non-anomaly)." The latter matches the results I'm getting from my sample, where anomalies have a higher value for Score than normal values.

Labels

bug
