PredictedLabel is always true for Anomaly Detection

### System information

- **OS version/distro**: macOS & Windows
- **.NET Version (eg., dotnet --info)**:  .Net Core

### Issue: PredictedLabel is always true for Anomaly Detection

In my experience, and as demonstrated by [this sample](https://github.com/dotnet/machinelearning-samples/tree/AnomalyDetection_FraudDetection/samples/csharp/getting-started/AnomalyDetection_CreditCardFraudDetection), predictions from models trained with the `RandomizedPcaTrainer` always set the value for `PredictedLabel` to `true`.

_Note: I’m very new to machine learning, I am not a data scientist, nor am I very familiar with this code base, but I’ve taken a crack at figuring out why..._

The `BinaryClassifierScorer` is [used for scoring anomaly detection models](https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Data/Scorers/PredictionTransformer.cs#L295-L297), specifically those trained using the `RandomizedPcaTrainer`. Which I _think_ makes sense, as with binary classification the `PredictedLabel` in anomaly detection will be one of two values, `true` or `false`.  

However, when using binary classification, `PredictiveLabel` is set to `true` if the prediction's `Score` is a positive value and set to `false` if the `Score` is negative. This is one place it seems to break down with anomaly detection, as the `Score` is going to be a value between one and zero.  So, the current implementation of `BinaryClassifierScorer` is going to return a value of true for any prediction that does not have a `Score` of zero or NAN.

Additionally, it’s my understanding that in anomaly detection it is up to the user to set the threshold of the model that indicates whether a `Score` is considered an anomaly or a normal value.  (Or at least this is the case for supervised training).  From what I can tell, the implementation of `BinaryClassifierScorer` used by anomaly detection, does have a `Threshold` property which [it compares the `Score` value to, to get the value for `PredictedLabel`](https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Data/Scorers/BinaryClassifierScorer.cs#L259-L263).  It would seem the `BinaryClassifierScorer` could be used for anomaly detection if the user was able to manually set a value for `Threshold`, or if the scorer could intelligently set the value based on the distribution of `Score`s.  However, the [`Threshold` property is by default set to zero](https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Data/Scorers/PredictionTransformer.cs#L270), with no public way of changing its value.

Thus, based on my understanding, the Scorer compares the prediction’s `Score` to zero, and the value for `PredictedLabel` will always be set to `true`, with the exception of the edge case where score is zero or NAN.

During my research, I did find that `BinaryClassificationCatalog` has a method [`ChangeModelThreshold`](https://github.com/dotnet/machinelearning/blob/master/src/Microsoft.ML.Data/TrainCatalog.cs#L261-L267) to manually override the value of the scorer’s `Threshold` property.  Unfortunately, this functionality is is not exposed on the `AnomalyDetectionCatalog`, so can’t be used with anomaly detection.

---

Finally, and this may need to be moved to a separate issue, but I've found contradictory information on how to interpret the `Score` value of an anomaly detection prediction.  For example [this sample](https://github.com/dotnet/machinelearning/blob/master/docs/samples/Microsoft.ML.Samples/Dynamic/Trainers/AnomalyDetection/RandomizedPcaSample.cs#L92) indicates that outliers (or anomalies) will have a **smaller** value for `Score` than will normal values.  However, [this documentation](https://docs.microsoft.com/en-us/dotnet/api/microsoft.ml.trainers.randomizedpcatrainer?view=ml-dotnet#training-algorithm-details) states _"If the error is close to 0, the instance is considered normal (non-anomaly)."_  This matches the results I'm getting from [my sample](https://github.com/dotnet/machinelearning-samples/tree/AnomalyDetection_FraudDetection/samples/csharp/getting-started/AnomalyDetection_CreditCardFraudDetection), where anomalies have a **higher** value for `Score` than normal values.



Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

PredictedLabel is always true for Anomaly Detection #3990

System information

Issue: PredictedLabel is always true for Anomaly Detection

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

PredictedLabel is always true for Anomaly Detection #3990

Description

System information

Issue: PredictedLabel is always true for Anomaly Detection

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions