Hi @LDWDev, it seems things have changed since you opened this issue, particularly since #4039 added samples and other methods related to PCA anomaly detection.
If I run your code as it is right now, I get the following output, which is different from your original output (where all the input points used to be labeled as anomalies):
Notice that now none of them are tagged as anomalies.
Still, if I change the rank you used (rank = 2) to a lower value (rank = 1), as suggested here, I get the following output, which is probably closer to what you expected to see:
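The effect of the rank can be reproduced outside ML.NET. A minimal NumPy sketch (using the data ranges described in the issue; the seed, sample size, and function names are made up for illustration) shows why: with two-dimensional features, a rank-2 principal subspace spans the whole space, so every point — including the extreme one — reconstructs perfectly and the reconstruction error that drives the anomaly score is essentially zero. Only at rank 1 does the outlier's off-subspace component survive as a large residual:

```python
import numpy as np

rng = np.random.default_rng(0)
# Data in the ranges described in the issue: gene one in [0.8, 0.9],
# gene two in [0.1, 0.5]; seed and sample size are arbitrary.
X = np.column_stack([rng.uniform(0.8, 0.9, 200), rng.uniform(0.1, 0.5, 200)])
outlier = np.array([10000.0, 25000.0])

def recon_error(X, point, rank):
    """Norm of the residual after projecting `point` onto the
    top-`rank` principal components of X."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    centered = point - mean
    recon = centered @ Vt[:rank].T @ Vt[:rank]  # projection onto the subspace
    return np.linalg.norm(centered - recon)

print(recon_error(X, outlier, rank=2))  # ~0: two components span all of 2-D space
print(recon_error(X, outlier, rank=1))  # large: the off-subspace part remains
```

This is why lowering the rank from 2 to 1 on two-dimensional data makes the distant point detectable: an anomaly score based on reconstruction error can only flag points that fall outside the learned subspace.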
System information
Windows 10, .NET Core 2.2 console app, VS2019
Issue
SETUP
Good morning. I am encountering some issues with the RPCA trainer and was hoping someone could help me out here. I'm not really sure what I'm doing wrong, but I am not getting the results I would expect.
I've made a toy model to test out the ML.NET anomaly detection functionality. I manufacture two random numbers for each data point and call them gene one and gene two. They are constrained to lie in a particular range: gene one lies between .8 and .9, and gene two lies between .1 and .5.
Using a sample from this data (with the same seed each time), I apply an RPCA pipeline, call Fit, and then Transform the training data.
I then make a gene entry with ludicrous values (10000, 25000) and transform that to see where it lies. ML.NET claims this is not an anomaly.
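The setup above can be sketched in NumPy (a hypothetical stand-in for the actual ML.NET pipeline, which lives in the linked repo; the seed, sample size, and helper names here are invented): fit on a seeded sample, then transform both the training data and the extreme point.

```python
import numpy as np

# Hypothetical NumPy stand-in for the ML.NET setup described above:
# a seeded sample with gene one in [0.8, 0.9] and gene two in [0.1, 0.5].
rng = np.random.default_rng(42)
train = np.column_stack([rng.uniform(0.8, 0.9, 100), rng.uniform(0.1, 0.5, 100)])

# "Fit": learn the mean and principal directions from the training sample.
mean = train.mean(axis=0)
_, _, Vt = np.linalg.svd(train - mean, full_matrices=False)

def transform(points, rank=1):
    # "Transform": coordinates of the points in the top-`rank` PCA basis.
    return (np.asarray(points) - mean) @ Vt[:rank].T

print(transform(train[:3]))             # training points stay near the origin
print(transform([[10000.0, 25000.0]]))  # the extreme point lands far away
```

In this sketch the extreme point's PCA coordinate is thousands of units from the training cloud, which mirrors the large PCA co-ordinates in the log output below — the question is why the anomaly score does not reflect that distance.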
PROBLEM
I expect to see an anomaly. I've tried this with less extreme values, with more extreme values, with and without each of the available kinds of normalisation, and with a reduced rank for the PCA trainer.
LOGS
Here's the output of the program. It shows whether each data point is predicted to be an inlier, its score, and the transform of the data point.
Here's the pipeline:
Results from transforming first 20 training data: Predicted, score, PCA co-ordinates
True, 0.005851699, 0.7284454, 0.6617603
True, 0.002783414, 1.028686, 0.7808771
True, 0.004348077, 0.9824907, 1.543225
True, 0.004398529, 1.021341, 0.510879
True, 0.003683135, 1.005801, 1.091905
False, NaN, 0.9997306, 1.01117
True, 0.003618588, 1.004708, 0.8854353
True, 0.004349507, 0.9971809, 1.588225
True, 0.004412429, 1.018427, 0.4791145
True, 0.003997049, 1.008593, 1.246756
True, 0.004217377, 1.00574, 1.322197
True, 0.004192073, 0.9794555, 1.412197
True, 0.004289094, 1.019702, 1.569695
True, 0.005468629, 1.021341, 1.083963
True, 0.003488006, 1.007865, 0.7861712
False, NaN, 1.025348, 0.9171998
True, 0.004403637, 1.020734, 1.508814
False, NaN, 0.9888039, 0.9224939
True, 0.004757768, 1.009807, 0.8113181
True, 0.004348857, 1.011143, 0.4394088
Results from transforming the "anomaly": Predicted, score, PCA co-ordinates
True, 0.006252703, 121407.6, 82720.04
I've put this repo on GitHub here:
https://github.com/LDWDev/MLWoes/blob/master/MLtestapp/Program.cs
Could anyone point out what is going on here? I would not expect the score to be the value it is.
Do I need to give the trainer anomalous data? Why does the scorer think such distant values should be considered 'normal'? I have had trouble finding useful documentation/tutorials on this.