Description
When making the /weights call on the anomalies tab we need to use `method=anomaly-rate` like before.

Ideally, if possible, we should have a feature flag on the FE for the anomalies tab that would let us try both the old and new approaches head to head. So a feature flag like `anomalies-sorting=volume` for the new approach and `anomalies-sorting=anomaly-rate` for the old. But if this is too complex we can get to it later - the main thing that needs to be done is the change to the /weights call used in the sorting.
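A minimal sketch of how such a flag could drive the method selection - the flag name `anomalies-sorting` comes from the issue, but the helper function and defaults here are assumptions, not existing FE code:

```python
# Hypothetical sketch: map the "anomalies-sorting" feature flag to the
# "method" parameter of the /weights call. Defaulting to "anomaly-rate"
# (the old behaviour this issue wants to restore) is an assumption.

def weights_method(flags: dict) -> str:
    """Return the /weights method to use, based on the feature flag."""
    sorting = flags.get("anomalies-sorting", "anomaly-rate")
    return "volume" if sorting == "volume" else "anomaly-rate"
```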
I have been testing both approaches side by side, and while there often is a general correlation, at times it just fails too badly and there is very little correlation where it matters. For example, in the image below you can see the rank order from the two approaches is just much too different in the little red box (where it matters - e.g. what will be near the top of the results):
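For reference, one way this side-by-side comparison could be quantified is a rank correlation between the two orderings - an illustrative sketch, not anything that exists in the codebase:

```python
# Illustrative: Spearman's rank correlation between the orderings that the
# "volume" and "anomaly-rate" approaches produce for the same set of
# metrics. Values near 1 mean the rank orders agree; near 0 or negative
# means they diverge badly, as in the red box described above.

def spearman(rank_a: list, rank_b: list) -> float:
    """Spearman correlation between two orderings of the same items."""
    pos_b = {item: i for i, item in enumerate(rank_b)}
    n = len(rank_a)
    d2 = sum((i - pos_b[item]) ** 2 for i, item in enumerate(rank_a))
    return 1 - 6 * d2 / (n * (n * n - 1))
```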
So we need to change the current call from something like this:
```
curl 'https://app.netdata.cloud/api/v3/spaces/31a2fba1-8ee2-4fa0-9fed-bb437e3fb75c/rooms/8ff5f814-5739-45bb-8da9-7fbd41b18cb0/weights' \
{
  "selectors": {
    "nodes": ["*"],
    "contexts": ["*"],
    "dimensions": ["*"],
    "labels": ["*"],
    "alerts": ["*"]
  },
  "aggregations": {
    "time": {
      "time_group": "average",
      "time_group_options": "",
      "time_resampling": 0
    },
    "metrics": [
      {
        "group_by": [],
        "aggregation": "avg"
      }
    ]
  },
  "window": {
    "after": 1684929127,
    "before": 1684929578,
    "baseline": {
      "after": 1684926154,
      "before": 1684930894
    }
  },
  "scope": {
    "nodes": [],
    "contexts": ["*"]
  },
  "method": "volume",
  "options": ["null2zero", "anomaly-bit", "raw"],
  "timeout": 30000
}
```
to this:
```
curl 'https://app.netdata.cloud/api/v3/spaces/31a2fba1-8ee2-4fa0-9fed-bb437e3fb75c/rooms/8ff5f814-5739-45bb-8da9-7fbd41b18cb0/weights' \
{
  "selectors": {
    "nodes": ["*"],
    "contexts": ["*"],
    "dimensions": ["*"],
    "labels": ["*"],
    "alerts": ["*"]
  },
  "aggregations": {
    "time": {
      "time_group": "average",
      "time_group_options": "",
      "time_resampling": 0
    },
    "metrics": [
      {
        "group_by": [],
        "aggregation": "avg"
      }
    ]
  },
  "window": {
    "after": 1684929127,
    "before": 1684929578
  },
  "scope": {
    "nodes": [],
    "contexts": ["*"]
  },
  "method": "anomaly-rate",
  "options": ["null2zero", "raw"],
  "timeout": 30000
}
```
so:

- remove `baseline` as it is not relevant when using `method=anomaly-rate`.
- change `method` from `volume` to `anomaly-rate`.
- remove the `anomaly-bit` from the `options` array since it is not needed/sensical when using `method=anomaly-rate` (this is actually needed by the agent now).
- we may also be able to drop the `null2zero` and add `nonzero`, as i believe the BE treats nulls and missing data as 0 when `method=anomaly-rate`. But we would need to double check this with @juacker, so maybe for now we just leave the `null2zero` to be explicit in passing the 0's and maybe optimize later.
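The changes above can be sketched as a transformation of the existing request body - this is just an illustration of the listed edits, not actual FE code; the function name is made up:

```python
# Sketch: apply the changes listed above to the existing /weights payload:
# switch method to "anomaly-rate", drop the baseline window, and remove
# the "anomaly-bit" option. Other fields are left untouched.
import copy

def to_anomaly_rate_payload(payload: dict) -> dict:
    p = copy.deepcopy(payload)  # leave the original payload unchanged
    p["method"] = "anomaly-rate"
    p.get("window", {}).pop("baseline", None)  # baseline not relevant here
    p["options"] = [o for o in p.get("options", []) if o != "anomaly-bit"]
    return p
```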
NOTE: If we just make one /weights call with `method=anomaly-rate`, this will result in some of the "bad models" getting to the top of the list (since they will always have 100% etc), so ideally we need to use the same approach as before: make two /weights calls, compare the results, and then filter out the obvious rubbish at the top. But i think we can tackle this later after we make the change. It will be better to run the risk of having some false positives at the top of the results (which we can deal with later) and still have the better order based on the anomaly rate.

@ktsaou we could try to do some of the filtering of rubbish in the /weights call itself, but we can tackle this after. The idea here being that the agent itself could look at the AR for the window, compare it to the baseline AR, and if they are both similar then just filter it from what's returned by the agent - so this would be some new option for /weights to filter obviously bad-looking data when `method=anomaly-rate`.
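The filtering idea described above could look something like this - a hypothetical sketch only, with a made-up function name and an assumed threshold, whether done on the FE across two /weights calls or eventually inside the agent:

```python
# Hypothetical: given per-metric anomaly rates for the highlighted window
# and for the baseline period, drop metrics whose window AR is not
# meaningfully above their baseline AR (e.g. a broken model stuck near
# 100% in both windows). The min_lift threshold is an assumption.

def filter_bad_models(window_ar: dict, baseline_ar: dict,
                      min_lift: float = 0.1) -> dict:
    """Keep metrics whose window anomaly rate exceeds baseline by min_lift."""
    return {
        metric: ar
        for metric, ar in window_ar.items()
        if ar - baseline_ar.get(metric, 0.0) >= min_lift
    }
```

A metric at 100% AR in both window and baseline would be filtered out, while a metric that jumped from 10% to 60% would be kept near the top.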