
[Feat]: revert the ordering on the anomalies tab to use method=anomaly-rate #830

@andrewm4894

Description


When making the /weights call on the anomalies tab we need to use method=anomaly-rate like before.

Ideally, if possible, we should have a feature flag on the FE for the anomalies tab that would let us try both the old and new approaches head to head.

So a feature flag like anomalies-sorting=volume for the new approach and anomalies-sorting=anomaly-rate for the old.

But if this is too complex we can get to it later - the main thing that I think needs to be done is the change to the /weights call used in the sorting.
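The flag idea above could be sketched on the FE as follows. This is only an illustration, assuming a hypothetical `flags` map on the client; `weightsMethod` and the flag name `anomalies-sorting` are not an existing API, and the default falls back to the old behaviour:

```typescript
// The two sorting strategies for the anomalies tab.
type AnomaliesSorting = "volume" | "anomaly-rate";

// Resolve the method to pass to /weights from a hypothetical FE feature flag.
// Any value other than "volume" (including an unset flag) falls back to the
// old behaviour, method=anomaly-rate.
function weightsMethod(flags: Record<string, string>): AnomaliesSorting {
  return flags["anomalies-sorting"] === "volume" ? "volume" : "anomaly-rate";
}
```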

I have been testing both approaches side by side, and while there is often a general correlation, at times it fails too badly and there is very little correlation where it matters. For example, in the image below you can see that the rank order from both approaches is just far too different inside the little red box (which is where it matters - e.g. what will be near the top of the results):

[image: side-by-side rank order from both approaches, with the diverging top results highlighted in a red box]
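One way to quantify the "where it matters" disagreement seen above is to compare only the heads of the two rankings. A minimal sketch (the helper name `topNOverlap` is hypothetical, not something used in the actual testing): it returns the fraction of items shared between the top-n of each ordering, so 1 means the same top set and 0 means completely disjoint tops:

```typescript
// Fraction of items that appear in the top-n of BOTH rankings.
// a and b are the same items sorted by the two different methods.
function topNOverlap(a: string[], b: string[], n: number): number {
  const topA = new Set(a.slice(0, n));
  const shared = b.slice(0, n).filter((item) => topA.has(item)).length;
  return shared / n;
}
```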

So we need to change the current call from something like this:

curl -X POST 'https://app.netdata.cloud/api/v3/spaces/31a2fba1-8ee2-4fa0-9fed-bb437e3fb75c/rooms/8ff5f814-5739-45bb-8da9-7fbd41b18cb0/weights' \
   -H 'Content-Type: application/json' \
   --data-raw '{
   "selectors":{
      "nodes":["*"],
      "contexts":["*"],
      "dimensions":["*"],
      "labels":["*"],
      "alerts":["*"]
   },
   "aggregations":{
      "time":{
         "time_group":"average",
         "time_group_options":"",
         "time_resampling":0
      },
      "metrics":[
         {
            "group_by":[],
            "aggregation":"avg"
         }
      ]
   },
   "window":{
      "after":1684929127,
      "before":1684929578,
      "baseline":{
         "after":1684926154,
         "before":1684930894
      }
   },
   "scope":{
      "nodes":[],
      "contexts":["*"]
   },
   "method":"volume",
   "options":[
      "null2zero",
      "anomaly-bit",
      "raw"
   ],
   "timeout":30000
}'

to this:

curl -X POST 'https://app.netdata.cloud/api/v3/spaces/31a2fba1-8ee2-4fa0-9fed-bb437e3fb75c/rooms/8ff5f814-5739-45bb-8da9-7fbd41b18cb0/weights' \
   -H 'Content-Type: application/json' \
   --data-raw '{
   "selectors":{
      "nodes":["*"],
      "contexts":["*"],
      "dimensions":["*"],
      "labels":["*"],
      "alerts":["*"]
   },
   "aggregations":{
      "time":{
         "time_group":"average",
         "time_group_options":"",
         "time_resampling":0
      },
      "metrics":[
         {
            "group_by":[],
            "aggregation":"avg"
         }
      ]
   },
   "window":{
      "after":1684929127,
      "before":1684929578
   },
   "scope":{
      "nodes":[],
      "contexts":["*"]
   },
   "method":"anomaly-rate",
   "options":[
      "null2zero",
      "raw"
   ],
   "timeout":30000
}'

so:

  • remove baseline, as it is not relevant when using method=anomaly-rate.
  • change method from volume to anomaly-rate.
  • remove anomaly-bit from the options array, since it is not needed/sensical when using method=anomaly-rate (this is actually needed by the agent now).
  • we may also be able to drop null2zero and add nonzero, as I believe the BE treats nulls and missing data as 0 when method=anomaly-rate. But we would need to double-check this with @juacker, so maybe for now we just leave null2zero to be explicit about passing the 0's and optimize later.
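The three required edits above can be expressed as a small transform from the old payload to the new one. A minimal sketch only - `toAnomalyRatePayload` is a hypothetical helper name, the payload is typed loosely, and null2zero is left in place as discussed:

```typescript
// Convert the old method=volume /weights payload into the method=anomaly-rate
// one: drop window.baseline, switch method, and strip "anomaly-bit" from options.
function toAnomalyRatePayload(volumePayload: any): any {
  const p = JSON.parse(JSON.stringify(volumePayload)); // deep copy, leave input intact
  delete p.window.baseline;                            // not relevant for anomaly-rate
  p.method = "anomaly-rate";                           // was "volume"
  p.options = p.options.filter((o: string) => o !== "anomaly-bit");
  return p;
}
```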

NOTE: If we just make one /weights call with method=anomaly-rate, some of the "bad models" will get to the top of the list (since they will always have 100% etc.), so ideally we need to use the same approach as before: make two /weights calls, compare the results, and then filter out the obvious rubbish at the top. But I think we can tackle this later, after we make the change. It is better to run the risk of having some false positives at the top of the results (which we can deal with later) and still have the better order based on the anomaly rate. @ktsaou we could try to do some of the filtering of rubbish in the /weights call itself, but we can tackle this afterwards. The idea here is that the agent itself could look at the AR for the window, compare it to the baseline AR, and if the two are similar, just filter that item from what the agent returns - so this would be some new option for /weights to filter obviously bad-looking data when method=anomaly-rate.
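The compare-and-filter step described in the note could be sketched as below, whether it runs on the FE or eventually in the agent. This is an illustration only: `filterAlwaysAnomalous`, the `Scored` shape, and the tolerance value are all hypothetical, and the two anomaly rates are assumed to come from the two /weights calls (window vs. baseline):

```typescript
// One row per context, with anomaly rates (0..1) from the window /weights call
// and from a second call over the baseline period.
interface Scored {
  context: string;
  windowAr: number;
  baselineAr: number;
}

// Drop rows whose window AR is close to their baseline AR: a "bad model" that
// is always ~100% anomalous looks the same in both periods, so it carries no
// signal, whereas a real anomaly stands out against its baseline.
function filterAlwaysAnomalous(rows: Scored[], tolerance = 0.05): Scored[] {
  return rows.filter((r) => Math.abs(r.windowAr - r.baselineAr) > tolerance);
}
```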
