The new Locust CI system by @rzats is very cool and convenient! In order to increase our trust in the system, we should get some baseline numbers about how it works.
My main question is: what sort of variance should we expect in the benchmark numbers from run to run?
There are many factors that could contribute to this variance, including:
- the hardware that GH provisions us for the particular job run
- the proximity of the hardware to our staging database
- the load on GH's networks
- the load on the hardware that GH provisions us (since CI runs in a multi-tenant VM)
Rather than trying to isolate any one of these factors, I propose we treat the system as a whole and test it as follows:
- open up a new PR
- choose a large query set (~1000 queries)
- trigger the benchmarking CI once every X minutes for about Y hours (e.g. X = 10, Y = 12); one way to automate the triggering is sketched after this list
- compile the output and get a sense of the variance in the benchmarks over time
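
A minimal sketch of a driver for the triggering step, assuming the benchmark workflow is a file named `benchmark.yml` that supports `workflow_dispatch`, that the PR branch is called `benchmark-baseline-test`, and that the `gh` CLI is installed and authenticated (all of these names are placeholders, not confirmed details of our setup):

```python
# Dispatch the benchmark workflow every X minutes for Y hours via the GitHub CLI.
import subprocess
import time

INTERVAL_MINUTES = 10   # X
DURATION_HOURS = 12     # Y
BRANCH = "benchmark-baseline-test"  # placeholder name for the new PR's branch

runs = int(DURATION_HOURS * 60 / INTERVAL_MINUTES)
for i in range(runs):
    # `gh workflow run` dispatches a workflow_dispatch event on the given ref.
    subprocess.run(
        ["gh", "workflow", "run", "benchmark.yml", "--ref", BRANCH],
        check=True,
    )
    print(f"dispatched run {i + 1}/{runs}")
    time.sleep(INTERVAL_MINUTES * 60)
```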
Spreading the runs over a long window will show how much of the variance tracks changing GH load over the day, while comparing runs within a short window will show how much variance remains even under roughly the same load. A rough way to separate the two is sketched below.
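
A minimal sketch of that analysis, assuming each CI run produces a single summary number (e.g. median response time in ms, which we'd actually parse from the Locust output rather than hard-code) and we bucket runs by the hour they started in: within-hour spread approximates variance under similar GH load, between-hour spread approximates the effect of load changing over the day.

```python
from statistics import mean, pstdev

# hour-of-day -> list of per-run summary numbers (placeholder data)
run_results = {
    0: [231.0, 240.5, 228.3],
    1: [245.1, 251.7, 239.9],
    # ...
}

# Spread among runs started in the same hour (similar GH load).
within_hour = [pstdev(vals) for vals in run_results.values() if len(vals) > 1]
# Spread among the hourly averages (changing GH load).
hour_means = [mean(vals) for vals in run_results.values()]

print(f"avg within-hour std dev: {mean(within_hour):.1f} ms")
print(f"between-hour std dev:    {pstdev(hour_means):.1f} ms")
```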
EDIT:
h/t @rzats, who found this link on GH Actions perf stability