
Tooling for automated stats gathering #115

Closed
@brandtbucher

Description


We have a wonderful mechanism (--with-pystats) for quantifying the success of many of our optimizations. However, there is currently quite a bit of friction involved in collecting that data. I think that the situation can be improved without too much difficulty.

My current wishlist regarding pyperformance runs built --with-pystats:

  • A cron job to run pyperformance with stats turned on, perhaps weekly, and check the results into the ideas repo. This can run on GitHub Actions, since we don't actually care about performance stability. It can also be parallelized, and maybe even make use of pyperformance's --fast option.
    • Bonus points: compare the stats with the previous run and surface any "interesting" regressions (there have been times when hit rates plummeted without our knowledge).
  • A way to run a stats build of pyperformance using a label on any CPython PR, and report the results in a comment. Currently, collecting stats for a PR is a slow process that must be completed locally.
    • Bonus points: also run stats for the base commit.
      • Bonus bonus points: compare the stats automatically, surfacing any "interesting" changes.
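The "compare the stats automatically" idea above could be sketched roughly like this, assuming each run's stats have already been summarized into flat name-to-count dictionaries (the stat names and the 5% threshold here are illustrative assumptions, not anything pyperformance defines):

```python
def interesting_changes(base, new, threshold=0.05):
    """Flag stats whose relative change between two runs exceeds a threshold.

    `base` and `new` map stat names to counts. The 5% default threshold
    is an arbitrary illustrative choice; a real tool would likely tune
    this per stat (hit rates vs. raw counts behave very differently).
    """
    flagged = {}
    for name in base.keys() & new.keys():
        if base[name] == 0:
            continue  # avoid dividing by zero; treat as uncomparable
        delta = (new[name] - base[name]) / base[name]
        if abs(delta) >= threshold:
            flagged[name] = delta
    return flagged

# Hypothetical stat names for illustration only:
base = {"specialization hits": 1000, "specialization misses": 100}
new = {"specialization hits": 700, "specialization misses": 100}
print(interesting_changes(base, new))  # hits dropped by 30%, so they get flagged
```

A real implementation would read the summarized output of CPython's stats tooling rather than hand-built dictionaries, but the diff-and-threshold core would look much the same.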

I also know that @markshannon has expressed a desire to have pyperformance (or maybe pyperf?) turn stats gathering on and off using 3.12's new sys utilities before and after each benchmark, so that we gather stats on just the benchmarks themselves, ignoring the external benchmarking machinery.
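A minimal sketch of that on/off toggling, assuming the private `sys._stats_on()` / `sys._stats_off()` hooks exposed by 3.12 stats builds (on a normal build they are absent, so this degrades to a no-op):

```python
import sys
from contextlib import contextmanager


@contextmanager
def collecting_stats():
    """Gather pystats only inside the `with` block.

    The sys._stats_* hooks only exist on CPython built with stats
    support; when they are missing, this context manager does nothing,
    so the same runner code works on any build.
    """
    on = getattr(sys, "_stats_on", None)
    off = getattr(sys, "_stats_off", None)
    if on is None or off is None:
        yield  # not a stats build; run the workload without stats
        return
    on()
    try:
        yield
    finally:
        off()


# A benchmark runner would wrap only the workload itself, leaving
# setup, teardown, and the harness's own bookkeeping unmeasured:
with collecting_stats():
    sum(range(1000))  # stand-in for the benchmark body
```

The point of the context-manager shape is that the harness code around the `with` block (argument parsing, timing loops, result serialization) never contributes to the collected stats.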

CC @mdboom

Individual tasks to get there:
