DevX: Track the benchmark infra health and usage #8247

@guangy10

Description

Today I'm monitoring the infra health only via the HUD by filtering jobs with "-perf": https://hud.pytorch.org/hud/pytorch/executorch/main/1?per_page=50&name_filter=-perf&mergeLF=true

I'm wondering if there is a better way to monitor the health, with more detailed metrics. It could be something like https://hud.pytorch.org/metrics, where I can see the historical runs and success rate of the benchmark jobs, broken down by nightly vs. on-demand runs, along with frequent failures, hotspot devices, etc.
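To make the request concrete, here is a rough sketch of the kind of aggregation I have in mind. Everything in it is an assumption for illustration: the `JobRun` record, the `trigger` and `device` fields, and the sample job and device names are hypothetical and do not reflect how the HUD actually stores job data. The only detail taken from the current workflow is selecting benchmark jobs by the "-perf" substring in the job name.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRun:
    """Hypothetical record for one CI job run (not the HUD's real schema)."""
    name: str        # job name; benchmark jobs contain "-perf"
    trigger: str     # assumed labels: "nightly" or "on-demand"
    device: str      # assumed device pool name
    succeeded: bool


def is_benchmark_job(run):
    # Same filter used on the HUD today: job name contains "-perf".
    return "-perf" in run.name


def success_rate_by(runs, key):
    """Return {group: fraction of successful benchmark runs} for key(run)."""
    totals = Counter()
    passes = Counter()
    for run in runs:
        if not is_benchmark_job(run):
            continue
        group = key(run)
        totals[group] += 1
        passes[group] += int(run.succeeded)
    return {group: passes[group] / totals[group] for group in totals}


def hotspot_devices(runs, top_n=3):
    """Return the devices with the most failed benchmark runs."""
    failures = Counter(
        run.device
        for run in runs
        if is_benchmark_job(run) and not run.succeeded
    )
    return failures.most_common(top_n)


# Made-up sample data, only to show the shape of the output.
SAMPLE_RUNS = [
    JobRun("example-a-perf", "nightly", "device-a", True),
    JobRun("example-a-perf", "nightly", "device-a", False),
    JobRun("example-b-perf", "on-demand", "device-b", True),
    JobRun("example-b-perf", "on-demand", "device-a", False),
    JobRun("unittest", "nightly", "device-a", False),  # not a benchmark job
]

if __name__ == "__main__":
    print(success_rate_by(SAMPLE_RUNS, lambda r: r.trigger))
    print(success_rate_by(SAMPLE_RUNS, lambda r: r.device))
    print(hotspot_devices(SAMPLE_RUNS))
```

Grouping by an arbitrary `key` function keeps the sketch small: the same helper covers nightly vs. on-demand, per-device, or per-job breakdowns. A real dashboard would presumably also bucket by date to show the historical trend.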

cc: @kimishpatel @digantdesai

cc @huydhn @kirklandsign @shoumikhin @mergennachin @byjlw

Metadata

Assignees

Labels

enhancement - Not as big of a feature, but technically not a bug. Should be easy to fix
module: benchmark - Issues related to the benchmark infrastructure
module: user experience - Issues related to reducing friction for users
triaged - This issue has been looked at a team member, and triaged and prioritized into an appropriate module

Projects

Status: Cold Storage
Status: Backlog
Status: Ready
