[obs] Dashboard for psi metrics #13539

Merged
merged 1 commit into from
Oct 3, 2022

Furisto
Member

@Furisto Furisto commented Oct 3, 2022

Description

We currently use the load average of a node to observe if it is overloaded. The shortcomings of that approach are:

  • It is coarse-grained. It is not obvious which resource was contended or how strong the demand for it was. We need to consult additional metrics to find out why a load spike occurred.
  • Even then, some guesswork is involved. It would be much nicer if the OS just told us why the system was slow, in other words which resource was contended and by how much.

This is where pressure stall information (PSI) comes in. PSI was first introduced in kernel 4.20 (so it is also available on self-hosted). For each major resource (CPU, memory and IO) it shows the percentage of time that processes were not able to run due to the resource not being available.
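
For readers unfamiliar with pressure stall information (PSI), the kernel exposes it as small text files under `/proc/pressure/` (`cpu`, `memory`, `io`). The following is a minimal sketch of reading those files, not the code behind this dashboard. The helper names (`parse_psi`, `stall_percent`, `read_psi`) and the `SAMPLE` values are made up for illustration. Only the file format (`some`/`full` lines with `avg10`, `avg60`, `avg300` percentages and a cumulative `total` in microseconds) comes from the kernel's PSI documentation.

```python
from pathlib import Path


def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse the contents of a /proc/pressure/<resource> file.

    Each line looks like: "some avg10=0.00 avg60=0.00 avg300=0.00 total=0".
    Returns e.g. {"some": {"avg10": 0.0, ..., "total": 0.0}, "full": {...}}.
    """
    result: dict[str, dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def stall_percent(total_before_us: float, total_after_us: float, interval_s: float) -> float:
    """Percentage of wall-clock time spent stalled between two samples of `total`.

    `total` is a cumulative stall time in microseconds, so the difference
    between two samples divided by the elapsed time gives the stall share.
    """
    return (total_after_us - total_before_us) / (interval_s * 1_000_000) * 100


def read_psi(resource: str) -> dict[str, dict[str, float]]:
    """Read PSI for "cpu", "memory" or "io" (needs Linux 4.20+ with PSI enabled)."""
    return parse_psi(Path(f"/proc/pressure/{resource}").read_text())


# Illustrative sample, not real measurements.
SAMPLE = (
    "some avg10=1.50 avg60=0.75 avg300=0.25 total=123456\n"
    "full avg10=0.50 avg60=0.25 avg300=0.00 total=45678\n"
)

if __name__ == "__main__":
    print(parse_psi(SAMPLE)["some"]["avg10"])
```

On a node with PSI enabled, `read_psi("memory")["some"]["avg10"]` would give the share of the last 10 seconds in which at least one task was stalled on memory. That says directly which resource was contended, which load average cannot do.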

https://www.loom.com/share/1de5ca7c8aea42218c3dbd0cfc4131b0

Related Issue(s)

n.a.

How to test

Search for "Pressure Stall Information" in Grafana. The dashboard should show up.

Release Notes

None

Werft options:

  • /werft with-local-preview
    If enabled this will build install/preview
  • /werft with-preview
  • /werft with-integration-tests=all
    Valid options are all, workspace, webapp, ide

@Furisto Furisto added the team: workspace Issue belongs to the Workspace team label Oct 3, 2022
@Furisto Furisto self-assigned this Oct 3, 2022
@Furisto Furisto requested a review from a team October 3, 2022 14:54
@roboquat roboquat merged commit f4a71fa into main Oct 3, 2022
@roboquat roboquat deleted the fo/psi-dashboard branch October 3, 2022 22:34
@roboquat roboquat added deployed: workspace Workspace team change is running in production deployed Change is completely running in production labels Oct 5, 2022
@Furisto Furisto added the feature: psi Pressure Stall Information label Oct 10, 2022
Labels
  • deployed: workspace (Workspace team change is running in production)
  • deployed (Change is completely running in production)
  • feature: psi (Pressure Stall Information)
  • release-note-none
  • size/XXL
  • team: workspace (Issue belongs to the Workspace team)
Projects
No open projects
Status: Done
3 participants