Cordon build nodes if their disk is more than 80% full #10116
Merged
Description
This changes the platform-trigger-werft-cleanup job. Previously it triggered a separate Werft job, scheduled on each build node, that cleaned up that individual node. Now it SSHes into each instance and simply cordons it if its disk is more than 80% full.
This approach has two benefits:
- We no longer run `docker system prune`, which had the downside of potentially breaking currently running builds.
- Because of that risk, we couldn't run the old job very often. Now we can run the job as often as we'd like; I have changed it to run every 4 hours.

The relevant changes to the service account have been made in https://github.com/gitpod-io/ops/pull/2375
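A minimal sketch of the per-node check described above, for illustration only. It assumes the job can reach each build node with plain `ssh`, that the disk of interest is mounted at `/`, and that GNU `df` (which supports `--output=pcent`) is installed on the node. The function names, the SSH target, and the mount path are hypothetical and not taken from the actual job.

```python
import subprocess

# Hypothetical constant mirroring the PR's "more than 80% full" rule.
DISK_USAGE_THRESHOLD = 80


def parse_df_pcent(output: str) -> int:
    """Parse the output of GNU `df --output=pcent <path>`.

    Expected shape:
        Use%
         83%
    """
    lines = [line.strip() for line in output.strip().splitlines()]
    # The first line is the "Use%" header; the second holds the value.
    return int(lines[1].rstrip("%"))


def should_cordon(usage_percent: int, threshold: int = DISK_USAGE_THRESHOLD) -> bool:
    # Cordon only when usage is strictly more than the threshold.
    return usage_percent > threshold


def check_node(node_name: str, ssh_target: str, mount: str = "/") -> bool:
    """SSH to the node, read its disk usage, and cordon it if it is too full.

    Returns True if the node was cordoned. `ssh_target` and `mount`
    are assumptions for this sketch.
    """
    df = subprocess.run(
        ["ssh", ssh_target, "df", "--output=pcent", mount],
        capture_output=True,
        text=True,
        check=True,
    )
    usage = parse_df_pcent(df.stdout)
    if should_cordon(usage):
        # `kubectl cordon` marks the node unschedulable; pods already
        # running on it (e.g. in-progress builds) are left alone.
        subprocess.run(["kubectl", "cordon", node_name], check=True)
        return True
    return False
```

Cordoning only stops new pods from being scheduled onto the node; unlike a prune, it does not touch containers or images that running builds depend on. That property is what makes it safe to run the check frequently.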
Related Issue(s)
Fixes https://github.com/gitpod-io/ops/issues/2050
Fixes https://github.com/gitpod-io/ops/issues/1227
How to test
See the comments in the code for how to run this. Here are two examples:
Release Notes
Documentation
N/A