Free up additional disk space on GitHub runner #7574


Merged (1 commit, Mar 6, 2023)

6 changes: 6 additions & 0 deletions .github/workflows/pull-request-checks.yaml
@@ -910,6 +910,12 @@ jobs:
        uses: actions/checkout@v3
        with:
          submodules: recursive
      - name: Remove unnecessary software to free up disk space
        run: |
          # inspired by https://github.com/easimon/maximize-build-space/blob/master/action.yml
          df -h
          sudo rm -rf /usr/share/dotnet /usr/local/lib/* /opt/*
Contributor

This seems like a blunt instrument. A couple of questions:

  1. Does the whole environment get cached, or just specific directories? It seems weird that we're caching the Haskell environment, etc. I get that these are pre-installed by GitHub on its actions runners, but is the whole runner cached every time?
  2. Are we confident this doesn't introduce side-effects in our build (say, purging a binary we might implicitly depend on) - or is this change experimental and we depend on the CI outcome of this PR's run to see if it works?

Collaborator Author

  1. I believe that https://github.com/actions/runner-images/blob/main/images/linux/Ubuntu2204-Readme.md describes all that is cached. Maybe there is a way to access some layer that doesn't have all the stuff we don't even want?
  2. This PR is to try out whether there is anything that we should have kept. (Though the earlier revision of this PR already demonstrated things seemingly were working fine.)
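
For illustration, one way to get more confidence than relying solely on a full CI run would be an explicit sanity check right after the cleanup. A minimal sketch, assuming the workflow runs on bash; the tool list is hypothetical and not taken from the actual workflow:

```yaml
      # Hypothetical sanity check: fail fast if a tool the build implicitly
      # relies on disappeared together with the removed directories.
      - name: Check that required tools survived the cleanup
        run: |
          for tool in gcc g++ make cmake ccache python3; do
            command -v "$tool" >/dev/null || { echo "missing: $tool"; exit 1; }
          done
```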

Contributor
@NlightNFotis Mar 6, 2023

The software in the list is the pre-installed software in the runners, at least as I understand it.

That software is already present when the runners bootstrap, because it's pre-installed in the images, so I don't think we cache it (nor does it make sense for GitHub to cache it).

What I think is happening is twofold:

  1. We have too many different jobs, and each of these gets cached for each PR; with each cache spanning a range of 20-200 MiB, we end up blowing through the cache limit, and
  2. For some reason (probably how the coverage builds work) the coverage jobs have a massive cache of 700 MiB (compared to a maximum of 200 MiB for the other jobs, with the average being about 100 MiB from eyeballing it).

I think this can go in for now, as it doesn't seem to be breaking anything, and I think it's an improvement from what I can see (I remember inspecting the coverage jobs last week and they were about 900 MiB, so unless something else has changed, this already has some impact).

But unless we trim the number of jobs significantly (or narrow down the scope of caches so that they only run on, say, pull requests and not merges or releases), I'm afraid we will continue to be plagued by such issues.
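
For illustration, a minimal sketch of what narrowing the cache scope to pull requests could look like; the step name, cache path, and key below are hypothetical, not taken from the actual workflow:

```yaml
      # Hypothetical example: only restore/save the ccache data on pull
      # requests, so merge and release builds do not add further cache entries.
      - name: Cache ccache data (pull requests only)
        if: github.event_name == 'pull_request'
        uses: actions/cache@v3
        with:
          path: .ccache
          key: ${{ runner.os }}-ccache-${{ github.ref }}
          restore-keys: ${{ runner.os }}-ccache-
```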


Collaborator Author

Hmm, I do wonder whether there is some confusion here: there is the cache that we control (I think we just put ccache results in there), and then there may be Docker image caches. The latter we don't really care about; that's just how GitHub may choose to build their images (I'm not even sure they use Docker images?).

What we do get is VMs with a certain amount of disk space. Some of that disk space is consumed by pre-installed software. This software may come in via the latter kind of cache, but how it ends up in the image doesn't really matter to us - all that matters is that we end up with a disk image with the following mount points (and their available space):

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   58G   26G  70% /
devtmpfs        3.4G     0  3.4G   0% /dev
tmpfs           3.4G  4.0K  3.4G   1% /dev/shm
tmpfs           695M  1.1M  694M   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.4G     0  3.4G   0% /sys/fs/cgroup
/dev/loop0       64M   64M     0 100% /snap/core20/1822
/dev/loop1       92M   92M     0 100% /snap/lxd/24061
/dev/loop2       50M   50M     0 100% /snap/snapd/17950
/dev/sdb15      105M  6.1M   99M   6% /boot/efi
/dev/sda1        14G  4.1G  9.0G  31% /mnt
tmpfs           695M     0  695M   0% /run/user/1001

This means that only 26 GB are available for our source code, build artefacts, and any software that we may still need to install. With builds that have debug symbols enabled and the coverage logging that takes place, those 26 GB are no longer sufficient, so we have to free up additional space before properly starting our job's work.
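
For context, a sketch of how such a cleanup step could also report what it reclaims; this is an illustrative variant, not the exact step in this PR:

```yaml
      # Illustrative variant: show how much each candidate path occupies
      # before deleting it, and the effect on the root filesystem.
      - name: Remove unnecessary software to free up disk space
        run: |
          df -h /
          sudo du -sh /usr/share/dotnet /usr/local/lib/* /opt/* 2>/dev/null || true
          sudo rm -rf /usr/share/dotnet /usr/local/lib/* /opt/*
          df -h /
```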

Contributor

Ahh, my apologies, I misunderstood.

It seems like you're talking about the actual disk space inside the runner. In this case, yeah, I agree.

I thought originally this was going to affect the cache utilisation (of which we're already using ~~16 GiB~~ 22 GiB of the allocated 10 GiB, and we're approaching the hard limit).

          df -h
      - name: Download testing and coverage dependencies
        env:
          # This is needed in addition to -yq to prevent apt-get from asking for