Free up additional disk space on GitHub runner #7574
Conversation
Codecov Report
Patch coverage:
Additional details and impacted files@@ Coverage Diff @@
## develop #7574 +/- ##
========================================
Coverage 78.50% 78.50%
========================================
Files 1670 1670
Lines 191714 191760 +46
========================================
+ Hits 150498 150540 +42
- Misses 41216 41220 +4
☔ View full report at Codecov.
Building with coverage support creates large binaries as well as coverage records, all of which consume considerable disk space. 2cee3b1 appears to have pushed this over the top, as it creates a library archive consuming another 2.2 GB of disk space. This additional build step cleans out pre-installed binaries that we do not need, such as the Haskell, .NET, and Android SDKs. This frees up 28 GB of disk space (out of a total of 84 GB).
624cfa9 to a4989b9
run: |
  # inspired by https://github.com/easimon/maximize-build-space/blob/master/action.yml
  df -h
  sudo rm -rf /usr/share/dotnet /usr/local/lib/* /opt/*
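For reference, a complete step of this kind could look roughly as follows in the workflow. This is a sketch: the step name and the second df -h call are illustrative additions, and only the rm -rf line and the comment above it are taken from this change.

- name: Free up disk space on the runner   # illustrative step name, not from this PR
  run: |
    # inspired by https://github.com/easimon/maximize-build-space/blob/master/action.yml
    # show available disk space before the cleanup
    df -h
    # remove pre-installed toolchains that the build does not use
    sudo rm -rf /usr/share/dotnet /usr/local/lib/* /opt/*
    # show available disk space again to confirm how much was reclaimed
    df -h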
This seems like a blunt instrument. A couple of questions:
- Does the whole environment get cached, or just specific directories? It is weird that we're caching the Haskell environment, etc. I get that these are pre-installed by GitHub on its actions runners, but is the whole runner cached every time?
- Are we confident this doesn't introduce side-effects in our build (say, purging a binary we might implicitly depend on), or is this change experimental, where we depend on the CI outcome of this PR's run to see if it works?
- I believe that https://github.com/actions/runner-images/blob/main/images/linux/Ubuntu2204-Readme.md describes all that is cached. Maybe there is a way to access some layer that doesn't include all the software we don't even want?
- This PR is to try out whether there is anything that we should have kept. (Though the earlier revision of this PR already demonstrated that things were seemingly working fine.)
The software in the list is the pre-installed software in the runners, at least as I understand it.
That software is already present when the runners bootstrap, because it's pre-installed in the images, so I don't think we cache it (nor does it make sense for GitHub to cache it).*
What I think is happening is twofold:
- We have too many different jobs, and each of these gets cached for each PR; with each cache spanning a range of 20-200 MiB, we end up blowing the cache budget, and
- For some reason (probably how the coverage builds work) the coverage jobs have a massive cache of 700 MiB (compared to a maximum of 200 MiB for the other jobs, with the average being about ~100 MiB on eyeballing it).
I think this can go in for now, as it doesn't seem to be breaking anything, and I think it's an improvement from what I can see (I remember inspecting the coverage jobs last week and they were about 900 MiB, so unless something else has changed, this already has some impact).
But unless we trim the number of jobs we have significantly (or narrow down the scope of caches on jobs to only run on, say, pull requests and not merges or releases; see the sketch below), I'm afraid we will continue to be plagued by such issues.
- With the asterisk that we seem to be caching the contents of .ccache (https://github.com/diffblue/cbmc/blob/develop/.github/workflows/pull-request-checks.yaml#L932), so unless we expand the scope of that in a way that I don't understand, we just cache build artefacts.
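As a sketch of the "narrow down the scope of caches" idea: assuming the jobs cache .ccache through actions/cache (the step name and key scheme below are illustrative, not the ones used in pull-request-checks.yaml), the caching step could be limited to pull request events with an if condition.

- name: Cache ccache directory   # illustrative name; the real workflow step may differ
  if: github.event_name == 'pull_request'   # skip caching on merges and releases
  uses: actions/cache@v3
  with:
    path: .ccache
    key: ccache-${{ runner.os }}-${{ github.head_ref }}   # hypothetical key scheme
    restore-keys: |
      ccache-${{ runner.os }}-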
Hmm, I do wonder whether there is some confusion here: there is the cache that we control (I think we just put ccache results in there), and then there may be Docker image caches. The latter we don't really care about; that's just how GitHub may choose to build their images (I'm not even sure they use Docker images?). What we do get is VMs with a certain amount of disk space. Some of that disk space is consumed by pre-installed software. This software may come in via the latter kind of cache, but how it ends up in the image doesn't really matter to us - all that matters is that we end up with a disk image with the following mount points (and their available space):
Filesystem Size Used Avail Use% Mounted on
/dev/root 84G 58G 26G 70% /
devtmpfs 3.4G 0 3.4G 0% /dev
tmpfs 3.4G 4.0K 3.4G 1% /dev/shm
tmpfs 695M 1.1M 694M 1% /run
tmpfs 5.0M 0 5.0M 0% /run/lock
tmpfs 3.4G 0 3.4G 0% /sys/fs/cgroup
/dev/loop0 64M 64M 0 100% /snap/core20/1822
/dev/loop1 92M 92M 0 100% /snap/lxd/24061
/dev/loop2 50M 50M 0 100% /snap/snapd/17950
/dev/sdb15 105M 6.1M 99M 6% /boot/efi
/dev/sda1 14G 4.1G 9.0G 31% /mnt
tmpfs 695M 0 695M 0% /run/user/1001
This means that only 26 GB are available for our source code, build artefacts, and any software that we still may need to install. With debug symbols enabled and the coverage logging that takes place, those 26 GB are no longer sufficient, and we had to free up additional space before properly starting our job's work.
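To decide which pre-installed directories are worth removing, one could first measure them on the runner. The paths below are typical locations on GitHub's Ubuntu runner images and are assumptions for illustration, not taken from this PR.

# overall disk usage of the root filesystem
df -h /
# sizes of some commonly removed pre-installed toolchains (paths may vary per image)
sudo du -sh /usr/share/dotnet /usr/local/lib/android /opt/ghc /opt/hostedtoolcache 2>/dev/null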
Ahh, my apologies, I misunderstood.
It seems like you're talking about the actual disk space inside the runner. In this case, yeah, I agree.
I thought originally this was going to affect the cache utilisation (of which we're already using 22GiB, up from 16GiB, of the allocated 10GiB, and we're approaching the hard limit).
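For what it's worth, the repository's Actions cache utilisation can be inspected via the GitHub REST API, for example with the gh CLI. This is a sketch using diffblue/cbmc as the repository; adjust as needed.

# total size and count of active Actions caches (to compare against the 10 GiB allowance)
gh api repos/diffblue/cbmc/actions/cache/usage
# list individual caches, largest first, to see which jobs consume the budget
gh api "repos/diffblue/cbmc/actions/caches?sort=size_in_bytes&direction=desc" --paginate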
The merge of #6479 into develop was the last successful Codecov CI job run. Ever since, all Codecov job runs have been cancelled at some point.