Free up additional disk space on GitHub runner #7574

Merged — 1 commit merged into diffblue:develop on Mar 6, 2023

Conversation

@tautschnig (Collaborator) commented Mar 3, 2023

Building with coverage support creates large binaries as well as
coverage records, all of which consume considerable disk space.
2cee3b1 appears to have pushed this over the limit, as it creates a
library archive consuming another 2.2 GB of disk space.

This additional build step cleans out binaries that we do not need, such
as the Haskell, .NET, or Android SDKs. This frees up 28 GB of disk space
(out of a total of 84 GB).

The merge of #6479 into develop was the last successful Codecov CI job run. Ever since, all Codecov job runs have been cancelled at some point.

  • Each commit message has a non-empty body, explaining why the change was made.
  • n/a Methods or procedures I have added are documented, following the guidelines provided in CODING_STANDARD.md.
  • n/a The feature or user visible behaviour I have added or modified has been documented in the User Guide in doc/cprover-manual/
  • Regression or unit tests are included, or existing tests cover the modified code (in this case I have detailed which ones those are in the commit message).
  • My commit message includes data points confirming performance improvements (if claimed).
  • My PR is restricted to a single feature or bugfix.
  • n/a White-space or formatting changes outside the feature-related changed lines are in commits of their own.

@tautschnig tautschnig self-assigned this Mar 3, 2023

codecov bot commented Mar 3, 2023

Codecov Report

Patch coverage: 86.95% and no project coverage change

Comparison is base (557f4bf) 78.50% compared to head (a4989b9) 78.50%.

Additional details and impacted files
@@           Coverage Diff            @@
##           develop    #7574   +/-   ##
========================================
  Coverage    78.50%   78.50%           
========================================
  Files         1670     1670           
  Lines       191714   191760   +46     
========================================
+ Hits        150498   150540   +42     
- Misses       41216    41220    +4     
Impacted Files Coverage Δ
regression/libcprover-cpp/call_bmc.cpp 87.50% <ø> (ø)
src/libcprover-cpp/api.h 100.00% <ø> (ø)
src/libcprover-cpp/api_options.cpp 100.00% <ø> (ø)
src/util/simplify_expr.cpp 85.29% <80.00%> (-0.11%) ⬇️
src/util/simplify_expr_int.cpp 88.44% <100.00%> (+0.17%) ⬆️
src/util/symbol_table.cpp 91.30% <0.00%> (+2.17%) ⬆️


@tautschnig tautschnig force-pushed the bugfixes/coverage-ci branch from 624cfa9 to a4989b9 Compare March 6, 2023 11:24
@tautschnig tautschnig changed the title from "[EXPERIMENT] Try to revive Codecov job" to "Free up additional disk space on GitHub runner" Mar 6, 2023
@tautschnig tautschnig marked this pull request as ready for review March 6, 2023 11:26
@tautschnig tautschnig requested a review from a team as a code owner March 6, 2023 11:26
run: |
  # inspired by https://github.com/easimon/maximize-build-space/blob/master/action.yml
  df -h
  sudo rm -rf /usr/share/dotnet /usr/local/lib/* /opt/*
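The step above is deliberately blunt. A more cautious variant could remove an explicit list of known-unneeded paths and report how much space was actually freed; the following is a sketch only, not from the PR, and the directory list is illustrative. `DRY_RUN` defaults to 1 so nothing is deleted unless explicitly requested:

```shell
#!/bin/sh
# Sketch of a cleanup step that reports freed space (illustrative, not the
# PR's actual step). DRY_RUN defaults to 1, so nothing is deleted by default.
set -eu

# Hypothetical list of pre-installed toolchains the build does not need.
DIRS="/usr/share/dotnet /usr/local/lib/android /opt/ghc"

# Available kilobytes on the root filesystem (GNU coreutils df).
avail_kb() { df --output=avail -k / | tail -1 | tr -d ' '; }

before=$(avail_kb)
for d in $DIRS; do
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "would remove $d"
  else
    sudo rm -rf "$d"
  fi
done
after=$(avail_kb)
echo "freed $(( (after - before) / 1024 )) MiB"
```

In a workflow this would run with `DRY_RUN=0`; keeping a `df -h` before and after, as the PR does, makes the effect visible in the job log.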
Contributor commented:

This seems like a blunt instrument. A couple of questions:

  1. Does the whole environment get cached, or just specific directories? It seems odd that we're caching the Haskell environment, etc. I get that these are pre-installed by GitHub on its actions runners, but is the whole runner cached every time?
  2. Are we confident this doesn't introduce side-effects in our build (say, purging a binary we might implicitly depend on), or is this change experimental and we depend on the CI outcome of this PR's run to see if it works?

Collaborator Author replied:

  1. I believe that https://github.com/actions/runner-images/blob/main/images/linux/Ubuntu2204-Readme.md describes all that is cached. Maybe there is a way to access some layer that doesn't have all the stuff we don't even want?
  2. This PR is to try out whether there is anything that we should have kept. (Though the earlier revision of this PR already demonstrated things seemingly were working fine.)

@NlightNFotis (Contributor) commented Mar 6, 2023:

The software in the list is the pre-installed software in the runners, at least as I understand it.

That software is already present when the runners bootstrap, because it's pre-installed in the images, so I don't think we cache it (neither does it make sense for GitHub to cache it).

What I think is happening is twofold:

  1. We have too many different jobs, and each of these gets cached for each PR, and with each cache spanning a range of 20-200 MiB, we end up blowing the cache, and
  2. For some reason (probably how the coverage builds work) the coverage jobs have a massive cache of 700 MiB (compared to a max size of 200 MiB for the other jobs, with the average being about ~100 MiB from eyeballing it).

I think this can go in for now, as it doesn't seem to be breaking anything, and it's an improvement from what I can see (I remember inspecting the coverage jobs last week and they were about 900 MiB, so assuming nothing else has changed, this already has some impact).

But unless we trim the number of jobs significantly (or narrow the scope of caches so they are only populated on, say, pull requests and not merges or releases), I'm afraid we will continue to be plagued by such issues.
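Narrowing cache scope by event type could look like the following hypothetical sketch. It assumes the job stores ccache results via the stock actions/cache action; the step name, path, and key scheme are illustrative, not taken from this repository's workflows:

```yaml
# Only restore/save the ccache cache on pull requests, so merge and
# release runs do not add entries toward the repository's cache quota.
- name: Cache ccache results
  if: github.event_name == 'pull_request'
  uses: actions/cache@v3
  with:
    path: .ccache
    key: ${{ runner.os }}-ccache-${{ github.sha }}
    restore-keys: |
      ${{ runner.os }}-ccache-
```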


Collaborator Author replied:

Hmm, I do wonder whether there is some confusion here: there is the cache that we control (I think we just put ccache results in there), and then there may be Docker image caches. The latter we don't really care about; that's just how GitHub may choose to build their images (I'm not even sure they use Docker images?).

What we do get is VMs with a certain amount of disk space. Some of that disk space is consumed by pre-installed software. This software may come in via the latter kind of cache, but how it ends up in the image doesn't really matter to us. All that matters is that we end up with a disk image with the following mount points (and their available space):

Filesystem      Size  Used Avail Use% Mounted on
/dev/root        84G   58G   26G  70% /
devtmpfs        3.4G     0  3.4G   0% /dev
tmpfs           3.4G  4.0K  3.4G   1% /dev/shm
tmpfs           695M  1.1M  694M   1% /run
tmpfs           5.0M     0  5.0M   0% /run/lock
tmpfs           3.4G     0  3.4G   0% /sys/fs/cgroup
/dev/loop0       64M   64M     0 100% /snap/core20/1822
/dev/loop1       92M   92M     0 100% /snap/lxd/24061
/dev/loop2       50M   50M     0 100% /snap/snapd/17950
/dev/sdb15      105M  6.1M   99M   6% /boot/efi
/dev/sda1        14G  4.1G  9.0G  31% /mnt
tmpfs           695M     0  695M   0% /run/user/1001

This means that only 26 GB are available for our source code, build artefacts, and any software that we still may need to install. With debug symbols enabled and the coverage logging that takes place, those 26 GB are no longer sufficient, and we had to free up additional space before properly starting our job's work.
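Given that tight margin, a job could also fail fast when the runner starts out short on free space instead of dying mid-build. A minimal sketch, not part of the PR; the 30 GB figure in the comment is an assumption based on the numbers above:

```shell
#!/bin/sh
# Fail early if the root filesystem has less free space than a threshold.
set -eu

check_space() {
  need_gb=$1
  # Free space on / in whole gigabytes (GNU coreutils df).
  avail_gb=$(( $(df --output=avail -k / | tail -1) / 1024 / 1024 ))
  echo "available: ${avail_gb} GB, required: ${need_gb} GB"
  [ "$avail_gb" -ge "$need_gb" ]
}

# In the coverage job this might be: check_space 30
check_space "${NEED_GB:-1}" || { echo "insufficient disk space" >&2; exit 1; }
```

Run at the top of the job, this turns an obscure mid-build failure into an immediate, clearly labelled one.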

Contributor replied:

Ahh, my apologies, I misunderstood.

It seems like you're talking about the actual disk space inside the runner. In this case, yeah, I agree.

I thought originally this was going to affect the cache utilisation (of which we're already using 22 GiB, up from 16 GiB, of the allocated 10 GiB, and we're approaching the hard limit).

@tautschnig tautschnig merged commit e024ecb into diffblue:develop Mar 6, 2023
@tautschnig tautschnig deleted the bugfixes/coverage-ci branch March 6, 2023 14:48
6 participants