Skip to content

[Clang] Major build regression due to OOM on windows #119781

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
h-vetinari opened this issue Dec 12, 2024 · 11 comments
Closed

[Clang] Major build regression due to OOM on windows #119781

h-vetinari opened this issue Dec 12, 2024 · 11 comments
Labels
build-problem clang Clang issues not falling into any other category

Comments

@h-vetinari
Copy link
Contributor

I've been building the LLVM 20 stack from main in conda-forge (mainly for testing #110217), and I've run into a regression that makes it impossible for us to build clang in our infrastructure. Granted, our free agents from azure-pipelines are quite puny (2 cores, 7-8GB memory, 6h time limit), but this has sufficed for many years to piece together LLVM, even though we have to slice things into several pieces (e.g. libllvm, clang, mlir, compiler-rt, libcxx, openmp, lld, etc.), both to stay under the 6h timeout, but also for packaging reasons.

Following the various stages of #110217, I've built sets of packages based off the following commits:

(The ✅ / ❌ here only refer to building clang on windows, the other plaforms were fine)

After reproducing the problem 5+ times (happens 100% of the time), the failure looks as follows:

[144/2145] Building CXX object lib\Basic\CMakeFiles\obj.clangBasic.dir\Targets.cpp.obj
##[warning]Free memory is lower than 5%; Currently used: 95.37%
##[warning]Free memory is lower than 5%; Currently used: 97.49%
##[warning]Free disk space on D:\ is lower than 5%; Currently used: 95.05%
##[warning]Free disk space on D:\ is lower than 5%; Currently used: 95.05%
[145/2145] Building CXX object lib\Basic\CMakeFiles\obj.clangBasic.dir\Targets\AArch64.cpp.obj
FAILED: lib/Basic/CMakeFiles/obj.clangBasic.dir/Targets/AArch64.cpp.obj 
C:\PROGRA~1\MICROS~2\2022\ENTERP~1\VC\Tools\MSVC\1429~1.301\bin\HostX64\x64\cl.exe  /nologo /TP -DCLANG_BUILD_STATIC -DUNICODE -D_CRT_NONSTDC_NO_DEPRECATE -D_CRT_NONSTDC_NO_WARNINGS -D_CRT_SECURE_NO_DEPRECATE -D_CRT_SECURE_NO_WARNINGS -D_HAS_EXCEPTIONS=0 -D_SCL_SECURE_NO_DEPRECATE -D_SCL_SECURE_NO_WARNINGS -D_UNICODE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -I%SRC_DIR%\clang\build\lib\Basic -I%SRC_DIR%\clang\lib\Basic -I%SRC_DIR%\clang\include -I%SRC_DIR%\clang\build\include -I%PREFIX%\Library\include -MD  /DWIN32 /D_WINDOWS   /Zc:inline /Zc:preprocessor /Zc:__cplusplus /Oi /bigobj /permissive- /W4 -wd4141 -wd4146 -wd4244 -wd4267 -wd4291 -wd4351 -wd4456 -wd4457 -wd4458 -wd4459 -wd4503 -wd4624 -wd4722 -wd4100 -wd4127 -wd4512 -wd4505 -wd4610 -wd4510 -wd4702 -wd4245 -wd4706 -wd4310 -wd4701 -wd4703 -wd4389 -wd4611 -wd4805 -wd4204 -wd4577 -wd4091 -wd4592 -wd4319 -wd4709 -wd5105 -wd4324 -wd4251 -wd4275 -w14062 -we4238 /Gw /O2 /Ob2 /DNDEBUG -std:c++17 -MD  /EHs-c- /GR /showIncludes /Folib\Basic\CMakeFiles\obj.clangBasic.dir\Targets\AArch64.cpp.obj /Fdlib\Basic\CMakeFiles\obj.clangBasic.dir\ /FS -c %SRC_DIR%\clang\lib\Basic\Targets\AArch64.cpp
%SRC_DIR%\clang\build\include\clang/Basic/arm_sve_builtins.inc(7787): fatal error C1060: compiler is out of heap space
ninja: build stopped: subcommand failed.

This is despite already using a ~16GB swapfile, and having reduced parallelism to -j1. It might be that somehing in include/clang/Basic/arm_sve_builtins.inc goes haywire? I wasn't able to check as that file is generated and not checked-in.

The problem is that we cannot even easily switch to compile with clang, because that creates a cycle from the POV of our build tool (clang building clang). Of course, that tool could be improved, and there are possible work-arounds, but it's still a pretty major regression IMO if even a single-threaded build cannot build clang with 8GB memory (+16GB swap) anymore.

FWIW, our build looks roughly as follows (against an existing libllvm & libcxx compiled from the same version or commit):

mkdir build
cd build

set "CXX=cl.exe"
set "CC=cl.exe"
cmake -G "Ninja" ^
    -DCMAKE_BUILD_TYPE="Release" ^
    -DCMAKE_PREFIX_PATH=%LIBRARY_PREFIX% ^
    -DCMAKE_INSTALL_PREFIX:PATH=%LIBRARY_PREFIX% ^
    -DCLANG_FORCE_MATCHING_LIBCLANG_SOVERSION=OFF ^
    -DCLANG_INCLUDE_TESTS=OFF ^
    -DCLANG_INCLUDE_DOCS=OFF ^
    -DLLVM_INCLUDE_TESTS=OFF ^
    -DLLVM_INCLUDE_DOCS=OFF ^
    -DLLVM_ENABLE_LIBXML2=FORCE_ON ^
    -DLLVM_ENABLE_ZLIB=FORCE_ON ^
    -DLLVM_ENABLE_ZSTD=FORCE_ON ^
    -DPython3_EXECUTABLE=%BUILD_PREFIX%\python ^
    ../clang
if %ERRORLEVEL% neq 0 exit 1

:: also tried with -j1; still OOMs
ninja -j%CPU_COUNT%
if %ERRORLEVEL% neq 0 exit 1
@github-actions github-actions bot added the clang Clang issues not falling into any other category label Dec 12, 2024
@h-vetinari
Copy link
Contributor Author

I'm bisecting this currently

bisection log
E:\upstreams\llvm-project>git bisect start
status: waiting for both good and bad commits

E:\upstreams\llvm-project>git bisect good d6ec7c82f383ae4268f350f4d2e267af45fae8c0
status: waiting for bad commit, 1 good commit known

E:\upstreams\llvm-project>git bisect bad 3c464d23682b0f9e6f70965e8f8f3861c9ba541
Bisecting: 269 revisions left to test after this (roughly 8 steps)
[44cd8f0d06b7efcd0101d9101f822386e12e4d66] [flang] Lower CSHIFT to hlfir.cshift operation. (#118917)

E:\upstreams\llvm-project>git bisect bad
Bisecting: 134 revisions left to test after this (roughly 7 steps)
[afef545efab77a8f081cae72900c273af4d5c35c] [VPlan] Address post-commit for #114305.

E:\upstreams\llvm-project>git bisect good
Bisecting: 67 revisions left to test after this (roughly 6 steps)
[788d5a5f1e1263657f0a281545e095769f6b375b] [polly] Add a Maintainers.md file

E:\upstreams\llvm-project>git bisect bad
Bisecting: 33 revisions left to test after this (roughly 5 steps)
[e0ea9fd6dc36f585e364d4e569095ebe063e2573] [RISCV][GISel] Lower G_SCMP and G_UCMP. (#119112)

E:\upstreams\llvm-project>git bisect bad
Bisecting: 16 revisions left to test after this (roughly 4 steps)
[b33c807b39e2fa07977277d13552f3d773c6b61e] [AMDGPU] Add MaxMemoryClauseSchedStrategy (#114957)

E:\upstreams\llvm-project>git bisect good
Bisecting: 8 revisions left to test after this (roughly 3 steps)
[4dcc2f5db9d3a9d317aa1acde7adbbe9ec467cb2] [libc] Replace LLVM_COMPILER_IS_GCC_COMPATIBLE with a local check. (#119164)

E:\upstreams\llvm-project>git bisect bad
Bisecting: 3 revisions left to test after this (roughly 2 steps)
[f2d18a4d00c5f5dea436b4f7b62ec5c87b98eac2] Reapply "[ORC] Introduce LazyReexportsManager, ... (#118923)" with fixes.

E:\upstreams\llvm-project>git bisect bad
Bisecting: 1 revision left to test after this (roughly 1 step)
[be2df95e9281985b61270bb6420ea0eeeffbbe59] Switch builtin strings to use string tables (#118734)

It's down to one of

Looking at the remaining commits, it's almost certainly due be2df95 (CC @chandlerc). I need to head out now, but will try later to confirm if a revert of that fixes things.

@chandlerc
Copy link
Member

Yeah it's probably that commit.

What version of MSVC are you using?

Most of versions seem to be OK with the new code, but some may struggle more with memory than others.

There is one version of MSVC that seems to be miscompiling some of this code, and we're likely to temporarily revert until that miscompile is addressed. But eventually I'd really like to re-land this as it is a pretty significant improvement. I'll look into whether there is a viable way to reduce MSVC's memory usage here in the mean time, but I'm not sure...

@h-vetinari
Copy link
Contributor Author

h-vetinari commented Dec 14, 2024

Thanks for the quick response!

What version of MSVC are you using?

We're using the newest VS2022, but currently targeting the VC 14.2 toolset (corresponding to VS2019).

I can try if targeting VC 14.3 helps.

@chandlerc
Copy link
Member

This is definitely an area where I suspect MSVC's memory usage has improved substantially over time, and so using the later versions will help given how memory constrained the environment you're working in is.

@chandlerc
Copy link
Member

Note that #119638 reverted the code causing this, but not closing this issue because it may come back in that or a different form.

@h-vetinari
Copy link
Contributor Author

Unfortunately, even VC v14.42 == VS v17.12 fails. IOW, there seems to be no currently available MSVC-based compiler that can handle this.

@chandlerc
Copy link
Member

Unfortunately, even VC v14.42 == VS v17.12 fails. IOW, there seems to be no currently available MSVC-based compiler that can handle this.

With your memory limits... I'm not sure it's realistic to reliably remain under those kinds of limits.

All but one version that we have build bots for were ok.

@h-vetinari
Copy link
Contributor Author

h-vetinari commented Dec 14, 2024

8GB (+ more than that in swap) is not realistic anymore? We're not talking about a tiny embedded device here, but the default agent/VM of a major CI provider (GHA & Azure pipelines).

In #112789, @nikic recently asked that building flang should not take more than 2GB per thread for it to be enabled by default. Presumably clang should respect similar limits? Even if we doubled that, it'd still be fine for us. But moving the minimum memory a factor 10 higher seems excessive (8GB main mem + ~16GB swap > 20GB).

@nikic
Copy link
Contributor

nikic commented Dec 14, 2024

FWIW when building that commit with GCC there is no significant change to max-rss (it stays around 2GB). I don't have stats for a build using clang handy.

I expect it would be possible to reduce the memory impact for MSVC a lot by directly generating the offsets using TableGen, so we don't have to generate a huge table via a constexpr function, which I assume is what trips up MSVC here.

@chandlerc
Copy link
Member

I have put together a patch series with a set of improvements built on the original PR that I think will drastically reduce the memory usage requirements: #120534

My experiments with Compiler Explorer seems to indicate that the constexpr function alone wasn't the only problem, maybe not even the dominant problem.

The PR does a few things: it avoids the use of complex X-macro expansions to create the string literal, it reduces the number of string literal concatenations required, and it shrinks the largest of the string literals by a factor 3-ish.

@h-vetinari
Copy link
Contributor Author

Closing this as the PR in question was reverted. I also tested a previous iteration of #120534 and it worked fine, so I'm hopeful that it will not regress again. Otherwise I can always reopen here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build-problem clang Clang issues not falling into any other category
Projects
None yet
Development

No branches or pull requests

4 participants