LLVM appears to generate `R_X86_64_TPOFF32` relocations for accesses to thread-local variables, even when the large code model is enabled with the clang flag `-mcmodel=large`. This results in a linker error for programs that require >4GB thread-local data sections (`.tdata` and `.tbss`).
Compile as follows: `clang -mcmodel=large test.c`
This produces the following error:
/tmp/user/20498/large-633e12.o: in function `main':
large.c:(.text+0xf): relocation truncated to fit: R_X86_64_TPOFF32 against symbol `buf' defined in .tbss section in /tmp/user/20498/large-633e12.o
clang: error: linker command failed with exit code 1 (use -v to see invocation)
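To illustrate why the linker reports "relocation truncated to fit", here is a minimal sketch of the range check involved. It assumes the usual x86-64 layout, in which the executable's static TLS block sits directly below the thread pointer, so variables have negative thread-pointer-relative offsets, and it ignores alignment padding. The helper names `fits_tpoff32` and `tp_offset_of` are hypothetical and are not part of LLVM or any linker.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not an LLVM or linker API): does a
   thread-pointer-relative offset fit the signed 32-bit field
   that an R_X86_64_TPOFF32 relocation fills in? */
static bool fits_tpoff32(int64_t tp_offset) {
    return tp_offset >= INT32_MIN && tp_offset <= INT32_MAX;
}

/* Assumption: the TLS block ends at the thread pointer, so a variable
   at byte `offset_in_block` of a block of `block_size` bytes has the
   negative offset (offset_in_block - block_size). Padding is ignored. */
static int64_t tp_offset_of(uint64_t offset_in_block, uint64_t block_size) {
    return (int64_t)offset_in_block - (int64_t)block_size;
}
```

Under these assumptions, a variable at the start of a 4GiB block has offset -2^32, which does not fit in 32 bits, while anything inside a block of at most 2GiB does.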
LLVM uses R_X86_64_TPOFF32 relocations for large code model (-mcmodel=large), resulting in 'relocation truncated to fit' error · Issue #77128 · llvm/llvm-project
[X86][ISel] Select MOV/ADD64ri32 for tglobaltlsaddr only under small …
llvmbot commented on Jan 23, 2024
@llvm/issue-subscribers-backend-x86
Author: Nicholas Mosier (nmosier)
MaskRay commented on Jan 25, 2024
The large code model is for code and data. There is no requirement that a large code model needs to support >4GB TLS.
I'd consider this a won't-change, as using a longer code sequence is going to pessimize every large code model user without tangible benefits. In addition, GCC generates `R_X86_64_TPOFF32`. There is no hard requirement that we must match it, but we should make a careful judgement.
aeubanks commented on Jan 25, 2024
The large code model is already extremely slow, I doubt TLS performance is a concern.
IMO making the large code model work with large TLS data seems fine if many people request it. Is this something you actually hit in real binaries, or just a theoretical thing? 4GB of TLS does seem like a lot.
MaskRay commented on Jan 26, 2024
Note: a 2GiB static TLS block almost assuredly won't work. It means that the dynamic loader needs to allocate 2GiB thread stack upfront for each new thread. No lazy allocation is possible. The memory use is just not affordable.
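The cost described in this comment can be put into numbers with a rough sketch. It takes the comment's premise at face value (every thread gets a full, eagerly allocated copy of the static TLS block); the helper name `total_static_tls_bytes` is hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: total bytes of static TLS across all threads,
   assuming each thread owns a complete, upfront-allocated copy of the
   block (the premise of the comment above, not a measured result). */
static uint64_t total_static_tls_bytes(uint64_t threads,
                                       uint64_t tls_block_bytes) {
    return threads * tls_block_bytes;
}
```

For example, 16 threads with a 2GiB block each would need 32GiB for TLS alone under this premise.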