-
Notifications
You must be signed in to change notification settings - Fork 13.4k
[GlobalMerge] Aggressively merge constants to reduce TOC entries #111756
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Symbols that get mapped into the read-only section are loaded as part of the text segment and will always need a TOC entry to be addressable. Add an option to aggressively merge these read only globals to reduce TOC usage.
@llvm/pr-subscribers-backend-powerpc Author: Zaara Syeda (syzaara) ChangesSymbols that get mapped into the read-only section are loaded as part of the text segment and will always need a TOC entry to be addressable. Add an option to aggressively merge these read only globals to reduce TOC usage. Full diff: https://github.com/llvm/llvm-project/pull/111756.diff 2 Files Affected:
diff --git a/llvm/lib/CodeGen/GlobalMerge.cpp b/llvm/lib/CodeGen/GlobalMerge.cpp
index 007bea9a6585ef..4c6c8c600ee2bb 100644
--- a/llvm/lib/CodeGen/GlobalMerge.cpp
+++ b/llvm/lib/CodeGen/GlobalMerge.cpp
@@ -119,6 +119,11 @@ static cl::opt<bool> GlobalMergeGroupByUse(
"global-merge-group-by-use", cl::Hidden,
cl::desc("Improve global merge pass to look at uses"), cl::init(true));
+static cl::opt<bool> GlobalMergeAllConst(
+ "global-merge-all-const", cl::Hidden,
+ cl::desc("Merge all const globals without looking at uses"),
+ cl::init(false));
+
static cl::opt<bool> GlobalMergeIgnoreSingleUse(
"global-merge-ignore-single-use", cl::Hidden,
cl::desc("Improve global merge pass to ignore globals only used alone"),
@@ -263,7 +268,7 @@ bool GlobalMergeImpl::doMerge(SmallVectorImpl<GlobalVariable *> &Globals,
});
// If we want to just blindly group all globals together, do so.
- if (!GlobalMergeGroupByUse) {
+ if (!GlobalMergeGroupByUse || (GlobalMergeAllConst && isConst)) {
BitVector AllGlobals(Globals.size());
AllGlobals.set();
return doMerge(Globals, AllGlobals, M, isConst, AddrSpace);
diff --git a/llvm/test/CodeGen/PowerPC/aix-aggressive-merge-const.ll b/llvm/test/CodeGen/PowerPC/aix-aggressive-merge-const.ll
new file mode 100644
index 00000000000000..ec2d409bfb971d
--- /dev/null
+++ b/llvm/test/CodeGen/PowerPC/aix-aggressive-merge-const.ll
@@ -0,0 +1,93 @@
+; RUN: llc --verify-machineinstrs -mtriple powerpc-ibm-aix-xcoff \
+; RUN: -global-merge-all-const=true < %s | FileCheck %s
+
+; RUN: llc --verify-machineinstrs -mtriple powerpc-ibm-aix-xcoff \
+; RUN: -global-merge-all-const=false < %s | FileCheck --check-prefix=NOMERGE %s
+
+%struct.pc_t = type { i8 }
+%struct.S = type { i32, i32, i32, i32, [9 x i32] }
+
+@constinit = private unnamed_addr constant <{ i32, i32, i32, i32, [9 x i32] }> <{ i32 0, i32 0, i32 0, i32 2, [9 x i32] zeroinitializer }>, align 4
+@.str = private unnamed_addr constant [6 x i8] c"hello\00", align 1
+@.str.1 = private unnamed_addr constant [6 x i8] c"world\00", align 1
+@.str.2 = private unnamed_addr constant [6 x i8] c"abcde\00", align 1
+@.str.3 = private unnamed_addr constant [6 x i8] c"fghij\00", align 1
+@pc = internal constant %struct.pc_t zeroinitializer, align 1
+@s = internal constant %struct.S { i32 1, i32 2, i32 3, i32 4, [9 x i32] [i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11, i32 12, i32 13] }, align 4
+
+; Function Attrs: mustprogress
+define noundef i32 @f5() {
+entry:
+ %call = tail call noundef i32 @f4(ptr noundef nonnull @pc)
+ ret i32 %call
+}
+
+declare noundef i32 @f4(ptr noundef)
+declare noundef i32 @printf(ptr nocapture noundef readonly, ...)
+
+define noundef i32 @f1() {
+entry:
+ %call = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str)
+ ret i32 %call
+}
+
+
+; Function Attrs: mustprogress nofree nounwind
+define noundef i32 @f2() {
+entry:
+ %call = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.1)
+ ret i32 %call
+}
+
+define noundef i32 @f3() {
+entry:
+ %call = tail call i32 (ptr, ...) @printf(ptr noundef nonnull dereferenceable(1) @.str.2)
+ ret i32 %call
+}
+
+define noundef i32 @f7() {
+entry:
+ %call = tail call noundef i32 @f6(ptr noundef nonnull @s)
+ ret i32 %call
+}
+
+declare noundef i32 @f6(ptr noundef)
+
+; CHECK: .csect L.._MergedGlobals[RO],2
+; CHECK-NEXT: .lglobl pc # @_MergedGlobals
+; CHECK-NEXT: .lglobl s
+; CHECK-NEXT: .align 2
+; CHECK-NEXT:pc:
+; CHECK-NEXT: .space 1
+; CHECK-NEXT:L...str:
+; CHECK-NEXT: .string "hello"
+; CHECK-NEXT:L...str.1:
+; CHECK-NEXT: .string "world"
+; CHECK-NEXT:L...str.2:
+; CHECK-NEXT: .string "abcde"
+; CHECK-NEXT:L...str.3:
+; CHECK-NEXT: .string "fghij"
+; CHECK-NEXT: .space 3
+; CHECK-NEXT:L..constinit:
+; CHECK-NEXT: .vbyte 4, 0 # 0x0
+; CHECK-NEXT: .vbyte 4, 0 # 0x0
+; CHECK-NEXT: .vbyte 4, 0 # 0x0
+; CHECK-NEXT: .vbyte 4, 2 # 0x2
+; CHECK-NEXT: .space 36
+; CHECK-NEXT:s:
+; CHECK-NEXT: .vbyte 4, 1 # 0x1
+; CHECK-NEXT: .vbyte 4, 2 # 0x2
+; CHECK-NEXT: .vbyte 4, 3 # 0x3
+; CHECK-NEXT: .vbyte 4, 4 # 0x4
+; CHECK-NEXT: .vbyte 4, 5 # 0x5
+; CHECK-NEXT: .vbyte 4, 6 # 0x6
+; CHECK-NEXT: .vbyte 4, 7 # 0x7
+; CHECK-NEXT: .vbyte 4, 8 # 0x8
+; CHECK-NEXT: .vbyte 4, 9 # 0x9
+; CHECK-NEXT: .vbyte 4, 10 # 0xa
+; CHECK-NEXT: .vbyte 4, 11 # 0xb
+; CHECK-NEXT: .vbyte 4, 12 # 0xc
+; CHECK-NEXT: .vbyte 4, 13 # 0xd
+
+
+; NOMERGE-NOT: L.._MergedGGlobals[RO]
|
This seems like it will increase the number of loads from the TOC... and inserting more TOC loads so you can shrink the TOC doesn't seem like a good tradeoff. |
The number of loads from TOC is not changed since we still need TOC loads for the constants before merging. For example:
The constant strings “hello” and “world\n” are placed in the TOC and without aggressive constant merging, the code we produce is:
We can observe the TOC loads:
With aggressive merging of the constants, we can use one TOC entry, and keep the same number of TOC loads:
Now we see the TOC loads from the MergedGlobals:
|
If you manage to merge every global in the module into a single variable, the number of loads stays the same, sure. But in general, that isn't possible: because of the offset limitations, you'll end up generating multiple global variables in the general case. And in that case, how you partition the globals into groups to be merged matters. |
It seems like regardless of how the partitioning happens, we will end up reducing TOC size while keeping the number of loads the same.
We create 2 MergedGlobals due to the MaxOffset
This reduces the TOC size to 2 entries, while keeping the number of loads the same. |
If you put "a" and "c" together, and "b" and "d" together, each function has one TOC load. If you merge a+b and c+d instead, each function has two TOC loads. |
I see what you mean now. I tried prototyping another approach where we merge the same way as we currently do and then follow that on with merging any left overs. I then compared it with this approach of blindly merging all constants together using SPEC. The results showed that there was negligible impact on performance but the TOC size was ~2% smaller if we merge all constants without looking at use. Since reducing TOC size is beneficial, we would like to move forward with this patch. As a next step, our goal is to use this approach in GlobalMerge to replace the PPCMergeStringPoolPass as this can handle everything that PPCMergeStringPoolPass was meant to handle while using the existing infrastructure. Additionally, we hope to move away from PPCMergeStringPoolPass to GlobalMerge since PPCMergeStringPoolPass has observed some intermittent issues on some of the buildbots in the past. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you've done the analysis, and determined this is good enough for now, then that's fine.
LGTM
…m#111756) Symbols that get mapped into the read-only section are loaded as part of the text segment and will always need a TOC entry to be addressable. Add an option to aggressively merge these read only globals to reduce TOC usage.
Symbols that get mapped into the read-only section are loaded as part of the text segment and will always need a TOC entry to be addressable. Add an option to aggressively merge these read only globals to reduce TOC usage.