-
Notifications
You must be signed in to change notification settings - Fork 14.3k
[BOLT][BAT] Add entries for deleted basic blocks #91906
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BOLT][BAT] Add entries for deleted basic blocks #91906
Conversation
Created using spr 1.3.4 [skip ci]
Created using spr 1.3.4
@llvm/pr-subscribers-bolt Author: Amir Ayupov (aaupov) ChangesDeleted basic blocks are required for correct mapping of branches Test Plan: updated bb-with-two-tail-calls.s Full diff: https://github.com/llvm/llvm-project/pull/91906.diff 4 Files Affected:
diff --git a/bolt/docs/BAT.md b/bolt/docs/BAT.md
index 7ffb5d7c00816..817ad288aa34b 100644
--- a/bolt/docs/BAT.md
+++ b/bolt/docs/BAT.md
@@ -106,9 +106,14 @@ equals output offset.
`BRANCHENTRY` bit denotes whether a given offset pair is a control flow source
(branch or call instruction). If not set, it signifies a control flow target
(basic block offset).
+
`InputAddr` is omitted for equal offsets in input and output function. In this
case, `BRANCHENTRY` bits are encoded separately in a `BranchEntries` bitvector.
+Deleted basic blocks are emitted as having `OutputOffset` equal to the size of
+the function. They don't affect address translation and only participate in
+input basic block mapping.
+
### Secondary Entry Points table
The table is emitted for hot fragments only. It contains `NumSecEntryPoints`
offsets denoting secondary entry points, delta encoded, implicitly starting at zero.
diff --git a/bolt/include/bolt/Profile/BoltAddressTranslation.h b/bolt/include/bolt/Profile/BoltAddressTranslation.h
index 68b993ee363cc..16fe0442f10a5 100644
--- a/bolt/include/bolt/Profile/BoltAddressTranslation.h
+++ b/bolt/include/bolt/Profile/BoltAddressTranslation.h
@@ -217,6 +217,7 @@ class BoltAddressTranslation {
auto begin() const { return Map.begin(); }
auto end() const { return Map.end(); }
auto upper_bound(uint32_t Offset) const { return Map.upper_bound(Offset); }
+ auto size() const { return Map.size(); }
};
/// Map function output address to its hash and basic blocks hash map.
diff --git a/bolt/lib/Profile/BoltAddressTranslation.cpp b/bolt/lib/Profile/BoltAddressTranslation.cpp
index 7cfb9c132c2c6..724b348c7845e 100644
--- a/bolt/lib/Profile/BoltAddressTranslation.cpp
+++ b/bolt/lib/Profile/BoltAddressTranslation.cpp
@@ -108,6 +108,19 @@ void BoltAddressTranslation::write(const BinaryContext &BC, raw_ostream &OS) {
for (const BinaryBasicBlock *const BB :
Function.getLayout().getMainFragment())
writeEntriesForBB(Map, *BB, InputAddress, OutputAddress);
+ // Add entries for deleted blocks. They are still required for correct BB
+ // mapping of branches modified by SCTC. By convention, they would have the
+ // end of the function as output address.
+ const BBHashMapTy &BBHashMap = getBBHashMap(InputAddress);
+ if (BBHashMap.size() != Function.size()) {
+ const uint64_t EndOffset = Function.getOutputSize();
+ std::unordered_set<uint32_t> MappedInputOffsets;
+ for (const BinaryBasicBlock &BB : Function)
+ MappedInputOffsets.emplace(BB.getInputOffset());
+ for (const auto &[InputOffset, _] : BBHashMap)
+ if (!llvm::is_contained(MappedInputOffsets, InputOffset))
+ Map[EndOffset] = InputOffset << 1;
+ }
Maps.emplace(Function.getOutputAddress(), std::move(Map));
ReverseMap.emplace(OutputAddress, InputAddress);
diff --git a/bolt/test/X86/bb-with-two-tail-calls.s b/bolt/test/X86/bb-with-two-tail-calls.s
index bb2b0cd4cc23a..c0b01e1894a49 100644
--- a/bolt/test/X86/bb-with-two-tail-calls.s
+++ b/bolt/test/X86/bb-with-two-tail-calls.s
@@ -10,11 +10,20 @@
# RUN: %clang %cflags %t.o -o %t.exe -Wl,-q -nostdlib
# RUN: llvm-bolt %t.exe -o %t.out --data %t.fdata --lite=0 --dyno-stats \
# RUN: --print-sctc --print-only=_start -enable-bat 2>&1 | FileCheck %s
+# RUN: llvm-objdump --syms %t.out | FileCheck %s --check-prefix=CHECK-NM
+# RUN: llvm-bat-dump %t.out --dump-all | FileCheck %s --check-prefix=CHECK-BAT
+
# CHECK-NOT: Assertion `BranchInfo.size() == 2 && "could only be called for blocks with 2 successors"' failed.
# Two tail calls in the same basic block after SCTC:
# CHECK: {{.*}}: ja {{.*}} # TAILCALL # Offset: 7 # CTCTakenCount: 4
# CHECK-NEXT: {{.*}}: jmp {{.*}} # TAILCALL # Offset: 12
+# Confirm that a deleted basic block is emitted at function end offset (0xe)
+# CHECK-NM: 0000000000600000 g .text 000000000000000e _start
+# CHECK-BAT: Function Address: 0x600000, hash: 0xf8bf620b266cdc1b
+# CHECK-BAT: 0xe -> 0xc hash: 0x823623240f000c
+# CHECK-BAT: NumBlocks: 5
+
.globl _start
_start:
je x
|
Created using spr 1.3.4 [skip ci]
Created using spr 1.3.4
Created using spr 1.3.4
Created using spr 1.3.5 [skip ci]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you elaborate a bit better on why do we need a deleted block to be present in the table? My memory fails me, aren't we using the translation table just to map samples collected on the bolted binary? Where do the deleted blocks become a problem?
Other than the motivation, the implementation itself looks good to me.
We need deleted blocks to correctly map optimized tail calls to containing basic blocks. Because we expand tail calls during CFG construction, and attach the profile to that CFG, we need those intermediate blocks to be present in the profile. bb-with-two-tail-calls.s test offers a good example:
after SCTC, we normally write out BAT from that state:
So in order to map this branch
to its block Ltmp2 we need to preserve the block entry in BAT, although it's removed from CFG. |
Thanks for the detailed explanation. So essentially the output offset is not important because these deleted blocks are only useful for their input offset, which will be used in BoltAddressTranslation::getFallthroughsInTrace() to create traffic in this to-be deleted block, is that right? |
The entries for deleted basic blocks won't participate in llvm-project/bolt/lib/Profile/BoltAddressTranslation.cpp Lines 529 to 540 in 098c6df
But deleted blocks will be used in llvm-project/bolt/lib/Profile/DataAggregator.cpp Lines 2358 to 2367 in 098c6df
|
Oh I see, thanks! |
Deleted basic blocks are required for correct mapping of branches
modified by SCTC.
Increases BAT size, bytes:
Test Plan: updated bb-with-two-tail-calls.s