Skip to content

[BOLT][BAT] Add entries for deleted basic blocks #91906

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

aaupov
Copy link
Contributor

@aaupov aaupov commented May 13, 2024

Deleted basic blocks are required for correct mapping of branches
modified by SCTC.

Increases BAT size, bytes:

  • large binary: 8622496 -> 8703244.
  • small binary (X86/bolt-address-translation.test): 928 -> 940.

Test Plan: updated bb-with-two-tail-calls.s

aaupov added 2 commits May 12, 2024 17:03
Created using spr 1.3.4
@llvmbot
Copy link
Member

llvmbot commented May 13, 2024

@llvm/pr-subscribers-bolt

Author: Amir Ayupov (aaupov)

Changes

Deleted basic blocks are required for correct mapping of branches
modified by SCTC.

Test Plan: updated bb-with-two-tail-calls.s


Full diff: https://github.com/llvm/llvm-project/pull/91906.diff

4 Files Affected:

  • (modified) bolt/docs/BAT.md (+5)
  • (modified) bolt/include/bolt/Profile/BoltAddressTranslation.h (+1)
  • (modified) bolt/lib/Profile/BoltAddressTranslation.cpp (+13)
  • (modified) bolt/test/X86/bb-with-two-tail-calls.s (+9)
diff --git a/bolt/docs/BAT.md b/bolt/docs/BAT.md
index 7ffb5d7c00816..817ad288aa34b 100644
--- a/bolt/docs/BAT.md
+++ b/bolt/docs/BAT.md
@@ -106,9 +106,14 @@ equals output offset.
 `BRANCHENTRY` bit denotes whether a given offset pair is a control flow source
 (branch or call instruction). If not set, it signifies a control flow target
 (basic block offset).
+
 `InputAddr` is omitted for equal offsets in input and output function. In this
 case, `BRANCHENTRY` bits are encoded separately in a `BranchEntries` bitvector.
 
+Deleted basic blocks are emitted as having `OutputOffset` equal to the size of
+the function. They don't affect address translation and only participate in
+input basic block mapping.
+
 ### Secondary Entry Points table
 The table is emitted for hot fragments only. It contains `NumSecEntryPoints`
 offsets denoting secondary entry points, delta encoded, implicitly starting at zero.
diff --git a/bolt/include/bolt/Profile/BoltAddressTranslation.h b/bolt/include/bolt/Profile/BoltAddressTranslation.h
index 68b993ee363cc..16fe0442f10a5 100644
--- a/bolt/include/bolt/Profile/BoltAddressTranslation.h
+++ b/bolt/include/bolt/Profile/BoltAddressTranslation.h
@@ -217,6 +217,7 @@ class BoltAddressTranslation {
     auto begin() const { return Map.begin(); }
     auto end() const { return Map.end(); }
     auto upper_bound(uint32_t Offset) const { return Map.upper_bound(Offset); }
+    auto size() const { return Map.size(); }
   };
 
   /// Map function output address to its hash and basic blocks hash map.
diff --git a/bolt/lib/Profile/BoltAddressTranslation.cpp b/bolt/lib/Profile/BoltAddressTranslation.cpp
index 7cfb9c132c2c6..724b348c7845e 100644
--- a/bolt/lib/Profile/BoltAddressTranslation.cpp
+++ b/bolt/lib/Profile/BoltAddressTranslation.cpp
@@ -108,6 +108,19 @@ void BoltAddressTranslation::write(const BinaryContext &BC, raw_ostream &OS) {
     for (const BinaryBasicBlock *const BB :
          Function.getLayout().getMainFragment())
       writeEntriesForBB(Map, *BB, InputAddress, OutputAddress);
+    // Add entries for deleted blocks. They are still required for correct BB
+    // mapping of branches modified by SCTC. By convention, they would have the
+    // end of the function as output address.
+    const BBHashMapTy &BBHashMap = getBBHashMap(InputAddress);
+    if (BBHashMap.size() != Function.size()) {
+      const uint64_t EndOffset = Function.getOutputSize();
+      std::unordered_set<uint32_t> MappedInputOffsets;
+      for (const BinaryBasicBlock &BB : Function)
+        MappedInputOffsets.emplace(BB.getInputOffset());
+      for (const auto &[InputOffset, _] : BBHashMap)
+        if (!llvm::is_contained(MappedInputOffsets, InputOffset))
+          Map[EndOffset] = InputOffset << 1;
+    }
     Maps.emplace(Function.getOutputAddress(), std::move(Map));
     ReverseMap.emplace(OutputAddress, InputAddress);
 
diff --git a/bolt/test/X86/bb-with-two-tail-calls.s b/bolt/test/X86/bb-with-two-tail-calls.s
index bb2b0cd4cc23a..c0b01e1894a49 100644
--- a/bolt/test/X86/bb-with-two-tail-calls.s
+++ b/bolt/test/X86/bb-with-two-tail-calls.s
@@ -10,11 +10,20 @@
 # RUN: %clang %cflags %t.o -o %t.exe -Wl,-q -nostdlib
 # RUN: llvm-bolt %t.exe -o %t.out --data %t.fdata --lite=0 --dyno-stats \
 # RUN:    --print-sctc --print-only=_start -enable-bat 2>&1 | FileCheck %s
+# RUN: llvm-objdump --syms %t.out | FileCheck %s --check-prefix=CHECK-NM
+# RUN: llvm-bat-dump %t.out --dump-all | FileCheck %s --check-prefix=CHECK-BAT
+
 # CHECK-NOT: Assertion `BranchInfo.size() == 2 && "could only be called for blocks with 2 successors"' failed.
 # Two tail calls in the same basic block after SCTC:
 # CHECK:         {{.*}}:   ja      {{.*}} # TAILCALL # Offset: 7 # CTCTakenCount: 4
 # CHECK-NEXT:    {{.*}}:   jmp     {{.*}} # TAILCALL # Offset: 12
 
+# Confirm that a deleted basic block is emitted at function end offset (0xe)
+# CHECK-NM: 0000000000600000 g       .text  000000000000000e _start
+# CHECK-BAT: Function Address: 0x600000, hash: 0xf8bf620b266cdc1b
+# CHECK-BAT: 0xe -> 0xc hash: 0x823623240f000c
+# CHECK-BAT: NumBlocks: 5
+
   .globl _start
 _start:
     je x

aaupov added 6 commits May 12, 2024 17:32
Created using spr 1.3.4

[skip ci]
Created using spr 1.3.4
Created using spr 1.3.4
Created using spr 1.3.5

[skip ci]
Created using spr 1.3.5
@aaupov aaupov changed the base branch from users/aaupov/spr/main.boltbat-add-entries-for-deleted-basic-blocks to main May 17, 2024 03:04
Copy link
Contributor

@rafaelauler rafaelauler left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you elaborate a bit better on why do we need a deleted block to be present in the table? My memory fails me, aren't we using the translation table just to map samples collected on the bolted binary? Where do the deleted blocks become a problem?

Other than the motivation, the implementation itself looks good to me.

@aaupov
Copy link
Contributor Author

aaupov commented May 24, 2024

Could you elaborate a bit better on why do we need a deleted block to be present in the table? My memory fails me, aren't we using the translation table just to map samples collected on the bolted binary? Where do the deleted blocks become a problem?

Other than the motivation, the implementation itself looks good to me.

We need deleted blocks to correctly map optimized tail calls to containing basic blocks. Because we expand tail calls during CFG construction, and attach the profile to that CFG, we need those intermediate blocks to be present in the profile.

bb-with-two-tail-calls.s test offers a good example:
after building CFG (details omitted):

.LBB00 (1 instructions, align : 1)
    00000000:   je      .Ltmp0 # Offset: 0

.LFT0 (2 instructions, align : 1)
    00000002:   ja      .Ltmp1 # Offset: 2
    00000004:   jmp     .Ltmp2 # Offset: 4

.Ltmp0 (1 instructions, align : 1)
    00000006:   retq # Offset: 6

.Ltmp1 (1 instructions, align : 1)
    00000007:   jmp     e # TAILCALL  # Offset: 7

.Ltmp2 (2 instructions, align : 1)
    0000000c:   nop # Offset: 12 # Size: 1 # NOP: 1
    0000000d:   jmp     f # TAILCALL  # Offset: 13

after SCTC, we normally write out BAT from that state:

.LBB00 (1 instructions, align : 1)
    00000000:   je      .Ltmp0 # Offset: 0

.LFT0 (2 instructions, align : 1)
    00000002:   ja      e # TAILCALL  # Offset: 7 # CTCTakenCount: 4
    00000004:   jmp     f # TAILCALL  # Offset: 13

.Ltmp0 (1 instructions, align : 1)
    00000009:   retq # Offset: 6

So in order to map this branch

    00000004:   jmp     f # TAILCALL  # Offset: 13

to its block Ltmp2 we need to preserve the block entry in BAT, although it's removed from CFG.

Created using spr 1.3.4
@aaupov aaupov merged commit d1d9545 into main May 24, 2024
6 checks passed
@aaupov aaupov deleted the users/aaupov/spr/boltbat-add-entries-for-deleted-basic-blocks branch May 24, 2024 02:19
@rafaelauler
Copy link
Contributor

Thanks for the detailed explanation.

So essentially the output offset is not important because these deleted blocks are only useful for their input offset, which will be used in BoltAddressTranslation::getFallthroughsInTrace() to create traffic in this to-be deleted block, is that right?

@aaupov
Copy link
Contributor Author

aaupov commented May 24, 2024

The entries for deleted basic blocks won't participate in BAT::getFallthroughsInTrace because trace boundary is looked up by output offsets:

auto FromIter = Map.upper_bound(From);
if (FromIter == Map.begin())
return Res;
// Skip instruction entries, to create fallthroughs we are only interested in
// BB boundaries
do {
if (FromIter == Map.begin())
return Res;
--FromIter;
} while (FromIter->second & BRANCHENTRY);
auto ToIter = Map.upper_bound(To);

But deleted blocks will be used in writeBATYAML where we lookup containing blocks by input offsets:

// Lookup containing basic block offset and index
auto getBlock = [&BlockMap](uint32_t Offset) {
auto BlockIt = BlockMap.upper_bound(Offset);
if (LLVM_UNLIKELY(BlockIt == BlockMap.begin())) {
errs() << "BOLT-ERROR: invalid BAT section\n";
exit(1);
}
--BlockIt;
return std::pair(BlockIt->first, BlockIt->second.Index);
};

@rafaelauler
Copy link
Contributor

Oh I see, thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants