Skip to content

Commit ff5e2ba

Browse files
authored
[BOLT] Improve handling of relocations targeting specific instructions (#66395)
On RISC-V, there are certain relocations that target a specific instruction instead of a more abstract location like a function or basic block. Take the following example that loads a value from symbol `foo`: ``` nop 1: auipc t0, %pcrel_hi(foo) ld t0, %pcrel_lo(1b)(t0) ``` This results in two relocation: - auipc: `R_RISCV_PCREL_HI20` referencing `foo`; - ld: `R_RISCV_PCREL_LO12_I` referencing to local label `1` which points to the auipc instruction. It is of utmost importance that the `R_RISCV_PCREL_LO12_I` keeps referring to the auipc instruction; if not, the program will fail to assemble. However, BOLT currently does not guarantee this. BOLT currently assumes that all local symbols are jump targets and always starts a new basic block at symbol locations. The example above results in a CFG the looks like this: ``` .BB0: nop .BB1: auipc t0, %pcrel_hi(foo) ld t0, %pcrel_lo(.BB1)(t0) ``` While this currently works (i.e., the `R_RISCV_PCREL_LO12_I` relocation points to the correct instruction), it has two downsides: - Too many basic blocks are created (the example above is logically only one yet two are created); - If instructions are inserted in `.BB1` (e.g., by instrumentation), things will break since the label will not point to the auipc anymore. This patch proposes to fix this issue by teaching BOLT to track labels that should always point to a specific instruction. This is implemented as follows: - Add a new annotation type (`kLabel`) that allows us to annotate instructions with an `MCSymbol *`; - Whenever we encounter a relocation type that is used to refer to a specific instruction (`Relocation::isInstructionReference`), we register it without a symbol; - During disassembly, whenever we encounter an instruction with such a relocation, create a symbol for its target and store it in an offset to symbol map (to ensure multiple relocations referencing the same instruction use the same label); - After disassembly, iterate this map to attach labels to instructions via the new annotation type; - During emission, emit these labels right before the instruction. I believe the use of annotations works quite well for this use case as it allows us to reliably track instruction labels. If we were to store them as offsets in basic blocks, it would be error prone to keep them updated whenever instructions are inserted or removed. I have chosen to add labels as first-class annotations (as opposed to a generic one) because the documentation of `MCAnnotation` suggests that generic annotations are to be used for optional metadata that can be discarded without affecting correctness. As this is not the case for labels, a first-class annotation seemed more appropriate.
1 parent aad7e0a commit ff5e2ba

14 files changed

+133
-11
lines changed

bolt/include/bolt/Core/MCPlus.h

+1
Original file line numberDiff line numberDiff line change
@@ -66,6 +66,7 @@ class MCAnnotation {
6666
kTailCall, /// Tail call.
6767
kConditionalTailCall, /// CTC.
6868
kOffset, /// Offset in the function.
69+
kLabel, /// MCSymbol pointing to this instruction.
6970
kGeneric /// First generic annotation.
7071
};
7172

bolt/include/bolt/Core/MCPlusBuilder.h

+7
Original file line numberDiff line numberDiff line change
@@ -1180,6 +1180,13 @@ class MCPlusBuilder {
11801180
/// Remove offset annotation.
11811181
bool clearOffset(MCInst &Inst);
11821182

1183+
/// Return the label of \p Inst, if available.
1184+
std::optional<MCSymbol *> getLabel(const MCInst &Inst) const;
1185+
1186+
/// Set the label of \p Inst. This label will be emitted right before \p Inst
1187+
/// is emitted to MCStreamer.
1188+
bool setLabel(MCInst &Inst, MCSymbol *Label);
1189+
11831190
/// Return MCSymbol that represents a target of this instruction at a given
11841191
/// operand number \p OpNum. If there's no symbol associated with
11851192
/// the operand - return nullptr.

bolt/include/bolt/Core/Relocation.h

+4
Original file line numberDiff line numberDiff line change
@@ -98,6 +98,10 @@ struct Relocation {
9898
/// Return true if relocation type is for thread local storage.
9999
static bool isTLS(uint64_t Type);
100100

101+
/// Return true of relocation type is for referencing a specific instruction
102+
/// (as opposed to a function, basic block, etc).
103+
static bool isInstructionReference(uint64_t Type);
104+
101105
/// Return code for a NONE relocation
102106
static uint64_t getNone();
103107

bolt/lib/Core/BinaryContext.cpp

+2
Original file line numberDiff line numberDiff line change
@@ -1888,6 +1888,8 @@ void BinaryContext::printInstruction(raw_ostream &OS, const MCInst &Instruction,
18881888
}
18891889
if (std::optional<uint32_t> Offset = MIB->getOffset(Instruction))
18901890
OS << " # Offset: " << *Offset;
1891+
if (auto Label = MIB->getLabel(Instruction))
1892+
OS << " # Label: " << **Label;
18911893

18921894
MIB->printAnnotations(Instruction, OS);
18931895

bolt/lib/Core/BinaryEmitter.cpp

+3
Original file line numberDiff line numberDiff line change
@@ -498,6 +498,9 @@ void BinaryEmitter::emitFunctionBody(BinaryFunction &BF, FunctionFragment &FF,
498498
BB->getLocSyms().emplace_back(Offset, LocSym);
499499
}
500500

501+
if (auto Label = BC.MIB->getLabel(Instr))
502+
Streamer.emitLabel(*Label);
503+
501504
Streamer.emitInstruction(Instr, *BC.STI);
502505
LastIsPrefix = BC.MIB->isPrefix(Instr);
503506
}

bolt/lib/Core/BinaryFunction.cpp

+30-2
Original file line numberDiff line numberDiff line change
@@ -1173,6 +1173,13 @@ bool BinaryFunction::disassemble() {
11731173
// basic block.
11741174
Labels[0] = Ctx->createNamedTempSymbol("BB0");
11751175

1176+
// Map offsets in the function to a label that should always point to the
1177+
// corresponding instruction. This is used for labels that shouldn't point to
1178+
// the start of a basic block but always to a specific instruction. This is
1179+
// used, for example, on RISC-V where %pcrel_lo relocations point to the
1180+
// corresponding %pcrel_hi.
1181+
LabelsMapType InstructionLabels;
1182+
11761183
uint64_t Size = 0; // instruction size
11771184
for (uint64_t Offset = 0; Offset < getSize(); Offset += Size) {
11781185
MCInst Instruction;
@@ -1329,9 +1336,23 @@ bool BinaryFunction::disassemble() {
13291336
ItrE = Relocations.lower_bound(Offset + Size);
13301337
Itr != ItrE; ++Itr) {
13311338
const Relocation &Relocation = Itr->second;
1339+
MCSymbol *Symbol = Relocation.Symbol;
1340+
1341+
if (Relocation::isInstructionReference(Relocation.Type)) {
1342+
uint64_t RefOffset = Relocation.Value - getAddress();
1343+
LabelsMapType::iterator LI = InstructionLabels.find(RefOffset);
1344+
1345+
if (LI == InstructionLabels.end()) {
1346+
Symbol = BC.Ctx->createNamedTempSymbol();
1347+
InstructionLabels.emplace(RefOffset, Symbol);
1348+
} else {
1349+
Symbol = LI->second;
1350+
}
1351+
}
1352+
13321353
int64_t Value = Relocation.Value;
13331354
const bool Result = BC.MIB->replaceImmWithSymbolRef(
1334-
Instruction, Relocation.Symbol, Relocation.Addend, Ctx.get(), Value,
1355+
Instruction, Symbol, Relocation.Addend, Ctx.get(), Value,
13351356
Relocation.Type);
13361357
(void)Result;
13371358
assert(Result && "cannot replace immediate with relocation");
@@ -1366,6 +1387,13 @@ bool BinaryFunction::disassemble() {
13661387
addInstruction(Offset, std::move(Instruction));
13671388
}
13681389

1390+
for (auto [Offset, Label] : InstructionLabels) {
1391+
InstrMapType::iterator II = Instructions.find(Offset);
1392+
assert(II != Instructions.end() && "reference to non-existing instruction");
1393+
1394+
BC.MIB->setLabel(II->second, Label);
1395+
}
1396+
13691397
// Reset symbolizer for the disassembler.
13701398
BC.SymbolicDisAsm->setSymbolizer(nullptr);
13711399

@@ -4489,7 +4517,7 @@ void BinaryFunction::addRelocation(uint64_t Address, MCSymbol *Symbol,
44894517
uint64_t Offset = Address - getAddress();
44904518
LLVM_DEBUG(dbgs() << "BOLT-DEBUG: addRelocation in "
44914519
<< formatv("{0}@{1:x} against {2}\n", *this, Offset,
4492-
Symbol->getName()));
4520+
(Symbol ? Symbol->getName() : "<undef>")));
44934521
bool IsCI = BC.isAArch64() && isInConstantIsland(Address);
44944522
std::map<uint64_t, Relocation> &Rels =
44954523
IsCI ? Islands->Relocations : Relocations;

bolt/lib/Core/MCPlusBuilder.cpp

+11
Original file line numberDiff line numberDiff line change
@@ -268,6 +268,17 @@ bool MCPlusBuilder::clearOffset(MCInst &Inst) {
268268
return true;
269269
}
270270

271+
std::optional<MCSymbol *> MCPlusBuilder::getLabel(const MCInst &Inst) const {
272+
if (auto Label = tryGetAnnotationAs<MCSymbol *>(Inst, MCAnnotation::kLabel))
273+
return *Label;
274+
return std::nullopt;
275+
}
276+
277+
bool MCPlusBuilder::setLabel(MCInst &Inst, MCSymbol *Label) {
278+
getOrCreateAnnotationAs<MCSymbol *>(Inst, MCAnnotation::kLabel) = Label;
279+
return true;
280+
}
281+
271282
bool MCPlusBuilder::hasAnnotation(const MCInst &Inst, unsigned Index) const {
272283
const MCInst *AnnotationInst = getAnnotationInst(Inst);
273284
if (!AnnotationInst)

bolt/lib/Core/Relocation.cpp

+13
Original file line numberDiff line numberDiff line change
@@ -858,6 +858,19 @@ bool Relocation::isTLS(uint64_t Type) {
858858
return isTLSX86(Type);
859859
}
860860

861+
bool Relocation::isInstructionReference(uint64_t Type) {
862+
if (Arch != Triple::riscv64)
863+
return false;
864+
865+
switch (Type) {
866+
default:
867+
return false;
868+
case ELF::R_RISCV_PCREL_LO12_I:
869+
case ELF::R_RISCV_PCREL_LO12_S:
870+
return true;
871+
}
872+
}
873+
861874
uint64_t Relocation::getNone() {
862875
if (Arch == Triple::aarch64)
863876
return ELF::R_AARCH64_NONE;

bolt/lib/Passes/BinaryPasses.cpp

+5
Original file line numberDiff line numberDiff line change
@@ -575,6 +575,7 @@ bool CheckLargeFunctions::shouldOptimize(const BinaryFunction &BF) const {
575575

576576
void LowerAnnotations::runOnFunctions(BinaryContext &BC) {
577577
std::vector<std::pair<MCInst *, uint32_t>> PreservedOffsetAnnotations;
578+
std::vector<std::pair<MCInst *, MCSymbol *>> PreservedLabelAnnotations;
578579

579580
for (auto &It : BC.getBinaryFunctions()) {
580581
BinaryFunction &BF = It.second;
@@ -609,6 +610,8 @@ void LowerAnnotations::runOnFunctions(BinaryContext &BC) {
609610
if (BF.requiresAddressTranslation() && BC.MIB->getOffset(*II))
610611
PreservedOffsetAnnotations.emplace_back(&(*II),
611612
*BC.MIB->getOffset(*II));
613+
if (auto Label = BC.MIB->getLabel(*II))
614+
PreservedLabelAnnotations.emplace_back(&*II, *Label);
612615
BC.MIB->stripAnnotations(*II);
613616
}
614617
}
@@ -625,6 +628,8 @@ void LowerAnnotations::runOnFunctions(BinaryContext &BC) {
625628
// Reinsert preserved annotations we need during code emission.
626629
for (const std::pair<MCInst *, uint32_t> &Item : PreservedOffsetAnnotations)
627630
BC.MIB->setOffset(*Item.first, Item.second);
631+
for (auto [Instr, Label] : PreservedLabelAnnotations)
632+
BC.MIB->setLabel(*Instr, Label);
628633
}
629634

630635
// Check for dirty state in MCSymbol objects that might be a consequence

bolt/lib/Rewrite/RewriteInstance.cpp

+11-1
Original file line numberDiff line numberDiff line change
@@ -2546,7 +2546,17 @@ void RewriteInstance::handleRelocation(const SectionRef &RelocatedSection,
25462546
// Adjust the point of reference to a code location inside a function.
25472547
if (ReferencedBF->containsAddress(Address, /*UseMaxSize = */ true)) {
25482548
RefFunctionOffset = Address - ReferencedBF->getAddress();
2549-
if (RefFunctionOffset) {
2549+
if (Relocation::isInstructionReference(RType)) {
2550+
// Instruction labels are created while disassembling so we just leave
2551+
// the symbol empty for now. Since the extracted value is typically
2552+
// unrelated to the referenced symbol (e.g., %pcrel_lo in RISC-V
2553+
// references an instruction but the patched value references the low
2554+
// bits of a data address), we set the extracted value to the symbol
2555+
// address in order to be able to correctly reconstruct the reference
2556+
// later.
2557+
ReferencedSymbol = nullptr;
2558+
ExtractedValue = Address;
2559+
} else if (RefFunctionOffset) {
25502560
if (ContainingBF && ContainingBF != ReferencedBF) {
25512561
ReferencedSymbol =
25522562
ReferencedBF->addEntryPointAtOffset(RefFunctionOffset);

bolt/test/RISCV/reloc-abs.s

+1-2
Original file line numberDiff line numberDiff line change
@@ -17,8 +17,7 @@ _start:
1717
.option push
1818
.option norelax
1919
1:
20-
// CHECK: .Ltmp0
21-
// CHECK: auipc gp, %pcrel_hi(__global_pointer$)
20+
// CHECK: auipc gp, %pcrel_hi(__global_pointer$) # Label: .Ltmp0
2221
// CHECK-NEXT: addi gp, gp, %pcrel_lo(.Ltmp0)
2322
auipc gp, %pcrel_hi(__global_pointer$)
2423
addi gp, gp, %pcrel_lo(1b)

bolt/test/RISCV/reloc-bb-split.s

+42
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
1+
// RUN: %clang %cflags -o %t %s
2+
// RUN: llvm-bolt --print-cfg --print-only=_start -o /dev/null %t \
3+
// RUN: | FileCheck %s
4+
5+
.data
6+
.globl d
7+
.p2align 3
8+
d:
9+
.dword 0
10+
11+
.text
12+
.globl _start
13+
.p2align 1
14+
// CHECK-LABEL: Binary Function "_start" after building cfg {
15+
_start:
16+
/// The local label is used for %pcrel_lo as well as a jump target so a new
17+
/// basic block should start there.
18+
// CHECK-LABEL: {{^}}.LBB00
19+
// CHECK: nop
20+
// CHECK-LABEL: {{^}}.Ltmp0
21+
// CHECK: auipc t0, %pcrel_hi(d) # Label: .Ltmp1
22+
// CHECK-NEXT: ld t0, %pcrel_lo(.Ltmp1)(t0)
23+
// CHECK-NEXT: j .Ltmp0
24+
nop
25+
1:
26+
auipc t0, %pcrel_hi(d)
27+
ld t0, %pcrel_lo(1b)(t0)
28+
j 1b
29+
30+
/// The local label is used only for %pcrel_lo so no new basic block should
31+
/// start there.
32+
// CHECK-LABEL: {{^}}.LFT0
33+
// CHECK: nop
34+
// CHECK-NEXT: auipc t0, %pcrel_hi(d) # Label: .Ltmp2
35+
// CHECK-NEXT: ld t0, %pcrel_lo(.Ltmp2)(t0)
36+
// CHECK-NEXT: ret
37+
nop
38+
1:
39+
auipc t0, %pcrel_hi(d)
40+
ld t0, %pcrel_lo(1b)(t0)
41+
ret
42+
.size _start, .-_start

bolt/test/RISCV/reloc-got.s

+1-2
Original file line numberDiff line numberDiff line change
@@ -14,8 +14,7 @@ d:
1414
// CHECK: Binary Function "_start" after building cfg {
1515
_start:
1616
nop // Here to not make the _start and .Ltmp0 symbols coincide
17-
// CHECK: .Ltmp0
18-
// CHECK: auipc t0, %pcrel_hi(__BOLT_got_zero+{{[0-9]+}})
17+
// CHECK: auipc t0, %pcrel_hi(__BOLT_got_zero+{{[0-9]+}}) # Label: .Ltmp0
1918
// CHECK-NEXT: ld t0, %pcrel_lo(.Ltmp0)(t0)
2019
1:
2120
auipc t0, %got_pcrel_hi(d)

bolt/test/RISCV/reloc-pcrel.s

+2-4
Original file line numberDiff line numberDiff line change
@@ -14,12 +14,10 @@ d:
1414
// CHECK: Binary Function "_start" after building cfg {
1515
_start:
1616
nop // Here to not make the _start and .Ltmp0 symbols coincide
17-
// CHECK: .Ltmp0
18-
// CHECK: auipc t0, %pcrel_hi(d)
17+
// CHECK: auipc t0, %pcrel_hi(d) # Label: .Ltmp0
1918
// CHECK-NEXT: ld t0, %pcrel_lo(.Ltmp0)(t0)
2019
ld t0, d
21-
// CHECK: .Ltmp1
22-
// CHECK: auipc t1, %pcrel_hi(d)
20+
// CHECK-NEXT: auipc t1, %pcrel_hi(d) # Label: .Ltmp1
2321
// CHECK-NEXT: sd t0, %pcrel_lo(.Ltmp1)(t1)
2422
sd t0, d, t1
2523
ret

0 commit comments

Comments
 (0)