Skip to content

Commit 8233eec

Browse files
committed
[BOLT] Improve handling of relocations targeting specific instructions
On RISC-V, there are certain relocations that target a specific instruction instead of a more abstract location like a function or basic block. Take the following example that loads a value from symbol `foo`: ``` nop 1: auipc t0, %pcrel_hi(foo) ld t0, %pcrel_lo(1b)(t0) ``` This results in two relocation: - auipc: `R_RISCV_PCREL_HI20` referencing `foo`; - ld: `R_RISCV_PCREL_LO12_I` referencing to local label `1` which points to the auipc instruction. It is of utmost importance that the `R_RISCV_PCREL_LO12_I` keeps referring to the auipc instruction; if not, the program will fail to assemble. However, BOLT currently does not guarantee this. BOLT currently assumes that all local symbols are jump targets and always starts a new basic block at symbol locations. The example above results in a CFG the looks like this: ``` .BB0: nop .BB1: auipc t0, %pcrel_hi(foo) ld t0, %pcrel_lo(.BB1)(t0) ``` While this currently works (i.e., the `R_RISCV_PCREL_LO12_I` relocation points to the correct instruction), it has two downsides: - Too many basic blocks are created (the example above is logically only one yet two are created); - If instructions are inserted in `.BB1` (e.g., by instrumentation), things will break since the label will not point to the auipc anymore. This patch proposes to fix this issue by teaching BOLT to track labels that should always point to a specific instruction. This is implemented as follows: - Add a new annotation type (`kLabel`) that allows us to annotate instructions with an `MCSymbol *`; - Whenever we encounter a relocation type that is used to refer to a specific instruction (`Relocation::isInstructionReference`), we register a new type of label (`InstructionLabels`) with the corresponding `BinaryFunction`; - During disassembly, add these instruction labels to the correct instructions; - During emission, emit these labels right before the instruction. I believe the use of annotations works quite well for this use case as it allows us to reliably track instruction labels. If we were to store them as offsets in basic blocks, it would be error prone to keep them updated whenever instructions are inserted or removed. I have chosen to add labels as first-class annotations (as opposed to a generic one) because the documentation of `MCAnnotation` suggests that generic annotations are to be used for optional metadata that can be discarded without affecting correctness. As this is not the case for labels, a first-class annotation seemed more appropriate.
1 parent f8f934e commit 8233eec

15 files changed

+126
-9
lines changed

bolt/include/bolt/Core/BinaryFunction.h

+13
Original file line numberDiff line numberDiff line change
@@ -468,6 +468,13 @@ class BinaryFunction {
468468
using LabelsMapType = std::map<uint32_t, MCSymbol *>;
469469
LabelsMapType Labels;
470470

471+
/// Map offset in the function to a label that should always point to the
472+
/// corresponding instruction. This is used for labels that shouldn't point to
473+
/// the start of a basic block but always to a specific instruction. This is
474+
/// used, for example, on RISC-V where %pcrel_lo relocations point to the
475+
/// corresponding %pcrel_hi.
476+
LabelsMapType InstructionLabels;
477+
471478
/// Temporary holder of instructions before CFG is constructed.
472479
/// Map offset in the function to MCInst.
473480
using InstrMapType = std::map<uint32_t, MCInst>;
@@ -591,6 +598,11 @@ class BinaryFunction {
591598
/// a global symbol that corresponds to an entry at this address.
592599
MCSymbol *getOrCreateLocalLabel(uint64_t Address, bool CreatePastEnd = false);
593600

601+
/// Return a label for the instruction at a given \p Address in the function.
602+
/// This label will not be used to delineate basic blocks in the CFG but will
603+
/// be attached to the corresponding instruction during disassembly.
604+
MCSymbol *getOrCreateInstructionLabel(uint64_t Address);
605+
594606
/// Register an data entry at a given \p Offset into the function.
595607
void markDataAtOffset(uint64_t Offset) {
596608
if (!Islands)
@@ -722,6 +734,7 @@ class BinaryFunction {
722734
clearList(LSDATypeAddressTable);
723735

724736
clearList(LabelToBB);
737+
clearList(InstructionLabels);
725738

726739
if (!isMultiEntry())
727740
clearList(Labels);

bolt/include/bolt/Core/MCPlus.h

+1
Original file line numberDiff line numberDiff line change
@@ -66,6 +66,7 @@ class MCAnnotation {
6666
kTailCall, /// Tail call.
6767
kConditionalTailCall, /// CTC.
6868
kOffset, /// Offset in the function.
69+
kLabel, /// MCSymbol pointing to this instruction.
6970
kGeneric /// First generic annotation.
7071
};
7172

bolt/include/bolt/Core/MCPlusBuilder.h

+7
Original file line numberDiff line numberDiff line change
@@ -1169,6 +1169,13 @@ class MCPlusBuilder {
11691169
/// Remove offset annotation.
11701170
bool clearOffset(MCInst &Inst);
11711171

1172+
/// Return the label of \p Inst, if available.
1173+
std::optional<MCSymbol *> getLabel(const MCInst &Inst) const;
1174+
1175+
/// Set the label of \p Inst. This label will be emitted right before \p Inst
1176+
/// is emitted to MCStreamer.
1177+
bool setLabel(MCInst &Inst, MCSymbol *Label);
1178+
11721179
/// Return MCSymbol that represents a target of this instruction at a given
11731180
/// operand number \p OpNum. If there's no symbol associated with
11741181
/// the operand - return nullptr.

bolt/include/bolt/Core/Relocation.h

+4
Original file line numberDiff line numberDiff line change
@@ -97,6 +97,10 @@ struct Relocation {
9797
/// Return true if relocation type is for thread local storage.
9898
static bool isTLS(uint64_t Type);
9999

100+
/// Return true of relocation type is for referencing a specific instruction
101+
/// (as opposed to a function, basic block, etc).
102+
static bool isInstructionReference(uint64_t Type);
103+
100104
/// Return code for a NONE relocation
101105
static uint64_t getNone();
102106

bolt/lib/Core/BinaryContext.cpp

+2
Original file line numberDiff line numberDiff line change
@@ -1863,6 +1863,8 @@ void BinaryContext::printInstruction(raw_ostream &OS, const MCInst &Instruction,
18631863
}
18641864
if (std::optional<uint32_t> Offset = MIB->getOffset(Instruction))
18651865
OS << " # Offset: " << *Offset;
1866+
if (auto Label = MIB->getLabel(Instruction))
1867+
OS << " # Label: " << **Label;
18661868

18671869
MIB->printAnnotations(Instruction, OS);
18681870

bolt/lib/Core/BinaryEmitter.cpp

+3
Original file line numberDiff line numberDiff line change
@@ -498,6 +498,9 @@ void BinaryEmitter::emitFunctionBody(BinaryFunction &BF, FunctionFragment &FF,
498498
BB->getLocSyms().emplace_back(Offset, LocSym);
499499
}
500500

501+
if (auto Label = BC.MIB->getLabel(Instr))
502+
Streamer.emitLabel(*Label);
503+
501504
Streamer.emitInstruction(Instr, *BC.STI);
502505
LastIsPrefix = BC.MIB->isPrefix(Instr);
503506
}

bolt/lib/Core/BinaryFunction.cpp

+18
Original file line numberDiff line numberDiff line change
@@ -965,6 +965,20 @@ MCSymbol *BinaryFunction::getOrCreateLocalLabel(uint64_t Address,
965965
return Label;
966966
}
967967

968+
MCSymbol *BinaryFunction::getOrCreateInstructionLabel(uint64_t Address) {
969+
const uint64_t Offset = Address - getAddress();
970+
assert(Offset < getSize() && "Instruction label past function end");
971+
972+
auto LI = InstructionLabels.find(Offset);
973+
if (LI != InstructionLabels.end())
974+
return LI->second;
975+
976+
MCSymbol *Label = BC.Ctx->createNamedTempSymbol();
977+
InstructionLabels[Offset] = Label;
978+
979+
return Label;
980+
}
981+
968982
ErrorOr<ArrayRef<uint8_t>> BinaryFunction::getData() const {
969983
BinarySection &Section = *getOriginSection();
970984
assert(Section.containsRange(getAddress(), getMaxSize()) &&
@@ -1363,6 +1377,10 @@ bool BinaryFunction::disassemble() {
13631377
MIB->addAnnotation(Instruction, "Size", static_cast<uint32_t>(Size));
13641378
}
13651379

1380+
auto InstructionLabel = InstructionLabels.find(Offset);
1381+
if (InstructionLabel != InstructionLabels.end())
1382+
BC.MIB->setLabel(Instruction, InstructionLabel->second);
1383+
13661384
addInstruction(Offset, std::move(Instruction));
13671385
}
13681386

bolt/lib/Core/MCPlusBuilder.cpp

+11
Original file line numberDiff line numberDiff line change
@@ -268,6 +268,17 @@ bool MCPlusBuilder::clearOffset(MCInst &Inst) {
268268
return true;
269269
}
270270

271+
std::optional<MCSymbol *> MCPlusBuilder::getLabel(const MCInst &Inst) const {
272+
if (auto Label = tryGetAnnotationAs<MCSymbol *>(Inst, MCAnnotation::kLabel))
273+
return *Label;
274+
return std::nullopt;
275+
}
276+
277+
bool MCPlusBuilder::setLabel(MCInst &Inst, MCSymbol *Label) {
278+
getOrCreateAnnotationAs<MCSymbol *>(Inst, MCAnnotation::kLabel) = Label;
279+
return true;
280+
}
281+
271282
bool MCPlusBuilder::hasAnnotation(const MCInst &Inst, unsigned Index) const {
272283
const MCInst *AnnotationInst = getAnnotationInst(Inst);
273284
if (!AnnotationInst)

bolt/lib/Core/Relocation.cpp

+13
Original file line numberDiff line numberDiff line change
@@ -797,6 +797,19 @@ bool Relocation::isTLS(uint64_t Type) {
797797
return isTLSX86(Type);
798798
}
799799

800+
bool Relocation::isInstructionReference(uint64_t Type) {
801+
if (Arch != Triple::riscv64)
802+
return false;
803+
804+
switch (Type) {
805+
default:
806+
return false;
807+
case ELF::R_RISCV_PCREL_LO12_I:
808+
case ELF::R_RISCV_PCREL_LO12_S:
809+
return true;
810+
}
811+
}
812+
800813
uint64_t Relocation::getNone() {
801814
if (Arch == Triple::aarch64)
802815
return ELF::R_AARCH64_NONE;

bolt/lib/Passes/BinaryPasses.cpp

+5
Original file line numberDiff line numberDiff line change
@@ -575,6 +575,7 @@ bool CheckLargeFunctions::shouldOptimize(const BinaryFunction &BF) const {
575575

576576
void LowerAnnotations::runOnFunctions(BinaryContext &BC) {
577577
std::vector<std::pair<MCInst *, uint32_t>> PreservedOffsetAnnotations;
578+
std::vector<std::pair<MCInst *, MCSymbol *>> PreservedLabelAnnotations;
578579

579580
for (auto &It : BC.getBinaryFunctions()) {
580581
BinaryFunction &BF = It.second;
@@ -609,6 +610,8 @@ void LowerAnnotations::runOnFunctions(BinaryContext &BC) {
609610
if (BF.requiresAddressTranslation() && BC.MIB->getOffset(*II))
610611
PreservedOffsetAnnotations.emplace_back(&(*II),
611612
*BC.MIB->getOffset(*II));
613+
if (auto Label = BC.MIB->getLabel(*II))
614+
PreservedLabelAnnotations.emplace_back(&*II, *Label);
612615
BC.MIB->stripAnnotations(*II);
613616
}
614617
}
@@ -625,6 +628,8 @@ void LowerAnnotations::runOnFunctions(BinaryContext &BC) {
625628
// Reinsert preserved annotations we need during code emission.
626629
for (const std::pair<MCInst *, uint32_t> &Item : PreservedOffsetAnnotations)
627630
BC.MIB->setOffset(*Item.first, Item.second);
631+
for (auto [Instr, Label] : PreservedLabelAnnotations)
632+
BC.MIB->setLabel(*Instr, Label);
628633
}
629634

630635
// Check for dirty state in MCSymbol objects that might be a consequence

bolt/lib/Rewrite/RewriteInstance.cpp

+3-1
Original file line numberDiff line numberDiff line change
@@ -2545,7 +2545,9 @@ void RewriteInstance::handleRelocation(const SectionRef &RelocatedSection,
25452545
// Adjust the point of reference to a code location inside a function.
25462546
if (ReferencedBF->containsAddress(Address, /*UseMaxSize = */ true)) {
25472547
RefFunctionOffset = Address - ReferencedBF->getAddress();
2548-
if (RefFunctionOffset) {
2548+
if (Relocation::isInstructionReference(RType)) {
2549+
ReferencedSymbol = ReferencedBF->getOrCreateInstructionLabel(Address);
2550+
} else if (RefFunctionOffset) {
25492551
if (ContainingBF && ContainingBF != ReferencedBF) {
25502552
ReferencedSymbol =
25512553
ReferencedBF->addEntryPointAtOffset(RefFunctionOffset);

bolt/test/RISCV/reloc-abs.s

+1-2
Original file line numberDiff line numberDiff line change
@@ -17,8 +17,7 @@ _start:
1717
.option push
1818
.option norelax
1919
1:
20-
// CHECK: .Ltmp0
21-
// CHECK: auipc gp, %pcrel_hi(__global_pointer$)
20+
// CHECK: auipc gp, %pcrel_hi(__global_pointer$) # Label: .Ltmp0
2221
// CHECK-NEXT: addi gp, gp, %pcrel_lo(.Ltmp0)
2322
auipc gp, %pcrel_hi(__global_pointer$)
2423
addi gp, gp, %pcrel_lo(1b)

bolt/test/RISCV/reloc-bb-split.s

+42
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,42 @@
1+
// RUN: %clang %cflags -o %t %s
2+
// RUN: llvm-bolt --print-cfg --print-only=_start -o /dev/null %t \
3+
// RUN: | FileCheck %s
4+
5+
.data
6+
.globl d
7+
.p2align 3
8+
d:
9+
.dword 0
10+
11+
.text
12+
.globl _start
13+
.p2align 1
14+
// CHECK-LABEL: Binary Function "_start" after building cfg {
15+
_start:
16+
/// The local label is used for %pcrel_lo as well as a jump target so a new
17+
/// basic block should start there.
18+
// CHECK-LABEL: {{^}}.LBB00
19+
// CHECK: nop
20+
// CHECK-LABEL: {{^}}.Ltmp1
21+
// CHECK: auipc t0, %pcrel_hi(d) # Label: .Ltmp0
22+
// CHECK-NEXT: ld t0, %pcrel_lo(.Ltmp0)(t0)
23+
// CHECK-NEXT: j .Ltmp1
24+
nop
25+
1:
26+
auipc t0, %pcrel_hi(d)
27+
ld t0, %pcrel_lo(1b)(t0)
28+
j 1b
29+
30+
/// The local label is used only for %pcrel_lo so no new basic block should
31+
/// start there.
32+
// CHECK-LABEL: {{^}}.LFT0
33+
// CHECK: nop
34+
// CHECK-NEXT: auipc t0, %pcrel_hi(d) # Label: .Ltmp2
35+
// CHECK-NEXT: ld t0, %pcrel_lo(.Ltmp2)(t0)
36+
// CHECK-NEXT: ret
37+
nop
38+
1:
39+
auipc t0, %pcrel_hi(d)
40+
ld t0, %pcrel_lo(1b)(t0)
41+
ret
42+
.size _start, .-_start

bolt/test/RISCV/reloc-got.s

+1-2
Original file line numberDiff line numberDiff line change
@@ -14,8 +14,7 @@ d:
1414
// CHECK: Binary Function "_start" after building cfg {
1515
_start:
1616
nop // Here to not make the _start and .Ltmp0 symbols coincide
17-
// CHECK: .Ltmp0
18-
// CHECK: auipc t0, %pcrel_hi(__BOLT_got_zero+{{[0-9]+}})
17+
// CHECK: auipc t0, %pcrel_hi(__BOLT_got_zero+{{[0-9]+}}) # Label: .Ltmp0
1918
// CHECK-NEXT: ld t0, %pcrel_lo(.Ltmp0)(t0)
2019
1:
2120
auipc t0, %got_pcrel_hi(d)

bolt/test/RISCV/reloc-pcrel.s

+2-4
Original file line numberDiff line numberDiff line change
@@ -14,12 +14,10 @@ d:
1414
// CHECK: Binary Function "_start" after building cfg {
1515
_start:
1616
nop // Here to not make the _start and .Ltmp0 symbols coincide
17-
// CHECK: .Ltmp0
18-
// CHECK: auipc t0, %pcrel_hi(d)
17+
// CHECK: auipc t0, %pcrel_hi(d) # Label: .Ltmp0
1918
// CHECK-NEXT: ld t0, %pcrel_lo(.Ltmp0)(t0)
2019
ld t0, d
21-
// CHECK: .Ltmp1
22-
// CHECK: auipc t1, %pcrel_hi(d)
20+
// CHECK-NEXT: auipc t1, %pcrel_hi(d) # Label: .Ltmp1
2321
// CHECK-NEXT: sd t0, %pcrel_lo(.Ltmp1)(t1)
2422
sd t0, d, t1
2523
ret

0 commit comments

Comments
 (0)