Skip to content

Conversation

aaupov
Copy link
Contributor

@aaupov aaupov commented Jul 25, 2025

While registering profiled functions, only handle each address once.
Speeds up DataAggregator::preprocessProfile.

Test Plan:
For intermediate size pre-aggregated profile (10MB), reduces parsing
time from ~0.41s down to ~0.16s.

aaupov added 2 commits July 25, 2025 07:24
Created using spr 1.3.4
@llvmbot
Copy link
Member

llvmbot commented Jul 25, 2025

@llvm/pr-subscribers-bolt

Author: Amir Ayupov (aaupov)

Changes

While registering profiled functions, only handle each address once.
Speeds up DataAggregator::preprocessProfile.

Test Plan:
For intermediate size pre-aggregated profile (10MB), reduces parsing
time from ~0.41s down to ~0.16s.


Full diff: https://github.com/llvm/llvm-project/pull/150622.diff

2 Files Affected:

  • (modified) bolt/include/bolt/Profile/DataAggregator.h (+3)
  • (modified) bolt/lib/Profile/DataAggregator.cpp (+21-18)
diff --git a/bolt/include/bolt/Profile/DataAggregator.h b/bolt/include/bolt/Profile/DataAggregator.h
index db0f6903185b7..cb1b87f8d0d65 100644
--- a/bolt/include/bolt/Profile/DataAggregator.h
+++ b/bolt/include/bolt/Profile/DataAggregator.h
@@ -502,6 +502,9 @@ class DataAggregator : public DataReader {
   /// entries).
   void imputeFallThroughs();
 
+  /// Register profiled functions for lite mode.
+  void registerProfiledFunctions();
+
   /// Debugging dump methods
   void dump() const;
   void dump(const PerfBranchSample &Sample) const;
diff --git a/bolt/lib/Profile/DataAggregator.cpp b/bolt/lib/Profile/DataAggregator.cpp
index 3604fdd3a94b4..c13fa6dbe582b 100644
--- a/bolt/lib/Profile/DataAggregator.cpp
+++ b/bolt/lib/Profile/DataAggregator.cpp
@@ -581,6 +581,26 @@ void DataAggregator::imputeFallThroughs() {
     outs() << "BOLT-INFO: imputed " << InferredTraces << " traces\n";
 }
 
+void DataAggregator::registerProfiledFunctions() {
+  DenseSet<uint64_t> Addrs;
+  for (const auto &Trace : llvm::make_first_range(Traces)) {
+    if (Trace.Branch != Trace::FT_ONLY &&
+        Trace.Branch != Trace::FT_EXTERNAL_ORIGIN)
+      Addrs.insert(Trace.Branch);
+    Addrs.insert(Trace.From);
+  }
+
+  for (const auto [PC, _] : BasicSamples)
+    Addrs.insert(PC);
+
+  for (const PerfMemSample &MemSample : MemSamples)
+    Addrs.insert(MemSample.PC);
+
+  for (const uint64_t Addr : Addrs)
+    if (BinaryFunction *Func = getBinaryFunctionContainingAddress(Addr))
+      Func->setHasProfileAvailable();
+}
+
 Error DataAggregator::preprocessProfile(BinaryContext &BC) {
   this->BC = &BC;
 
@@ -603,6 +623,7 @@ Error DataAggregator::preprocessProfile(BinaryContext &BC) {
       exit(0);
   }
 
+  registerProfiledFunctions();
   return Error::success();
 }
 
@@ -1347,10 +1368,6 @@ std::error_code DataAggregator::parseAggregatedLBREntry() {
   }
 
   const uint64_t FromOffset = Addr[0]->Offset;
-  BinaryFunction *FromFunc = getBinaryFunctionContainingAddress(FromOffset);
-  if (FromFunc)
-    FromFunc->setHasProfileAvailable();
-
   int64_t Count = Counters[0];
   int64_t Mispreds = Counters[1];
 
@@ -1361,11 +1378,6 @@ std::error_code DataAggregator::parseAggregatedLBREntry() {
     return std::error_code();
   }
 
-  const uint64_t ToOffset = Addr[1]->Offset;
-  BinaryFunction *ToFunc = getBinaryFunctionContainingAddress(ToOffset);
-  if (ToFunc)
-    ToFunc->setHasProfileAvailable();
-
   /// For fall-through types, adjust locations to match Trace container.
   if (Type == FT || Type == FT_EXTERNAL_ORIGIN || Type == FT_EXTERNAL_RETURN) {
     Addr[2] = Location(Addr[1]->Offset); // Trace To
@@ -1613,9 +1625,6 @@ std::error_code DataAggregator::parseBranchEvents() {
   Traces.reserve(TraceMap.size());
   for (const auto &[Trace, Info] : TraceMap) {
     Traces.emplace_back(Trace, Info);
-    for (const uint64_t Addr : {Trace.Branch, Trace.From})
-      if (BinaryFunction *BF = getBinaryFunctionContainingAddress(Addr))
-        BF->setHasProfileAvailable();
   }
   clear(TraceMap);
 
@@ -1676,9 +1685,6 @@ std::error_code DataAggregator::parseBasicEvents() {
       continue;
     ++NumTotalSamples;
 
-    if (BinaryFunction *BF = getBinaryFunctionContainingAddress(Sample->PC))
-      BF->setHasProfileAvailable();
-
     ++BasicSamples[Sample->PC];
     EventNames.insert(Sample->EventName);
   }
@@ -1716,9 +1722,6 @@ std::error_code DataAggregator::parseMemEvents() {
     if (std::error_code EC = Sample.getError())
       return EC;
 
-    if (BinaryFunction *BF = getBinaryFunctionContainingAddress(Sample->PC))
-      BF->setHasProfileAvailable();
-
     MemSamples.emplace_back(std::move(Sample.get()));
   }
 

Copy link
Member

@paschalis-mpeis paschalis-mpeis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks!

fhahn and others added 2 commits July 28, 2025 04:23
Created using spr 1.3.4

[skip ci]
Created using spr 1.3.4
@aaupov aaupov changed the base branch from users/aaupov/spr/main.boltnfc-register-profiled-functions-once to main July 28, 2025 11:23
@aaupov aaupov merged commit 1b657c6 into main Jul 28, 2025
12 of 15 checks passed
@aaupov aaupov deleted the users/aaupov/spr/boltnfc-register-profiled-functions-once branch July 28, 2025 11:29
@paschalis-mpeis
Copy link
Member

Hey @aaupov,

This appears to have broken large bolt-tests: perf2bolt-multiple-edge.test
More: https://lab.llvm.org/staging/#/builders/219/builds/2012

Can you please have a look?

rafaelauler added a commit that referenced this pull request Aug 7, 2025
Summary:
We are observing random crashes where addresses provided by the
hardware (from LBR) randomly clashes with what LLVM uses in DenseSet
as tombstones records (-2 << 12), which causes DenseSet to assert when
ingesting this data while running the recently added
registerProfiledFunctions().
rafaelauler added a commit that referenced this pull request Aug 7, 2025
)

In perf2bolt, we are observing sporadic crashes in the recently added
registerProfiledFunctions from #150622. Addresses provided by the
hardware (from LBR) might be -1, which clashes with what LLVM uses in
DenseSet as empty tombstones records. This causes DenseSet to assert
with "can't insert empty tombstone into map" when ingesting this
data. Revert this change for now to unbreak perf2bolt.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants