Skip to content

[TypeProf][ICP]Allow vtable-comparison as long as vtable count is comparable with function count for each candidate #106260

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 27, 2024

Conversation

mingmingl-llvm
Copy link
Contributor

@mingmingl-llvm mingmingl-llvm commented Aug 27, 2024

The current cost-benefit analysis between vtable comparison and function comparison require the indirect fallback branch to be cold. This is too conservative.

This change allows vtable-comparison as long as vtable count is comparable with function count for each function candidate and removes the cold indirect fallback requirement.

Tested:

  1. Testing this on benchmarks uplifts the measurable performance wins. Counting the (possibly-duplicated) remarks (because of linkonce_odr functions, cross-module import of functions) show the number of vtable remarks increases from ~30k-ish to 50k-ish.
  2. https://gcc.godbolt.org/z/sbGK7Pacn shows vtable-comparison doesn't happen today (using the same IR input)

…not cold, as long as vtable count is comparable with function count for each candidate
@llvmbot llvmbot added PGO Profile Guided Optimizations llvm:transforms labels Aug 27, 2024
@llvmbot
Copy link
Member

llvmbot commented Aug 27, 2024

@llvm/pr-subscribers-pgo

@llvm/pr-subscribers-llvm-transforms

Author: Mingming Liu (minglotus-6)

Changes

The current cost-benefit analysis between vtable comparison and function comparison require the indirect fallback branch to be cold. This is too conservative.

This change allows vtable-comparison as long as vtable count is comparable with function count for each function candidate and removes the cold indirect fallback requirement.

Testing this on benchmarks uplifts the measurable performance wins. Counting the (non-duplicated) remarks show the number of vtable remarks increases from ~30k-ish to 50k-ish.


Full diff: https://github.com/llvm/llvm-project/pull/106260.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp (+1-9)
  • (modified) llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll (+18)
diff --git a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
index 0d1f506986379d..c9007498ee2331 100644
--- a/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
+++ b/llvm/lib/Transforms/Instrumentation/IndirectCallPromotion.cpp
@@ -117,7 +117,7 @@ static cl::opt<bool>
 // Indirect call promotion pass will fall back to function-based comparison if
 // vtable-count / function-count is smaller than this threshold.
 static cl::opt<float> ICPVTablePercentageThreshold(
-    "icp-vtable-percentage-threshold", cl::init(0.99), cl::Hidden,
+    "icp-vtable-percentage-threshold", cl::init(0.995), cl::Hidden,
     cl::desc("The percentage threshold of vtable-count / function-count for "
              "cost-benefit analysis."));
 
@@ -888,14 +888,6 @@ bool IndirectCallPromoter::isProfitableToCompareVTables(
     }
   }
 
-  // If the indirect fallback is not cold, don't compare vtables.
-  if (PSI && PSI->hasProfileSummary() &&
-      !PSI->isColdCount(RemainingVTableCount)) {
-    LLVM_DEBUG(dbgs() << "    Indirect fallback basic block is not cold. Bail "
-                         "out for vtable comparison.\n");
-    return false;
-  }
-
   return true;
 }
 
diff --git a/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll b/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll
index c77be3b1ed2444..b6afce3d7c6d5d 100644
--- a/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll
+++ b/llvm/test/Transforms/PGOProfile/icp_vtable_cmp.ll
@@ -120,6 +120,7 @@ declare i32 @Base2_foo(ptr)
 declare i32 @Base1_foo(ptr)
 declare void @Base3_foo(ptr)
 
+!llvm.module.flags = !{!11}
 !0 = !{i64 16, !"Base1"}
 !1 = !{i64 40, !"Base1"}
 !2 = !{i64 16, !"Base2"}
@@ -131,6 +132,23 @@ declare void @Base3_foo(ptr)
 !8 = !{i64 16, !"Derived3"}
 !9 = !{!"VP", i32 2, i64 1600, i64 -4123858694673519054, i64 600, i64 -7211198353767973908, i64 500, i64 -3574436251470806727, i64 200, i64 6288809125658696740, i64 200, i64 12345678, i64 100}
 !10 = !{!"VP", i32 0, i64 1600, i64 3827408714133779784, i64 600, i64 5837445539218476403, i64 500, i64 -9064955852395570538, i64 400,  i64 56781234, i64 100}
+
+!11 = !{i32 1, !"ProfileSummary", !12}
+!12 = !{!13, !14, !15, !16, !17, !18, !19, !20}
+!13 = !{!"ProfileFormat", !"InstrProf"}
+!14 = !{!"TotalCount", i64 10000}
+!15 = !{!"MaxCount", i64 10}
+!16 = !{!"MaxInternalCount", i64 1}
+!17 = !{!"MaxFunctionCount", i64 1000}
+!18 = !{!"NumCounts", i64 3}
+!19 = !{!"NumFunctions", i64 3}
+!20 = !{!"DetailedSummary", !21}
+!21 = !{!22, !23, !24}
+!22 = !{i32 10000, i64 101, i32 1}
+!23 = !{i32 990000, i64 101, i32 1}
+!24 = !{i32 999999, i64 1, i32 2}
+
+
 ;.
 ; VTABLE-COMMON: [[PROF9]] = !{!"VP", i32 2, i64 100, i64 12345678, i64 100}
 ; VTABLE-COMMON: [[PROF10]] = !{!"branch_weights", i32 600, i32 1000}

Copy link
Contributor

@david-xl david-xl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose the icpvtablePercentage threshold will need to be tuned for SamplePGO in the future (given the mismatched profiling source). For now it is ok.

@mingmingl-llvm mingmingl-llvm merged commit 511500e into llvm:main Aug 27, 2024
8 of 10 checks passed
@mingmingl-llvm mingmingl-llvm deleted the icpfallback branch August 27, 2024 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
llvm:transforms PGO Profile Guided Optimizations
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants