Skip to content

[LLVM][PassBuilder] Extend the function signature of callback for optimizer pipeline extension point #100953

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

shiltian
Copy link
Contributor

@shiltian shiltian commented Jul 28, 2024

These callbacks can be invoked in multiple places when building an optimization
pipeline, both in compile time and link time. However, there is no indicator on
what pipeline it is currently building.

In this patch, an extra argument is added to indicate its (Thin)LTO stage such
that the callback can check it if needed. There is no test expected from this.

@shiltian shiltian marked this pull request as ready for review July 28, 2024 23:04
Copy link
Contributor Author

shiltian commented Jul 28, 2024

@llvmbot llvmbot added clang Clang issues not falling into any other category backend:AMDGPU clang:codegen IR generation bugs: mangling, exceptions, etc. labels Jul 28, 2024
@llvmbot
Copy link
Member

llvmbot commented Jul 28, 2024

@llvm/pr-subscribers-clang-codegen

@llvm/pr-subscribers-backend-amdgpu

Author: Shilei Tian (shiltian)

Changes

These callbacks can be invoked in multiple places when building an optimization
pipeline, both in compile time and link time. However, there is no indicator on
what pipeline it is currently building.

In this patch, an extra argument is added to indicate its (Thin)LTO stage such
that the callback can check it if needed. There is no test expected from this,
and the benefit of this change will be demonstrated in #66488.


Full diff: https://github.com/llvm/llvm-project/pull/100953.diff

7 Files Affected:

  • (modified) clang/lib/CodeGen/BackendUtil.cpp (+10-9)
  • (modified) llvm/include/llvm/Passes/PassBuilder.h (+7-3)
  • (modified) llvm/lib/Passes/PassBuilderPipelines.cpp (+7-5)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPU.h (+6-1)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp (+7-4)
  • (modified) llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp (+9-6)
  • (modified) llvm/tools/opt/NewPMDriver.cpp (+1-1)
diff --git a/clang/lib/CodeGen/BackendUtil.cpp b/clang/lib/CodeGen/BackendUtil.cpp
index e765bbf637a66..64f0020a170aa 100644
--- a/clang/lib/CodeGen/BackendUtil.cpp
+++ b/clang/lib/CodeGen/BackendUtil.cpp
@@ -643,7 +643,7 @@ static void addKCFIPass(const Triple &TargetTriple, const LangOptions &LangOpts,
 
   // Ensure we lower KCFI operand bundles with -O0.
   PB.registerOptimizerLastEPCallback(
-      [&](ModulePassManager &MPM, OptimizationLevel Level) {
+      [&](ModulePassManager &MPM, OptimizationLevel Level, ThinOrFullLTOPhase) {
         if (Level == OptimizationLevel::O0 &&
             LangOpts.Sanitize.has(SanitizerKind::KCFI))
           MPM.addPass(createModuleToFunctionPassAdaptor(KCFIPass()));
@@ -662,8 +662,8 @@ static void addKCFIPass(const Triple &TargetTriple, const LangOptions &LangOpts,
 static void addSanitizers(const Triple &TargetTriple,
                           const CodeGenOptions &CodeGenOpts,
                           const LangOptions &LangOpts, PassBuilder &PB) {
-  auto SanitizersCallback = [&](ModulePassManager &MPM,
-                                OptimizationLevel Level) {
+  auto SanitizersCallback = [&](ModulePassManager &MPM, OptimizationLevel Level,
+                                ThinOrFullLTOPhase) {
     if (CodeGenOpts.hasSanitizeCoverage()) {
       auto SancovOpts = getSancovOptsFromCGOpts(CodeGenOpts);
       MPM.addPass(SanitizerCoveragePass(
@@ -749,7 +749,7 @@ static void addSanitizers(const Triple &TargetTriple,
     PB.registerOptimizerEarlyEPCallback(
         [SanitizersCallback](ModulePassManager &MPM, OptimizationLevel Level) {
           ModulePassManager NewMPM;
-          SanitizersCallback(NewMPM, Level);
+          SanitizersCallback(NewMPM, Level, ThinOrFullLTOPhase::None);
           if (!NewMPM.isEmpty()) {
             // Sanitizers can abandon<GlobalsAA>.
             NewMPM.addPass(RequireAnalysisPass<GlobalsAA, llvm::Module>());
@@ -1018,11 +1018,12 @@ void EmitAssemblyHelper::RunOptimizationPipeline(
     // TODO: Consider passing the MemoryProfileOutput to the pass builder via
     // the PGOOptions, and set this up there.
     if (!CodeGenOpts.MemoryProfileOutput.empty()) {
-      PB.registerOptimizerLastEPCallback(
-          [](ModulePassManager &MPM, OptimizationLevel Level) {
-            MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
-            MPM.addPass(ModuleMemProfilerPass());
-          });
+      PB.registerOptimizerLastEPCallback([](ModulePassManager &MPM,
+                                            OptimizationLevel Level,
+                                            ThinOrFullLTOPhase) {
+        MPM.addPass(createModuleToFunctionPassAdaptor(MemProfilerPass()));
+        MPM.addPass(ModuleMemProfilerPass());
+      });
     }
 
     if (CodeGenOpts.FatLTO) {
diff --git a/llvm/include/llvm/Passes/PassBuilder.h b/llvm/include/llvm/Passes/PassBuilder.h
index 474a19531ff5d..4c2763404ff05 100644
--- a/llvm/include/llvm/Passes/PassBuilder.h
+++ b/llvm/include/llvm/Passes/PassBuilder.h
@@ -497,7 +497,8 @@ class PassBuilder {
   /// This extension point allows adding optimizations at the very end of the
   /// function optimization pipeline.
   void registerOptimizerLastEPCallback(
-      const std::function<void(ModulePassManager &, OptimizationLevel)> &C) {
+      const std::function<void(ModulePassManager &, OptimizationLevel,
+                               ThinOrFullLTOPhase)> &C) {
     OptimizerLastEPCallbacks.push_back(C);
   }
 
@@ -630,7 +631,8 @@ class PassBuilder {
   void invokeOptimizerEarlyEPCallbacks(ModulePassManager &MPM,
                                        OptimizationLevel Level);
   void invokeOptimizerLastEPCallbacks(ModulePassManager &MPM,
-                                      OptimizationLevel Level);
+                                      OptimizationLevel Level,
+                                      ThinOrFullLTOPhase Phase);
   void invokeFullLinkTimeOptimizationEarlyEPCallbacks(ModulePassManager &MPM,
                                                       OptimizationLevel Level);
   void invokeFullLinkTimeOptimizationLastEPCallbacks(ModulePassManager &MPM,
@@ -755,7 +757,9 @@ class PassBuilder {
   // Module callbacks
   SmallVector<std::function<void(ModulePassManager &, OptimizationLevel)>, 2>
       OptimizerEarlyEPCallbacks;
-  SmallVector<std::function<void(ModulePassManager &, OptimizationLevel)>, 2>
+  SmallVector<std::function<void(ModulePassManager &, OptimizationLevel,
+                                 ThinOrFullLTOPhase)>,
+              2>
       OptimizerLastEPCallbacks;
   SmallVector<std::function<void(ModulePassManager &, OptimizationLevel)>, 2>
       FullLinkTimeOptimizationEarlyEPCallbacks;
diff --git a/llvm/lib/Passes/PassBuilderPipelines.cpp b/llvm/lib/Passes/PassBuilderPipelines.cpp
index 757b20dcd6693..5827a485879d3 100644
--- a/llvm/lib/Passes/PassBuilderPipelines.cpp
+++ b/llvm/lib/Passes/PassBuilderPipelines.cpp
@@ -365,9 +365,10 @@ void PassBuilder::invokeOptimizerEarlyEPCallbacks(ModulePassManager &MPM,
     C(MPM, Level);
 }
 void PassBuilder::invokeOptimizerLastEPCallbacks(ModulePassManager &MPM,
-                                                 OptimizationLevel Level) {
+                                                 OptimizationLevel Level,
+                                                 ThinOrFullLTOPhase Phase) {
   for (auto &C : OptimizerLastEPCallbacks)
-    C(MPM, Level);
+    C(MPM, Level, Phase);
 }
 void PassBuilder::invokeFullLinkTimeOptimizationEarlyEPCallbacks(
     ModulePassManager &MPM, OptimizationLevel Level) {
@@ -1524,7 +1525,7 @@ PassBuilder::buildModuleOptimizationPipeline(OptimizationLevel Level,
   MPM.addPass(createModuleToFunctionPassAdaptor(std::move(OptimizePM),
                                                 PTO.EagerlyInvalidateAnalyses));
 
-  invokeOptimizerLastEPCallbacks(MPM, Level);
+  invokeOptimizerLastEPCallbacks(MPM, Level, LTOPhase);
 
   // Split out cold code. Splitting is done late to avoid hiding context from
   // other optimizations and inadvertently regressing performance. The tradeoff
@@ -1671,7 +1672,8 @@ PassBuilder::buildThinLTOPreLinkDefaultPipeline(OptimizationLevel Level) {
   // optimization is going to be done in PostLink stage, but clang can't add
   // callbacks there in case of in-process ThinLTO called by linker.
   invokeOptimizerEarlyEPCallbacks(MPM, Level);
-  invokeOptimizerLastEPCallbacks(MPM, Level);
+  invokeOptimizerLastEPCallbacks(MPM, Level,
+                                 ThinOrFullLTOPhase::ThinLTOPreLink);
 
   // Emit annotation remarks.
   addAnnotationRemarksPass(MPM);
@@ -2159,7 +2161,7 @@ ModulePassManager PassBuilder::buildO0DefaultPipeline(OptimizationLevel Level,
   CoroPM.addPass(GlobalDCEPass());
   MPM.addPass(CoroConditionalWrapper(std::move(CoroPM)));
 
-  invokeOptimizerLastEPCallbacks(MPM, Level);
+  invokeOptimizerLastEPCallbacks(MPM, Level, ThinOrFullLTOPhase::None);
 
   if (LTOPreLink)
     addRequiredLTOPreLinkPasses(MPM);
diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.h b/llvm/lib/Target/AMDGPU/AMDGPU.h
index 46cc5f349555a..50aef36724f70 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.h
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.h
@@ -287,8 +287,13 @@ class AMDGPUAttributorPass : public PassInfoMixin<AMDGPUAttributorPass> {
 private:
   TargetMachine &TM;
 
+  /// Asserts whether we can assume whole program visibility.
+  bool HasWholeProgramVisibility = false;
+
 public:
-  AMDGPUAttributorPass(TargetMachine &TM) : TM(TM){};
+  AMDGPUAttributorPass(TargetMachine &TM,
+                       bool HasWholeProgramVisibility = false)
+      : TM(TM), HasWholeProgramVisibility(HasWholeProgramVisibility) {};
   PreservedAnalyses run(Module &M, ModuleAnalysisManager &AM);
 };
 
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
index 51968063e8919..ab98da31b050f 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUAttributor.cpp
@@ -1023,7 +1023,8 @@ static void addPreloadKernArgHint(Function &F, TargetMachine &TM) {
   }
 }
 
-static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM) {
+static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM,
+                    bool HasWholeProgramVisibility) {
   SetVector<Function *> Functions;
   for (Function &F : M) {
     if (!F.isIntrinsic())
@@ -1041,6 +1042,7 @@ static bool runImpl(Module &M, AnalysisGetter &AG, TargetMachine &TM) {
        &AAUnderlyingObjects::ID, &AAIndirectCallInfo::ID});
 
   AttributorConfig AC(CGUpdater);
+  AC.IsClosedWorldModule = HasWholeProgramVisibility;
   AC.Allowed = &Allowed;
   AC.IsModulePass = true;
   AC.DefaultInitializeLiveInternals = false;
@@ -1086,7 +1088,7 @@ class AMDGPUAttributorLegacy : public ModulePass {
 
   bool runOnModule(Module &M) override {
     AnalysisGetter AG(this);
-    return runImpl(M, AG, *TM);
+    return runImpl(M, AG, *TM, /*HasWholeProgramVisibility=*/false);
   }
 
   void getAnalysisUsage(AnalysisUsage &AU) const override {
@@ -1107,8 +1109,9 @@ PreservedAnalyses llvm::AMDGPUAttributorPass::run(Module &M,
   AnalysisGetter AG(FAM);
 
   // TODO: Probably preserves CFG
-  return runImpl(M, AG, TM) ? PreservedAnalyses::none()
-                            : PreservedAnalyses::all();
+  return runImpl(M, AG, TM, HasWholeProgramVisibility)
+             ? PreservedAnalyses::none()
+             : PreservedAnalyses::all();
 }
 
 char AMDGPUAttributorLegacy::ID = 0;
diff --git a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
index c8fb68d1c0b0c..50cc2d871d4ec 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
+++ b/llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
@@ -735,12 +735,15 @@ void AMDGPUTargetMachine::registerPassBuilderCallbacks(PassBuilder &PB) {
       });
 
   // FIXME: Why is AMDGPUAttributor not in CGSCC?
-  PB.registerOptimizerLastEPCallback(
-      [this](ModulePassManager &MPM, OptimizationLevel Level) {
-        if (Level != OptimizationLevel::O0) {
-          MPM.addPass(AMDGPUAttributorPass(*this));
-        }
-      });
+  PB.registerOptimizerLastEPCallback([this](ModulePassManager &MPM,
+                                            OptimizationLevel Level,
+                                            ThinOrFullLTOPhase Phase) {
+    if (Level != OptimizationLevel::O0) {
+      MPM.addPass(AMDGPUAttributorPass(
+          *this, Phase == ThinOrFullLTOPhase::FullLTOPostLink ||
+                     Phase == ThinOrFullLTOPhase::ThinLTOPostLink));
+    }
+  });
 
   PB.registerFullLinkTimeOptimizationLastEPCallback(
       [this](ModulePassManager &PM, OptimizationLevel Level) {
diff --git a/llvm/tools/opt/NewPMDriver.cpp b/llvm/tools/opt/NewPMDriver.cpp
index 374698083763b..522a8c06d83c0 100644
--- a/llvm/tools/opt/NewPMDriver.cpp
+++ b/llvm/tools/opt/NewPMDriver.cpp
@@ -310,7 +310,7 @@ static void registerEPCallbacks(PassBuilder &PB) {
         });
   if (tryParsePipelineText<ModulePassManager>(PB, OptimizerLastEPPipeline))
     PB.registerOptimizerLastEPCallback(
-        [&PB](ModulePassManager &PM, OptimizationLevel) {
+        [&PB](ModulePassManager &PM, OptimizationLevel, ThinOrFullLTOPhase) {
           ExitOnError Err("Unable to parse OptimizerLastEP pipeline: ");
           Err(PB.parsePassPipeline(PM, OptimizerLastEPPipeline));
         });

@jdoerfert
Copy link
Member

I thought I tried this myself. Does it really work, as in, is the mode set properly for full and thin lto? I think I had problems for one of them.

Copy link
Contributor

@nikic nikic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems fine to me in general. The patch stack seems to be messed up though, or at least this seems to contain some unrelated AMDGPU changes.

@jdoerfert Possibly the issue you saw is that this callback just isn't used by the post-link full LTO pipeline at all? It uses invokeFullLinkTimeOptimizationLastEPCallbacks instead. Maybe with this change it would make sense to remove that one and merge it into the main OptimizerLastEPCallback with the appropriate LTOPhase?

The other thing I wonder about is whether this argument should be added to other callbacks as well for consistency.

@shiltian
Copy link
Contributor Author

This seems fine to me in general. The patch stack seems to be messed up though, or at least this seems to contain some unrelated AMDGPU changes.

It has some AMD changes because I'd like to demonstrate how the changes will be used.

The other thing I wonder about is whether this argument should be added to other callbacks as well for consistency.

Yeah, I was wondering that as well. I'm happy to do the changes but not sure if that's necessary.

@shiltian shiltian force-pushed the users/shiltian/07-28-_attributor_amd_enable_aaindirectcallinfo_for_amdattributorpass branch from 76c0d45 to f4f6b93 Compare July 29, 2024 18:24
@shiltian shiltian force-pushed the users/shiltian/extend-callback-signature branch from 9c94910 to ed46483 Compare July 29, 2024 18:25
@shiltian shiltian force-pushed the users/shiltian/07-28-_attributor_amd_enable_aaindirectcallinfo_for_amdattributorpass branch 2 times, most recently from 2bbf195 to 9b12959 Compare July 30, 2024 18:45
@shiltian shiltian force-pushed the users/shiltian/extend-callback-signature branch from ed46483 to 9980c1f Compare July 30, 2024 18:45
@shiltian shiltian force-pushed the users/shiltian/07-28-_attributor_amd_enable_aaindirectcallinfo_for_amdattributorpass branch 2 times, most recently from ec07983 to 76e5c51 Compare July 31, 2024 17:45
@shiltian shiltian force-pushed the users/shiltian/extend-callback-signature branch from 9980c1f to 6e26b39 Compare July 31, 2024 17:45
@shiltian shiltian force-pushed the users/shiltian/07-28-_attributor_amd_enable_aaindirectcallinfo_for_amdattributorpass branch from 76e5c51 to aa74047 Compare July 31, 2024 17:58
@shiltian shiltian force-pushed the users/shiltian/extend-callback-signature branch from 6e26b39 to 8df9dec Compare July 31, 2024 18:17
@shiltian shiltian force-pushed the users/shiltian/07-28-_attributor_amd_enable_aaindirectcallinfo_for_amdattributorpass branch from aa74047 to 52df8e6 Compare July 31, 2024 18:23
@shiltian shiltian force-pushed the users/shiltian/extend-callback-signature branch from 8df9dec to 913f6a7 Compare July 31, 2024 18:23
@shiltian
Copy link
Contributor Author

shiltian commented Aug 1, 2024

ping

if it is preferred to split the AMDGPU related changes to another PR, I can do that.

@shiltian shiltian force-pushed the users/shiltian/07-28-_attributor_amd_enable_aaindirectcallinfo_for_amdattributorpass branch from 52df8e6 to 5da7378 Compare August 1, 2024 20:29
@shiltian shiltian force-pushed the users/shiltian/extend-callback-signature branch from 913f6a7 to 842d222 Compare August 1, 2024 20:29
@shiltian shiltian force-pushed the users/shiltian/07-28-_attributor_amd_enable_aaindirectcallinfo_for_amdattributorpass branch from 5da7378 to 5d7487b Compare August 2, 2024 17:10
@shiltian shiltian force-pushed the users/shiltian/extend-callback-signature branch from 842d222 to 225eca0 Compare August 2, 2024 17:10
@shiltian shiltian force-pushed the users/shiltian/07-28-_attributor_amd_enable_aaindirectcallinfo_for_amdattributorpass branch from 5d7487b to ea3f6bb Compare August 2, 2024 17:38
@shiltian shiltian force-pushed the users/shiltian/extend-callback-signature branch from 225eca0 to 699c144 Compare August 2, 2024 17:39
Base automatically changed from users/shiltian/07-28-_attributor_amd_enable_aaindirectcallinfo_for_amdattributorpass to main August 2, 2024 21:23
@shiltian shiltian force-pushed the users/shiltian/extend-callback-signature branch from 699c144 to 602fcdf Compare August 2, 2024 21:23
@shiltian
Copy link
Contributor Author

shiltian commented Aug 5, 2024

ping +1

@shiltian
Copy link
Contributor Author

shiltian commented Aug 7, 2024

This PR will not be needed if we decide to move AMDGPUAttributorPass to full LTO stage (#102086).

@shiltian shiltian mentioned this pull request Aug 7, 2024
…imizer pipeline extension point

These callbacks can be invoked in multiple places when building an optimization
pipeline, both in compile time and link time. However, there is no indicator on
what pipeline it is currently building.

In this patch, an extra argument is added to indicate its (Thin)LTO stage such
that the callback can check it if needed. There is no test expected from this,
and the benefit of this change will be demonstrated in #66488.
@shiltian shiltian force-pushed the users/shiltian/extend-callback-signature branch from 602fcdf to 0110da5 Compare August 7, 2024 03:22
@jdoerfert
Copy link
Member

Split it, and maybe add the Phase to the rest as well. One commit to just make it consistent and pass the information along.
Second for the AMDGPU stuff.

@jdoerfert Possibly the issue you saw is that this callback just isn't used by the post-link full LTO pipeline at all? It uses invokeFullLinkTimeOptimizationLastEPCallbacks instead. Maybe with this change it would make sense to remove that one and merge it into the main OptimizerLastEPCallback with the appropriate LTOPhase?

The problem is that the OptimizerLastEPCallback is not called in FullLTO post link, IIRC. So this will alone not solve our issue. However, we still should tell the callbacks what phase we are in, consistently.

@shiltian
Copy link
Contributor Author

shiltian commented Aug 8, 2024

This PR is no longer needed. We decided to move the AMDGPUAttributorPass to full LTO.

@shiltian shiltian closed this Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
backend:AMDGPU clang:codegen IR generation bugs: mangling, exceptions, etc. clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants