Skip to content

[C++20] [Modules] Embed all source files for C++20 Modules #102444

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Aug 29, 2024

Conversation

ChuanqiXu9
Copy link
Member

Close #72383

The implementation rationale is, I don't want to pass -fmodules-embed-all-files all the time since we can't test it in lit tests (we're using clang_cc1). So I tried to set it in FrontendActions for modules.

@ChuanqiXu9 ChuanqiXu9 added the clang:modules C++20 modules and Clang Header Modules label Aug 8, 2024
@llvmbot llvmbot added clang Clang issues not falling into any other category clang:codegen IR generation bugs: mangling, exceptions, etc. labels Aug 8, 2024
@llvmbot
Copy link
Member

llvmbot commented Aug 8, 2024

@llvm/pr-subscribers-clang-codegen
@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-modules

Author: Chuanqi Xu (ChuanqiXu9)

Changes

Close #72383

The implementation rationale is, I don't want to pass -fmodules-embed-all-files all the time since we can't test it in lit tests (we're using clang_cc1). So I tried to set it in FrontendActions for modules.


Full diff: https://github.com/llvm/llvm-project/pull/102444.diff

5 Files Affected:

  • (modified) clang/include/clang/CodeGen/CodeGenAction.h (+1-1)
  • (modified) clang/include/clang/Frontend/FrontendActions.h (+3-1)
  • (modified) clang/include/clang/Serialization/ModuleFile.h (+5-5)
  • (modified) clang/lib/CodeGen/CodeGenAction.cpp (+3-2)
  • (modified) clang/lib/Frontend/FrontendActions.cpp (+12-3)
diff --git a/clang/include/clang/CodeGen/CodeGenAction.h b/clang/include/clang/CodeGen/CodeGenAction.h
index 186dbb43f01ef..461450d875ec5 100644
--- a/clang/include/clang/CodeGen/CodeGenAction.h
+++ b/clang/include/clang/CodeGen/CodeGenAction.h
@@ -57,7 +57,7 @@ class CodeGenAction : public ASTFrontendAction {
   bool loadLinkModules(CompilerInstance &CI);
 
 protected:
-  bool BeginSourceFileAction(CompilerInstance &CI) override;
+  bool BeginInvocation(CompilerInstance &CI) override;
 
   /// Create a new code generation action.  If the optional \p _VMContext
   /// parameter is supplied, the action uses it without taking ownership,
diff --git a/clang/include/clang/Frontend/FrontendActions.h b/clang/include/clang/Frontend/FrontendActions.h
index a620ddfc40447..e82f15f89b643 100644
--- a/clang/include/clang/Frontend/FrontendActions.h
+++ b/clang/include/clang/Frontend/FrontendActions.h
@@ -152,11 +152,13 @@ class GenerateModuleFromModuleMapAction : public GenerateModuleAction {
   CreateOutputFile(CompilerInstance &CI, StringRef InFile) override;
 };
 
+bool BeginInvocationForModules(CompilerInstance &CI);
+
 /// Generates full BMI (which contains full information to generate the object
 /// files) for C++20 Named Modules.
 class GenerateModuleInterfaceAction : public GenerateModuleAction {
 protected:
-  bool BeginSourceFileAction(CompilerInstance &CI) override;
+  bool BeginInvocation(CompilerInstance &CI) override;
 
   std::unique_ptr<ASTConsumer> CreateASTConsumer(CompilerInstance &CI,
                                                  StringRef InFile) override;
diff --git a/clang/include/clang/Serialization/ModuleFile.h b/clang/include/clang/Serialization/ModuleFile.h
index 3e920c0f68360..30e7f6b3e57bd 100644
--- a/clang/include/clang/Serialization/ModuleFile.h
+++ b/clang/include/clang/Serialization/ModuleFile.h
@@ -88,13 +88,13 @@ class InputFile {
 
   InputFile(FileEntryRef File, bool isOverridden = false,
             bool isOutOfDate = false) {
-    assert(!(isOverridden && isOutOfDate) &&
-           "an overridden cannot be out-of-date");
     unsigned intVal = 0;
-    if (isOverridden)
-      intVal = Overridden;
-    else if (isOutOfDate)
+    // Make isOutOfDate with higher priority than isOverridden.
+    // It is possible if the recorded hash value mismatches.
+    if (isOutOfDate)
       intVal = OutOfDate;
+    else if (isOverridden)
+      intVal = Overridden;
     Val.setPointerAndInt(&File.getMapEntry(), intVal);
   }
 
diff --git a/clang/lib/CodeGen/CodeGenAction.cpp b/clang/lib/CodeGen/CodeGenAction.cpp
index e87226e60297c..8900faf07eeaf 100644
--- a/clang/lib/CodeGen/CodeGenAction.cpp
+++ b/clang/lib/CodeGen/CodeGenAction.cpp
@@ -969,9 +969,10 @@ CodeGenerator *CodeGenAction::getCodeGenerator() const {
   return BEConsumer->getCodeGenerator();
 }
 
-bool CodeGenAction::BeginSourceFileAction(CompilerInstance &CI) {
+bool CodeGenAction::BeginInvocation(CompilerInstance &CI) {
   if (CI.getFrontendOpts().GenReducedBMI)
-    CI.getLangOpts().setCompilingModule(LangOptions::CMK_ModuleInterface);
+    return BeginInvocationForModules(CI);
+
   return true;
 }
 
diff --git a/clang/lib/Frontend/FrontendActions.cpp b/clang/lib/Frontend/FrontendActions.cpp
index e70210d55fe28..7758746f9a483 100644
--- a/clang/lib/Frontend/FrontendActions.cpp
+++ b/clang/lib/Frontend/FrontendActions.cpp
@@ -262,11 +262,20 @@ GenerateModuleFromModuleMapAction::CreateOutputFile(CompilerInstance &CI,
                                     /*ForceUseTemporary=*/true);
 }
 
-bool GenerateModuleInterfaceAction::BeginSourceFileAction(
-    CompilerInstance &CI) {
+bool clang::BeginInvocationForModules(CompilerInstance &CI) {
+  // Embed all module files for named modules.
+  // See https://github.com/llvm/llvm-project/issues/72383 for discussion.
+  CI.getFrontendOpts().ModulesEmbedAllFiles = true;
   CI.getLangOpts().setCompilingModule(LangOptions::CMK_ModuleInterface);
+  return true;
+}
 
-  return GenerateModuleAction::BeginSourceFileAction(CI);
+bool GenerateModuleInterfaceAction::BeginInvocation(
+    CompilerInstance &CI) {
+  if (!BeginInvocationForModules(CI))
+    return false;
+
+  return GenerateModuleAction::BeginInvocation(CI);
 }
 
 std::unique_ptr<ASTConsumer>

Copy link

github-actions bot commented Aug 8, 2024

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff 3b64ede096ce0a0230c4d3f77782e6fa18f2943a c61641a8901e99841916288d54a4e8d3d3894c89 --extensions cpp,cppm,h -- clang/include/clang/CodeGen/CodeGenAction.h clang/include/clang/Frontend/FrontendActions.h clang/include/clang/Serialization/ModuleFile.h clang/lib/CodeGen/CodeGenAction.cpp clang/lib/Frontend/FrontendActions.cpp clang/test/Modules/no-local-decl-in-reduced-bmi.cppm clang/test/Modules/reduced-bmi-empty-module-purview-std.cppm clang/test/Modules/reduced-bmi-empty-module-purview.cppm clang/test/Modules/unreached-static-entities.cppm
View the diff from clang-format here.
diff --git a/clang/lib/Frontend/FrontendActions.cpp b/clang/lib/Frontend/FrontendActions.cpp
index 8c7b749fe8..0ca122742b 100644
--- a/clang/lib/Frontend/FrontendActions.cpp
+++ b/clang/lib/Frontend/FrontendActions.cpp
@@ -270,8 +270,7 @@ bool clang::BeginInvocationForModules(CompilerInstance &CI) {
   return true;
 }
 
-bool GenerateModuleInterfaceAction::BeginInvocation(
-    CompilerInstance &CI) {
+bool GenerateModuleInterfaceAction::BeginInvocation(CompilerInstance &CI) {
   if (!BeginInvocationForModules(CI))
     return false;
 

@dwblaikie dwblaikie removed their request for review August 8, 2024 18:17
@ChuanqiXu9 ChuanqiXu9 merged commit 2eeeff8 into llvm:main Aug 29, 2024
7 of 8 checks passed
@ChuanqiXu9 ChuanqiXu9 deleted the EmbedAllFiles branch August 29, 2024 08:06
@cor3ntin
Copy link
Contributor

cor3ntin commented Aug 29, 2024

Was this discussed/reviewed/motivated? There are drawbacks to this approach outlined in #72383

@iains @jyknight @AaronBallman @Bigcheese

@llvm-ci
Copy link
Collaborator

llvm-ci commented Aug 29, 2024

LLVM Buildbot has detected a new failure on builder clang-hip-vega20 running on hip-vega20-0 while building clang at step 3 "annotate".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/123/builds/4498

Here is the relevant piece of the build log for the reference
Step 3 (annotate) failure: '../llvm-zorg/zorg/buildbot/builders/annotated/hip-build.sh --jobs=' (failure)
...
[38/40] : && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -O3 -DNDEBUG  External/HIP/CMakeFiles/memmove-hip-6.0.2.dir/memmove.hip.o -o External/HIP/memmove-hip-6.0.2  --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --hip-link -rtlib=compiler-rt -unwindlib=libgcc -frtlib-add-rpath && cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /usr/local/bin/cmake -E create_symlink /buildbot/llvm-test-suite/External/HIP/memmove.reference_output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/memmove.reference_output-hip-6.0.2
[39/40] /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -DNDEBUG  -O3 -DNDEBUG   -w -Werror=date-time --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --offload-arch=gfx908 --offload-arch=gfx90a --offload-arch=gfx1030 --offload-arch=gfx1100 -xhip -mfma -MD -MT External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -MF External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o.d -o External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -c /buildbot/llvm-test-suite/External/HIP/workload/ray-tracing/TheNextWeek/main.cc
[40/40] : && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/clang++ -O3 -DNDEBUG  External/HIP/CMakeFiles/TheNextWeek-hip-6.0.2.dir/workload/ray-tracing/TheNextWeek/main.cc.o -o External/HIP/TheNextWeek-hip-6.0.2  --rocm-path=/buildbot/Externals/hip/rocm-6.0.2 --hip-link -rtlib=compiler-rt -unwindlib=libgcc -frtlib-add-rpath && cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /usr/local/bin/cmake -E create_symlink /buildbot/llvm-test-suite/External/HIP/TheNextWeek.reference_output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/TheNextWeek.reference_output-hip-6.0.2
+ build_step 'Testing HIP test-suite'
+ echo '@@@BUILD_STEP Testing HIP test-suite@@@'
@@@BUILD_STEP Testing HIP test-suite@@@
+ ninja -v check-hip-simple
[0/1] cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
-- Testing: 7 tests, 7 workers --
Testing:  0.. 10.. 20.. 30.. 40
FAIL: test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test (4 of 7)
******************** TEST 'test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test' FAILED ********************

/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out --redirect-input /dev/null --summary /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.time /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/InOneWeekend-hip-6.0.2
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP ; /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2

+ cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP
+ /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2
/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target: Comparison failed, textual difference between 'M' and 'i'

********************
/usr/bin/strip: /bin/bash.stripped: Bad file descriptor
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
********************
Failed Tests (1):
  test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test


Testing Time: 378.40s

Total Discovered Tests: 7
  Passed: 6 (85.71%)
  Failed: 1 (14.29%)
FAILED: External/HIP/CMakeFiles/check-hip-simple-hip-6.0.2 
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
ninja: build stopped: subcommand failed.
Step 12 (Testing HIP test-suite) failure: Testing HIP test-suite (failure)
@@@BUILD_STEP Testing HIP test-suite@@@
+ ninja -v check-hip-simple
[0/1] cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
-- Testing: 7 tests, 7 workers --
Testing:  0.. 10.. 20.. 30.. 40
FAIL: test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test (4 of 7)
******************** TEST 'test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test' FAILED ********************

/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/timeit-target --timeout 7200 --limit-core 0 --limit-cpu 7200 --limit-file-size 209715200 --limit-rss-size 838860800 --append-exitstatus --redirect-output /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out --redirect-input /dev/null --summary /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.time /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/InOneWeekend-hip-6.0.2
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP ; /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2

+ cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP
+ /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP/Output/InOneWeekend-hip-6.0.2.test.out InOneWeekend.reference_output-hip-6.0.2
/buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/tools/fpcmp-target: Comparison failed, textual difference between 'M' and 'i'

********************
/usr/bin/strip: /bin/bash.stripped: Bad file descriptor
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 
********************
Failed Tests (1):
  test-suite :: External/HIP/InOneWeekend-hip-6.0.2.test


Testing Time: 378.40s

Total Discovered Tests: 7
  Passed: 6 (85.71%)
  Failed: 1 (14.29%)
FAILED: External/HIP/CMakeFiles/check-hip-simple-hip-6.0.2 
cd /buildbot/hip-vega20-0/clang-hip-vega20/test-suite-build/External/HIP && /buildbot/hip-vega20-0/clang-hip-vega20/llvm/bin/llvm-lit -sv empty-hip-6.0.2.test with-fopenmp-hip-6.0.2.test saxpy-hip-6.0.2.test memmove-hip-6.0.2.test InOneWeekend-hip-6.0.2.test TheNextWeek-hip-6.0.2.test blender.test
ninja: build stopped: subcommand failed.
program finished with exit code 1
elapsedTime=547.574445

@ChuanqiXu9
Copy link
Member Author

Was this discussed/reviewed/motivated? There are drawbacks to this approach outlined in #72383

@iains @jyknight @AaronBallman @Bigcheese

The motivation is in #72383 and I comment in #72383 (comment)

This is not reviewed. I wait for several weeks but got no response. And I think it is good. So I choose to land it.

@AaronBallman
Copy link
Collaborator

Was this discussed/reviewed/motivated? There are drawbacks to this approach outlined in #72383
@iains @jyknight @AaronBallman @Bigcheese

The motivation is in #72383 and I comment in #72383 (comment)

This is not reviewed. I wait for several weeks but got no response. And I think it is good. So I choose to land it.

Thank you for the link! (FWIW, there's no problems with these changes having been landed; @ChuanqiXu9 is the code owner for modules and the PR was up for three weeks without discussion. This is just typical post-commit review feedback.)

There's a fair amount of discussion on that thread in opposition to this approach, and a comment thread on an issue is not really visible to many people. I think this warrants an RFC for a broader discussion, so I'd appreciate temporarily reverting this patch.

Some thoughts for the RFC discussion:

  • Given that this is not the default behavior for MSVC, should clang-cl behave differently than clang?
  • What are the impacts on other tooling (debugger, 3rd party static analysis, etc)?
  • Is this the correct default when considering intellectual property security (will people expect their full, original source to be something that can be pulled from these artifacts)?

@mizvekov
Copy link
Contributor

FWIW, I meant to review this, but somehow this disappeared from my waiting for review list and I lost track of it :)

@ChuanqiXu9
Copy link
Member Author

Was this discussed/reviewed/motivated? There are drawbacks to this approach outlined in #72383
@iains @jyknight @AaronBallman @Bigcheese

The motivation is in #72383 and I comment in #72383 (comment)
This is not reviewed. I wait for several weeks but got no response. And I think it is good. So I choose to land it.

Thank you for the link! (FWIW, there's no problems with these changes having been landed; @ChuanqiXu9 is the code owner for modules and the PR was up for three weeks without discussion. This is just typical post-commit review feedback.)

There's a fair amount of discussion on that thread in opposition to this approach, and a comment thread on an issue is not really visible to many people. I think this warrants an RFC for a broader discussion, so I'd appreciate temporarily reverting this patch.

Thank you, Aaron! But I like to not revert this until someone give some complaints on that thread. Since the problem in the high level looks clear to me and feel like the discussion in that thread looks like people there have a pretty good understanding to the issue self.

Some thoughts for the RFC discussion:

  • Given that this is not the default behavior for MSVC, should clang-cl behave differently than clang?

We can discuss this topic in the two aspects, a module specific one and a broader one not limited to modules (the general policy). For the module specific topic, since the BMI generated by clang and MSVC are not compatible, this change won't have any particular impact to the users of MSVC or clang-cl. Beyond the topic of BMI compatibility, this change will make the clang's behavior to match GCC and MSVC's behavior more. Like the OP said in the last paragraph of the 1st floor: #72383:

AFAIK, neither GCC nor MSVC have this restriction for their BMIs.

The internal reason is that, in clang, the source locations are super fundamental and sensitive. So that, if we don't embed the source files, the BMI became invalid after we change the corresponding source file.

And for the broader one, should users expect clang-cl to be a drop-in replacement for MSVC? There is a good post about this (https://www.reddit.com/r/cpp/comments/1e36n0w/c20_modules_with_clangcl_which_way_forward/). I feel this is pretty complex. Not only from the high level expectations (otherwise we'll always say yes) but also the implementations. From the perspective of modules, it looks too hard to implement the MSVC's BMI in clang unless MicroSoft would love to put more people to implement that to clang. But after all, I think this is a good topic to discuss to get more involved.

  • What are the impacts on other tooling (debugger, 3rd party static analysis, etc)?

There shouldn't be any bad impact on other tools. Since all the tools have to consume the BMI via the compiler (in the same version!). Since this patch simply loose the restriction, I don't expect this have any impact on 3rd party tools.

  • Is this the correct default when considering intellectual property security (will people expect their full, original source to be something that can be pulled from these artifacts)?

The consensus of SG15 for BMI is, the BMI is not good for distributed. Since the BMI format are not compatible across compilers (even with different versions!). So we would think BMI as a by-product of a build process only. Although there are thoughts about, we can distribute the BMI in a homogeneous environment where we have stable compiler versions. But such environments are closed ended environment for some specific projects or organizations. So I don't think it may affect any thing for the security as the design doesn't want us to ship the BMIs.

FWIW, I meant to review this, but somehow this disappeared from my waiting for review list and I lost track of it :)

Understood, I have the same experience too. Not meant to blame any one : )

@AaronBallman
Copy link
Collaborator

Was this discussed/reviewed/motivated? There are drawbacks to this approach outlined in #72383
@iains @jyknight @AaronBallman @Bigcheese

The motivation is in #72383 and I comment in #72383 (comment)
This is not reviewed. I wait for several weeks but got no response. And I think it is good. So I choose to land it.

Thank you for the link! (FWIW, there's no problems with these changes having been landed; @ChuanqiXu9 is the code owner for modules and the PR was up for three weeks without discussion. This is just typical post-commit review feedback.)
There's a fair amount of discussion on that thread in opposition to this approach, and a comment thread on an issue is not really visible to many people. I think this warrants an RFC for a broader discussion, so I'd appreciate temporarily reverting this patch.

Thank you, Aaron! But I like to not revert this until someone give some complaints on that thread. Since the problem in the high level looks clear to me and feel like the discussion in that thread looks like people there have a pretty good understanding to the issue self.

Again, I don't think a comment thread on an issue has a wide enough audience. FWIW, I had no idea about that thread until I got pinged here.

Some thoughts for the RFC discussion:

  • Given that this is not the default behavior for MSVC, should clang-cl behave differently than clang?

We can discuss this topic in the two aspects, a module specific one and a broader one not limited to modules (the general policy). For the module specific topic, since the BMI generated by clang and MSVC are not compatible, this change won't have any particular impact to the users of MSVC or clang-cl. Beyond the topic of BMI compatibility, this change will make the clang's behavior to match GCC and MSVC's behavior more. Like the OP said in the last paragraph of the 1st floor: #72383:

AFAIK, neither GCC nor MSVC have this restriction for their BMIs.

The internal reason is that, in clang, the source locations are super fundamental and sensitive. So that, if we don't embed the source files, the BMI became invalid after we change the corresponding source file.

I don't see why that's an issue; if the source code changes, not changing the BMI is a slight quality-of-implementation improvement (e.g., when user changes comments, but the actual semantics remain the same), but that hardly seems worth the concerns I have below.

And for the broader one, should users expect clang-cl to be a drop-in replacement for MSVC? There is a good post about this (https://www.reddit.com/r/cpp/comments/1e36n0w/c20_modules_with_clangcl_which_way_forward/). I feel this is pretty complex. Not only from the high level expectations (otherwise we'll always say yes) but also the implementations. From the perspective of modules, it looks too hard to implement the MSVC's BMI in clang unless MicroSoft would love to put more people to implement that to clang. But after all, I think this is a good topic to discuss to get more involved.

Yeah, that broader discussion is one we should have; and we should have the same conversation about GCC compatibility as we're 1) not compatible, 2) drifting farther apart, 3) still claim we're GCC 4.2, and 4) have never explicitly determined just how much compatibility we need/want as a community (it was an organic thing that grew out of the needs of the early developers of Clang)

  • What are the impacts on other tooling (debugger, 3rd party static analysis, etc)?

There shouldn't be any bad impact on other tools. Since all the tools have to consume the BMI via the compiler (in the same version!). Since this patch simply loose the restriction, I don't expect this have any impact on 3rd party tools.

Good to know; but this is a small part of why I'd like an RFC -- tools vendors should be made aware so they can raise concerns if they have them.

  • Is this the correct default when considering intellectual property security (will people expect their full, original source to be something that can be pulled from these artifacts)?

The consensus of SG15 for BMI is, the BMI is not good for distributed. Since the BMI format are not compatible across compilers (even with different versions!). So we would think BMI as a by-product of a build process only. Although there are thoughts about, we can distribute the BMI in a homogeneous environment where we have stable compiler versions. But such environments are closed ended environment for some specific projects or organizations. So I don't think it may affect any thing for the security as the design doesn't want us to ship the BMIs.

That's SG15. This change has serious implications for IP and I don't feel comfortable making that decision as quietly as you have. This is the primary reason why I'm insisting on an RFC. As a concrete example, large organizations often have contractors who utilize libraries written by the parent company. If the company produces a BMI for their framework (because they control the environment used by the contractor anyway, so this is safe and reasonable to do), they may accidentally ship their sensitive IP along with it.

So I continue to ask for this to be reverted and an RFC posted.

@iains
Copy link
Contributor

iains commented Sep 2, 2024

Sorry I have not made much review here - although did comment on #72383 ..

  1. it seems highly unlikely to me that the designers of the c++ modules facility would have expected it necessary to carry the source along with the BMI (but I am sure that they can comment if otherwise) edit: or, to clarify, because of (2 and 3) it is assumed that consumers always have access to the source.
  2. On the subject of IP; In order to build a BMI one needs the source, so that a BMI cannot (should not?) reflect any source code that would be private. We always expect to need to build BMIs since they are ephemeral in the current plan. Thus (at least in my understanding) the issue of exposing IP should not apply. If the assertions of this para are not true (e.g. somehow clang needs to embed source that would not otherwise be distributed) that would IMO be a show-stopper.
  3. It is agreed that SG15/WG21 does not expect BMIs to be distributed in the current model; however, there is continuing pressure from end users and library authors for some system that would allow it - so we should be reluctant to move further away from that objective.
  4. AFAIK GCC does not embed sources (and I do not recall any case where it would quote source, the diagnostics refer to locations which can have ranges) - I am not closely involved in the GCC modules effort - but have also asked other devs in case I missed something.

From an engineering perspective it seems that we (clang) needs to find a better solution?

I would agree with @AaronBallman that this needs a higher level design discussion (I have no strong personal opinion on whether the merge is reverted - but then a decision should be taken about how this is to be implemented before any release is made - and given that this might be quite heavy lifting - it might be wise to revert it in case it cannot be resolved in time - to avoid churn in user's expectations).

@iains
Copy link
Contributor

iains commented Sep 2, 2024

FWIW: As [clang, GCC, MSVC] are currently implementing modules, there is no question of compatibility at the exchange of BMIs level (so that is currently a non-issue). It does not seem that this is something in the "compatible with gcc-4.2.1" layer....

However, it is a tooling issue if one toolchain does not require sources distributed along with the BMIs but others do - so that is a potential reason to want to coordinate.

ChuanqiXu9 added a commit that referenced this pull request Sep 3, 2024
@ChuanqiXu9
Copy link
Member Author

Thanks for the write-up. I've reverted and I'll try to open a RFC to get a wider discussion.

@ChuanqiXu9
Copy link
Member Author

@AaronBallman @iains I've sent the RFC here: https://discourse.llvm.org/t/rfc-modules-should-we-embed-sources-to-the-bmi/81029

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:codegen IR generation bugs: mangling, exceptions, etc. clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Shouldn't -fmodules-embed-all-files be the default?
7 participants