Skip to content

Support '-fmodule-file-home-is-cwd' for C++ modules. #135147

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 14, 2025

Conversation

mpark
Copy link
Member

@mpark mpark commented Apr 10, 2025

-fmodule-file-home-is-cwd was added in 646e502 to support relocatable PCMs for ObjC and Clang modules. This PR extends the functionality to allow C++ modules to be relocatable.

C++ named modules today do not directly write the imported PCM's path (presumably because we know we have to be able to discover the location of the PCM through one of the -fmodule-file=<name>=<pcm> flags). However, the INPUT_FILE fields of the PCM still contains absolute paths of the input files. @ChuanqiXu9 mentioned that there's been a discussion of -fmodules-embed-all-files in #72383 that would embed the input files into the PCM itself. My understanding is that this effectively ignores the absolute paths of the input files in the PCM. Without -fmodules-embed-all-files though, the input files remain absolute paths today, which can cause unnecessary cache misses in distributed builds.

My actual use case is for C++ header units. It has the same problem of absolute path input files, but has the additional problem that header units do directly write the imported PCM's absolute path. Since header units technically are anonymous and cannot rely on -fmodule-file=<name>=<pcm> to discover the location of the PCM, it doesn't seem like it's feasible for header units to simply omit this like named modules do... but I'm not certain about this.

The proposed fix here is to extend -fmodule-file-home-is-cwd which already exists, such that

  • header units can store relative paths of the imported PCMs, and
  • relative paths of input files can be stored for both header units and named modules

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:modules C++20 modules and Clang Header Modules labels Apr 10, 2025
@mpark mpark requested a review from ChuanqiXu9 April 10, 2025 09:12
@llvmbot
Copy link
Member

llvmbot commented Apr 10, 2025

@llvm/pr-subscribers-clang-modules

Author: Michael Park (mpark)

Changes

-fmodule-file-home-is-cwd was added in 646e502 to support relocatable PCMs for Clang modules. This PR extends the functionality to allow standard C++ modules to be relocatable.


Full diff: https://github.com/llvm/llvm-project/pull/135147.diff

2 Files Affected:

  • (modified) clang/lib/Serialization/ASTWriter.cpp (+35-33)
  • (added) clang/test/Modules/relocatable-modules.cpp (+43)
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index a48c05061626a..ae771f8f014e4 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -1493,42 +1493,44 @@ void ASTWriter::WriteControlBlock(Preprocessor &PP, StringRef isysroot) {
     unsigned AbbrevCode = Stream.EmitAbbrev(std::move(Abbrev));
     RecordData::value_type Record[] = {MODULE_NAME};
     Stream.EmitRecordWithBlob(AbbrevCode, Record, WritingModule->Name);
-  }
 
-  if (WritingModule && WritingModule->Directory) {
-    SmallString<128> BaseDir;
-    if (PP.getHeaderSearchInfo().getHeaderSearchOpts().ModuleFileHomeIsCwd) {
-      // Use the current working directory as the base path for all inputs.
-      auto CWD = FileMgr.getOptionalDirectoryRef(".");
-      BaseDir.assign(CWD->getName());
-    } else {
-      BaseDir.assign(WritingModule->Directory->getName());
-    }
-    cleanPathForOutput(FileMgr, BaseDir);
-
-    // If the home of the module is the current working directory, then we
-    // want to pick up the cwd of the build process loading the module, not
-    // our cwd, when we load this module.
-    if (!PP.getHeaderSearchInfo().getHeaderSearchOpts().ModuleFileHomeIsCwd &&
-        (!PP.getHeaderSearchInfo()
-              .getHeaderSearchOpts()
-              .ModuleMapFileHomeIsCwd ||
-         WritingModule->Directory->getName() != ".")) {
-      // Module directory.
-      auto Abbrev = std::make_shared<BitCodeAbbrev>();
-      Abbrev->Add(BitCodeAbbrevOp(MODULE_DIRECTORY));
-      Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Directory
-      unsigned AbbrevCode = Stream.EmitAbbrev(std::move(Abbrev));
+    auto BaseDir = [&]() -> std::optional<SmallString<128>> {
+      if (PP.getHeaderSearchInfo().getHeaderSearchOpts().ModuleFileHomeIsCwd) {
+        // Use the current working directory as the base path for all inputs.
+        auto CWD = FileMgr.getOptionalDirectoryRef(".");
+        return CWD->getName();
+      }
+      if (WritingModule->Directory) {
+        return WritingModule->Directory->getName();
+      }
+      return std::nullopt;
+    }();
+    if (BaseDir) {
+      cleanPathForOutput(FileMgr, *BaseDir);
+      // If the home of the module is the current working directory, then we
+      // want to pick up the cwd of the build process loading the module, not
+      // our cwd, when we load this module.
+      if (!PP.getHeaderSearchInfo().getHeaderSearchOpts().ModuleFileHomeIsCwd &&
+          (!PP.getHeaderSearchInfo()
+                .getHeaderSearchOpts()
+                .ModuleMapFileHomeIsCwd ||
+           WritingModule->Directory->getName() != ".")) {
+        // Module directory.
+        auto Abbrev = std::make_shared<BitCodeAbbrev>();
+        Abbrev->Add(BitCodeAbbrevOp(MODULE_DIRECTORY));
+        Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Directory
+        unsigned AbbrevCode = Stream.EmitAbbrev(std::move(Abbrev));
+
+        RecordData::value_type Record[] = {MODULE_DIRECTORY};
+        Stream.EmitRecordWithBlob(AbbrevCode, Record, *BaseDir);
+      }
 
-      RecordData::value_type Record[] = {MODULE_DIRECTORY};
-      Stream.EmitRecordWithBlob(AbbrevCode, Record, BaseDir);
+      // Write out all other paths relative to the base directory if possible.
+      BaseDirectory.assign(BaseDir->begin(), BaseDir->end());
+    } else if (!isysroot.empty()) {
+      // Write out paths relative to the sysroot if possible.
+      BaseDirectory = std::string(isysroot);
     }
-
-    // Write out all other paths relative to the base directory if possible.
-    BaseDirectory.assign(BaseDir.begin(), BaseDir.end());
-  } else if (!isysroot.empty()) {
-    // Write out paths relative to the sysroot if possible.
-    BaseDirectory = std::string(isysroot);
   }
 
   // Module map file
diff --git a/clang/test/Modules/relocatable-modules.cpp b/clang/test/Modules/relocatable-modules.cpp
new file mode 100644
index 0000000000000..27e7330835b34
--- /dev/null
+++ b/clang/test/Modules/relocatable-modules.cpp
@@ -0,0 +1,43 @@
+// RUN: rm -rf %t
+// RUN: mkdir -p %t
+// RUN: split-file %s %t
+
+// CHECK-NOT: MODULE_DIRECTORY
+
+// RUN: cd %t
+// RUN: %clang_cc1 -std=c++20 -emit-module-interface %t/a.cppm -o %t/a-abs.pcm
+
+// RUN: llvm-bcanalyzer --dump --disable-histogram %t/a-abs.pcm | FileCheck %s
+// RUN: llvm-bcanalyzer --dump --disable-histogram %t/a-abs.pcm \
+// RUN:   | FileCheck %s --check-prefix=INPUT-ABS -DPREFIX=%t
+
+// RUN: %clang_cc1 -std=c++20 -emit-module-interface %t/a.cppm -o %t/a-rel.pcm \
+// RUN:   -fmodule-file-home-is-cwd
+
+// RUN: llvm-bcanalyzer --dump --disable-histogram %t/a-rel.pcm | FileCheck %s
+// RUN: llvm-bcanalyzer --dump --disable-histogram %t/a-rel.pcm \
+// RUN:   | FileCheck %s --check-prefix=INPUT-REL
+
+// INPUT-ABS: <INPUT_FILE {{.*}}/> blob data = '[[PREFIX]]{{/|\\}}a.cppm'
+// INPUT-REL: <INPUT_FILE {{.*}}/> blob data = 'a.cppm'
+
+//--- a.cppm
+export module a;
+
+// RUN: cd %S
+// RUN: %clang_cc1 -std=c++20 -emit-header-unit -xc++-user-header Inputs/cxx-header.h \
+// RUN:   -o %t/cxx-header-abs.pcm
+
+// RUN: llvm-bcanalyzer --dump --disable-histogram %t/cxx-header-abs.pcm | FileCheck %s
+// RUN: llvm-bcanalyzer --dump --disable-histogram %t/cxx-header-abs.pcm \
+// RUN:   | FileCheck %s --check-prefix=HU-INPUT-ABS -DPREFIX=%S
+
+// RUN: %clang_cc1 -std=c++20 -emit-header-unit -xc++-user-header Inputs/cxx-header.h \
+// RUN:   -fmodule-file-home-is-cwd -o %t/cxx-header-rel.pcm
+
+// RUN: llvm-bcanalyzer --dump --disable-histogram %t/cxx-header-rel.pcm | FileCheck %s
+// RUN: llvm-bcanalyzer --dump --disable-histogram %t/cxx-header-rel.pcm \
+// RUN:   | FileCheck %s --check-prefix=HU-INPUT-REL
+
+// HU-INPUT-ABS: <INPUT_FILE {{.*}}/> blob data = '[[PREFIX]]{{/|\\}}Inputs{{/|\\}}cxx-header.h'
+// HU-INPUT-REL: <INPUT_FILE {{.*}}/> blob data = 'Inputs{{/|\\}}cxx-header.h'

@llvmbot
Copy link
Member

llvmbot commented Apr 10, 2025

@llvm/pr-subscribers-clang

Author: Michael Park (mpark)

Changes

-fmodule-file-home-is-cwd was added in 646e502 to support relocatable PCMs for Clang modules. This PR extends the functionality to allow standard C++ modules to be relocatable.


Full diff: https://github.com/llvm/llvm-project/pull/135147.diff

2 Files Affected:

  • (modified) clang/lib/Serialization/ASTWriter.cpp (+35-33)
  • (added) clang/test/Modules/relocatable-modules.cpp (+43)
diff --git a/clang/lib/Serialization/ASTWriter.cpp b/clang/lib/Serialization/ASTWriter.cpp
index a48c05061626a..ae771f8f014e4 100644
--- a/clang/lib/Serialization/ASTWriter.cpp
+++ b/clang/lib/Serialization/ASTWriter.cpp
@@ -1493,42 +1493,44 @@ void ASTWriter::WriteControlBlock(Preprocessor &PP, StringRef isysroot) {
     unsigned AbbrevCode = Stream.EmitAbbrev(std::move(Abbrev));
     RecordData::value_type Record[] = {MODULE_NAME};
     Stream.EmitRecordWithBlob(AbbrevCode, Record, WritingModule->Name);
-  }
 
-  if (WritingModule && WritingModule->Directory) {
-    SmallString<128> BaseDir;
-    if (PP.getHeaderSearchInfo().getHeaderSearchOpts().ModuleFileHomeIsCwd) {
-      // Use the current working directory as the base path for all inputs.
-      auto CWD = FileMgr.getOptionalDirectoryRef(".");
-      BaseDir.assign(CWD->getName());
-    } else {
-      BaseDir.assign(WritingModule->Directory->getName());
-    }
-    cleanPathForOutput(FileMgr, BaseDir);
-
-    // If the home of the module is the current working directory, then we
-    // want to pick up the cwd of the build process loading the module, not
-    // our cwd, when we load this module.
-    if (!PP.getHeaderSearchInfo().getHeaderSearchOpts().ModuleFileHomeIsCwd &&
-        (!PP.getHeaderSearchInfo()
-              .getHeaderSearchOpts()
-              .ModuleMapFileHomeIsCwd ||
-         WritingModule->Directory->getName() != ".")) {
-      // Module directory.
-      auto Abbrev = std::make_shared<BitCodeAbbrev>();
-      Abbrev->Add(BitCodeAbbrevOp(MODULE_DIRECTORY));
-      Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Directory
-      unsigned AbbrevCode = Stream.EmitAbbrev(std::move(Abbrev));
+    auto BaseDir = [&]() -> std::optional<SmallString<128>> {
+      if (PP.getHeaderSearchInfo().getHeaderSearchOpts().ModuleFileHomeIsCwd) {
+        // Use the current working directory as the base path for all inputs.
+        auto CWD = FileMgr.getOptionalDirectoryRef(".");
+        return CWD->getName();
+      }
+      if (WritingModule->Directory) {
+        return WritingModule->Directory->getName();
+      }
+      return std::nullopt;
+    }();
+    if (BaseDir) {
+      cleanPathForOutput(FileMgr, *BaseDir);
+      // If the home of the module is the current working directory, then we
+      // want to pick up the cwd of the build process loading the module, not
+      // our cwd, when we load this module.
+      if (!PP.getHeaderSearchInfo().getHeaderSearchOpts().ModuleFileHomeIsCwd &&
+          (!PP.getHeaderSearchInfo()
+                .getHeaderSearchOpts()
+                .ModuleMapFileHomeIsCwd ||
+           WritingModule->Directory->getName() != ".")) {
+        // Module directory.
+        auto Abbrev = std::make_shared<BitCodeAbbrev>();
+        Abbrev->Add(BitCodeAbbrevOp(MODULE_DIRECTORY));
+        Abbrev->Add(BitCodeAbbrevOp(BitCodeAbbrevOp::Blob)); // Directory
+        unsigned AbbrevCode = Stream.EmitAbbrev(std::move(Abbrev));
+
+        RecordData::value_type Record[] = {MODULE_DIRECTORY};
+        Stream.EmitRecordWithBlob(AbbrevCode, Record, *BaseDir);
+      }
 
-      RecordData::value_type Record[] = {MODULE_DIRECTORY};
-      Stream.EmitRecordWithBlob(AbbrevCode, Record, BaseDir);
+      // Write out all other paths relative to the base directory if possible.
+      BaseDirectory.assign(BaseDir->begin(), BaseDir->end());
+    } else if (!isysroot.empty()) {
+      // Write out paths relative to the sysroot if possible.
+      BaseDirectory = std::string(isysroot);
     }
-
-    // Write out all other paths relative to the base directory if possible.
-    BaseDirectory.assign(BaseDir.begin(), BaseDir.end());
-  } else if (!isysroot.empty()) {
-    // Write out paths relative to the sysroot if possible.
-    BaseDirectory = std::string(isysroot);
   }
 
   // Module map file
diff --git a/clang/test/Modules/relocatable-modules.cpp b/clang/test/Modules/relocatable-modules.cpp
new file mode 100644
index 0000000000000..27e7330835b34
--- /dev/null
+++ b/clang/test/Modules/relocatable-modules.cpp
@@ -0,0 +1,43 @@
+// RUN: rm -rf %t
+// RUN: mkdir -p %t
+// RUN: split-file %s %t
+
+// CHECK-NOT: MODULE_DIRECTORY
+
+// RUN: cd %t
+// RUN: %clang_cc1 -std=c++20 -emit-module-interface %t/a.cppm -o %t/a-abs.pcm
+
+// RUN: llvm-bcanalyzer --dump --disable-histogram %t/a-abs.pcm | FileCheck %s
+// RUN: llvm-bcanalyzer --dump --disable-histogram %t/a-abs.pcm \
+// RUN:   | FileCheck %s --check-prefix=INPUT-ABS -DPREFIX=%t
+
+// RUN: %clang_cc1 -std=c++20 -emit-module-interface %t/a.cppm -o %t/a-rel.pcm \
+// RUN:   -fmodule-file-home-is-cwd
+
+// RUN: llvm-bcanalyzer --dump --disable-histogram %t/a-rel.pcm | FileCheck %s
+// RUN: llvm-bcanalyzer --dump --disable-histogram %t/a-rel.pcm \
+// RUN:   | FileCheck %s --check-prefix=INPUT-REL
+
+// INPUT-ABS: <INPUT_FILE {{.*}}/> blob data = '[[PREFIX]]{{/|\\}}a.cppm'
+// INPUT-REL: <INPUT_FILE {{.*}}/> blob data = 'a.cppm'
+
+//--- a.cppm
+export module a;
+
+// RUN: cd %S
+// RUN: %clang_cc1 -std=c++20 -emit-header-unit -xc++-user-header Inputs/cxx-header.h \
+// RUN:   -o %t/cxx-header-abs.pcm
+
+// RUN: llvm-bcanalyzer --dump --disable-histogram %t/cxx-header-abs.pcm | FileCheck %s
+// RUN: llvm-bcanalyzer --dump --disable-histogram %t/cxx-header-abs.pcm \
+// RUN:   | FileCheck %s --check-prefix=HU-INPUT-ABS -DPREFIX=%S
+
+// RUN: %clang_cc1 -std=c++20 -emit-header-unit -xc++-user-header Inputs/cxx-header.h \
+// RUN:   -fmodule-file-home-is-cwd -o %t/cxx-header-rel.pcm
+
+// RUN: llvm-bcanalyzer --dump --disable-histogram %t/cxx-header-rel.pcm | FileCheck %s
+// RUN: llvm-bcanalyzer --dump --disable-histogram %t/cxx-header-rel.pcm \
+// RUN:   | FileCheck %s --check-prefix=HU-INPUT-REL
+
+// HU-INPUT-ABS: <INPUT_FILE {{.*}}/> blob data = '[[PREFIX]]{{/|\\}}Inputs{{/|\\}}cxx-header.h'
+// HU-INPUT-REL: <INPUT_FILE {{.*}}/> blob data = 'Inputs{{/|\\}}cxx-header.h'

Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't take a close look. But I don't feel we shall do it behind a flag. I think we should make BMI of C++20 modules (at least C++20 named modules) relocatable.

In our downstream, IIUC, we make it by always enabling "-fmodules-embed-all-files " . See discussion here: #72383

Correct me if I didn't understand your problem.

@mpark
Copy link
Member Author

mpark commented Apr 10, 2025

Hm.. I'm not too sure about C++ named modules but for header units, the absolute path of an imported PCM via -fmodule-file= gets included in the produced PCM. I think that the -fmodule-file-home-is-cwd flag (already existing, I'm not proposing to add a new one) should be respected to produce paths relative to CWD of the imported PCMs.

As far as I can tell, -fmodules-embed-all-files seems to be about embedding source files into the PCMs or something, which doesn't deal with imported PCM paths. Is that correct?

@mpark
Copy link
Member Author

mpark commented Apr 10, 2025

Ah, interesting. It looks like indeed, this doesn't apply to C++ named modules.
From: https://github.com/llvm/llvm-project/blob/main/clang/lib/Serialization/ASTWriter.cpp#L1593-L1608

      // We don't want to hard code the information about imported modules
      // in the C++20 named modules.

@mpark mpark changed the title Support '-fmodule-file-home-is-cwd' for C++ modules. Support '-fmodule-file-home-is-cwd' for C++ header units. Apr 10, 2025
@ChuanqiXu9
Copy link
Member

BTW, I think -fmodule-file= is not a suggested way to introduce BMIs. It will load the BMIs eagerly. And for named modules, we will load them lazily according to their name. For header units, may be we can do something with FID. It matters for cases like:

#include <...>
import xxx;

With -fmodule-file=<BMI>, we will load the BMI first and include the file. But with -fmodule=<module-name>=<BMI>, we will include the file first and load the BMI. It will cause different compilation passes. Introduce unnecessary inconsistency, which will be a burden to both developers and users. So I will suggest to load lazily always.

@ChuanqiXu9
Copy link
Member

Ah, interesting. It looks like indeed, this doesn't apply to C++ named modules. From: https://github.com/llvm/llvm-project/blob/main/clang/lib/Serialization/ASTWriter.cpp#L1593-L1608

      // We don't want to hard code the information about imported modules
      // in the C++20 named modules.

If possible, I'll suggest you to do similar things for header units. And I think it is better to avoid using the a lot flags for existing clang header modules. I feel the user interfaces (including flags) of clang header modules is confusing to users. C++20 modules may be a chance to provide a more uniform or more clear interface to users.

@mpark mpark force-pushed the relocatable-modules branch from 828378b to 06b533a Compare April 10, 2025 10:34
@mpark
Copy link
Member Author

mpark commented Apr 10, 2025

BTW, I think -fmodule-file= is not a suggested way to introduce BMIs. It will load the BMIs eagerly. And for named modules, we will load them lazily according to their name. For header units, may be we can do something with FID. It matters for cases like:

#include <...>
import xxx;

With -fmodule-file=<BMI>, we will load the BMI first and include the file. But with -fmodule=<module-name>=<BMI>, we will include the file first and load the BMI. It will cause different compilation passes. Introduce unnecessary inconsistency, which will be a burden to both developers and users. So I will suggest to load lazily always.

Yeah, we do actually use -fmodule-file=<name>=<pcm> downstream to do the lazily loading. Thank you for the information anyway 🙂

@ChuanqiXu9
Copy link
Member

BTW, I think -fmodule-file= is not a suggested way to introduce BMIs. It will load the BMIs eagerly. And for named modules, we will load them lazily according to their name. For header units, may be we can do something with FID. It matters for cases like:

#include <...>
import xxx;

With -fmodule-file=<BMI>, we will load the BMI first and include the file. But with -fmodule=<module-name>=<BMI>, we will include the file first and load the BMI. It will cause different compilation passes. Introduce unnecessary inconsistency, which will be a burden to both developers and users. So I will suggest to load lazily always.

Yeah, we do actually use -fmodule-file=<name>=<pcm> downstream to do the lazily loading. Thank you for the information anyway 🙂

If you did -fmodule-file=<name>=<pcm> for header units, I think it is better to upstream that first. I think it is more fundamental.

@mpark
Copy link
Member Author

mpark commented Apr 10, 2025

If you did -fmodule-file=<name>=<pcm> for header units, I think it is better to upstream that first. I think it is more fundamental.

We don't do anything special downstream. As far as I know it already works today.

@ChuanqiXu9
Copy link
Member

If you did -fmodule-file=<name>=<pcm> for header units, I think it is better to upstream that first. I think it is more fundamental.

We don't do anything special downstream. As far as I know it already works today.

If -fmodule-file=<name>=<pcm> already works for header units, it is helpful to update the documents.

@mpark
Copy link
Member Author

mpark commented Apr 10, 2025

If you did -fmodule-file=<name>=<pcm> for header units, I think it is better to upstream that first. I think it is more fundamental.

We don't do anything special downstream. As far as I know it already works today.

If -fmodule-file=<name>=<pcm> already works for header units, it is helpful to update the documents.

Well, it "works" but my understanding is that it's not officially supported today in Clang. My understanding is that header units technically are anonymous, and therefore don't have a module name. Is that correct?

What we do downstream roughly is to just give the header unit a module name corresponding to its path.

clang++ -std=c++23 --precompile -xc++-user-header lib.h -fmodule-name=lib.h -o lib.pcm

then to use it, provide: -fmodule-file=lib.h=lib.pcm and -fmodule-map-file=lib.modulemap that looks like:

module "lib.h" {
  header "lib.h"
  export *
}

This makes it such that an import "lib.h" does what we want with respect to lazy loading and such.
Given that we're hacking around an unofficial implementation detail in a sense... I'm not sure it makes sense to document it per se 😕 What do you think?

@mpark
Copy link
Member Author

mpark commented Apr 10, 2025

Separately, even with named modules, with or without -fmodules-embed-all-files, the INPUT_FILE paths in the PCM are still absolute. I guess with -fmodules-embed-all-files those paths are not used since the source files are embedded, which I suppose is how it avoids problems with the absolute paths. But if we want relocatable modules without using -fmodules-embed-all-files though, is there a solution for that?

@mpark mpark force-pushed the relocatable-modules branch from 06b533a to 74e3f0a Compare April 10, 2025 19:47
@mpark mpark changed the title Support '-fmodule-file-home-is-cwd' for C++ header units. Support '-fmodule-file-home-is-cwd' for C++ modules. Apr 10, 2025
@zygoloid
Copy link
Collaborator

But if we want relocatable modules without using -fmodules-embed-all-files though, is there a solution for that?

Do you use -no-canonical-prefixes?

@mpark
Copy link
Member Author

mpark commented Apr 10, 2025

But if we want relocatable modules without using -fmodules-embed-all-files though, is there a solution for that?

Do you use -no-canonical-prefixes?

Hi Richard! Hm, no we do not. I haven't seen this before. I can try it though 🤔

EDIT: Just tried it out... doesn't seem to do anything for me. Still getting absolute paths in the PCM.

'-fmodule-file-home-is-cwd' was added in llvm@646e502
to support relocatable PCMs for Clang modules. This PR extends the
functionality for standard C++ modules.
@mpark mpark force-pushed the relocatable-modules branch from 74e3f0a to 37a5c57 Compare April 10, 2025 20:22
@mpark
Copy link
Member Author

mpark commented Apr 10, 2025

By the way @zygoloid, it looks like you reviewed https://reviews.llvm.org/D51568 which had a similar goal, back in 2018 that didn't get committed. -fmodule-file-home-is-cwd was added by 646e502 in 2022.

@mpark
Copy link
Member Author

mpark commented Apr 10, 2025

Actually, isn't this a problem even with -fmodules-embed-all-files for caching in distributed build? I think the inputs files containing absolute paths would cause unnecessary cache misses.

@mpark
Copy link
Member Author

mpark commented Apr 11, 2025

Going back to the idea of not writing the paths of imported PCMs at all; currently the condition to not write the paths is if the imported PCM is a named module. Perhaps this condition can be extended to omit the paths if the PCM was found through -fmodule-file=<name>=<pcm>?

@ChuanqiXu9
Copy link
Member

Separately, even with named modules, with or without -fmodules-embed-all-files, the INPUT_FILE paths in the PCM are still absolute. I guess with -fmodules-embed-all-files those paths are not used since the source files are embedded, which I suppose is how it avoids problems with the absolute paths. But if we want relocatable modules without using -fmodules-embed-all-files though, is there a solution for that?

-fmodules-embed-all-files doesn't solve the problem for encoding/decoding file paths. It solves the problem when we want to access file in the BMIs.

@ChuanqiXu9
Copy link
Member

Going back to the idea of not writing the paths of imported PCMs at all; currently the condition to not write the paths is if the imported PCM is a named module. Perhaps this condition can be extended to omit the paths if the PCM was found through -fmodule-file=<name>=<pcm>?

IIRC, this relates to build systems. While not writing the imported BMI paths in the BMI, the cost is, the build system must provide a full set of -fmodule-file=<name>=<pcm> for all directly or indirectly imported modules. This is a convention between named modules and build systems (or cmake, in particular).

I think this is not related to -fmodule-file=<name>=<pcm> or not. In this way, we need to build a new convention for header units.

@mpark
Copy link
Member Author

mpark commented Apr 11, 2025

Synced with @ChuanqiXu9 offline about this. Summarizing the discussion so far:

  • There are some high-level concerns about the ecosystem, header units specifically. The complexity of the Clang interface and lack of support for header units.
  • At Meta we are starting to use header units, but with the named modules scaffolding. More specifically, we give header units "names" with -fmodule-name=, provide -fmodule-file=<name>=<pcm> as well as -fmodule-map-file= to discover the header units and to get lazy loading.
  • -fmodule-embed-all-files embeds the source files into the PCMs, allowing them to be moved to a different machine without relying on absolute filepaths of the local machine. This still leaves the absolute file paths in the PCMs, they are just not accessed. This makes them usable, but different machines do not provide identical PCMs, which causes cache misses on a distributed build system.
  • -fmodule-map-file-home-is-cwd and -fmodule-file-home-is-cwd both already exist, and the fact that -fmodule-file-home-is-cwd does not work is arguably a quirk of how the code is written.

@mpark mpark requested a review from ChuanqiXu9 April 11, 2025 17:24
Copy link
Member

@ChuanqiXu9 ChuanqiXu9 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I have some concern about the interface. But I am in the camp that we'd better to get best practice from practice instead of pure imagination. And given Meta is doing the experiments in practice, I think it is fine to let it go.

@mpark mpark merged commit 63e2963 into llvm:main Apr 14, 2025
11 checks passed
@mpark mpark deleted the relocatable-modules branch April 14, 2025 05:29
@llvm-ci
Copy link
Collaborator

llvm-ci commented Apr 14, 2025

LLVM Buildbot has detected a new failure on builder openmp-s390x-linux running on systemz-1 while building clang at step 6 "test-openmp".

Full details are available at: https://lab.llvm.org/buildbot/#/builders/88/builds/10403

Here is the relevant piece of the build log for the reference
Step 6 (test-openmp) failure: test (failure)
******************** TEST 'libomp :: tasking/issue-94260-2.c' FAILED ********************
Exit Code: -11

Command Output (stdout):
--
# RUN: at line 1
/home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/./bin/clang -fopenmp   -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test -L /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src  -fno-omit-frame-pointer -mbackchain -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/ompt /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/tasking/issue-94260-2.c -o /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp -lm -latomic && /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp
# executed command: /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/./bin/clang -fopenmp -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test -L /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/src -fno-omit-frame-pointer -mbackchain -I /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/ompt /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.src/openmp/runtime/test/tasking/issue-94260-2.c -o /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp -lm -latomic
# executed command: /home/uweigand/sandbox/buildbot/openmp-s390x-linux/llvm.build/runtimes/runtimes-bins/openmp/runtime/test/tasking/Output/issue-94260-2.c.tmp
# note: command had no output on stdout or stderr
# error: command failed with exit status: -11

--

********************


facebook-github-bot pushed a commit to facebook/buck2 that referenced this pull request Apr 15, 2025
Summary:
This diff specifies `-fmodule-file-home-is-cwd` flag which allows to produce relocatable PCMs without the internal patch from T32246672.

In the current state and our use of Clang, this flag ends up just being ignored. However, in the rollout of Clang with the internal patch reverted and [llvm#135147](llvm/llvm-project#135147) backported, it will end up having the same effect of storing relative paths in PCMs as before.

Reviewed By: dmpolukhin

Differential Revision: D72999617

fbshipit-source-id: 222857daebed514951c72882e46b0c6478938544
facebook-github-bot pushed a commit to facebook/buck2-prelude that referenced this pull request Apr 15, 2025
Summary:
This diff specifies `-fmodule-file-home-is-cwd` flag which allows to produce relocatable PCMs without the internal patch from T32246672.

In the current state and our use of Clang, this flag ends up just being ignored. However, in the rollout of Clang with the internal patch reverted and [llvm#135147](llvm/llvm-project#135147) backported, it will end up having the same effect of storing relative paths in PCMs as before.

Reviewed By: dmpolukhin

Differential Revision: D72999617

fbshipit-source-id: 222857daebed514951c72882e46b0c6478938544
var-const pushed a commit to ldionne/llvm-project that referenced this pull request Apr 17, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants