-
Notifications
You must be signed in to change notification settings - Fork 13.5k
[libc++] Per-header header modules #86193
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
@llvm/issue-subscribers-clang-modules Author: Vassil Vassilev (vgvassilev)
Commit 571178a introduces per header file module with a main motivation to enable detection of cyclic includes. This means that now, instead of having a single binary artifact `std.pcm` or similar we get hundreds of pcm files for the same content. That's problematic for us due to several reasons:
* Makes modules redistribution harder -- we carry build artifacts across nodes and it makes it difficult to enumerate all stl-related modules including private ones.
* Efficiency -- we care about binary size and in-memory footprint. Going for a many pcm files has an overhead. Consider we have several textual files (such as cassert). The textual content becomes duplicated in every `pcm` file and in all dependent `pcm` files too.
If the motivation is only to be able to diagnose cyclic includes we can do that as part of the build process of libc++. For example, If there is more motivation to the one I understood, I'd like to suggest having a separate modulemap, called @ian-twilightcoder, @ldionne, depending on the direction we get in this issue, I can work on creating the cc: @hahnjo, @dpiparo, @ChuanqiXu9. |
I agree this is broken. I've had success wrapping all the submodules up in a single top level std module. This prevents the build race conditions I run into with LLVM, and I suspect would resolve your issues as well? |
Do you have already a patch? |
My preference here would be to maintain a single modulemap and to simply gradually merge back modules without creating cycles. I suspect it is possible to drive towards a folder organization + modulemap that are consistent, don't introduce cycles and allow having decently sized submodules. If that's feasible, I think that would be the most viable alternative. |
@ldionne, I believe can detect cycles using different approaches (standalone per-header compilation or using the modulemap submodule infrastructure). Do I understand correctly your comment that we will have again a single module for all of stl? |
It's not about include cycles. Each C standard library header has to be in its own top level module to support module layering between libc++, clang, and libc*. Take inttypes.h and stdint.h. Each libc++ header does an Each library has no way of knowing what the others' versions of the C standard library headers will include, so all of the C standard library headers in all libraries must be top level modules. In C++, the C standard library headers are quite intertwined with the C++ standard library headers by standard, and so those too need to be top level modules. libc++ cannot go back to a single module for all of its headers. However, we can greatly reduce the number of modules by grouping the private implementation headers. I had a PR that did that programmatically https://reviews.llvm.org/D155141. That takes the module count from ~1000 to ~200. However, it was poorly received. I tried do something similar by hand, but it was utterly miserable to maintain because the layering is greatly perturbed every time an implementation header is added, or changes what it includes. Trying to break an 8+ module cycle is very difficult, the bad vertex is difficult to identify, and rearranging the modules to eliminate the bad vertex is also very tricky. Louis thinks it all can be done by hand, but if that's true then it requires someone with much more knowledge about how libc++ is implemented, organized, and layered than me. * if your libc does not have a module map then you can't use clang modules. As far as I know, Apple is the only libc that has module maps. I'll leave it to @Bigcheese to explain the gory details of why a modular header can't include a non-modular header if anyone wants to know. |
@ian-twilightcoder, thank you for your post and background.
Most of the times one can get away with that using modules re-exports. It might be a little more tricky to make it work across libraries. However, I am wondering how the single module approach worked until recently.
That was the case before, right? I am wondering what has changed. FWIW, we are able to make libstdc++ having a single module (since version 4.9) and I am pretty sure it was more entangled than libc++ was or will ever be. By the way, if I edit the libc++ modulemap file and wrap everything into a top-level module I get working single module. I am really confused. I am missing something... |
This is what I'd like us to do: #98214 Only the parts of the library that can be re-included from C headers should be tricky to implement and will probably require its own top-level modules, but I can't imagine that everything else also requires its own module. |
I think you'll find that __algorithm, __chrono, __functional, __memory, __ranges, __tuple, __type_traits, __utility (and others) are too intertwined for a simple approach like that, but maybe you can do some more sophisticated cleanup so that they layer better and aren't a continuous source of module cycles as their headers change. |
That's the idea. I expect that only a few things in @ian-twilightcoder If we introduce cycles, would we find out in our CI testing? In particular, would we find out about re-introducing cycles with the C stdlib headers in the SDK if we happened to break that, or would we only find out downstream a while later? |
Yeah, the cycles will show up in the modules CI tests. As long as you keep the c stdlib headers as their own top level modules, you should remain acyclic with the underlying libc. |
But that was my point initially. This mode is mostly for making sure things do not regress in layering but that’s not necessarily the right way to ship a release. You can have that mode in the CI without duplicating too much modulemap infrastructure. |
Modules have to be acyclic. And when libc is covered by a module, the only way to have acyclic modules is for the C stdlib headers like stddef.h to all have their own top level modules. |
I respectfully disagree with this statement. I’ve mentioned that one way to deal with entangled headers is re-exporting them from both modules. Could you give an example with the current libc++ codebase which requires top-level modules? We currently overwrite the libc++ modulemap to build a single module and that works without problems. |
…es (#107638) This patch rewrites the modulemap to have fewer top-level modules. Previously, our modulemap had one top level module for each header in the library, including private headers. This had the well-known problem of making compilation times terrible, in addition to being somewhat against the design principles of Clang modules. This patch provides almost an order of magnitude compilation time improvement when building modularized code (certainly subject to variations). For example, including <ccomplex> without a module cache went from 22.4 seconds to 1.6 seconds, a 14x improvement. To achieve this, one might be tempted to simply put all the headers in a single top-level module. Unfortunately, this doesn't work because libc++ provides C compatibility headers (e.g. stdlib.h) which create cycles when the C Standard Library headers are modularized too. This is especially tricky since base systems are usually not modularized: as far as I know, only Xcode 16 beta contains a modularized SDK that makes this issue visible. To understand it, imagine we have the following setup: // in libc++'s include/c++/v1/module.modulemap module std { header stddef.h header stdlib.h } // in the C library's include/module.modulemap module clib { header stddef.h header stdlib.h } Now, imagine that the C library's <stdlib.h> includes <stddef.h>, perhaps as an implementation detail. When building the `std` module, libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C library's <stdlib.h>, so libc++ depends on the `clib` module. However, remember that the C library's <stdlib.h> header includes <stddef.h> as an implementation detail. Since the header search paths for libc++ are (and must be) before the search paths for the C library, the C library ends up including libc++'s <stddef.h>, which means it depends on the `std` module. That's a cycle. To solve this issue, this patch creates one top-level module for each C compatibility header. The rest of the libc++ headers are located in a single top-level `std` module, with two main exceptions. First, the module containing configuration headers (e.g. <__config>) has its own top-level module too, because those headers are included by the C compatibility headers. Second, we create a top-level std_core module that contains several dependency-free utilities used (directly or indirectly) from the __math subdirectory. This is needed because __math pulls in a bunch of stuff, and __math is used from the C compatibility header <math.h>. As a direct benefit of this change, we don't need to generate an artificial __std_clang_module header anymore to provide a monolithic `std` module, since our modulemap does it naturally by construction. A next step after this change would be to look into whether math.h really needs to include the contents of __math, and if so, whether libc++'s math.h truly needs to include the C library's math.h header. Removing either dependency would break this annoying cycle. Thanks to Eric Fiselier for pointing out this approach during a recent meeting. This wasn't viable before some recent refactoring, but wrapping everything (except the C headers) in a large module is by far the simplest and the most effective way of doing this. Fixes #86193
…es (llvm#107638) This patch rewrites the modulemap to have fewer top-level modules. Previously, our modulemap had one top level module for each header in the library, including private headers. This had the well-known problem of making compilation times terrible, in addition to being somewhat against the design principles of Clang modules. This patch provides almost an order of magnitude compilation time improvement when building modularized code (certainly subject to variations). For example, including <ccomplex> without a module cache went from 22.4 seconds to 1.6 seconds, a 14x improvement. To achieve this, one might be tempted to simply put all the headers in a single top-level module. Unfortunately, this doesn't work because libc++ provides C compatibility headers (e.g. stdlib.h) which create cycles when the C Standard Library headers are modularized too. This is especially tricky since base systems are usually not modularized: as far as I know, only Xcode 16 beta contains a modularized SDK that makes this issue visible. To understand it, imagine we have the following setup: // in libc++'s include/c++/v1/module.modulemap module std { header stddef.h header stdlib.h } // in the C library's include/module.modulemap module clib { header stddef.h header stdlib.h } Now, imagine that the C library's <stdlib.h> includes <stddef.h>, perhaps as an implementation detail. When building the `std` module, libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C library's <stdlib.h>, so libc++ depends on the `clib` module. However, remember that the C library's <stdlib.h> header includes <stddef.h> as an implementation detail. Since the header search paths for libc++ are (and must be) before the search paths for the C library, the C library ends up including libc++'s <stddef.h>, which means it depends on the `std` module. That's a cycle. To solve this issue, this patch creates one top-level module for each C compatibility header. The rest of the libc++ headers are located in a single top-level `std` module, with two main exceptions. First, the module containing configuration headers (e.g. <__config>) has its own top-level module too, because those headers are included by the C compatibility headers. Second, we create a top-level std_core module that contains several dependency-free utilities used (directly or indirectly) from the __math subdirectory. This is needed because __math pulls in a bunch of stuff, and __math is used from the C compatibility header <math.h>. As a direct benefit of this change, we don't need to generate an artificial __std_clang_module header anymore to provide a monolithic `std` module, since our modulemap does it naturally by construction. A next step after this change would be to look into whether math.h really needs to include the contents of __math, and if so, whether libc++'s math.h truly needs to include the C library's math.h header. Removing either dependency would break this annoying cycle. Thanks to Eric Fiselier for pointing out this approach during a recent meeting. This wasn't viable before some recent refactoring, but wrapping everything (except the C headers) in a large module is by far the simplest and the most effective way of doing this. Fixes llvm#86193
…es (llvm#107638) This is a re-application of bc6bd3b which was reverted in f11abac because it broke the Clang pre-commit CI. Original commit message: This patch rewrites the modulemap to have fewer top-level modules. Previously, our modulemap had one top level module for each header in the library, including private headers. This had the well-known problem of making compilation times terrible, in addition to being somewhat against the design principles of Clang modules. This patch provides almost an order of magnitude compilation time improvement when building modularized code (certainly subject to variations). For example, including <ccomplex> without a module cache went from 22.4 seconds to 1.6 seconds, a 14x improvement. To achieve this, one might be tempted to simply put all the headers in a single top-level module. Unfortunately, this doesn't work because libc++ provides C compatibility headers (e.g. stdlib.h) which create cycles when the C Standard Library headers are modularized too. This is especially tricky since base systems are usually not modularized: as far as I know, only Xcode 16 beta contains a modularized SDK that makes this issue visible. To understand it, imagine we have the following setup: // in libc++'s include/c++/v1/module.modulemap module std { header stddef.h header stdlib.h } // in the C library's include/module.modulemap module clib { header stddef.h header stdlib.h } Now, imagine that the C library's <stdlib.h> includes <stddef.h>, perhaps as an implementation detail. When building the `std` module, libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C library's <stdlib.h>, so libc++ depends on the `clib` module. However, remember that the C library's <stdlib.h> header includes <stddef.h> as an implementation detail. Since the header search paths for libc++ are (and must be) before the search paths for the C library, the C library ends up including libc++'s <stddef.h>, which means it depends on the `std` module. That's a cycle. To solve this issue, this patch creates one top-level module for each C compatibility header. The rest of the libc++ headers are located in a single top-level `std` module, with two main exceptions. First, the module containing configuration headers (e.g. <__config>) has its own top-level module too, because those headers are included by the C compatibility headers. Second, we create a top-level std_core module that contains several dependency-free utilities used (directly or indirectly) from the __math subdirectory. This is needed because __math pulls in a bunch of stuff, and __math is used from the C compatibility header <math.h>. As a direct benefit of this change, we don't need to generate an artificial __std_clang_module header anymore to provide a monolithic `std` module, since our modulemap does it naturally by construction. A next step after this change would be to look into whether math.h really needs to include the contents of __math, and if so, whether libc++'s math.h truly needs to include the C library's math.h header. Removing either dependency would break this annoying cycle. Thanks to Eric Fiselier for pointing out this approach during a recent meeting. This wasn't viable before some recent refactoring, but wrapping everything (except the C headers) in a large module is by far the simplest and the most effective way of doing this. Fixes llvm#86193
…es (llvm#107638) This is a re-application of bc6bd3b which was reverted in f11abac because it broke the Clang pre-commit CI. Original commit message: This patch rewrites the modulemap to have fewer top-level modules. Previously, our modulemap had one top level module for each header in the library, including private headers. This had the well-known problem of making compilation times terrible, in addition to being somewhat against the design principles of Clang modules. This patch provides almost an order of magnitude compilation time improvement when building modularized code (certainly subject to variations). For example, including <ccomplex> without a module cache went from 22.4 seconds to 1.6 seconds, a 14x improvement. To achieve this, one might be tempted to simply put all the headers in a single top-level module. Unfortunately, this doesn't work because libc++ provides C compatibility headers (e.g. stdlib.h) which create cycles when the C Standard Library headers are modularized too. This is especially tricky since base systems are usually not modularized: as far as I know, only Xcode 16 beta contains a modularized SDK that makes this issue visible. To understand it, imagine we have the following setup: // in libc++'s include/c++/v1/module.modulemap module std { header stddef.h header stdlib.h } // in the C library's include/module.modulemap module clib { header stddef.h header stdlib.h } Now, imagine that the C library's <stdlib.h> includes <stddef.h>, perhaps as an implementation detail. When building the `std` module, libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C library's <stdlib.h>, so libc++ depends on the `clib` module. However, remember that the C library's <stdlib.h> header includes <stddef.h> as an implementation detail. Since the header search paths for libc++ are (and must be) before the search paths for the C library, the C library ends up including libc++'s <stddef.h>, which means it depends on the `std` module. That's a cycle. To solve this issue, this patch creates one top-level module for each C compatibility header. The rest of the libc++ headers are located in a single top-level `std` module, with two main exceptions. First, the module containing configuration headers (e.g. <__config>) has its own top-level module too, because those headers are included by the C compatibility headers. Second, we create a top-level std_core module that contains several dependency-free utilities used (directly or indirectly) from the __math subdirectory. This is needed because __math pulls in a bunch of stuff, and __math is used from the C compatibility header <math.h>. As a direct benefit of this change, we don't need to generate an artificial __std_clang_module header anymore to provide a monolithic `std` module, since our modulemap does it naturally by construction. A next step after this change would be to look into whether math.h really needs to include the contents of __math, and if so, whether libc++'s math.h truly needs to include the C library's math.h header. Removing either dependency would break this annoying cycle. Thanks to Eric Fiselier for pointing out this approach during a recent meeting. This wasn't viable before some recent refactoring, but wrapping everything (except the C headers) in a large module is by far the simplest and the most effective way of doing this. Fixes llvm#86193
…es (#110501) This is a re-application of bc6bd3b which was reverted in f11abac because it broke the Clang pre-commit CI. Original commit message: This patch rewrites the modulemap to have fewer top-level modules. Previously, our modulemap had one top level module for each header in the library, including private headers. This had the well-known problem of making compilation times terrible, in addition to being somewhat against the design principles of Clang modules. This patch provides almost an order of magnitude compilation time improvement when building modularized code (certainly subject to variations). For example, including <ccomplex> without a module cache went from 22.4 seconds to 1.6 seconds, a 14x improvement. To achieve this, one might be tempted to simply put all the headers in a single top-level module. Unfortunately, this doesn't work because libc++ provides C compatibility headers (e.g. stdlib.h) which create cycles when the C Standard Library headers are modularized too. This is especially tricky since base systems are usually not modularized: as far as I know, only Xcode 16 beta contains a modularized SDK that makes this issue visible. To understand it, imagine we have the following setup: // in libc++'s include/c++/v1/module.modulemap module std { header stddef.h header stdlib.h } // in the C library's include/module.modulemap module clib { header stddef.h header stdlib.h } Now, imagine that the C library's <stdlib.h> includes <stddef.h>, perhaps as an implementation detail. When building the `std` module, libc++'s <stdlib.h> header does `#include_next <stdlib.h>` to get the C library's <stdlib.h>, so libc++ depends on the `clib` module. However, remember that the C library's <stdlib.h> header includes <stddef.h> as an implementation detail. Since the header search paths for libc++ are (and must be) before the search paths for the C library, the C library ends up including libc++'s <stddef.h>, which means it depends on the `std` module. That's a cycle. To solve this issue, this patch creates one top-level module for each C compatibility header. The rest of the libc++ headers are located in a single top-level `std` module, with two main exceptions. First, the module containing configuration headers (e.g. <__config>) has its own top-level module too, because those headers are included by the C compatibility headers. Second, we create a top-level std_core module that contains several dependency-free utilities used (directly or indirectly) from the __math subdirectory. This is needed because __math pulls in a bunch of stuff, and __math is used from the C compatibility header <math.h>. As a direct benefit of this change, we don't need to generate an artificial __std_clang_module header anymore to provide a monolithic `std` module, since our modulemap does it naturally by construction. A next step after this change would be to look into whether math.h really needs to include the contents of __math, and if so, whether libc++'s math.h truly needs to include the C library's math.h header. Removing either dependency would break this annoying cycle. Thanks to Eric Fiselier for pointing out this approach during a recent meeting. This wasn't viable before some recent refactoring, but wrapping everything (except the C headers) in a large module is by far the simplest and the most effective way of doing this. Fixes #86193
I would like to reopen this issue because, as explained in #124893 every release keeps breaking things such as root-project/root#16494 |
Let's please centralize the discussion in a single place. This issue was about top-level modules for each libc++ header, and we've moved away from this now. Just as a matter of keeping our issues list tidy, I would like to keep this closed and I am happy to keep discussing the issues you're running into in #124893. |
Commit 571178a introduces per header file module with a main motivation to enable detection of cyclic includes. This means that now, instead of having a single binary artifact
std.pcm
or similar we get hundreds of pcm files for the same content. That's problematic for us due to several reasons:pcm
file and in all dependentpcm
files too.If the motivation is only to be able to diagnose cyclic includes we can do that as part of the build process of libc++. For example,
echo "#include <libcxx/header>" | clang -xc++ -fsyntax-only -
will be a sufficient check ensuring header consistency. Alternatively, we could have a "debug" modulemap with that structure.If there is more motivation to the one I understood, I'd like to suggest having a separate modulemap, called
std.modulemap
which keeps the headers into a single pcm file and clients like us can use it instead. Note that themodule std [system] { header "__std_clang_module" export * }
does not work for us because it triggers building of all gazillion pcms.@ian-twilightcoder, @ldionne, depending on the direction we get in this issue, I can work on creating the
std.modulemap
and opening a pull request.cc: @hahnjo, @dpiparo, @ChuanqiXu9.
The text was updated successfully, but these errors were encountered: