Continue fixing recent regression in clang modules support in libc++ #124893

Open
vgvassilev opened this issue Jan 29, 2025 · 3 comments
Labels: clang:modules (C++20 modules and Clang Header Modules), libc++ (libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi.)

Comments

@vgvassilev
Contributor

For years, libc++ had a single top-level module for std, and it worked flawlessly.

Now libc++ has decided to move to one module per header file, which generates hundreds of modules. The reason is unclear to me, but some contributors mentioned that using no_undeclared_includes to resolve issues with non-modular headers was a workaround. I humbly disagreed with that statement, because it is a feature designed for exactly such setups. The downside of this move is that we penalize downstream clients, whose workflows now break with every release (see root-project/root#16494).

We have made some progress in going back to where we were years ago with #86193 and a few related issues. As a matter of fact ldionne@6bd36bf reports significant performance improvements:

This patch provides almost an order of magnitude compilation time improvement when building modularized code (certainly subject to variations). For example, including <ccomplex> without a module cache went from 22.4 seconds to 1.6 seconds, a 14x improvement.

My suggestion was (and still is) to move back to a single module consisting of multiple submodules. There is no fundamental reason why we could not do so. Unfortunately, every discussion so far has concluded that it is not possible, without elaborating or sharing a reproducing example where we could help with our clang modules expertise.

This is a critical problem for the entire community. We are stuck, and I do not understand why.

cc: @zygoloid, @Bigcheese, @ldionne, @ChuanqiXu9.

@vgvassilev added the clang:modules and libc++ labels Jan 29, 2025
@llvmbot
Member

llvmbot commented Jan 29, 2025

@llvm/issue-subscribers-clang-modules

Author: Vassil Vassilev (vgvassilev)


@ldionne
Member

ldionne commented Jan 29, 2025

The reason why a single monolithic module doesn't work is that it creates a cycle with the underlying system headers when those are modularized as well. For example:

// in libc++
module std {
  header "stdint.h"
  header "stddef.h"
  // etc...
}

// in the platform SDK
module Darwin {
  header "stdint.h"
  header "stdlib.h"
  // etc...
}

Given this, including <stdint.h> will find SDKROOT/usr/include/c++/v1/stdint.h since it's first on the search paths, and it will try building module std. However, libc++'s stdint.h contains #include_next <stdint.h>, which goes to the underlying platform's stdint.h header. Thus, while trying to build the std module, we'll start building the Darwin module for the underlying platform's stdint.h header.

However, the platform's underlying stdint.h header contains #include <stddef.h> (it has the right to do that as it sees fit, as an implementation detail). But that #include statement resolves back to SDKROOT/usr/include/c++/v1/stddef.h since it appears before SDKROOT/usr/include/stddef.h in the search paths. Therefore, the compiler tries building the module that SDKROOT/usr/include/c++/v1/stddef.h is a part of, and that's the std module we were already trying to build.

In other words, a cycle c++/v1/stdint.h => /stdint.h => c++/v1/stddef.h is created.

The headers in this example are fictional, but you get the point. This has been explained before at least a few times and I don't understand what's unclear or mysterious about this explanation. I also don't understand why libc++'s current modulemap definition is causing breakage at every release for you. If you can explain that in simple terms, that's definitely something we can try addressing assuming your use-case is valid. In the PR you linked, I see that Cling defines its own interpreter/cling/include/cling/std_darwin.modulemap modulemap and I don't want to be too quick to judge, but that seems like exactly the kind of stuff we can't reasonably support.

@vgvassilev
Contributor Author


> The headers in this example are fictional, but you get the point. This has been explained before at least a few times and I don't understand what's unclear or mysterious about this explanation.

Well, that is part of the misunderstanding. In this fictional example, tagging the definition of module Darwin with [no_undeclared_includes] should fix the described problem. This was explained when the feature was introduced a long time ago in ed84df0, where the test case is very close to your example. That is why I was asking for a concrete reproducing example, to see whether that feature falls short somewhere so that we can improve it. Right now we are speculating, as I cannot get to a reproducer for the problem you intend to fix...
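
As a minimal sketch of that suggestion, assuming the fictional Darwin module from the example earlier in this thread (the header list is copied from that example, not from a real SDK), the tagged definition might look like this:

```
// in the platform SDK (fictional, mirroring the example above)
// Assumption: the only change is the added attribute.
module Darwin [no_undeclared_includes] {
  header "stdint.h"
  header "stdlib.h"
  // etc...
}
```

Per the Clang module map documentation, a module carrying this attribute can only reach non-modular headers and headers from modules it names in a use declaration. Since Darwin does not declare a use of std here, the #include <stddef.h> inside the platform's stdint.h would be expected to skip libc++'s stddef.h and continue to the SDK's own copy. Whether that holds for the real headers is exactly what a reproducer would show.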

> I also don't understand why libc++'s current modulemap definition is causing breakage at every release for you. If you can explain that in simple terms, that's definitely something we can try addressing assuming your use-case is valid. In the PR you linked, I see that Cling defines its own interpreter/cling/include/cling/std_darwin.modulemap modulemap and I don't want to be too quick to judge, but that seems like exactly the kind of stuff we can't reasonably support.

Can you elaborate on "we can't reasonably support"?

We had to introduce this separate modulemap just to bring back the single module file setup, as we redistribute the module files. That is basically something we need to patch with each SDK you guys release. Part of the reason is that the modulemap contains private files that later disappear, and so on...

I think our use-case request is pretty simple: move all libc++-related headers into a single top-level module consisting of multiple submodules. The include_next cycles can be handled with [no_undeclared_includes]. Other cycles can be handled by re-exporting the relevant header or by making the header file modular. If there are doubts, I can help: as part of our modularization efforts we made boost.pcm work, and that is far more complicated than getting something like libstdc++ or libc++ and Darwin to work, I'd think...
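
A minimal sketch of that request, using the two fictional headers from the example earlier in this thread. The submodule names are hypothetical and this is not libc++'s actual modulemap:

```
// in libc++ (hypothetical layout; submodule names are illustrative only)
module std {
  module stdint_h {
    header "stdint.h"
    export *
  }
  module stddef_h {
    header "stddef.h"
    export *
  }
  // etc...
}
```

With this shape each header still gets its own submodule, but everything is built into one module file for the top-level std module, which is the single module file setup we redistribute.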
