
[Doc] Update documentation for no-transitive-change #96453

Merged
merged 5 commits into from
Jul 19, 2024
7 changes: 7 additions & 0 deletions clang/docs/ReleaseNotes.rst
@@ -157,6 +157,13 @@ here. Generic improvements to Clang as a whole or to its underlying
infrastructure are described first, followed by language-specific
sections with improvements to Clang's support for those languages.

- Implemented improvements to BMIs for C++20 Modules that can reduce
  the number of rebuilds during incremental recompilation. We are seeking
  feedback from build system authors and other interested users, especially
  when you feel Clang changes the BMI and misses an opportunity to avoid
  recompilations or causes correctness issues. See
  `StandardCPlusPlusModules <StandardCPlusPlusModules.html>`_ for more details.

- The ``\par`` documentation comment command now supports an optional
argument, which denotes the header of the paragraph started by
an instance of the ``\par`` command comment. The implementation
143 changes: 143 additions & 0 deletions clang/docs/StandardCPlusPlusModules.rst
@@ -652,6 +652,149 @@ in the future. The expected roadmap for Reduced BMIs as of Clang 19.x is:
comes, the term BMI will refer to the Reduced BMI and the Full BMI will only
be meaningful to build systems which elect to support two-phase compilation.

Experimental Non-Cascading Changes
----------------------------------

This section is primarily for build system vendors. For end compiler users,
Contributor

This may have been discussed somewhere that I missed, please point me to that if that's the case.

I was putting a little more thought into it and am wondering what would be a way for the build systems to take advantage of that?

When compiling something that depends on modules, the build system will need to check whether any of the (transitive) dependencies of the modules changed, right? Because any BMI that was changed in the transitive set can potentially affect the result of the current compilation.

So even if direct dependencies did not change (because of this optimization), despite a change in the deeper transitive dependency, the build system still can't reuse the result and has to rerun the compile.

We only know that the output BMI wasn't affected by any of the transitive changes after we finish compiling it and can compare the outputs, right?
But the build system needs to know about it before the compilation happens to avoid recompilations.

I suspect I'm missing something, but don't know what...

Member Author

@ChuanqiXu9 ChuanqiXu9 Jun 27, 2024

> This may have been discussed somewhere that I missed, please point me to that if that's the case.
>
> I was putting a little more thought into it and am wondering what would be a way for the build systems to take advantage of that?
>
> When compiling something that depends on modules, the build system will need to check whether any of the (transitive) dependencies of the modules changed, right? Because any BMI that was changed in the transitive set can potentially affect the result of the current compilation.
>
> So even if direct dependencies did not change (because of this optimization), despite a change in the deeper transitive dependency, the build system still can't reuse the result and has to rerun the compile.

Yes, and this is the reason why it is experimental.

> We only know that the output BMI wasn't affected by any of the transitive changes after we finish compiling it and can compare the outputs, right? But the build system needs to know about it before the compilation happens to avoid recompilations.
>
> I suspect I'm missing something, but don't know what...

It is not about avoiding the recompilation of that unchanged BMI, but of other TUs that depend only on the unchanged BMI. For example,

    // a.cppm
    export module a;
    // intentionally empty

    // b.cppm
    export module b;
    import a;
    // intentionally empty

    // c.cpp
    import b;
    ...

For this example, every time a.cppm changes, we need to recompile b.cppm. But if the BMI of module b doesn't change at all, it should be fine not to recompile c.cpp.

So this is the ability provided by the compiler: all the needed changes are propagated to the BMI:

    for (Module *M : TouchedTopLevelModules)
      Hasher.update(M->Signature);

Then the build system can try to ignore the changes from transitively imported modules.

The theory behind this is that a consumer can only access the entities in indirectly imported modules via the directly imported modules. So the directly imported modules have full control over which entities of the indirectly imported modules are accessible to their (the directly imported modules') users.


Maybe it is a good idea for the build system to provide a verify option that checks whether the skippable compilations would produce the same result if they were not skipped.

Contributor

That's what I thought is happening, thanks for confirming my intuition.
I think it'd be great to spell this out in the documentation, because I believe this mode of operation is something that the build system vendors might need to infer from the whole text right now.

Adding something like the following should do the trick:

We encourage build systems to add an experimental mode that
reuses the cached BMI when **direct** dependencies did not change,
even if **transitive** dependencies did change.

PS My 2 cents on Bazel: I suspect this actually goes against its design. Bazel has very strong capabilities to avoid recompilations when inputs don't change, but it relies on stable hashes for all inputs (including transitive dependencies).
At the same time I suspect that other build systems (CMake + ccache / sccache) are more flexible in that regard.

PPS I am not a build system expert, so take that with a grain of salt. I'm also happy to loop in Bazel folks I work with to confirm or rebut my claims about Bazel's design, if that's useful.

Member Author

Yeah, added suggested text.

For Bazel, I think it might be fine, since as far as I know whether or not to recompile a target should be configurable.

if you don't want to read it all, this is helpful to reduce recompilations.
We encourage build system vendors and end users to try this out and provide feedback.

Before Clang 19, a change in the BMI of any (transitive) dependency would cause
the output BMI to change. Starting with Clang 19, changes to non-direct
dependencies should not directly affect the output BMI, unless they affect the
results of the compilation. We expect that there are many more opportunities
for this optimization than we have currently realized and would appreciate
feedback about missed optimization opportunities. For example,

.. code-block:: c++

  // m-partA.cppm
  export module m:partA;

  // m-partB.cppm
  export module m:partB;
  export int getB() { return 44; }

  // m.cppm
  export module m;
  export import :partA;
  export import :partB;

  // useBOnly.cppm
  export module useBOnly;
  import m;
  export int B() {
    return getB();
  }

  // Use.cc
  import useBOnly;
  int get() {
    return B();
  }

To compile the project (some commands are omitted for brevity):

.. code-block:: console

  $ clang++ -std=c++20 m-partA.cppm --precompile -o m-partA.pcm
  $ clang++ -std=c++20 m-partB.cppm --precompile -o m-partB.pcm
  $ clang++ -std=c++20 m.cppm --precompile -o m.pcm -fprebuilt-module-path=.
  $ clang++ -std=c++20 useBOnly.cppm --precompile -o useBOnly.pcm -fprebuilt-module-path=.
  $ md5sum useBOnly.pcm
  07656bf4a6908626795729295f9608da  useBOnly.pcm

If the interface of ``m-partA.cppm`` is changed to:

.. code-block:: c++

  // m-partA.v1.cppm
  export module m:partA;
  export int getA() { return 43; }

and the BMI for ``useBOnly`` is recompiled as in:

.. code-block:: console

  $ clang++ -std=c++20 m-partA.cppm --precompile -o m-partA.pcm
  $ clang++ -std=c++20 m-partB.cppm --precompile -o m-partB.pcm
  $ clang++ -std=c++20 m.cppm --precompile -o m.pcm -fprebuilt-module-path=.
  $ clang++ -std=c++20 useBOnly.cppm --precompile -o useBOnly.pcm -fprebuilt-module-path=.
  $ md5sum useBOnly.pcm
  07656bf4a6908626795729295f9608da  useBOnly.pcm

then the contents of ``useBOnly.pcm`` remain unchanged.
Consequently, if the build system only bases recompilation decisions on directly imported modules,
it becomes possible to skip the recompilation of ``Use.cc``.
It should be fine because the altered interfaces do not affect ``Use.cc`` in any way;
the changes do not cascade.

When ``Clang`` generates a BMI, it records the hash values of all potentially contributory BMIs
for the BMI being produced. This ensures that build systems are not required to consider
transitively imported modules when deciding whether to recompile.

What is considered to be a potentially contributory BMI is currently unspecified.
However, it is a severe bug for a BMI to remain unchanged following an observable change
that affects its consumers.

Build systems may utilize this optimization by performing an update-if-changed operation:
the BMI that is consumed by importers is replaced with the BMI that is output by the
compiler only when their contents differ.
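
The update-if-changed operation can be sketched roughly as follows. This is only an illustration of the idea, not code from Clang or from any build system; the helper names ``sameContents`` and ``updateIfChanged`` are hypothetical, and the sketch assumes that the freshly compiled BMI is first written to a scratch path on the same filesystem as the published BMI.

```cpp
#include <algorithm>
#include <filesystem>
#include <fstream>
#include <iterator>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: true when both files exist and hold identical bytes.
inline bool sameContents(const fs::path &A, const fs::path &B) {
  std::error_code EC;
  if (!fs::exists(A, EC) || !fs::exists(B, EC))
    return false;
  if (fs::file_size(A, EC) != fs::file_size(B, EC))
    return false; // different sizes can never be byte-identical
  std::ifstream FA(A, std::ios::binary), FB(B, std::ios::binary);
  return std::equal(std::istreambuf_iterator<char>(FA),
                    std::istreambuf_iterator<char>(),
                    std::istreambuf_iterator<char>(FB),
                    std::istreambuf_iterator<char>());
}

// Hypothetical helper: publish `Fresh` (the BMI the compiler just wrote) as
// `Published` (the BMI importers consume) only when the bytes differ.
// Returns true if `Published` was replaced, i.e. direct importers are stale.
inline bool updateIfChanged(const fs::path &Fresh, const fs::path &Published) {
  if (sameContents(Fresh, Published)) {
    fs::remove(Fresh); // keep the old file (and its timestamp) untouched
    return false;
  }
  fs::rename(Fresh, Published); // replaces an existing regular file
  return true;
}
```

A timestamp-based build system could call ``updateIfChanged`` after each BMI compilation; when it returns ``false`` the published BMI keeps its old modification time, so importers are not considered out of date.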

We encourage build systems to add an experimental mode that
reuses the cached BMI when **direct** dependencies did not change,
even if **transitive** dependencies did change.

Given there are potential compiler bugs, we recommend that build systems
support this feature as a configurable option so that users
can go back to the transitive change mode safely at any time.
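
A configurable recompilation decision along these lines might look like the sketch below. It is a simplified model rather than the logic of any real build system: the names ``RebuildMode``, ``ModuleNode``, ``Graph`` and ``needsRecompile`` are hypothetical, and the ``BMIChanged`` flag is assumed to come from a byte-level comparison of the BMI before and after the rebuild.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical user-facing switch between the two behaviors.
enum class RebuildMode { Transitive, DirectOnly };

// Hypothetical per-module record kept by the build system.
struct ModuleNode {
  std::vector<std::string> DirectImports;
  bool BMIChanged = false; // did the BMI bytes change in this build?
};

using Graph = std::map<std::string, ModuleNode>;

inline bool anyChanged(const Graph &G, const std::vector<std::string> &Imports,
                       RebuildMode Mode, std::set<std::string> &Visited) {
  for (const std::string &Name : Imports) {
    if (!Visited.insert(Name).second)
      continue; // already examined
    auto It = G.find(Name);
    if (It == G.end())
      return true; // unknown module: be conservative and rebuild
    if (It->second.BMIChanged)
      return true;
    // Only the transitive mode looks past the directly imported modules.
    if (Mode == RebuildMode::Transitive &&
        anyChanged(G, It->second.DirectImports, Mode, Visited))
      return true;
  }
  return false;
}

// Decide whether a TU with the given direct imports must be recompiled.
inline bool needsRecompile(const Graph &G,
                           const std::vector<std::string> &DirectImports,
                           RebuildMode Mode) {
  std::set<std::string> Visited;
  return anyChanged(G, DirectImports, Mode, Visited);
}
```

Applied to the earlier example, with ``useBOnly`` marked unchanged and ``m:partA`` marked changed, ``needsRecompile`` for ``Use.cc`` returns ``false`` in ``DirectOnly`` mode and ``true`` in ``Transitive`` mode, which is the fallback users can select if they suspect a compiler bug.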

Interactions with Reduced BMI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With reduced BMI, non-cascading changes can be more powerful. For example,

.. code-block:: c++

  // A.cppm
  export module A;
  export int a() { return 44; }

  // B.cppm
  export module B;
  import A;
  export int b() { return a(); }

.. code-block:: console

  $ clang++ -std=c++20 A.cppm -c -fmodule-output=A.pcm  -fexperimental-modules-reduced-bmi -o A.o
  $ clang++ -std=c++20 B.cppm -c -fmodule-output=B.pcm  -fexperimental-modules-reduced-bmi -o B.o -fmodule-file=A=A.pcm
  $ md5sum B.pcm
  6c2bd452ca32ab418bf35cd141b060b9  B.pcm

Now change the implementation of ``A.cppm`` to:

.. code-block:: c++

  export module A;
  int a_impl() { return 99; }
  export int a() { return a_impl(); }

and recompile the example:

.. code-block:: console

  $ clang++ -std=c++20 A.cppm -c -fmodule-output=A.pcm  -fexperimental-modules-reduced-bmi -o A.o
  $ clang++ -std=c++20 B.cppm -c -fmodule-output=B.pcm  -fexperimental-modules-reduced-bmi -o B.o -fmodule-file=A=A.pcm
  $ md5sum B.pcm
  6c2bd452ca32ab418bf35cd141b060b9  B.pcm

We should find that the contents of ``B.pcm`` remain the same. In this case, the build system is
allowed to skip recompilations of TUs which solely and directly depend on module ``B``.

This only happens with a reduced BMI. With reduced BMIs, we won't record the function body
of ``int b()`` in the BMI for ``B``, so the module ``A`` doesn't contribute to the BMI of ``B``
and we have fewer dependencies.

Performance Tips
----------------