Skip to content

[C++20] [Modules] We didn't implement lookup in module level actually #90154

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
Tracked by #112295
ChuanqiXu9 opened this issue Apr 26, 2024 · 7 comments · Fixed by #122887 or #123281
Closed
Tracked by #112295

[C++20] [Modules] We didn't implement lookup in module level actually #90154

ChuanqiXu9 opened this issue Apr 26, 2024 · 7 comments · Fixed by #122887 or #123281
Labels
clang:modules C++20 modules and Clang Header Modules

Comments

@ChuanqiXu9
Copy link
Member

Minimal reproducer:

// a.cppm
export module a;
namespace a { int a = 43; }

// use.cc
import a;
namespace a {
    double a = 43.0
}

We should accept the example since the a::a in module a is not visible from use.cc and also the a::a in module a is a different entity with a::a in use.cc so they don't violate the ODR rule.

The root reason is that how do we handle the module's visibility in clang. Now in clang, the process to judge the visibility of declarations is:

lookup  a name -> lookup the name in **all** imported modules -> judge if all the found declarations are valid (ODR Check, etc...) -> check the visibility of found declarations

Then the reason about the failure is clear. We check the visibility after we check the validness of found declarations. Then it booms out in the checking validness phase. So the visibility doesn't get involved here.

The reason why we implement it as this is that:

  • We can only get the visibility of a declaration after deserializing it. So we must check the visibility after deserializing it.
  • If we deserialize something, we'd better to check its validness immediately. Otherwise we may be in unsafe state.

To solve the first problem, we need to be able to get the module file of a declaration before deserializing it. This requires us to change the encodings of serialized declaration ID. See #86912 for example.

For the second problem, maybe we have to refactor the current design of declarations to make a::a@a to be a well-defined different entity with a::a in the global module.

@ChuanqiXu9 ChuanqiXu9 added the clang:modules C++20 modules and Clang Header Modules label Apr 26, 2024
@llvmbot
Copy link
Member

llvmbot commented Apr 26, 2024

@llvm/issue-subscribers-clang-modules

Author: Chuanqi Xu (ChuanqiXu9)

Minimal reproducer:
// a.cppm
export module a;
namespace a { int a = 43; }

// use.cc
import a;
namespace a {
    double a = 43.0
}

We should accept the example since the a::a in module a is not visible from use.cc and also the a::a in module a is a different entity with a::a in use.cc so they don't violate the ODR rule.

The root reason is that how do we handle the module's visibility in clang. Now in clang, the process to judge the visibility of declarations is:

lookup  a name -> lookup the name in **all** imported modules -> judge if all the found declarations are valid (ODR Check, etc...) -> check the visibility of found declarations

Then the reason about the failure is clear. We check the visibility after we check the validness of found declarations. Then it booms out in the checking validness phase. So the visibility doesn't get involved here.

The reason why we implement it as this is that:

  • We can only get the visibility of a declaration after deserializing it. So we must check the visibility after deserializing it.
  • If we deserialize something, we'd better to check its validness immediately. Otherwise we may be in unsafe state.

To solve the first problem, we need to be able to get the module file of a declaration before deserializing it. This requires us to change the encodings of serialized declaration ID. See #86912 for example.

For the second problem, maybe we have to refactor the current design of declarations to make a::a@<!-- -->a to be a well-defined different entity with a::a in the global module.

@ChuanqiXu9
Copy link
Member Author

Oh, I need to recheck the wording to make sure if the reproducer is valid. Since it looks like it violates the rules in https://eel.is/c++draft/basic.def.odr#15.3:

Each such definition shall not be attached to a named module

@ChuanqiXu9
Copy link
Member Author

After identified, the reproducer should be accepted. According to https://eel.is/c++draft/basic.pre#10 and https://eel.is/c++draft/basic.link#2, we judge the same definable item by the name, and these name have different linkages, so https://eel.is/c++draft/basic.def.odr#15.3 may not fall in here.

@ChuanqiXu9
Copy link
Member Author

One possible implementation: we should divide a name lookup table for reachable but not visible names than the current lookup table. And due to we've already encoded the module file information in the Decl ID, so we can know whether a Decl belongs to which declaration before deserializing.

So the lookup implementation will become to:

the current process lookup + the module local lookup.

@ChuanqiXu9
Copy link
Member Author

It still looks tricky since there are declaration contexts across multiple modules (e.g, namespace).

@ChuanqiXu9
Copy link
Member Author

One possible implementation: we should divide a name lookup table for reachable but not visible names than the current lookup table. And due to we've already encoded the module file information in the Decl ID, so we can know whether a Decl belongs to which declaration before deserializing.

So the lookup implementation will become to:

the current process lookup + the module local lookup.

And this can't implement it 100% percently. Since it just tried to not load the declaration from other modules. But it doesn't handle the case that both the names are loaded.

After all, we need to face the case that these two names are loaded.

@ChuanqiXu9
Copy link
Member Author

The fundamental reason for this is that: clang didn't attach linkages to names but attach linkages to declarations. But in the spec, the linkages are attached to the names. So here is the root of the mismatch.

One fundamental solution is to refactor DeclarationName to make it tied with modules. This may be pretty foundamental.

The other one is keep the current implementation direction to mimic the standard behavior.

davidstone added a commit to davidstone/operators that referenced this issue Aug 15, 2024
ChuanqiXu9 added a commit to ChuanqiXu9/llvm-project that referenced this issue Jan 14, 2025
Close llvm#90154

This patch is also an optimization to the lookup process to utilize the
information provided by `export` keyword.

Previously, in the lookup process, the `export` keyword only takes part
in the check part, it doesn't get involved in the lookup process. That
said, previously, in a name lookup for 'name', we would load all of
declarations with the name 'name' and check if these declarations are
valid or not. It works well. But it is inefficient since it may load
declarations that may not be wanted.

Note that this patch actually did a trick in the lookup process instead
of bring module information to DeclarationName or considering module
information when deciding if two declarations are the same. So it may
not be a surprise to me if there are missing cases. But it is not a
regression. It should be already the case. Issue reports are welcomed.

In this patch, I tried to split the big lookup table into a lookup table
as before and a module local lookup table, which takes a combination of
the ID of the DeclContext and hash value of the primary module name as the key.
And refactored `DeclContext::lookup()` method to take the module
information. So that a lookup in a DeclContext won't load declarations
that are local to **other** modules.

And also I think it is already beneficial to split the big lookup table
since it may reduce the conflicts during lookups in the hash table.

BTW, this patch introduced a **regression** for a reachability rule in
C++20 but it was false-negative. See
'clang/test/CXX/module/module.interface/p7.cpp' for details.

This patch is not expected to introduce any other
regressions for non-c++20-modules users since the module local lookup
table should be empty for them.
ChuanqiXu9 added a commit to ChuanqiXu9/llvm-project that referenced this issue Jan 14, 2025
Close llvm#90154

This patch is also an optimization to the lookup process to utilize the
information provided by `export` keyword.

Previously, in the lookup process, the `export` keyword only takes part
in the check part, it doesn't get involved in the lookup process. That
said, previously, in a name lookup for 'name', we would load all of
declarations with the name 'name' and check if these declarations are
valid or not. It works well. But it is inefficient since it may load
declarations that may not be wanted.

Note that this patch actually did a trick in the lookup process instead
of bring module information to DeclarationName or considering module
information when deciding if two declarations are the same. So it may
not be a surprise to me if there are missing cases. But it is not a
regression. It should be already the case. Issue reports are welcomed.

In this patch, I tried to split the big lookup table into a lookup table
as before and a module local lookup table, which takes a combination of
the ID of the DeclContext and hash value of the primary module name as the key.
And refactored `DeclContext::lookup()` method to take the module
information. So that a lookup in a DeclContext won't load declarations
that are local to **other** modules.

And also I think it is already beneficial to split the big lookup table
since it may reduce the conflicts during lookups in the hash table.

BTW, this patch introduced a **regression** for a reachability rule in
C++20 but it was false-negative. See
'clang/test/CXX/module/module.interface/p7.cpp' for details.

This patch is not expected to introduce any other
regressions for non-c++20-modules users since the module local lookup
table should be empty for them.
ChuanqiXu9 added a commit to ChuanqiXu9/llvm-project that referenced this issue Jan 15, 2025
Close llvm#90154

This patch is also an optimization to the lookup process to utilize the
information provided by `export` keyword.

Previously, in the lookup process, the `export` keyword only takes part
in the check part, it doesn't get involved in the lookup process. That
said, previously, in a name lookup for 'name', we would load all of
declarations with the name 'name' and check if these declarations are
valid or not. It works well. But it is inefficient since it may load
declarations that may not be wanted.

Note that this patch actually did a trick in the lookup process instead
of bring module information to DeclarationName or considering module
information when deciding if two declarations are the same. So it may
not be a surprise to me if there are missing cases. But it is not a
regression. It should be already the case. Issue reports are welcomed.

In this patch, I tried to split the big lookup table into a lookup table
as before and a module local lookup table, which takes a combination of
the ID of the DeclContext and hash value of the primary module name as the key.
And refactored `DeclContext::lookup()` method to take the module
information. So that a lookup in a DeclContext won't load declarations
that are local to **other** modules.

And also I think it is already beneficial to split the big lookup table
since it may reduce the conflicts during lookups in the hash table.

BTW, this patch introduced a **regression** for a reachability rule in
C++20 but it was false-negative. See
'clang/test/CXX/module/module.interface/p7.cpp' for details.

This patch is not expected to introduce any other
regressions for non-c++20-modules users since the module local lookup
table should be empty for them.
ChuanqiXu9 added a commit to ChuanqiXu9/llvm-project that referenced this issue Jan 15, 2025
Close llvm#90154

This patch is also an optimization to the lookup process to utilize the
information provided by `export` keyword.

Previously, in the lookup process, the `export` keyword only takes part
in the check part, it doesn't get involved in the lookup process. That
said, previously, in a name lookup for 'name', we would load all of
declarations with the name 'name' and check if these declarations are
valid or not. It works well. But it is inefficient since it may load
declarations that may not be wanted.

Note that this patch actually did a trick in the lookup process instead
of bring module information to DeclarationName or considering module
information when deciding if two declarations are the same. So it may
not be a surprise to me if there are missing cases. But it is not a
regression. It should be already the case. Issue reports are welcomed.

In this patch, I tried to split the big lookup table into a lookup table
as before and a module local lookup table, which takes a combination of
the ID of the DeclContext and hash value of the primary module name as the key.
And refactored `DeclContext::lookup()` method to take the module
information. So that a lookup in a DeclContext won't load declarations
that are local to **other** modules.

And also I think it is already beneficial to split the big lookup table
since it may reduce the conflicts during lookups in the hash table.

BTW, this patch introduced a **regression** for a reachability rule in
C++20 but it was false-negative. See
'clang/test/CXX/module/module.interface/p7.cpp' for details.

This patch is not expected to introduce any other
regressions for non-c++20-modules users since the module local lookup
table should be empty for them.
ChuanqiXu9 added a commit that referenced this issue Jan 15, 2025
Close #90154

This patch is also an optimization to the lookup process to utilize the
information provided by `export` keyword.

Previously, in the lookup process, the `export` keyword only takes part
in the check part, it doesn't get involved in the lookup process. That
said, previously, in a name lookup for 'name', we would load all of
declarations with the name 'name' and check if these declarations are
valid or not. It works well. But it is inefficient since it may load
declarations that may not be wanted.

Note that this patch actually did a trick in the lookup process instead
of bring module information to DeclarationName or considering module
information when deciding if two declarations are the same. So it may
not be a surprise to me if there are missing cases. But it is not a
regression. It should be already the case. Issue reports are welcomed.

In this patch, I tried to split the big lookup table into a lookup table
as before and a module local lookup table, which takes a combination of
the ID of the DeclContext and hash value of the primary module name as
the key. And refactored `DeclContext::lookup()` method to take the
module information. So that a lookup in a DeclContext won't load
declarations that are local to **other** modules.

And also I think it is already beneficial to split the big lookup table
since it may reduce the conflicts during lookups in the hash table.

BTW, this patch introduced a **regression** for a reachability rule in
C++20 but it was false-negative. See
'clang/test/CXX/module/module.interface/p7.cpp' for details.

This patch is not expected to introduce any other
regressions for non-c++20-modules users since the module local lookup
table should be empty for them.

---

On the API side, this patch unfortunately add a maybe-confusing argument
`Module *NamedModule` to
`ExternalASTSource::FindExternalVisibleDeclsByName()`. People may think
we can get the information from the first argument `const DeclContext
*DC`. But sadly there are declarations (e.g., namespace) can appear in
multiple different modules as a single declaration. So we have to add
additional information to indicate this.
github-actions bot pushed a commit to arm/arm-toolchain that referenced this issue Jan 15, 2025
Close llvm/llvm-project#90154

This patch is also an optimization to the lookup process to utilize the
information provided by `export` keyword.

Previously, in the lookup process, the `export` keyword only takes part
in the check part, it doesn't get involved in the lookup process. That
said, previously, in a name lookup for 'name', we would load all of
declarations with the name 'name' and check if these declarations are
valid or not. It works well. But it is inefficient since it may load
declarations that may not be wanted.

Note that this patch actually did a trick in the lookup process instead
of bring module information to DeclarationName or considering module
information when deciding if two declarations are the same. So it may
not be a surprise to me if there are missing cases. But it is not a
regression. It should be already the case. Issue reports are welcomed.

In this patch, I tried to split the big lookup table into a lookup table
as before and a module local lookup table, which takes a combination of
the ID of the DeclContext and hash value of the primary module name as
the key. And refactored `DeclContext::lookup()` method to take the
module information. So that a lookup in a DeclContext won't load
declarations that are local to **other** modules.

And also I think it is already beneficial to split the big lookup table
since it may reduce the conflicts during lookups in the hash table.

BTW, this patch introduced a **regression** for a reachability rule in
C++20 but it was false-negative. See
'clang/test/CXX/module/module.interface/p7.cpp' for details.

This patch is not expected to introduce any other
regressions for non-c++20-modules users since the module local lookup
table should be empty for them.

---

On the API side, this patch unfortunately add a maybe-confusing argument
`Module *NamedModule` to
`ExternalASTSource::FindExternalVisibleDeclsByName()`. People may think
we can get the information from the first argument `const DeclContext
*DC`. But sadly there are declarations (e.g., namespace) can appear in
multiple different modules as a single declaration. So we have to add
additional information to indicate this.
ChuanqiXu9 added a commit to ChuanqiXu9/llvm-project that referenced this issue Jan 17, 2025
Close llvm#90154

This patch is also an optimization to the lookup process to utilize the
information provided by `export` keyword.

Previously, in the lookup process, the `export` keyword only takes part
in the check part, it doesn't get involved in the lookup process. That
said, previously, in a name lookup for 'name', we would load all of
declarations with the name 'name' and check if these declarations are
valid or not. It works well. But it is inefficient since it may load
declarations that may not be wanted.

Note that this patch actually did a trick in the lookup process instead
of bring module information to DeclarationName or considering module
information when deciding if two declarations are the same. So it may
not be a surprise to me if there are missing cases. But it is not a
regression. It should be already the case. Issue reports are welcomed.

In this patch, I tried to split the big lookup table into a lookup table
as before and a module local lookup table, which takes a combination of
the ID of the DeclContext and hash value of the primary module name as
the key. And refactored `DeclContext::lookup()` method to take the
module information. So that a lookup in a DeclContext won't load
declarations that are local to **other** modules.

And also I think it is already beneficial to split the big lookup table
since it may reduce the conflicts during lookups in the hash table.

BTW, this patch introduced a **regression** for a reachability rule in
C++20 but it was false-negative. See
'clang/test/CXX/module/module.interface/p7.cpp' for details.

This patch is not expected to introduce any other
regressions for non-c++20-modules users since the module local lookup
table should be empty for them.

---

On the API side, this patch unfortunately add a maybe-confusing argument
`Module *NamedModule` to
`ExternalASTSource::FindExternalVisibleDeclsByName()`. People may think
we can get the information from the first argument `const DeclContext
*DC`. But sadly there are declarations (e.g., namespace) can appear in
multiple different modules as a single declaration. So we have to add
additional information to indicate this.
ChuanqiXu9 added a commit that referenced this issue Jan 17, 2025
Close #90154

This patch is also an optimization to the lookup process to utilize the
information provided by `export` keyword.

Previously, in the lookup process, the `export` keyword only takes part
in the check part, it doesn't get involved in the lookup process. That
said, previously, in a name lookup for 'name', we would load all of
declarations with the name 'name' and check if these declarations are
valid or not. It works well. But it is inefficient since it may load
declarations that may not be wanted.

Note that this patch actually did a trick in the lookup process instead
of bring module information to DeclarationName or considering module
information when deciding if two declarations are the same. So it may
not be a surprise to me if there are missing cases. But it is not a
regression. It should be already the case. Issue reports are welcomed.

In this patch, I tried to split the big lookup table into a lookup table
as before and a module local lookup table, which takes a combination of
the ID of the DeclContext and hash value of the primary module name as
the key. And refactored `DeclContext::lookup()` method to take the
module information. So that a lookup in a DeclContext won't load
declarations that are local to **other** modules.

And also I think it is already beneficial to split the big lookup table
since it may reduce the conflicts during lookups in the hash table.

BTW, this patch introduced a **regression** for a reachability rule in
C++20 but it was false-negative. See
'clang/test/CXX/module/module.interface/p7.cpp' for details.

This patch is not expected to introduce any other
regressions for non-c++20-modules users since the module local lookup
table should be empty for them.
github-actions bot pushed a commit to arm/arm-toolchain that referenced this issue Jan 17, 2025
…123281)

Close llvm/llvm-project#90154

This patch is also an optimization to the lookup process to utilize the
information provided by `export` keyword.

Previously, in the lookup process, the `export` keyword only takes part
in the check part, it doesn't get involved in the lookup process. That
said, previously, in a name lookup for 'name', we would load all of
declarations with the name 'name' and check if these declarations are
valid or not. It works well. But it is inefficient since it may load
declarations that may not be wanted.

Note that this patch actually did a trick in the lookup process instead
of bring module information to DeclarationName or considering module
information when deciding if two declarations are the same. So it may
not be a surprise to me if there are missing cases. But it is not a
regression. It should be already the case. Issue reports are welcomed.

In this patch, I tried to split the big lookup table into a lookup table
as before and a module local lookup table, which takes a combination of
the ID of the DeclContext and hash value of the primary module name as
the key. And refactored `DeclContext::lookup()` method to take the
module information. So that a lookup in a DeclContext won't load
declarations that are local to **other** modules.

And also I think it is already beneficial to split the big lookup table
since it may reduce the conflicts during lookups in the hash table.

BTW, this patch introduced a **regression** for a reachability rule in
C++20 but it was false-negative. See
'clang/test/CXX/module/module.interface/p7.cpp' for details.

This patch is not expected to introduce any other
regressions for non-c++20-modules users since the module local lookup
table should be empty for them.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:modules C++20 modules and Clang Header Modules
Projects
None yet
2 participants