-
Notifications
You must be signed in to change notification settings - Fork 13.4k
Enable AST mutation in the constant evaluator #115168
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
You can test this locally with the following command:git-clang-format --diff 05b6c2e4b933e7a3606899c72067c92b6077287b 0b2d92e0fed41ee2428c3ef8b8369790a1279a21 --extensions h,cpp -- clang/test/SemaCXX/constexpr-function-instantiation.cpp clang/include/clang/AST/ASTContext.h clang/include/clang/Sema/Sema.h clang/lib/AST/ASTContext.cpp clang/lib/AST/ByteCode/ByteCodeEmitter.cpp clang/lib/AST/ByteCode/ByteCodeEmitter.h clang/lib/AST/ByteCode/Compiler.cpp clang/lib/AST/ByteCode/Compiler.h clang/lib/AST/ByteCode/Context.cpp clang/lib/AST/ByteCode/Context.h clang/lib/AST/ExprConstant.cpp clang/lib/Sema/Sema.cpp View the diff from clang-format here.diff --git a/clang/lib/AST/ASTContext.cpp b/clang/lib/AST/ASTContext.cpp
index c02f3b5389..5aa025fdc3 100644
--- a/clang/lib/AST/ASTContext.cpp
+++ b/clang/lib/AST/ASTContext.cpp
@@ -904,8 +904,8 @@ ASTContext::ASTContext(LangOptions &LOpts, SourceManager &SM,
LangOpts.XRayAttrListFiles, SM)),
ProfList(new ProfileList(LangOpts.ProfileListFiles, SM)),
PrintingPolicy(LOpts),
- SemaProxyPtr(std::make_unique<UnimplementedSemaProxy>()),
- Idents(idents), Selectors(sels), BuiltinInfo(builtins), TUKind(TUKind),
+ SemaProxyPtr(std::make_unique<UnimplementedSemaProxy>()), Idents(idents),
+ Selectors(sels), BuiltinInfo(builtins), TUKind(TUKind),
DeclarationNames(*this), Comments(SM),
CommentCommandTraits(BumpAlloc, LOpts.CommentOpts),
CompCategories(this_()), LastSDM(nullptr, 0) {
|
EvalASTMutator
interface
This PR is still in flux. Discussions are lengthy and typically happen in meetings. |
I added Sema callback into the new interpreter in 0b2d92e. I think it went smoothly, save for the fact that source locations are not always readily available there, but they are need to correctly emit diagnostics from Sema. I invite @tbaederr to provide early feedback on my changes to the new constant interpreter. |
If we do end up needing this (and it's looking increasingly likely that we will), I think the general approach of having a set of callbacks that gets passed into the constant evaluator is the right approach. I think the older approach in this PR (a callback object that is explicitly passed to each AST operation that needs it) is significantly preferable to the newer approach (caching the callback in the
If we're going to allow AST mutation from constant evaluation, we need very strong guard rails around it to make sure it doesn't happen accidentally. We're already passing in information to the evaluator for this distinction (to determine the value of |
@zygoloid I definitely don't disagree with your points. After extensive discussions with other maintainers (Shafik, Corentin, Erich, Aaron), I went from passing the callback explicitly to very implicit approach, because of reasons that lie outside of usage of AST in Clang itself, and revolve around tools like clang-tidy:
So the approach we took is like (3) above, but maxing out on hiding this new callback from the API. If constant evaluation is triggered without We also identified that issues like #73232 can leverage this new callback. To my understanding, this was one of the reasons why idea to limit calls to callback to C++26 and newer language modes didn't have traction among maintainers. I think there's also understanding among maintainers that this PR is strictly worse that status quo whichever approach is taken. No one, including me, enjoys storing a callback in effectively a global variable ( CC @AaronBallman in case I forgot something or misrepresented your opinion on situation about external users of Clang. |
Can you expand on the "to invoke only once" bit? As I understand the position of authors of P2996, calls to |
External callers of the
I don't think so, because callers that don't have access to
I'm not following the argument here. If the default is that you don't get AST mutation, then it seems to me that you would need to explicitly and visibly opt into getting AST mutation by passing the callbacks to the evaluator. |
I think it's short-sighted to imagine that will work for the breadth of compile-time constructs we'll end up providing. People have already asked for things like a consteval function that produces a warning or an error -- is that also going to be idempotent? What about consteval facilities that generate random numbers, or produce a log file on the build machine? What about a consteval utility that generates a new anonymous global? What do you do if you want each call to your "make a new type" facility to guarantee to produce a new, distinct type? Maybe we can say no to some of those things, but I don't think we'll be able to say no to all of them forever. It seems more forward-looking to me to instead guarantee that we will evaluate exactly the things we're required to evaluate in this new mode, exactly once each. |
Consider the following example: struct S;
consteval int f() {
define_aggregate(^^S, {}); // we say this function is not idempotent,
// so we want guarantees that it can't be evaluated more than once
write_char_to_file("output.txt", 'a'); // presumably we want this to evaluated as many times as we get here
return 0;
}
constexpr int a = f();
constexpr int b = f(); I think there's a contradiction between what I'm collecting material to present to WG21 (and for my own understanding, too), because I don't feel we have a strong position there. |
This behavior of As an alternative to consteval int f() {
if (!is_defined(^^S)) {
define_aggregate(^^S, {});
}
write_char_to_file("output.txt", 'a');
return 0;
} seems like it could be workable if that's the behavior you want. (Though that doesn't have the same "check the definition is the same each time" property.) Note that idempotency is not composable if you can observe whether the idempotent action has already happened. For example: struct A;
struct B;
consteval void f() {
if (!is_defined(^^A)) { define_aggregate(^^A, {}); }
else { define_aggregate(^^B, {}); }
} Here, |
Hey Richard - We added idempotency to @Endilll re: the struct S;
consteval int f() {
define_aggregate(^^S, {});
write_char_to_file("output.txt", 'a');
return 0;
}
constexpr int a = f(); // <- this is a plainly constant-evaluated expression.
constexpr int b = f(); // <- so is this one. There is nothing about a call to size_t runtime_fn() {
struct S;
return size_of(define_aggreate(^^S, {}));
// 'size_of(...)' is a manifestly constant-evaluated immediate invocation, but
// not plainly constant-evaluated. Because it produces a declaration, the program
// is ill-formed.
} It's true that if struct S;
consteval int f() {
consteval {
define_aggregate(^^S, {});
} // <- 'define_aggregate' call is evaluated *here*.
write_char_to_file("output.txt", 'a');
return 0;
}
constexpr int a = f(); // initializer is still evaluated only once, but conteval-block is not reevaluated.
constexpr int b = f(); // same for 'b'. |
@zygoloid I think we ended up concluding that
I initially was of the same opinion as you, but I think we would have to modify most call sites, and it's unclear that it would buy us anything (On the other hand, I'm not thrilled about a pointer to Sema being stashed in PS: Note that it seems to still be unclear how we should handle try evaluation (destructors, vlas), when the evaluation fails after side effects are produced |
I'm of the same opinion. My logic when I proposed this design to Vlad was:
This approach should have the least impact on downstream consumers because the AST should be fully formed before tools like clang-tidy, etc are evaluating the AST (so side effects should have already taken place), so the evaluation results should Just Work™ in most cases, so long as the execution path was exercised by the compiler. It's not an ideal design, but I think it strikes the right balance. I wonder if there's a way we can have more of a bright line between the APIs which can mutate state and ones which can't. For example, could we come up with an attribute we require to be written on the API and is checked via a clang-tidy check?
+1, though my understanding was that if evaluation fails, then the program should be ill-formed, and so what happens with destructors, etc is not really important. Did I understand wrong though? |
It seems to me that we'd need to modify those call sites that want to perform an evaluation of a "plainly constant-evaluated expression". Aren't those exactly the call sites we need to modify anyway, in order to enable the "plainly constant-evaluated" mode, as you mention:
Yeah. So it seems like our options are:
In terms of changes to the caller, these seem the same to me. (The few "plainly constant-evaluated" cases -- all of which should be within |
I see. I've given it some thought, and I don't agree that this argument should govern the behavior of struct S;
struct S { ... }; to behave exactly the same in a C++ interpreter / REPL as struct S;
consteval { define_aggregate(^^S, {}); } I think not following that principle would create unnecessary work for the implementation and confusion for users, as well as harming the general idea that code generated by metaprogramming works the same way as code generated by source declarations. If, for a syntactic struct definition, an implementation can "undo" defining the struct and rewind back to before the definition, then I'd expect the same behavior for a metaprogramming struct definition (and your proposal shouldn't try to get in the way of that or dictate the behavior in that case, because it's outside the scope of the C++ standard). Conversely, if the implementation can't undo it in one case, I'd expect it to undo it in neither case. More broadly, if this is a real problem for an interpreter, then it's not unique to Instead of supporting some kind of an "undo" mechanism, it might make sense for an interpreter to treat definitions as idempotent in general (perhaps with a special rule that we use the most recent body for a function definition). For such an implementation that makes that choice, it would seem appropriate for |
@zygoloid would it be correct to say that you want the Standard to leave idempotency of functions with side effects up to implementations, and, consequently, you want the model of how side effects are integrated into constant evaluation to work for both idempotent and non-idempotent functions? |
No, I think the standard should specify the exact semantics of its metaprogramming primitives. And for metaprogramming actions that generate code in particular, I think they should behave the same as the source code they are intended to be equivalent to -- and thus should produce redefinition errors as usual if the same definition is created more than once, regardless of whether one or both of the definitions came from a metaprogram. For other side effects, the rules for that primitive should specify what the behavior is -- and for example, whether we want some kind of deduplication or not. Separately, I think if an interpreter wants to apply some level of idempotency to redefinitions that the standard says are invalid, in order to better support things like re-parsing the same code (but with, say, a function body changed), it should do so consistently across different kinds of entity and regardless of whether they come directly from parsing source code or from metaprogramming. But that's out of scope for current standardization efforts (one could imagine explicit support for REPL-style interpreters in the C++ standard, but we don't have any such thing right now). |
There's a part of me that wonders whether this should actually be part of the function interface (in a viral manner) so that the control is left to the interface designer. e.g., if you have a side effecting operation, the interface designer could have a way to say "side effects happen once regardless of how many calls to this function are made" or "these effects happen each time the call is made". (There could be a default behavior, I would learn towards "these effects happen each time the call is made".) This would allow for both idempotency and not, depending on the programmer's needs. The downside is, it's more complicated and viral annotations are sometimes not appreciated by everyone. |
I think I was the maintainer who mentioned that. :-) I think we want consteval function calls to be idempotent because it means we can memoize calls for improved performance and it matches existing implementation/mental models of what constant evaluation means. Certainly that helps with things like clang-repl too, but clang-repl isn't standards conforming anyway so I don't think it should push the design of the standards feature too much. I think there's no way we can make consteval function calls themselves idempotent once C++ has reflection with side effects because that introduces observable effects outside of the value computations. Same inputs can lead to different outputs. So whether define_aggregate itself is idempotent is kind of moot. What we want is for |
No description provided.