[RFC] [clang] [CodeGen] Avoid creating global variable repeatedly when type are not specified #114948
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-codegen
Author: Chuanqi Xu (ChuanqiXu9)
Full diff: https://github.com/llvm/llvm-project/pull/114948.diff
2 Files Affected:
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index ba376f9ecfacde..9566cfb8d6e794 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -5233,11 +5233,18 @@ llvm::Constant *CodeGenModule::GetAddrOfGlobalVar(const VarDecl *D,
                                                   llvm::Type *Ty,
                                            ForDefinition_t IsForDefinition) {
   assert(D->hasGlobalStorage() && "Not a global variable");
+
+  StringRef MangledName = getMangledName(D);
+  llvm::GlobalValue *Entry = GetGlobalValue(MangledName);
   QualType ASTTy = D->getType();
+  LangAS AddrSpace = ASTTy.getAddressSpace();
+
+  if (Entry && !Ty && Entry->getAddressSpace() == getContext().getTargetAddressSpace(AddrSpace))
+    return Entry;
+
   if (!Ty)
     Ty = getTypes().ConvertTypeForMem(ASTTy);
 
-  StringRef MangledName = getMangledName(D);
   return GetOrCreateLLVMGlobal(MangledName, Ty, ASTTy.getAddressSpace(), D,
                                IsForDefinition);
 }
diff --git a/clang/test/CodeGen/attr-weakref2.c b/clang/test/CodeGen/attr-weakref2.c
index 114f048a851832..a67f906810faf3 100644
--- a/clang/test/CodeGen/attr-weakref2.c
+++ b/clang/test/CodeGen/attr-weakref2.c
@@ -33,7 +33,7 @@ int test4_h(void) {
 }
 int test4_f;
-// CHECK: @test5_f = external global i32
+// CHECK: @test5_f = extern_weak global i32
 extern int test5_f;
 static int test5_g __attribute__((weakref("test5_f")));
 int test5_h(void) {
|
You can test this locally with the following command:
git-clang-format --diff 70de0b8bea31bb734bce86581574a60a0968d838 5b7def2c1deb4315cd043bc090a7364edbaeb84c --extensions c,cpp -- clang/lib/CodeGen/CodeGenModule.cpp clang/test/CodeGen/attr-weakref2.c
View the diff from clang-format here.
diff --git a/clang/lib/CodeGen/CodeGenModule.cpp b/clang/lib/CodeGen/CodeGenModule.cpp
index 9566cfb8d6..75c1eb8bfa 100644
--- a/clang/lib/CodeGen/CodeGenModule.cpp
+++ b/clang/lib/CodeGen/CodeGenModule.cpp
@@ -5239,7 +5239,8 @@ llvm::Constant *CodeGenModule::GetAddrOfGlobalVar(const VarDecl *D,
   QualType ASTTy = D->getType();
   LangAS AddrSpace = ASTTy.getAddressSpace();
 
-  if (Entry && !Ty && Entry->getAddressSpace() == getContext().getTargetAddressSpace(AddrSpace))
+  if (Entry && !Ty &&
+      Entry->getAddressSpace() == getContext().getTargetAddressSpace(AddrSpace))
     return Entry;
 
   if (!Ty)
|
Two things are at play here.
The first is that it is possible in various ways to instruct CodeGen to try to use or define the same symbol with wildly different types. Users generally expect these things to "just work" by making them resolve to the same entity. Sema used to put CodeGen into this situation all the time with incompatible local extern declarations in C; the checking there has gotten stricter, but I believe in some cases it is still only enforced with a warning. The GNU
The second is that you used to not be able to change the type of an LLVM global variable. I'm not sure if this is still true; I know that the pointer type changes added some flexibility here, but I don't know if it got us all the way to what CodeGen needs. Regardless, even if it is no longer true, for the extended period that it was true, CodeGen had no choice but to replace the existing global in order to change its type. It's possible that this can be simplified now.
CodeGen needs to be able to change the IR type of a global variable for two reasons:
|
See also #102553 which stopped doing the global replacement for changes to the initializer type. I think it's reasonable to do something similar here, but I believe the change for that should be inside GetOrCreateLLVMGlobal, not in GetAddrOfGlobalVar. I think the main remaining limitation in this area is that we can't change global AS in-place, as that is part of the pointer type. |
Thanks for the quick reply! If we want to change the type of a global variable, maybe we can use llvm-project/llvm/include/llvm/IR/Value.h Lines 809 to 817 in cdfd4cf
I hesitated since its comment says it is dangerous. But @rjmccall's comments say it more or less "just works" now. And I feel the wild pointers are dangerous too.
How about doing this only if the AS are the same? e.g.:
|
I don't think there's any situation in which Clang needs to change the address space of a declaration. It can happen if the programmer has declarations that disagree about the address space in which the entity is defined, but it's fair to just emit an error in that situation. |
Trying to mutate the type of a global is still unsafe. The benefit of opaque pointers here is that getValueType() is independent from getType(), so it's safe to rewrite the ValueType. (This is what GlobalVariable::replaceInitializer() does.)
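The distinction drawn here, where the value type can be rewritten in place but the address space cannot because it is part of the pointer type, can be sketched with a toy model. Everything below (ToyGlobal, Retype, retype) is hypothetical and written for illustration only; it is not the LLVM API and does not show what GlobalVariable::replaceInitializer() actually does internally.

```cpp
#include <cassert>
#include <string>

// Hypothetical model of a global under opaque pointers: the pointer type
// depends only on the address space, never on the value type.
struct ToyGlobal {
  std::string valueType;
  unsigned addrSpace;

  std::string pointerType() const {
    if (addrSpace == 0)
      return std::string("ptr");
    return "ptr addrspace(" + std::to_string(addrSpace) + ")";
  }
};

enum class Retype { Unchanged, UpdatedInPlace, NeedsReplacement };

// Assumed policy for this sketch: a value-type change is applied in place,
// so existing pointers to the global stay valid; an address-space change
// alters the pointer type and is reported as needing a replacement.
Retype retype(ToyGlobal &g, const std::string &newValueType,
              unsigned newAddrSpace) {
  if (g.addrSpace != newAddrSpace)
    return Retype::NeedsReplacement;
  if (g.valueType == newValueType)
    return Retype::Unchanged;
  g.valueType = newValueType;
  return Retype::UpdatedInPlace;
}
```

In this model, retyping a global from `{ %another.struct }` to `{ { ptr } }` in address space 0 leaves pointerType() unchanged, while asking for a different address space is the only case that would force a new object.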
I wouldn't be surprised if there's some GPU stuff that relies on this, maybe by accident. |
The problem to make it in |
This comes from an internal crash. I know generally it is better to reproduce it first but I do feel the pattern is pretty risky. So I am wondering if we can discuss it first. So maybe this is more of a discussion instead of a pure PR.
The story is: when we try to get or create an LLVM global for a C/C++ global, we first look up the name among the existing globals. If we find one, we perform some checks. If the checks pass, we return the found one. If not, we create a new one and replace the previous one. (Why do we want to do this? My instinctive reaction is that we should abort here):
llvm-project/clang/lib/CodeGen/CodeGenModule.cpp Lines 4966 to 4982 in bf43a13
llvm-project/clang/lib/CodeGen/CodeGenModule.cpp Lines 5017 to 5032 in bf43a13
The problem is, if we store the address of a global variable and the global variable gets replaced later, the address we stored becomes a wild pointer! e.g.
llvm-project/clang/lib/CodeGen/CodeGenModule.cpp Lines 2092 to 2097 in 283273f
I feel this is pretty dangerous. And to my knowledge, we had better not remove things emitted during CodeGen.
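The hazard described above can be reproduced in a small toy model. This is only a sketch: ToyGlobal, ToyModule, and cachedPointerIsStale are hypothetical names invented for illustration, not the real CodeGenModule or LLVM classes. Replaced objects are parked in a list rather than destroyed so the demo itself stays free of undefined behavior; real CodeGen would erase the old global, which is what turns the cached address into a wild pointer.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an IR global variable.
struct ToyGlobal {
  std::string name;
  std::string valueType;
  int id; // unique per object, so a replacement is observable
};

// Hypothetical stand-in for the module-level symbol table.
class ToyModule {
  std::map<std::string, std::unique_ptr<ToyGlobal>> globals;
  // Replaced objects are kept alive here only to keep the demo well defined.
  std::vector<std::unique_ptr<ToyGlobal>> retired;
  int nextId = 0;

public:
  // Look up by name; on a type mismatch, create a new object and replace
  // the old one, as in the get-or-create pattern described above.
  ToyGlobal *getOrCreate(const std::string &name, const std::string &type) {
    std::unique_ptr<ToyGlobal> &slot = globals[name];
    if (slot && slot->valueType == type)
      return slot.get();
    if (slot)
      retired.push_back(std::move(slot));
    slot.reset(new ToyGlobal{name, type, nextId++});
    return slot.get();
  }

  ToyGlobal *lookup(const std::string &name) const {
    auto it = globals.find(name);
    return it == globals.end() ? nullptr : it->second.get();
  }
};

// Returns true when an address cached before the replacement no longer
// refers to the module's current global of that name.
bool cachedPointerIsStale() {
  ToyModule m;
  ToyGlobal *cached = m.getOrCreate("g", "{ %another.struct }");
  m.getOrCreate("g", "{ { ptr } }"); // same symbol, different IR type
  return cached != m.lookup("g");
}
```

Any side table that stored `cached` before the second call now points at an object the module no longer owns under that name.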
Then, one of the triggers for the problem is CodeGenModule::GetAddrOfGlobalVar: https://github.com/llvm/llvm-project/blob/283273fa1e3be4a03f06a5efd08a8c818be981fd/clang/lib/CodeGen/CodeGenModule.cpp#L5232C17-L5243
The arguments except D can be omitted. If we don't specify Ty, the function will try to deduce the type from D and use that type to get or create an LLVM global in the above process. But the Ty argument is not always omitted, e.g., in
llvm-project/clang/lib/CodeGen/CodeGenModule.cpp Lines 5484 to 5564 in 283273f
Then the problem happens: sometimes we try to get or create the global variable by the AST type, but sometimes we try to get or create the same global variable by a deduced type, and if the two types differ, we may end up with a wild pointer. (The two types are compatible: e.g., one is struct { %another.struct} with %another.struct = { ptr } and the other is { { ptr } }.)
The solution, or at least a workaround, I got is: in CodeGenModule::GetAddrOfGlobalVar, if Ty is not specified and we already have the same variable, return that variable directly. I think it makes sense since if Ty is not specified, it implies the caller doesn't care about it too much.
WDYT?
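The effect of the proposed early return can be illustrated with a toy model. This is a sketch under stated assumptions: ToyGlobal, ToyModule, and replacementsAfterMixedLookups are hypothetical names, the type strings stand in for IR types, and a replacement counter stands in for "the old global was destroyed". It mirrors the shape of the diff (return the existing entry when no type is given and the address space matches) but is not the actual CodeGenModule code.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for an IR global; `replacements` counts how often
// the entry was recreated because of a type or address-space mismatch.
struct ToyGlobal {
  std::string valueType;
  unsigned addrSpace;
  int replacements;
};

class ToyModule {
  std::map<std::string, ToyGlobal> globals;

public:
  // Models the get-or-create path: a mismatch replaces the entry.
  ToyGlobal &getOrCreate(const std::string &name, const std::string &type,
                         unsigned addrSpace) {
    auto it = globals.find(name);
    if (it == globals.end())
      return globals[name] = ToyGlobal{type, addrSpace, 0};
    ToyGlobal &entry = it->second;
    if (entry.valueType != type || entry.addrSpace != addrSpace)
      entry = ToyGlobal{type, addrSpace, entry.replacements + 1};
    return entry;
  }

  // Models the proposed change: when the caller passes no explicit type
  // (null pointer) and an entry in the same address space already exists,
  // return it instead of deducing a type that might differ.
  ToyGlobal &getAddrOfGlobalVar(const std::string &name,
                                const std::string &astType, unsigned addrSpace,
                                const std::string *type = nullptr) {
    auto it = globals.find(name);
    if (it != globals.end() && !type && it->second.addrSpace == addrSpace)
      return it->second;
    return getOrCreate(name, type ? *type : astType, addrSpace);
  }
};

// First request uses an explicit type, second omits it; the return value is
// the number of replacements the entry went through.
int replacementsAfterMixedLookups() {
  ToyModule m;
  const std::string astType = "{ %another.struct }";
  const std::string explicitType = "{ { ptr } }";
  m.getAddrOfGlobalVar("g", astType, 0, &explicitType);
  return m.getAddrOfGlobalVar("g", astType, 0).replacements;
}
```

With the early return, the second lookup reuses the existing entry and no replacement happens; without it, the deduced type would differ from the stored one and the entry would be recreated. Two callers that both pass explicit but different types would still trigger a replacement, so this narrows the problem rather than eliminating it.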