-
Notifications
You must be signed in to change notification settings - Fork 14.5k
[clang][analyzer] Fix the false positive ArgInitializedness warning on unnamed bit-field #145066
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…n unnamed bit-field For the following code: struct B{ int i :2; int :30; // unnamed bit-field }; extern void consume_B(B); void bitfield_B_init(void) { B b1; b1.i = 1; // b1 is initialized consume_B(b1); } The current clang static analyzer gives false positive warning "Passed-by-value struct argument contains uninitialized data (e.g., field: '') [core.CallAndMessage]" when taking the source as C code. However, no such warning is generated when clang takes the source as C++ code. After comparing the CallAndMessageChecker's different behaviors between C and C++, the reason is found: When FindUninitializedField::Find(const TypedValueRegion *R) is invoked, the concrete type of R is different. In C, 'b1' is considered to be a 'StackLocalsSpaceRegion', which makes 'StoreMgr.getBinding(store, loc::MemRegionVal(FR))' return an 'UndefinedVal'. While in c++, 'b1' is considered to be a 'tackArgumentsSpaceRegion', which finally makes the 'getBinding' return a SymbolVal. I am not quite sure about the region difference, maybe in C++ there is an implicit copy constructor function? Anyway, the unnamed bit-field is undefined, for it cannot be written unless using memory operation such as 'memset'. So a special check FD->isUnnamedBitField() is added in RegionStoreManager::getBindingForField in file RegionStore.cpp. To handle the false warning, a check isUnnamedBitField is also added in FindUninitializedField::Find in file CallAndMessageChecker.cpp. Testcases of unnamed bit-field are added in file call-and-message.c and call-and-message.cpp. I do not know what to do on the hash, so it may be updated?
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
@llvm/pr-subscribers-clang @llvm/pr-subscribers-clang-static-analyzer-1 Author: None (Tedlion) ChangesFor the following code: extern void consume_B(B); void bitfield_B_init(void) { The current clang static analyzer gives false positive warning "Passed-by-value struct argument contains uninitialized data (e.g., field: '') [core.CallAndMessage]" when taking the source as C code. However, no such warning is generated when clang takes the source as C++ code. After comparing the CallAndMessageChecker's different behaviors between C and C++, the reason is found: When FindUninitializedField::Find(const TypedValueRegion *R) is invoked, the concrete type of R is different. In C, 'b1' is considered to be a 'StackLocalsSpaceRegion', which makes 'StoreMgr.getBinding(store, loc::MemRegionVal(FR))' return an 'UndefinedVal'. While in c++, 'b1' is considered to be a 'tackArgumentsSpaceRegion', which finally makes the 'getBinding' return a SymbolVal. I am not quite sure about the region difference, maybe in C++ there is an implicit copy constructor function? Anyway, the unnamed bit-field is undefined, for it cannot be written unless using memory operation such To handle the false warning, a check isUnnamedBitField is also added in FindUninitializedField::Find in file CallAndMessageChecker.cpp. Testcases of unnamed bit-field are added in file call-and-message.c and call-and-message.cpp. I do not know what to do on the hash, so it may be updated? Full diff: https://github.com/llvm/llvm-project/pull/145066.diff 4 Files Affected:
diff --git a/clang/lib/StaticAnalyzer/Checkers/CallAndMessageChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/CallAndMessageChecker.cpp
index 86476b32309c3..677cc6ee57ad2 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CallAndMessageChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/CallAndMessageChecker.cpp
@@ -259,7 +259,7 @@ class FindUninitializedField {
if (T->getAsStructureType()) {
if (Find(FR))
return true;
- } else {
+ } else if (!I->isUnnamedBitField()){
SVal V = StoreMgr.getBinding(store, loc::MemRegionVal(FR));
if (V.isUndef())
return true;
diff --git a/clang/lib/StaticAnalyzer/Core/RegionStore.cpp b/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
index 388034b087789..1208036700f32 100644
--- a/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
+++ b/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
@@ -2122,8 +2122,21 @@ SVal RegionStoreManager::getBindingForField(RegionBindingsConstRef B,
if (const std::optional<SVal> &V = B.getDirectBinding(R))
return *V;
- // If the containing record was initialized, try to get its constant value.
+ // UnnamedBitField is always Undefined unless using memory operation such
+ // as 'memset'.
+ // For example, for code
+ // typedef struct {
+ // int i :2;
+ // int :30; // unnamed bit-field
+ // } A;
+ // A a = {1};
+ // The bits of the unnamed bit-field in local variable a can be anything.
const FieldDecl *FD = R->getDecl();
+ if (FD->isUnnamedBitField()) {
+ return UndefinedVal();
+ }
+
+ // If the containing record was initialized, try to get its constant value.
QualType Ty = FD->getType();
const MemRegion* superR = R->getSuperRegion();
if (const auto *VR = dyn_cast<VarRegion>(superR)) {
diff --git a/clang/test/Analysis/call-and-message.c b/clang/test/Analysis/call-and-message.c
index b79ec8c344b6c..e2fba55d3343d 100644
--- a/clang/test/Analysis/call-and-message.c
+++ b/clang/test/Analysis/call-and-message.c
@@ -1,12 +1,19 @@
// RUN: %clang_analyze_cc1 %s -verify \
// RUN: -analyzer-checker=core \
// RUN: -analyzer-config core.CallAndMessage:ArgPointeeInitializedness=true \
+// RUN: -analyzer-config core.CallAndMessage:ArgInitializedness=false \
// RUN: -analyzer-output=plist -o %t.plist
// RUN: cat %t.plist | FileCheck %s
// RUN: %clang_analyze_cc1 %s -verify=no-pointee \
// RUN: -analyzer-checker=core \
-// RUN: -analyzer-config core.CallAndMessage:ArgPointeeInitializedness=false
+// RUN: -analyzer-config core.CallAndMessage:ArgPointeeInitializedness=false \
+// RUN: -analyzer-config core.CallAndMessage:ArgInitializedness=false
+
+// RUN: %clang_analyze_cc1 %s -verify=arg-init \
+// RUN: -analyzer-checker=core \
+// RUN: -analyzer-config core.CallAndMessage:ArgPointeeInitializedness=false \
+// RUN: -analyzer-config core.CallAndMessage:ArgInitializedness=true
// no-pointee-no-diagnostics
@@ -22,3 +29,21 @@ void pointee_uninit(void) {
// checker, as described in the CallAndMessage comments!
// CHECK: <key>issue_hash_content_of_line_in_context</key>
// CHECK-SAME: <string>97a74322d64dca40aa57303842c745a1</string>
+
+typedef struct {
+ int i :2;
+ int :30; // unnamed bit-field
+} B;
+
+extern void consume_B(B);
+
+void bitfield_B_init(void) {
+ B b1;
+ b1.i = 1; // b1 is initialized
+ consume_B(b1);
+}
+
+void bitfield_B_uninit(void) {
+ B b2;
+ consume_B(b2); // arg-init-warning{{Passed-by-value struct argument contains uninitialized data (e.g., field: 'i') [core.CallAndMessage]}}
+}
\ No newline at end of file
diff --git a/clang/test/Analysis/call-and-message.cpp b/clang/test/Analysis/call-and-message.cpp
index 25ae23833478b..1e76973f49e13 100644
--- a/clang/test/Analysis/call-and-message.cpp
+++ b/clang/test/Analysis/call-and-message.cpp
@@ -169,4 +169,20 @@ void record_uninit() {
// CHECK-SAME: <string>a46bb5c1ee44d4611ffeb13f7f499605</string>
// CHECK: <key>issue_hash_content_of_line_in_context</key>
// CHECK-SAME: <string>e0e0d30ea5a7b2e3a71e1931fa0768a5</string>
+
+struct B{
+ int i :2;
+ int :30; // unnamed bit-field
+};
+
+void bitfield_B_init(void) {
+ B b1;
+ b1.i = 1; // b1 is initialized
+ consume(b1);
+}
+
+void bitfield_B_uninit(void) {
+ B b2;
+ consume(b2); // arg-init-warning{{Passed-by-value struct argument contains uninitialized data (e.g., field: 'i') [core.CallAndMessage]}}
+}
} // namespace uninit_arg
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this PR. The PR summary explains the situation well and the patch also looks correct to me modulo inline comments.
Did you pick up some issue, or just decided to fix this because it was annoying to you?
If we have a related ticket, we should link it here and close it once we are done here.
// UnnamedBitField is always Undefined unless using memory operation such | ||
// as 'memset'. | ||
// For example, for code | ||
// typedef struct { | ||
// int i :2; | ||
// int :30; // unnamed bit-field | ||
// } A; | ||
// A a = {1}; | ||
// The bits of the unnamed bit-field in local variable a can be anything. | ||
const FieldDecl *FD = R->getDecl(); | ||
if (FD->isUnnamedBitField()) { | ||
return UndefinedVal(); | ||
} | ||
|
||
// If the containing record was initialized, try to get its constant value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming the CallAndMessageChecker
is patched, do we need this patch here?
I'd rather not touch this code as it's really sensitive. And btw, reading from the Store by default gives you UndefinedVal
so I'm not sure what case this helps with. For example, a memset(0)
should also zero the padding bytes, thus if we happen to read that padding byte via a char*
the Store should still model it and return the correct value instead of handing back UndefinedVal
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To solve the false warning problem with unnamed bit-field, patch here is unnecessary.
However, I do not think getBinding
returning SymbolVal
is the correct result, which is the current behavior when parsing the source as c++. To my understanding, SymbolVal
means it is initialized, but somehow the static analyzer cannot infer the value, while UndefinedVal
means the value it stores can be anything and reading from it is an UB. Unnamed bit-field is the second case.
I understand patching here may bring influences to other usages, even though the test of check-clang-analysis does not show any. So if you think that current implementation is incorrect but we'd better keep it before fully evaluating the influences, let me leave a FIXME comment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your understanding of UndefinedVal
is correct, unlike with SymbolVal
. Symbols (SymbolVal
aka. SymExpr
) we track values. We may or may not know anything about these symbols (most notably the value range that a symbol can hold). More importantly, we can combine such symbols into making larger symbols, basically embedding the history of the computation that the given variable holds at any given point in time. But this is likely not important here.
This is a critical component, so we don't accept patches without tests. Even tests are not enough to demonstrate correctness, thus we frequently ask for "measurements", or running differential analysis with and without a patch and observing the outcomes of many many real-world projects to have a better picture of what the implications are.
Frequently even doing the correct thing reveals untended other bugs that are actually worse than what we initially wanted to fix, thus effectively preventing us from doing the right thing. Don't worry, this is not the case with the CallAndMessageChecker
.
You can propose a FIXME, but without more context it can do more harm than good if put at the wrong place with a misleading content. So to approve that, we will need to do some digging where the Symbol is coming from and why do we have that Symbol instead of Undef there?
Otherwise we are better off not having this FIXME I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can provide some evidence where the UndefinedVal
(in c) and the SymbolVal
(in c++) are from.
After reverting all my changes in CallAndMessageChecker.cpp and RegionStore.cpp, I use watchpoints to find where the return values of the invoking StoreMgr.getBinding(store, loc::MemRegionVal(FR))
from:
Following is the test code:
// unnamed.c
struct B {
int i : 2;
int : 30; // unnamed bit-field
};
extern void consume_B(struct B);
void bitfield_B_init(void) {
struct B b1;
b1.i = 1; // b1 is initialized
consume_B(b1);
}
When analyzing the test code as c, via clang -cc1 -x c -analyze -analyzer-checker=core unnamed.c
, I get the stackframes:
1# RegionStoreManager::getBindingForFieldOrElementCommon(const RegionBindingsRef &, const clang::ento::TypedValueRegion *, QualType) RegionStore.cpp:2312
2# RegionStoreManager::getBindingForField(const RegionBindingsRef &, const clang::ento::FieldRegion *) RegionStore.cpp:2170
3# RegionStoreManager::getBinding(const RegionBindingsRef &, Loc, QualType) RegionStore.cpp:1617
4# RegionStoreManager::getBinding(const void *, Loc, QualType) RegionStore.cpp:711
5# FindUninitializedField::Find(const clang::ento::TypedValueRegion *) CallAndMessageChecker.cpp:263
......
// in FindUninitializedField::Find
for (const auto *I : RD->fields()) {
const FieldRegion *FR = MrMgr.getFieldRegion(I, R);
FieldChain.push_back(I);
T = I->getType();
if (T->getAsStructureType()) {
if (Find(FR))
return true;
} else {
SVal V = StoreMgr.getBinding(store, loc::MemRegionVal(FR));
if (V.isUndef())
return true;
}
FieldChain.pop_back();
}
When I switching to frame 5 and print the FR->getRawMemorySpace()
, I get the type clang::ento::StackLocalsSpaceRegion *
While analyzing the test code as c++, via clang -cc1 -x c ++-analyze -analyzer-checker=core unnamed.c
, I get the stackframes:
RegionStoreManager::getBindingForFieldOrElementCommon(const RegionBindingsRef &, const clang::ento::TypedValueRegion *, QualType) RegionStore.cpp:2317
RegionStoreManager::getBindingForField(const RegionBindingsRef &, const clang::ento::FieldRegion *) RegionStore.cpp:2170
RegionStoreManager::getBinding(const RegionBindingsRef &, Loc, QualType) RegionStore.cpp:1617
RegionStoreManager::getBinding(const void *, Loc, QualType) RegionStore.cpp:711
FindUninitializedField::Find(const clang::ento::TypedValueRegion *) CallAndMessageChecker.cpp:263
When I switching to frame 5 and print the FR->getRawMemorySpace()
, I get the type clang::ento::StackArgumentsSpaceRegion *
The raw memory space of b1
is different in c and c++, I wonder it due to the implicit copy constructor in c++?
And the difference on FR->getRawMemorySpace()
finally reaches the if check in RegionStoreManager::getBindingForFieldOrElementCommon
and returns a different result.
// line 2288 in RegionStore.cpp
if (isa<StackLocalsSpaceRegion>(R->getRawMemorySpace())) {
if (isa<ElementRegion>(R)) {
The last statement in getBindingForFieldOrElementCommon
seems a default handle, where the c++ unnamed bit-field gets its binding.
// All other values are symbolic.
return svalBuilder.getRegionValueSymbolVal(R);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From reading this, it seems to me that StackArgumentsSpaceRegion
is wrong, as this is not an argument.
What was R->dump()
in that case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@steakhal I have tested the check-clang-analysis, no more testcases break
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say then let's remove it. CompoundVals and LCVs should behave exactly the same way wrt. the bug reports.
When proposing that change, could you look into git blame to see if we miss something about the history of that continue
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The continue line is added in commit 3720e2f, when the function tryBindSmallStruct
is added. The commit log illustrates the reason for binding a LazyCompoundVal to simple struct by memberwise copy, but discusses nothing about the unnamed bit-field.
I'll make a new pull-request to fix this soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the idea was that a copy in C++ means a member-wise copy. And there anonym bitfields don't count.
Consequently, the padding bits may or may not get copied in a copy operation, thus they wanted to having it Undef.
I don't think it's important. We can drop this part.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
According to the c++ standard, "Unnamed bit-fields are not members and cannot be initialized(https://eel.is/c++draft/class.bit#2.sentence-2)", and "The implicitly-defined copy/move constructor for a non-union class X performs a memberwise copy/move of its bases and members(https://eel.is/c++draft/class.copy.ctor#14.sentence-1)". So strictly speaking, an unnamed bit-field can be anything, since the memberwise copy is not required for unnamed bits.
Consider the case:
struct B {
int i : 2;
int : 30;
};
void bitfield_B_init(void) {
B b1;
b1.i = 1;
memset(&b1, 0, sizeof(b1));
B b2 = b1;
}
The unnamed bits is set after the memset. So for the statement B b2 = b1
, the unnamed bitfield in b1
is supposed to be SymbolVal
, while the unnamed bitfields in b2
should be UndefinedVal
.
Back to the code, neithor keeping nor removing the continue lines in tryBindSmallStruct
make a perfect handling on unnamed bit-fields. A perfect rule maybe that unnamed bit-fileds must be undefined after being implicitly-defined constructed. Considering the stackframes I got before and the little benefit it could bring, implementing the rule seems to be compilicated and potentially harmful. So perhaps we should leave it unchanged?
@steakhal Thank you for reviewing. It's my first try on contributing to open source community. |
✅ With the latest revision this PR passed the C/C++ code formatter. |
Please fix formatting issues. For first-time contributors, build workflows don't run automatically, members need to start them on each commit manually. P.S. I created issue on this matter #146034 |
…ker.cpp. Fix formatting issues.
remove braces on simple single statement bodies Co-authored-by: Baranov Victor <[email protected]>
remove braces on simple single statement bodies Co-authored-by: Baranov Victor <[email protected]>
@vbvictor Thank you for your help. But the Build and Test Windows " fails: |
I restarted the job, there is already an issue for such fail #145703 |
@steakhal I've reverted my change in RegionStore.cpp, since now I realize it is more complicated than I thought. Considring the costum copy construction and direct memory operations(such as the memset), simply considering the unnamed bit-field is an UndefinedVal may be wrong , even though the UndefinedVal in my case. Thank you a lot for discussion. I may work on this afterwards, but leaving the RegionStore unchanged maybe the best choice now. |
@Tedlion Congratulations on having your first Pull Request (PR) merged into the LLVM Project! Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR. Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues. How to do this, and the rest of the post-merge process, is covered in detail here. If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again. If you don't get any reports, no action is required from you. Your changes are working as expected, well done! |
For the following code in C mode: https://godbolt.org/z/3eo1MeGhe (There is no warning in C++ mode though).
After comparing the CallAndMessageChecker's different behaviors between C and C++, the reason is found: When
FindUninitializedField::Find(const TypedValueRegion *R)
is invoked, the concrete type ofR
is different.In C mode,
b1
is considered to be aStackLocalsSpaceRegion
, which makesStoreMgr.getBinding(store, loc::MemRegionVal(FR))
return anUndefinedVal
.In C++ mode,
b1
is considered to be aStackArgumentsSpaceRegion
, which finally makes thegetBinding()
return aSymbolVal
.I am not quite sure about the region difference, maybe in C++ there is an implicit copy constructor function?
Anyway, the unnamed bit-field is undefined, for it cannot be written unless using memory operation such
as
memset
. So, a special checkFD->isUnnamedBitField()
is added inRegionStoreManager::getBindingForField
in fileRegionStore.cpp
.To handle the FP, a check
isUnnamedBitField
is also added inFindUninitializedField::Find()
in fileCallAndMessageChecker.cpp
.Testcases of unnamed bit-field are added in file
call-and-message.c
andcall-and-message.cpp
.I do not know what to do on the hash, so it may be updated?