Skip to content

[clang][analyzer] Fix the false positive ArgInitializedness warning on unnamed bit-field #145066

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 3, 2025

Conversation

Tedlion
Copy link
Contributor

@Tedlion Tedlion commented Jun 20, 2025

For the following code in C mode: https://godbolt.org/z/3eo1MeGhe (There is no warning in C++ mode though).

struct B {
  int i : 2;
  int : 30;  // unnamed bit-field
};

extern void consume_B(struct B);

void bitfield_B_init(void) {
  struct B b1;
  b1.i = 1; // b1 is initialized
  consume_B(b1); // FP: Passed-by-value struct argument contains uninitialized data (e.g., field: '') [core.CallAndMessage]
}

After comparing the CallAndMessageChecker's different behaviors between C and C++, the reason is found: When FindUninitializedField::Find(const TypedValueRegion *R) is invoked, the concrete type of R is different.

In C mode, b1 is considered to be a StackLocalsSpaceRegion, which makes StoreMgr.getBinding(store, loc::MemRegionVal(FR)) return an UndefinedVal.
In C++ mode, b1 is considered to be a StackArgumentsSpaceRegion, which finally makes the getBinding() return a SymbolVal.
I am not quite sure about the region difference, maybe in C++ there is an implicit copy constructor function?

Anyway, the unnamed bit-field is undefined, for it cannot be written unless using memory operation such
as memset. So, a special check FD->isUnnamedBitField() is added in RegionStoreManager::getBindingForField in file RegionStore.cpp.

To handle the FP, a check isUnnamedBitField is also added in FindUninitializedField::Find() in file CallAndMessageChecker.cpp.

Testcases of unnamed bit-field are added in file call-and-message.c and call-and-message.cpp.
I do not know what to do on the hash, so it may be updated?

…n unnamed bit-field

For the following code:
 struct B{
   int i  :2;
   int    :30;  // unnamed bit-field
 };

 extern void consume_B(B);

 void bitfield_B_init(void) {
   B b1;
   b1.i = 1; // b1 is initialized
   consume_B(b1);
 }

The current clang static analyzer gives false positive warning "Passed-by-value struct argument contains uninitialized data (e.g., field: '') [core.CallAndMessage]" when taking the source as C code. However, no such warning is generated when clang takes the source as C++ code.

After comparing the CallAndMessageChecker's different behaviors between C and C++, the reason is found:
When FindUninitializedField::Find(const TypedValueRegion *R) is invoked, the concrete type of R is different. In C, 'b1' is considered to be a 'StackLocalsSpaceRegion', which makes 'StoreMgr.getBinding(store, loc::MemRegionVal(FR))' return an 'UndefinedVal'. While in c++, 'b1' is considered to be a 'tackArgumentsSpaceRegion', which finally makes the 'getBinding' return a SymbolVal. I am not quite sure about the region difference, maybe in C++ there is an implicit copy constructor function?

 Anyway, the unnamed bit-field is undefined, for it cannot be written unless using memory operation such
as 'memset'. So a special check FD->isUnnamedBitField() is added in RegionStoreManager::getBindingForField in
file RegionStore.cpp.

To handle the false warning, a check isUnnamedBitField is also added in FindUninitializedField::Find in file CallAndMessageChecker.cpp.

Testcases of unnamed bit-field are added in file call-and-message.c and call-and-message.cpp. I do not know what to do on the hash, so it may be updated?
Copy link

Thank you for submitting a Pull Request (PR) to the LLVM Project!

This PR will be automatically labeled and the relevant teams will be notified.

If you wish to, you can add reviewers by using the "Reviewers" section on this page.

If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using @ followed by their GitHub username.

If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers.

If you have further questions, they may be answered by the LLVM GitHub User Guide.

You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums.

@llvmbot llvmbot added clang Clang issues not falling into any other category clang:static analyzer labels Jun 20, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 20, 2025

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: None (Tedlion)

Changes

For the following code:
struct B{
int i :2;
int :30; // unnamed bit-field
};

extern void consume_B(B);

void bitfield_B_init(void) {
B b1;
b1.i = 1; // b1 is initialized
consume_B(b1);
}

The current clang static analyzer gives false positive warning "Passed-by-value struct argument contains uninitialized data (e.g., field: '') [core.CallAndMessage]" when taking the source as C code. However, no such warning is generated when clang takes the source as C++ code.

After comparing the CallAndMessageChecker's different behaviors between C and C++, the reason is found: When FindUninitializedField::Find(const TypedValueRegion *R) is invoked, the concrete type of R is different. In C, 'b1' is considered to be a 'StackLocalsSpaceRegion', which makes 'StoreMgr.getBinding(store, loc::MemRegionVal(FR))' return an 'UndefinedVal'. While in c++, 'b1' is considered to be a 'tackArgumentsSpaceRegion', which finally makes the 'getBinding' return a SymbolVal. I am not quite sure about the region difference, maybe in C++ there is an implicit copy constructor function?

Anyway, the unnamed bit-field is undefined, for it cannot be written unless using memory operation such
as 'memset'. So a special check FD->isUnnamedBitField() is added in RegionStoreManager::getBindingForField in file RegionStore.cpp.

To handle the false warning, a check isUnnamedBitField is also added in FindUninitializedField::Find in file CallAndMessageChecker.cpp.

Testcases of unnamed bit-field are added in file call-and-message.c and call-and-message.cpp. I do not know what to do on the hash, so it may be updated?


Full diff: https://github.com/llvm/llvm-project/pull/145066.diff

4 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Checkers/CallAndMessageChecker.cpp (+1-1)
  • (modified) clang/lib/StaticAnalyzer/Core/RegionStore.cpp (+14-1)
  • (modified) clang/test/Analysis/call-and-message.c (+26-1)
  • (modified) clang/test/Analysis/call-and-message.cpp (+16)
diff --git a/clang/lib/StaticAnalyzer/Checkers/CallAndMessageChecker.cpp b/clang/lib/StaticAnalyzer/Checkers/CallAndMessageChecker.cpp
index 86476b32309c3..677cc6ee57ad2 100644
--- a/clang/lib/StaticAnalyzer/Checkers/CallAndMessageChecker.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/CallAndMessageChecker.cpp
@@ -259,7 +259,7 @@ class FindUninitializedField {
         if (T->getAsStructureType()) {
           if (Find(FR))
             return true;
-        } else {
+        } else if (!I->isUnnamedBitField()){
           SVal V = StoreMgr.getBinding(store, loc::MemRegionVal(FR));
           if (V.isUndef())
             return true;
diff --git a/clang/lib/StaticAnalyzer/Core/RegionStore.cpp b/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
index 388034b087789..1208036700f32 100644
--- a/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
+++ b/clang/lib/StaticAnalyzer/Core/RegionStore.cpp
@@ -2122,8 +2122,21 @@ SVal RegionStoreManager::getBindingForField(RegionBindingsConstRef B,
   if (const std::optional<SVal> &V = B.getDirectBinding(R))
     return *V;
 
-  // If the containing record was initialized, try to get its constant value.
+  // UnnamedBitField is always Undefined unless using memory operation such
+  // as 'memset'.
+  // For example, for code
+  //    typedef struct {
+  //      int i  :2;
+  //      int    :30;  // unnamed bit-field
+  //    } A;
+  //    A a = {1};
+  // The bits of the unnamed bit-field in local variable a can be anything.
   const FieldDecl *FD = R->getDecl();
+  if (FD->isUnnamedBitField()) {
+      return UndefinedVal();
+  }
+
+  // If the containing record was initialized, try to get its constant value.
   QualType Ty = FD->getType();
   const MemRegion* superR = R->getSuperRegion();
   if (const auto *VR = dyn_cast<VarRegion>(superR)) {
diff --git a/clang/test/Analysis/call-and-message.c b/clang/test/Analysis/call-and-message.c
index b79ec8c344b6c..e2fba55d3343d 100644
--- a/clang/test/Analysis/call-and-message.c
+++ b/clang/test/Analysis/call-and-message.c
@@ -1,12 +1,19 @@
 // RUN: %clang_analyze_cc1 %s -verify \
 // RUN:   -analyzer-checker=core \
 // RUN:   -analyzer-config core.CallAndMessage:ArgPointeeInitializedness=true \
+// RUN:   -analyzer-config core.CallAndMessage:ArgInitializedness=false \
 // RUN:   -analyzer-output=plist -o %t.plist
 // RUN: cat %t.plist | FileCheck %s
 
 // RUN: %clang_analyze_cc1 %s -verify=no-pointee \
 // RUN:   -analyzer-checker=core \
-// RUN:   -analyzer-config core.CallAndMessage:ArgPointeeInitializedness=false
+// RUN:   -analyzer-config core.CallAndMessage:ArgPointeeInitializedness=false \
+// RUN:   -analyzer-config core.CallAndMessage:ArgInitializedness=false
+
+// RUN: %clang_analyze_cc1 %s -verify=arg-init \
+// RUN:   -analyzer-checker=core \
+// RUN:   -analyzer-config core.CallAndMessage:ArgPointeeInitializedness=false \
+// RUN:   -analyzer-config core.CallAndMessage:ArgInitializedness=true
 
 // no-pointee-no-diagnostics
 
@@ -22,3 +29,21 @@ void pointee_uninit(void) {
 // checker, as described in the CallAndMessage comments!
 // CHECK: <key>issue_hash_content_of_line_in_context</key>
 // CHECK-SAME: <string>97a74322d64dca40aa57303842c745a1</string>
+
+typedef struct {
+  int i  :2;
+  int    :30;  // unnamed bit-field
+} B;
+
+extern void consume_B(B);
+
+void bitfield_B_init(void) {
+  B b1;
+  b1.i = 1; // b1 is initialized
+  consume_B(b1);
+}
+
+void bitfield_B_uninit(void) {
+  B b2;
+  consume_B(b2); // arg-init-warning{{Passed-by-value struct argument contains uninitialized data (e.g., field: 'i') [core.CallAndMessage]}}
+}
\ No newline at end of file
diff --git a/clang/test/Analysis/call-and-message.cpp b/clang/test/Analysis/call-and-message.cpp
index 25ae23833478b..1e76973f49e13 100644
--- a/clang/test/Analysis/call-and-message.cpp
+++ b/clang/test/Analysis/call-and-message.cpp
@@ -169,4 +169,20 @@ void record_uninit() {
 // CHECK-SAME: <string>a46bb5c1ee44d4611ffeb13f7f499605</string>
 // CHECK: <key>issue_hash_content_of_line_in_context</key>
 // CHECK-SAME: <string>e0e0d30ea5a7b2e3a71e1931fa0768a5</string>
+
+struct B{
+  int i  :2;
+  int    :30;  // unnamed bit-field
+};
+
+void bitfield_B_init(void) {
+  B b1;
+  b1.i = 1; // b1 is initialized
+  consume(b1);
+}
+
+void bitfield_B_uninit(void) {
+  B b2;
+  consume(b2); // arg-init-warning{{Passed-by-value struct argument contains uninitialized data (e.g., field: 'i') [core.CallAndMessage]}}
+}
 } // namespace uninit_arg

@steakhal steakhal self-requested a review June 24, 2025 18:18
Copy link
Contributor

@steakhal steakhal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this PR. The PR summary explains the situation well and the patch also looks correct to me modulo inline comments.

Did you pick up some issue, or just decided to fix this because it was annoying to you?
If we have a related ticket, we should link it here and close it once we are done here.

Comment on lines 2125 to 2139
// UnnamedBitField is always Undefined unless using memory operation such
// as 'memset'.
// For example, for code
// typedef struct {
// int i :2;
// int :30; // unnamed bit-field
// } A;
// A a = {1};
// The bits of the unnamed bit-field in local variable a can be anything.
const FieldDecl *FD = R->getDecl();
if (FD->isUnnamedBitField()) {
return UndefinedVal();
}

// If the containing record was initialized, try to get its constant value.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming the CallAndMessageChecker is patched, do we need this patch here?
I'd rather not touch this code as it's really sensitive. And btw, reading from the Store by default gives you UndefinedVal so I'm not sure what case this helps with. For example, a memset(0) should also zero the padding bytes, thus if we happen to read that padding byte via a char* the Store should still model it and return the correct value instead of handing back UndefinedVal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To solve the false warning problem with unnamed bit-field, patch here is unnecessary.
However, I do not think getBinding returning SymbolVal is the correct result, which is the current behavior when parsing the source as c++. To my understanding, SymbolVal means it is initialized, but somehow the static analyzer cannot infer the value, while UndefinedVal means the value it stores can be anything and reading from it is an UB. Unnamed bit-field is the second case.
I understand patching here may bring influences to other usages, even though the test of check-clang-analysis does not show any. So if you think that current implementation is incorrect but we'd better keep it before fully evaluating the influences, let me leave a FIXME comment here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your understanding of UndefinedVal is correct, unlike with SymbolVal. Symbols (SymbolVal aka. SymExpr) we track values. We may or may not know anything about these symbols (most notably the value range that a symbol can hold). More importantly, we can combine such symbols into making larger symbols, basically embedding the history of the computation that the given variable holds at any given point in time. But this is likely not important here.

This is a critical component, so we don't accept patches without tests. Even tests are not enough to demonstrate correctness, thus we frequently ask for "measurements", or running differential analysis with and without a patch and observing the outcomes of many many real-world projects to have a better picture of what the implications are.

Frequently even doing the correct thing reveals untended other bugs that are actually worse than what we initially wanted to fix, thus effectively preventing us from doing the right thing. Don't worry, this is not the case with the CallAndMessageChecker.

You can propose a FIXME, but without more context it can do more harm than good if put at the wrong place with a misleading content. So to approve that, we will need to do some digging where the Symbol is coming from and why do we have that Symbol instead of Undef there?
Otherwise we are better off not having this FIXME I think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can provide some evidence where the UndefinedVal(in c) and the SymbolVal(in c++) are from.
After reverting all my changes in CallAndMessageChecker.cpp and RegionStore.cpp, I use watchpoints to find where the return values of the invoking StoreMgr.getBinding(store, loc::MemRegionVal(FR)) from:

Following is the test code:

// unnamed.c
struct B {
    int i : 2;
    int : 30;  // unnamed bit-field
};

extern void consume_B(struct B);

void bitfield_B_init(void) {
    struct B b1;
    b1.i = 1; // b1 is initialized
    consume_B(b1);
}

When analyzing the test code as c, via clang -cc1 -x c -analyze -analyzer-checker=core unnamed.c, I get the stackframes:

1# RegionStoreManager::getBindingForFieldOrElementCommon(const RegionBindingsRef &, const clang::ento::TypedValueRegion *, QualType) RegionStore.cpp:2312
2# RegionStoreManager::getBindingForField(const RegionBindingsRef &, const clang::ento::FieldRegion *) RegionStore.cpp:2170
3# RegionStoreManager::getBinding(const RegionBindingsRef &, Loc, QualType) RegionStore.cpp:1617
4# RegionStoreManager::getBinding(const void *, Loc, QualType) RegionStore.cpp:711
5# FindUninitializedField::Find(const clang::ento::TypedValueRegion *) CallAndMessageChecker.cpp:263
......
// in FindUninitializedField::Find      
      for (const auto *I : RD->fields()) {
        const FieldRegion *FR = MrMgr.getFieldRegion(I, R);
        FieldChain.push_back(I);
        T = I->getType();
        if (T->getAsStructureType()) {
          if (Find(FR))
            return true;
        } else {
          SVal V = StoreMgr.getBinding(store, loc::MemRegionVal(FR));
          if (V.isUndef())
            return true;
        }
        FieldChain.pop_back();
      }

When I switching to frame 5 and print the FR->getRawMemorySpace(), I get the type clang::ento::StackLocalsSpaceRegion *

While analyzing the test code as c++, via clang -cc1 -x c ++-analyze -analyzer-checker=core unnamed.c, I get the stackframes:

RegionStoreManager::getBindingForFieldOrElementCommon(const RegionBindingsRef &, const clang::ento::TypedValueRegion *, QualType) RegionStore.cpp:2317
RegionStoreManager::getBindingForField(const RegionBindingsRef &, const clang::ento::FieldRegion *) RegionStore.cpp:2170
RegionStoreManager::getBinding(const RegionBindingsRef &, Loc, QualType) RegionStore.cpp:1617
RegionStoreManager::getBinding(const void *, Loc, QualType) RegionStore.cpp:711
FindUninitializedField::Find(const clang::ento::TypedValueRegion *) CallAndMessageChecker.cpp:263

When I switching to frame 5 and print the FR->getRawMemorySpace(), I get the type clang::ento::StackArgumentsSpaceRegion *

The raw memory space of b1 is different in c and c++, I wonder it due to the implicit copy constructor in c++?
And the difference on FR->getRawMemorySpace() finally reaches the if check in RegionStoreManager::getBindingForFieldOrElementCommon and returns a different result.

// line 2288 in RegionStore.cpp
  if (isa<StackLocalsSpaceRegion>(R->getRawMemorySpace())) {
    if (isa<ElementRegion>(R)) {

The last statement in getBindingForFieldOrElementCommon seems a default handle, where the c++ unnamed bit-field gets its binding.

  // All other values are symbolic.
  return svalBuilder.getRegionValueSymbolVal(R);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From reading this, it seems to me that StackArgumentsSpaceRegion is wrong, as this is not an argument.
What was R->dump() in that case?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steakhal I have tested the check-clang-analysis, no more testcases break

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd say then let's remove it. CompoundVals and LCVs should behave exactly the same way wrt. the bug reports.
When proposing that change, could you look into git blame to see if we miss something about the history of that continue?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The continue line is added in commit 3720e2f, when the function tryBindSmallStruct is added. The commit log illustrates the reason for binding a LazyCompoundVal to simple struct by memberwise copy, but discusses nothing about the unnamed bit-field.
I'll make a new pull-request to fix this soon.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the idea was that a copy in C++ means a member-wise copy. And there anonym bitfields don't count.
Consequently, the padding bits may or may not get copied in a copy operation, thus they wanted to having it Undef.
I don't think it's important. We can drop this part.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the c++ standard, "Unnamed bit-fields are not members and cannot be initialized(https://eel.is/c++draft/class.bit#2.sentence-2)", and "The implicitly-defined copy/move constructor for a non-union class X performs a memberwise copy/move of its bases and members(https://eel.is/c++draft/class.copy.ctor#14.sentence-1)". So strictly speaking, an unnamed bit-field can be anything, since the memberwise copy is not required for unnamed bits.

Consider the case:

struct B {
    int i : 2;
    int : 30; 
};

void bitfield_B_init(void) {
    B b1;
    b1.i = 1; 
    memset(&b1, 0, sizeof(b1));
    B b2 = b1;   
}

The unnamed bits is set after the memset. So for the statement B b2 = b1, the unnamed bitfield in b1 is supposed to be SymbolVal, while the unnamed bitfields in b2 should be UndefinedVal.

Back to the code, neithor keeping nor removing the continue lines in tryBindSmallStruct make a perfect handling on unnamed bit-fields. A perfect rule maybe that unnamed bit-fileds must be undefined after being implicitly-defined constructed. Considering the stackframes I got before and the little benefit it could bring, implementing the rule seems to be compilicated and potentially harmful. So perhaps we should leave it unchanged?

@Tedlion
Copy link
Contributor Author

Tedlion commented Jun 26, 2025

@steakhal Thank you for reviewing. It's my first try on contributing to open source community.
I just checked the issues, it seems nothing similar, maybe because there is no false warning when the source is c++. I found the issue when using clang-tidy to check my C code.

Copy link

github-actions bot commented Jun 27, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@vbvictor
Copy link
Contributor

vbvictor commented Jun 27, 2025

Please fix formatting issues.

For first-time contributors, build workflows don't run automatically, members need to start them on each commit manually.

P.S. I created issue on this matter #146034

@Tedlion Tedlion requested a review from steakhal June 27, 2025 11:52
Tedlion and others added 3 commits June 27, 2025 20:58
remove braces on simple single statement bodies

Co-authored-by: Baranov Victor <[email protected]>
remove braces on simple single statement bodies

Co-authored-by: Baranov Victor <[email protected]>
@Tedlion
Copy link
Contributor Author

Tedlion commented Jun 27, 2025

@vbvictor Thank you for your help. But the Build and Test Windows " fails:
llvm-lit.py: C:\_work\llvm-project\llvm-project\llvm\utils\lit\lit\llvm\config.py:312: fatal: Couldn't find the include dir for Clang ('c:\_work\llvm-project\llvm-project\build\bin\clang.exe')
It seems not causing by my commit. Would you please help me with this?

@vbvictor
Copy link
Contributor

I restarted the job, there is already an issue for such fail #145703

@Tedlion Tedlion requested a review from steakhal June 28, 2025 13:26
@Tedlion
Copy link
Contributor Author

Tedlion commented Jun 28, 2025

@steakhal I've reverted my change in RegionStore.cpp, since now I realize it is more complicated than I thought. Considring the costum copy construction and direct memory operations(such as the memset), simply considering the unnamed bit-field is an UndefinedVal may be wrong , even though the UndefinedVal in my case. Thank you a lot for discussion. I may work on this afterwards, but leaving the RegionStore unchanged maybe the best choice now.

@steakhal steakhal merged commit 6504c96 into llvm:main Jul 3, 2025
7 checks passed
Copy link

github-actions bot commented Jul 3, 2025

@Tedlion Congratulations on having your first Pull Request (PR) merged into the LLVM Project!

Your changes will be combined with recent changes from other authors, then tested by our build bots. If there is a problem with a build, you may receive a report in an email or a comment on this PR.

Please check whether problems have been caused by your change specifically, as the builds can include changes from many authors. It is not uncommon for your change to be included in a build that fails due to someone else's changes, or infrastructure issues.

How to do this, and the rest of the post-merge process, is covered in detail here.

If your change does cause a problem, it may be reverted, or you can revert it yourself. This is a normal part of LLVM development. You can fix your changes and open a new PR to merge them again.

If you don't get any reports, no action is required from you. Your changes are working as expected, well done!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants