Skip to content

Changing ArrayBoundV2 callback type increased symbol complexity #89720

Not planned
@NagyDonat

Description

@NagyDonat
Contributor

While investigating #89045 @steakhal noticed that the complexity of symbols seen by the ArrayBoundV2 checker increased significantly when that checker switched to using check::PostStmt callbacks instead of the previously used check::Location.

[...] prior to this change, the maximum Sym->computeComplexity() within getTaintedSymbolsImpl was significantly lower [4-5] than after the change. After the change this maximal complexity was more around the threshold (35), 30-33.

That slowdown was fixed by #89606 which restored the efficiency of getTaintedSymbolsImpl and ensured that it runs quickly even when the symbols are complex. However, it would be still good to investigate the cause of this complexity increase.

Activity

NagyDonat

NagyDonat commented on Apr 23, 2024

@NagyDonat
ContributorAuthor

@steakhal By the way, did you happen to have ArrayBound (V1) enabled during the bisection?

I'm asking this because one very significant effect of my commit #72107 Switch to PostStmt callbacks in ArrayBoundV2 was that it "transferred" the analysis of many out-of-bounds situations from the V1 checker to ArrayBoundV2. (When they were both Location checkers, V1 activated first AFAIK because it appears earlier in Checkers.td; after the change the PostStmt callbacks of ArrayBoundV2 happen before the Location callback of V1.)

If V1 was enabled in the bisection, then perhaps the observed increase in the complexity is just caused by the fact that previously the V2 checker didn't see those complex symbols (because the V1 checker found and handled them first).

(I don't think that this is the most likely explanation of the situation, but it's worth to mention and check.)

llvmbot

llvmbot commented on Apr 23, 2024

@llvmbot
Member

@llvm/issue-subscribers-clang-static-analyzer

Author: None (NagyDonat)

While investigating #89045 @steakhal noticed that the complexity of symbols seen by the ArrayBoundV2 checker increased significantly when that checker switched to using `check::PostStmt` callbacks instead of the previously used `check::Location`.

> [...] prior to this change, the maximum Sym->computeComplexity() within getTaintedSymbolsImpl was significantly lower [4-5] than after the change. After the change this maximal complexity was more around the threshold (35), 30-33.

That slowdown was fixed by #89606 which restored the efficiency of getTaintedSymbolsImpl and ensured that it runs quickly even when the symbols are complex. However, it would be still good to investigate the cause of this complexity increase.

steakhal

steakhal commented on Apr 23, 2024

@steakhal
Contributor

If V1 was enabled in the bisection, then perhaps the observed increase in the complexity is just caused by the fact that previously the V2 checker didn't see those complex symbols (because the V1 checker found and handled them first).

I believe we only use/used OOBv2.

NagyDonat

NagyDonat commented on Apr 23, 2024

@NagyDonat
ContributorAuthor

Thanks, that's what I guessed.

NagyDonat

NagyDonat commented on May 2, 2024

@NagyDonat
ContributorAuthor

I spent some time investigating this issue and I would summarize my research as "I still don't know the answer, but I realized that the question is not interesting".

The increased symbol complexity appeared after my old change #72107 "[analyzer] Switch to PostStmt callbacks in ArrayBoundV2" moved the activation of ArrayBoundV2 to a different point within the process of the path-sensitive analysis -- so the natural conclusion is that those more complex symbols were always there at that different step of the process. (If we are measuring river pollution and notice that the pollution increased when we started to take samples from a different location, the natural conclusion is that the new spot is simply more polluted.)

I have several rough conjectures that could lead to this difference between the Location and the PostStmt callback:

  • The main goal of [analyzer] Switch to PostStmt callbacks in ArrayBoundV2 #72107 was that "after it the checker will check array[index].field while the previous implementation ignored this situation (because here the ElementRegion is wrapped in a FieldRegion object)." Note that sheervideo.c from FFMPEG mixes pointer arithmetic and field access, so it's plausible that the more complex symbols were simply not seen by the Location callback.
  • There was an old abandoned PR D159106 by @steakhal that tried to handle "Turns out check::Location is not always called when storing a value to a location." I don't remember its exact details, but effects like that could imply that some symbols are skipped by Location callbacks, but appear in the PostStmt callback.
  • The PostStmt callback triggers directly after constructing the suitable ElementRegion as the return value of the subscript expression; while the Location callback triggers significantly later, when the ElementRegion is accessed for reading or writing. In the meantime the analyzer parses several other statements and could perform some simplification / cleanup step that reduces the complexity of symbols and/or eliminates the cases with the more complex symbols.

I wasn't able to understand exactly what happens between the PostStmt and the Location (the code is very complex, and I didn't set up a debugger), but I strongly suspect that the more complex symbols were always there, but the Location callback didn't see them.

As the investigation is difficult, the complex symbols don't cause any concrete problems (after #89606) and there is probably an innocent explanation, I don't think that it's worth to spend more time on this issue. @steakhal What do you think about this? Woudl you agree with closing this ticket?

steakhal

steakhal commented on May 2, 2024

@steakhal
Contributor

As the investigation is difficult, the complex symbols don't cause any concrete problems (after #89606) and there is probably an innocent explanation, I don't think that it's worth to spend more time on this issue. @steakhal What do you think about this? Woudl you agree with closing this ticket?

Thanks for your time. I think what you say makes sense. I didn't see yet glaring drawbacks of having these "complicated symbols" other than sometimes their dump took 26 megabytes. Given that ATM I'm not focusing on this checker, I'm fine with closing this.
I'll reopen it if it turns out relevant and I have evidence backing this up.
FYI right now I'm focusing on bounding Z3 refutation times. I've seen cases when Z3 refutation took 90% or more of the total time of an analysis.

added
questionA question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!
on May 2, 2024
added a commit that references this issue on May 15, 2025
1060514
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Metadata

Metadata

Assignees

No one assigned

    Labels

    clang:static analyzerquestionA question, not bug report. Check out https://llvm.org/docs/GettingInvolved.html instead!

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

      Development

      No branches or pull requests

        Participants

        @steakhal@EugeneZelenko@NagyDonat@llvmbot

        Issue actions

          Changing ArrayBoundV2 callback type increased symbol complexity · Issue #89720 · llvm/llvm-project