Skip to content

Conversation

steakhal
Copy link
Contributor

Fixes #89045
(cherry picked from commit ce763bf)

We agreed that this slowdown was so serious, that it's basically equivalent with a crash by blocking analysis.
See the comment here from the code-owner.

In this commit we fixed the running time regression, and we want it in clang-18 too.

FYI We don't have a test for this as this only affects running times.
We also don't have an entry for this in the ReleaseNotes, as this regression was introduced by clang-18.

@steakhal steakhal added this to the LLVM 18.X Release milestone Apr 23, 2024
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Apr 23, 2024
@llvmbot
Copy link
Member

llvmbot commented Apr 23, 2024

@llvm/pr-subscribers-clang

@llvm/pr-subscribers-clang-static-analyzer-1

Author: Balazs Benics (steakhal)

Changes

Fixes #89045
(cherry picked from commit ce763bf)

We agreed that this slowdown was so serious, that it's basically equivalent with a crash by blocking analysis.
See the comment here from the code-owner.

In this commit we fixed the running time regression, and we want it in clang-18 too.

FYI We don't have a test for this as this only affects running times.
We also don't have an entry for this in the ReleaseNotes, as this regression was introduced by clang-18.


Full diff: https://github.com/llvm/llvm-project/pull/89725.diff

1 Files Affected:

  • (modified) clang/lib/StaticAnalyzer/Checkers/Taint.cpp (+6-8)
diff --git a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
index 4edb671753bf45..6362c82b009d72 100644
--- a/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
+++ b/clang/lib/StaticAnalyzer/Checkers/Taint.cpp
@@ -216,21 +216,17 @@ std::vector<SymbolRef> taint::getTaintedSymbolsImpl(ProgramStateRef State,
   std::vector<SymbolRef> TaintedSymbols;
   if (!Reg)
     return TaintedSymbols;
-  // Element region (array element) is tainted if either the base or the offset
-  // are tainted.
+
+  // Element region (array element) is tainted if the offset is tainted.
   if (const ElementRegion *ER = dyn_cast<ElementRegion>(Reg)) {
     std::vector<SymbolRef> TaintedIndex =
         getTaintedSymbolsImpl(State, ER->getIndex(), K, returnFirstOnly);
     llvm::append_range(TaintedSymbols, TaintedIndex);
     if (returnFirstOnly && !TaintedSymbols.empty())
       return TaintedSymbols; // return early if needed
-    std::vector<SymbolRef> TaintedSuperRegion =
-        getTaintedSymbolsImpl(State, ER->getSuperRegion(), K, returnFirstOnly);
-    llvm::append_range(TaintedSymbols, TaintedSuperRegion);
-    if (returnFirstOnly && !TaintedSymbols.empty())
-      return TaintedSymbols; // return early if needed
   }
 
+  // Symbolic region is tainted if the corresponding symbol is tainted.
   if (const SymbolicRegion *SR = dyn_cast<SymbolicRegion>(Reg)) {
     std::vector<SymbolRef> TaintedRegions =
         getTaintedSymbolsImpl(State, SR->getSymbol(), K, returnFirstOnly);
@@ -239,6 +235,8 @@ std::vector<SymbolRef> taint::getTaintedSymbolsImpl(ProgramStateRef State,
       return TaintedSymbols; // return early if needed
   }
 
+  // Any subregion (including Element and Symbolic regions) is tainted if its
+  // super-region is tainted.
   if (const SubRegion *ER = dyn_cast<SubRegion>(Reg)) {
     std::vector<SymbolRef> TaintedSubRegions =
         getTaintedSymbolsImpl(State, ER->getSuperRegion(), K, returnFirstOnly);
@@ -318,4 +316,4 @@ std::vector<SymbolRef> taint::getTaintedSymbolsImpl(ProgramStateRef State,
     }
   }
   return TaintedSymbols;
-}
\ No newline at end of file
+}

Previously the function
```
std::vector<SymbolRef> taint::getTaintedSymbolsImpl(ProgramStateRef State,
                                                    const MemRegion *Reg,
                                                    TaintTagType K,
                                                    bool returnFirstOnly)
```
(one of the 4 overloaded variants under this name) was handling element
regions in a highly inefficient manner: it performed the "also examine
the super-region" step twice. (Once in the branch for element regions,
and once in the more general branch for all `SubRegion`s -- note that
`ElementRegion` is a subclass of `SubRegion`.)

As pointer arithmetic produces `ElementRegion`s, it's not too difficult
to get a chain of N nested element regions where this inefficient
recursion would produce 2^N calls.

This commit is essentially NFC, apart from the performance improvements
and the removal of (probably irrelevant) duplicate entries from the
return value of `getTaintedSymbols()` calls.

Fixes #89045

(cherry picked from commit ce763bf)
@tstellar tstellar merged commit 3b4ba72 into llvm:release/18.x Apr 25, 2024
@tstellar
Copy link
Collaborator

tstellar commented May 1, 2024

Hi @steakhal (or anyone else). If you would like to add a note about this fix in the release notes (completely optional). Please reply to this comment with a one or two sentence description of the fix. When you are done, please add the release:note label to this PR.

@steakhal steakhal deleted the bb/backport-89606 branch May 2, 2024 16:12
@steakhal
Copy link
Contributor Author

steakhal commented May 2, 2024

Hi @steakhal (or anyone else). If you would like to add a note about this fix in the release notes (completely optional). Please reply to this comment with a one or two sentence description of the fix. When you are done, please add the release:note label to this PR.

I think this would work: "In previous dot releases, we had a critical slowdown on analyzing code hashing or doing many array accesses. This bug did not affect previous major releases. See the details at issue #89045."

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:static analyzer clang Clang issues not falling into any other category release:backport release:note
Projects
Development

Successfully merging this pull request may close these issues.

5 participants