@philnik777 philnik777 commented Aug 7, 2025

Instead of calling the single-element erase on every element of the range, we can combine some of the operations in a custom implementation. Specifically, we don't need to search for the previous node or re-link the list on every iteration. Removing this unnecessary work results in some nice performance improvements (up to roughly 2.3x for the non-empty sizes measured; the empty-range case is unchanged):

-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                                                             old           new
-----------------------------------------------------------------------------------------------------------------------
std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/0                    457 ns        459 ns
std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/32                   995 ns        626 ns
std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/1024               18196 ns       7995 ns
std::unordered_set<int>::erase(iterator, iterator) (erase half the container)/8192              124722 ns      70125 ns
std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/0            456 ns        461 ns
std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/32          1183 ns        769 ns
std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/1024       27827 ns      18614 ns
std::unordered_set<std::string>::erase(iterator, iterator) (erase half the container)/8192      266681 ns     226107 ns
std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/0               455 ns        462 ns
std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/32              996 ns        659 ns
std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/1024          15963 ns       8108 ns
std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/8192         136493 ns      71848 ns
std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/0               454 ns        455 ns
std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/32              985 ns        703 ns
std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/1024          16277 ns       9085 ns
std::unordered_multiset<int>::erase(iterator, iterator) (erase half the container)/8192         125736 ns      82710 ns
std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/0          457 ns        454 ns
std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/32        1091 ns        646 ns
std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/1024     17784 ns       7664 ns
std::unordered_multimap<int, int>::erase(iterator, iterator) (erase half the container)/8192    127098 ns      72806 ns
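
For readers who want to see the operation being measured, here is a minimal sketch of an "erase half the container" call through the public API. The helper name `erase_first_half` is hypothetical, and erasing the first half of the iteration order is an assumption; the benchmark source is not part of this conversation.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <unordered_set>

// Hypothetical helper (not taken from the PR or its benchmark): fill a set
// with n distinct ints, then erase the first n/2 elements of its iteration
// order with one call to the (iterator, iterator) overload of erase.
inline std::size_t erase_first_half(std::size_t n) {
  std::unordered_set<int> set;
  for (std::size_t i = 0; i < n; ++i)
    set.insert(static_cast<int>(i));

  auto first = set.begin();
  auto last  = std::next(first, static_cast<std::ptrdiff_t>(n / 2));

  // The range overload returns the iterator following the last erased
  // element, which is `last` itself (it was not erased, so it stays valid).
  auto it = set.erase(first, last);
  assert(it == last);
  (void)it;  // silence unused-variable warnings when NDEBUG is defined

  return set.size();  // number of elements that survived
}
```

Calling `erase_first_half(1024)` leaves 512 elements in the set. This overload is the one that the change reimplements inside `__hash_table`.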

github-actions bot commented Aug 7, 2025

✅ With the latest revision this PR passed the C/C++ code formatter.

@philnik777 philnik777 marked this pull request as ready for review August 11, 2025 19:53
@philnik777 philnik777 requested a review from a team as a code owner August 11, 2025 19:53
@llvmbot llvmbot added the libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. label Aug 11, 2025
llvmbot commented Aug 11, 2025

@llvm/pr-subscribers-libcxx

Author: Nikolas Klauser (philnik777)

Changes
----------------------------------------------------------------------------------------------------------------
Benchmark                                                                                        old         new
----------------------------------------------------------------------------------------------------------------
std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/0          450 ns      446 ns
std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/32        1017 ns      614 ns
std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/1024     16035 ns     7747 ns
std::unordered_map<int, int>::erase(iterator, iterator) (erase half the container)/8192    122107 ns    73020 ns

Full diff: https://github.com/llvm/llvm-project/pull/152471.diff

1 Files Affected:

  • (modified) libcxx/include/__hash_table (+49-5)
diff --git a/libcxx/include/__hash_table b/libcxx/include/__hash_table
index dacc152030e14..2f0f9457f1416 100644
--- a/libcxx/include/__hash_table
+++ b/libcxx/include/__hash_table
@@ -1848,12 +1848,56 @@ __hash_table<_Tp, _Hash, _Equal, _Alloc>::erase(const_iterator __p) {
 template <class _Tp, class _Hash, class _Equal, class _Alloc>
 typename __hash_table<_Tp, _Hash, _Equal, _Alloc>::iterator
 __hash_table<_Tp, _Hash, _Equal, _Alloc>::erase(const_iterator __first, const_iterator __last) {
-  for (const_iterator __p = __first; __first != __last; __p = __first) {
-    ++__first;
-    erase(__p);
+  if (__first == __last)
+    return iterator(__last.__node_);
+
+  // current node
+  __next_pointer __current = __first.__node_;
+  size_type __bucket_count = bucket_count();
+  size_t __chash = std::__constrain_hash(__current->__hash(), __bucket_count);
+  // find previous node
+  __next_pointer __before_first = __bucket_list_[__chash];
+  for (; __before_first->__next_ != __current; __before_first = __before_first->__next_)
+    ;
+
+  __next_pointer __end = __last.__node_;
+
+  // If __before_first is in the same bucket, clear this bucket first without re-linking it
+  if (__before_first != __first_node_.__ptr() &&
+      std::__constrain_hash(__before_first->__hash(), __bucket_count) == __chash) {
+    while (__current != __end) {
+      if (auto __next_chash = std::__constrain_hash(__current->__hash(), __bucket_count); __next_chash != __chash) {
+        __chash = __next_chash;
+        break;
+      }
+      auto __next = __current->__next_;
+      __node_traits::deallocate(__node_alloc(), __current->__upcast(), 1);
+      __current = __next;
+      --__size_;
+    }
   }
-  __next_pointer __np = __last.__node_;
-  return iterator(__np);
+
+  while (__current != __end) {
+    auto __next = __current->__next_;
+    __node_traits::deallocate(__node_alloc(), __current->__upcast(), 1);
+    __current = __next;
+    --__size_;
+
+    // When switching buckets, set the old bucket to be empty and update the next bucket to have __before_first as its
+    // before-first element
+    if (__next) {
+      if (auto __next_chash = std::__constrain_hash(__next->__hash(), __bucket_count); __next_chash != __chash) {
+        __bucket_list_[__chash] = nullptr;
+        __chash                 = __next_chash;
+        __bucket_list_[__chash] = __before_first;
+      }
+    }
+  }
+
+  // re-link __before_first with __last
+  __before_first->__next_ = __current;
+
+  return iterator(__last.__node_);
 }
 
 template <class _Tp, class _Hash, class _Equal, class _Alloc>
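
The idea in the diff (find the predecessor once, free nodes in a loop, fix up bucket entries only when the walk crosses a bucket boundary, and re-link once at the end) can be illustrated with a toy model. This is a sketch under stated assumptions, not libc++'s `__hash_table`: `Node`, `ToyTable`, `erase_range`, and `check_range_erase` are all hypothetical names, the hash is the identity, and the bucket count is fixed and must be non-zero. As in the diff, each bucket entry points at the node before the bucket's first node.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; every name here is illustrative, none comes from libc++.
struct Node {
  Node* next = nullptr;
  std::size_t hash = 0;
  int value = 0;
};

struct ToyTable {
  Node head;                   // sentinel in front of the first real node
  std::vector<Node*> buckets;  // buckets[b]: node *before* bucket b's first node, or nullptr
  std::size_t size = 0;

  explicit ToyTable(std::size_t bucket_count) : buckets(bucket_count, nullptr) {}
  ToyTable(const ToyTable&) = delete;
  ToyTable& operator=(const ToyTable&) = delete;
  ~ToyTable() { erase_range(head.next, nullptr); }

  std::size_t bucket_of(const Node* n) const { return n->hash % buckets.size(); }

  void insert(int v) {
    Node* n = new Node;
    n->hash = static_cast<std::size_t>(v);  // identity hash keeps the example predictable
    n->value = v;
    std::size_t b = bucket_of(n);
    if (Node* before = buckets[b]) {  // non-empty bucket: splice in at its front
      n->next = before->next;
      before->next = n;
    } else {  // empty bucket: it becomes the new front of the whole list
      n->next = head.next;
      head.next = n;
      buckets[b] = &head;
      if (n->next)
        buckets[bucket_of(n->next)] = n;  // old front bucket now starts after n
    }
    ++size;
  }

  Node* find(int v) const {
    std::size_t b = static_cast<std::size_t>(v) % buckets.size();
    Node* before = buckets[b];
    if (!before)
      return nullptr;
    for (Node* n = before->next; n && bucket_of(n) == b; n = n->next)
      if (n->value == v)
        return n;
    return nullptr;
  }

  // Erase [first, last); last may be nullptr, meaning "end of list".
  void erase_range(Node* first, Node* last) {
    if (first == last)
      return;
    std::size_t b = bucket_of(first);
    Node* before = buckets[b];  // search for the predecessor once
    while (before->next != first)
      before = before->next;
    // True if bucket b keeps nodes in front of the erased range.
    bool keeps_prefix = before != &head && bucket_of(before) == b;

    Node* cur = first;
    while (cur != last) {
      Node* next = cur->next;
      delete cur;
      --size;
      cur = next;
      // buckets.size() is never a valid index, so it stands for "end of list".
      std::size_t nb = cur ? bucket_of(cur) : buckets.size();
      if (nb != b) {  // crossed a bucket boundary
        if (!keeps_prefix)
          buckets[b] = nullptr;  // the bucket we left is now empty
        if (cur)
          buckets[nb] = before;  // the bucket we entered now starts after `before`
        b = nb;
        keeps_prefix = false;
      }
    }
    before->next = last;  // re-link once
  }
};

// Erase list positions [from, to) and check that every survivor is still
// findable and every erased value is gone.
inline bool check_range_erase(std::size_t bucket_count, int n, std::size_t from, std::size_t to) {
  ToyTable t(bucket_count);
  for (int i = 0; i < n; ++i)
    t.insert(i);
  std::vector<Node*> order;
  std::vector<int> values;
  for (Node* p = t.head.next; p; p = p->next) {
    order.push_back(p);
    values.push_back(p->value);
  }
  order.push_back(nullptr);  // position n is the end of the list
  t.erase_range(order[from], order[to]);
  bool ok = t.size == static_cast<std::size_t>(n) - (to - from);
  for (std::size_t i = 0; i < values.size(); ++i)
    ok = ok && ((t.find(values[i]) != nullptr) != (from <= i && i < to));
  return ok;
}
```

The part that needs care is the boundary handling: whenever the walk leaves a bucket, the entry for the bucket it enters has to be redirected to the surviving predecessor, because the node that entry used to point at has just been freed. The toy model does not claim to reproduce the PR's exact control flow.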

@philnik777 philnik777 force-pushed the optimize_hash_table_erase branch 2 times, most recently from 40ad452 to ff5a8ec Compare August 18, 2025 13:49
@philnik777 philnik777 force-pushed the optimize_hash_table_erase branch from ff5a8ec to e006595 Compare August 25, 2025 07:11
@philnik777 philnik777 force-pushed the optimize_hash_table_erase branch from e006595 to 7bc9893 Compare August 25, 2025 13:33
@philnik777 philnik777 merged commit e4eccd6 into llvm:main Aug 25, 2025
67 of 75 checks passed
@philnik777 philnik777 deleted the optimize_hash_table_erase branch August 25, 2025 19:45
@boomanaiden154

@philnik777 I'm seeing a regression under ASan after this patch: a use-after-free. The following code snippet reports the use-after-free for me in the final call to equal_range:

#include <unordered_map>
#include <utility>

typedef std::unordered_multimap<void*, void*> mapType;
typedef std::pair<mapType::iterator, mapType::iterator> erasePair;

int main(int argc, char** argv) {
  mapType map;
  map.insert(mapType::value_type((void*)0x7e4df9645ab8, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df9649868, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df9649cf0, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df964a200, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df964c700, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df964cad8, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df964cda0, (void*)5));
  // equal_range: 0x7e4df9645ab8
  erasePair pair1 = map.equal_range((void*)0x7e4df9645ab8);
  map.erase(pair1.first, pair1.second);
  map.insert(mapType::value_type((void*)0x7e4df9645f60, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df96505a8, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df964c700, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df964c700, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df9649cf0, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df9650078, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df9652258, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df9652638, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df96509d0, (void*)5));
  // equal_range: 0x7e4df9649868
  erasePair pair2 = map.equal_range((void*)0x7e4df9649868);
  map.erase(pair2.first, pair2.second);
  map.insert(mapType::value_type((void*)0x7e4df964def0, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df96598b0, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df965a160, (void*)5));
  // equal_range: 0x7e4df96505a8
  erasePair pair3 = map.equal_range((void*)0x7e4df96505a8);
  map.erase(pair3.first, pair3.second);
  map.insert(mapType::value_type((void*)0x7e4df9653810, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df9666340, (void*)5));
  // equal_range: 0x7e4df964c700
  erasePair pair4 = map.equal_range((void*)0x7e4df964c700);
  map.erase(pair4.first, pair4.second);
  map.insert(mapType::value_type((void*)0x7e4df965a880, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df965a880, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df965a880, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df96680c8, (void*)5));
  // equal_range: 0x7e4df9652258
  erasePair pair5 = map.equal_range((void*)0x7e4df9652258);
  map.erase(pair5.first, pair5.second);
  map.insert(mapType::value_type((void*)0x7e4df9666d98, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df966c608, (void*)5));
  map.insert(mapType::value_type((void*)0x7e4df966c9e0, (void*)5));
  erasePair test_pair = map.equal_range((void*)0x7e4df9649cf0);
  if (test_pair.first == test_pair.second) {
    return 1;
  }
  return 0;
}

Are you able to take a look?
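
A reproducer like the one above depends on specific pointer values. One way to look for similar failures with less hand-tuning is a small differential stress test. The sketch below is an assumption about how such a test could look, not something posted in this thread: `range_erase_matches_reference` is a hypothetical helper that applies the same random inserts and `equal_range` plus range-erase calls to a `std::unordered_multimap` and to a `std::multimap` used as a reference, and compares the two after every step.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <random>
#include <unordered_map>

// Hypothetical stress helper (not from the PR): returns false as soon as the
// unordered container disagrees with the ordered reference container.
inline bool range_erase_matches_reference(unsigned seed, int steps) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> key_dist(0, 63);
  std::uniform_int_distribution<int> op_dist(0, 3);

  std::unordered_multimap<int, int> tested;
  std::multimap<int, int> reference;

  for (int i = 0; i < steps; ++i) {
    int key = key_dist(rng);
    if (op_dist(rng) != 0) {  // roughly three inserts for every erase
      tested.insert({key, i});
      reference.insert({key, i});
    } else {
      // Same pattern as the reproducer: equal_range, then range erase.
      auto range = tested.equal_range(key);
      tested.erase(range.first, range.second);
      reference.erase(key);
    }

    if (tested.size() != reference.size())
      return false;
    // Look up every possible key so that stale bucket links get exercised.
    for (int k = 0; k <= 63; ++k)
      if (tested.count(k) != reference.count(k))
        return false;
  }
  return true;
}
```

With a correct standard library the function returns `true` for any seed. Building it with `-fsanitize=address` lets ASan report memory errors that do not happen to change the observable counts.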

boomanaiden154 added a commit to boomanaiden154/llvm-project that referenced this pull request Sep 16, 2025
…vm#152471)"

This reverts commit e4eccd6.

This was causing ASan failures in some situations involving unordered
multimap containers. Details and a reproducer were posted on the
original PR (llvm#152471).
boomanaiden154 added a commit that referenced this pull request Sep 17, 2025
#158769)

…52471)"

This reverts commit e4eccd6.

This was causing ASan failures in some situations involving unordered
multimap containers. Details and a reproducer were posted on the
original PR (#152471).