Allow Dictionary<K,V>.Remove during enumeration #26314

@danmoseley

Description

Rationale

Removing all entries in a Dictionary satisfying a predicate is a fairly common requirement but it is unnecessarily inefficient.

Motivation

This came up recently in the context of the efficiency of a 1st party service.

Details

At present to remove all entries in a dictionary satisfying a predicate you must do something like

var removes = new List<TKey>();
foreach (var entry in dict)
{
    if (predicate(entry.Key))
        removes.Add(entry.Key);
}
foreach (var key in removes)
{
    dict.Remove(key);
}

or equivalently in LINQ

foreach (var entry in dict.Where(entry => predicate(entry.Key)).ToList())
{
    dict.Remove(entry.Key);
}

This makes two passes over the data (still O(n), but with twice the work) and pays for allocating and resizing the list, in about 10 lines. The second pass also has to compute a hash code for every key it removes.

If Dictionary<K,V> offered a RemoveAll method that accepted a predicate, we could do it in a single pass with no list, in 1 line, and with no hash code computation:

dict.RemoveAll(predicate);

This works because internally we can safely remove from the dictionary while iterating forwards over it, whereas the public enumerator will throw if you attempt to do this.

Proposed API

namespace System.Collections.Generic
{
    public class Dictionary<TKey, TValue>
    {
        public int RemoveAll(Predicate<TKey> match) { throw null; }
        public int RemoveAll(Predicate<KeyValuePair<TKey, TValue>> match) { throw null; }
    }
}
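To make the intended semantics concrete, here is a minimal sketch built only on the public API. The class `DictionaryRemoveAllSketch` and the method `RemoveAllSketch` are hypothetical names used for illustration. Because it cannot reach the dictionary's internals, this version still needs the temporary list and rehashes every removed key; it only shows the call shape and the returned count, not the efficiency the proposal is after.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the BCL. It emulates the proposed
// RemoveAll semantics through the public API only.
public static class DictionaryRemoveAllSketch
{
    public static int RemoveAllSketch<TKey, TValue>(
        this Dictionary<TKey, TValue> dict,
        Predicate<KeyValuePair<TKey, TValue>> match)
    {
        if (dict == null) throw new ArgumentNullException(nameof(dict));
        if (match == null) throw new ArgumentNullException(nameof(match));

        // First pass: collect the keys of matching entries.
        var doomed = new List<TKey>();
        foreach (KeyValuePair<TKey, TValue> entry in dict)
        {
            if (match(entry))
                doomed.Add(entry.Key);
        }

        // Second pass: remove them (each Remove rehashes the key).
        foreach (TKey key in doomed)
        {
            dict.Remove(key);
        }

        // Like List<T>.RemoveAll, report how many entries were removed.
        return doomed.Count;
    }
}
```

A call such as `dict.RemoveAllSketch(e => predicate(e.Key))` covers the key-only case with the same helper.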

Existing API

Here are existing precedents:

    public class List<T> 
    {
        public int RemoveAll(Predicate<T> match) { throw null; } 
    }

    public partial class SortedSet<T>
    {
        public int RemoveWhere(Predicate<T> match) { throw null; }
    }

    public class HashSet<T>
    {
        public int RemoveWhere(Predicate<T> match) { throw null; }
    }

The return value is the number of entries removed. That's easy for us to compute and potentially useful, and matches the existing method on List<T>.
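For reference, the existing precedents can be exercised as below. `PrecedentDemo` and `Run` are illustrative names only; `List<T>.RemoveAll` and `HashSet<T>.RemoveWhere` are the existing methods listed above, and both return the count of elements removed.

```csharp
using System;
using System.Collections.Generic;

public static class PrecedentDemo
{
    // Returns the counts reported by the two existing predicate-based removals.
    public static (int fromList, int fromSet) Run()
    {
        var list = new List<int> { 1, 2, 3, 4, 5, 6 };
        var set = new HashSet<int> { 1, 2, 3, 4, 5, 6 };

        // Removes 2, 4 and 6 from the list.
        int fromList = list.RemoveAll(x => x % 2 == 0);

        // Removes 5 and 6 from the set.
        int fromSet = set.RemoveWhere(x => x > 4);

        return (fromList, fromSet);
    }
}
```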

We do not return a list of the items removed, because that would bring the allocation back. (On the immutable collections, the RemoveAll methods do return the list, but they can do it without allocating.)

Open issues

  1. Is the overload that takes key and value worthwhile? I don't have a scenario, but it seems reasonable.
  2. Should this be on any other collections? I suggest not (unless a scenario arises).
    * SortedList<TKey, TValue> copies, on average, half of its storage on every single remove. With a predicate it could keep a temporary list of the holes it created and then coalesce them in one go. With enough holes, that might be faster than an Array.Copy for each remove.
    * SortedDictionary<TKey, TValue> would need investigation to figure out whether it is possible to safely enumerate its backing tree while modifying it.
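The SortedList idea can be sketched on a pair of parallel arrays, which is an assumption about the storage layout made for illustration; `CompactionSketch` and `RemoveAllInPlace` are hypothetical names. This variant skips the explicit list of holes and compacts as it scans, so each surviving entry is moved at most once instead of once per removal, and the relative (sorted) order is preserved.

```csharp
using System;

public static class CompactionSketch
{
    // Removes every entry whose key matches, compacting the first `count`
    // slots of both arrays in one pass. Returns the number removed;
    // survivors end up in [0, count - removed) in their original order.
    public static int RemoveAllInPlace<TKey, TValue>(
        TKey[] keys, TValue[] values, int count, Predicate<TKey> match)
    {
        int write = 0;
        for (int read = 0; read < count; read++)
        {
            if (match(keys[read]))
                continue; // this slot becomes a hole

            if (write != read)
            {
                // Slide the survivor down over the holes seen so far.
                keys[write] = keys[read];
                values[write] = values[read];
            }
            write++;
        }

        // Clear the vacated tail so removed references can be collected.
        Array.Clear(keys, write, count - write);
        Array.Clear(values, write, count - write);
        return count - write;
    }
}
```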

[edit - changed to add existing API, and to match List<T>]

Metadata

Labels

api-approved: API was approved in API review, it can be implemented
area-System.Collections
good first issue: Issue should be easy to implement, good for first-time contributors
