Description
Rationale
Removing all entries in a Dictionary that satisfy a predicate is a fairly common requirement, but doing it with the current API is unnecessarily inefficient.
Motivation
This came up recently in the context of the efficiency of a 1st party service.
Details
At present, to remove all entries in a dictionary satisfying a predicate, you must do something like:
// TKey stands for the dictionary's key type.
var removes = new List<TKey>();
foreach (var entry in dict)
{
    if (predicate(entry.Key))
        removes.Add(entry.Key);
}
foreach (var key in removes)
{
    dict.Remove(key);
}
or, equivalently, with LINQ:
foreach (var entry in dict.Where(entry => predicate(entry.Key)).ToList())
{
    dict.Remove(entry.Key);
}
Both versions make two passes (roughly 2n operations) and pay for allocating and resizing the list, in about 10 lines of code. The second pass, over the list of keys, also has to compute a hash code for every remove.
If we offered a RemoveAll method on Dictionary<TKey, TValue> that accepted a predicate, we could do it in a single pass, with no list, in 1 line, and with no hash code computation:
dict.RemoveAll(predicate);
Essentially this is because internally we are able to safely remove from the dictionary while iterating over it forwards, while the public iterator will throw if you attempt to do this.
Proposed API
namespace System.Collections.Generic
{
    public class Dictionary<TKey, TValue>
    {
        public int RemoveAll(Predicate<TKey> match) { throw null; }
        public int RemoveAll(Predicate<KeyValuePair<TKey, TValue>> match) { throw null; }
    }
}
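To make the intended semantics concrete, here is a minimal sketch of the two proposed overloads written as extension methods. The class name `DictionaryRemoveAllSketch` is hypothetical, and because an extension method cannot reach the dictionary's internal storage, it still uses the two-pass approach with a temporary list. It illustrates only the shape and return value of the API, not the single-pass implementation the proposal is asking for.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: emulates the proposed RemoveAll overloads using only
// the public Dictionary API. It is NOT the single-pass internal implementation.
public static class DictionaryRemoveAllSketch
{
    // Removes every entry whose key/value pair satisfies `match`
    // and returns the number of entries removed.
    public static int RemoveAll<TKey, TValue>(
        this Dictionary<TKey, TValue> dict,
        Predicate<KeyValuePair<TKey, TValue>> match)
        where TKey : notnull
    {
        if (dict == null) throw new ArgumentNullException(nameof(dict));
        if (match == null) throw new ArgumentNullException(nameof(match));

        // First pass: collect matching keys (this is the allocation the
        // proposal wants to avoid).
        var toRemove = new List<TKey>();
        foreach (var entry in dict)
        {
            if (match(entry))
                toRemove.Add(entry.Key);
        }

        // Second pass: remove by key, which re-hashes each key.
        foreach (var key in toRemove)
            dict.Remove(key);

        return toRemove.Count;
    }

    // Key-only overload, expressed in terms of the pair-based one.
    public static int RemoveAll<TKey, TValue>(
        this Dictionary<TKey, TValue> dict,
        Predicate<TKey> match)
        where TKey : notnull
    {
        if (match == null) throw new ArgumentNullException(nameof(match));
        return dict.RemoveAll((KeyValuePair<TKey, TValue> entry) => match(entry.Key));
    }
}
```

With this in scope, a call such as `int removed = dict.RemoveAll(kv => kv.Value < 0);` reads the same way the proposed instance method would.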
Existing API
Here are existing precedents:
public class List<T>
{
    public int RemoveAll(Predicate<T> match) { throw null; }
}
public partial class SortedSet<T>
{
    public int RemoveWhere(Predicate<T> match) { throw null; }
}
public class HashSet<T>
{
    public int RemoveWhere(Predicate<T> match) { throw null; }
}
The return value is the number of entries removed. That's easy for us to compute and potentially useful, and it matches the existing method on List<T>.
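As a quick illustration of that precedent, the sketch below calls the existing `List<T>.RemoveAll` and `HashSet<T>.RemoveWhere` methods and captures their integer results. The class and method names (`RemovePrecedents`, `RemoveEvens`) are made up for the example.

```csharp
using System.Collections.Generic;

// Hypothetical demo class: shows that the existing predicate-based removal
// methods report how many elements they removed.
public static class RemovePrecedents
{
    public static (int FromList, int FromSet) RemoveEvens()
    {
        var list = new List<int> { 1, 2, 3, 4, 5, 6 };
        var set = new HashSet<int> { 1, 2, 3, 4, 5, 6 };

        // Both calls remove 2, 4 and 6 and return the removal count.
        int fromList = list.RemoveAll(x => x % 2 == 0);
        int fromSet = set.RemoveWhere(x => x % 2 == 0);

        return (fromList, fromSet);
    }
}
```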
We do not return a list of the removed items, as that would bring the allocation back. (On the immutable collections, the RemoveAll methods do return the list, but they can do it without allocating.)
Open issues
- Is the overload that takes key and value worthwhile? I don't have a scenario, but it seems reasonable.
- Should this be on any other collections? I suggest not (unless a scenario arises).
  - SortedList<TKey, TValue> copies half of its storage on every single remove. With a predicate it could keep a temporary list of the holes it removed, and then coalesce in one go. With enough holes, that might be faster than Array.Copy for each remove.
  - SortedDictionary<TKey, TValue> would need investigation to figure out whether it is possible to safely enumerate its backing tree while modifying it.
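For the SortedList<TKey, TValue> idea, one possible way to coalesce in a single pass is sketched below. It is a variant of the suggestion above: instead of recording holes in a temporary list, it compacts two parallel arrays in place with separate read and write indices. The parallel `keys`/`values` arrays and the `count` parameter are assumptions made for the example, not a description of the real SortedList internals, and the names `SortedStorageSketch` and `Compact` are hypothetical.

```csharp
using System;

// Hypothetical sketch: single-pass, in-place compaction of parallel
// key/value arrays. Sorted order of the surviving entries is preserved.
public static class SortedStorageSketch
{
    // Drops every entry among the first `count` whose key satisfies `match`
    // and returns the new logical count.
    public static int Compact<TKey, TValue>(
        TKey[] keys, TValue[] values, int count, Predicate<TKey> match)
    {
        int write = 0;
        for (int read = 0; read < count; read++)
        {
            if (match(keys[read]))
                continue; // entry is removed: leave a hole behind

            if (write != read)
            {
                // Slide the surviving entry down into the earliest hole.
                keys[write] = keys[read];
                values[write] = values[read];
            }
            write++;
        }

        // Clear the now-unused tail so stale references can be collected.
        Array.Clear(keys, write, count - write);
        Array.Clear(values, write, count - write);
        return write;
    }
}
```

Each surviving entry is moved at most once, so the total work is linear in `count` regardless of how many entries are removed, whereas removing entries one at a time shifts the tail of the arrays on every remove.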
[edit - changed to add existing API, and to match List<T>]