Background and motivation
The CollectionsMarshal class was introduced to expose a collection's underlying data by reference, avoiding unnecessary copies of large struct values when only a small part of the value is accessed. For Dictionary<TKey, TValue>, it provides methods that manipulate individual key-value pairs, but these methods all require access through a known key. When the user wants to perform a batch operation on many elements of the dictionary, enumerating the keys and then doing a hash-table lookup for each one is not the most efficient approach.
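For reference, this is the existing per-key pattern (a minimal sketch using the real CollectionsMarshal.GetValueRefOrNullRef and Unsafe.IsNullRef APIs available in .NET 6):
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

Dictionary<int, long> counters = new() { [1] = 10, [2] = 20 };

// One hash computation and one lookup for this single key; fine for point access,
// but wasteful when repeated for every key during a full enumeration.
ref long counter = ref CollectionsMarshal.GetValueRefOrNullRef(counters, 1);
if (!Unsafe.IsNullRef(ref counter))
{
    counter += 5; // mutates the stored value in place, no copy
}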
Consider a very common use case of the Dictionary<,> class: the user treats it like an in-memory database table that stores large struct values, each containing the data of one row, with all values indexed by a primary key that is itself part of the row data but immutable. When reading and writing one row with a known primary key, existing methods like GetValueRefOrNullRef are sufficient. But if the user wants to visit all rows and do some work on each of them (or on those that meet a specific condition), it would be better to have a way to enumerate the dictionary that returns a ref to each value. This issue mainly affects Dictionary<,>, because List<T> already exposes its underlying storage as a Span<T>, which can be enumerated with a for (int i = 0; ...) loop to obtain the references.
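For comparison, the pattern List<T> already enables today (a minimal sketch using the real CollectionsMarshal.AsSpan API; the Row struct is only an illustration):
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

List<Row> rows = new() { new Row { Id = 1, Balance = 100 } };

// AsSpan exposes the list's backing array; indexing the span yields a ref,
// so each element is updated in place without copying the struct.
Span<Row> span = CollectionsMarshal.AsSpan(rows);
for (int i = 0; i < span.Length; i++)
{
    span[i].Balance += 10;
}

struct Row
{
    public int Id;        // primary key
    public long Balance;  // example payload field
}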
Another common usage pattern for a dictionary is removing specific items while enumerating. For List<T>, this can be done with a sequential enumeration. The Dictionary<,> class already allows removing items while enumerating, but to find the item being removed, the dictionary must perform additional hash-table lookups. This can be avoided if we provide an enumerator that itself supports removal. A sketch of the current pattern, with its redundant lookup, is shown below.
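A minimal sketch of what users write today (removing the current key during foreach is supported since .NET Core 3.0; note that Remove must re-hash a key the enumerator has already located):
Dictionary<int, long> counters = new() { [1] = -1, [2] = 20 };

foreach (var pair in counters)
{
    if (pair.Value == -1)
    {
        // The enumerator already found this entry, but Remove hashes the key
        // and walks the bucket chain again to locate it.
        counters.Remove(pair.Key);
    }
}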
In summary, with a new enumerator we should be able to:
- Get a ref TValue without additional hash calculations and lookups.
- Remove the enumerated item without additional hash calculations and lookups.

To maximize performance, these two objectives do not have to share one implementation; it may be better to address them separately, as mentioned in the discussion below. However, I will still provide an alternative that aims to solve both together, because the two objectives can both be seen as avoiding unnecessary hash calculations and lookups during enumeration.
API Proposal
Method 1: sequentially enumerate each entry, returning a reference to the entry struct.
This maximizes performance but does not allow removal while enumerating.
namespace System.Runtime.InteropServices
{
    public static class CollectionsMarshal
    {
        public static DictionaryReferenceEnumerable<TKey, TValue> EnumerateByReference<TKey, TValue>(Dictionary<TKey, TValue> dictionary) => new(dictionary);

        public readonly struct DictionaryReferenceEnumerable<TKey, TValue>
        {
            private readonly Dictionary<TKey, TValue> _dictionary;

            internal DictionaryReferenceEnumerable(Dictionary<TKey, TValue> dictionary) => _dictionary = dictionary;

            public Enumerator GetEnumerator() => new(_dictionary);

            public struct Enumerator
            {
                public Enumerator(Dictionary<TKey, TValue> dictionary) { ... }

                public bool MoveNext() { ... }

                public ref Entry Current
                {
                    get => ref Unsafe.As<Dictionary<TKey, TValue>.Entry, Entry>(ref _dictionary._entries[...]);
                }
            }

            public struct Entry
            {
                private readonly uint _hashCode;
                private readonly int _next;
                public readonly TKey Key;
                public TValue Value;
            }
        }
    }
}
Method 2: enumerate bucket by bucket, returning a struct that holds the enumerator's internal state (the current bucket and the last entry index).
This sacrifices some performance, but it allows a Remove method to be added.
namespace System.Runtime.InteropServices
{
    public static class CollectionsMarshal
    {
        public static DictionaryReferenceEnumerable<TKey, TValue> EnumerateByReference<TKey, TValue>(Dictionary<TKey, TValue> dictionary) => new(dictionary);

        public readonly struct DictionaryReferenceEnumerable<TKey, TValue>
        {
            private readonly Dictionary<TKey, TValue> _dictionary;

            internal DictionaryReferenceEnumerable(Dictionary<TKey, TValue> dictionary) { ... }

            public Enumerator GetEnumerator() => new(_dictionary);

            public struct Enumerator
            {
                private EnumeratorData _data;

                internal Enumerator(Dictionary<TKey, TValue> dictionary) => _data = new(dictionary);

                public bool MoveNext() { ... }

                public EnumeratorData Current
                {
                    get => _data;
                }
            }

            public struct EnumeratorData
            {
                // Internal state of the enumerator: bucket index, last node index, etc.
                internal EnumeratorData(Dictionary<TKey, TValue> dictionary) { ... }
                internal bool MoveNext() { ... }

                public TKey Key => ...;
                public ref TValue Value => ...;
                public void Remove() { ... }
            }
        }
    }
}
API Usage
Method 1:
struct Data
{
    public int SomeValue;
    // Other fields.
}

Dictionary<int, Data> dictionary = new();
// ...

foreach (ref var entry in CollectionsMarshal.EnumerateByReference(dictionary))
{
    if (entry.Value.SomeValue != -1)
    {
        entry.Value.SomeValue += 1;
    }
}
Method 2:
foreach (var entry in CollectionsMarshal.EnumerateByReference(dictionary))
{
    if (entry.Value.SomeValue != -1)
    {
        entry.Value.SomeValue += 1;
    }
    else
    {
        entry.Remove(); // Method 2 allows removing.
    }
}
Possible implementations and performance
I have done some preliminary investigation into the implementation and its performance. The conclusions:
Enumerating:
- Baseline: enumerate dictionary.Keys, then access each value with CollectionsMarshal.GetValueRefOrNullRef (a sketch of this baseline is shown after this list).
- Method 1 provides the maximal speed-up (3x for int keys, ~10x for (int, int, int) keys, 6x for string keys).
- Method 2 is much slower than Method 1 (Method 1 is 2-3x as fast as Method 2).
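A minimal sketch of this enumeration baseline, reusing the dictionary and Data struct from the usage examples above (GetValueRefOrNullRef and Unsafe.IsNullRef are existing .NET 6 APIs):
foreach (int key in dictionary.Keys)
{
    // One extra hash computation and lookup per key, even though the key
    // enumerator has already visited the corresponding entry.
    ref Data value = ref CollectionsMarshal.GetValueRefOrNullRef(dictionary, key);
    if (!Unsafe.IsNullRef(ref value) && value.SomeValue != -1)
    {
        value.SomeValue += 1;
    }
}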
Deleting while enumerating:
- Baseline: enumerate dictionary.Keys, access each value with CollectionsMarshal.GetValueRefOrNullRef, record the keys to be removed in a List<TKey>, and after the dictionary enumeration finishes, walk that list and remove them (a sketch is shown after this list).
- Method 1 does not support deleting.
- Method 2 is usually faster than the baseline, except for int keys (1x for int keys, 3x for (int, int, int) keys, and 2x for string keys).
- The benchmark removes 50% of the items from the dictionary.
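A minimal sketch of this removal baseline, again reusing the dictionary and Data struct from the usage examples (the -1 sentinel mirrors those examples):
List<int> keysToRemove = new();
foreach (int key in dictionary.Keys)
{
    ref Data value = ref CollectionsMarshal.GetValueRefOrNullRef(dictionary, key);
    if (!Unsafe.IsNullRef(ref value) && value.SomeValue == -1)
    {
        keysToRemove.Add(key); // defer removal; removing here would still need another lookup
    }
}
// Each Remove call performs yet another hash computation and lookup.
foreach (int key in keysToRemove)
{
    dictionary.Remove(key);
}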
Since I am not an expert in this area, there is likely still plenty of room for optimization.
Benchmark results
BenchmarkDotNet=v0.13.1, OS=Windows 10.0.19043.1165 (21H1/May2021Update)
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET SDK=6.0.100-preview.7.21379.14
[Host] : .NET 6.0.0 (6.0.21.37719), X64 RyuJIT [AttachedDebugger]
DefaultJob : .NET 6.0.0 (6.0.21.37719), X64 RyuJIT
int->int, 10 items
Method | Mean | Error | StdDev |
---|---|---|---|
Enum_ByKey | 112.26 ns | 1.893 ns | 1.678 ns |
Enum_ByRefSequential | 35.51 ns | 0.250 ns | 0.221 ns |
Enum_ByRefBucket | 83.95 ns | 0.797 ns | 0.707 ns |
Remove_ByKey | 52.89 ns | 0.464 ns | 0.434 ns |
Remove_ByRefBucket | 57.43 ns | 1.166 ns | 1.145 ns |
int->int, 1000 items
Method | Mean | Error | StdDev |
---|---|---|---|
Enum_ByKey | 9.787 us | 0.1115 us | 0.1043 us |
Enum_ByRefSequential | 3.468 us | 0.0381 us | 0.0356 us |
Enum_ByRefBucket | 6.456 us | 0.0440 us | 0.0411 us |
Remove_ByKey | 4.910 us | 0.0613 us | 0.0543 us |
Remove_ByRefBucket | 5.261 us | 0.0621 us | 0.0551 us |
int->big struct, 10 items
Method | Mean | Error | StdDev |
---|---|---|---|
Enum_ByKey | 119.59 ns | 1.249 ns | 1.043 ns |
Enum_ByRefSequential | 37.79 ns | 0.254 ns | 0.212 ns |
Enum_ByRefBucket | 85.65 ns | 0.838 ns | 0.743 ns |
Remove_ByKey | 58.28 ns | 1.190 ns | 1.323 ns |
Remove_ByRefBucket | 58.50 ns | 1.026 ns | 1.098 ns |
int->big struct, 1000 items
Method | Mean | Error | StdDev |
---|---|---|---|
Enum_ByKey | 10.412 us | 0.1571 us | 0.1470 us |
Enum_ByRefSequential | 3.587 us | 0.0370 us | 0.0346 us |
Enum_ByRefBucket | 6.827 us | 0.0470 us | 0.0367 us |
Remove_ByKey | 5.386 us | 0.0798 us | 0.0747 us |
Remove_ByRefBucket | 5.646 us | 0.1064 us | 0.0888 us |
(int, int, int)->int, 10 items
Method | Mean | Error | StdDev |
---|---|---|---|
Enum_ByKey | 339.03 ns | 4.299 ns | 4.021 ns |
Enum_ByRefSequential | 35.28 ns | 0.431 ns | 0.382 ns |
Enum_ByRefBucket | 84.11 ns | 1.497 ns | 1.400 ns |
Remove_ByKey | 191.94 ns | 2.186 ns | 1.938 ns |
Remove_ByRefBucket | 60.92 ns | 0.480 ns | 0.688 ns |
(int, int, int)->int, 1000 items
Method | Mean | Error | StdDev |
---|---|---|---|
Enum_ByKey | 38.774 us | 0.3048 us | 0.2380 us |
Enum_ByRefSequential | 3.369 us | 0.0655 us | 0.0613 us |
Enum_ByRefBucket | 9.389 us | 0.0894 us | 0.0792 us |
Remove_ByKey | 16.413 us | 0.1372 us | 0.1216 us |
Remove_ByRefBucket | 5.799 us | 0.0806 us | 0.0754 us |
(int, int, int)->big struct, 10 items
Method | Mean | Error | StdDev |
---|---|---|---|
Enum_ByKey | 395.39 ns | 3.212 ns | 3.004 ns |
Enum_ByRefSequential | 34.97 ns | 0.622 ns | 0.581 ns |
Enum_ByRefBucket | 93.08 ns | 1.364 ns | 1.276 ns |
Remove_ByKey | 186.41 ns | 2.713 ns | 2.538 ns |
Remove_ByRefBucket | 61.45 ns | 0.555 ns | 0.492 ns |
(int, int, int)->big struct, 1000 items
Method | Mean | Error | StdDev |
---|---|---|---|
Enum_ByKey | 35.810 us | 0.4707 us | 0.4403 us |
Enum_ByRefSequential | 3.352 us | 0.0360 us | 0.0337 us |
Enum_ByRefBucket | 12.960 us | 0.1137 us | 0.1063 us |
Remove_ByKey | 16.516 us | 0.1926 us | 0.1802 us |
Remove_ByRefBucket | 6.410 us | 0.0557 us | 0.0521 us |
string->int, 10 items
Method | Mean | Error | StdDev |
---|---|---|---|
Enum_ByKey | 245.04 ns | 4.859 ns | 5.596 ns |
Enum_ByRefSequential | 33.07 ns | 0.697 ns | 1.145 ns |
Enum_ByRefBucket | 70.59 ns | 0.833 ns | 0.779 ns |
Remove_ByKey | 108.80 ns | 2.205 ns | 2.451 ns |
Remove_ByRefBucket | 58.70 ns | 0.812 ns | 0.759 ns |
string->int, 1000 items
Method | Mean | Error | StdDev |
---|---|---|---|
Enum_ByKey | 22.106 us | 0.1307 us | 0.1223 us |
Enum_ByRefSequential | 3.564 us | 0.0194 us | 0.0162 us |
Enum_ByRefBucket | 7.655 us | 0.0743 us | 0.0621 us |
Remove_ByKey | 11.218 us | 0.1740 us | 0.1627 us |
Remove_ByRefBucket | 5.861 us | 0.0703 us | 0.0623 us |
string->big struct, 10 items
Method | Mean | Error | StdDev |
---|---|---|---|
Enum_ByKey | 239.33 ns | 3.121 ns | 2.767 ns |
Enum_ByRefSequential | 36.20 ns | 0.258 ns | 0.242 ns |
Enum_ByRefBucket | 72.04 ns | 0.826 ns | 0.773 ns |
Remove_ByKey | 108.35 ns | 1.004 ns | 0.939 ns |
Remove_ByRefBucket | 60.00 ns | 0.820 ns | 0.767 ns |
string->big struct, 1000 items
Method | Mean | Error | StdDev |
---|---|---|---|
Enum_ByKey | 23.894 us | 0.4075 us | 0.3613 us |
Enum_ByRefSequential | 3.586 us | 0.0638 us | 0.0597 us |
Enum_ByRefBucket | 8.318 us | 0.1107 us | 0.1036 us |
Remove_ByKey | 11.708 us | 0.2322 us | 0.2385 us |
Remove_ByRefBucket | 6.301 us | 0.1231 us | 0.1317 us |
Implementation of Method 1 used in the test
public readonly struct DictionaryReferenceEnumerable<TKey, TValue> where TKey : notnull
{
    private readonly Dictionary<TKey, TValue> _dictionary;

    public DictionaryReferenceEnumerable(Dictionary<TKey, TValue> dictionary)
    {
        _dictionary = dictionary;
    }

    public Enumerator GetEnumerator() => new(_dictionary);

    public ref struct Enumerator
    {
        private readonly Dictionary_<TKey, TValue> _dictionary;
        private int _index;
        private int _currentIndex;

        public Enumerator(Dictionary<TKey, TValue> dictionary)
        {
            // Dictionary_<TKey, TValue> is a test-only type that mirrors the layout of
            // Dictionary<TKey, TValue> so its private fields (_entries, _count, ...) are reachable.
            _dictionary = Unsafe.As<Dictionary_<TKey, TValue>>(dictionary);
            _index = 0;
            _currentIndex = default;
        }

        public bool MoveNext()
        {
            // Scan the entries array sequentially, skipping free-list entries (next < -1).
            while ((uint)_index < (uint)_dictionary._count)
            {
                ref var entry = ref _dictionary._entries![_index++];
                if (entry.next >= -1)
                {
                    _currentIndex = _index - 1;
                    return true;
                }
            }
            _index = _dictionary._count + 1;
            _currentIndex = default;
            return false;
        }

        public ref Entry Current
        {
            [MethodImpl(MethodImplOptions.AggressiveInlining)]
            get => ref Unsafe.As<Dictionary_<TKey, TValue>.Entry, Entry>(ref _dictionary._entries![_currentIndex]);
        }
    }

    public struct Entry
    {
        private readonly uint hashCode;
        private readonly int next;
        public readonly TKey Key;
        public TValue Value;
    }
}
Implementation of Method 2 used in the test
public readonly struct DictionaryReferenceEnumerable<TKey, TValue> where TKey : notnull
{
    private readonly Dictionary<TKey, TValue> _dictionary;

    public DictionaryReferenceEnumerable(Dictionary<TKey, TValue> dictionary)
    {
        _dictionary = dictionary;
    }

    public Enumerator GetEnumerator() => new(_dictionary);

    public struct EnumeratorData
    {
        // Internal enumeration state: the current bucket, the previous entry in the
        // chain (needed for unlinking), and the current entry.
        internal readonly Dictionary_<TKey, TValue> _dictionary;
        internal int _bucketIndex;
        internal int _lastIndex;
        internal int _currentIndex;

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        internal EnumeratorData(Dictionary<TKey, TValue> dictionary)
        {
            _dictionary = Unsafe.As<Dictionary_<TKey, TValue>>(dictionary);
            _bucketIndex = -1;
            _lastIndex = -1;
            _currentIndex = -1;
        }

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        internal bool MoveNext()
        {
            _lastIndex = _currentIndex;
            // Follow the chain of the current bucket; when it ends, scan for the next non-empty bucket.
            if (_currentIndex < 0 || (_currentIndex = _dictionary._entries![_currentIndex].next) < 0)
            {
                int bucketHead = 0;
                while (bucketHead == 0 && ++_bucketIndex < _dictionary._buckets!.Length)
                {
                    bucketHead = _dictionary._buckets![_bucketIndex];
                }
                if (bucketHead == 0)
                {
                    return false;
                }
                _currentIndex = bucketHead - 1; // buckets store index + 1; 0 means empty
                _lastIndex = -1;
            }
            return true;
        }

        public TKey Key
        {
            [MethodImpl(MethodImplOptions.AggressiveInlining)]
            get => _dictionary._entries![_currentIndex].Key;
        }

        public ref TValue Value
        {
            [MethodImpl(MethodImplOptions.AggressiveInlining)]
            get => ref _dictionary._entries![_currentIndex].Value;
        }

        public void Remove()
        {
            int i = _currentIndex;
            var d = _dictionary;
            ref var entry = ref d._entries![i];
            var next = entry.next;
            _currentIndex = next;

            // Unlink the entry from its chain: fix up either the bucket head or the predecessor.
            if (_lastIndex < 0)
            {
                d._buckets![_bucketIndex] = 1 + next;
            }
            else
            {
                d._entries![_lastIndex].next = next;
            }

            // Push the entry onto the free list, using the same -3 offset encoding Dictionary uses internally.
            entry.next = (-3) - d._freeList;
            if (RuntimeHelpers.IsReferenceOrContainsReferences<TKey>())
            {
                Unsafe.AsRef(in entry.Key) = default!;
            }
            if (RuntimeHelpers.IsReferenceOrContainsReferences<TValue>())
            {
                entry.Value = default!;
            }
            d._freeList = i;
            d._freeCount++;
        }
    }

    public struct Enumerator
    {
        private EnumeratorData _data;

        [MethodImpl(MethodImplOptions.AggressiveInlining)]
        public Enumerator(Dictionary<TKey, TValue> dictionary)
        {
            _data = new(dictionary);
        }

        public bool MoveNext()
        {
            return _data.MoveNext();
        }

        public EnumeratorData Current => _data;
    }
}
Alternative designs
LINQ-like extension method:
As mentioned in the discussion below, the same effect can be achieved by providing a RemoveWhere extension method that passes TKey and ref TValue to a delegate:
public delegate bool RemoveDictionaryEntryPredicate<TKey, TValue>(TKey key, ref TValue value);
public static void RemoveWhere<TKey, TValue>(this Dictionary<TKey, TValue> dictionary, RemoveDictionaryEntryPredicate<TKey, TValue> predicate);
so that the delegate can access each value by reference and decide whether the item should be removed.
Advantages: 1) it feels more natural in C#; 2) the dictionary gets a chance to rehash after enumeration finishes.
Disadvantages: 1) the runtime currently cannot fully inline delegate calls; 2) if the delegate needs to capture other variables, creating the closure requires an additional allocation at each call.
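For illustration, such an extension could even be built today on top of public APIs only, by deferring removals to a key list (a sketch under that assumption; it is not the proposed internals-based implementation and still pays the extra lookups):
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public delegate bool RemoveDictionaryEntryPredicate<TKey, TValue>(TKey key, ref TValue value);

public static class DictionaryExtensions
{
    public static void RemoveWhere<TKey, TValue>(
        this Dictionary<TKey, TValue> dictionary,
        RemoveDictionaryEntryPredicate<TKey, TValue> predicate)
        where TKey : notnull
    {
        List<TKey>? keysToRemove = null;

        foreach (TKey key in dictionary.Keys)
        {
            // One extra lookup per key; the proposed enumerator would avoid this.
            ref TValue value = ref CollectionsMarshal.GetValueRefOrNullRef(dictionary, key);
            if (!Unsafe.IsNullRef(ref value) && predicate(key, ref value))
            {
                (keysToRemove ??= new List<TKey>()).Add(key);
            }
        }

        if (keysToRemove is not null)
        {
            foreach (TKey key in keysToRemove)
            {
                dictionary.Remove(key); // another lookup per removed key
            }
        }
    }
}
An internals-based implementation could instead reuse Method 2's bucket walk and remove entries in place, avoiding both lookups.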
Risks
Some design considerations:
- The same safety concerns apply as for the other CollectionsMarshal methods, but this is by design. Users must make sure the usage is valid, e.g., by guaranteeing that the collection is not modified while it is being enumerated.
- The returned struct enumerable can only be enumerated directly and cannot be converted to the IEnumerable<T> interface, because the element type here is a by-ref type. This should be fine, since converting to interfaces usually causes boxing and GC allocations, which runs against the high-performance goal.