
Developers can use CollectionsMarshal ref accessors for Dictionary<TKey, TValue> #27062

@benaadams

Edited by @layomia. Original proposal by @benaadams:

They aren't "safe" operations, as the backing store can change if items are added, removed, etc., and the changes aren't tracked.

namespace System.Runtime.InteropServices
{
    public static partial class CollectionsMarshal
    {
        public static ref TValue ValueRef<TKey, TValue>(Dictionary<TKey, TValue> dict, TKey key);
    }
}

However, it is useful to get a reference so that struct TValue entries can be modified in place, without a full copy-and-replace update.

/cc @jkotas

/cc @stephentoub is it valid for ConcurrentDictionary?


CollectionsMarshal ref accessors for Dictionary<TKey, TValue>

Obtaining a value from a dictionary, and adding it if it is not present, is a common scenario in .NET applications. Today this is done by calling Dictionary<TKey, TValue>.TryGetValue(TKey key, out TValue value) and then adding the value to the dictionary if the key was not found. This causes duplicate hash lookups:

if (!dictionary.TryGetValue(key, out MyType myValue)) // hash lookup
{
    myValue = CreateValue();
    dictionary.Add(key, myValue); // duplicate hash lookup
}

Another scenario is updating struct values in dictionaries. The existing pattern for achieving this causes struct copies and duplicate hash lookups, which potentially have non-trivial performance costs for large structs:

struct LargeStruct
{
    // Other members...
    
    public int MyInt { get; set; }

    // Other members...
}

LargeStruct myValue = dictionary[key]; // hash lookup, struct copy
myValue.MyInt++;
dictionary[key] = myValue; // another hash lookup, another struct copy

Motivation

  • Provide a mechanism which avoids duplicate lookups when obtaining dictionary values which may not be present.
  • Provide a mechanism which avoids copies and duplicate lookups when mutating struct dictionary values.

API proposal

namespace System.Runtime.InteropServices
{
    public static class CollectionsMarshal
    {
        /// <summary>
        ///   Gets a reference to the value associated with the specified key, or throws a KeyNotFoundException
        /// </summary>
        public static ref TValue GetValueRef<TKey, TValue>(Dictionary<TKey, TValue> dictionary, [NotNull] TKey key);

        /// <summary>
        ///   Gets a reference to the value associated with the specified key, or returns Unsafe.NullRef<TValue>
        /// </summary>
        public static ref TValue TryGetValueRef<TKey, TValue>(Dictionary<TKey, TValue> dictionary, [NotNull] TKey key, out bool exists);

        /// <summary>
        ///   Gets a reference to the value associated with the specified key, or inserts it with value default(TValue).
        /// </summary>
        public static ref TValue GetValueRefOrAddDefault<TKey, TValue>(Dictionary<TKey, TValue> dictionary, [NotNull] TKey key, out bool exists);
    }
}

CollectionsMarshal is an unsafe class that provides a set of methods to access the underlying data representations of collections.

API usages

Updating struct value in dictionary: KeyNotFoundException thrown when key not present in dictionary

This pattern is helpful when the caller wants to update a struct in place and an absent key is an error state, so creating a value and adding it to the dictionary when the key is missing is not desired.

try
{
    ref MyType value = CollectionsMarshal.GetValueRef(dictionary, key);
    value.MyInt++;
}
catch (KeyNotFoundException exception)
{
    // Handle exception
}

Unsafe.NullRef<TValue>() returned when key not present in dictionary

This pattern satisfies both the optimal struct value update and optimal "get or add" value scenarios.

ref MyType value = CollectionsMarshal.TryGetValueRef(dictionary, key, out bool exists);

if (exists)
{
    value.MyInt++;
}
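else
{
    // The returned reference is a null ref here and must not be written through;
    // add the value via the dictionary instead (a second lookup on the miss path).
    dictionary.Add(key, new MyType { MyInt = 1 });
}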

default(TValue) returned when key not present in dictionary

This pattern also satisfies both the optimal struct value update and optimal "get or add" value scenarios. Because a default struct value is always instantiated when the key is missing, TryGetValueRef may be preferable, depending on the perf scenario.

ref MyType value = CollectionsMarshal.GetValueRefOrAddDefault(dictionary, key, out bool exists);
value.MyInt++;

if (exists)
{
    // Do something if I care that the key already existed.
}
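
For reference, here is a minimal runnable sketch of the "get or add default" pattern. It assumes .NET 6 or later, where CollectionsMarshal.GetValueRefOrAddDefault is available with the same shape as the third method proposed above. The word list and the tuple value type are purely illustrative and not part of the proposal.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

// Illustrative data: count occurrences and total length per word.
string[] words = { "ref", "struct", "ref", "span", "ref" };

// The value is a mutable struct (a ValueTuple), so updating it through the
// indexer would need a copy out, a copy back in, and two hash lookups.
var stats = new Dictionary<string, (int Count, int TotalLength)>();

foreach (string word in words)
{
    // Single hash lookup: returns a ref to the existing value, or inserts
    // default and returns a ref to the newly created slot.
    ref var entry = ref CollectionsMarshal.GetValueRefOrAddDefault(stats, word, out bool exists);

    // Mutate in place through the ref; no second lookup, no struct copy.
    entry.Count++;
    entry.TotalLength += word.Length;

    if (!exists)
    {
        Console.WriteLine($"first occurrence of '{word}'");
    }
}

Console.WriteLine($"'ref' seen {stats["ref"].Count} times");
```

The ref is only used before the next dictionary operation; holding it across an Add or Remove is exactly the hazard the original proposal warns about.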

Alternative design

GetOrAdd methods, similar to those on ConcurrentDictionary<TKey, TValue>, were proposed in #15059.

public class Dictionary<TKey, TValue>
{
    public TValue GetOrAdd([NotNull] TKey key, Func<TKey, TValue> valueFactory);
}

/* OR */

public class Dictionary<TKey, TValue>
{
    public TValue GetOrAdd<TState>([NotNull] TKey key, TState state, Func<TKey, TState, TValue> valueFactory);
}

Upsides

  • More discoverable/friendly than the relatively advanced APIs in this proposal.
  • Follows precedent in the BCL i.e. ConcurrentDictionary<TKey, TValue>.

Downsides

  • Not pay-for-play. Since the new methods would live in System.Private.CoreLib, generic expansion issues due to struct-based generics will affect all users of Dictionary<TKey, TValue>, not just those that use them.
    • An argument against this point is that the methods can live in an extensions class, but that won't provide any perf benefits since they won't be able to access non-public members.
  • Doesn't address the issue of copying when mutating large struct values.
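
To make the trade-off concrete, the sketch below layers a GetOrAdd-style helper on top of a ref accessor instead of adding it to Dictionary<TKey, TValue> itself. It assumes .NET 6 or later, where CollectionsMarshal.GetValueRefOrAddDefault exists. The GetOrAdd helper here is hypothetical (written as a local function, not a BCL API), and the cache contents are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

var cache = new Dictionary<string, int>();
int factoryCalls = 0;

// The first call runs the factory; the second finds the existing value.
int first = GetOrAdd(cache, "alpha", key => { factoryCalls++; return key.Length; });
int second = GetOrAdd(cache, "alpha", key => { factoryCalls++; return -1; });

Console.WriteLine($"{first} {second} {factoryCalls}");

// Hypothetical helper, not part of the BCL: one hash lookup per call.
static TValue GetOrAdd<TKey, TValue>(
    Dictionary<TKey, TValue> dictionary, TKey key, Func<TKey, TValue> valueFactory)
    where TKey : notnull
{
    ref var slot = ref CollectionsMarshal.GetValueRefOrAddDefault(dictionary, key, out bool exists);
    if (!exists)
    {
        // Caveats: valueFactory must not add to or remove from the dictionary
        // (the ref would no longer be valid), and if it throws, the key stays
        // in the dictionary with a default value.
        slot = valueFactory(key);
    }
    return slot!;
}
```

The two caveats in the comment are the price of the single lookup; a production helper would need to decide how to handle a throwing factory.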

Open questions

Should the out bool exists parameter on TryGetValueRef be removed?

Since a call to Unsafe.IsNullRef<T>(ref value) can indicate whether the value exists in the dictionary, the second method could simply be:

public static class CollectionsMarshal
{
    /// <summary>
    ///   Gets a reference to the value associated with the specified key, or returns Unsafe.NullRef<TValue>
    /// </summary>
    public static ref TValue TryGetValueRef<TKey, TValue>(Dictionary<TKey, TValue> dictionary, [NotNull] TKey key);
}

Usage

ref MyType value = CollectionsMarshal.TryGetValueRef(dictionary, key);

if (!Unsafe.IsNullRef(ref value))
{
    value.MyInt++;
}
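else
{
    // The returned reference is a null ref here and must not be written through;
    // add the value via the dictionary instead (a second lookup on the miss path).
    dictionary.Add(key, new MyType { MyInt = 1 });
}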
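
A runnable sketch of this null-ref variant follows. It assumes .NET 6 or later, where the API shipped as CollectionsMarshal.GetValueRefOrNullRef (a different name from the TryGetValueRef used in this proposal) and is paired with Unsafe.IsNullRef. The TryIncrement helper and the tuple value type are hypothetical, chosen only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

var hits = new Dictionary<string, (int Count, bool Flagged)>
{
    ["present"] = (1, false),
};

bool updatedPresent = TryIncrement(hits, "present");
bool updatedMissing = TryIncrement(hits, "missing");

Console.WriteLine($"{updatedPresent} {updatedMissing} {hits["present"].Count}");

// Hypothetical helper: increments the counter in place if the key exists.
static bool TryIncrement(Dictionary<string, (int Count, bool Flagged)> dictionary, string key)
{
    // Single lookup; yields a null ref (not default) when the key is absent.
    ref var value = ref CollectionsMarshal.GetValueRefOrNullRef(dictionary, key);

    if (Unsafe.IsNullRef(ref value))
    {
        // Key absent: the ref must not be read or written, and nothing is added.
        return false;
    }

    value.Count++; // in-place update, no struct copy
    return true;
}
```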

Any concerns about API bloat in the CollectionsMarshal type?

The generic expansion highlighted in the GetOrAdd-based alternative doesn't apply much here since the new methods will live in CollectionsMarshal and will be pay-for-play. However, are there concerns about bloating the type?

One response to this question:

I'd think to the same extent as any other/similar type such as MemoryMarshal, Unsafe, etc.

As it is today, we have 1 method in it and this proposal is adding 2-3 more. We only have a limited number of collection types in S.P.Corelib and a more limited set where returning a ref makes sense, so I don't think we are at risk of overloading it.

Labels

  • Bottom Up Work: Not part of a theme, epic, or user story
  • Cost:S: Work that requires one engineer up to 1 week
  • Priority:3: Work that is nice to have
  • Team:Libraries
  • User Story: A single user-facing feature. Can be grouped under an epic.
  • api-approved: API was approved in API review, it can be implemented
  • area-System.Collections
  • in-pr: There is an active PR which will close this issue when it is merged
