Skip to content

Conversation

huoyaoyuan
Copy link
Member

I have measured the size impact for each commit. Opening as draft for discussion.

@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Jul 6, 2025
Copy link
Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.


// Handle the array case
if (value is IEnumerable enumerableValue && !(value is string))
if (value is Array arr)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This single change has biggest impact (-7.6 KB in hello world with reflection!). It causes enumerator of every collection considered as potentially boxed. By removing this, many of the enumerators can be trimmed. It also reduces the size gap between Dictionary implementations.

However, this can introduce a trap for user, that using weakly-type IEnumerbale can have increased size penalty.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted to #117345 for higher confidence.

** behavior.)
**
===========================================================*/
internal class LowLevelDictionary<TKey, TValue> where TKey : IEquatable<TKey>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dictionary<TKey, TValue> has been optimized for speed and have about twice of size impact than LowLevelDictionary. Although it can reduced by sharing generic instantiations, the rare instantiations (other than Dictionary<__Canon, __Canon>) will be pure overhead. Using Dictionary more can also amplify any size impact in the shared implementation, which we are aggressively optimizing for speed.

Are we interested to keep a "size-optimized Dictionary" implementation?


uint index = 0;
if (!s_genericFunctionPointerDictionary.TryGetValue(key, out index))
ref IntPtr descriptor = ref CollectionsMarshal.GetValueRefOrAddDefault(s_genericFunctionPointerDictionary, (canonFunctionPointer, instantiationArgument), out bool exists);
Copy link
Member Author

@huoyaoyuan huoyaoyuan Jul 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CollectionsMarshal.GetValueRefOrAddDefault has non-trivial size win. It's the only modification method that doesn't bring in the throwing path. Even worse, the throw helpers uses generic method for keys to reduce for success path, but also increases size impact more.

Should GetValueRefOrAddDefault be preferred whenever possible?

Copy link
Member

@jkotas jkotas Jul 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should GetValueRefOrAddDefault be preferred whenever possible?

I do not think we want to make the code unreadable by preferring GetValueRefOrAddDefault whenever possible. I think it is fine to use where it reduces number of required lookups like in this case.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It reduces lookups in the majority of cases. Most usages serve as a cache and follows the TryGet-Add-return structure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before continuing on detailed usages, I want to confirm the right strategy first. The most optimal usage of Dictionary is about 1.5x size of LowLevelDictionary per instantiation. Should we preserve or overhaul LowLevelDictionary as size-optimized alternative for rare instantiations?

Copy link
Member

@jkotas jkotas Jul 7, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it depends on the numbers. I do not think we want to get to write unnatural code to get a bit of extra sharing here and there. What would the total regression look like for naturally written code?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better sharing of code between generic instantiations is a problem for the compiler to solve. These regressions are just a tip of the iceberg.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cost of instantiation of LowLevelDictionary is 1.6KB, including interface metadata and array enumerator of KVP etc. Lookup & resize code is 800B. A wrapper IEquatable struct costs additional 0.3KB.

Cost of instantiation of Dictionary: 2.6KB in most optimal usage, lookup & resize costs 1.2KB. The equality determination path costs 0.6KB. IDictionary is not preserved, but ICollection<KVP> is preserved, costing more for interface method table. Also keeps Equals(object) for key.
The traditional lookup method of Dictionary will cost 1.3KB, +0.4 KB of basic bookkeeping. This increases the cost of instantiation become 3.1KB, or 3.9KB if both lookup logics are used.
If enumerator, Keys and Values get preserved, they will introduce 1.1KB, 1.3KB and 0.7KB for each instantiation.

The broad interface implementation of Dictionary increases the chance for members getting preserved. The grand total can be 7KB in measured workload. Even shared Dictionary<object, object> can cost ~1KB due to the interface method tables.

The unnatural trick reduces non-foldable instantiations from 13 to 7. We may also achieve this by make hashcode implementations identical.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better sharing of code between generic instantiations is a problem for the compiler to solve. These regressions are just a tip of the iceberg.

Submitted #117411 for that, we can see how much that helps.

/// Map of module handles to indices within the Modules array.
/// </summary>
public readonly LowLevelDictionary<TypeManagerHandle, int> HandleToModuleIndex;
public readonly Dictionary<IntPtr, IntPtr> HandleToModuleIndex;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tricky part: sharing 3 instantiations (<IntPtr, int>, uint, IntPtr, IntPtr, uint) into one.

private const int FatFunctionPointerOffset = 2;
#endif

private struct GenericMethodDescriptorInfo : IEquatable<GenericMethodDescriptorInfo>
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The overhead of using tuples instead of custom struct:

  • ToString implementation (pure overhead)
  • String.Concat (surprisingly not used in hello world, very likely to be used otherwhere)
  • Larger implementation of GetHashCode, involving HashCode.Combine. Will be eliminated once HashCode.Combine is used elsewhere.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The conventional wisdom is that using Tuples in low-level code is bad for trimming. Have you verified that it is a net saving to use Tuples in this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's net (significant) saving to fold multiple different structs into the same tuple, and net (little) regression to replace one struct with one tuple. The most-saving option should be creating sharable tuple-like struct, however it will be a regression again if user uses the same tuple.

@huoyaoyuan
Copy link
Member Author

It seems that local measurement only covers the reflection paths. Generic instantiation of Dictionary is really heavy, with dependencies like SZGenericArrayEnumerator<KVP> that's not foldable. This increases the interest of size-optimized dictionary.

{
// Capture new index value
index = s_genericFunctionPointerNextIndex;
descriptor = (IntPtr)NativeMemory.Alloc((uint)sizeof(GenericMethodDescriptor));
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this throws, we will end up with an entry without a value. You can fix this by checking whether descriptor is null instead of whether the entry exists.

uint index = 0;
if (!s_genericFunctionPointerDictionary.TryGetValue(key, out index))
ref IntPtr descriptor = ref CollectionsMarshal.GetValueRefOrAddDefault(s_genericFunctionPointerDictionary, (canonFunctionPointer, instantiationArgument), out bool exists);
if (!exists)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (!exists)
// Check for null descriptor instead of `exists` to handle the situation where `NativeMemory.Alloc` below threw
// an exception during earlier attempt to this descriptor
if (descriptor == IntPtr.Zero)


uint index = 0;
if (!s_genericFunctionPointerDictionary.TryGetValue(key, out index))
ref IntPtr descriptor = ref CollectionsMarshal.GetValueRefOrAddDefault(s_genericFunctionPointerDictionary, (canonFunctionPointer, instantiationArgument), out bool exists);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ref IntPtr descriptor = ref CollectionsMarshal.GetValueRefOrAddDefault(s_genericFunctionPointerDictionary, (canonFunctionPointer, instantiationArgument), out bool exists);
ref IntPtr descriptor = ref CollectionsMarshal.GetValueRefOrAddDefault(s_genericFunctionPointerDictionary, (canonFunctionPointer, instantiationArgument), out bool _);

returnValue = (IntPtr)NativeMemory.Alloc((nuint)sizeof(OpenMethodResolver));
*((OpenMethodResolver*)returnValue) = this;
s_internedResolverHash.Add(this, returnValue);
ref IntPtr returnValue = ref CollectionsMarshal.GetValueRefOrAddDefault(s_internedResolverHash, this, out bool exists);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here

Copy link
Contributor

Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it.

@github-actions github-actions bot locked and limited conversation to collaborators Sep 6, 2025
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-NativeAOT-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants