-
Notifications
You must be signed in to change notification settings - Fork 5.2k
[NativeAOT] Cleanup usages of LowLevelDictionary #117344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas |
|
||
// Handle the array case | ||
if (value is IEnumerable enumerableValue && !(value is string)) | ||
if (value is Array arr) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This single change has biggest impact (-7.6 KB in hello world with reflection!). It causes enumerator of every collection considered as potentially boxed. By removing this, many of the enumerators can be trimmed. It also reduces the size gap between Dictionary implementations.
However, this can introduce a trap for user, that using weakly-type IEnumerbale
can have increased size penalty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Extracted to #117345 for higher confidence.
** behavior.) | ||
** | ||
===========================================================*/ | ||
internal class LowLevelDictionary<TKey, TValue> where TKey : IEquatable<TKey> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Dictionary<TKey, TValue>
has been optimized for speed and have about twice of size impact than LowLevelDictionary. Although it can reduced by sharing generic instantiations, the rare instantiations (other than Dictionary<__Canon, __Canon>
) will be pure overhead. Using Dictionary
more can also amplify any size impact in the shared implementation, which we are aggressively optimizing for speed.
Are we interested to keep a "size-optimized Dictionary" implementation?
|
||
uint index = 0; | ||
if (!s_genericFunctionPointerDictionary.TryGetValue(key, out index)) | ||
ref IntPtr descriptor = ref CollectionsMarshal.GetValueRefOrAddDefault(s_genericFunctionPointerDictionary, (canonFunctionPointer, instantiationArgument), out bool exists); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
CollectionsMarshal.GetValueRefOrAddDefault
has non-trivial size win. It's the only modification method that doesn't bring in the throwing path. Even worse, the throw helpers uses generic method for keys to reduce for success path, but also increases size impact more.
Should GetValueRefOrAddDefault
be preferred whenever possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should GetValueRefOrAddDefault be preferred whenever possible?
I do not think we want to make the code unreadable by preferring GetValueRefOrAddDefault
whenever possible. I think it is fine to use where it reduces number of required lookups like in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It reduces lookups in the majority of cases. Most usages serve as a cache and follows the TryGet-Add-return structure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before continuing on detailed usages, I want to confirm the right strategy first. The most optimal usage of Dictionary
is about 1.5x size of LowLevelDictionary
per instantiation. Should we preserve or overhaul LowLevelDictionary
as size-optimized alternative for rare instantiations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it depends on the numbers. I do not think we want to get to write unnatural code to get a bit of extra sharing here and there. What would the total regression look like for naturally written code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better sharing of code between generic instantiations is a problem for the compiler to solve. These regressions are just a tip of the iceberg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cost of instantiation of LowLevelDictionary
is 1.6KB, including interface metadata and array enumerator of KVP etc. Lookup & resize code is 800B. A wrapper IEquatable
struct costs additional 0.3KB.
Cost of instantiation of Dictionary
: 2.6KB in most optimal usage, lookup & resize costs 1.2KB. The equality determination path costs 0.6KB. IDictionary
is not preserved, but ICollection<KVP>
is preserved, costing more for interface method table. Also keeps Equals(object)
for key.
The traditional lookup method of Dictionary
will cost 1.3KB, +0.4 KB of basic bookkeeping. This increases the cost of instantiation become 3.1KB, or 3.9KB if both lookup logics are used.
If enumerator, Keys
and Values
get preserved, they will introduce 1.1KB, 1.3KB and 0.7KB for each instantiation.
The broad interface implementation of Dictionary
increases the chance for members getting preserved. The grand total can be 7KB in measured workload. Even shared Dictionary<object, object>
can cost ~1KB due to the interface method tables.
The unnatural trick reduces non-foldable instantiations from 13 to 7. We may also achieve this by make hashcode implementations identical.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better sharing of code between generic instantiations is a problem for the compiler to solve. These regressions are just a tip of the iceberg.
Submitted #117411 for that, we can see how much that helps.
/// Map of module handles to indices within the Modules array. | ||
/// </summary> | ||
public readonly LowLevelDictionary<TypeManagerHandle, int> HandleToModuleIndex; | ||
public readonly Dictionary<IntPtr, IntPtr> HandleToModuleIndex; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tricky part: sharing 3 instantiations (<IntPtr, int>
, uint, IntPtr
, IntPtr, uint
) into one.
private const int FatFunctionPointerOffset = 2; | ||
#endif | ||
|
||
private struct GenericMethodDescriptorInfo : IEquatable<GenericMethodDescriptorInfo> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The overhead of using tuples instead of custom struct:
ToString
implementation (pure overhead)String.Concat
(surprisingly not used in hello world, very likely to be used otherwhere)- Larger implementation of
GetHashCode
, involvingHashCode.Combine
. Will be eliminated onceHashCode.Combine
is used elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The conventional wisdom is that using Tuples in low-level code is bad for trimming. Have you verified that it is a net saving to use Tuples in this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's net (significant) saving to fold multiple different structs into the same tuple, and net (little) regression to replace one struct with one tuple. The most-saving option should be creating sharable tuple-like struct, however it will be a regression again if user uses the same tuple.
It seems that local measurement only covers the reflection paths. Generic instantiation of Dictionary is really heavy, with dependencies like |
{ | ||
// Capture new index value | ||
index = s_genericFunctionPointerNextIndex; | ||
descriptor = (IntPtr)NativeMemory.Alloc((uint)sizeof(GenericMethodDescriptor)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this throws, we will end up with an entry without a value. You can fix this by checking whether descriptor
is null instead of whether the entry exists.
uint index = 0; | ||
if (!s_genericFunctionPointerDictionary.TryGetValue(key, out index)) | ||
ref IntPtr descriptor = ref CollectionsMarshal.GetValueRefOrAddDefault(s_genericFunctionPointerDictionary, (canonFunctionPointer, instantiationArgument), out bool exists); | ||
if (!exists) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (!exists) | |
// Check for null descriptor instead of `exists` to handle the situation where `NativeMemory.Alloc` below threw | |
// an exception during earlier attempt to this descriptor | |
if (descriptor == IntPtr.Zero) |
|
||
uint index = 0; | ||
if (!s_genericFunctionPointerDictionary.TryGetValue(key, out index)) | ||
ref IntPtr descriptor = ref CollectionsMarshal.GetValueRefOrAddDefault(s_genericFunctionPointerDictionary, (canonFunctionPointer, instantiationArgument), out bool exists); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ref IntPtr descriptor = ref CollectionsMarshal.GetValueRefOrAddDefault(s_genericFunctionPointerDictionary, (canonFunctionPointer, instantiationArgument), out bool exists); | |
ref IntPtr descriptor = ref CollectionsMarshal.GetValueRefOrAddDefault(s_genericFunctionPointerDictionary, (canonFunctionPointer, instantiationArgument), out bool _); |
returnValue = (IntPtr)NativeMemory.Alloc((nuint)sizeof(OpenMethodResolver)); | ||
*((OpenMethodResolver*)returnValue) = this; | ||
s_internedResolverHash.Add(this, returnValue); | ||
ref IntPtr returnValue = ref CollectionsMarshal.GetValueRefOrAddDefault(s_internedResolverHash, this, out bool exists); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here
Draft Pull Request was automatically closed for 30 days of inactivity. Please let us know if you'd like to reopen it. |
I have measured the size impact for each commit. Opening as draft for discussion.