-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Constant fold dictionary lookup #117332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Constant fold dictionary lookup #117332
Conversation
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch |
@EgorBot -amd -arm using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
public class Bench
{
static readonly Dictionary<string, int> Dictionary = new()
{
{ "DOTNET_ROOT", 1 },
{ "DOTNET_SKIP_FIRST_TIME_EXPERIENCE", 2 }
};
[Benchmark]
public int Lookup() => Dictionary["DOTNET_SKIP_FIRST_TIME_EXPERIENCE"];
} |
src/coreclr/vm/jitinterface.cpp
Outdated
if (pPinnedString != nullptr) | ||
{ | ||
GCX_COOP(); | ||
PREPARE_NONVIRTUAL_CALLSITE(METHOD__STRING__GETNONRANDOMIZEDHASHCODE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invoking CoreLib at runtime won't work for AOT. What's your plan there?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just include & use the managed impl directly? (probably by moving that method to a separate cs file)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For cross-compilation, it only works if the implementation does not have dependencies on architecture specific properties. It is almost true for current implementation of GetNonRandomizedHashCode
, except of endianness. It would break further if the implementation is changed e.g. to hash native int at a time on 64-bit architectures.
We have mirrored a lot of corelib implementation details in the JIT. Is this one (~20 lines) going over the threshold how much we want to mirror?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is almost true for current implementation of GetNonRandomizedHashCode, except of endianness.
In an unlikely event of supporting BE in R2R/NAOT, we can just limit the opt for Host being LE and target being LE for simplicity?
It would break further if the implementation is changed e.g. to hash native int at a time on 64-bit architectures.
Same here - limit to x64 on x64? Also, native-int might not be a good idea due to 4-byte alignment of string data.
If NAOT/R2R support raises concerns we can probably leave it to be JIT only? Since the latter have no concerns. Although, we still might want to leave a note in the method that it might be triggered in JIT time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can always make the shared file work for NAOT/R2R with enough if checks and ifdefs. My point is that it is de-facto a second implementation once you do that.
we can just limit the opt for Host being LE and target being LE for simplicity
Nit: Shortcuts like this violate invariants that we try to maintain for cross-compilation. We want our cross-compilers to produce byte-to-byte identical output for given target, irrespective of the host that the compiler is running on.
NAOT/R2R support raises concerns we can probably leave it to be JIT only?
Every time folks bring up complicated NAOT specific JIT optimizations, I am pointing out that it would be great if we just make the JIT optimizations work for R2R/AOT where possible - we are not there today that's unfortunate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should go extra mile to properly enable optimizations for AOT where possible. I do not have a problem with having JIT specific optimizations that are fundamentally incompatible with AOT (not the case here).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if I read this all correctly - NAOT impl can't be done without introducing coupling. We can't land without NAOT. the extra coupling needs real world justification. To be honest, I don't have any specific example, I've vibe-coded a simple analyzer that tries to catch lookups (Constains, TryGetValue) with literals and named constants as keys - https://gist.github.com/EgorBo/e3ee6c711741956b423fb0ccdd0b51c4
Roslyn: 84 matches
runtime: 54 matches
aspnetcore: 25 matches
OrchardCore: 54 matches
(ignoring folders like 'tests' etc.)
I guess I'll put this on pause till some use-case appears.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any in your list that look hot?
We can't land without NAOT.
I would not say that. I can accept it with a good reason.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I totally forgot to mention that on NativeAOT it can only work with proper static PGO data, which is, presumably, never supplied on practice. Because this relies on:
- FindValue is expected to be inlined (on JIT Dynamic PGO boosts it)
- _comparer must be devirtualized under a guard (GDV)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I guess this one can be added on top of the existing NativeAOT optimizations gap then.
@EgorBot -amd -arm using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
public class Bench
{
static readonly Dictionary<string, int> Dictionary = new(StringComparer.OrdinalIgnoreCase)
{
{ "Hi", 1 },
{ "DOTNET_SKIP_FIRST_TIME_EXPERIENCE", 2 }
};
[Benchmark]
public int LookupLong() => Dictionary["DOTNET_SKIP_FIRST_TIME_EXPERIENCE"];
[Benchmark]
public int LookupShort() => Dictionary["Hi"];
} |
{ | ||
// This may trigger ICU loading for non-ASCII output, but it should be fine. | ||
// In most cases this is called for an optimized code for a hot block, so, presumably | ||
// ICU has already been loaded (e.g by Tier0 code). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jkotas it seems that it's beneficial to do for the IgnoreCase, but I'm not sure it's a good idea to trigger ICU here. Perhaps, I should just walk the string and give up if it contains non-ASCII?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Invoking ICU is probably ok for JIT, but it is not ok for AOT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
give up if it contains non-ASCII?
To extend it to 0xFFFF range without ICU, there is minipal_toupper_invariant(CHAR16_T)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not think minipal_toupper_invariant
is guaranteed to produce the exact same result as ICU in all situations (across all ICU versions).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For invariant culture, they are updated simultaneously with managed implementation. For invariant case change, managed implementation also has frozen data, which I think is applicable to all platforms alike. (e.g. last update 112fa51 modified both ReadOnlySpan<byte> CategoryCasingLevel1Index
and minipal in one go per the instructions in readme).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this depends on our local copy of casing data, the comment about loading ICU should be invalid. (I have not verified whether it depends on our local copy of casing data only. It is quite a bit of code...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought that Ordinal and OrdinalIgnoreCase comparisons for strings didn't involve ICU or globalization data? I'd assume this optimization would be restricted to only such cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If invariant culture uses local copy of casing data even when ICU is available (which I think is the case for consistent outputs across platforms, perhaps @tarekgh could confirm this), can we restrict the 0x7F .. 0xFFFF
range optimization to InvariantCulture{Ignore,}Case
when it is known? Minipal implementation is shared with all runtimes (coreclr, nativeaot and mono); just #include <minipal/strings.h>
.
I wonder if it's a good idea to generalize the JIT API to invoke any managed method with e.g. |
I do not think we would want to depend on users to mark methods with Pure attribute. If we wanted to generalize this, I would teach the JIT to prove on its own that a method is a pure function and it safe to evaluate at JIT/AOT-time. (The AOT time evaluation can use interpreter.) |
Are these two specific strings representative of some real word scenario? I would expect these specific keys to used only once. Also, |
Those are just some random values I used for benchmarks, it seems it's beneficial for any length, even 1-2 chars. Or were you asking about benefits of this transformation in general? To me it's not a lot of changes in the runtime (mostly, just JIT-EE noise) + a few unrelated changes that I'll extract to a separate PR; and I think it's not unpopular to do lookups with constant keys, isn't? |
I am not sure how popular are Dictionary lookups with constant keys on hot paths. It is useful to have example of real-world code that is going to benefit from new optimizations.
The changes as currently implemented are introducing more coupling. It is the main "cost" that I see. |
This can speed-up a lot of code with custom Dictionary-based header handling logic where the values are looked up by constant keys every time. The only caveat here is that the dictionary instance may not be in a readonly static at all so if the logic can benefit all standard dictionary scenarios (e.g. pick up on if even a non-jit-constant dict. instance happens to have a non-randomized ordinal or ordinal ignore case comparer), this will likely have a lot of hits (even if not necessarily in reference repos tested against here). I thought this kind of special-casing was against the rules though 😄 |
Is there a code on github that does this?
We are ok with special-casing like this for a good reason. |
A typical use scenario is reading HTTP headers with a given key, for example, |
Our application architecture relies heavily on Dictionary<String, V> for managing configurations and implementing business logic across different layers. Click here: These are just some of the keys I could quickly find in our source code. {
"CustomerID",
"CustomerName",
"CustomerEmail",
"CustomerPhone",
"CustomerAddress",
"CustomerCity",
"CustomerState",
"CustomerZipCode",
"CustomerCountry",
"AccountCreationDate",
"LastLoginDate",
"PreferredLanguage",
"NewsletterSubscription",
"LoyaltyPoints",
"ReferralCode",
"AccountStatus",
"AccountType",
"MembershipLevel",
"TotalSpent",
"AverageOrderValue",
"LastOrderDate",
"OrderHistory",
"Feedback",
"CustomerSatisfactionScore",
"SupportTickets",
"LastSupportInteraction",
"PreferredContactMethod",
"PaymentMethod",
"PaymentStatus",
"PaymentGateway",
"TransactionID",
"InvoiceNumber",
"RefundStatus",
"ShippingAddress",
"BillingAddress",
"ShippingMethod",
"DeliveryInstructions",
"OrderNotes",
"OrderSource",
"OrderDate",
"DeliveryDate",
"Status",
"OrderID",
"ProductID",
"ProductName",
"ProductDescription",
"ProductCategory",
"ProductPrice",
"ProductQuantity",
"ProductSKU",
"ProductWeight",
"ProductDimensions",
"ProductImageURL",
"ProductRating",
"ProductReviews",
"ProductAvailability",
"WarehouseLocation",
"SupplierName",
"SupplierContact",
"SupplierID",
"PurchaseOrderID",
"PurchaseOrderDate",
"PurchaseOrderStatus",
"ShippingCost",
"Tax",
"Discount",
"TotalAmount",
"Subtotal",
"GrandTotal",
"GiftWrap",
"GiftMessage",
"OrderTrackingNumber",
"DeliveryService",
"ReturnPolicy",
"Warranty",
"ExchangePolicy",
"CustomerSegments",
"MarketingPreferences",
"CampaignID",
"CampaignName",
"CampaignStartDate",
"CampaignEndDate",
"AdSpend",
"ClickThroughRate",
"ConversionRate",
"CustomerAcquisitionCost",
"SocialMediaProfile",
"WebsiteURL",
"LastProfileUpdate",
"SecurityQuestions",
"TwoFactorAuthenticationEnabled",
"PasswordResetDate",
"ProfileCompletionPercentage",
"DataPrivacyConsent",
"MarketingConsent",
"UserRoles",
"AccessLevel",
"LoginAttempts",
"AccountLockStatus",
"SessionHistory",
"DeviceHistory",
"IPAddresses",
"GeolocationData",
"UserAgent",
"LastPasswordChange",
"AccountRecoveryEmail",
"SecurityAlertsEnabled",
"LastProfileVisit",
"LastPurchaseDate",
"RecentlyViewedProducts",
"Wishlist",
"SavedItems",
"ProductRecommendations",
"CustomerEngagementScore",
"PendingOrders",
"CompletedOrders",
"CancelledOrders",
"ReturnRequests",
"ExchangeRequests",
"CustomerType",
"AccountBalance",
"PaymentHistory",
"LastTransactionDate",
"LastTransactionAmount",
"PaymentMethodDetails",
"CreditCardExpiryDate",
"BankAccountNumber",
"BankRoutingNumber",
"PayPalEmail",
"CryptocurrencyWallet",
"SubscriptionStartDate",
"SubscriptionEndDate",
"SubscriptionStatus",
"SubscriptionPlan",
"SubscriptionRenewalDate",
"SubscriptionCancellationDate",
"TrialStartDate",
"TrialEndDate",
"TrialStatus",
"UserFeedback",
"UserSuggestions",
"UserComplaints",
"UserRatings",
"UserReviews",
"UserEngagementMetrics",
"UserActivityLog",
"UserSessionDuration",
"UserLastActiveDate",
"UserLastActiveTime",
"UserLastPurchaseAmount",
"UserLastPurchaseDate",
"UserLastLoginIP",
"UserLastLoginDevice",
"UserLastLoginLocation",
"UserLastPasswordChangeDate",
"UserLastProfileUpdateDate",
"UserLastEmailSentDate",
"UserLastSMSNotificationDate",
"UserLastPushNotificationDate",
"UserLastAppOpenDate",
"UserLastAppVersion",
"UserLastAppUpdateDate",
"UserLastAppCrashDate",
"UserLastAppCrashCount",
"UserLastAppSessionCount",
"UserLastAppSessionDuration",
"UserLastAppFeatureUsed",
"UserLastAppFeatureUsageDate",
"UserLastAppFeatureUsageCount",
"UserLastAppFeatureUsageDate",
"UserLastAppFeatureUsageTime",
"UserLastAppFeatureUsageDuration",
"UserLastAppFeatureUsageFrequency",
"UserLastAppFeatureRating",
"UserLastAppFeatureFeedback",
"UserLastAppFeatureComments",
"UserLastAppFeatureSuggestions",
"UserLastAppFeatureImprovements",
"UserLastAppFeatureEngagement",
"UserLastAppFeatureClicks",
"UserLastAppFeatureViews",
"UserLastAppFeatureInteractions",
"UserLastAppFeatureConversions",
"UserLastAppFeatureRevenue",
"UserLastAppFeatureCost",
}
// And for each of these keys, there is often another query written in snake_case format. |
I'm going to leave this for .net 11 |
[EXPERIMENT] Constant fold
String.GetNonRandomizedHashCode()
for const strings. It should basically speed up allDictionary<string, ..>
accesses with keys being string literalsDefault (Ordinal)
AMD EPYC 9V74:
Cobalt 100 arm:
Ordinal Ignore Case
AMD EPYC 9V74:
Cobalt 100 arm:
As a bonus, it now also unrolls half-constant Equals for the key (previously it was just
SpanHelpers.SequenceEquals
call)