
EgorBo
Member

@EgorBo EgorBo commented Jul 5, 2025

[EXPERIMENT] Constant-fold String.GetNonRandomizedHashCode() for constant strings. This should speed up essentially all Dictionary<string, ...> lookups whose keys are string literals.
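A rough C++ analogy of the idea (not the JIT change itself): when the key is a literal and the hash function can be evaluated ahead of time, the hash becomes a constant, and the lookup only has to probe the table and compare strings. `hash_literal`, `Entry`, `kTable`, `lookup`, and `lookup_skip_first_time` are hypothetical names, and the djb2-style hash is a stand-in, not CoreLib's algorithm.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical djb2-style hash, standing in for a non-randomized string hash.
// It is NOT the algorithm String.GetNonRandomizedHashCode uses.
constexpr std::uint32_t hash_literal(std::string_view s) {
    std::uint32_t h = 5381;
    for (char c : s) {
        h = ((h << 5) + h) ^ static_cast<unsigned char>(c);
    }
    return h;
}

// Evaluated entirely by the compiler.
static_assert(hash_literal("") == 5381u, "hash of the empty key is the seed");

struct Entry {
    std::string_view key;
    std::uint32_t hash;  // stored hash, like the hash codes a Dictionary keeps per entry
    int value;
};

constexpr Entry kTable[] = {
    {"DOTNET_ROOT", hash_literal("DOTNET_ROOT"), 1},
    {"DOTNET_SKIP_FIRST_TIME_EXPERIENCE", hash_literal("DOTNET_SKIP_FIRST_TIME_EXPERIENCE"), 2},
};

// Lookup that receives the key's hash instead of computing it.
int lookup(std::string_view key, std::uint32_t key_hash) {
    // Debug-only check: a folded hash is valid only if it equals the run-time hash.
    assert(key_hash == hash_literal(key));
    for (const Entry& e : kTable) {
        if (e.hash == key_hash && e.key == key) {
            return e.value;
        }
    }
    return -1;  // not found
}

// Call site with a literal key: the hash is a compile-time constant, so the
// remaining run-time work is the table probe and the final string comparison.
int lookup_skip_first_time() {
    constexpr std::uint32_t folded = hash_literal("DOTNET_SKIP_FIRST_TIME_EXPERIENCE");
    return lookup("DOTNET_SKIP_FIRST_TIME_EXPERIENCE", folded);
}
```

The assertion in `lookup` captures the invariant the rest of this thread revolves around: whatever computes the hash ahead of time must agree exactly with the run-time implementation.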

Default (Ordinal)

using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class Bench
{
    static readonly Dictionary<string, int> Dictionary = new()
    {
        { "DOTNET_ROOT", 1 },
        { "DOTNET_SKIP_FIRST_TIME_EXPERIENCE", 2 }
    };

    [Benchmark]
    public int Lookup() => Dictionary["DOTNET_SKIP_FIRST_TIME_EXPERIENCE"];
}

AMD EPYC 9V74:

| Method | Toolchain | Mean | Error | Ratio |
|--------|-----------|---------:|----------:|------:|
| Lookup | Main | 8.247 ns | 0.0043 ns | 1.00 |
| Lookup | PR | 3.935 ns | 0.0486 ns | 0.48 |

Cobalt 100 arm:

| Method | Toolchain | Mean | Error | Ratio |
|--------|-----------|----------:|----------:|------:|
| Lookup | Main | 11.222 ns | 0.0080 ns | 1.00 |
| Lookup | PR | 6.324 ns | 0.0083 ns | 0.56 |

Ordinal Ignore Case

using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Bench
{
    static readonly Dictionary<string, int> Dictionary = new(StringComparer.OrdinalIgnoreCase)
    {
        { "Hi", 1 },
        { "DOTNET_SKIP_FIRST_TIME_EXPERIENCE", 2 }
    };

    [Benchmark]
    public int Lookup() => Dictionary["DOTNET_SKIP_FIRST_TIME_EXPERIENCE"];
}

AMD EPYC 9V74:

| Method | Toolchain | Mean | Error | Ratio |
|--------|-----------|----------:|----------:|------:|
| Lookup | Main | 19.550 ns | 0.0076 ns | 1.00 |
| Lookup | PR | 4.519 ns | 0.0395 ns | 0.23 |

Cobalt 100 arm:

| Method | Toolchain | Mean | Error | Ratio |
|--------|-----------|----------:|----------:|------:|
| Lookup | Main | 23.079 ns | 0.0039 ns | 1.00 |
| Lookup | PR | 7.107 ns | 0.0148 ns | 0.31 |

As a bonus, it now also unrolls the half-constant Equals on the key (previously this was just a call to SpanHelpers.SequenceEqual).
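To illustrate what unrolling a half-constant Equals means, here is a C++ sketch that assumes one operand is a compile-time constant UTF-16 string: because the length is known, a length check plus a fixed sequence of 8-byte and 2-byte compares can replace a call to a generic comparison routine. `equals_const` is a hypothetical helper; the code the JIT actually emits may differ, for example by using overlapping or SIMD loads.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string_view>

// Compares a run-time UTF-16 string with a compile-time constant of N - 1 chars.
// N is a template parameter, so both loops have constant trip counts and the
// C++ compiler can fully unroll them into a handful of loads and compares.
template <std::size_t N>
bool equals_const(std::u16string_view runtime, const char16_t (&constant)[N]) {
    static_assert(N >= 1, "constant must include its NUL terminator");
    constexpr std::size_t len = N - 1;                 // exclude the NUL
    if (runtime.size() != len) {
        return false;                                  // cheap length check first
    }
    const char* a = reinterpret_cast<const char*>(runtime.data());
    const char* b = reinterpret_cast<const char*>(constant);
    constexpr std::size_t bytes = len * sizeof(char16_t);
    std::size_t i = 0;
    for (; i + 8 <= bytes; i += 8) {                   // 8-byte chunks (4 chars)
        std::uint64_t x, y;
        std::memcpy(&x, a + i, 8);
        std::memcpy(&y, b + i, 8);
        if (x != y) {
            return false;
        }
    }
    for (; i < bytes; i += 2) {                        // leftover chars, one at a time
        std::uint16_t x, y;
        std::memcpy(&x, a + i, 2);
        std::memcpy(&y, b + i, 2);
        if (x != y) {
            return false;
        }
    }
    assert(i == bytes);                                // every byte was compared
    return true;
}
```

For `u"DOTNET_ROOT"` (11 chars, 22 bytes) this becomes two 8-byte compares and three 2-byte compares, with no loop left at run time once unrolled.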

@github-actions github-actions bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 5, 2025
Contributor

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@EgorBo
Member Author

EgorBo commented Jul 5, 2025

@EgorBot -amd -arm

using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class Bench
{
    static readonly Dictionary<string, int> Dictionary = new()
    {
        { "DOTNET_ROOT", 1 },
        { "DOTNET_SKIP_FIRST_TIME_EXPERIENCE", 2 }
    };

    [Benchmark]
    public int Lookup() => Dictionary["DOTNET_SKIP_FIRST_TIME_EXPERIENCE"];
}

if (pPinnedString != nullptr)
{
    GCX_COOP();
    PREPARE_NONVIRTUAL_CALLSITE(METHOD__STRING__GETNONRANDOMIZEDHASHCODE);
Member

Invoking CoreLib at runtime won't work for AOT. What's your plan there?

Member Author

Just include and use the managed implementation directly? (Probably by moving that method to a separate .cs file.)

Member

For cross-compilation, it only works if the implementation does not depend on architecture-specific properties. It is almost true for the current implementation of GetNonRandomizedHashCode, except for endianness. It would break further if the implementation were changed, e.g., to hash a native int at a time on 64-bit architectures.

We have mirrored a lot of CoreLib implementation details in the JIT. Is this one (~20 lines) going over the threshold of how much we want to mirror?
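To make the size of such a mirror concrete, here is a C++ sketch of a JIT-side copy of a two-lane, 32-bit-chunked string hash. Its structure and constants are modeled on our understanding of the current managed GetNonRandomizedHashCode and must be verified against CoreLib before being trusted. It also shows one way to keep the endianness concern out of the host: characters are packed according to the target's byte order, passed in explicitly, so every host computes the same value. `Endianness`, `rotl32`, and `mirrored_nonrandomized_hash` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

enum class Endianness { Little, Big };

constexpr std::uint32_t rotl32(std::uint32_t v, int s) {
    return (v << s) | (v >> (32 - s));
}

// Assumed mirror of a two-lane djb2-like hash over 32-bit chunks of UTF-16 data.
std::int32_t mirrored_nonrandomized_hash(const std::u16string& s, Endianness target) {
    const char16_t* p = s.c_str();
    // Like a managed string, c_str() is NUL-terminated; odd lengths read the NUL.
    assert(p[s.size()] == u'\0');

    // Combine two chars the way a 32-bit load sees them on the TARGET.
    // The host's byte order is never consulted.
    auto word = [&](std::size_t i) -> std::uint32_t {
        std::uint32_t first = p[i], second = p[i + 1];
        return target == Endianness::Little ? (first | (second << 16))
                                             : ((first << 16) | second);
    };

    std::uint32_t hash1 = (5381u << 16) + 5381u;
    std::uint32_t hash2 = hash1;
    std::ptrdiff_t length = static_cast<std::ptrdiff_t>(s.size());
    std::size_t i = 0;

    while (length > 2) {  // consume 4 chars (two 32-bit words) per iteration
        length -= 4;
        hash1 = (rotl32(hash1, 5) + hash1) ^ word(i);
        hash2 = (rotl32(hash2, 5) + hash2) ^ word(i + 2);
        i += 4;
    }
    if (length > 0) {     // 1 or 2 chars left (plus the NUL when only 1)
        hash2 = (rotl32(hash2, 5) + hash2) ^ word(i);
    }
    return static_cast<std::int32_t>(hash1 + hash2 * 1566083941u);
}
```

Any later change to the managed method (for instance, hashing a native int at a time) would have to be replicated here exactly, which is the coupling cost being weighed in this thread.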

Member Author

@EgorBo EgorBo Jul 5, 2025

It is almost true for the current implementation of GetNonRandomizedHashCode, except for endianness.

In the unlikely event that we support BE (big-endian) in R2R/NAOT, we can just limit the opt for Host being LE and target being LE for simplicity?

It would break further if the implementation were changed, e.g., to hash a native int at a time on 64-bit architectures.

Same here: limit it to x64 on x64? Also, hashing a native int at a time might not be a good idea, given that string data is only 4-byte aligned.

If NAOT/R2R support raises concerns, we can probably leave it to be JIT only? The JIT has none of these concerns. Still, we might want to leave a note in the managed method that it may be invoked at JIT time.

Member

@jkotas jkotas Jul 5, 2025

You can always make the shared file work for NAOT/R2R with enough if checks and ifdefs. My point is that once you do that, it is de facto a second implementation.

we can just limit the opt for Host being LE and target being LE for simplicity

Nit: Shortcuts like this violate invariants that we try to maintain for cross-compilation. We want our cross-compilers to produce byte-for-byte identical output for a given target, irrespective of the host the compiler is running on.

NAOT/R2R support raises concerns, we can probably leave it to be JIT only?

Every time folks bring up complicated NAOT-specific JIT optimizations, I point out that it would be great if we just made the JIT optimizations work for R2R/AOT where possible. We are not there today, and that's unfortunate.

Member

We should go the extra mile to properly enable optimizations for AOT where possible. I do not have a problem with JIT-specific optimizations that are fundamentally incompatible with AOT (not the case here).

Member Author

So if I read all this correctly: the NAOT implementation can't be done without introducing coupling. We can't land without NAOT. The extra coupling needs real-world justification. To be honest, I don't have any specific example, but I've vibe-coded a simple analyzer that tries to catch lookups (Contains, TryGetValue) with literals and named constants as keys: https://gist.github.com/EgorBo/e3ee6c711741956b423fb0ccdd0b51c4

Roslyn: 84 matches
runtime: 54 matches
aspnetcore: 25 matches
OrchardCore: 54 matches

(ignoring folders like 'tests' etc.)

I guess I'll put this on pause until a use case appears.

Member

Are there any in your list that look hot?

We can't land without NAOT.

I would not say that. I can accept it with a good reason.

Member Author

I totally forgot to mention that on NativeAOT it can only work with proper static PGO data, which presumably is never supplied in practice, because this relies on:

  1. FindValue is expected to be inlined (on the JIT, Dynamic PGO makes this more likely)
  2. _comparer must be devirtualized under a guard (GDV)
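A C++ sketch of the shape point 2 refers to: guarded devirtualization inserts a cheap type check for the comparer class that PGO observed, and only inside that guard is the callee known, so a constant key's hash can be folded. `Comparer`, `OrdinalComparer`, `LengthComparer`, and `hash_dotnet_root` are hypothetical, `djb2` is a stand-in hash, and the JIT performs this transformation on its IR rather than on C++ source.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <typeinfo>

constexpr std::uint32_t djb2(std::string_view s) {
    std::uint32_t h = 5381;
    for (char c : s) {
        h = ((h << 5) + h) ^ static_cast<unsigned char>(c);
    }
    return h;
}

struct Comparer {
    virtual ~Comparer() = default;
    virtual std::uint32_t hash(std::string_view key) const = 0;
};

// Hypothetical stand-in for the comparer class PGO observed at this call site.
struct OrdinalComparer final : Comparer {
    std::uint32_t hash(std::string_view key) const override { return djb2(key); }
};

// Hypothetical comparer that the guess does not cover.
struct LengthComparer final : Comparer {
    std::uint32_t hash(std::string_view key) const override {
        return static_cast<std::uint32_t>(key.size());
    }
};

// Roughly what GDV turns `cmp.hash("DOTNET_ROOT")` into.
std::uint32_t hash_dotnet_root(const Comparer& cmp) {
    constexpr std::string_view key = "DOTNET_ROOT";
    if (typeid(cmp) == typeid(OrdinalComparer)) {
        // Guard passed: the callee is known, so it can be inlined, and with a
        // constant key its result folds to a compile-time constant.
        constexpr std::uint32_t folded = djb2(key);
        assert(folded == cmp.hash(key));  // debug check: fold matches the real call
        return folded;
    }
    return cmp.hash(key);  // guard failed: ordinary virtual call
}
```

Without profile data to choose the guessed class, there is no guarded fast path to fold into, which is why the NativeAOT case depends on static PGO.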

Member

OK, I guess this one can be added on top of the existing NativeAOT optimization gap, then.

@EgorBo
Member Author

EgorBo commented Jul 5, 2025

@EgorBot -amd -arm

using System;
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class Bench
{
    static readonly Dictionary<string, int> Dictionary = new(StringComparer.OrdinalIgnoreCase)
    {
        { "Hi", 1 },
        { "DOTNET_SKIP_FIRST_TIME_EXPERIENCE", 2 }
    };

    [Benchmark]
    public int LookupLong() => Dictionary["DOTNET_SKIP_FIRST_TIME_EXPERIENCE"];

    [Benchmark]
    public int LookupShort() => Dictionary["Hi"];
}

{
    // This may trigger ICU loading for non-ASCII output, but it should be fine.
    // In most cases this is called for an optimized code for a hot block, so, presumably
    // ICU has already been loaded (e.g by Tier0 code).
Member Author

@jkotas it seems beneficial to do this for IgnoreCase too, but I'm not sure it's a good idea to trigger ICU here. Perhaps I should just walk the string and give up if it contains non-ASCII?
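A minimal C++ sketch of the "give up if non-ASCII" approach, assuming a UTF-16 constant key: ASCII characters are upper-cased with a table-free rule, and any character above 0x7F makes the folder return std::nullopt, so the original runtime call is kept and no casing data (ICU or otherwise) is consulted. `ascii_upper` and `try_fold_ignore_case_hash` are hypothetical, and the djb2-style hash stands in for the real OrdinalIgnoreCase hash.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string_view>

// ASCII-only upper-casing: exact for U+0000..U+007F without any casing tables.
char16_t ascii_upper(char16_t c) {
    assert(c <= 0x7F && "caller must filter out non-ASCII chars");
    return (c >= u'a' && c <= u'z') ? static_cast<char16_t>(c - 0x20) : c;
}

// Attempts to fold an ignore-case hash for a constant key. Returns std::nullopt
// ("give up and keep the runtime call") as soon as a non-ASCII char is seen.
std::optional<std::uint32_t> try_fold_ignore_case_hash(std::u16string_view key) {
    std::uint32_t h = 5381;  // hypothetical djb2-style stand-in hash
    for (char16_t c : key) {
        if (c > 0x7F) {
            return std::nullopt;
        }
        h = ((h << 5) + h) ^ ascii_upper(c);
    }
    return h;
}
```

Keys like "Hi" or "DOTNET_SKIP_FIRST_TIME_EXPERIENCE" fold, while a key such as "Straße" simply falls back to the existing runtime path.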

Member

Invoking ICU is probably ok for JIT, but it is not ok for AOT.

Member

give up if it contains non-ASCII?

To extend it to the whole range up to 0xFFFF without ICU, there is minipal_toupper_invariant(CHAR16_T).

Member

I do not think minipal_toupper_invariant is guaranteed to produce the exact same result as ICU in all situations (across all ICU versions).

Member

For the invariant culture, they are updated simultaneously with the managed implementation. For invariant case changes, the managed implementation also has frozen data, which I think applies to all platforms alike (e.g., the last update, 112fa51, modified both ReadOnlySpan<byte> CategoryCasingLevel1Index and minipal in one go, per the instructions in the readme).

Member

@jkotas jkotas Jul 7, 2025

If this depends only on our local copy of casing data, the comment about loading ICU would be invalid. (I have not verified whether it depends only on our local copy of casing data. It is quite a bit of code...)

Contributor

I thought that Ordinal and OrdinalIgnoreCase comparisons for strings didn't involve ICU or globalization data? I'd assume this optimization would be restricted to only such cases.

Member

If the invariant culture uses the local copy of casing data even when ICU is available (which I think is the case, for consistent output across platforms; perhaps @tarekgh could confirm this), can we restrict the 0x7F..0xFFFF range optimization to InvariantCulture{Ignore,}Case when that is known? The minipal implementation is shared by all runtimes (CoreCLR, NativeAOT and Mono); just #include <minipal/strings.h>.

@EgorBo
Member Author

EgorBo commented Jul 5, 2025

I wonder if it's a good idea to generalize the JIT API to invoke any managed method with, e.g., a [Pure] attribute on it if its arguments are JIT-time known constants. Then we would just ask the VM to evaluate it and bake whatever it returns into the codegen. Obviously, this would be a JIT-only improvement, although maybe we could reuse NAOT's interpreter for cctors for that on the NAOT side.

@jkotas
Member

jkotas commented Jul 5, 2025

I wonder if it's a good idea to generalize the JIT API to invoke any managed method with, e.g., a [Pure] attribute on it if its arguments are JIT-time known constants.

I do not think we would want to depend on users marking methods with the Pure attribute. If we wanted to generalize this, I would teach the JIT to prove on its own that a method is a pure function and that it is safe to evaluate at JIT/AOT time. (The AOT-time evaluation can use the interpreter.)
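A toy C++ model of that generalization: a call node is replaced by its result when the callee is marked as proven pure and all of its arguments fold to constants; otherwise it is left for run time. The `proven_pure` flag is an assumption standing in for the purity analysis described above, and `Node`, `FunctionInfo`, `constant_node`, `call_node`, and `try_fold` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy IR node: either an integer constant or a call to a named function.
struct Node {
    std::optional<std::int64_t> constant;
    std::string callee;
    std::vector<std::shared_ptr<Node>> args;
};

struct FunctionInfo {
    bool proven_pure;  // hypothetical output of a purity analysis, not a user attribute
    std::function<std::int64_t(const std::vector<std::int64_t>&)> evaluate;
};

using FunctionTable = std::map<std::string, FunctionInfo>;

std::shared_ptr<Node> constant_node(std::int64_t v) {
    auto n = std::make_shared<Node>();
    n->constant = v;
    return n;
}

std::shared_ptr<Node> call_node(std::string callee, std::vector<std::shared_ptr<Node>> args) {
    auto n = std::make_shared<Node>();
    n->callee = std::move(callee);
    n->args = std::move(args);
    return n;
}

// Folds a call when its callee is proven pure and every argument folds to a
// constant; returns std::nullopt to mean "leave this call for run time".
std::optional<std::int64_t> try_fold(const Node& n, const FunctionTable& fns) {
    assert(n.constant || !n.callee.empty());  // a node is a constant or a call
    if (n.constant) {
        return n.constant;
    }
    auto it = fns.find(n.callee);
    if (it == fns.end() || !it->second.proven_pure) {
        return std::nullopt;
    }
    std::vector<std::int64_t> values;
    for (const auto& arg : n.args) {
        std::optional<std::int64_t> v = try_fold(*arg, fns);
        if (!v) {
            return std::nullopt;
        }
        values.push_back(*v);
    }
    return it->second.evaluate(values);
}
```

A real implementation would also need to bound evaluation cost and decide what to do when evaluation throws; the toy ignores both.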

@jkotas
Member

jkotas commented Jul 5, 2025

   { "DOTNET_ROOT", 1 },
   { "DOTNET_SKIP_FIRST_TIME_EXPERIENCE", 2 }

Are these two specific strings representative of some real-world scenario? I would expect these specific keys to be used only once. Also, DOTNET_SKIP_FIRST_TIME_EXPERIENCE was deprecated years ago.

@EgorBo
Member Author

EgorBo commented Jul 5, 2025

   { "DOTNET_ROOT", 1 },
   { "DOTNET_SKIP_FIRST_TIME_EXPERIENCE", 2 }

Are these two specific strings representative of some real-world scenario? I would expect these specific keys to be used only once. Also, DOTNET_SKIP_FIRST_TIME_EXPERIENCE was deprecated years ago.

Those are just some random values I used for the benchmarks; it seems to be beneficial for any key length, even 1-2 chars. Or were you asking about the benefits of this transformation in general? To me, it's not a lot of changes in the runtime (mostly just JIT-EE noise), plus a few unrelated changes that I'll extract into a separate PR. And I think it's not unpopular to do lookups with constant keys, is it?

@jkotas
Member

jkotas commented Jul 5, 2025

I think it's not unpopular to do lookups with constant keys, is it?

I am not sure how popular Dictionary lookups with constant keys on hot paths are. It is useful to have examples of real-world code that is going to benefit from new optimizations.

it's not a lot of changes in the runtime

The changes as currently implemented introduce more coupling. That is the main "cost" I see.

@neon-sunset
Contributor

neon-sunset commented Jul 5, 2025

I am not sure how popular Dictionary lookups with constant keys on hot paths are. It is useful to have examples of real-world code that is going to benefit from new optimizations.

This can speed up a lot of code with custom Dictionary-based header-handling logic, where values are looked up by constant keys every time. The only caveat is that the dictionary instance may not be in a static readonly field at all, so if the optimization can benefit all standard dictionary scenarios (e.g., by picking up on cases where even a dictionary instance that is not a JIT-time constant happens to have a non-randomized Ordinal or OrdinalIgnoreCase comparer), it will likely have a lot of hits (even if not necessarily in the reference repos tested here).

I thought this kind of special-casing was against the rules though 😄

@jkotas
Member

jkotas commented Jul 6, 2025

This can speed up a lot of code with custom Dictionary-based header-handling logic, where values are looked up by constant keys every time.

Is there code on GitHub that does this?

I thought this kind of special-casing was against the rules though

We are ok with special-casing like this for a good reason.

@hez2010
Contributor

hez2010 commented Jul 6, 2025

Is there code on GitHub that does this?

A typical scenario is reading HTTP headers by a given key, for example, Headers["Content-Type"].

@samsosa

samsosa commented Jul 6, 2025

I think it's not unpopular to do lookups with constant keys, is it?

I am not sure how popular Dictionary lookups with constant keys on hot paths are. It is useful to have examples of real-world code that is going to benefit from new optimizations.

Our application architecture relies heavily on Dictionary<String, V> for managing configuration and implementing business logic across different layers.
We currently have over 300 string keys in total, but an individual processing layer may query only around 30 of these keys at a time. This means that optimizing hash code computation for these frequently accessed constant strings could yield significant performance improvements.
We do not use fixed data structures because our approach has historically evolved this way. We use Dictionary<String, V> as "just a bag of data", the smallest common structure available in every .NET environment. This flexibility lets us manage various kinds of data without the constraints of more rigid data structures, which makes it a good fit for our diverse application needs.

These are just some of the keys I could quickly find in our source code:
        {
            "CustomerID",
            "CustomerName",
            "CustomerEmail",
            "CustomerPhone",
            "CustomerAddress",
            "CustomerCity",
            "CustomerState",
            "CustomerZipCode",
            "CustomerCountry",
            "AccountCreationDate",
            "LastLoginDate",
            "PreferredLanguage",
            "NewsletterSubscription",
            "LoyaltyPoints",
            "ReferralCode",
            "AccountStatus",
            "AccountType",
            "MembershipLevel",
            "TotalSpent",
            "AverageOrderValue",
            "LastOrderDate",
            "OrderHistory",
            "Feedback",
            "CustomerSatisfactionScore",
            "SupportTickets",
            "LastSupportInteraction",
            "PreferredContactMethod",
            "PaymentMethod",
            "PaymentStatus",
            "PaymentGateway",
            "TransactionID",
            "InvoiceNumber",
            "RefundStatus",
            "ShippingAddress",
            "BillingAddress",
            "ShippingMethod",
            "DeliveryInstructions",
            "OrderNotes",
            "OrderSource",
            "OrderDate",
            "DeliveryDate",
            "Status",
            "OrderID",
            "ProductID",
            "ProductName",
            "ProductDescription",
            "ProductCategory",
            "ProductPrice",
            "ProductQuantity",
            "ProductSKU",
            "ProductWeight",
            "ProductDimensions",
            "ProductImageURL",
            "ProductRating",
            "ProductReviews",
            "ProductAvailability",
            "WarehouseLocation",
            "SupplierName",
            "SupplierContact",
            "SupplierID",
            "PurchaseOrderID",
            "PurchaseOrderDate",
            "PurchaseOrderStatus",
            "ShippingCost",
            "Tax",
            "Discount",
            "TotalAmount",
            "Subtotal",
            "GrandTotal",
            "GiftWrap",
            "GiftMessage",
            "OrderTrackingNumber",
            "DeliveryService",
            "ReturnPolicy",
            "Warranty",
            "ExchangePolicy",
            "CustomerSegments",
            "MarketingPreferences",
            "CampaignID",
            "CampaignName",
            "CampaignStartDate",
            "CampaignEndDate",
            "AdSpend",
            "ClickThroughRate",
            "ConversionRate",
            "CustomerAcquisitionCost",
            "SocialMediaProfile",
            "WebsiteURL",
            "LastProfileUpdate",
            "SecurityQuestions",
            "TwoFactorAuthenticationEnabled",
            "PasswordResetDate",
            "ProfileCompletionPercentage",
            "DataPrivacyConsent",
            "MarketingConsent",
            "UserRoles",
            "AccessLevel",
            "LoginAttempts",
            "AccountLockStatus",
            "SessionHistory",
            "DeviceHistory",
            "IPAddresses",
            "GeolocationData",
            "UserAgent",
            "LastPasswordChange",
            "AccountRecoveryEmail",
            "SecurityAlertsEnabled",
            "LastProfileVisit",
            "LastPurchaseDate",
            "RecentlyViewedProducts",
            "Wishlist",
            "SavedItems",
            "ProductRecommendations",
            "CustomerEngagementScore",
            "PendingOrders",
            "CompletedOrders",
            "CancelledOrders",
            "ReturnRequests",
            "ExchangeRequests",
            "CustomerType",
            "AccountBalance",
            "PaymentHistory",
            "LastTransactionDate",
            "LastTransactionAmount",
            "PaymentMethodDetails",
            "CreditCardExpiryDate",
            "BankAccountNumber",
            "BankRoutingNumber",
            "PayPalEmail",
            "CryptocurrencyWallet",
            "SubscriptionStartDate",
            "SubscriptionEndDate",
            "SubscriptionStatus",
            "SubscriptionPlan",
            "SubscriptionRenewalDate",
            "SubscriptionCancellationDate",
            "TrialStartDate",
            "TrialEndDate",
            "TrialStatus",
            "UserFeedback",
            "UserSuggestions",
            "UserComplaints",
            "UserRatings",
            "UserReviews",
            "UserEngagementMetrics",
            "UserActivityLog",
            "UserSessionDuration",
            "UserLastActiveDate",
            "UserLastActiveTime",
            "UserLastPurchaseAmount",
            "UserLastPurchaseDate",
            "UserLastLoginIP",
            "UserLastLoginDevice",
            "UserLastLoginLocation",
            "UserLastPasswordChangeDate",
            "UserLastProfileUpdateDate",
            "UserLastEmailSentDate",
            "UserLastSMSNotificationDate",
            "UserLastPushNotificationDate",
            "UserLastAppOpenDate",
            "UserLastAppVersion",
            "UserLastAppUpdateDate",
            "UserLastAppCrashDate",
            "UserLastAppCrashCount",
            "UserLastAppSessionCount",
            "UserLastAppSessionDuration",
            "UserLastAppFeatureUsed",
            "UserLastAppFeatureUsageDate",
            "UserLastAppFeatureUsageCount",
            "UserLastAppFeatureUsageDate",
            "UserLastAppFeatureUsageTime",
            "UserLastAppFeatureUsageDuration",
            "UserLastAppFeatureUsageFrequency",
            "UserLastAppFeatureRating",
            "UserLastAppFeatureFeedback",
            "UserLastAppFeatureComments",
            "UserLastAppFeatureSuggestions",
            "UserLastAppFeatureImprovements",
            "UserLastAppFeatureEngagement",
            "UserLastAppFeatureClicks",
            "UserLastAppFeatureViews",
            "UserLastAppFeatureInteractions",
            "UserLastAppFeatureConversions",
            "UserLastAppFeatureRevenue",
            "UserLastAppFeatureCost",
        }

         // And for each of these keys, there is often another query written in snake_case format.

@EgorBo
Member Author

EgorBo commented Jul 17, 2025

I'm going to leave this for .NET 11.

@EgorBo EgorBo closed this Jul 17, 2025
@github-actions github-actions bot locked and limited conversation to collaborators Aug 17, 2025
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

7 participants