Skip to content

Make the key ring cache expire shortly after an immediately-activated key is generated #54517

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 8 commits into from

Conversation

amcasey
Copy link
Member

@amcasey amcasey commented Mar 12, 2024

If a key is required and none is available, one will be generated and immediately activated. Immediately-activated keys are problematic because they may be used before they have propagated to other app instances. What's more, in a scenario where one instance needs to generate an immediately-activated key, it's likely that other instances will need to do the same. A given instance has no control over when other instances pick up its immediately-activated keys, but it can do its best to pick up theirs. Therefore, every time it generates an immediately-activated key, it assumes other instances may have as well and pre-emptively schedules a refresh of its own key ring cache.

This problem is already partially (and complementarily) addressed by KeyRingProvider.InAutoRefreshWindow, which allows missing keys to be re-fetched from the backing repository for two minutes after app-startup. This will cover the case where the short expiration period is too short to catch all updates but doesn't help with cases where an immediately-activated key is required after initial startup.

Sadly, this still does not completely address the issue: instances could still use keys unknown to their peers during the 15 second refresh window. To fix the issue properly, we'd need to know the order in which the keys had been persisted to the repository.

Part of #52678, which is part of #36157
Should benefit from #54490

… key is generated

If a key is required and none is available, one will be generated and immediately activated.  Immediately-activated keys are problematic because they may be used before they have propagated to other app instances.  What's more, in a scenario where one instance needs to generate an immediately-activated key, it's likely that other instances will need to do the same.  A given instance has no control over when other instances pick up its immediately-activated keys, but it can do its best to pick up theirs.  Therefore, every time it generates an immediately-activated key, it assumes other instances may have as well and pre-emptively schedules a refresh of its own key ring cache.

This problem is already partially (and complementarily) addressed by `KeyRingProvider.InAutoRefreshWindow`, which allows missing keys to be re-fetched from the backing repository for two minutes after app-startup.  This will cover the case where the short expiration period is too short to catch all updates but doesn't help with cases where an immediately-activated key is required after initial startup.

Sadly, this still does not completely address the issue: instances could still use keys unknown to their peers during the 15 second refresh window.  To fix the issue properly, we'd need to know the order in which the keys had been persisted to the repository.

Part of dotnet#52678, which is part of dotnet#36157
@ghost ghost added the area-dataprotection Includes: DataProtection label Mar 12, 2024
@amcasey amcasey requested a review from captainsafia March 21, 2024 21:58
@amcasey
Copy link
Member Author

amcasey commented Mar 21, 2024

@adityamandaleeka I'm tempted to add an appcontext switch for this. Thoughts?

// (in which case, other instances may have done the same)
? (generatedKey.ActivationDate < now + KeyManagementOptions.KeyPropagationWindow) // No clock skew on a key we generated
// Or we selected a key that has yet to propagate (presumably, from another instance)
: (defaultKey.CreationDate > now - KeyManagementOptions.KeyPropagationWindow);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this happen in a loop? What if this is still the default after the next refresh?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed it can. 😢

@amcasey
Copy link
Member Author

amcasey commented Mar 21, 2024

Idea: what if, instead of refreshing the key ring, we entered/extended an auto-refresh window.

We'd have no guarantee of an unknown key triggering a refresh within the window, so probably not ideal.

@adityamandaleeka
Copy link
Member

@adityamandaleeka I'm tempted to add an appcontext switch for this. Thoughts?

Sounds good to me. Under a "new data protection" switch that includes other stuff I assume? Feels like a separate switch for this might be overkill.

amcasey added 4 commits March 21, 2024 16:21
We don't want to refresh every 15 seconds until the propagation window elapses.  Storing the flag on the CacheableKeyRing simplifies thread-safe state management, but requires changes to some public test hooks.  This version of the change should be source- and binary-compatible with previous versions of .net.
/// of its own key ring cache. This property controls how long after an immediately-
/// activated key is generated the key ring cache will be refreshed.
/// </summary>
internal static TimeSpan ShortKeyRingRefreshPeriod { get; set; } = TimeSpan.FromSeconds(15);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any guidance here on what the minimum refresh period should be or why 15 seconds is a sane default?

// of downtime may discover that it has a valid, but soon-to-be-expired key. The replacement will not
// be immediately-activated, but may be activated before it has propagated.
var useShortRefreshPeriod = allowShortRefreshPeriod &&
(generatedKey is not null
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Move the second clause of this Boolean to a well-defined method (e.g. GeneratedKeyWithinPropogationWindow or something).

@@ -241,7 +262,7 @@ internal IKeyRing GetCurrentKeyRingCore(DateTime utcNow, bool forceRefresh = fal

try
{
newCacheableKeyRing = CacheableKeyRingProvider.GetCacheableKeyRing(utcNow);
newCacheableKeyRing = CacheableKeyRingProvider.GetCacheableKeyRing(utcNow, allowShortRefreshPeriod: existingCacheableKeyRing?.HasShortRefreshPeriod != true);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does existingCacheableKeyRing?.HasShortRefreshPeriod != true functionally mean here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a nullable bool

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AppContext switch here?

@dotnet-policy-service dotnet-policy-service bot added the pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun label Apr 8, 2024
@amcasey
Copy link
Member Author

amcasey commented Apr 12, 2024

In the absence of races (e.g. as in the deterministic simulator), this provides essentially no benefit and slightly increases the amount of work done (e.g. network calls). Since races tend to occur on startup and we already have a startup mitigation (the auto-reload period), this change probably does not justify the complexity it adds.

@amcasey amcasey marked this pull request as draft April 12, 2024 19:26
@amcasey
Copy link
Member Author

amcasey commented Apr 12, 2024

@captainsafia Your feedback is still welcome, but this is now low priority.

@amcasey amcasey closed this May 2, 2024
@dotnet-policy-service dotnet-policy-service bot added this to the 9.0-preview5 milestone May 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-dataprotection Includes: DataProtection pending-ci-rerun When assigned to a PR indicates that the CI checks should be rerun
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants