-
Notifications
You must be signed in to change notification settings - Fork 103
WIP: OCPBUGS-57474: Authorization Cache V2 #530
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…orizationCacheV2 implementation and switch all internal usage to an interface. Move the original resource-version-based cache to cachev1.go, update wiring and tests, and ensure all consumers use the new interface type.
Skipping CI for Draft Pull Request. |
/test all |
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: jacobsee The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/retest |
…eV2 for event-driven cache updates
/retest |
/retest |
1 similar comment
/retest |
…uthorizationCacheV2 to use it. Switch to Workqueue instead of custom queueing logic. Use LastSyncResourceVersion in global rbac cache to short circuit hashing.
/retest |
/retest |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all your work on this, it looks really promising!
@@ -0,0 +1,645 @@ | |||
package auth |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting the cache.go/cache_test.go renames in a clean commit would make it obvious that there are no little changes mixed in.
informers.RoleBindings().Informer(), | ||
} | ||
|
||
globalRBACCache := NewGlobalRBACCache(scrLister, scrbLister) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to pass this in rather than constructing it here? You'd be able to write tests that simulate varying the global RBAC without setting up phony informers.
} | ||
|
||
// Run begins watching and synchronizing the cache | ||
func (ac *AuthorizationCacheV2) Run(period time.Duration) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can the periodic "syncs" be entirely removed in favor of the informer event handlers with periodic resync intervals?
// Start periodic synchronization | ||
go func() { | ||
// Initial sync | ||
rac.synchronize() | ||
|
||
ticker := time.NewTicker(period) | ||
defer ticker.Stop() | ||
|
||
for { | ||
select { | ||
case <-ticker.C: | ||
rac.synchronize() | ||
case <-rac.stopCh: | ||
return | ||
} | ||
} | ||
}() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this still needed?
const ( | ||
globalSyncKey = "global" | ||
namespaceKeyPrefix = "namespace:" | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, it would be at least as clear to use ""
as a sentinel key for all namespaces and get rid of the string prefixing.
refs = append(refs, rbacResourceRef{ | ||
uid: string(cr.UID), | ||
resourceVersion: cr.ResourceVersion, | ||
kind: "ClusterRole", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is kind
necessary? We shouldn't have any two objects with the same UID.
// If we don't have previous versions cached, consider it changed | ||
if g.lastClusterRoleVersion == "" || g.lastClusterRoleBindingVersion == "" { | ||
return true | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks redundant. When would we return true from this path but not the path that immediately follows?
// GetQueueStatus returns information about the queue | ||
func (rac *ReactiveAuthorizationCacheV2) GetQueueStatus() (queued int) { | ||
return rac.queue.Len() | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems brittle to have tests that directly depend on this... can't we test the observable results of the workqueue processor without knowing about it?
} | ||
|
||
// GetCurrentHash returns the current global RBAC hash | ||
func (g *GlobalRBACCache) GetCurrentHash() string { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do callers need to be responsible for detecting that this hash might be stale and for updating it? I think correctness depends on always getting the latest hash. Caching based on the last sync RV makes sense as an optimization, but I think it's an implementation detail and not something callers care about.
@@ -129,7 +54,7 @@ func TestSyncNamespace(t *testing.T) { | |||
nsIndexer := cache.NewIndexer(cache.MetaNamespaceKeyFunc, cache.Indexers{}) | |||
nsLister := corev1listers.NewNamespaceLister(nsIndexer) | |||
|
|||
authorizationCache := NewAuthorizationCache( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there tests that can be run against both implementations? Or a shim that runs both implementations for real and detects behavior differences?
@jacobsee: This pull request references Jira Issue OCPBUGS-57474, which is invalid:
Comment The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@jacobsee: The following tests failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
AuthorizationCache
fromcache.go
toAuthorizationCacheV1
incachev1.go
(no longer instantiated anywhere currently)AuthorizationCache
tocache.go
AuthorizationCacheV2
, which uses periodic cluster and namespace-specific hashing to track changes and invalidate the cache/notify watchersReactiveAuthorizationCacheV2
, which wrapsAuthorizationCacheV2
in event-handling so that heavy sync checks can happen less frequently and changes are still processed in a timely mannerReactiveAuthorizationCacheV2
Motivated by the fact that the current auth cache does not appear to be notifying properly when permissions have been removed during incremental synchronizations (https://issues.redhat.com/browse/OCPBUGS-57474), and it looks to be an old issue in a system that is rather difficult to reason around.