forked from kubernetes/kubernetes
cos patch #1
Open

upodroid wants to merge 589 commits into master from cos-patch
Conversation
…n pkg/controller/
refactor(DRA validation): Add granular controls for declarative validation migration
fix: use iifname for input interface name matches
…aks-in-kubelet-podcertsmanager fix race condition in kubelet's PodCertsManager
fix incorrect warning whenever headless service is created/updated
Add reviewers and approvers to api/testing
…edcachesync-with-context Replace WaitForNamedCacheSync with WaitForNamedCacheSyncWithContext in pkg/controller/
…rnamedcachesync refactor(controller): Use context-aware WaitForNamedCacheSync in resourcequota and HPA tests
Ensure consistent key schema requirements between cacher and etcd3
Fix the spelling error of grpc in the log
…s#133818)

* Update kitten base image from agnhost:2.33 to agnhost:2.56
  - Update BASEIMAGE from agnhost:2.33 to agnhost:2.56 for all platforms
  - Bump VERSION from 1.7 to 1.8
  - Addresses issue kubernetes#131874 for updating outdated base images

  agnhost:2.56 is the latest version providing updated functionality and security fixes for the kitten test image.

* Update kitten base image from agnhost:2.56 to agnhost:2.57
  - Update BASEIMAGE to use latest agnhost:2.57 across all platforms
This change refactors the cloud-specific versions of the node lifecycle and node IPAM controllers to use a context.Context for cancellation and contextual logging, replacing the legacy stopCh pattern. This is a follow-up to PR kubernetes#133985, where these controllers were separated out due to their use in the legacy Cloud Controller Manager (CCM). It is a known issue that the CCM's startup logic does not pass the controller name via the context. This change proceeds with the refactoring to unify the cancellation logic across controllers, while acknowledging that contextual logs will be less detailed when these controllers are run in the CCM. Signed-off-by: Aditi Gupta <[email protected]>
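As a rough sketch of the pattern this commit describes (the function name and controller body below are illustrative stand-ins, not the actual signatures from this PR):

```go
package main

import (
	"context"
	"time"

	"k8s.io/klog/v2"
)

// runLegacy shows the old stopCh pattern: shutdown is a bare channel
// and log lines carry no contextual key/value pairs.
func runLegacy(stopCh <-chan struct{}) {
	for {
		select {
		case <-stopCh:
			return
		case <-time.After(5 * time.Second):
			klog.Info("monitoring nodes")
		}
	}
}

// runWithContext shows the replacement: the context carries both
// cancellation and a contextual logger. When the legacy CCM starts the
// controller without a name in the context, the logs are simply less
// detailed, as the commit message notes.
func runWithContext(ctx context.Context) {
	logger := klog.FromContext(ctx)
	for {
		select {
		case <-ctx.Done():
			return
		case <-time.After(5 * time.Second):
			logger.Info("monitoring nodes")
		}
	}
}

func main() {
	ctx, cancel := context.WithTimeout(context.Background(), 11*time.Second)
	defer cancel()
	go runLegacy(ctx.Done()) // a ctx.Done() channel can stand in for the old stopCh
	runWithContext(ctx)
}
```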
…130951)

* adding maxunavailable_violation metric
  - added metric to list of stable metrics
  - changed when metric gets incremented
  - addressed comments
  - fixed stable metrics list
* Update pkg/controller/statefulset/metrics/metrics.go
* Update the metric and log verbosity level
* Address false positives metric count
* Implement maxUnavailable and UnavailableReplicas metrics
* fix lint fmt
* update tests
* set metrics to 1 as a default
* log for true validation only and update func sig.
* Move maxUnavailable metric to the updateStatefulSetStatus
* change metrics stability level to Alpha
* fix unit test
* fix linting issue
* Address code review feedback

---------

Signed-off-by: Heba Elayoty <[email protected]>
Co-authored-by: Filip Křepinský <[email protected]>
Co-authored-by: Heba Elayoty <[email protected]>
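For orientation, here is a minimal sketch of declaring an Alpha-stability counter with k8s.io/component-base/metrics, the registry these commits touch; the subsystem, metric name, and labels are assumptions for illustration, not the PR's actual definitions:

```go
package metrics

import (
	"k8s.io/component-base/metrics"
	"k8s.io/component-base/metrics/legacyregistry"
)

// maxUnavailableViolation is an illustrative counter in the spirit of
// this commit series: incremented when a StatefulSet rolling update
// exceeds its configured maxUnavailable. Name and labels are hypothetical.
var maxUnavailableViolation = metrics.NewCounterVec(
	&metrics.CounterOpts{
		Subsystem:      "statefulset",
		Name:           "maxunavailable_violation_total",
		Help:           "Number of times a StatefulSet exceeded its maxUnavailable setting.",
		StabilityLevel: metrics.ALPHA, // Alpha, matching the final commit above
	},
	[]string{"statefulset", "namespace"},
)

// Register adds the metric to the shared legacy registry.
func Register() {
	legacyregistry.MustRegister(maxUnavailableViolation)
}
```

A caller would then bump it with maxUnavailableViolation.WithLabelValues(set.Name, set.Namespace).Inc().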
Extract the prepareKey function
Added an e2e_node test to verify that the kubelet establishes only a single gRPC connection with the DRA plugin for all service calls during the plugin lifecycle. The test uses a custom listener to count accepted connections and asserts that only one connection is used for NodePrepareResources, NodeUnprepareResources, and NodeWatchResources calls.
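The connection-counting idea can be sketched with a plain net.Listener wrapper; the names below are illustrative, not the test's actual helpers:

```go
package main

import (
	"fmt"
	"net"
	"sync/atomic"
)

// countingListener wraps a net.Listener and counts accepted connections,
// mirroring the technique the e2e_node test uses to assert that the
// kubelet reuses one gRPC connection for all DRA plugin calls.
type countingListener struct {
	net.Listener
	accepted atomic.Int64
}

func (l *countingListener) Accept() (net.Conn, error) {
	conn, err := l.Listener.Accept()
	if err == nil {
		l.accepted.Add(1)
	}
	return conn, err
}

func main() {
	inner, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		panic(err)
	}
	ln := &countingListener{Listener: inner}
	// A gRPC server would be started with grpc.NewServer().Serve(ln);
	// after exercising NodePrepareResources and the other service calls,
	// the test would assert ln.accepted.Load() == 1.
	fmt.Println("listening on", ln.Addr())
}
```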
…prefix4 Ensure keys used in storage and cacher start with resourcePrefix
…test-connection-reuse e2e_node: test DRA plugin gRPC connection reuse
…based-images [go] Bump dependencies, images and versions used to Go 1.25.1 and distroless iptables
Update MAP storage version to use v1beta1
Fix passing runtime.Object to HaveValidResourceVersion check
…anup DRA: ResourceSlice tracker cleanup
…to-get-list Move metrics calculations to getList
This reverts commit 243f47f. Not needed anymore, now that gengo calls 'gofmt -s'
On a given field or type, there may be multiple "cohorts" of validation which need to be processed together, for example a short-circuit validation and a non-short-circuit validation for the same subfield. The unnamed cohort (aka the default cohort) runs first, followed by named cohorts in the order they are created. Named cohorts are emitted as inline functions, so early-return can be used.

This allows tags like k8s:subfield or k8s:item to generate cohorts (named after the field's JSON name or the selector string, respectively). Subsequent commits will add support to those tags.

Co-Authored-by: Yongrui Lin <[email protected]>
Given:

```
// +k8s:subfield(name)=+k8s:optional
// +k8s:subfield(name)=+k8s:format=dns-label
// +k8s:subfield(generateName)=+k8s:optional
// +k8s:subfield(generateName)=+k8s:format=dns-label
metav1.ObjectMeta `json:"metadata,omitempty" protobuf:"bytes,1,opt,name=metadata"`
```

...we get:

```
errs = append(errs, func(fldPath *field.Path, obj, oldObj *metav1.ObjectMeta) (errs field.ErrorList) {
	func() { // cohort name
		if e := validate.Subfield(ctx, op, fldPath, obj, oldObj, "name", func(o *metav1.ObjectMeta) *string { return &o.Name }, validate.OptionalValue); len(e) != 0 {
			return // do not proceed
		}
		errs = append(errs, validate.Subfield(ctx, op, fldPath, obj, oldObj, "name", func(o *metav1.ObjectMeta) *string { return &o.Name }, validate.DNSLabel)...)
	}()
	func() { // cohort generateName
		if e := validate.Subfield(ctx, op, fldPath, obj, oldObj, "generateName", func(o *metav1.ObjectMeta) *string { return &o.GenerateName }, validate.OptionalValue); len(e) != 0 {
			return // do not proceed
		}
		errs = append(errs, validate.Subfield(ctx, op, fldPath, obj, oldObj, "generateName", func(o *metav1.ObjectMeta) *string { return &o.GenerateName }, validate.DNSLabel)...)
	}()
	return
}(fldPath.Child("metadata"), &obj.ObjectMeta, safe.Field(oldObj, func(oldObj *corev1.ReplicationController) *metav1.ObjectMeta { return &oldObj.ObjectMeta }))...)
```

It is worth noting that this only works one level deep:

```
// +k8s:subfield(foo)=+k8s:subfield(bar)=+k8s:optional
// +k8s:subfield(foo)=+k8s:subfield(bar)=+k8s:format=dns-label
Field StructType
```

...will generate one cohort ("foo") with two distinct calls to Subfield, so the optional will not short-circuit format. If we need deeper cohorting, we need to either make subfield take multiple field names (e.g. `subfield(foo.bar)` or `subfield(foo, bar)`) or accumulate "structure" in addition to validator calls. This is similar to how we handle multiple `eachVal` tags: each one iterates the list.
Given:

```
// +k8s:item(stringKey: "target", intKey: 42, boolKey: true)=+k8s:validateFalse="item Items[stringKey=target,intKey=42,boolKey=true] 1"
// +k8s:item(stringKey: "target", intKey: 42, boolKey: true)=+k8s:validateFalse="item Items[stringKey=target,intKey=42,boolKey=true] 2"
Items []Item `json:"items"`
```

...we get:

```
func() { // cohort {"stringKey": "target", "intKey": 42, "boolKey": true}
	errs = append(errs, validate.SliceItem(ctx, op, fldPath, obj, oldObj, func(item *Item) bool {
		return item.StringKey == "target" && item.IntKey == 42 && item.BoolKey == true
	}, validate.DirectEqual, func(ctx context.Context, op operation.Operation, fldPath *field.Path, obj, oldObj *Item) field.ErrorList {
		return validate.FixedResult(ctx, op, fldPath, obj, oldObj, false, "item Items[stringKey=target,intKey=42,boolKey=true] 1")
	})...)
	errs = append(errs, validate.SliceItem(ctx, op, fldPath, obj, oldObj, func(item *Item) bool {
		return item.StringKey == "target" && item.IntKey == 42 && item.BoolKey == true
	}, validate.DirectEqual, func(ctx context.Context, op operation.Operation, fldPath *field.Path, obj, oldObj *Item) field.ErrorList {
		return validate.FixedResult(ctx, op, fldPath, obj, oldObj, false, "item Items[stringKey=target,intKey=42,boolKey=true] 2")
	})...)
}()
```
This new function does one main thing (it changes how name validation works); the rest is future-proofing.

Before: name validation functions return `[]string`, and apimachinery logic wraps those in `field.Invalid()`. Problem: this makes it hard to incrementally add declarative validation, which needs access to potential name and generateName errors to set origins properly.

After: new name validation functions return `field.ErrorList`, which can be wrapped in trivial functions to do things like "mark as covered by declarative" in just one place rather than for all types.

The "WithOpts" part of this gives us a new function name which isn't horrible and is plausibly useful in the future.

Another notable change is that this no longer directly validates the `generateName` field (but only on the new path -- existing code still behaves the same). From the code:

```
+ // generateName is not directly validated here. Types can have
+ // different rules for name generation, and the nameFn is for validating
+ // the post-generation data, not the input. In the past we assumed that
+ // name generation was always "append 5 random characters", but that's not
+ // NECESSARILY true. Also, the nameFn should always be considering the max
+ // length of the name, and it doesn't know enough about the name generation
+ // to do that. Also, given a bad generateName, the user will get errors
+ // for both the generateName and name fields. We will focus validation on
+ // the name field, which should give a better UX overall.
```

This also beefs up tests a bit.
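A hedged sketch of the kind of thin wrapper the new signature enables; the helper names here are invented for illustration, not actual apimachinery functions:

```go
package validation

import (
	"k8s.io/apimachinery/pkg/util/validation/field"
)

// nameFn matches the new-style signature described above: it validates a
// post-generation name and returns a field.ErrorList directly.
type nameFn func(name string, prefix bool) field.ErrorList

// markCoveredByDeclarative is an illustrative wrapper: because the new
// functions return field.ErrorList instead of []string, a cross-cutting
// concern like "mark as covered by declarative validation" can be applied
// in one place for all types.
func markCoveredByDeclarative(fn nameFn) nameFn {
	return func(name string, prefix bool) field.ErrorList {
		errs := fn(name, prefix)
		for i := range errs {
			markDeclarativeCovered(errs[i])
		}
		return errs
	}
}

// markDeclarativeCovered is a hypothetical helper; real code would tag
// the error's origin here.
func markDeclarativeCovered(err *field.Error) {
	_ = err
}
```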
This relies on `+k8s:subfield` and validation cohorts. The `k8s:optional` ensures that we don't run the name validation if name is empty, because core apimachinery will already flag it as Required(). This demonstrates some of the DV value - docs and clients are now (in theory) able to see what RC's name format is. Co-Authored-by: Yongrui Lin <[email protected]>
Everyone who referenced it now uses the underlying function. Clearer and frees me up to change objectMeta validation without impacting anyone else.
Once we go broad, people will copy these. Let's make them easy to debug :)
…ntextual-logs Add RunWithContext method for debugsocket
Update SIG-Autoscaling Leads
It is currently not supported.
…_maxitems Add declarative validation +k8s:maxItems tag to ResourceClaim
feat(validation-gen): Add "cohorts" & Tighten and simplify test framework
[InPlacePodVerticalScaling] Expand coverage for resourceQuota and limitRanger e2e tests
Rename CLE test files
Compare: 9e82632 to 3b1f7b3

upodroid pushed a commit that referenced this pull request on Oct 11, 2025
Instead of creating a new test case, the permutation is passed down. This enables adding the event numbers to the log output, which is useful to better understand which output belongs to which input:

```
=== RUN   TestListPatchedResourceSlices/update-patch/2_3_0_1
    tracker.go:396: I0929 14:28:40.032318] event #1: ResourceSlice add slice="s1"
    tracker.go:581: I0929 14:28:40.032404] event #1: syncing ResourceSlice resourceslice="s1"
    tracker.go:659: I0929 14:28:40.032446] event #1: ResourceSlice synced resourceslice="s1" change="add"
    tracker.go:396: I0929 14:28:40.032502] event #2: ResourceSlice add slice="s2"
    tracker.go:581: I0929 14:28:40.032536] event #2: syncing ResourceSlice resourceslice="s2"
    tracker.go:659: I0929 14:28:40.032568] event #2: ResourceSlice synced resourceslice="s2" change="add"
    tracker.go:463: I0929 14:28:40.032609] event #0/#0: DeviceTaintRule add patch="rule"
    tracker.go:581: I0929 14:28:40.032639] event #0/#0: syncing ResourceSlice resourceslice="s1"
    tracker.go:703: I0929 14:28:40.032675] event #0/#0: processing DeviceTaintRule resourceslice="s1" deviceTaintRule="rule"
    tracker.go:807: I0929 14:28:40.032712] event #0/#0: applying matching DeviceTaintRule resourceslice="s1" deviceTaintRule="rule" device="driver1.example.com/pool-1/device-1"
    tracker.go:868: I0929 14:28:40.032780] event #0/#0: Assigned new taint ID, no matching taint resourceslice="s1" deviceTaintRule="rule" device="driver1.example.com/pool-1/device-1" taintID=0 taint="example.com/taint=tainted:NoExecute"
    tracker.go:654: I0929 14:28:40.033023] event #0/#0: ResourceSlice synced resourceslice="s1" change="update" diff=<
        @@ -23,7 +23,32 @@
           "BindingConditions": null,
           "BindingFailureConditions": null,
           "AllowMultipleAllocations": null,
        -  "Taints": null
        +  "Taints": [
        +    {
        +      "Rule": {
        +        "metadata": {
        +          "name": "rule"
        +        },
        +        "spec": {
        +          "deviceSelector": {
        +            "pool": "pool-1"
        +          },
        +          "taint": {
        +            "key": "example.com/taint",
        +            "value": "tainted",
        +            "effect": "NoExecute",
        +            "timeAdded": "2006-01-02T15:04:05Z"
        +          }
        +        },
        +        "status": {}
        +      },
        +      "ID": 1,
        +      "key": "example.com/taint",
        +      "value": "tainted",
        +      "effect": "NoExecute",
        +      "timeAdded": "2006-01-02T15:04:05Z"
        +    }
        +  ]
          }
         ],
         "Taints": null,
     >
    tracker.go:482: I0929 14:28:40.033224] event #0/#1: DeviceTaintRule update patch="rule" diff=<
        @@ -4,7 +4,7 @@
          },
          "spec": {
           "deviceSelector": {
        -   "pool": "pool-1"
        +   "pool": "pool-2"
           },
           "taint": {
            "key": "example.com/taint",
     >
    tracker.go:581: I0929 14:28:40.033285] event #0/#1: syncing ResourceSlice resourceslice="s1"
    tracker.go:703: I0929 14:28:40.033319] event #0/#1: processing DeviceTaintRule resourceslice="s1" deviceTaintRule="rule"
    tracker.go:654: I0929 14:28:40.033478] event #0/#1: ResourceSlice synced resourceslice="s1" change="update" diff=<
        @@ -23,32 +23,7 @@
           "BindingConditions": null,
           "BindingFailureConditions": null,
           "AllowMultipleAllocations": null,
        -  "Taints": [
        -    {
        -      "Rule": {
        -        "metadata": {
        -          "name": "rule"
        -        },
        -        "spec": {
        -          "deviceSelector": {
        -            "pool": "pool-1"
        -          },
        -          "taint": {
        -            "key": "example.com/taint",
        -            "value": "tainted",
        -            "effect": "NoExecute",
        -            "timeAdded": "2006-01-02T15:04:05Z"
        -          }
        -        },
        -        "status": {}
        -      },
        -      "ID": 1,
        -      "key": "example.com/taint",
        -      "value": "tainted",
        -      "effect": "NoExecute",
        -      "timeAdded": "2006-01-02T15:04:05Z"
        -    }
        -  ]
        +  "Taints": null
          }
         ],
         "Taints": null,
     >
    tracker.go:581: I0929 14:28:40.033601] event #0/#1: syncing ResourceSlice resourceslice="s2"
    tracker.go:703: I0929 14:28:40.033633] event #0/#1: processing DeviceTaintRule resourceslice="s2" deviceTaintRule="rule"
    ...
```

Disabling event checking only worked when actually running all sub-tests. When selectively running only one permutation with -run, the boolean variable was wrong:

```
$ go test -run='.*/^update-patch$' ./staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/
ok      k8s.io/dynamic-resource-allocation/resourceslice/tracker

$ go test -run='.*/^update-patch$/3_2_0_1' ./staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/
--- FAIL: TestListPatchedResourceSlices (0.01s)
    --- FAIL: TestListPatchedResourceSlices/update-patch (0.00s)
        --- FAIL: TestListPatchedResourceSlices/update-patch/3_2_0_1 (0.00s)
            tracker_test.go:762:
                Error Trace: /nvme/gopath/src/k8s.io/kubernetes/staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/tracker_test.go:762
                             /nvme/gopath/src/k8s.io/kubernetes/staging/src/k8s.io/dynamic-resource-allocation/resourceslice/tracker/tracker_test.go:856
                Error:       Not equal:
                             expected: []tracker.handlerEvent{tracker.handlerEvent{event:"add", oldObj:(*api.ResourceSlice)(nil), newObj:(*api.ResourceSlice)(0xc000301d40)}, tracker.handlerEvent{event:"add", oldObj:(*api.ResourceSlice)(nil), newObj:(*api.ResourceSlice)(0xc000346000)}}
                             actual  : []tracker.handlerEvent{tracker.handlerEvent{event:"add", oldObj:(*api.ResourceSlice)(nil), newObj:(*api.ResourceSlice)(0xc0001f9ba0)}, tracker.handlerEvent{event:"add", oldObj:(*api.ResourceSlice)(nil), newObj:(*api.ResourceSlice)(0xc000301d40)}, tracker.handlerEvent{event:"update", oldObj:(*api.ResourceSlice)(0xc000301d40), newObj:(*api.ResourceSlice)(0xc0003dba00)}, tracker.handlerEvent{event:"update", oldObj:(*api.ResourceSlice)(0xc0003dba00), newObj:(*api.ResourceSlice)(0xc000301d40)}, tracker.handlerEvent{event:"update", oldObj:(*api.ResourceSlice)(0xc0001f9ba0), newObj:(*api.ResourceSlice)(0xc0003dbba0)}}
```

Now permutations are detected automatically based on the indices. While at it, documentation gets moved around a bit to make reading test cases easier without going to the implementation.
Add Kubelet stress test for pod cleanup when rejection is due to VolumeAttachmentLimitExceeded (kubernetes/kubernetes#133357)
… logs to contextual logging
… [2]string with a type

What type of PR is this?
What this PR does / why we need it:
Which issue(s) this PR is related to:
Special notes for your reviewer:
Does this PR introduce a user-facing change?
Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.: