2 changes: 2 additions & 0 deletions keps/prod-readiness/sig-node/4639.yaml
@@ -3,3 +3,5 @@ alpha:
approver: "@deads2k"
beta:
approver: "@deads2k"
stable:
approver: "@deads2k"
58 changes: 35 additions & 23 deletions keps/sig-node/4639-oci-volume-source/README.md
@@ -151,9 +151,9 @@ Items marked with (R) are required *prior to targeting to a milestone / release*
- [x] (R) KEP approvers have approved the KEP status as `implementable`
- [x] (R) Design details are appropriately documented
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors)
- [ ] e2e Tests for all Beta API Operations (endpoints)
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [x] e2e Tests for all Beta API Operations (endpoints)
- [x] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [x] (R) Minimum Two Week Window for GA e2e tests to prove flake free
- [x] (R) Graduation criteria is in place
- [x] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md)
- [x] (R) Production readiness review completed
@@ -209,7 +209,7 @@ which go beyond running particular images.
artifact, we don't want the runtime to be the entity responsible for
interpreting and correctly processing it to its final consumable state.
That could be delegated to the consumer or perhaps to some hooks and is
out of scope for alpha.
out of scope for this enhancement.
- Manifest list use cases are left out for now and will be restricted to
matching architecture like we do today for images. In the future (if there are
use cases) we will consider support for lists with items separated by
@@ -586,7 +586,8 @@ feature cannot be used. Pods using the new `VolumeSource` combined with a not
supported container runtime version will fail to run on the node, because the
`Mount.host_path` field is not set for those mounts.

For security reasons, `ro` (read-only) options by default.
For security reasons, `ro` (read-only) option is set by default. Having `rw`
(read-write) support will require a follow-up enhancement.

Note: in the process of mounting images into the container's rootfs, there may need to be intermediate mounts created. This is especially relevant if
the CRI implementation wishes to support one image being mounted with multiple different SELinux labels. If that's done, the CRI implementation is responsible
@@ -781,13 +782,11 @@ We expect no non-infra related flakes in the last month as a GA graduation crite
- [sig-node] ImageVolume [NodeFeature:ImageVolume] should succeed with multiple pods and same image on the same node
- [sig-node] ImageVolume [NodeFeature:ImageVolume] should succeed with pod and multiple volumes
- [sig-node] ImageVolume [NodeFeature:ImageVolume] should succeed with pod and pull policy of Always
- [sig-node] ImageVolume [NodeFeature:ImageVolume] subPath should succeed when using a valid subPath
- [sig-node] ImageVolume [NodeFeature:ImageVolume] subPath should fail if subPath in volume is not existing
Member

Besides the first "should fail" test, are there any tests needed for crashloop backoff?

Member Author

I don't have any specific scenario in mind that should be tested as well, besides the image not being available in the registry (ref test). The [test/e2e_node/image_pull_test.go](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/image_pull_test.go) also doesn't seem to test further scenarios, fwiw.

Member

This is not a blocking comment.

I am not sure how important it is to test transient failures of the image pull.

Another interesting test might be the ability to delete the Pod while it is in image pull backoff or while it is downloading the image.


https://testgrid.k8s.io/sig-node-cri-o#pr-crio-cgrpv2-imagevolume-e2e

When [containerd](https://github.com/containerd/containerd/pull/10579) adds
support for the feature, then the e2e tests will become available for that
runtime as well.

### Graduation Criteria

<!--
@@ -880,10 +879,14 @@ in back-to-back releases.

- Multiple examples of real world uses
Member

GA criteria typically include a requirement to implement a Conformance test. Can we include it, please? It was a recent point of contention with DRA, and we need to follow the best practices here.

Member Author

Added the test graduation to conformance.

Member

I mean real conformance, not only node conformance.

Conformance tests should cover all APIs. In this case we may have a simple conformance test that creates an image-backed volume and produces its content as output.

Member Author @saschagrunert, Sep 9, 2025

Good, I assume we should still move the existing tests to node conformance and add another conformance test as a requirement.

Member

Yes, exactly. We need "official" conformance to ensure conformant clusters implement this API. And NodeConformance to indicate that this is a general feature universally supported on all nodes

- Production support in both CRI-O and containerd
- Allowing time for feedback
- Consider a new `RuntimeConfig` field to indicate to end users if the feature
is supported or not.
- Security Evaluation ensuring robust protection without the `noexec` option
- Removing the separate test lane:
https://testgrid.k8s.io/sig-node-cri-o#pr-crio-cgrpv2-imagevolume-e2e
- Move e2e test to node conformance and remove the
`[NodeFeature:ImageVolume]` flag.
- Create a simple conformance test that creates a pod using an image
volume and verifies the output.
- Allowing time for feedback:
- https://github.com/kubernetes/kubernetes/issues/131557
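
The "simple conformance test" criterion above asks for a pod that uses an image volume and whose output can be verified. As a rough illustration only, the following Python sketch builds such a Pod manifest as a plain dictionary. The image references (`registry.example/...`), the pod and volume names, and the helper `image_volume_pod` are hypothetical; the `image.reference` and `image.pullPolicy` fields are assumed to match the `image` volume source described by this KEP. It is not the actual e2e test.

```python
import json


def image_volume_pod(name, container_image, volume_image, mount_path="/volume"):
    """Build a Pod manifest (as a dict) that mounts an OCI image as a volume.

    All names and image references passed in are illustrative assumptions.
    """
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "reader",
                    "image": container_image,
                    # List the mounted directory so a test can assert on the output.
                    "command": ["ls", mount_path],
                    "volumeMounts": [
                        {"name": "artifact", "mountPath": mount_path}
                    ],
                }
            ],
            "volumes": [
                {
                    "name": "artifact",
                    # Assumed shape of the image volume source.
                    "image": {
                        "reference": volume_image,
                        "pullPolicy": "IfNotPresent",
                    },
                }
            ],
        },
    }


pod = image_volume_pod(
    "image-volume-demo",
    "registry.example/busybox:latest",  # hypothetical container image
    "registry.example/artifact:v1",     # hypothetical volume image
)
print(json.dumps(pod, indent=2))
```

A conformance-style check would then create this pod, wait for it to complete, and compare the container log against the known contents of the volume image.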

### Upgrade / Downgrade Strategy

@@ -1075,18 +1078,23 @@ Longer term, we may want to require automated upgrade/rollback tests, but we
are missing a bunch of machinery and tooling and can't do that now.
-->

Manual testing that will be done:
Manual testing that has been done:

- Upgrade:
- Enable the feature in the kube-apiserver, kubelet and container runtime
- Create a workload which uses the feature
- Verify that the image volume has been mounted.
- **Upgrade**:
1. Enable the feature in the kube-apiserver, kubelet and container runtime
2. Create a workload which uses the feature
3. Verify that the image volume has been mounted.

- Rollback:
- Disable the feature by rolling back the kube-apiserver, kubelet or
container runtime
- Recreate the workload, which will now fail because of either the not
existing API or the unsupported runtime version.
- **Rollback**:
1. Disable the feature by rolling back the kube-apiserver, kubelet or
container runtime
2. Recreate the workload
3. Verify that:
- Container creation fails because it uses an API that does not exist.
- Container creation fails because the kubelet's volume plugin is not
available.
- Container creation succeeds, but the volume is not mounted if the container
runtime does not support the feature due to lacking CRI support.
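
The rollback verification above can be written down as a small lookup of expected outcomes per rolled-back component. This is a sketch under assumptions: the mapping of each outcome to a specific component (`kube-apiserver`, `kubelet`, `container-runtime`) is inferred from the list, and the names `RollbackExpectation`, `EXPECTATIONS`, and `matches_expectation` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RollbackExpectation:
    container_created: bool
    volume_mounted: bool
    reason: str


# Assumed mapping from the rolled-back component to the expected result.
EXPECTATIONS = {
    "kube-apiserver": RollbackExpectation(
        False, False, "the image volume API does not exist"
    ),
    "kubelet": RollbackExpectation(
        False, False, "the kubelet volume plugin is not available"
    ),
    "container-runtime": RollbackExpectation(
        True, False, "the runtime lacks CRI support for image volumes"
    ),
}


def matches_expectation(component, container_created, volume_mounted):
    """Return True if the observed result matches the expected rollback outcome."""
    expected = EXPECTATIONS[component]
    observed = (container_created, volume_mounted)
    return observed == (expected.container_created, expected.volume_mounted)


# Example: after rolling back only the container runtime, the container
# starts but the volume is not mounted.
print(matches_expectation("container-runtime", True, False))
```

Recording the observed result for each component this way makes it easy to spot a rollback that behaves differently from the three documented cases.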

###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.?

@@ -1105,6 +1113,9 @@ For GA, this section is required: approvers should be able to confirm the
previous answers based on experience in the field.
-->

The added metrics `image_volume_requested_total`,
`image_volume_mounted_success` and `image_volume_mounted_error` can be used
for monitoring.
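
One way these three metrics might be used for monitoring is to derive a mount error ratio from a scrape. The sketch below parses a Prometheus-style text exposition and computes that ratio. The sample values are made up, the metric names are taken as written above (a real scrape may carry a component prefix or labels), and the parser assumes no trailing timestamps on sample lines.

```python
def parse_counters(exposition_text):
    """Sum sample values per metric name from a text exposition.

    Assumes each sample line is '<name>[{labels}] <value>' without a timestamp.
    """
    counters = {}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name_and_labels, _, value = line.rpartition(" ")
        name = name_and_labels.split("{", 1)[0]
        counters[name] = counters.get(name, 0.0) + float(value)
    return counters


def mount_error_ratio(counters):
    """Fraction of finished mount attempts that ended in an error."""
    ok = counters.get("image_volume_mounted_success", 0.0)
    err = counters.get("image_volume_mounted_error", 0.0)
    total = ok + err
    return err / total if total else 0.0


# Hypothetical scrape output, for illustration only.
SAMPLE = """\
# TYPE image_volume_requested_total counter
image_volume_requested_total 12
image_volume_mounted_success 9
image_volume_mounted_error 3
"""

counters = parse_counters(SAMPLE)
print(mount_error_ratio(counters))
```

A rising ratio after enabling the feature would be a signal to investigate runtime support or image availability before continuing the rollout.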

###### How can an operator determine if the feature is in use by workloads?

<!--
@@ -1375,6 +1386,7 @@ Major milestones might include:
- 02-10-2024 KEP updated
- 06-02-2025 KEP targeting beta in v1.33
- 06-17-2025 KEP retargeting beta in v1.34, dropped noexec requirement
- 09-03-2025 KEP retargeting GA in v1.35

## Drawbacks

6 changes: 3 additions & 3 deletions keps/sig-node/4639-oci-volume-source/kep.yaml
@@ -41,18 +41,18 @@ approvers:
- "@mrunalp"

# The target maturity stage in the current dev cycle for this KEP.
stage: beta
stage: stable

# The most recent milestone for which work toward delivery of this KEP has been
# done. This can be the current (upcoming) milestone, if it is being actively
# worked on.
latest-milestone: "v1.34"
latest-milestone: "v1.35"

# The milestone at which this feature was, or is targeted to be, at each stage.
milestone:
alpha: "v1.31"
beta: "v1.34"
stable: "TBD"
stable: "v1.35"

# The following PRR answers are required at alpha release
# List the feature gate name and the components for which it must be enabled