[KEP-4639] Graduate image volumes to GA #5450
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,3 +3,5 @@ alpha: | |
approver: "@deads2k" | ||
beta: | ||
approver: "@deads2k" | ||
stable: | ||
approver: "@deads2k" |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -151,9 +151,9 @@ Items marked with (R) are required *prior to targeting to a milestone / release* | |
- [x] (R) KEP approvers have approved the KEP status as `implementable` | ||
- [x] (R) Design details are appropriately documented | ||
- [x] (R) Test plan is in place, giving consideration to SIG Architecture and SIG Testing input (including test refactors) | ||
- [ ] e2e Tests for all Beta API Operations (endpoints) | ||
- [ ] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) | ||
- [ ] (R) Minimum Two Week Window for GA e2e tests to prove flake free | ||
- [x] e2e Tests for all Beta API Operations (endpoints) | ||
- [x] (R) Ensure GA e2e tests meet requirements for [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) | ||
- [x] (R) Minimum Two Week Window for GA e2e tests to prove flake free | ||
- [x] (R) Graduation criteria is in place | ||
- [x] (R) [all GA Endpoints](https://github.com/kubernetes/community/pull/1806) must be hit by [Conformance Tests](https://github.com/kubernetes/community/blob/master/contributors/devel/sig-architecture/conformance-tests.md) | ||
- [x] (R) Production readiness review completed | ||
|
@@ -209,7 +209,7 @@ which go beyond running particular images. | |
artifact, we don't want the runtime to be the entity responsible for | ||
interpreting and correctly processing it to its final consumable state. | ||
That could be delegated to the consumer or perhaps to some hooks and is | ||
out of scope for alpha. | ||
out of scope for this enhancement. | ||
- Manifest list use cases are left out for now and will be restricted to | ||
matching architecture like we do today for images. In the future (if there are | ||
use cases) we will consider support for lists with items separated by | ||
|
@@ -586,7 +586,8 @@ feature cannot be used. Pods using the new `VolumeSource` combined with a not | |
supported container runtime version will fail to run on the node, because the | ||
`Mount.host_path` field is not set for those mounts. | ||
|
||
For security reasons, `ro` (read-only) options by default. | ||
For security reasons, `ro` (read-only) option is set by default. Having `rw` | ||
(read-write) support will require a follow-up enhancement. | ||
|
||
Note: in the process of mounting images into the container's rootfs, there may need to be intermediate mounts created. This is especially relevant if | ||
the CRI implementation wishes to support one image being mounted with multiple different SELinux labels. If that's done, the CRI implementation is responsible | ||
|
@@ -781,13 +782,11 @@ We expect no non-infra related flakes in the last month as a GA graduation crite | |
- [sig-node] ImageVolume [NodeFeature:ImageVolume] should succeed with multiple pods and same image on the same node | ||
- [sig-node] ImageVolume [NodeFeature:ImageVolume] should succeed with pod and multiple volumes | ||
- [sig-node] ImageVolume [NodeFeature:ImageVolume] should succeed with pod and pull policy of Always | ||
- [sig-node] ImageVolume [NodeFeature:ImageVolume] subPath should succeed when using a valid subPath | ||
- [sig-node] ImageVolume [NodeFeature:ImageVolume] subPath should fail if subPath in volume is not existing | ||
|
||
https://testgrid.k8s.io/sig-node-cri-o#pr-crio-cgrpv2-imagevolume-e2e | ||
|
||
When [containerd](https://github.com/containerd/containerd/pull/10579) adds | ||
support for the feature, then the e2e tests will become available for that | ||
runtime as well. | ||
|
||
### Graduation Criteria | ||
|
||
<!-- | ||
|
@@ -880,10 +879,14 @@ in back-to-back releases. | |
|
||
- Multiple examples of real world uses | ||
GA criteria typically require implementing a Conformance test. Can we include it, please? It was a recent contention point with DRA, and we need to follow the best practices here.

Added the test graduation to conformance.

I mean real conformance, not only node conformance. Conformance tests should cover all APIs. In this case we may have a simple conformance test that creates an image-backed volume and produces its content as output.

Good, I assume we should still move the existing tests to node conformance and add another conformance test as a requirement.

Yes, exactly. We need "official" conformance to ensure conformant clusters implement this API, and NodeConformance to indicate that this is a general feature universally supported on all nodes. |
||
- Production support in both CRI-O and containerd | ||
- Allowing time for feedback | ||
- Consider a new `RuntimeConfig` field to indicate to end users if the feature | ||
saschagrunert marked this conversation as resolved.
|
||
is supported or not. | ||
- Security Evaluation ensuring robust protection without the `noexec` option | ||
- Removing the separate test lane: | ||
https://testgrid.k8s.io/sig-node-cri-o#pr-crio-cgrpv2-imagevolume-e2e | ||
- Move e2e test to node conformance and remove the | ||
`[NodeFeature:ImageVolume]` flag. | ||
- Create a simple conformance test that creates a pod using an image | ||
volume and verifies the output. | ||
- Allowing time for feedback: | ||
- https://github.com/kubernetes/kubernetes/issues/131557 | ||
|
||
### Upgrade / Downgrade Strategy | ||
|
||
|
@@ -1075,18 +1078,23 @@ Longer term, we may want to require automated upgrade/rollback tests, but we | |
are missing a bunch of machinery and tooling and can't do that now. | ||
--> | ||
|
||
Manual testing that will be done: | ||
Manual testing that has been done: | ||
|
||
- Upgrade: | ||
- Enable the feature in the kube-apiserver, kubelet and container runtime | ||
- Create a workload which uses the feature | ||
- Verify that the image volume has been mounted. | ||
- **Upgrade**: | ||
1. Enable the feature in the kube-apiserver, kubelet and container runtime | ||
2. Create a workload which uses the feature | ||
3. Verify that the image volume has been mounted. | ||
|
||
- Rollback: | ||
- Disable the feature by rolling back the kube-apiserver, kubelet or | ||
container runtime | ||
- Recreate the workload, which will now fail because of either the not | ||
existing API or the unsupported runtime version. | ||
- **Rollback**: | ||
1. Disable the feature by rolling back the kube-apiserver, kubelet or | ||
container runtime | ||
2. Recreate the workload | ||
3. Verify that: | ||
- Container creation will fail because of using a non-existent API | |
- Container creation will fail because volume plugin of the kubelet is not | ||
available. | ||
- Container creation will succeed but volume won't get mounted if container | ||
runtime does not support the feature due to lacking CRI support. | ||
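The three rollback outcomes listed above can be written down as a small lookup that a manual or scripted check could compare against. This is only a sketch: the mapping of each rolled-back component to one outcome is an interpretation of the list, and the `Outcome` type, the component strings, and `expectedAfterRollback` are hypothetical names introduced for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Outcome describes what a recreated workload is expected to look like
// after one component has been rolled back.
type Outcome struct {
	ContainerCreated bool
	VolumeMounted    bool
	Reason           string
}

// expectedAfterRollback returns the expected result of recreating a workload
// that uses an image volume once the given component no longer supports it.
func expectedAfterRollback(component string) (Outcome, error) {
	switch component {
	case "kube-apiserver":
		return Outcome{ContainerCreated: false, VolumeMounted: false,
			Reason: "the image volume API does not exist"}, nil
	case "kubelet":
		return Outcome{ContainerCreated: false, VolumeMounted: false,
			Reason: "the kubelet volume plugin is not available"}, nil
	case "container-runtime":
		return Outcome{ContainerCreated: true, VolumeMounted: false,
			Reason: "the runtime lacks CRI support for image volumes"}, nil
	default:
		return Outcome{}, errors.New("unknown component: " + component)
	}
}

func main() {
	for _, c := range []string{"kube-apiserver", "kubelet", "container-runtime"} {
		o, err := expectedAfterRollback(c)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s: created=%t mounted=%t (%s)\n",
			c, o.ContainerCreated, o.VolumeMounted, o.Reason)
	}
}
```

The third case is the one worth checking explicitly, because the container starts successfully and the missing mount is only visible from inside it.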
|
||
###### Is the rollout accompanied by any deprecations and/or removals of features, APIs, fields of API types, flags, etc.? | ||
|
||
|
@@ -1105,6 +1113,9 @@ For GA, this section is required: approvers should be able to confirm the | |
previous answers based on experience in the field. | ||
--> | ||
|
||
The added metrics `image_volume_requested_total`, `image_volume_mounted_success`, | |
and `image_volume_mounted_error` can be used for monitoring. | |
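As a rough illustration of how these counters could be used for monitoring, the sketch below sums them from a scrape in the Prometheus text exposition format and derives a mount error ratio. The metric names are taken verbatim from the text above; a real kubelet endpoint may expose them under a component prefix, so treat the exact names as an assumption. `sumMetric` and `mountErrorRatio` are hypothetical helpers, and the sample scrape is made up.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumMetric adds up all samples of one metric in Prometheus text exposition
// format, ignoring labels, comment lines, and optional timestamps.
func sumMetric(exposition, name string) float64 {
	var total float64
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") || !strings.HasPrefix(line, name) {
			continue
		}
		rest := line[len(name):]
		// Require a label block or whitespace next, so that a longer metric
		// name sharing this prefix is not counted.
		if rest == "" || !strings.ContainsRune("{ \t", rune(rest[0])) {
			continue
		}
		if i := strings.LastIndex(rest, "}"); i >= 0 {
			rest = rest[i+1:]
		}
		fields := strings.Fields(rest)
		if len(fields) == 0 {
			continue
		}
		if v, err := strconv.ParseFloat(fields[0], 64); err == nil {
			total += v
		}
	}
	return total
}

// mountErrorRatio returns failed mounts divided by all finished mounts,
// or 0 when no mounts have been recorded yet.
func mountErrorRatio(exposition string) float64 {
	ok := sumMetric(exposition, "image_volume_mounted_success")
	failed := sumMetric(exposition, "image_volume_mounted_error")
	if ok+failed == 0 {
		return 0
	}
	return failed / (ok + failed)
}

func main() {
	// Made-up scrape output for illustration only.
	sample := `# TYPE image_volume_requested_total counter
image_volume_requested_total 8
image_volume_mounted_success 6
image_volume_mounted_error 2
`
	fmt.Printf("requested=%v error_ratio=%.2f\n",
		sumMetric(sample, "image_volume_requested_total"), mountErrorRatio(sample))
}
```

In practice an operator would more likely alert on the rate of `image_volume_mounted_error` in their monitoring system; the point here is only which counters relate to which.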
|
||
###### How can an operator determine if the feature is in use by workloads? | ||
|
||
<!-- | ||
|
@@ -1375,6 +1386,7 @@ Major milestones might include: | |
- 02-10-2024 KEP updated | ||
- 06-02-2025 KEP targeting beta in v1.33 | ||
- 06-17-2025 KEP retargeting beta in v1.34, dropped noexec requirement | ||
- 09-03-2025 KEP retargeting GA in v1.35 | ||
|
||
## Drawbacks | ||
|
||
|
Besides the first "should fail" test, are any tests needed for crash-loop backoff?

I don't have any specific scenario in mind that should be tested as well, besides the image not being available in the registry (ref test). The
[test/e2e_node/image_pull_test.go](https://github.com/kubernetes/kubernetes/blob/master/test/e2e_node/image_pull_test.go)
also doesn't seem to test further scenarios, fwiw.

This is not a blocking comment.
I am not sure how important it is to test transient failures of the image pull.
Another interesting test might be the ability to delete the Pod while it is in image pull backoff or while it is downloading the image.