-
Notifications
You must be signed in to change notification settings - Fork 568
MCO-1669: add BootImageSkewEnforcement API #2357
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MCO-1669: add BootImageSkewEnforcement API #2357
Conversation
@djoshy: This pull request references MCO-1669 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
Hello @djoshy! Some important instructions when contributing to openshift/api: |
Thanks for the questions & review(sorry it took a while!), this should be ready for another look. Happy to hop on a call if that is easier. Update: Did another push to fix up some tests. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leaving another handful of comments.
I'm also happy to hop on a call if you think it would be beneficial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few minor comments, but other than these I think this looks good
operator/v1/tests/machineconfigurations.operator.openshift.io/BootImageSkewEnforcement.yaml
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of maybe dumb questions:
@djoshy: This pull request references MCO-1669 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@djoshy: This pull request references MCO-1669 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@djoshy: This pull request references MCO-1669 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@djoshy: This pull request references MCO-1669 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.20.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
66a9ee8
to
ced817a
Compare
ced817a
to
8181cc5
Compare
Last push includes:
It seems to be failing |
@djoshy The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Took another look at this and from what I recall this aligns with what was discussed synchronously. I'm going to circle back around to see if there are any outstanding comments left from Joel's previous review while I was out, but otherwise this direction looks good to me.
@djoshy: This pull request references MCO-1669 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@djoshy: This pull request references MCO-1669 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.21.0" version, but no target version was set. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/test all I think we're good to go here, I've updated the flowcharts and added the validation tables to the description per our latest discussions. cc @yuqi-zhang @everettraven Care to take a last look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A handful of minor comments, but other than these the API changes and proposed flow LGTM
04f36b6
to
7087c1e
Compare
// manual describes the current boot image of the cluster. | ||
// This should be set to the oldest boot image used amongst all machine resources in the cluster. | ||
// This includes the RHCOS version of the boot image and the OCP release version which shipped with that RHCOS boot image. | ||
// If ocpVersion and rhcosVersion are defined, both values will be used for checking skew compliance. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the lower value (older image?) will be the authority? When does it make sense for both of these to be set? Should we enforce a single value only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume the lower value (older image?) will be the authority?
Our initial thought was an AND check if they're both defined, but its not a strongly held opinion by any means.
When does it make sense for both of these to be set? Should we enforce a single value only? When does it make sense for both of these to be set?
In Manual mode, where the admin sets these values, requiring only one of them felt a bit more convenient in terms of UX.
In Automatic mode, the MCO's controller would be able to set both. I think we figured more diagnostic information, if possible, was better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose, if we think its a bit superfluous/problematic, we could potentially tighten up the validation later? 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think an AND check means the oldest image is the one the skew comes from
I guess it's hard to define what the right thing to do here is, but if you think in automatic mode there's value in setting both, I can see that making sense
We could add a validation so that in Manual mode, only one choice can be made (in spec?) to prevent us having to make a choice
Starting to feel like the manual spec config should be a discriminated union 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No strong opinion on this, but as David said, I think this was here mostly for ease of use for different admins, and in that sense, there shouldn't be a reason for them to set both RHCOS and OCP versions. I'd be +1 on Joel's last suggestion around enforcing that in manual mode, you can only set one or the other.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New union looks correct to me, thanks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated status side + tests too, let me know how that looks! I tried my best to align the go-docs, hope I didn't miss anything 😅
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing it, but I don't see the tests added for the manual modes in the status specifically.
I know it may seem redundant, but I'd personally like to have explicit tests for the union behavior in both spec and status to ensure we don't inadvertently break any expectations here in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ack, sorry about that, just added it - let me know if that's sufficient. Also fixed the typo that Joel had mentioned on slack in the new union discriminator's godoc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
argh, found another typo, fixing and pushing up shortly 😅
Update: should be ready now
b7878f2
to
9657bc1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another couple minor typos (old field names most likely)
// This field must match the OCP semver compatible format of x.y.z. This field must be between | ||
// 5 and 10 characters long. | ||
// Required when mode is set to "OCPVersion" and forbidden otherwise. | ||
// +kubebuilder:validation:XValidation:rule="self.matches('^[0-9]+\\\\.[0-9]+\\\\.[0-9]+$')",message="bootImageOCPVersion must match the OCP semver compatible format of x.y.z" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// +kubebuilder:validation:XValidation:rule="self.matches('^[0-9]+\\\\.[0-9]+\\\\.[0-9]+$')",message="bootImageOCPVersion must match the OCP semver compatible format of x.y.z" | |
// +kubebuilder:validation:XValidation:rule="self.matches('^[0-9]+\\\\.[0-9]+\\\\.[0-9]+$')",message="ocpVersion must match the OCP semver compatible format of x.y.z" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed, thanks! And added a test since I realized we didn't have them for this error mode.
9657bc1
to
08bfa12
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/test remaining-required Scheduling tests matching the |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: everettraven, JoelSpeed The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@djoshy I'll leave it up to you how you would like to get this change verified and add the |
/verified later We'll verify this later alongside the MCO PoC PR. |
@djoshy: Only users can be targets for the In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/verified later @ptalgulk01 |
@djoshy: This PR has been marked to be verified later by In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
@djoshy: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Install time workflow, active for release (n-1) and above
Reconciliation flow for skew API fields, active for upgrades to release (n-1) & above
Skew enforcement mechanism sync loop, active in release n
Validation Tables
Both Spec Fields Set
Only ManagedBootImages Empty
Only SkewEnforcement Empty
Both Spec Fields Empty