OCPBUGS-59723: Add mechanism to delete MCN v1alpha1 CRD in 4.16+ clusters #5215
Conversation
@djoshy: This pull request references Jira Issue OCPBUGS-59723, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
This looks great to me, thanks @djoshy. Since I cannot answer your note about the image, I'll leave final review & tagging to a more senior member of the team than I.
overall makes sense to me, just some followup questions inline
/retest
Force-pushed from 1bcdfd1 to 10718d1
Force-pushed from d679071 to 61bce7e
/test e2e-gcp-op
Force-pushed from 43bc880 to a9b20dd
@djoshy: This pull request references Jira Issue OCPBUGS-59723, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Force-pushed from a9b20dd to 682dbeb
Overall the cronjob path makes sense to me, thanks for investigating thoroughly! Some last questions inline
restartPolicy: OnFailure
containers:
- name: crd-cleanup
  image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:48011539e9051f642e33e7aaa4617b93387e0d441f84d09d41f8f7966668bb04
hmm, should this be hardcoded? I thought you wanted to reference the RHCOS image of the corresponding payload
Or if we want to remove this in 4.21+ maybe that's fine as well to just use a singular
oh d'oh - I meant to change this back, was doing some manual testing and it must have snuck in. Although I guess your argument is fair, using the placeholder just makes it a tad more readable.
Should be good now 😄
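(For context on this thread: the idea is to reference the image via the payload rather than pin a digest. A placeholder-style reference might look like the sketch below; the placeholder value and the substitution mechanism are assumptions on my part, not details taken from this PR.)

```yaml
# Illustrative only: the actual manifest may use a different placeholder,
# which the release tooling resolves to the payload's image at build time.
spec:
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: crd-cleanup
            image: <payload-image-placeholder>  # hypothetical, not a real pullspec
```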
# Run every minute initially to trigger an immediate run
schedule: "* * * * *"
# Don't suspend initially - let it run once
suspend: false
does this re-set every upgrade then? Since we patch it in the actual bash script, I thought the CVO would try to rectify back to this template?
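(For context, the in-script suspend referred to here would be something along these lines; the CronJob name and namespace are assumptions, not taken from the PR.)

```bash
# Hypothetical sketch: after a successful cleanup run, the script suspends
# the CronJob so it stops firing every minute.
oc patch cronjob machineconfignode-crd-cleanup \
  -n openshift-machine-config-operator \
  --type merge -p '{"spec":{"suspend":true}}'
```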
No, I think the create-only annotation should prevent updates to this resource.
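For reference, the create-only behavior mentioned here comes from the release.openshift.io/create-only manifest annotation, which tells the CVO to create the object once and not reconcile it back to the template afterwards. On the CronJob it would look roughly like this (the name and namespace below are assumptions, not copied from the PR):

```yaml
# Sketch: with create-only set, the CVO leaves later in-cluster changes alone,
# so the script flipping spec.suspend to true is not undone on the next sync.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: machineconfignode-crd-cleanup           # hypothetical name
  namespace: openshift-machine-config-operator  # assumed namespace
  annotations:
    release.openshift.io/create-only: "true"
```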
/payload 4.20 blocking just to be safe
/payload 4.20 nightly blocking
@djoshy: trigger 10 job(s) of type blocking for the nightly release of OCP 4.20
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/69a756c0-73b3-11f0-8238-34e6750a6c1d-0
/test unit
/cherry-pick release-4.19
@djoshy: once the present PR merges, I will cherry-pick it on top of release-4.19. In response to this:
/test all
/retest-required
/retest-required
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: djoshy, yuqi-zhang. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing

/override ci/prow/e2e-gcp-op-1of2 Overriding GCP tests, this PR has passed them before on this commit and these jobs are failing to launch due to some infra issues
@djoshy: Overrode contexts on behalf of djoshy: ci/prow/e2e-gcp-op-1of2, ci/prow/e2e-gcp-op-2of2, ci/prow/e2e-gcp-op-single-node In response to this:
Merged 8f6c60c into openshift:main
@djoshy: Jira Issue OCPBUGS-59723: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-59723 has been moved to the MODIFIED state. In response to this:
@djoshy: The following tests failed, say
@djoshy: new pull request created: #5233 In response to this:
[ART PR BUILD NOTIFIER] Distgit: ose-machine-config-operator
/cherry-pick release-4.19
@djoshy: new pull request could not be created: failed to create pull request against openshift/machine-config-operator#release-4.19 from head openshift-cherrypick-robot:cherry-pick-5215-to-release-4.19: status code 422 not one of [201], body: {"message":"Validation Failed","errors":[{"resource":"PullRequest","code":"custom","message":"A pull request already exists for openshift-cherrypick-robot:cherry-pick-5215-to-release-4.19."}],"documentation_url":"https://docs.github.com/rest/pulls/pulls#create-a-pull-request","status":"422"} In response to this:
- What I did
This adds a new Kubernetes CronJob manifest in the MCO's install folder to delete the unused v1alpha1 MCN CRD. The job has a run level of 0000_80_machine-config_00, meaning it is deployed before the CVO applies the v1 MCN CRD, whose run level is 0000_80_machine-config_01. While the delete may not happen instantly, once it completes the CVO will be able to successfully apply the v1 CRD. I also moved the RBAC manifest up to the same run level so that the CronJob has the required permissions on its first try.
- How to verify it
Follow the CVO logs in the openshift-cluster-version namespace. When the CVO finally gets to the MCO's install/ manifests (around the ~780 mark), the last line of that stretch is the MCN sync, which may fail initially while the job is still running. Examine the CronJobs via:
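Assuming the CronJob is created in the MCO's own namespace (an assumption on my part), something like this works:

```bash
oc get cronjobs -n openshift-machine-config-operator
```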
This should've spawned a job:
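With the same namespace assumption, the spawned Job can be listed with:

```bash
oc get jobs -n openshift-machine-config-operator
```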
Running an oc get pods | grep cleanup should get you the pod name, which can be examined to observe the logs:
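For example, with the namespace assumption as above and a placeholder pod name:

```bash
# Substitute the pod name returned by the grep above.
oc -n openshift-machine-config-operator logs <cleanup-pod-name>
```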
This took about a minute for me (GCP cluster); the CVO sync should now progress to roll out the 4.20 MCO pods. The new operator will then begin to update the control-plane and worker nodes. It's possible that the job logs get lost as the cluster updates, but the CR itself should persist.
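For reference while reading those logs, the core of the cleanup logic amounts to roughly the following; this is a sketch under assumptions rather than the exact script shipped in this PR, though the CRD name is the standard MachineConfigNode CRD name:

```bash
#!/bin/bash
# Sketch: delete the MachineConfigNode CRD only if it still serves the old
# v1alpha1 version, so that the CVO can subsequently apply the v1 CRD.
set -euo pipefail

CRD=machineconfignodes.machineconfiguration.openshift.io

# Empty if the CRD does not exist at all (e.g. on a fresh install).
SERVED_VERSIONS=$(oc get crd "${CRD}" -o jsonpath='{.spec.versions[*].name}' 2>/dev/null || true)

if grep -qw v1alpha1 <<<"${SERVED_VERSIONS}"; then
  echo "Found v1alpha1 ${CRD}; deleting it so the v1 CRD can be applied"
  oc delete crd "${CRD}"
else
  echo "No v1alpha1 ${CRD} found; nothing to do"
fi
```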
On a fresh install, this CronJob should be a no-op; the pod logs should indicate that there was nothing to delete.
- Other notes
I chose a CronJob for two reasons: