(Also sent to k-dev, SIG Release MLs, and SIG Testing: https://groups.google.com/d/topic/kubernetes-dev/OuLMzqZkdtw/discussion)
Following an attempt to improve the semantics of the release tooling via shellcheck (#726), we found that we were unable to stage releases.
Multiple fixes were merged in an attempt to bring us to a usable state.
An unintended and unexpected side effect of this was a cascading failure of multiple release-blocking jobs. A few examples:
- Multiple GCE test suites in master-blocking are failing kubernetes#79652
- [Failing Test] Uses kubeadm/kinder to run kubeadm-e2e and e2e tests kubernetes#79668
- [Failing Test] Uses kubeadm/kinder to run kubeadm-e2e and e2e tests against a cluster created using kinder's skew-master-on-stable workflow kubernetes#79669
Ultimately, we decided that the right course of action was to revert the repo to a known good state (#814) to stop the bleeding.
This implies that, in our current state, it is inadvisable to make any changes to the tooling in this repo.
As such, I'm advising the following course of action (h/t to @nikhita, @liggitt, and @BenTheElder for being a sounding board):
- (sig-release: Blockade changes to critical k/release tooling test-infra#13328) Add a blockade for files that have the potential to impact releasing and CI signal
  (this will require repo admins to explicitly approve and override the blockade to merge changes to critical tooling)
- (in progress below) Examine and document exactly why these release-blocking jobs failed
  (they are using something in k/release; we need to figure out what those somethings are)
- Tag the repo after executing a successful release of Kubernetes
  (this locks in a known good state of k/release that doesn't need to be master)
- (sig-release: Add ci-kubernetes-build-canary to canary release tooling test-infra#13340) Set up a periodic/presubmit job that can emulate one of the existing jobs that broke recently
- Refactor release tooling/jobs that depend on tooling to accept pulling a tag of k/release instead of master
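
To make the last item concrete, here is a rough sketch of how a job could select a pinned ref of k/release instead of always using master. This is only an illustration under assumptions: the `RELEASE_TOOL_REF` and `RELEASE_TOOL_DEST` variables and the `release_tool_clone_cmd` function are hypothetical names, not existing settings in k/release or test-infra. The function only prints the command so it can be inspected before anything is run.

```shell
#!/usr/bin/env bash
set -o errexit -o nounset -o pipefail

# Hypothetical knob: the k/release ref (branch or tag) a job should use.
# Defaults to master so that existing behavior is unchanged.
RELEASE_TOOL_REF="${RELEASE_TOOL_REF:-master}"

# Print the git command that would fetch k/release at the requested ref.
# "git clone --branch" accepts tag names as well as branch names.
release_tool_clone_cmd() {
  local ref="$1"
  local dest="$2"
  printf 'git clone --depth 1 --branch %s https://github.com/kubernetes/release %s\n' \
    "${ref}" "${dest}"
}

release_tool_clone_cmd "${RELEASE_TOOL_REF}" "${RELEASE_TOOL_DEST:-./release}"
```

A job would then set `RELEASE_TOOL_REF` to whatever tag we cut after a successful release, and jobs that leave it unset keep pulling master.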
At this point, we will have gotten to a place where we can safely make changes to k/release without impacting CI. We will then:
- Write tests around the specific pieces of the tooling that caused job failure (maybe https://github.com/sstephenson/bats ?)
For longer term goals, we should seek to:
- Write go tooling (and tests!) to replace the shell libraries (lib/{common,gitlib,releaselib}) and call these new tools in the existing release tooling
  (this allows us to get some immediate benefit of a more robust language w/o having to completely refactor)
- Full refactor of existing tools (shell --> go)
(Some historical references: kubernetes/kubernetes#28922, kubernetes/kubernetes#16529, kubernetes/kubernetes#15560, kubernetes/kubernetes#8686)
Please take this as an initial assessment of the situation and feel free to provide feedback. :)
/assign
/milestone v1.16
/area release-eng
/sig release
/kind bug
/priority critical-urgent
cc @kubernetes/sig-release-admins @kubernetes/release-engineering @dims @neolit123 @pswica