[Umbrella] Implement improvements to protect CI and releases from changes to release tooling #816

@justaugustus

(Also sent to k-dev, SIG Release MLs, and SIG Testing: https://groups.google.com/d/topic/kubernetes-dev/OuLMzqZkdtw/discussion)

Following an attempt to improve the semantics of the release tooling via shellcheck (#726), we found that we were unable to stage releases.

Multiple fixes were merged in an attempt to bring us to a usable state.

An unintended and unexpected side effect of this was a cascading failure of multiple release-blocking jobs. A few examples:

Ultimately, it was decided that the right course of action was to revert to a known good state in the repo (#814) to stop the bleeding.

This implies that, in our current state, it is inadvisable to make any changes to the tooling in this repo.

As such, I'm advising the following course of action (h/t to @nikhita, @liggitt, and @BenTheElder for being a sounding board):

  • (sig-release: Blockade changes to critical k/release tooling test-infra#13328) Add a blockade for files that have the potential to impact releasing and CI signal
    (this will require repo admins to explicitly approve and override the blockade to merge changes to critical tooling)
  • (in progress below) Examine and document exactly why these release-blocking jobs failed
    (they are using something in k/release; we need to figure out what those somethings are)
  • Tag the repo after executing a successful release of Kubernetes
    (this locks in a known good state of k/release that doesn't need to be master)
  • (sig-release: Add ci-kubernetes-build-canary to canary release tooling test-infra#13340) Set up a periodic/presubmit job that can emulate one of the existing jobs that broke recently
  • Refactor release tooling/jobs that depend on tooling to accept pulling a tag of k/release instead of master
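
To make the last item concrete, here is a rough sketch of what "accept a tag instead of master" could look like from a job's point of view. This is only an illustration, not the actual job code: the function name, the `dest` directory, and the example tag `v0.1.0` are all hypothetical. It relies only on the fact that `git clone --branch` accepts a tag as well as a branch name.

```python
import shlex

# Assumption: jobs clone k/release from GitHub; master stays the default
# so existing callers keep their current behavior.
REPO_URL = "https://github.com/kubernetes/release"
DEFAULT_REF = "master"


def clone_command(ref=None, dest="release"):
    """Build a git clone command for k/release at a branch or tag.

    `git clone --branch` also accepts tags (HEAD ends up detached at
    that tag), so one code path covers both master and a pinned tag.
    """
    ref = ref or DEFAULT_REF
    return ["git", "clone", "--depth", "1", "--branch", ref, REPO_URL, dest]


if __name__ == "__main__":
    # Hypothetical tag name, used purely for illustration.
    print(shlex.join(clone_command("v0.1.0")))
    # With no ref given, the command falls back to master.
    print(shlex.join(clone_command()))
```

Keeping master as the default means jobs can opt in to a pinned tag one at a time.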

At this point, we will have gotten to a place where we can safely make changes to k/release without impacting CI. We will then:

For longer-term goals, we should seek to:

  • Write Go tooling (and tests!) to replace the shell libraries (lib/{common,gitlib,releaselib}) and call these new tools in the existing release tooling
    (this allows us to get some immediate benefit of a more robust language w/o having to completely refactor)
  • Full refactor of existing tools (shell --> Go)

(Some historical references: kubernetes/kubernetes#28922, kubernetes/kubernetes#16529, kubernetes/kubernetes#15560, kubernetes/kubernetes#8686)

Please take this as an initial assessment of the situation and feel free to provide feedback. :)

/assign
/milestone v1.16
/area release-eng
/sig release
/kind bug
/priority critical-urgent

cc @kubernetes/sig-release-admins @kubernetes/release-engineering @dims @neolit123 @pswica

Labels

  • area/release-eng: Issues or PRs related to the Release Engineering subproject
  • kind/bug: Categorizes issue or PR as related to a bug.
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
  • sig/release: Categorizes an issue or PR as relevant to SIG Release.