OCPCLOUD-2835: rebase on upstream 1.32.0 release #340
Conversation
Change log level of boot disk type and size defaulting in gce_price
…ud-endpoint-reloving-logging 7437 Add logging for endpoint resolving errors
Fix typo in error message
exclude self-managed nodes from being processed
…tion in Cluster Autoscaler Signed-off-by: Maxim Rubchinsky <[email protected]>
Signed-off-by: Omer Aplatony <[email protected]>
…5.75.2-2 Upgrade OCI providers SDK to v65.75.2.
Add flag to force remove long unregistered nodes
…dNodeInfo We need AddNodeInfo in order to propagate DRA objects through the snapshot, which makes AddNodeWithPods redundant.
AddNodes() is redundant - it was intended for batch-adding nodes, probably with batch-specific optimizations in mind. However, it has always been implemented as just iterating over AddNode(), and is only used in test code. Most of the uses in the test code were initializing the cluster state. They are replaced with SetClusterState(), which will later be needed for handling DRA anyway (we'll have to start tracking things that aren't node- or pod-scoped). The other uses are replaced with inline loops over AddNode().
The method is already accessible via StorageInfos(); it's redundant.
AddNodeInfo already provides the same functionality, and has to be used in production code in order to propagate DRA objects correctly. Uses in production are replaced with SetClusterState(), which will later take DRA objects into account. Uses in the test code are replaced with AddNodeInfo().
RemoveNode is renamed to RemoveNodeInfo for consistency with other NodeInfo methods. For DRA, the snapshot will have to potentially allocate ResourceClaims when adding a Pod to a Node, and deallocate them when removing a Pod from a Node. This will happen in new methods added to ClusterSnapshot in later commits - SchedulePod and UnschedulePod. These new methods should be the "default" way of moving pods around the snapshot going forward. However, we'll still need to be able to add and remove pods from the snapshot "forcefully" to handle some corner cases (e.g. expendable pods). AddPod is renamed to ForceAddPod, and RemovePod to ForceRemovePod to highlight that these are no longer the "default" methods of moving pods around the snapshot, and are bypassing something important.
It's now redundant - SetClusterState with empty arguments does the same thing.
…groups Add support for node pool placement group config
…eanup CA: refactor ClusterSnapshot methods
…d-rrsa-new-env-vars 7435 Support New Alibaba Cloud ENV Variables names for RRSA Authorization
VPA - Update docs a little
this needs the carry patches we added for ocpbugs-11115
Add improved error handling for machine phases in the ClusterAPI node group implementation. When a machine is in the Deleting/Failed/Pending phase, mark the cloudprovider.Instance with a status for cluster-autoscaler recovery actions. The changes:
- Enhance Nodes listing to allow reporting the machine phase in Instance status
- Add error status reporting for failed machines
This change helps identify and manage failed machines more effectively, allowing the autoscaler to make better scaling decisions.
added the last carry patch from the upstream, we should be ok to merge this now.
/hold cancel
/lgtm
@elmiko: The following test failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
I think we are again seeing slow node creation on Azure. /test e2e-azure-periodic-pre
It looks like the git-history checker is flagging these commits. This repo is using a custom script, https://github.com/openshift/kubernetes-autoscaler/blob/master/hack/verify_history.sh, and this might be a bug in it. I would suggest using commitchecker instead of it: https://github.com/openshift/release/blob/209cbb25d63506072a3089c85b2efdc84624c6e1/ci-operator/config/openshift/cluster-api/openshift-cluster-api-release-4.20.yaml#L173-L178
/test e2e-azure-periodic-pre |
I've created openshift/release#63941 to address the history stuff.
/test e2e-hypershift |
/test e2e-hypershift |
/tide refresh |
/payload 4.19 nightly blocking |
@sdodson: trigger 11 job(s) of type blocking for the nightly release of OCP 4.19
See details on https://pr-payload-tests.ci.openshift.org/runs/ci/917fc7c0-1b8b-11f0-96bd-dba6a0b812d0-0
/test e2e-hypershift |
Merged 8eef719 into openshift:master
[ART PR BUILD NOTIFIER] Distgit: atomic-openshift-cluster-autoscaler
[ART PR BUILD NOTIFIER] Distgit: vertical-pod-autoscaler
/test
This commit rebases the autoscaler on top of the Kubernetes/Autoscaler 1.32.0 release. There are several commits that we carry on top of the upstream autoscaler and the rebase process allows us to preserve those. Here is a description of the process I used to create this PR.
(inspired by the commit description for the 1.18 rebase, PR #139)
Process
First we need to identify the carry commits that we currently have; this is done against our previous rebase to catch new changes. Once identified, we drop the commits which have merged upstream and only carry the unique commits (see below for the carried and dropped commits).
Identify the carry commits (run from the openshift/master branch); these are the commits that begin with `UPSTREAM:`, up until the merge commit for the previous rebase (merge upstream/cluster-autoscaler-1.31.0).
Important note for the next rebase: due to an error in the upstream `cluster-autoscaler-1.32.0` tag, this rebase uses the `cluster-autoscaler-1.32` branch as the starting point.

After identifying the carry commits, the next step is to create the new commit-tree that will be used for the rebase, and then cherry-pick the carry commits into the new branch. The following commands cover these steps:
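As a minimal, self-contained sketch of the commit-tree technique, using a throwaway repository: the names `upstream-1.32`, `merge-1.32`, and the sample `UPSTREAM:` subject are stand-ins for the real branches and commits, and the exact commands used for this PR may have differed.

```shell
set -e
# Throwaway repo standing in for openshift/kubernetes-autoscaler.
cd "$(mktemp -d)"
git init -q -b main repo && cd repo   # -b requires git >= 2.28
git config user.email ci@example.com && git config user.name ci

echo base > f.txt && git add f.txt && git commit -qm "shared base"

# Stand-in for the upstream cluster-autoscaler-1.32 branch.
git checkout -qb upstream-1.32
echo upstream > f.txt && git commit -qam "upstream 1.32 content"

# Back on the downstream branch, with one carry commit.
git checkout -q main
echo carry > g.txt && git add g.txt && git commit -qm "UPSTREAM: <carry>: downstream patch"

# Step 1: identify the carry commits (subjects starting with UPSTREAM:).
git log --format=%s main ^upstream-1.32 | grep '^UPSTREAM:'

# Step 2: create a merge commit whose tree is exactly the upstream tree,
# keeping both histories as parents (the commit-tree step).
merge=$(git commit-tree -p main -p upstream-1.32 \
        -m "merge upstream/cluster-autoscaler-1.32" 'upstream-1.32^{tree}')
git checkout -qb merge-1.32 "$merge"

# Step 3: cherry-pick the carry commits onto the new branch.
git cherry-pick -x main
```

The real rebase then resolves any cherry-pick conflicts by hand and tests the resulting tree, as described below.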
With the `merge-1.32` branch in place, I cherry-picked the carry commits which applied, resolved merge conflicts, and finally tested the resulting tree against the unit test and end-to-end suites.

Carried Commits
These commits are for features which have not yet been accepted upstream, are integral to our CI platform, or are specific to the releases we create for OpenShift.
Squashed Commits
These commits were squashed into the carried commits to help reduce the length of our history. All these commits have been squashed into their topically related commits.
Dropped Commits
These commits were dropped.
Of special note in this rebase is this commit
Due to the scale-from-zero changes being accepted upstream, we can now drop our carried patch. But the upstream implementation has diverged slightly from ours (mainly around annotation names), so we will need to carry this patch until we can fix all the providers to properly use the new annotations. This patch can be dropped once the epic contained in https://issues.redhat.com/browse/OCPCLOUD-2136 is completed.