Description
Which component are you using?:
Cluster Autoscaler
Is your feature request designed to solve a problem? If so describe the problem this feature should solve.:
Cluster Autoscaler has two parts: shared ("core") functionality that is generic to any cloud provider, and cloud provider specific code. CA releases contain both the shared part and the code for all (~30) cloud providers, along with its dependencies. This approach leads to several problems:
- A large number of supported cloud providers brings an even larger number of imported libraries. This increases the risk of a CA release shipping bad dependencies, either malicious or simply buggy, which can lead to security issues (CA permissions in the cluster have to be quite wide) and performance issues (additional logic burning CPU cycles).
- Maintainability is a problem. Many cloudprovider OWNERS are not k8s org members, so they cannot approve cloudprovider specific changes. At the same time, people affiliated with other cloud providers may not care enough, or may not be familiar enough, to review or make changes in a specific cloud provider's code.
- Release qualification doesn't exist. With so many cloud providers it is hard to provide sufficient coverage, and we provide none (beyond unit tests). Issue #5378 ("Cluster Autoscaler: Segmentation violation caused by HintingSimulator predicateChecker nil pointer") is a recent example: the 1.26.0 image was released even though it panics at runtime.
Describe the solution you'd like.:
I believe cloud provider specific code should live in separate repositories. OSS Cluster Autoscaler should really be a library that is used in various ways, rather than a component trying to support all possible cloud providers. There may be an implementation or two that make sense in this repo (gRPC and Cluster API come to mind), but everything else probably belongs elsewhere.
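To make the proposed split concrete, here is a minimal sketch of how a core library and an out-of-tree provider could relate. All names in it (`NodeGroup`, `CloudProvider`, `ScaleUp`, `exampleProvider`) are hypothetical and heavily simplified for illustration; they are not the actual Cluster Autoscaler interfaces. In a real split, the first half would live in this repository as an importable package, and the second half, including `main`, would live in the provider's own repository together with its dependencies.

```go
package main

import (
	"errors"
	"fmt"
)

// ---- Hypothetical core library (would stay in this repository) ----

// NodeGroup is a simplified, hypothetical view of a scalable set of nodes.
type NodeGroup interface {
	ID() string
	TargetSize() int
	MaxSize() int
	IncreaseSize(delta int) error
}

// CloudProvider is the hypothetical contract the core library depends on.
// The core never imports any provider-specific SDK.
type CloudProvider interface {
	Name() string
	NodeGroups() []NodeGroup
}

// ScaleUp is a stand-in for the core logic: it grows the first node group
// that has enough headroom for the requested number of nodes.
func ScaleUp(p CloudProvider, needed int) (string, error) {
	if needed <= 0 {
		return "", errors.New("needed must be positive")
	}
	for _, g := range p.NodeGroups() {
		if g.TargetSize()+needed <= g.MaxSize() {
			if err := g.IncreaseSize(needed); err != nil {
				return "", err
			}
			return g.ID(), nil
		}
	}
	return "", fmt.Errorf("provider %q: no node group can fit %d more nodes", p.Name(), needed)
}

// ---- Hypothetical out-of-tree provider (would live in its own repository) ----

type exampleGroup struct {
	id        string
	size, max int
}

func (g *exampleGroup) ID() string      { return g.id }
func (g *exampleGroup) TargetSize() int { return g.size }
func (g *exampleGroup) MaxSize() int    { return g.max }

// IncreaseSize only updates in-memory state; a real provider would call its cloud API here.
func (g *exampleGroup) IncreaseSize(delta int) error {
	if delta <= 0 || g.size+delta > g.max {
		return fmt.Errorf("group %q: invalid size increase %d", g.id, delta)
	}
	g.size += delta
	return nil
}

type exampleProvider struct {
	groups []*exampleGroup
}

func (p *exampleProvider) Name() string { return "example" }

func (p *exampleProvider) NodeGroups() []NodeGroup {
	out := make([]NodeGroup, 0, len(p.groups))
	for _, g := range p.groups {
		out = append(out, g)
	}
	return out
}

// main is the provider-owned binary: it wires its own provider into the core library.
func main() {
	p := &exampleProvider{groups: []*exampleGroup{
		{id: "small", size: 2, max: 3},
		{id: "large", size: 1, max: 10},
	}}
	id, err := ScaleUp(p, 3)
	if err != nil {
		fmt.Println("scale-up failed:", err)
		return
	}
	fmt.Println("scaled up node group:", id)
}
```

With this shape, each provider repository would build and qualify its own binary, and the core library's dependency set would no longer grow with the number of providers.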
Describe any alternative solutions you've considered.:
- Introduce more policies around cloud providers (i.e. iterate on https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/POLICY.md). This might solve the maintainability issues, but the release qualification, security, and performance risks would be hard to mitigate this way.
- Release a separate CA image for each cloud provider. This was previously discussed at a SIG meeting and would address the security and performance concerns that come with multiple dependencies. It might also help with release qualification. Maintainability would still be a concern.
- Combine the two above. This presents a chance to address all of the issues, but it would require building a lot of infrastructure which, in my opinion, shouldn't really be a part of this repository.
Additional context.: