Commit 0fdd20e

add spot_instance_diversity
Signed-off-by: helen frank <[email protected]>
1 parent 6dc2b75 commit 0fdd20e

2 files changed: +55 -1 lines changed

src/app/_meta.global.ts

Lines changed: 3 additions & 1 deletion
@@ -26,7 +26,9 @@ export default {
 workload_config: {},
 workload_diversity: {},
 keep_part_nodes: {},
-node_template_configuration: {}
+node_template_configuration: {},
+node_pod_evictor: {},
+spot_instance_diversity: {}
 }
 },
 security: {
Lines changed: 52 additions & 0 deletions
@@ -0,0 +1,52 @@
# Optimizing Cluster Resilience with Spot Instance Diversity Management

This document describes a feature that improves the resilience and efficiency of Kubernetes clusters running on Spot instances. By distributing workloads across heterogeneous instance types, the system reduces operational risk while preserving the cost savings of Spot capacity.

---

## Key Features

### Automated Instance Type Diversification
Workloads are distributed dynamically across multiple Spot instance types by a scheduling strategy that spreads placements across types. This reduces dependency risk by ensuring that no single instance type carries a disproportionate share of the workload.

### Operational Risk Mitigation
Spreading workloads across diverse instance families (e.g., `m5`, `t3`, `c5`) reduces service disruptions caused by Spot instance interruptions, so the cluster keeps its elasticity even during sudden Spot market volatility.

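
As a rough illustration of what family-level spreading means, the sketch below groups node counts by instance family and reports each family's share of the cluster. The `familyOf` and `familyShares` helpers and the node counts are hypothetical and are not part of the product; the sketch assumes AWS-style type names of the form `family.size`.

```typescript
// Hypothetical helper: extract the family from an AWS-style type name,
// e.g. "m5.large" -> "m5". Names without a dot are returned unchanged.
function familyOf(instanceType: string): string {
  const dot = instanceType.indexOf(".");
  return dot === -1 ? instanceType : instanceType.slice(0, dot);
}

// Sum node counts per family, then convert the sums to shares of the total.
function familyShares(nodeCounts: Record<string, number>): Record<string, number> {
  const perFamily: Record<string, number> = {};
  let total = 0;
  for (const type of Object.keys(nodeCounts)) {
    const family = familyOf(type);
    perFamily[family] = (perFamily[family] || 0) + nodeCounts[type];
    total += nodeCounts[type];
  }
  const shares: Record<string, number> = {};
  for (const family of Object.keys(perFamily)) {
    shares[family] = total === 0 ? 0 : perFamily[family] / total;
  }
  return shares;
}

// Illustrative counts only (not taken from a real cluster).
const exampleFamilyShares = familyShares({
  "m5.large": 4,
  "m5.xlarge": 2,
  "t3.medium": 2,
  "c5.xlarge": 2,
});
```

Here two `m5` sizes count toward the same family, so a cluster can look diverse by instance type while still being concentrated in one family.
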
### Cost-Stability Balance
Balances Spot instance cost savings against workload reliability. The scheduler adapts to real-time market conditions without manual intervention.

---

## How It Works

1. **Initial State Analysis**
   The system evaluates the current cluster composition. For example:

   | Instance Type | Allocation |
   |---------------|------------|
   | `m5.large`    | 60%        |
   | `t3.medium`   | 20%        |
   | `c5.xlarge`   | 20%        |

2. **Gradual Redistribution**
   New workloads are directed toward underrepresented instance types. Over time, the distribution shifts toward:

   | Instance Type | Allocation |
   |---------------|------------|
   | `m5.large`    | 40%        |
   | `t3.medium`   | 30%        |
   | `c5.xlarge`   | 30%        |

3. **Real-Time Adaptation**
   The scheduler continuously monitors:
   - Availability zone capacity
   - Spot price fluctuations
   - Historical instance termination rates

   Adjustments are made incrementally to maintain workload stability.

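
The first two steps can be sketched as follows. This is a minimal illustration and not the product's scheduler: the function names, the use of node counts as the unit of allocation, and the greedy "largest deficit first" rule are all assumptions made for the example. The counts mirror the 60/20/20 and 40/30/30 tables above.

```typescript
type Counts = Record<string, number>;

// Step 1: convert node counts per instance type into allocation shares.
function allocationShares(counts: Counts): Record<string, number> {
  const types = Object.keys(counts);
  let total = 0;
  for (const t of types) total += counts[t];
  const shares: Record<string, number> = {};
  for (const t of types) shares[t] = total === 0 ? 0 : counts[t] / total;
  return shares;
}

// Step 2: pick the type whose current share falls furthest below its target.
// Ties go to the type listed first in `targets` (assumed to be non-empty).
function pickUnderrepresented(counts: Counts, targets: Record<string, number>): string {
  const shares = allocationShares(counts);
  let best = "";
  let bestDeficit = -Infinity;
  for (const t of Object.keys(targets)) {
    const deficit = targets[t] - (shares[t] || 0);
    if (deficit > bestDeficit) {
      best = t;
      bestDeficit = deficit;
    }
  }
  return best;
}

// Place `newNodes` nodes one at a time, each on the most underrepresented type.
// Existing nodes are never moved; the input object is not mutated.
function redistribute(counts: Counts, targets: Record<string, number>, newNodes: number): Counts {
  const result: Counts = {};
  for (const t of Object.keys(counts)) result[t] = counts[t];
  for (let i = 0; i < newNodes; i++) {
    const t = pickUnderrepresented(result, targets);
    result[t] = (result[t] || 0) + 1;
  }
  return result;
}

// Hypothetical cluster of 100 nodes matching the initial table.
const initialCounts: Counts = { "m5.large": 60, "t3.medium": 20, "c5.xlarge": 20 };
const targetShares = { "m5.large": 0.4, "t3.medium": 0.3, "c5.xlarge": 0.3 };
const finalCounts = redistribute(initialCounts, targetShares, 50);
```

With these illustrative numbers, placing 50 new nodes leaves `m5.large` at 60 nodes and raises `t3.medium` and `c5.xlarge` to 45 each, which is 40%/30%/30% of 150 nodes. The shift comes entirely from where new capacity lands, which matches the "new workloads" wording in step 2.
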
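
The incremental adjustment in step 3 might look like the sketch below. The source does not describe how the monitored signals are turned into a desired distribution, so `desired` is simply taken as an input here; `stepToward` and the `maxStep` parameter are hypothetical names for a bounded-step update, shown only to illustrate what "incrementally" could mean.

```typescript
type Shares = Record<string, number>;

// Move `current` toward `desired` by linear interpolation, scaled so that
// no single type's share changes by more than `maxStep` in one cycle.
// Interpolating (rather than clamping each type separately) keeps the
// shares summing to the same total. Only types present in `desired` are
// returned; a type missing from `current` is treated as having share 0.
function stepToward(current: Shares, desired: Shares, maxStep: number): Shares {
  const types = Object.keys(desired);
  let largestGap = 0;
  for (const t of types) {
    largestGap = Math.max(largestGap, Math.abs(desired[t] - (current[t] || 0)));
  }
  // alpha = 1 means the remaining gap is small enough to close in one step.
  const alpha = largestGap <= maxStep ? 1 : maxStep / largestGap;
  const next: Shares = {};
  for (const t of types) {
    const from = current[t] || 0;
    next[t] = from + alpha * (desired[t] - from);
  }
  return next;
}

// Illustrative values taken from the two tables in this section.
const currentShares: Shares = { "m5.large": 0.6, "t3.medium": 0.2, "c5.xlarge": 0.2 };
const desiredShares: Shares = { "m5.large": 0.4, "t3.medium": 0.3, "c5.xlarge": 0.3 };
const nextShares = stepToward(currentShares, desiredShares, 0.05);
```

With an assumed `maxStep` of 0.05, one cycle moves the mix from 60/20/20 to roughly 55/22.5/22.5, so reaching 40/30/30 takes several cycles instead of one abrupt reshuffle.
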
---

## Implementation Notes

- **Manual Activation Required**: This feature must be configured by the CloudPilot AI engineering team. Contact [email protected] for activation.
- **Limitations**: Actual performance depends on real-time Spot market conditions and regional instance availability.

Technical Guide for DevOps & SRE Teams | For detailed configuration support or advanced implementation scenarios, contact the CloudPilot AI Engineering Team.
