Extended Toleration Operators for Threshold-Based Placement #5471

@helayoty

Enhancement Description

Many production Kubernetes clusters blend on-demand (higher-SLA) and spot/preemptible (lower-SLA) nodes to optimize costs while maintaining reliability for critical workloads. Platform teams need a safe default that keeps most workloads away from risky capacity, while allowing specific workloads to opt in with explicit thresholds such as "SLA ≥ 95%".

Currently, NodeAffinity supports numeric comparisons (Gt, Lt) but lacks the operational benefits that taints/tolerations provide:

  • Policy orientation: NodeAffinity is per-pod; keeping most pods away from low-SLA nodes requires editing every workload. Taints invert control: nodes declare risk, and only pods with matching tolerations may land.
  • Eviction semantics: Affinity has no eviction capability. Taints support NoExecute with tolerationSeconds, enabling operators to drain/evict pods when a node's SLA degrades or spot instances are reclaimed.
  • Operational ergonomics: Centralized, node-side policy is consistent with other safety taints (e.g., node.kubernetes.io/disk-pressure, node.kubernetes.io/memory-pressure).

This enhancement extends core/v1 Toleration to support numeric comparison operators (Lt, Gt) when matching Node Taints. This preserves the well-understood safety model of taints/tolerations while enabling threshold-based placement for SLA-aware scheduling.
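
As an illustration only, the extended matching could look roughly like the Go sketch below. It uses local stand-in types instead of the real core/v1 structs, and `Tolerates` is a hypothetical name. Two further assumptions are not confirmed by the text above: taint and toleration values are base-10 integers (for example, an SLA of 95.0% encoded as 950), and the comparison reads as "taint value Lt/Gt toleration value". The Exists and Equal branches follow the existing toleration semantics.

```go
package main

import "strconv"

// Local stand-ins for the core/v1 types; only the fields used here.
type Taint struct {
	Key    string
	Value  string
	Effect string
}

type Toleration struct {
	Key      string
	Operator string // "Exists", "Equal" (default), or the proposed "Lt" / "Gt"
	Value    string
	Effect   string // empty matches every effect
}

// Tolerates reports whether tol matches taint.
// Assumption: for Lt/Gt the taint value is compared against the toleration
// value, so {Operator: "Gt", Value: "900"} tolerates taints whose value > 900.
func Tolerates(tol Toleration, taint Taint) bool {
	if tol.Effect != "" && tol.Effect != taint.Effect {
		return false
	}
	if tol.Key != "" && tol.Key != taint.Key {
		return false
	}
	switch tol.Operator {
	case "Exists":
		return true
	case "", "Equal":
		return tol.Value == taint.Value
	case "Lt", "Gt":
		// Assumption: non-integer values never match a numeric operator.
		taintVal, err := strconv.ParseInt(taint.Value, 10, 64)
		if err != nil {
			return false
		}
		tolVal, err := strconv.ParseInt(tol.Value, 10, 64)
		if err != nil {
			return false
		}
		if tol.Operator == "Lt" {
			return taintVal < tolVal
		}
		return taintVal > tolVal
	default:
		return false
	}
}
```

Under these assumptions, a pod carrying `{Key: "example.com/sla", Operator: "Gt", Value: "900"}` would tolerate a node tainted with value "950" but not one tainted with "800". The key `example.com/sla` is a placeholder, not a proposed well-known taint.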

Benefits for DRA and AI Workloads

  • Cost-reliability optimization: Bind resource claims to reliability tiers via taints with opt-in tolerations
  • Stage-aware placement: Different pipeline stages can tolerate different risk levels explicitly
  • Resilience after preemption: Use NoExecute/tolerationSeconds for graceful drain and controlled failover
  • Multi-tenant fairness: Prevent monopolization of high-SLA resources by requiring explicit tolerations
  • Smooth burst handling: Allow bursts to land on low-SLA pools with clear safety boundaries

The scheduler impact is limited to the existing TaintToleration Filter; no new scheduling stages or algorithms are required.
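
A rough Go sketch of why the change can stay inside the filter: the filter only asks whether every NoSchedule or NoExecute taint on a node is tolerated by at least one of the pod's tolerations, so a new operator changes the per-pair match and nothing else. The types are local stand-ins, and `FirstUntolerated` and `MatchFunc` are hypothetical names rather than the scheduler's actual API.

```go
package main

// Local stand-ins for the core/v1 types; only the fields used here.
type Taint struct{ Key, Value, Effect string }

type Toleration struct{ Key, Operator, Value, Effect string }

// MatchFunc decides whether one toleration tolerates one taint. A numeric
// Lt/Gt comparison would live entirely behind this function.
type MatchFunc func(Toleration, Taint) bool

// FirstUntolerated returns the first taint that blocks scheduling and is
// not tolerated by any toleration. ok == false means the node passes.
func FirstUntolerated(taints []Taint, tols []Toleration, match MatchFunc) (blocking Taint, ok bool) {
	for _, taint := range taints {
		// PreferNoSchedule is a soft preference and never filters a node out.
		if taint.Effect != "NoSchedule" && taint.Effect != "NoExecute" {
			continue
		}
		tolerated := false
		for _, tol := range tols {
			if match(tol, taint) {
				tolerated = true
				break
			}
		}
		if !tolerated {
			return taint, true
		}
	}
	return Taint{}, false
}
```

The loop itself does not depend on which operators exist, which is consistent with the claim that no new scheduling stage is needed.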

/sig scheduling
/sig apps
/stage alpha

/cc @ahg-g @alculquicondor @johnbelamaric @sanposhiho @kubernetes/sig-scheduling-misc
