[k8s] GPU labeler script should skip already labeled nodes #5023

SeungjinYang · 2025-03-24T18:11:04Z

sky/utils/kubernetes/gpu_labeler.py is a utility script a user can run to label GPU nodes in their k8s cluster. See https://docs.skypilot.co/en/latest/reference/kubernetes/kubernetes-setup.html#automatically-labelling-nodes for how this script may be used.

The Python script labels GPU nodes by finding all nodes that expose the nvidia.com/gpu resource and scheduling a pod on each one; the pod adds the necessary GPU label (specifically, the skypilot.co/accelerator: <gpu_name> label). The relevant logic is copied here:

        # Get the list of nodes with GPUs
        gpu_nodes = []
        for node in nodes:
            if kubernetes_utils.get_gpu_resource_key() in node.status.capacity:
                gpu_nodes.append(node)
        ... # launch labeling job on each node

While this script works, it launches a labeling job on every node with a GPU resource, regardless of whether the node has already been labeled.

Consider a k8s cluster whose GPU nodes were labeled in the past and which later had additional nodes join to scale workloads. A user may then run the GPU labeler script to label only the newly joined nodes, but the script will schedule pods on the already labeled nodes as well. This is inefficient, and we'd like to avoid it.

In the for loop shown above, we could check whether the node already has a skypilot.co/accelerator label. If it does, we should not launch a job to label that node.
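The check proposed above could look roughly like the following. This is a minimal sketch, not the actual change to gpu_labeler.py: the constants `GPU_RESOURCE_KEY` and `ACCELERATOR_LABEL`, the function `nodes_needing_label`, and the `make_node` helper are hypothetical names introduced for illustration, and the nodes are plain stand-in objects that only mimic the `metadata.labels` and `status.capacity` attributes of a Kubernetes node object.

```python
from types import SimpleNamespace

# Assumption: stands in for kubernetes_utils.get_gpu_resource_key().
GPU_RESOURCE_KEY = 'nvidia.com/gpu'
# Label key named in this issue.
ACCELERATOR_LABEL = 'skypilot.co/accelerator'


def nodes_needing_label(nodes):
    """Return GPU nodes that do not yet carry the accelerator label."""
    gpu_nodes = []
    for node in nodes:
        # Skip nodes without a GPU resource.
        if GPU_RESOURCE_KEY not in (node.status.capacity or {}):
            continue
        # Labels may be missing entirely, so fall back to an empty dict.
        labels = node.metadata.labels or {}
        # Skip nodes that were labeled in an earlier run.
        if ACCELERATOR_LABEL in labels:
            continue
        gpu_nodes.append(node)
    return gpu_nodes


def make_node(name, capacity, labels=None):
    """Hypothetical helper: build a stand-in node object for this sketch."""
    return SimpleNamespace(
        metadata=SimpleNamespace(name=name, labels=labels),
        status=SimpleNamespace(capacity=capacity),
    )
```

With this filter, a labeling job would only be launched for nodes returned by `nodes_needing_label`, so a rerun after new nodes join touches only the new nodes.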

SeungjinYang added the "good first issue" and "good starter issues" labels Mar 24, 2025

SeungjinYang mentioned this issue Mar 28, 2025

[k8s] sky check detects unlabeled nodes #5065

Merged

3 tasks

SeungjinYang closed this as completed in #5065 Apr 2, 2025
