gcp: Add nodepool for tests #7943


Merged

1 commit merged into kubernetes:main on Mar 30, 2025

Conversation

@ameukam (Member) commented Mar 26, 2025

Related to:
  - kubernetes#2438

Set up a new nodepool with taints so we can schedule specific tests on it for evaluation before we move all the tests to a new nodepool. This nodepool will also use COS and cgroups v2.
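For orientation, here is a minimal sketch of the Terraform change, reconstructed from the final Atlantis plan output below; the gke-nodepool module's exact input variable names are assumptions, not the literal diff:

module "prow_build_nodepool_c4_highmem_8_localssd" {
  # Hedged sketch: values mirror the plan/apply output below, but the
  # module's input variable names are assumed.
  source = "../modules/gke-nodepool"

  project_name = "k8s-infra-prow-build"
  machine_type = "c4-highmem-8"
  image_type   = "COS_CONTAINERD" # COS node image; current COS releases run cgroups v2
  disk_size_gb = 500
  disk_type    = "hyperdisk-balanced"

  min_count = 1
  max_count = 80

  service_account = module.prow_build_cluster.cluster_node_sa.email

  # NO_SCHEDULE means only jobs that explicitly tolerate dedicated=sig-testing
  # are scheduled here, so tests can be moved over gradually for evaluation.
  taints = [{ key = "dedicated", value = "sig-testing", effect = "NO_SCHEDULE" }]
}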

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. area/infra Infrastructure management, infrastructure design, code in infra/ area/infra/gcp Issues or PRs related to Kubernetes GCP infrastructure area/prow Setting up or working with prow in general, prow.k8s.io, prow build clusters area/terraform Terraform modules, testing them, writing more of them, code in infra/gcp/clusters/ sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. approved Indicates a PR has been approved by an approver from all required OWNERS files. labels Mar 26, 2025
@k8s-ci-robot k8s-ci-robot added the sig/testing Categorizes an issue or PR as relevant to SIG Testing. label Mar 26, 2025
@ameukam (Member, Author) commented Mar 26, 2025

/hold
cc @BenTheElder @xmudrii @upodroid

@k8s-ci-robot k8s-ci-robot added size/S Denotes a PR that changes 10-29 lines, ignoring generated files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. labels Mar 26, 2025
@k8s-infra-ci-robot (Contributor)

Ran Plan for dir: infra/gcp/terraform/k8s-infra-prow-build workspace: default

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create

Terraform will perform the following actions:

  # module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool will be created
+ resource "google_container_node_pool" "node_pool" {
      + cluster                     = (sensitive value)
      + id                          = (known after apply)
      + initial_node_count          = 1
      + instance_group_urls         = (known after apply)
      + location                    = (sensitive value)
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = (known after apply)
      + name_prefix                 = "pool6-"
      + node_count                  = (known after apply)
      + node_locations              = [
          + "us-central1-b",
          + "us-central1-c",
          + "us-central1-f",
        ]
      + operation                   = (known after apply)
      + project                     = "k8s-infra-prow-build"
      + version                     = (known after apply)

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 80
          + min_node_count  = 1
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = true
        }

      + network_config (known after apply)

      + node_config {
          + disk_size_gb      = 100
          + disk_type         = "hyperdisk-balanced"
          + effective_taints  = (known after apply)
          + guest_accelerator = (known after apply)
          + image_type        = "COS_CONTAINERD"
          + labels            = (known after apply)
          + local_ssd_count   = (known after apply)
          + logging_variant   = (known after apply)
          + machine_type      = "c4-highmem-8"
          + metadata          = {
              + "disable-legacy-endpoints" = "true"
            }
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "gke-nodes-prow-build@k8s-infra-prow-build.iam.gserviceaccount.com"
          + spot              = false

          + confidential_nodes (known after apply)

          + gcfs_config (known after apply)

          + kubelet_config (known after apply)

          + shielded_instance_config (known after apply)

          + taint {
              + effect = "NO_SCHEDULE"
              + key    = "dedicated"
              + value  = "sig-testing"
            }

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }

      + upgrade_settings (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.
  • ▶️ To apply this plan, comment:
    atlantis apply -d infra/gcp/terraform/k8s-infra-prow-build
  • 🚮 To delete this plan and lock, click here
  • 🔁 To plan this project again, comment:
    atlantis plan -d infra/gcp/terraform/k8s-infra-prow-build

Plan: 1 to add, 0 to change, 0 to destroy.


  • ⏩ To apply all unapplied plans from this Pull Request, comment:
    atlantis apply
  • 🚮 To delete all plans and locks from this Pull Request, comment:
    atlantis unlock

@ameukam force-pushed the prow-build-nodepool-c4 branch from 4c22d88 to 993f75a on March 26, 2025 at 21:54
@k8s-infra-ci-robot (Contributor)

Ran Plan for dir: infra/gcp/terraform/k8s-infra-prow-build workspace: default

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create

Terraform will perform the following actions:

  # module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool will be created
+ resource "google_container_node_pool" "node_pool" {
      + cluster                     = (sensitive value)
      + id                          = (known after apply)
      + initial_node_count          = 1
      + instance_group_urls         = (known after apply)
      + location                    = (sensitive value)
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = (known after apply)
      + name_prefix                 = "pool6-"
      + node_count                  = (known after apply)
      + node_locations              = [
          + "us-central1-b",
          + "us-central1-c",
          + "us-central1-f",
        ]
      + operation                   = (known after apply)
      + project                     = "k8s-infra-prow-build"
      + version                     = (known after apply)

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 80
          + min_node_count  = 1
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = true
        }

      + network_config (known after apply)

      + node_config {
          + disk_size_gb      = 500
          + disk_type         = "hyperdisk-balanced"
          + effective_taints  = (known after apply)
          + guest_accelerator = (known after apply)
          + image_type        = "COS_CONTAINERD"
          + labels            = (known after apply)
          + local_ssd_count   = (known after apply)
          + logging_variant   = (known after apply)
          + machine_type      = "c4-highmem-8"
          + metadata          = {
              + "disable-legacy-endpoints" = "true"
            }
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "gke-nodes-prow-build@k8s-infra-prow-build.iam.gserviceaccount.com"
          + spot              = false

          + confidential_nodes (known after apply)

          + gcfs_config (known after apply)

          + kubelet_config (known after apply)

          + shielded_instance_config (known after apply)

          + taint {
              + effect = "NO_SCHEDULE"
              + key    = "dedicated"
              + value  = "sig-testing"
            }

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }

      + upgrade_settings (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.
  • ▶️ To apply this plan, comment:
    atlantis apply -d infra/gcp/terraform/k8s-infra-prow-build
  • 🚮 To delete this plan and lock, click here
  • 🔁 To plan this project again, comment:
    atlantis plan -d infra/gcp/terraform/k8s-infra-prow-build

Plan: 1 to add, 0 to change, 0 to destroy.


  • ⏩ To apply all unapplied plans from this Pull Request, comment:
    atlantis apply
  • 🚮 To delete all plans and locks from this Pull Request, comment:
    atlantis unlock

@ameukam force-pushed the prow-build-nodepool-c4 branch from 993f75a to a29a7c6 on March 27, 2025 at 13:30
@k8s-infra-ci-robot (Contributor)

Ran Plan for dir: infra/gcp/terraform/k8s-infra-prow-build workspace: default

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create

Terraform will perform the following actions:

  # module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool will be created
+ resource "google_container_node_pool" "node_pool" {
      + cluster                     = (sensitive value)
      + id                          = (known after apply)
      + initial_node_count          = 1
      + instance_group_urls         = (known after apply)
      + location                    = (sensitive value)
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = (known after apply)
      + name_prefix                 = "sig-testing-pool6-"
      + node_count                  = (known after apply)
      + node_locations              = [
          + "us-central1-b",
          + "us-central1-c",
          + "us-central1-f",
        ]
      + operation                   = (known after apply)
      + project                     = "k8s-infra-prow-build"
      + version                     = (known after apply)

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 80
          + min_node_count  = 1
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = true
        }

      + network_config (known after apply)

      + node_config {
          + disk_size_gb      = 500
          + disk_type         = "hyperdisk-balanced"
          + effective_taints  = (known after apply)
          + guest_accelerator = (known after apply)
          + image_type        = "COS_CONTAINERD"
          + labels            = (known after apply)
          + local_ssd_count   = (known after apply)
          + logging_variant   = (known after apply)
          + machine_type      = "c4-highmem-8"
          + metadata          = {
              + "disable-legacy-endpoints" = "true"
            }
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "gke-nodes-prow-build@k8s-infra-prow-build.iam.gserviceaccount.com"
          + spot              = false

          + confidential_nodes (known after apply)

          + gcfs_config (known after apply)

          + kubelet_config (known after apply)

          + shielded_instance_config (known after apply)

          + taint {
              + effect = "NO_SCHEDULE"
              + key    = "dedicated"
              + value  = "sig-testing"
            }

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }

      + upgrade_settings (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.
  • ▶️ To apply this plan, comment:
    atlantis apply -d infra/gcp/terraform/k8s-infra-prow-build
  • 🚮 To delete this plan and lock, click here
  • 🔁 To plan this project again, comment:
    atlantis plan -d infra/gcp/terraform/k8s-infra-prow-build

Plan: 1 to add, 0 to change, 0 to destroy.


  • ⏩ To apply all unapplied plans from this Pull Request, comment:
    atlantis apply
  • 🚮 To delete all plans and locks from this Pull Request, comment:
    atlantis unlock

@ameukam (Member, Author) commented Mar 27, 2025

atlantis apply

@k8s-infra-ci-robot (Contributor)

Ran Apply for dir: infra/gcp/terraform/k8s-infra-prow-build workspace: default

Apply Error

running "/atlantis/bin/terraform1.11.3 apply -input=false \"/atlantis/repos/kubernetes/k8s.io/7943/default/infra/gcp/terraform/k8s-infra-prow-build/default.tfplan\"" in "/atlantis/repos/kubernetes/k8s.io/7943/default/infra/gcp/terraform/k8s-infra-prow-build": exit status 1
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Creating...
╷
│ Error: error creating NodePool: googleapi: Error 400: Node_pool.name must be less than 40 characters.
│ Details:
│ [
│   {
│     "@type": "type.googleapis.com/google.rpc.RequestInfo",
│     "requestId": "0x670e6335c60a1705"
│   }
│ ]
│ , badRequest
│ 
│   with module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool,
│   on ../modules/gke-nodepool/main.tf line 17, in resource "google_container_node_pool" "node_pool":
│   17: resource "google_container_node_pool" "node_pool" {
│ 
╵
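The failure is name arithmetic: Terraform appends a generated 26-character suffix to name_prefix (visible in the final pool name pool6-20250327232037500200000001 in the successful apply below), and the GKE API rejects node pool names of 40 characters or more. An illustrative check, assuming that suffix length:

locals {
  # Illustrative only: suffix length inferred from the generated name
  # "pool6-20250327232037500200000001" (26 characters after the prefix).
  suffix_length = 26

  # "sig-testing-pool6-" is 18 characters: 18 + 26 = 44 -> rejected (>= 40)
  # "pool6-"             is  6 characters:  6 + 26 = 32 -> accepted
  name_fits = (length("pool6-") + local.suffix_length) < 40 # true
}

This is why the next force-push reverts the prefix from sig-testing-pool6- back to pool6-.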

Related to:
  - kubernetes#2438

Set up a new nodepool with taints so we can schedule specific tests on it
for evaluation before we move all the tests to a new nodepool.
This nodepool will also use COS and cgroups v2.

Signed-off-by: Arnaud Meukam <[email protected]>
@ameukam force-pushed the prow-build-nodepool-c4 branch from a29a7c6 to e135ed4 on March 27, 2025 at 23:16
@k8s-infra-ci-robot (Contributor)

Ran Plan for dir: infra/gcp/terraform/k8s-infra-prow-build workspace: default

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
+ create

Terraform will perform the following actions:

  # module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool will be created
+ resource "google_container_node_pool" "node_pool" {
      + cluster                     = (sensitive value)
      + id                          = (known after apply)
      + initial_node_count          = 1
      + instance_group_urls         = (known after apply)
      + location                    = (sensitive value)
      + managed_instance_group_urls = (known after apply)
      + max_pods_per_node           = (known after apply)
      + name                        = (known after apply)
      + name_prefix                 = "pool6-"
      + node_count                  = (known after apply)
      + node_locations              = [
          + "us-central1-b",
          + "us-central1-c",
          + "us-central1-f",
        ]
      + operation                   = (known after apply)
      + project                     = "k8s-infra-prow-build"
      + version                     = (known after apply)

      + autoscaling {
          + location_policy = (known after apply)
          + max_node_count  = 80
          + min_node_count  = 1
        }

      + management {
          + auto_repair  = true
          + auto_upgrade = true
        }

      + network_config (known after apply)

      + node_config {
          + disk_size_gb      = 500
          + disk_type         = "hyperdisk-balanced"
          + effective_taints  = (known after apply)
          + guest_accelerator = (known after apply)
          + image_type        = "COS_CONTAINERD"
          + labels            = (known after apply)
          + local_ssd_count   = (known after apply)
          + logging_variant   = (known after apply)
          + machine_type      = "c4-highmem-8"
          + metadata          = {
              + "disable-legacy-endpoints" = "true"
            }
          + min_cpu_platform  = (known after apply)
          + oauth_scopes      = [
              + "https://www.googleapis.com/auth/cloud-platform",
            ]
          + preemptible       = false
          + service_account   = "gke-nodes-prow-build@k8s-infra-prow-build.iam.gserviceaccount.com"
          + spot              = false

          + confidential_nodes (known after apply)

          + gcfs_config (known after apply)

          + kubelet_config (known after apply)

          + shielded_instance_config (known after apply)

          + taint {
              + effect = "NO_SCHEDULE"
              + key    = "dedicated"
              + value  = "sig-testing"
            }

          + workload_metadata_config {
              + mode = "GKE_METADATA"
            }
        }

      + upgrade_settings (known after apply)
    }

Plan: 1 to add, 0 to change, 0 to destroy.
  • ▶️ To apply this plan, comment:
    atlantis apply -d infra/gcp/terraform/k8s-infra-prow-build
  • 🚮 To delete this plan and lock, click here
  • 🔁 To plan this project again, comment:
    atlantis plan -d infra/gcp/terraform/k8s-infra-prow-build

Plan: 1 to add, 0 to change, 0 to destroy.


  • ⏩ To apply all unapplied plans from this Pull Request, comment:
    atlantis apply
  • 🚮 To delete all plans and locks from this Pull Request, comment:
    atlantis unlock

@ameukam (Member, Author) commented Mar 27, 2025

atlantis apply

@k8s-infra-ci-robot (Contributor)

Ran Apply for dir: infra/gcp/terraform/k8s-infra-prow-build workspace: default

module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Creating...
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [10s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [20s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [30s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [40s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [50s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [1m1s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [1m11s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [1m21s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [1m31s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [1m41s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [1m51s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [2m1s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [2m11s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [2m21s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [2m31s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [2m41s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [2m51s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [3m1s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [3m11s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [3m21s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [3m31s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Still creating... [3m41s elapsed]
module.prow_build_nodepool_c4_highmem_8_localssd.google_container_node_pool.node_pool: Creation complete after 3m45s [id=projects/k8s-infra-prow-build/locations/us-central1/clusters/prow-build/nodePools/pool6-20250327232037500200000001]

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.

@upodroid (Member) left a comment

LGTM

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 30, 2025
@k8s-ci-robot (Contributor)

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ameukam, upodroid

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@upodroid (Member)

Canceling the hold as the infra change has been applied

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 30, 2025
@k8s-ci-robot k8s-ci-robot merged commit 610792f into kubernetes:main Mar 30, 2025
13 checks passed
@k8s-infra-ci-robot (Contributor)

Locks and plans deleted for the projects and workspaces modified in this pull request:

  • dir: infra/gcp/terraform/k8s-infra-prow-build workspace: default

@k8s-ci-robot k8s-ci-robot added this to the v1.33 milestone Mar 30, 2025
ameukam added a commit to ameukam/test-infra that referenced this pull request Mar 31, 2025
Follow-up:
 - kubernetes/k8s.io#7943

move some prowjobs to a dedicated nodepool in order to evaluate new
instances.

Signed-off-by: Arnaud Meukam <[email protected]>
ameukam added a commit to ameukam/test-infra that referenced this pull request Apr 13, 2025
Follow-up of:
  - kubernetes/k8s.io#7943

Move ci-test-infra-continuous-test to a dedicated nodepool

Signed-off-by: Arnaud Meukam <[email protected]>
ameukam added a commit to ameukam/test-infra that referenced this pull request Apr 24, 2025
Follow-up of:
  - kubernetes/k8s.io#7943

Move ci-test-infra-continuous-test to a dedicated nodepool

Signed-off-by: Arnaud Meukam <[email protected]>
Review thread on the new nodepool definition:

  disk_size_gb    = 500
  disk_type       = "hyperdisk-balanced"
  service_account = module.prow_build_cluster.cluster_node_sa.email
  taints          = [{ key = "dedicated", value = "sig-testing", effect = "NO_SCHEDULE" }]
A reviewer (Member):

weirdly I don't see this taint on the actual cluster nodes?

@ameukam (Member, Author):

$ gcloud container node-pools describe pool6-20250327232037500200000001 \
    --project k8s-infra-prow-build --cluster prow-build --location us-central1 \
    --format='value(config.taints)'
{'effect': 'NO_SCHEDULE', 'key': 'dedicated', 'value': 'sig-testing'}
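The same config-level check can also be expressed in Terraform; a hedged sketch using the google provider's google_container_node_pool data source (pool name taken from the apply output above; the exact exported attribute path is an assumption):

data "google_container_node_pool" "pool6" {
  project  = "k8s-infra-prow-build"
  location = "us-central1"
  cluster  = "prow-build"
  name     = "pool6-20250327232037500200000001"
}

# Expected to contain: { key = "dedicated", value = "sig-testing", effect = "NO_SCHEDULE" }
output "pool6_taints" {
  value = data.google_container_node_pool.pool6.node_config[0].taint
}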
