From 2a365263b0c31ea44d8c91a301188612c491a3cb Mon Sep 17 00:00:00 2001
From: Tuomas Katila
Date: Tue, 18 Apr 2023 10:50:48 +0300
Subject: [PATCH 1/3] gpu: add note about dry-run and yaml output

Fixes: #1059

Signed-off-by: Tuomas Katila
---
 cmd/gpu_plugin/README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/cmd/gpu_plugin/README.md b/cmd/gpu_plugin/README.md
index c64ebe844..3e9503398 100644
--- a/cmd/gpu_plugin/README.md
+++ b/cmd/gpu_plugin/README.md
@@ -152,7 +152,9 @@ Release tagged images of the components are also available on the Docker hub, ta
 release version numbers in the format `x.y.z`, corresponding to the branches and
 releases in this repository. Thus the easiest way to deploy the plugin in your
 cluster is to run this command
 
-Note: Replace `<RELEASE_VERSION>` with the desired [release tag](https://github.com/intel/intel-device-plugins-for-kubernetes/tags) or `main` to get `devel` images.
+> **Note**: Replace `<RELEASE_VERSION>` with the desired [release tag](https://github.com/intel/intel-device-plugins-for-kubernetes/tags) or `main` to get `devel` images.
+
+> **Note**: Add `--dry-run=client -o yaml` to the `kubectl` commands below to visualize the YAML content being applied.
 
 See [the development guide](../../DEVEL.md) for details if you want to deploy a customized version of the plugin.

From 4a4a0e5d2f3239588fae6aed0845d7bb6c3b0cad Mon Sep 17 00:00:00 2001
From: Tuomas Katila
Date: Tue, 18 Apr 2023 11:49:52 +0300
Subject: [PATCH 2/3] operator: improve readme structure

Fixes: #1132

Co-authored-by: Eero Tamminen
Signed-off-by: Tuomas Katila
---
 cmd/operator/README.md | 116 +++++++++++++++++++++++------------------
 1 file changed, 66 insertions(+), 50 deletions(-)

diff --git a/cmd/operator/README.md b/cmd/operator/README.md
index 3650132ff..02c4a9ebb 100644
--- a/cmd/operator/README.md
+++ b/cmd/operator/README.md
@@ -5,6 +5,7 @@ Table of Contents
 
 * [Introduction](#introduction)
 * [Installation](#installation)
 * [Upgrade](#upgrade)
+* [Limiting Supported Devices](#limiting-supported-devices)
 * [Known issues](#known-issues)
 
 ## Introduction
@@ -16,6 +17,12 @@ administrators.
 
 ## Installation
 
+The default operator deployment depends on NFD and cert-manager. These components must be installed in the cluster before the operator can be deployed.
+
+> **Note**: The operator can also be installed via Helm charts. See [INSTALL.md](../../INSTALL.md) for details.
+
+### NFD
+
 Install NFD (if it's not already installed) and node labelling rules (requires NFD v0.10+):
 
 ```
@@ -38,7 +45,7 @@ nfd-worker-qqq4h 1/1 Running 0 25h
 
 Note that labelling is not performed immediately. Give NFD 1 minute to pick up
 the rules and label nodes. As a result, all found devices should have
 corresponding labels, e.g. for Intel DLB devices the label is
-intel.feature.node.kubernetes.io/dlb:
+`intel.feature.node.kubernetes.io/dlb`:
 ```
 $ kubectl get no -o json | jq .items[].metadata.labels |grep intel.feature.node.kubernetes.io/dlb
   "intel.feature.node.kubernetes.io/dlb": "true",
@@ -55,6 +62,8 @@ deployments/operator/samples/deviceplugin_v1_fpgadeviceplugin.yaml: intel.fea
 deployments/operator/samples/deviceplugin_v1_dsadeviceplugin.yaml: intel.feature.node.kubernetes.io/dsa: 'true'
 ```
 
+### Cert-Manager
+
 The default operator deployment depends on [cert-manager](https://cert-manager.io/) running in the cluster. See installation instructions [here](https://cert-manager.io/docs/installation/kubectl/).
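For reference, cert-manager is typically installed from its static release manifests before the operator is deployed. The command below is a minimal sketch only; the version tag is an example, so check the cert-manager installation documentation for the current release.

```bash
# Install cert-manager from its release manifests (version tag is an example only)
$ kubectl apply -f https://github.com/cert-manager/cert-manager/releases/download/v1.11.0/cert-manager.yaml
```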
@@ -68,45 +77,7 @@ cert-manager-cainjector-87c85c6ff-59sb5 1/1 Running 0 21d
 cert-manager-webhook-64dc9fff44-29cfc 1/1 Running 0 21d
 ```
 
-Also if your cluster operates behind a corporate proxy make sure that the API
-server is configured not to send requests to cluster services through the
-proxy. You can check that with the following command:
-
-```bash
-$ kubectl describe pod kube-apiserver --namespace kube-system | grep -i no_proxy | grep "\.svc"
-```
-
-In case there's no output and your cluster was deployed with `kubeadm` open
-`/etc/kubernetes/manifests/kube-apiserver.yaml` at the control plane nodes and
-append `.svc` and `.svc.cluster.local` to the `no_proxy` environment variable:
-
-```yaml
-apiVersion: v1
-kind: Pod
-metadata:
-  ...
-spec:
-  containers:
-  - command:
-    - kube-apiserver
-    - --advertise-address=10.237.71.99
-    ...
-    env:
-    - name: http_proxy
-      value: http://proxy.host:8080
-    - name: https_proxy
-      value: http://proxy.host:8433
-    - name: no_proxy
-      value: 127.0.0.1,localhost,.example.com,10.0.0.0/8,.svc,.svc.cluster.local
-  ...
-```
-
-**Note:** To build clusters using `kubeadm` with the right `no_proxy` settings from the very beginning,
-set the cluster service names to `$no_proxy` before `kubeadm init`:
-
-```
-$ export no_proxy=$no_proxy,.svc,.svc.cluster.local
-```
+### Device Plugin Operator
 
 Finally deploy the operator itself:
 
@@ -117,7 +88,7 @@ $ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes
 
 Now you can deploy the device plugins by creating corresponding custom
 resources. The samples for them are available [here](/deployments/operator/samples/).
 
-## Usage
+### Device Plugin Custom Resource
 
 Deploy your device plugin by applying its custom resource, e.g.
 `GpuDevicePlugin` with
 
@@ -134,8 +105,22 @@ NAME DESIRED READY NODE SELECTOR AGE
 gpudeviceplugin-sample 1 1 5s
 ```
 
+## Upgrade
+
+The upgrade of the deployed plugins can be done by simply installing a new release of the operator.
+
+The operator auto-upgrades operator-managed plugins (CR images and thus corresponding deployed daemonsets) to the current release of the operator.
+
+During an upgrade, the tag in the image path is updated (e.g. `docker.io/intel/intel-sgx-plugin:tag`), but the rest of the path is left intact.
+
+No upgrade is done for:
+- Non-operator managed deployments
+- Operator deployments without numeric tags
+
+## Limiting Supported Devices
+
 In order to limit the deployment to a specific device type,
-use one of kustomizations under deployments/operator/device.
+use one of the kustomizations under `deployments/operator/device`.
 
 For example, to limit the deployment to FPGA, use:
 
@@ -148,20 +133,51 @@ In this case, create a new kustomization with the necessary resources that
 passes the desired device types to the operator using `--device` command line
 argument multiple times.
 
-## Upgrade
+## Known issues
 
-The upgrade of the deployed plugins can be done by simply installing a new release of the operator.
+### Cluster behind a proxy
 
-The operator auto-upgrades operator-managed plugins (CR images and thus corresponding deployed daemonsets) to the current release of the operator.
+If your cluster operates behind a corporate proxy, make sure that the API
+server is configured not to send requests to cluster services through the
+proxy. You can check that with the following command:
 
-The [registry-url]/[namespace]/[image] are kept intact on the upgrade.
+```bash
+$ kubectl describe pod kube-apiserver --namespace kube-system | grep -i no_proxy | grep "\.svc"
+```
+
-No upgrade is done for:
+In case there's no output and your cluster was deployed with `kubeadm`, open
+`/etc/kubernetes/manifests/kube-apiserver.yaml` on the control plane nodes and
+append `.svc` and `.svc.cluster.local` to the `no_proxy` environment variable:
 
-- Non-operator managed deployments
-- Operator deployments without numeric tags
+```yaml
+apiVersion: v1
+kind: Pod
+metadata:
+  ...
+spec:
+  containers:
+  - command:
+    - kube-apiserver
+    - --advertise-address=10.237.71.99
+    ...
+    env:
+    - name: http_proxy
+      value: http://proxy.host:8080
+    - name: https_proxy
+      value: http://proxy.host:8433
+    - name: no_proxy
+      value: 127.0.0.1,localhost,.example.com,10.0.0.0/8,.svc,.svc.cluster.local
+  ...
+```
 
-## Known issues
+**Note:** To build clusters using `kubeadm` with the right `no_proxy` settings from the very beginning,
+append the cluster service names to `$no_proxy` before `kubeadm init`:
+
+```
+$ export no_proxy=$no_proxy,.svc,.svc.cluster.local
+```
+
+### Leader election enabled
 
 When the operator is run with leader election enabled, that is with the
 option `--leader-elect`, make sure the cluster is not overloaded with excessive

From 89712802152fa21bd4a64eff9956eb4251467599 Mon Sep 17 00:00:00 2001
From: Tuomas Katila
Date: Tue, 18 Apr 2023 14:07:25 +0300
Subject: [PATCH 3/3] gpu: add notes about gpu-plugin modes

Fixes: #1381

Co-authored-by: Eero Tamminen
Signed-off-by: Tuomas Katila
---
 cmd/gpu_plugin/README.md | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

diff --git a/cmd/gpu_plugin/README.md b/cmd/gpu_plugin/README.md
index 3e9503398..484c09e2f 100644
--- a/cmd/gpu_plugin/README.md
+++ b/cmd/gpu_plugin/README.md
@@ -4,6 +4,7 @@ Table of Contents
 
 * [Introduction](#introduction)
 * [Modes and Configuration Options](#modes-and-configuration-options)
+* [Operation modes for different workload types](#operation-modes-for-different-workload-types)
 * [Installation](#installation)
 * [Prerequisites](#prerequisites)
 * [Drivers for discrete GPUs](#drivers-for-discrete-gpus)
@@ -50,11 +51,23 @@ backend libraries can offload compute operations to GPU.
 
 | -enable-monitoring | - | disabled | Enable 'i915_monitoring' resource that provides access to all Intel GPU devices on the node |
 | -resource-manager | - | disabled | Enable fractional resource management, [see also dependencies](#fractional-resources) |
 | -shared-dev-num | int | 1 | Number of containers that can share the same GPU device |
-| -allocation-policy | string | none | 3 possible values: balanced, packed, none. It is meaningful when shared-dev-num > 1, balanced mode is suitable for workload balance among GPU devices, packed mode is suitable for making full use of each GPU device, none mode is the default. Allocation policy does not have effect when resource manager is enabled. |
+| -allocation-policy | string | none | 3 possible values: balanced, packed, none. For shared-dev-num > 1: _balanced_ mode spreads workloads among GPU devices, _packed_ mode fills one GPU fully before moving to the next, and _none_ selects the first available device from kubelet. Default is _none_. Allocation policy does not have an effect when resource manager is enabled. |
 
 The plugin also accepts a number of other arguments (common to all plugins) related to
 logging. Please use the -h option to see the complete list of logging related options.
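The options in the table above are passed to the plugin binary as command line arguments. As a rough sketch only (the image tag is an example and most DaemonSet fields are omitted), the GPU sharing options could be set in the plugin's container spec like this:

```yaml
# Sketch only: GPU plugin container spec excerpt with sharing options as arguments
containers:
  - name: intel-gpu-plugin
    image: intel/intel-gpu-plugin:0.26.0   # example tag; use a real release tag
    args:
      - "-shared-dev-num=2"           # allow two containers to share one GPU
      - "-allocation-policy=balanced" # spread shared workloads across GPUs
```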
+## Operation modes for different workload types
+
+The Intel GPU plugin supports a few different operation modes. Depending on the workloads the cluster is running, some modes make more sense than others. The table below explains the differences between the modes and suggests workload types for each mode. Mode selection applies to the whole GPU plugin deployment, so it is a cluster-wide decision.
+
+| Mode | Sharing | Intended workloads | Suitable for time-critical workloads |
+|:---- |:-------- |:------- |:------- |
+| shared-dev-num == 1 | No, 1 container per GPU | Workloads using all GPU capacity, e.g. AI training | Yes |
+| shared-dev-num > 1 | Yes, >1 containers per GPU | (Batch) workloads using only part of GPU resources, e.g. inference, media transcode/analytics, or CPU-bound GPU workloads | No |
+| shared-dev-num > 1 && resource-management | Yes and no, 1 or more containers per GPU | Any. For best results, all workloads should declare their expected GPU resource usage (memory, millicores). Requires [GAS](https://github.com/intel/platform-aware-scheduling/tree/master/gpu-aware-scheduling). See also [fractional use](#fractional-resources-details) | Yes. 1000 millicores = exclusive GPU usage. See note below. |
+
+> **Note**: Exclusive GPU usage with >= 1000 millicores requires that *all other GPU containers* also specify their (non-zero) millicores usage.
+
 ## Installation
 
 The following sections detail how to obtain, build, deploy and test the GPU device plugin.
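To make the table above more concrete, here is a rough sketch of a workload requesting a slice of a shared GPU when the plugin runs with shared-dev-num > 1 and resource management enabled. The resource names and amounts are illustrative assumptions based on the fractional resources feature; check the fractional resources details and the GAS documentation for the exact names supported by your deployment.

```yaml
# Sketch only: a pod requesting part of a shared Intel GPU.
# gpu.intel.com/i915 is the GPU resource advertised by the plugin;
# gpu.intel.com/millicores is assumed to be available when the
# resource manager (with GAS) is enabled.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference-example
spec:
  containers:
    - name: inference
      image: example.com/inference:latest   # placeholder image
      resources:
        limits:
          gpu.intel.com/i915: 1
          gpu.intel.com/millicores: 500   # about half of a GPU; 1000 = exclusive use
```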