gpu: add notes about gpu-plugin modes

tkatila · tkatila · commit 99f7993138a8 · 2023-04-19T13:56:44.000+03:00
Fixes: #1381
Signed-off-by: Tuomas Katila <tuomas.katila@intel.com>
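
As a reading aid for the `-allocation-policy` description in the README change below, here is a minimal sketch of how the three policies differ when `shared-dev-num > 1`. It is an illustration only, not the plugin's implementation: `pickDevice`, its parameters, and the per-GPU container counts are hypothetical names chosen for this example, and the `none` case only approximates "first available device from kubelet".

```go
package main

import "fmt"

// pickDevice is a hypothetical helper, not the plugin's real code.
// used[i] is the number of containers already placed on GPU i, and each
// GPU accepts at most sharedDevNum containers. It returns the index of
// the chosen GPU, or -1 when every GPU is full.
func pickDevice(policy string, used []int, sharedDevNum int) int {
	best := -1
	for i, n := range used {
		if n >= sharedDevNum {
			continue // no free slot on this GPU
		}
		switch {
		case best == -1:
			best = i // first GPU with a free slot
		case policy == "balanced" && n < used[best]:
			best = i // prefer the least loaded GPU
		case policy == "packed" && n > used[best]:
			best = i // prefer the fullest GPU that still has room
		}
		// Any other policy (e.g. "none") keeps the first GPU with a free slot.
	}
	return best
}

func main() {
	used := []int{1, 0, 2} // containers currently on GPU 0, 1 and 2
	for _, policy := range []string{"balanced", "packed", "none"} {
		fmt.Println(policy, pickDevice(policy, used, 3))
	}
}
```

With the counts above and three slots per GPU, the sketch picks GPU 1 for `balanced`, GPU 2 for `packed`, and GPU 0 for `none`.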
diff --git a/cmd/gpu_plugin/README.md b/cmd/gpu_plugin/README.md
@@ -4,6 +4,7 @@ Table of Contents
 
 * [Introduction](#introduction)
 * [Modes and Configuration Options](#modes-and-configuration-options)
+* [Use Cases for Different Modes](#use-cases-for-different-modes)
 * [Installation](#installation)
     * [Prerequisites](#prerequisites)
         * [Drivers for discrete GPUs](#drivers-for-discrete-gpus)
@@ -48,11 +49,21 @@ backend libraries can offload compute operations to GPU.
 | -enable-monitoring | - | disabled | Enable 'i915_monitoring' resource that provides access to all Intel GPU devices on the node |
 | -resource-manager | - | disabled | Enable fractional resource management, [see also dependencies](#fractional-resources) |
 | -shared-dev-num | int | 1 | Number of containers that can share the same GPU device |
-| -allocation-policy | string | none | 3 possible values: balanced, packed, none. It is meaningful when shared-dev-num > 1, balanced mode is suitable for workload balance among GPU devices, packed mode is suitable for making full use of each GPU device, none mode is the default. Allocation policy does not have effect when resource manager is enabled. |
+| -allocation-policy | string | none | 3 possible values: balanced, packed, none. For shared-dev-num > 1: balanced mode spreads workloads among GPU devices, packed mode fills one GPU fully before moving to the next, and none selects the first available device from kubelet. None mode is the default. Allocation policy has no effect when resource manager is enabled. |
 
 The plugin also accepts a number of other arguments (common to all plugins) related to logging.
 Please use the -h option to see the complete list of logging related options.
 
+## Use Cases for Different Modes
+
+Intel GPU-plugin supports a few different operation modes. Depending on the workloads the cluster is running, some modes make less sense than others. The table below explains the differences between the modes and suggests workload types for each mode. Choosing a mode requires forethought, as the selection is cluster-wide.
+
+| Mode | Sharing | Intended workloads | Time critical |
+|:---- |:-------- |:------- |:------- |
+| shared-dev-num == 1 | No, 1 container per GPU | Workloads using all GPU capacity, e.g. AI training | Yes |
+| shared-dev-num > 1 | Yes, >1 containers per GPU | (Batch) workloads using only part of GPU resources, e.g. inference, media transcode/analytics | No |
+| shared-dev-num > 1 && resource-manager | Yes and no, >=1 containers per GPU | Any. For best results, all workloads should declare their expected GPU resource usage (memory, millicores). Requires [GAS](https://github.com/intel/platform-aware-scheduling/tree/master/gpu-aware-scheduling). See also [fractional use](#fractional-resources-details). | Depends on the requested GPU resources |
+
 ## Installation
 
 The following sections detail how to obtain, build, deploy and test the GPU device plugin.
@@ -315,7 +326,6 @@ The GPU plugin functionality can be verified by deploying an [OpenCL image](../.
       Warning  FailedScheduling  <unknown>  default-scheduler  0/1 nodes are available: 1 Insufficient gpu.intel.com/i915.
     ```
 
-
 ## Issues with media workloads on multi-GPU setups
 
 Unlike with 3D & compute, and OneVPL media API, QSV (MediaSDK) & VA-API