A Vulkan 1.2 application to experiment with preemption and core isolation.
To ensure a smooth user experience in multi-tasking systems where multiple applications and services share the same GPU resources, lower-priority GPU work sometimes has to be preempted by more crucial tasks like compositor work.
While preemption is designed to prioritize critical tasks, a lack of efficient preemption can lead to significant delays, preventing high-priority workloads from executing in a timely manner.
`vkpreempt` is designed to experiment with and understand GPU preemption and its effects, as well as to explore the pros and cons of reserving GPU resources for higher-priority tasks.

`vkpreempt` runs graphics or compute GPU tasks at regular intervals. Each iteration of GPU work is aligned to multiples of the interval since the system clock's epoch, such that two concurrent executions with the same arguments schedule their workloads to run at roughly the same time.
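The alignment scheme can be sketched in C++ as follows. This is a minimal illustration; the helper name and the round-up behavior are assumptions for the sketch, not `vkpreempt`'s actual code:

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper: given the current time since the system clock's epoch,
// compute the next deadline that is a multiple of the interval (plus an
// optional offset). Two processes calling this with the same interval arrive
// at the same deadline, which is what lets concurrent executions align.
std::chrono::nanoseconds next_aligned_deadline(std::chrono::nanoseconds now,
                                               std::chrono::nanoseconds interval,
                                               std::chrono::nanoseconds offset) {
    // Round the current time up to the next interval boundary.
    auto ticks = (now.count() + interval.count() - 1) / interval.count();
    return std::chrono::nanoseconds{ticks * interval.count()} + offset;
}
```

With a 16 ms interval, a process waking at t = 33 ms and another waking at t = 47 ms both compute the same 48 ms deadline.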
The following dependencies are downloaded, built and linked automatically by CMake:
- argparse (v3.2) for parsing command line arguments. Published under MIT license.
- Perfetto (v51.2) for emitting GPU task submissions and executions as Perfetto trace events (optional). Only used if `vkpreempt` is built with the CMake option `VKPREEMPT_ENABLE_PERFETTO_TRACES` enabled. Published under Apache 2.0 license.
- glfw (3.4) for rendering into a native window instead of running in headless mode (optional). Only used if `vkpreempt` is built with the CMake options `VKPREEMPT_ENABLE_SURFACE` and `VKPREEMPT_USE_NATIVE_WINDOW` enabled. Published under Zlib license.
```sh
cmake --preset default
cmake --build build
```
The executable `vkpreempt` supports two subcommands to run graphics and compute tasks, respectively.
Runs a graphics task rendering multiple octaves of noise in a full-screen 2D grid of n x n cells.
The graphics work can be scaled on three orthogonal dimensions:
- Vertex shader invocations: The sample renders a full-screen grid of n x n cells. The number of vertex shader invocations is increased by increasing the number of cells per row n.
- Fragment shader invocations: The number of fragment shader invocations is controlled by the output resolution.
- Fragment shader load: Each fragment shader invocation computes m octaves of noise. To increase the workload, increase the number of octaves.
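As a back-of-the-envelope sketch of how the knobs scale the work (assuming each grid cell is drawn as two triangles without index reuse — an assumption for illustration; the actual mesh layout may differ):

```cpp
#include <cstdint>

// Rough invocation counts for the scaling dimensions described above.
uint64_t vertex_invocations(uint64_t cells_per_row) {
    // n x n cells, two triangles per cell, three vertices per triangle.
    return cells_per_row * cells_per_row * 6;
}

uint64_t fragment_invocations(uint64_t width, uint64_t height) {
    // One invocation per covered pixel for a full-screen grid.
    return width * height;
}
```

With the defaults (16 cells, 800x600), that is 1,536 vertex invocations and 480,000 fragment invocations per frame.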
Frames are scheduled to be rendered at regular intervals. Each frame is aligned to multiples of the interval since the system clock's epoch such that two concurrent executions with the same arguments schedule frames to be rendered at the same time.
```
Usage: vkpreempt graphics [--help] [--version] [--width VAR] [--height VAR] [--cells VAR] [--loops VAR] [--interval VAR] [--offset VAR] [--global-priority] [--cpu VAR]

Optional arguments:
  -h, --help             shows help message and exits
  -v, --version          prints version information and exits
  -w, --width            the width of the window [nargs=0..1] [default: 800]
  -H, --height           the height of the window [nargs=0..1] [default: 600]
  -c, --cells            the number of grid cells per row and column (cells x cells) - increase to scale up the number of vertex shader invocations [nargs=0..1] [default: 16]
  -l, --loops            the number of loop iterations to execute in the fragment shader - increase to scale up the fragment workload [nargs=0..1] [default: 1]
  -i, --interval         the interval in milliseconds to schedule and align each frame with [nargs=0..1] [default: 16]
  -o, --offset           the offset in nanoseconds from the scheduling interval [nargs=0..1] [default: 0]
  -g, --global-priority  if this flag is set, the sample's GPU queue is created with the maximum system-wide global priority
  --cpu                  the index of the CPU core to pin this sample to (ignored if not supported)
```
Runs a compute task operating on n elements.
The compute work can be scaled on two orthogonal dimensions:
- Compute shader load: Each compute shader invocation computes m octaves of noise. To increase the workload, increase the number of octaves.
- Compute shader invocations: The number of compute shader invocations is controlled by the element count and the workgroup size.
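The relationship between element count, workgroup size, and dispatched workgroups can be sketched as a rounded-up division (a sketch; how `vkpreempt` handles a non-divisible remainder is an assumption here):

```cpp
#include <cstdint>

// Number of workgroups dispatched for a 1D compute task: the element count
// divided by the workgroup size, rounded up so every element is covered.
uint32_t workgroup_count(uint32_t num_elements, uint32_t workgroup_size) {
    return (num_elements + workgroup_size - 1) / workgroup_size;
}
```

With the defaults (480,000 elements, workgroup size 256), this yields exactly 1,875 workgroups.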
Compute tasks are scheduled to be executed at regular intervals. Each compute job is aligned to multiples of the interval since the system clock's epoch such that two concurrent executions with the same arguments schedule their jobs to be run at the same time.
```
Usage: vkpreempt compute [--help] [--version] [--num-elements VAR] [--workgroup-size VAR] [--loops VAR] [--interval VAR] [--offset VAR] [--global-priority] [--cpu VAR]

Optional arguments:
  -h, --help             shows help message and exits
  -v, --version          prints version information and exits
  -n, --num-elements     the number of elements to operate on - increase to scale up the number of compute shader invocations [nargs=0..1] [default: 480000]
  -w, --workgroup-size   the size of each workgroup [nargs=0..1] [default: 256]
  -l, --loops            the number of loop iterations to execute in the compute shader - increase to scale up the compute workload [nargs=0..1] [default: 1]
  -i, --interval         the interval in milliseconds to schedule and align each compute job with [nargs=0..1] [default: 16]
  -o, --offset           the offset in nanoseconds from the scheduling interval [nargs=0..1] [default: 0]
  -g, --global-priority  if this flag is set, the sample's GPU queue is created with the maximum system-wide global priority
  --cpu                  the index of the CPU core to pin this sample to (ignored if not supported)
```
For a GPU workload to preempt another, it needs a higher priority. The Vulkan way of achieving this is to submit it to a queue whose system-wide global priority is higher than that of the queue the other workload was submitted to. The device extension for querying the system-scoped queue priorities available on a physical device, and for creating a queue with a certain system-wide priority, is `VK_KHR_global_priority` (note: this extension has been promoted to core in Vulkan 1.4). The sample application simply chooses the highest global priority available if the corresponding command line option is given.
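A minimal sketch of how such a queue is requested (assuming `VK_KHR_global_priority` is enabled on the device and the chosen queue family supports the requested priority — query that beforehand by chaining `VkQueueFamilyGlobalPriorityPropertiesKHR` into `vkGetPhysicalDeviceQueueFamilyProperties2`):

```cpp
#include <vulkan/vulkan.h>

// Request a system-wide global priority for a queue by chaining
// VkDeviceQueueGlobalPriorityCreateInfoKHR into VkDeviceQueueCreateInfo.
VkDeviceQueueGlobalPriorityCreateInfoKHR globalPriorityInfo{};
globalPriorityInfo.sType =
    VK_STRUCTURE_TYPE_DEVICE_QUEUE_GLOBAL_PRIORITY_CREATE_INFO_KHR;
globalPriorityInfo.globalPriority = VK_QUEUE_GLOBAL_PRIORITY_REALTIME_KHR;

float queuePriority = 1.0f;  // per-device priority, separate from the global one
VkDeviceQueueCreateInfo queueInfo{};
queueInfo.sType = VK_STRUCTURE_TYPE_DEVICE_QUEUE_CREATE_INFO;
queueInfo.pNext = &globalPriorityInfo;  // chain the global priority request
queueInfo.queueFamilyIndex = 0;         // illustrative; pick a suitable family
queueInfo.queueCount = 1;
queueInfo.pQueuePriorities = &queuePriority;
```

Note that device creation can fail with `VK_ERROR_NOT_PERMITTED_KHR` if the process lacks the privileges for the requested global priority.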
Some ARM Mali GPUs support core isolation on a software level. The idea is to partition GPU cores or groups of cores to submit work to dedicated partitions. A possible use case of this feature is to reserve GPU resources for high-priority tasks while low-priority tasks can run on the remaining cores to avoid preemption delays.
Mesa's Panfrost driver exposes a way to configure a process with a particular core mask via driconf. With that, the environment variables `pan_fragment_core_mask` and `pan_compute_core_mask` can be used to enable or disable GPU cores.
While applications can not explicitly submit work to cores or groups of cores on an API level, this makes it possible to specify core partitions available to an application during its whole runtime.
We're using Perfetto traces to investigate the effects of core isolation with PanVK. To get started with tracing on PanVK, make yourself familiar with Mesa's guide.
To start tracing with GPU counters enabled:

- Start `pps-producer` (for GPU counters)
- Start the executable with `MESA_GPU_TRACES=perfetto` and a core mask using `pan_fragment_core_mask=<mask>` and `pan_compute_core_mask=<mask>`
- Start tracing
In our experiments, we use a Radxa ROCK 5 Model B with a 4-core Mali G610. The allowed values for PanVK's core mask in this configuration are:
- `0x00001` (Core 0)
- `0x00004` (Core 2)
- `0x10000` (Core 16)
- `0x40000` (Core 18)
- any combination of the above (e.g., `0x40005` for cores 0, 2, and 18)
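A hypothetical validity check for such a mask (the allowed bits are taken from the list above; the function is an illustration, not part of PanVK):

```cpp
#include <cstdint>

// On the ROCK 5B's Mali G610, shader cores sit at bits 0, 2, 16 and 18.
// A mask is usable iff it is non-zero and only sets allowed core bits.
bool is_valid_core_mask(uint32_t mask) {
    constexpr uint32_t kAllowedCores = 0x00001 | 0x00004 | 0x10000 | 0x40000;
    return mask != 0 && (mask & ~kAllowedCores) == 0;
}
```

For example, `0x40005` passes the check, while `0x00002` (bit 1, no core present) does not.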