Skip to content

Commit 8543566

Browse files
[SYCL][L0] Change the default to SYCL_PI_LEVEL_ZERO_USM_RESIDENT=2 (#9109)
Signed-off-by: Sergey V Maslov <[email protected]>
1 parent 52ea580 commit 8543566

File tree

3 files changed

+19
-19
lines changed

3 files changed

+19
-19
lines changed

sycl/doc/EnvironmentVariables.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -259,7 +259,7 @@ variables in production code.</span>
259259
| `SYCL_PI_LEVEL_ZERO_EXPOSE_CSLICE_IN_AFFINITY_PARTITIONING` (Deprecated) | Integer | When set to non-zero value exposes compute slices as sub-sub-devices in `sycl::info::partition_property::partition_by_affinity_domain` partitioning scheme. Default is zero meaning that they are only exposed when partitioning by `sycl::info::partition_property::ext_intel_partition_by_cslice`. This option is introduced for compatibility reasons and is immediately deprecated. New code must not rely on this behavior. Also note that even if sub-sub-device was created using `partition_by_affinity_domain` it would still be reported as created via partitioning by compute slices. |
260260
| `SYCL_PI_LEVEL_ZERO_COMMANDLISTS_CLEANUP_THRESHOLD` | Integer | If non-negative then the threshold is set to this value. If negative, the threshold is set to INT_MAX. Whenever the number of command lists in a queue exceeds this threshold, an attempt is made to cleanup completed command lists for their subsequent reuse. The default is 20. |
261261
| `SYCL_PI_LEVEL_ZERO_IMMEDIATE_COMMANDLISTS_EVENT_CLEANUP_THRESHOLD` | Integer | If non-negative then the threshold is set to this value. If negative, the threshold is set to INT_MAX. Whenever the number of events associated with an immediate command list exceeds this threshold, a check is made for signaled events and these events are recycled. Setting this threshold low causes events to be checked more often, which could result in unneeded events being recycled sooner. However, more frequent event status checks may cost time. The default is 1000. |
262-
| `SYCL_PI_LEVEL_ZERO_USM_RESIDENT` | Integer | Controls if/where to make USM allocations resident at the time of allocation. If set to 0 (default) then no special residency is forced. If set to 1 then allocation (device or shared) is made resident at the device of allocation. If set to 2 then allocation (device or shared) is made resident on all devices in the context of allocation that have P2P access to the device of allocation. For host allocation, any non-0 setting forces the allocation resident on all devices in the context. |
262+
| `SYCL_PI_LEVEL_ZERO_USM_RESIDENT` | Integer | Controls if/where to make USM allocations resident at the time of allocation. If set to 0 then no special residency is forced. If set to 1 then allocation (device or shared) is made resident at the device of allocation. If set to 2 then allocation (device or shared) is made resident on all devices in the context of allocation that have P2P access to the device of allocation. For host allocation, any non-0 setting forces the allocation resident on all devices in the context. Default is 2. |
263263
| `SYCL_PI_LEVEL_ZERO_USE_NATIVE_USM_MEMCPY2D` | Integer | When set to a positive value enables the use of Level Zero USM 2D memory copy operations. Default is 0. |
264264

265265
## Debugging variables for CUDA Plugin

sycl/plugins/level_zero/pi_level_zero.cpp

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -7177,12 +7177,12 @@ pi_result piextGetDeviceFunctionPointer(pi_device Device, pi_program Program,
71777177
}
71787178

71797179
enum class USMAllocationForceResidencyType {
7180-
// [Default] Do not force memory residency at allocation time.
7180+
// Do not force memory residency at allocation time.
71817181
None = 0,
71827182
// Force memory resident on the device of allocation at allocation time.
71837183
// For host allocation force residency on all devices in a context.
71847184
Device = 1,
7185-
// Force memory resident on all devices in the context with P2P
7185+
// [Default] Force memory resident on all devices in the context with P2P
71867186
// access to the device of allocation.
71877187
// For host allocation force residency on all devices in a context.
71887188
P2PDevices = 2
@@ -7192,7 +7192,7 @@ enum class USMAllocationForceResidencyType {
71927192
static USMAllocationForceResidencyType USMAllocationForceResidency = [] {
71937193
const auto Str = std::getenv("SYCL_PI_LEVEL_ZERO_USM_RESIDENT");
71947194
if (!Str)
7195-
return USMAllocationForceResidencyType::None;
7195+
return USMAllocationForceResidencyType::P2PDevices;
71967196
switch (std::atoi(Str)) {
71977197
case 1:
71987198
return USMAllocationForceResidencyType::Device;

sycl/test-e2e/USM/usm_pooling.cpp

100755100644
Lines changed: 15 additions & 15 deletions
Original file line numberDiff line numberDiff line change
@@ -4,29 +4,29 @@
44
// Allocate 2 items of 2MB. Free 2. Allocate 3 more of 2MB.
55

66
// With no pooling: 1,2,3,4,5 allocs lead to ZE call.
7-
// RUN: env ZE_DEBUG=1 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_DISABLE_USM_ALLOCATOR=1 %t.out h 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-NOPOOL
8-
// RUN: env ZE_DEBUG=1 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_DISABLE_USM_ALLOCATOR=1 %t.out d 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-NOPOOL
9-
// RUN: env ZE_DEBUG=1 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_DISABLE_USM_ALLOCATOR=1 %t.out s 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-NOPOOL
7+
// RUN: env ZE_DEBUG=1 SYCL_PI_LEVEL_ZERO_USM_RESIDENT=0 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_DISABLE_USM_ALLOCATOR=1 %t.out h 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-NOPOOL
8+
// RUN: env ZE_DEBUG=1 SYCL_PI_LEVEL_ZERO_USM_RESIDENT=0 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_DISABLE_USM_ALLOCATOR=1 %t.out d 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-NOPOOL
9+
// RUN: env ZE_DEBUG=1 SYCL_PI_LEVEL_ZERO_USM_RESIDENT=0 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_DISABLE_USM_ALLOCATOR=1 %t.out s 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-NOPOOL
1010

1111
// With pooling enabled and MaxPooolable=1MB: 1,2,3,4,5 allocs lead to ZE call.
12-
// RUN: env ZE_DEBUG=1 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";;1M,4,64K" %t.out h 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-12345
13-
// RUN: env ZE_DEBUG=1 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";;1M,4,64K" %t.out d 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-12345
14-
// RUN: env ZE_DEBUG=1 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";;1M,4,64K" %t.out s 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-12345
12+
// RUN: env ZE_DEBUG=1 SYCL_PI_LEVEL_ZERO_USM_RESIDENT=0 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";;1M,4,64K" %t.out h 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-12345
13+
// RUN: env ZE_DEBUG=1 SYCL_PI_LEVEL_ZERO_USM_RESIDENT=0 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";;1M,4,64K" %t.out d 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-12345
14+
// RUN: env ZE_DEBUG=1 SYCL_PI_LEVEL_ZERO_USM_RESIDENT=0 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";;1M,4,64K" %t.out s 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-12345
1515

1616
// With pooling enabled and capacity=1: 1,2,4,5 allocs lead to ZE call.
17-
// RUN: env ZE_DEBUG=1 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";;2M,1,64K" %t.out h 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-1245
18-
// RUN: env ZE_DEBUG=1 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";;2M,1,64K" %t.out d 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-1245
19-
// RUN: env ZE_DEBUG=1 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";;2M,1,64K" %t.out s 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-1245
17+
// RUN: env ZE_DEBUG=1 SYCL_PI_LEVEL_ZERO_USM_RESIDENT=0 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";;2M,1,64K" %t.out h 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-1245
18+
// RUN: env ZE_DEBUG=1 SYCL_PI_LEVEL_ZERO_USM_RESIDENT=0 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";;2M,1,64K" %t.out d 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-1245
19+
// RUN: env ZE_DEBUG=1 SYCL_PI_LEVEL_ZERO_USM_RESIDENT=0 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";;2M,1,64K" %t.out s 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-1245
2020

2121
// With pooling enabled and MaxPoolSize=2MB: 1,2,4,5 allocs lead to ZE call.
22-
// RUN: env ZE_DEBUG=1 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";2M;2M,4,64K" %t.out h 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-1245
23-
// RUN: env ZE_DEBUG=1 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";2M;2M,4,64K" %t.out d 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-1245
24-
// RUN: env ZE_DEBUG=1 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";2M;2M,4,64K" %t.out s 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-1245
22+
// RUN: env ZE_DEBUG=1 SYCL_PI_LEVEL_ZERO_USM_RESIDENT=0 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";2M;2M,4,64K" %t.out h 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-1245
23+
// RUN: env ZE_DEBUG=1 SYCL_PI_LEVEL_ZERO_USM_RESIDENT=0 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";2M;2M,4,64K" %t.out d 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-1245
24+
// RUN: env ZE_DEBUG=1 SYCL_PI_LEVEL_ZERO_USM_RESIDENT=0 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";2M;2M,4,64K" %t.out s 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-1245
2525

2626
// With pooling enabled and SlabMinSize of 4 MB: 1,5 allocs lead to ZE call.
27-
// RUN: env ZE_DEBUG=1 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";;2M,4,4M" %t.out h 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-15
28-
// RUN: env ZE_DEBUG=1 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";;2M,4,4M" %t.out d 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-15
29-
// RUN: env ZE_DEBUG=1 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";;2M,4,4M" %t.out s 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-15
27+
// RUN: env ZE_DEBUG=1 SYCL_PI_LEVEL_ZERO_USM_RESIDENT=0 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";;2M,4,4M" %t.out h 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-15
28+
// RUN: env ZE_DEBUG=1 SYCL_PI_LEVEL_ZERO_USM_RESIDENT=0 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";;2M,4,4M" %t.out d 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-15
29+
// RUN: env ZE_DEBUG=1 SYCL_PI_LEVEL_ZERO_USM_RESIDENT=0 %GPU_RUN_PLACEHOLDER SYCL_PI_LEVEL_ZERO_USM_ALLOCATOR=";;2M,4,4M" %t.out s 2> %t1.out; cat %t1.out %GPU_CHECK_PLACEHOLDER --check-prefix CHECK-15
3030
#include "CL/sycl.hpp"
3131
#include <iostream>
3232
using namespace sycl;

0 commit comments

Comments
 (0)