From af713ab7f7d1c191f757b01c12900b08cb9d5a27 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 30 Apr 2021 16:44:18 -0400 Subject: [PATCH 01/22] [SYCL][Doc] Design doc: optional kernel features First public review of the design document to conform to the SYCL 2020 "optional kernel features" behavior. This mostly covers SYCL 2020 section 5.7 "Optional kernel features", but it also covers the design for the `[[sycl::requires()]]` attribute. Since a SPIR-V extension is proposed as part of this design, this commit also includes a proposed extension to SPIR-V for enabling code conditionally based on specialization constants. The SPIR-V extension is more general than required for the "optional kernel features" design because it also includes functionality that will be required for the SYCL_EXT_ONEAPI_DEVICE_IF extension to DPC++. --- sycl/doc/OptionalDeviceFeatures.md | 502 +++++++++++++ .../SPIRV/SPV_INTEL_spec_conditional.asciidoc | 682 ++++++++++++++++++ 2 files changed, 1184 insertions(+) create mode 100644 sycl/doc/OptionalDeviceFeatures.md create mode 100644 sycl/doc/extensions/SPIRV/SPV_INTEL_spec_conditional.asciidoc diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md new file mode 100644 index 0000000000000..91a043b8c6b28 --- /dev/null +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -0,0 +1,502 @@ +# Behavior for optional kernel features + +This design document describes the changes that are needed in DPC++ in order to +conform to the SYCL 2020 specification regarding the behavior of applications +that use optional kernel features. An optional kernel feature is any feature +that is supported by some devices and not by others. For example, not all +devices support 16-bit floating point operations, so the `sycl::half` data type +is an optional kernel feature. Some DPC++ extensions like AMX are also +optional kernel features. + +The requirements for this design come mostly from the SYCL 2020 specification +[section 5.7 "Optional kernel features"][1] but they also encompass the C++ +attribute `[[sycl::requires()]]` that is described in [section 5.8.1 "Kernel +attributes"][2] and [section 5.8.2 "Device function attributes"][3]. + +[1]: +[2]: +[3]: + + +## Requirements + +There are several categories of requirements covered by this design: + +* The front-end compiler must issue a diagnostic in some cases when a kernel + uses an optional feature. However, the front-end compiler must **not** + generate a diagnostic in other cases. + +* The runtime must raise an exception when a kernel using optional features + is submitted to a device that does not support those features. This + exception must be raised synchronously from the kernel invocation command + (e.g. `parallel_for()`). + +* The runtime must not raise an exception (or otherwise fail) merely due to + speculative compilation of a kernel for a device, when the application does + not specifically submit the kernel to that device. + +### Diagnostics from the front-end compiler + +By "front-end compiler", we mean the DPC++ compiler which parses DPC++ source +code, not the JIT compiler that translates SPIR-V into native code. + +In general, the front-end compiler does not know which kernels the application +will submit to which devices. Therefore, the front-end compiler does not +generally know which optional features a kernel can legally use. Thus, in +general, the front-end compiler must not issue any diagnostic simply because a +kernel uses an optional feature. + +The only exception to this rule occurs when the application uses the C++ +attribute `[[sycl::requires()]]`. When the application decorates a kernel or +device function with this attribute, it is an assertion that the kernel or +device function is expected to use *only* the features listed by the attribute. +Therefore, the front-end compiler must issue a diagnostic if the kernel or +device function uses any other optional kernel features. + +Note that this behavior does not change when the compiler runs in AOT mode. +Even if the user specifies a target device via "-fsycl-targets", that does not +necessarily mean that the user expects all the code in the application to be +runnable on that device. Consider an application that uses some middleware +library, where the library's header contains kernels optimized for different +devices. An application should be able to compile in AOT mode with this +library without getting errors. Therefore the AOT compiler must not fail +simply because the middleware header contains device code for devices that +are not being compiled for. + +### Runtime exception if device doesn't support feature + +When the application submits a kernel to a device via one of the the kernel +invocation commands (e.g. `parallel_for()`), the runtime must check to see +if the kernel uses optional features that are not supported on that device. +If the kernel uses a feature that is not supported, the runtime must throw +a synchronous `errc::kernel_not_supported` exception. + +This exception must be thrown in the following circumstances: + +* A device function in the kernel's static call tree uses a feature that the + device does not support. However, this only applies to features that are + exposed via a C++ type or function. Examples of this include `sycl::half` or + instantiating `sycl::atomic_ref` for a 64-bit type. In cases where the + feature is more "notional", such as requiring a particular type of forward + progress guarantee, no exception is required. + +* The kernel or a device function in the kernel's static call tree is decorated + with `[[sycl::requires()]]`, and the device does not have the required + aspects. An exception must be thrown in this case even if the kernel does + not actually use a feature corresponding to the aspect. + +* The kernel is decorated with the `[[sycl::reqd_work_group_size(W)]]` or + `[[sycl::reqd_sub_group_size(S)]]` attribute, and the device does not support + the work group size `W` or the sub-group size `S`. + +Note that the exception must be thrown synchronously, not delayed and thrown on +the queue's asynchronous handler. + +### No runtime exception for speculative compilation + +It is currently common for the runtime to speculatively compile some kernels. +For example, DPC++ may bundle all kernels from the same translation unit +together into a single device image. When the application submits one kernel K +to a device D, the runtime actually compiles all kernels in K's device image +for device D. Let's assume in this example that the kernel K uses only +features that are supported by D. It would be illegal for the runtime to throw +an exception in such a case just because some other kernel in the same device +image uses a feature that is not supported by device D. + + +## Design to implement required diagnostics + +### Changes to DPC++ headers + +With the exception of the `[[sycl::reqd_work_group_size()]]` and +`[[sycl::reqd_sub_group_size()]]` attributes, all kernel optional features +are associated with some device aspect. For example, the `sycl::half` type +is an optional feature which is only supported on devices that have the +`aspect::fp16` aspect. We can therefore use device aspects as a way to +describe the set of optional features that a kernel uses (with the exception +of the required work-group or sub-group size). + +As will see later, it will be very helpful to decorate all APIs in DPC++ +headers that correspond to optional kernel features with the +`[[sycl::requires()]]` attribute. For example, the declaration of the +`sycl::half` type would look like this: + +``` +using half [[sycl::requires(has(aspect::fp16))]] = cl::sycl::detail::half_impl::half; +``` + +In cases where the optional feature corresponds to use of a class (e.g. +`sycl::atomic_ref`), the declaration can look like this: + +``` +template +class [[sycl::requires(has(aspect::fp64))]] atomic_ref { + /* ... */ +}; +``` + +(We can use partial specialization tricks to decorate `atomic_ref` with the +attribute only when the underlying type is 64-bits.) + +In cases where the optional feature corresponds to a function, we can decorate +the function's declaration with the attribute like so (demonstrating a +hypothetical AMX multiplication extension): + +``` +[[sycl::requires(has(aspect::ext_intel_amx))]] +void amx_multiply(); +``` + +These attributes provide an explicit mapping between each optional kernel +feature and its associated aspect. + +Unfortunately, the fundamental type `double` is also an optional kernel +feature. Since there is no type alias for `double`, there is no convenient +place to add an attribute. Instead, the front-end device compiler must behave +as though there was an implicit `[[sycl::requires(has(aspect::fp64))]]` +attribute for any device code that uses the `double` type. + +Note that the usage of `[[sycl::requires()]]` is slightly expanded here beyond +the specified usage in the SYCL 2020 specification because we allow the +attribute also on type alias declarations and class declarations. If a device +function does any of the following with a type alias or class that was so +decorated, the function is assumed to "use the aspects" listed in the +attribute: + +* Declares a variable of that type. +* Has a formal parameter declared with that type. +* Returns that type. + +This also includes any qualified version of the type. + +**TODO**: This language is not very precise. The intent is to include most +uses of the type, except for cases like `sizeof(T)` or `decltype(T)`. Help +appreciated on tightening the wording here. + +**TODO**: Would it be better to use a different attribute name when decorating +types, rather than expanding the meaning of `[[sycl::requires()]]`? If we did +this, the new attribute would become an internal DPC++ implementation detail; +we would not add it to the SYCL specification. + +### Implementing diagnostics in the DPC++ front-end + +As noted above, the front-end device compiler must behave as though there is an +implicit `[[sycl::requires(has(aspect::fp64))]]` attribute on any use of the +`double` type in device code. + +Aside from this, the front-end compiler can implement the required diagnostics +purely from the C++ attributes that have been added to the DPC++ headers. +There is no need for the front-end compiler to know which device features are +optional. + +When the front-end compiler sees a kernel or device function that is decorated +with `[[sycl::requires()]]`, it forms the set of allowed aspects for that +kernel or device function using aspects listed in the attribute. Let's call +this the `Allowed` set. The front-end then computes the static call tree of +that kernel or device function and forms the union of all aspects in any +`[[sycl::requires()]]` attributes that decorate any of these functions or any +of the types used inside these functions. Let's call this the `Used` set. If +the `Used` set contains any aspects not in the `Allowed` set, the front-end +issues a diagnostic. + +In order to be user-friendly, the diagnostic should point the user to the +location of the problem. Therefore, the diagnostic message should include the +following information: + +* The source position of the `[[sycl::requires()]]` attribute that decorates + the kernel or device function which provides the `Allowed` aspect set. This + tells the user which aspects the kernel or device function intends to use. + +* The source position of the call to a function that is decorated with + `[[sycl::requires()]]` or the source position of the use of a type that is + decorated with `[[sycl::requires()]]`. This tells the user where in the + kernel a particular aspect is actually used. + +Note that this analysis can be done in the front-end compiler even when a +kernel makes a call to a function that is in another translation unit. +Language rules require the application to declare such a function with +`SYCL_EXTERNAL` in the calling TU, and the `SYCL_EXTERNAL` declaration must be +decorated with the `[[sycl::requires()]]` attribute. Therefore, the front-end +can diagnose errors with aspect usage even without seeing the definition of the +`SYCL_EXTERNAL` function. + + +## Design to raise required exceptions (and avoid forbidden errors) + +As described above the runtime must raise an `errc::kernel_not_supported` +exception when a kernel is submitted to a device that does not support the +optional features that the kernel uses. Likewise, the runtime must **not** +raise an exception (or otherwise produce an error) due to speculative +compilation of a kernel for a device, unless the application actually submits +the kernel to that device. The solution is largely the same for both JIT and +AOT cases. + +### JIT case + +The JIT case requires some change to the way kernels are bundled together into +device images. Currently, kernels are bundled together regardless of the +features they use, and this can lead to JIT errors due to speculative +compilation. Consider a device image that contains two kernels: `K1` uses no +optional features and `K2` uses an optional feature that corresponds to aspect +`A`. Now consider that the application submits kernel `K1` to a device that +does not support aspect `A`. Since the two kernels are bundled together into +one device image, the runtime really compiles both kernels for the device. +Currently, this will raise a JIT exception because the compilation of kernel +`K2` will fail when compiled for a device that does not support aspect `A`. + +There are two ways to solve this problem. One is to change the way kernels are +bundled into device images such that we never bundled two kernels together +unless they required exactly the same set of device aspects. Doing this would +avoid the error described above. However, we have elected for a different +solution. + +Instead, we will allow kernels to be bundled together as they currently are, +but we will introduce extra decorations into the generated SPIR-V that allow +the JIT compiler to discard kernels which require aspects that the device does +not support. Although this solution requires an extension to SPIR-V, we think +it is the better direction because it is aligned with the [device-if][4] +feature, which will also requires this same SPIR-V extension. + +[4]: + +The idea is to emit a SPIR-V specialization constant for each aspect that is +required by a kernel in the device image. We then introduce a new SPIR-V +"decoration" that tells the JIT compiler to discard a function if a +specialization constant is `False`. The DPC++ runtime will set the values of +the specialization constants according to the target device, thus the JIT +compiler discards (and does not compile) any kernels that use features which +are not supported on that device. This avoids errors due to speculative +compilation of kernels. + +#### Representation in SPIR-V + +To illustrate how kernels using optional features are represented in SPIR-V, +consider a kernel `K` that requires aspects `A1` and `A2`. The SPIR-V module +will contain three boolean specialization constants: one representing `A1`, one +representing `A2`, and one representing the expression `A1 && A2`. All of +these can be represented without any extension to SPIR-V. + +``` +OpDecorate %11 SpecId 1 ; External ID for spec const A1 +OpDecorate %12 SpecId 2 ; External ID for spec const A2 + +%10 = OpTypeBool +%11 = OpSpecConstantTrue %10 ; Represents A1 +%12 = OpSpecConstantTrue %10 ; Represents A2 +%13 = OpSpecConstantOp %10 LogicalAnd %11 %12 ; Represents A1 && A2 +``` + +In order to make it easy for the JIT compiler to discard all functions in a +kernel, each function in the kernel's static call tree (including the function +representing the kernel's entry point) is decorated with a new extended SPIR-V +decoration `ConditionalINTEL` whose operand is the `` of the specialization +constant representing `A1 && A2`. The semantic of this decoration is that the +JIT compiler must discard the function unless the value of the specialization +constant is `True`. Augmenting the example from above: + +``` +OpDecorate %11 SpecId 1 ; External ID for spec const A1 +OpDecorate %12 SpecId 2 ; External ID for spec const A2 +OpDecorate %16 ConditionalINTEL %13 ; Says to discard the function + ; below when (A1 && A2) is False +%10 = OpTypeBool +%11 = OpSpecConstantTrue %10 ; Represents A1 +%12 = OpSpecConstantTrue %10 ; Represents A2 +%13 = OpSpecConstantOp %10 LogicalAnd %11 %12 ; Represents A1 && A2 +%14 = OpTypeVoid +%15 = OpTypeFunction %14 + +%16 = OpFunction %14 None %15 ; Definition of function that is +... ; discarded when (A1 && A2) is False +OpFunctionEnd +``` + +See the extension specification of [SpecConditional][5] for a full +description of this new SPIR-V decoration. + +[5]: + +#### Representation in LLVM IR + +**TODO**: I need some help here on how to represent the `[[sycl::requires()]]` +attributes in LLVM IR. I suspect there is already some mechanism for +representing SYCL attributes in LLVM IR, so hopefully we can mostly reuse that +mechanism. + +#### Modifications to the post-link tool + +The post-link tool must be modified to add the SPIR-V `ConditionalINTEL` +decorations to the appropriate functions and to emit the specialization +constants that these decorations reference. This can be done with two passes +over each kernel's static call tree. + +The first pass operates only on kernel functions that are not decorated with +the `[[sycl::requires()]]` attribute. When the kernel is decorated with this +attribute, the attribute tells the full set of aspects that the kernel uses +(and the front-end compiler has already validated this). For kernels without +the attribute, the pass propagates the required aspects from +`[[sycl::requires()]]` attributes in a kernel's call tree up to the kernel +function, forming a union of all required aspects for the kernel. + +Once we have the full set of aspects used by each kernel, we do the following +for each kernel: + +* For each of the kernel's required aspects, emit an `OpSpecConstantTrue` op to + represent this requirement. We maintain a set of "required specialization + constants" for each kernel, which is used later. Add this specialization + constant to that set. In addition, add an "aspect" entry to the device + image's "SYCL/kernel reqs" property set, as described below. (We could + instead emit `OpSpecConstantFalse`. It doesn't matter because the runtime + will always provide a value for these specialization constants.) + +* If the kernel function is decorated with the `[[reqd_work_group_size()]]` + attribute, emit an `OpSpecConstantTrue` op to represent this requirement and + add this also to the kernel's set of required specialization constants. In + addition, add a "reqd\_work\_group\_size" entry to the device image's + "SYCL/kernel reqs" property set. + +* If the kernel function is decorated with the `[[reqd_sub_group_size()]]` + attribute, emit an `OpSpecConstantTrue` op to represent this requirement and + add this also to the kernel's set of required specialization constants. In + addition, add a "reqd\_sub\_group\_size" entry to the device image's + "SYCL/kernel reqs" property set. + +* If the kernel's set of required specialization constants is not empty, emit a + series of `OpSpecConstantOp` ops with the `OpLogicalAnd` opcode to compute + the expression `S1 && S2 && ...`, where `S1`, `S2`, etc. are the + specialization constants in that set. In addition, emit a + `ConditionalINTEL` decoration for the kernel's entry function which + references the `S1 && S2 && ...` specialization constant. + +The second pass propagates each kernel's required specialization constants back +down the static call tree. This pass starts such that each kernel entry +function has the set of required specialization constants as computed above. +The set of required specialization constants for each remaining function `F` is +computed as `P1 || P2 || ...`, where `P1`, `P2`, etc. are the parent functions +of `F` in the static call tree. (Obviously, a `Pn` term can be omitted if the +parent function has no required specialization constants.) Once we have this +information, we do the following for each function `F` that has a non-empty set +of required specialization constants: + +* Emit a series of `OpSpecConstantOp` ops with the `OpLogicalAnd` and + `OpLogicalOr` opcodes to compute the expression `P1 || P2 || ...` described + above. + +* Emit a `ConditionalINTEL` decoration for the function, referencing this + computed specialization constant. + +In all cases above, we should keep track of the specialization constants that +are emitted and reuse them when possible, rather than emitting duplicates. + +#### New device image property set + +A new device image property set is needed to inform the DPC++ runtime of the +aspects that each kernel requires and the work-group or sub-group sizes it may +require. This property set is named "SYCL/kernel reqs". The name of each +property in the set is the name of a kernel in the device image. The value +of each property has the following form: + +``` +[entry_count (uint32)] +[entry_type (uint32)] +[entry_type (uint32)] +... +[entry_type (uint32)] +``` + +Where `entry_count` tells the number of subsequent entries. Each entry has a +variable number of parameters according to its type. The allowable types are: + +``` +enum { + aspect, + reqd_work_group_size, + reqd_sub_group_size +}; +``` + +The format of each entry type is as follows: + +``` +[aspect (uint32)] [aspect_id (uint32)] [spec_id (uint32)] +[reqd_work_group_size (uint32)] [dim_count (uint32)] [dim0 (uint32)] ... [spec_id (uint32)] +[reqd_sub_group_size (uint32)] [dim (uint32)] [spec_id (uint32)] +``` + +Where the parameter names have the following meaning: + +Parameter | Definition +--------- | ---------- +`aspect_id` | The value of the aspect from the `enum class aspect` enumeration. +`dim_count` | The number of work group dimensions (1, 2, or 3). +`dim0` ... | The value of a dimension from the `[[reqd_work_group_size]]` attribute. +`dim` | The value of the sub-group size from the `[[reqd_sub_group_size]]` attribute. +`spec_id` | The SPIR-V `SpecId` decoration for the specialization constant that the post-link tool generated for this requirement. + +Note that the post-link tool will generate a series of `OpSpecConstantOp` ops +when the kernel has multiple requirements. However, each property list entry +contains only the `SpecId` of the `OpSpecConstantTrue` op that is associated +with a single requirement. + +#### Modifications to the DPC++ runtime + +Modifications are also required to the DPC++ runtime in order to set the values +of the specialization constants that correspond to each kernel requirement. In +addition, the runtime needs to check if the target device supports each of the +kernel's requirements, and it must raise an `errc::kernel_not_supported` +exception if it does not. + +When a kernel is submitted to a device, the runtime finds the device image that +contains the kernel and also finds the kernel's entry in the "SYCL/kernel reqs" +property set. This entry tells the set of requirements for the kernel. If the +target device does not support all of these requirements, then the runtime +raises `errc::kernel_not_supported`. This check can be done before the device +image is JIT compiled, so the exception can be thrown synchronously. + +Assuming this check passes, the first attempt to submit a kernel from a device +image will cause it to be JIT compiled. The runtime must be modified to do the +following: + +* Compute the union of all requirements from all kernels in the + "SYCL/kernel reqs" property set and their associated specialization + constants. + +* Query the target device to see whether it supports each of these + requirements, yielding either `True` or `False` for each one. + +* Set the value of each associated specialization constant when JIT compiling + the device image for this target device. + +Note that the runtime's cache of compiled device images does not need any +special modification because the cache already needs to know the values of all +the specialization constants that were used to compile the device image. We +just need to make sure the cache is also aware of the specialization constants +which correspond to the kernels' requirements. + +#### Modifications to the GEN compiler + +The GEN compiler, of course, needs to be modified to implement the new +`ConditionalINTEL` SPIR-V decoration. It must discard any function with this +decoration (unless the corresponding specialization constant is `True`), and it +must not raise any sort of error due to compilation of these discarded +functions. + +### AOT case + +The AOT case uses exactly the same solution as the JIT case described above, +but there is one extra steps. For the AOT case, the post-link tool must set +the values of the specialization constants that correspond to the kernel +requirements, using the device named in the "-fsycl-targets" command line +option. After doing this, the post-link tool calls the AOT compiler to +generate native code from SPIR-V as it normally does. If more than one target +device is specified, the post-link tool sets the specialization constants +separately for each device before generating native code for that device. + +Note that the native device image may not contain all kernels if there are +kernels that use optional features. Nevertheless, the "SYCL/kernel reqs" +property set still has entries for all kernel functions. If the application +attempts to invoke one of the discarded kernels on a device (which does not +support the kernel's features), the runtime will see that the kernel is not +supported by using information from the "SYCL/kernel reqs" property set, and +the runtime will raise an exception. Thus, the runtime will never attempt to +invoke one of these discarded kernels. diff --git a/sycl/doc/extensions/SPIRV/SPV_INTEL_spec_conditional.asciidoc b/sycl/doc/extensions/SPIRV/SPV_INTEL_spec_conditional.asciidoc new file mode 100644 index 0000000000000..6029601d69d3a --- /dev/null +++ b/sycl/doc/extensions/SPIRV/SPV_INTEL_spec_conditional.asciidoc @@ -0,0 +1,682 @@ += SPV_INTEL_spec_conditional + +== Name Strings + +SPV_INTEL_spec_conditional + +== Contact + +To report problems with this extension, please open a new issue at: + +https://github.com/intel/llvm/issues + +== Contributors + +- Greg Lueck, Intel + +== Notice + +Copyright (c) 2021 Intel Corporation. All rights reserved. + +== Status + +Draft + +== Version + +[width="40%",cols="25,25"] +|======================================== +| Last Modified Date | 2021-05-18 +| Revision | 1 +|======================================== + +== Dependencies + +This extension is written against the SPIR-V Specification, +Version 1.5 Revision 5. + +This extension requires SPIR-V 1.0. + + +== Overview + +This extension introduces a way to have conditional branches in code, where the +condition is the value of a specialization constant. Since the condition will +be know at the time when SPIR-V is compiled into native code, the conditional +code is guaranteed to be removed if the condition is false, conceptually +similar to the C++ `if constexpr` statement. + + +== Extension Name + +To use this extension within a SPIR-V module, the following *OpExtension* must +be present in the module: + +---- +OpExtension "SPV_INTEL_spec_conditional" +---- + + +== Motivation + +The primary motivation for this extension is to support languages for offload +compute such as SYCL. However, we feel this extension could also support +similar use cases for shaders and other tools that use SPIR-V. + +=== Conditional code within a kernel + +Offload compute languages have a need to write kernels that have conditional +code based on the value of a specialization constant. The syntax could vary +from one language to another, but one hypothetical syntax might look like this: + +``` +void fancy() { + /* use features specific to this fancy GPU */ +} + +void fallback() { + /* use generic features */ +} + +void foo() { + specconstexpr bool isFancyGpu = /* get value of specialization constant */; + if specconstexpr (isFancyGpu) { + fancy(); + } + else { + fallback(); + } + specconstexpr int subGroupSize = /* get value of specialization constant */; + if speconstexpr (subGroupSize == 8) { + /* algorithm specific to device with sub group size of 8 */ + } +} +``` + +In this example, the offload kernel has two `if` statements that do something +conditionally based on the features that the target device provides. Since +these device features may not correspond to SPIR-V "capabilities", it's more +flexible to use specialization constants for the conditions rather than relying +on some extension to SPIR-V capabilities. The host runtime has greater +knowledge of the device features, and it can set the values of specialization +constants accordingly. + +It is important that the compiler consuming the SPIR-V is guaranteed to remove +the conditional code (in the case when the condition is false) because that +code may call intrinsic functions or make use of SPIR-V capabilities that are +not available on the target device. If the code was not removed, the SPIR-V +client compiler might fail to compile the code even if the control flow of the +kernel ensures it is never executed. + +=== Entire kernels that are conditional + +There are also cases when a SPIR-V module may contain entire kernels that use +features that are specific to certain devices. When such a module is compiled +for a device that does _not_ support these features, we need a way to exclude +these kernels from the compilation. Obviously, these kernels could not be run +on a device that does not support them, but the need to remove them prior to +compilation goes beyond the desire to optimize the compilation time. Rather, +we need to ensure that the compilation process doesn't fail while attempting to +compile a kernel for a device that does not support its features. + +The following code snippet illustrates a hypothetical, scenario: + +``` +void kernel1() { + /* uses generic features */ +} + +[[conditional(fancy)]] +void kernel2() { + /* uses features available only on "fancy" devices */ +} +``` + +In this hypothetical example, the kernel `kernel2()` is decorated with a C++ +attribute that associates the kernel with the specialization constant `fancy`. +The host runtime can now control whether this this kernel is compiled into the +module by setting the value of that specialization constant before compiling +the SPIR-V. + +One may ask why we cannot solve this problem instead by creating two modules: +one with `kernel1()` and the other with `kernel2()`. This is a fair criticism, +since this would also solve the problem without any extension to SPIR-V. +However, a SPIR-V extension that solves the first motivating example +(conditional code within a kernel) also provides almost everything we need for +this case too. It is more convenient (and less engineering effort) to use the +same solution for both cases. + + +== High level description + +As a general strategy, this extension adds new instructions that represent the +`if specconstexpr` statements in the hypothetical code snippets above as SPIR-V +control flow instructions, rather than as `#ifdef` like instructions. We feel +this strategy makes it easier to validate SPIR-V modules that use this +extension. At the same time, the new instructions have been designed such that +a tool can easily specialize (or partially specialize) a module with a simple +algorithm that replaces the extended instructions with normal SPIR-V control +flow instructions. Such a tool need not understand the control flow graph of +the module. + +=== Branching on specialization constants + +We add three new instructions to represent control flow that is conditioned on +a specialization constant: *OpBranchSpecConstantINTEL*, +*OpBranchSpecConstantWithElseINTEL*, and *OpPhiSpecConstantINTEL*. The first +two are similar to *OpBranchConditional* except that the condition is the +__ of a specialization constant. They also identify a range of control +flow blocks that must be removed when the condition is false (or that must be +removed when the condition is true for *OpBranchSpecConstantWithElseINTEL*). + +The *OpPhiSpecConstantINTEL* instruction is similar to *OpPhi*, except that it +is used when at least one of the merged values flows from an +*OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL* condition. + +Like *OpPhi*, *OpPhiSpecConstantINTEL* has a set of parameters for every parent +block. Each parent has three parameters: the __ of the parent block, the +__ of a variable that is defined when control flows from that parent block, +and an __ of a specialization constant that provides a condition that gates +the merged value. The first two __ parameters have the same meaning as a +regular *OpPhi*. The value of the condition parameter depends on whether the +parent comes from an *OpBranchSpecConstantINTEL* or +*OpBranchSpecConstantWithElseINTEL* condition: + +* If the parent is a block contained by the "then" range of + *OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL*, the + condition parameter is the same specialization constant __ as the + *OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL* + instruction. + +* If the parent is the _False Label_ in *OpBranchSpecConstantINTEL*, then the + condition parameter is the __ of a specialization constant that is the + logical negation of the specialization constant used by + *OpBranchSpecConstantINTEL*. + +* If the parent is a block contained by the "else" range of + *OpBranchSpecConstantWithElseINTEL*, then the condition parameter is the + __ of a specialization constant that is the logical negation of the + specialization constant used by *OpBranchSpecConstantWithElseINTEL*. + +* Otherwise, the condition parameter's value is zero. (The value zero is not a + legal __, so the value zero indicates that there is no associated + specialization constant for this parent.) + +This condition parameter allows tools to specialize SPIR-V more efficiently. + +A tool that specializes SPIR-V can do so by looking at each of these +instructions individually, without considering their context in the control +flow graph. Occurrences of *OpBranchSpecConstantINTEL* and +*OpBranchSpecConstantWithElseINTEL* are replaced with *OpBranch* to either the +_True Label_ or the _False Label_, according to the value of the specialization +constant. The specializing tool is also responsible for removing either the +"then" range of blocks or the "else" range of blocks that are associated with +the *OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL* +instruction. These ranges are easy to identify because they are guaranteed to +be contiguous in the SPIR-V module and the *OpBranchSpecConstantINTEL* / +*OpBranchSpecConstantWithElseINTEL* instructions provide pointers to the +ranges. + +Occurrences of *OpPhiSpecConstantINTEL* are replaced with regular *OpPhi*. +When constructing the parent parameters to *OpPhi*, the specialization tool +uses the parent's condition parameter: + +* If the condition parameter's value is zero, this parent has no associated + specialization constant and the parent's other two parameters are retained + in the *OpPhi*. Otherwise, the parent is associated with a specialization + constant: + +* If that specialization constant's value is *true*, the parent's other two + parameters are retained in the *OpPhi*. + +* If that specialization constant's value is *false*, the parent is omitted + from the *OpPhi*. + +The following example demonstrates this process. Consider this unspecialized +SPIR-V: + +``` + %int = OpTypeInt 32 0 + %bool = OpTypeBool + %spec = OpSpecConstantTrue %bool +%notspec = OpSpecConstantOp %bool LogicalNot %spec + + ... + + %top = OpLabel + %10 = OpIAdd %int .... + OpBranchSpecConstantINTEL %spec %true %true %false + %true = OpLabel + %11 = OpIAdd %int .... + OpBranch %false + %false = OpLabel + %12 = OpPhiSpecConstantINTEL %int %notspec %10 %top %spec %11 %true + + ... +``` + +Specializing this code such that *%spec* is *false* yields: + +``` + %int = OpTypeInt 32 0 + %bool = OpTypeBool + %spec = OpConstantFalse %bool +%notspec = OpConstantTrue %bool + + ... + + %top = OpLabel + %10 = OpIAdd %int .... + OpBranch %false + %false = OpLabel + %12 = OpPhi %int %10 %top + + ... +``` + +Note that the process of specialization can sometimes lead to blocks that have +only one parent and *OpPhi* instructions that have only one parent, as shown in +the example above. Normal optimizations in tools that consume SPIR-V can +optimize these cases, but such optimizations are not necessary for the +correctness of the specialized code. + +=== Conditional capabilities, functions, types, etc. + +Since the specialization process will remove blocks from the control flow graph +in some cases, it may be desirable to also remove functions that are called +only from these blocks. Likewise, it may be desirable to remove variables, +types, or constants that are used only in these blocks. This may be necessary +for correctness, for example, if a function that is called only from the +removed blocks uses device features or SPIR-V capabilities that are unavailable +on the device. (See the `fancy()` function in the motivation section for an +example of this.) + +To support this case, the extension adds a new *OpConditionalCapabilityINTEL* +instruction and a new *ConditionalINTEL* decoration. If a tool that generates +SPIR-V wants to guarantee that a function, variable, type, or constant is +removed when a specialization constant has a certain value, it must use the +*ConditionalINTEL* decoration to do this. Specializing tools have no +requirement to automatically find and remove these instructions, even if the +only reference is from blocks that the specializing tool removes. Likewise, if +a tool that generates SPIR-V wants to express that a capability is only +required when a specialization constant has a certain value, it must use the +*OpConditionalCapabilityINTEL* instruction. + +The *OpConditionalCapabilityINTEL* instruction is like *OpCapability* except it +has an __ parameter which references a specialization constant. This +instruction adds a requirement for the capability only if that specialization +constant's value is *true*. + +The *ConditionalINTEL* decoration instruction takes an __ operand that +references a specialization constant. The decorated instruction will be +removed during specialization if that specialization constant's value is +*false*. + +The specialization process is very straightforward for +*OpConditionalCapabilityINTEL*. This instruction is either removed or replaced +with *OpCapability* depending on the value of the specialization constant. + +When specializing an instruction that is decorated with *ConditionalINTEL* the +*ConditionalINTEL* decoration itself is always removed. In addition, the +following happens if the specialization constant is *false*: + +* If the decorated instruction is *OpFunction*, the function and all of its + instructions are removed. All decorations for the function and its + instructions are removed. If the *OpFunction* has an associated + *OpEntryPoint*, that is also removed. Any *OpName* or *OpMemberName* that + references the *OpFunction* or any of its instructions are also removed. + +* Otherwise, the decorated instruction is removed, all decorations for the + instruction are removed, and any *OpName* or *OpMemberName* referencing the + instruction are removed. + +Tools that generate SPIR-V are responsible for ensuring that the +*ConditionalINTEL* decoration is used such that an instruction that defines an +SSA __ is never removed unless all the references to that SSA __ are +also removed regardless of the values assigned to the specialization +constants. + +Since *ConditionalINTEL* may be applied to an instruction that defines a +specialization constant, there is the possibility of ambiguity. What if +specialization constant `A` is decorated with *ConditionalINTEL*, but `A` is +also used as the _Condition_ for *OpConditionalCapabilityINTEL*, +*OpBranchSpecConstantINTEL*, *OpBranchSpecConstantWithElseINTEL*, +*OpPhiSpecConstantINTEL*, or as the _Condition_ for another *ConditionalINTEL* +decoration? We avoid these ambiguities by making this situation illegal. +If a specialization constant __ is decorated with *ConditionalINTEL*, it +may not be used as a _Condition_ for any of these instructions or for the +_Condition_ in a *ConditionalINTEL* decoration. + +=== Validation + +In order to validate a module that uses this extension, we first apply the +normal validation rules assuming that either branch of +*OpBranchSpecConstantINTEL*, or *OpBranchSpecConstantWithElseINTEL* could be +taken at runtime. This essentially means that we treat these instructions as +though they were *OpBranchConditional*, we treat *OpPhiSpecConstantINTEL* as +though it was *OpPhi*, and we treat *OpConditionalCapabilityINTEL* as though it +was *OpCapability*. We then apply some additional validation rules to ensure +that the extension's instructions and decorations are used in a way that +results in consistent code. + +These additional validation rules start by computing a specialization constant +expression `G(i)` that gates usage of each instruction `i`. The value of +`G(i)` is computed with the following rules + +* Start with `G(i) = true`. + +* If the instruction resides in a "then" range of *OpBranchSpecConstantINTEL* + or *OpBranchSpecConstantWithElseINTEL*, let `G(i) = G(i) && S` where `S` is + the specialization constant referenced by *OpBranchSpecConstantINTEL* or + *OpBranchSpecConstantWithElseINTEL*. + +* If the instruction resides in an "else" range of + *OpBranchSpecConstantWithElseINTEL*, let `G(i) = G(i) && !S` where `S` is the + specialization constant referenced by *OpBranchSpecConstantWithElseINTEL*. + +* If the instruction resides in an *OpFunction* that is decorated with + *ConditionalINTEL*, let `G(i) = G(i) && S` where `S` is the specialization + constant referenced by the *ConditionalINTEL* decoration. + +* If the instruction itself is decorated with *ConditionalINTEL*, let + `G(i) = G(i) && S` where `S` is the specialization constant referenced by the + *ConditionalINTEL* decoration. + +We then apply the following validation rules: + +* If a module requires a capability `C` that can be statically checked, and if + that capability is required only through *OpConditionalCapabilityINTEL* + instructions, we compute the specialization constant expression `G(c)` that + is the logical "or" of the specialization constants used by each of these + *OpConditionalCapabilityINTEL* instructions. We then scan through the code + looking for instructions that use capability `C`. For each such instruction + `i`, validate that `G(i)` can never be true unless `G(c)` is also true. + +* For each instruction `idef` that defines an SSA __, search for all other + instructions `iuse` that use __. Validate that each `G(iuse)` can never + be true unless `G(idef)` is also true. + +* For each block that is contained by the "then" range of + *OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL*, verify + that each parent block is also contained by that same "then" range (allowing, + of course, that the *OpBranchSpecConstantINTEL* or + *OpBranchSpecConstantWithElseINTEL* instruction is a parent of the first + block in that range). + +* For each block that is contained by the "else" range of + *OpBranchSpecConstantWithElseINTEL*, verify that each parent block is also + contained by that same "else" range (allowing, of course, that the + *OpBranchSpecConstantWithElseINTEL* instruction is a parent of the first + block in that range). + +* For each (_Condition i_, _Variable i_, _Parent i_) triplet of + *OpPhiSpecConstantINTEL*: + - If _Parent i_ resides in the "then" range of *OpBranchSpecConstantINTEL* or + *OpBranchSpecConstantWithElseINTEL*, verify that _Condition i_ is the same + specialization constant __ as the *OpBranchSpecConstantINTEL* or + *OpBranchSpecConstantWithElseINTEL* instruction. + + - If _Parent i_ is the _False Label_ of *OpBranchSpecConstantINTEL* or + _Parent i_ resides in the "else" range of + *OpBranchSpecConstantWithElseINTEL*, verify that _Condition i_ is the + __ of a specialization constant that is the logical negation of the + specialization constant __ used by the *OpBranchSpecConstantINTEL* or + *OpBranchSpecConstantWithElseINTEL* instruction. + + - Otherwise, verify that _Condition i_ has the value zero. + + - If _Variable i_ is defined by a block in the "then" range of + *OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL*, or if + _Variable i_ is defined by a block in the "else" range of + *OpBranchSpecConstantWithElseINTEL*, verify that _Parent i_ is contained by + that same "then" or "else" range. + +* For each specialization constant (*OpSpecConstantXXX*) that is decorated with + *ConditionalINTEL*, verify that the specialization constant's __ is not + also used as a _Condition_ for *OpConditionalCapabilityINTEL*, + *OpBranchSpecConstantINTEL*, *OpBranchSpecConstantWithElseINTEL*, + *OpPhiSpecConstantINTEL*, or *ConditionalINTEL*. + +*TODO*: I'm looking for feedback on the validation rules that involve +comparison of abstract specialization constants such as "validate that `G(i)` +can never be true unless `G(c)` is also true". These validations are likely +very difficult in the general case, but most cases will be very easy. (In most +cases, I think that `G(i)` and `G(c)` will both be so simple that it will be +easy to prove whether one implies the other.) How should the spec handle this? +Are the validation rules _requirements_ that a validation tool must perform, or +are they just rules about what SPIR-V is not valid? If they are not +requirements, then we could leave the wording as I have it and then each +validation tool would be free to implement the validation checks as thoroughly +as it wants. + + +== New tokens defined by this extension + +=== New capabilities + +The module must declare that it uses the following capability in order to use +any of the decorations or instructions defined in the sections below. If +declared, this capability must be unconditionally declared via *OpCapability*. +It may not be conditionally declared via *OpConditionalCapabilityINTEL*. + +[cols="1,15,5",options="header",width="100%"] +|=== +2+^| Capability | Implicitly Declares +| ???? +| *SpecConditionalINTEL* + +Module conditionally enables code based on the value of a specialization +constant. +| +|=== + +=== New decorations + +[cols="1,10,5,5",options="header",width="100%"] +|==== +2+^| Decoration | Extra Operands | Enabling Capabilities +| ???? +| *ConditionalINTEL* + +May be applied only to *OpFunction*, global (module scope) *OpVariable*, type +declarations (*OpTypeXXX*), or constant instructions (*OpConstantXXX* or +*OpSpecConstantXXX*). Indicates that the decorated instruction must be removed +if the value of the specialization constant identified by _Condition_ is +*false*. The _Condition_ must be a _Boolean type_ scalar. + +If the decorated instruction is *OpFunction*, the function and all of the +instructions it contains are removed when the specialization constant is +*false*. If the function has an associated *OpEntryPoint*, that is also +removed. +| __ _Condition_ +|*SpecConditionalINTEL* +|==== + +=== New instructions + +[cols="1,1,2*3",width="100%"] +|=== +3+|*OpConditionalCapabilityINTEL* + + + +Declare a capability that is conditionally used by this module, depending on +the value of a specialization constant. + +The _Capability_ is used by this module only if the specialization constant +identified by _Condition_ is *true*. The _Condition_ must be a _Boolean type_ +scalar. + +1+|Capability: + +*SpecConditionalINTEL* +| 3 +| ???? +| __ _Condition_ +| _Capability_ +|=== + +[cols="1,1,4*3",width="100%"] +|=== +5+|*OpBranchSpecConstantINTEL* + + + +If the specialization constant _Condition_ is *true*, branch to _True Label_, +otherwise branch to _False Label_. The _Condition_ must be a _Boolean type_ +scalar. + +The consecutive blocks from _True Label_ to _True End_ (inclusive) are called +the "then" range of this instruction. If the _Condition_ is *false*, this +range of blocks is removed from the module. The module need not declare any +capabilities used by these instructions if they are removed. + +No block in the "then" range may have a parent that is outside of that range, +except for the _True Label_ reference from this *OpBranchSpecConstantINTEL* +instruction. + +1+|Capability: + +*SpecConditionalINTEL* +| 5 +| ???? +| __ _Condition_ +| __ _True Label_ +| __ _True End_ +| __ _False Label_ +|=== + +[cols="1,1,5*3",width="100%"] +|=== +6+|*OpBranchSpecConstantWithElseINTEL* + + + +If the specialization constant _Condition_ is *true*, branch to _True Label_, +otherwise branch to _False Label_. The _Condition_ must be a _Boolean type_ +scalar. + +The consecutive blocks from _True Label_ to _True End_ (inclusive) are called +the "then" range of this instruction. The consecutive blocks from +_False Label_ to _False End_ (inclusive) are called the "else" range of this +instruction. If the _Condition_ is *false*, the "then" range is removed from +the module. If the _Condition_ is *true*, the "else" range is removed from the +module. The module need not declare any capabilities used by these +instructions if they are removed. + +No block in the "then" range may have a parent that is outside of that range, +except for the _True Label_ reference from this +*OpBranchSpecConstantWithElseINTEL* instruction. No block in the "else" range +may have a parent that is outside of that range, except for the _False Label_ +reference from this *OpBranchSpecConstantWithElseINTEL* instruction. + +1+|Capability: + +*SpecConditionalINTEL* +| 6 +| ???? +| __ _Condition_ +| __ _True Label_ +| __ _True End_ +| __ _False Label_ +| __ _False End_ +|=== + +[cols="1a,1,3*3",width="100%"] +|=== +4+|*OpPhiSpecConstantINTEL* + + + +The SSA phi function, when one or more of the merged values is conditionally +gated by a specialization constant. This instruction must be used instead of +*OpPhi* when any of the following are true: + +* At least one of the parent blocks is the _False Label_ of an + *OpBranchSpecConstantINTEL* instruction. + +* At least one of the parent blocks is contained by the "then" range of an + *OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL* + instruction or is contained by the "else" range of an + *OpBranchSpecConstantWithElseINTEL* instruction. + +* At least one of the _Variable i_ is defined by a block that is contained by + the "then" range of an *OpBranchSpecConstantINTEL* or + *OpBranchSpecConstantWithElseINTEL* instruction or is defined by a block that + is contained by the "else" range of an *OpBranchSpecConstantWithElseINTEL* + instruction. + +The result is selected based on control flow: If control reached the current +block from _Parent i_, _Result Id_ gets the value that _Variable i_ had at the +end of _Parent i_. + +_Result Type_ can be any type. + +Operands are a sequence of triplets: (_Cond 1_, _Variable 1_, _Parent 1_ +block), (_Cond 2_, _Variable 2_, _Parent 2_ block), ... Each _Parent i_ block +is the label of an immediate predecessor in the CFG of the current block. +There must be exactly one _Parent i_ for each parent block of the current block +in the CFG. If _Parent i_ is reachable in the CFG and _Variable i_ is defined +in a block, that defining block must dominate _Parent i_. All Variables must +have a type matching _Result Type_. + +If _Variable i_ is defined by a block that is contained by the "then" or "else" +range of an *OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL* +instruction, then the associated _Parent i_ must be contained by that same +"then" or "else" range. + +Each _Cond i_ is the __ of a specialization constant that gates the +associated _Variable i_ definition. This parameter must be set as follows: + +* If the _Parent i_ is the _False Label_ of an *OpBranchSpecConstantINTEL* + instruction, _Cond i_ must be the logical negation of the specialization + constant used by that *OpBranchSpecConstantINTEL* instruction. + +* If the _Parent i_ is contained by the "then" range of an + *OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL* + instruction, _Cond i_ must be the same specialization __ used by that + instruction. + +* If the _Parent i_ is contained by the "else" range of an + *OpBranchSpecConstantWithElseINTEL* instruction, _Cond i_ must be the logical + negation of the specialization constant used by that + *OpBranchSpecConstantWithElseINTEL* instruction. + +* Otherwise, _Variable i_ is not gated by a specialization constant, and + _Cond i_ must have the value zero (which is not a legal value for any + __). + +Within a block, this instruction must appear before all other instructions +aside from *OpPhi*, other instances of *OpPhiSpecConstantINTEL*, *OpLine*, or +*OpNoLine*. + +1+|Capability: + +*SpecConditionalINTEL* +| 3 + variable +| ???? +| __ _Result Type_ +| _Result _ +| {__ \| _0_}, __, __, ... + + _Cond_, _Variable_, _Parent_, ... +|=== + + +== Modifications to the SPIR-V Specification + +*TODO*: Exact wording changes for the SPIR-V specification will be proposed +once there is agreement on the semantics of this extension. + + +== Issues + +1) The motivation section currently lists only the use cases that are important + to SYCL. However, there have been previous proposals for conditional code + in SPIR-V that were motivated by shader use cases. Should these use cases + also be listed in the motivation section? Some of the shader use cases + would require small additions to this extension, so if we add those + motivating use cases, we would also need to add a few more instructions to + this extension proposal (instructions that would not be useful for SYCL). + +2) Some of the validation rules listed above would be difficult to implement in + the general case. (See *TODO* comment above.) How should this be resolved? + +3) Need to assign real numbers to the new tokens, replacing the "????" + placeholders. + + +== Revision History + +[cols="5,15,15,70"] +[grid="rows"] +[options="header"] +|======================================== +|Rev|Date|Author|Changes +|1|2021-05-18|Greg Lueck|*First public draft* +|======================================== From b0f4332e553fcca50563e2de9efcf93dca0e659e Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Thu, 20 May 2021 15:21:58 -0400 Subject: [PATCH 02/22] Minor changes to SPIR-V extension --- .../SPIRV/SPV_INTEL_spec_conditional.asciidoc | 16 ++++++++++++++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/sycl/doc/extensions/SPIRV/SPV_INTEL_spec_conditional.asciidoc b/sycl/doc/extensions/SPIRV/SPV_INTEL_spec_conditional.asciidoc index 6029601d69d3a..ee68e6cdd0d77 100644 --- a/sycl/doc/extensions/SPIRV/SPV_INTEL_spec_conditional.asciidoc +++ b/sycl/doc/extensions/SPIRV/SPV_INTEL_spec_conditional.asciidoc @@ -393,8 +393,16 @@ We then apply the following validation rules: `i`, validate that `G(i)` can never be true unless `G(c)` is also true. * For each instruction `idef` that defines an SSA __, search for all other - instructions `iuse` that use __. Validate that each `G(iuse)` can never - be true unless `G(idef)` is also true. + instructions `i` that use __. Compute `G(iuse)` as: ++ +-- + - Let `G(iuse) = G(i)`. + - If instruction `i` is *OpPhiSpecConstantINTEL*, compute `G(iuse)` + separately for each _Parent i_ as `G(iuse) = G(i) && S` where `S` is the + specialization constant _Cond i_ associated with _Parent i_. +-- ++ +Validate that each `G(iuse)` can never be true unless `G(idef)` is also true. * For each block that is contained by the "then" range of *OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL*, verify @@ -528,6 +536,8 @@ No block in the "then" range may have a parent that is outside of that range, except for the _True Label_ reference from this *OpBranchSpecConstantINTEL* instruction. +This instruction must be the last instruction in a block. + 1+|Capability: + *SpecConditionalINTEL* | 5 @@ -560,6 +570,8 @@ except for the _True Label_ reference from this may have a parent that is outside of that range, except for the _False Label_ reference from this *OpBranchSpecConstantWithElseINTEL* instruction. +This instruction must be the last instruction in a block. + 1+|Capability: + *SpecConditionalINTEL* | 6 From a90bb7da54b42d7ece907f935641ff1402e29e64 Mon Sep 17 00:00:00 2001 From: gmlueck Date: Fri, 21 May 2021 09:45:13 -0400 Subject: [PATCH 03/22] Apply suggestions from code review Co-authored-by: kbobrovs Co-authored-by: Artem Gindinson --- sycl/doc/OptionalDeviceFeatures.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index 91a043b8c6b28..017557fe8f110 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -49,7 +49,7 @@ kernel uses an optional feature. The only exception to this rule occurs when the application uses the C++ attribute `[[sycl::requires()]]`. When the application decorates a kernel or device function with this attribute, it is an assertion that the kernel or -device function is expected to use *only* the features listed by the attribute. +device function is allowed to use only those optional features which are listed by the attribute. Therefore, the front-end compiler must issue a diagnostic if the kernel or device function uses any other optional kernel features. @@ -65,10 +65,10 @@ are not being compiled for. ### Runtime exception if device doesn't support feature -When the application submits a kernel to a device via one of the the kernel -invocation commands (e.g. `parallel_for()`), the runtime must check to see +When the application submits a kernel to a device via one of the kernel +invocation commands (e.g. `parallel_for()`), the runtime must check if the kernel uses optional features that are not supported on that device. -If the kernel uses a feature that is not supported, the runtime must throw +If the kernel uses an unsupported feature, the runtime must throw a synchronous `errc::kernel_not_supported` exception. This exception must be thrown in the following circumstances: From 50835d43ff7e069801d5e3a5ebeb364ae2f2d9b4 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 21 May 2021 10:38:09 -0400 Subject: [PATCH 04/22] Reformat to 80 columns after applying suggestions --- sycl/doc/OptionalDeviceFeatures.md | 14 +++++++------- 1 file changed, 7 insertions(+), 7 deletions(-) diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index 017557fe8f110..d879cf567292b 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -49,9 +49,9 @@ kernel uses an optional feature. The only exception to this rule occurs when the application uses the C++ attribute `[[sycl::requires()]]`. When the application decorates a kernel or device function with this attribute, it is an assertion that the kernel or -device function is allowed to use only those optional features which are listed by the attribute. -Therefore, the front-end compiler must issue a diagnostic if the kernel or -device function uses any other optional kernel features. +device function is allowed to use only those optional features which are listed +by the attribute. Therefore, the front-end compiler must issue a diagnostic if +the kernel or device function uses any other optional kernel features. Note that this behavior does not change when the compiler runs in AOT mode. Even if the user specifies a target device via "-fsycl-targets", that does not @@ -66,10 +66,10 @@ are not being compiled for. ### Runtime exception if device doesn't support feature When the application submits a kernel to a device via one of the kernel -invocation commands (e.g. `parallel_for()`), the runtime must check -if the kernel uses optional features that are not supported on that device. -If the kernel uses an unsupported feature, the runtime must throw -a synchronous `errc::kernel_not_supported` exception. +invocation commands (e.g. `parallel_for()`), the runtime must check if the +kernel uses optional features that are not supported on that device. If the +kernel uses an unsupported feature, the runtime must throw a synchronous +`errc::kernel_not_supported` exception. This exception must be thrown in the following circumstances: From 333bf432e75db2018f559036d2b23f2a8f0bd9a4 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 21 May 2021 16:24:11 -0400 Subject: [PATCH 05/22] Clarify when errc::kernel_not_supported is thrown Clarify wording about when `errc::kernel_not_supported` is thrown. --- sycl/doc/OptionalDeviceFeatures.md | 58 ++++++++++++++++++++---------- 1 file changed, 40 insertions(+), 18 deletions(-) diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index d879cf567292b..666e6fcc924e6 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -73,17 +73,32 @@ kernel uses an unsupported feature, the runtime must throw a synchronous This exception must be thrown in the following circumstances: -* A device function in the kernel's static call tree uses a feature that the - device does not support. However, this only applies to features that are - exposed via a C++ type or function. Examples of this include `sycl::half` or - instantiating `sycl::atomic_ref` for a 64-bit type. In cases where the - feature is more "notional", such as requiring a particular type of forward - progress guarantee, no exception is required. - -* The kernel or a device function in the kernel's static call tree is decorated - with `[[sycl::requires()]]`, and the device does not have the required - aspects. An exception must be thrown in this case even if the kernel does - not actually use a feature corresponding to the aspect. +* For a kernel that is not decorated with `[[sycl::requires()]]`: the exception + must be thrown if a device function in the kernel's static call tree uses a + feature that the device does not support. However, this only applies to + features that are exposed via a C++ type or function. Examples of this + include `sycl::half` or instantiating `sycl::atomic_ref` for a 64-bit type. + If the kernel relies on optional features that are more "notional" such as + sub-group independent forward progress + (`info::device::sub_group_independent_forward_progress`), no exception is + required. + +* For a kernel that is decorated with `[[sycl::requires()]]`: the exception + must be thrown if the device does not have the aspects listed in that + attribute. Note that the exception must be thrown even if the kernel does + not actually use a feature corresponding to the aspect, and it must be + thrown even if the aspect does not correspond to any optional feature. + +* For a kernel that is decorated with `[[sycl::requires()]]`: the exception + must be thrown if a function in the kernel's static call tree uses a feature + that the device does not support even if the `[[sycl::requires()]]` attribute + is missing the corresponding aspect. This case can only occur if the kernel + calls a `SYCL_EXTERNAL` function in another translation unit and the function + uses an optional feature that is not listed in the `[[sycl::requires()]]` + attribute attached to the `SYCL_EXTERNAL` function declaration. (In any + other case, the front-end compiler would diagnose an error.) Although the + SYCL 2020 spec says that such applications are non-conformant, it is easy for + DPC++ to throw an exception in such a case. * The kernel is decorated with the `[[sycl::reqd_work_group_size(W)]]` or `[[sycl::reqd_sub_group_size(S)]]` attribute, and the device does not support @@ -330,13 +345,20 @@ decorations to the appropriate functions and to emit the specialization constants that these decorations reference. This can be done with two passes over each kernel's static call tree. -The first pass operates only on kernel functions that are not decorated with -the `[[sycl::requires()]]` attribute. When the kernel is decorated with this -attribute, the attribute tells the full set of aspects that the kernel uses -(and the front-end compiler has already validated this). For kernels without -the attribute, the pass propagates the required aspects from -`[[sycl::requires()]]` attributes in a kernel's call tree up to the kernel -function, forming a union of all required aspects for the kernel. +The first pass operates on each kernel, iterating over all the functions in the +kernel's static call tree to form the union of all aspects required by kernel. +If the kernel is decorated with the `[[sycl::requires()]]`, those aspects are +also added to the union. + +**NOTE**: This first pass traverses the static call tree even for kernels that +are decorated with the `[[sycl::requires()]]` attribute. Although the +front-end compiler has already verified that the kernel doesn't require any +aspects beyond those listed in the attribute, the front-end compiler was only +able to verify this for the device functions that reside in the same +translation unit as the kernel. Therefore, we might still find more required +aspects in the post-link tool because we have visibility to all device +functions in the kernel's static call tree, even those that are defined in +other translation units. Once we have the full set of aspects used by each kernel, we do the following for each kernel: From cda2d8ff750acc1142bb9ab21bd466ed4d23781a Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 21 May 2021 16:26:55 -0400 Subject: [PATCH 06/22] Expand design to include exported device functions Expand the design to include the case when device functions are exported from a shared library, which is a new feature proposed in #3210. --- sycl/doc/OptionalDeviceFeatures.md | 221 ++++++++++++++++------------- 1 file changed, 122 insertions(+), 99 deletions(-) diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index 666e6fcc924e6..fbfa6dfbb0e0a 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -22,9 +22,9 @@ attributes"][2] and [section 5.8.2 "Device function attributes"][3]. There are several categories of requirements covered by this design: -* The front-end compiler must issue a diagnostic in some cases when a kernel - uses an optional feature. However, the front-end compiler must **not** - generate a diagnostic in other cases. +* The front-end compiler must issue a diagnostic in some cases when a kernel or + device function uses an optional feature. However, the front-end compiler + must **not** generate a diagnostic in other cases. * The runtime must raise an exception when a kernel using optional features is submitted to a device that does not support those features. This @@ -131,7 +131,7 @@ is an optional feature which is only supported on devices that have the describe the set of optional features that a kernel uses (with the exception of the required work-group or sub-group size). -As will see later, it will be very helpful to decorate all APIs in DPC++ +As we will see later, it will be very helpful to decorate all APIs in DPC++ headers that correspond to optional kernel features with the `[[sycl::requires()]]` attribute. For example, the declaration of the `sycl::half` type would look like this: @@ -248,40 +248,42 @@ AOT cases. ### JIT case -The JIT case requires some change to the way kernels are bundled together into -device images. Currently, kernels are bundled together regardless of the -features they use, and this can lead to JIT errors due to speculative -compilation. Consider a device image that contains two kernels: `K1` uses no -optional features and `K2` uses an optional feature that corresponds to aspect -`A`. Now consider that the application submits kernel `K1` to a device that -does not support aspect `A`. Since the two kernels are bundled together into -one device image, the runtime really compiles both kernels for the device. -Currently, this will raise a JIT exception because the compilation of kernel -`K2` will fail when compiled for a device that does not support aspect `A`. - -There are two ways to solve this problem. One is to change the way kernels are -bundled into device images such that we never bundled two kernels together -unless they required exactly the same set of device aspects. Doing this would -avoid the error described above. However, we have elected for a different -solution. - -Instead, we will allow kernels to be bundled together as they currently are, -but we will introduce extra decorations into the generated SPIR-V that allow -the JIT compiler to discard kernels which require aspects that the device does -not support. Although this solution requires an extension to SPIR-V, we think -it is the better direction because it is aligned with the [device-if][4] -feature, which will also requires this same SPIR-V extension. +The JIT case requires some change to the way kernels and device functions are +bundled together into device images. Currently, kernels and device functions +are bundled together regardless of the features they use, and this can lead to +JIT errors due to speculative compilation. Consider a device image that +contains two kernels: `K1` uses no optional features and `K2` uses an optional +feature that corresponds to aspect `A`. Now consider that the application +submits kernel `K1` to a device that does not support aspect `A`. Since the +two kernels are bundled together into one device image, the runtime really +compiles both kernels for the device. Currently, this will raise a JIT +exception because the compilation of kernel `K2` will fail when compiled for a +device that does not support aspect `A`. + +There are two ways to solve this problem. One is to change the way kernels and +device functions are bundled into device images such that we never bundled two +kernels or device functions together unless they require exactly the same set +of device aspects. Doing this would avoid the error described above. However, +we have elected for a different solution. + +Instead, we will allow kernels and device functions to be bundled together as +they currently are, but we will introduce extra decorations into the generated +SPIR-V that allow the JIT compiler to discard kernels and device functions +which require aspects that the device does not support. Although this solution +requires an extension to SPIR-V, we think it is the better direction because it +is aligned with the [device-if][4] feature, which will also requires this same +SPIR-V extension. [4]: The idea is to emit a SPIR-V specialization constant for each aspect that is -required by a kernel in the device image. We then introduce a new SPIR-V -"decoration" that tells the JIT compiler to discard a function if a -specialization constant is `False`. The DPC++ runtime will set the values of +required by a kernel or device function in the device image. We then introduce +a new SPIR-V "decoration" that tells the JIT compiler to discard a function if +a specialization constant is `False`. The DPC++ runtime will set the values of the specialization constants according to the target device, thus the JIT -compiler discards (and does not compile) any kernels that use features which -are not supported on that device. This avoids errors due to speculative -compilation of kernels. +compiler discards (and does not compile) any kernels or device functions that +use features which are not supported on that device. This avoids errors due to +speculative compilation of kernels or device functions. #### Representation in SPIR-V @@ -343,12 +345,19 @@ mechanism. The post-link tool must be modified to add the SPIR-V `ConditionalINTEL` decorations to the appropriate functions and to emit the specialization constants that these decorations reference. This can be done with two passes -over each kernel's static call tree. +over the static call tree for each kernel and each exported device function. -The first pass operates on each kernel, iterating over all the functions in the -kernel's static call tree to form the union of all aspects required by kernel. -If the kernel is decorated with the `[[sycl::requires()]]`, those aspects are -also added to the union. +**NOTE**: In this context, "exported device function" means a device function +that is exported from a shared library as defined by [Device Code Dynamic +Linking][6]. + +[6]: + +The first pass operates on each kernel and each exported device function, +iterating over all the functions in the static call tree of that kernel or +exported device function to form the union of all aspects it requires. If the +kernel or exported device function is decorated with the +`[[sycl::requires()]]`, those aspects are also added to the union. **NOTE**: This first pass traverses the static call tree even for kernels that are decorated with the `[[sycl::requires()]]` attribute. Although the @@ -360,43 +369,45 @@ aspects in the post-link tool because we have visibility to all device functions in the kernel's static call tree, even those that are defined in other translation units. -Once we have the full set of aspects used by each kernel, we do the following -for each kernel: +Once we have the full set of aspects used by each kernel and exported device +function, we do the following for each: -* For each of the kernel's required aspects, emit an `OpSpecConstantTrue` op to +* For each required aspects, emit an `OpSpecConstantTrue` instruction to represent this requirement. We maintain a set of "required specialization - constants" for each kernel, which is used later. Add this specialization - constant to that set. In addition, add an "aspect" entry to the device - image's "SYCL/kernel reqs" property set, as described below. (We could - instead emit `OpSpecConstantFalse`. It doesn't matter because the runtime - will always provide a value for these specialization constants.) + constants" for each kernel or exported device function, which is used later. + Add this specialization constant to that set. In addition, add an "aspect" + entry to the device image's "SYCL/requirements" property set, as described + below. (We could instead emit `OpSpecConstantFalse`. It doesn't matter + because the runtime will always provide a value for these specialization + constants.) * If the kernel function is decorated with the `[[reqd_work_group_size()]]` attribute, emit an `OpSpecConstantTrue` op to represent this requirement and add this also to the kernel's set of required specialization constants. In addition, add a "reqd\_work\_group\_size" entry to the device image's - "SYCL/kernel reqs" property set. + "SYCL/requirements" property set. * If the kernel function is decorated with the `[[reqd_sub_group_size()]]` attribute, emit an `OpSpecConstantTrue` op to represent this requirement and add this also to the kernel's set of required specialization constants. In addition, add a "reqd\_sub\_group\_size" entry to the device image's - "SYCL/kernel reqs" property set. - -* If the kernel's set of required specialization constants is not empty, emit a - series of `OpSpecConstantOp` ops with the `OpLogicalAnd` opcode to compute - the expression `S1 && S2 && ...`, where `S1`, `S2`, etc. are the - specialization constants in that set. In addition, emit a - `ConditionalINTEL` decoration for the kernel's entry function which - references the `S1 && S2 && ...` specialization constant. - -The second pass propagates each kernel's required specialization constants back -down the static call tree. This pass starts such that each kernel entry -function has the set of required specialization constants as computed above. -The set of required specialization constants for each remaining function `F` is -computed as `P1 || P2 || ...`, where `P1`, `P2`, etc. are the parent functions -of `F` in the static call tree. (Obviously, a `Pn` term can be omitted if the -parent function has no required specialization constants.) Once we have this + "SYCL/requirements" property set. + +* If the kernel or exported device function's set of required specialization + constants is not empty, emit a series of `OpSpecConstantOp` ops with the + `OpLogicalAnd` opcode to compute the expression `S1 && S2 && ...`, where + `S1`, `S2`, etc. are the specialization constants in that set. In addition, + emit a `ConditionalINTEL` decoration for the function which references the + `S1 && S2 && ...` specialization constant. + +The second pass propagates each kernel or exported device function's required +specialization constants back down the static call tree. This pass starts by +assigning each each kernel entry function and each exported device function the +set of required specialization constants that were computed above. The set of +required specialization constants for each remaining function `F` is computed +as `P1 || P2 || ...`, where `P1`, `P2`, etc. are the parent functions of `F` in +the static call tree. (Obviously, a `Pn` term can be omitted if the parent +function has no required specialization constants.) Once we have this information, we do the following for each function `F` that has a non-empty set of required specialization constants: @@ -413,10 +424,11 @@ are emitted and reuse them when possible, rather than emitting duplicates. #### New device image property set A new device image property set is needed to inform the DPC++ runtime of the -aspects that each kernel requires and the work-group or sub-group sizes it may -require. This property set is named "SYCL/kernel reqs". The name of each -property in the set is the name of a kernel in the device image. The value -of each property has the following form: +aspects that each kernel or exported device function requires and the +work-group or sub-group sizes that each kernel requires. This property set is +named "SYCL/requirements". The name of each property in the set is the name of +a kernel or the name of an exported device function in the device image. The +value of each property has the following form: ``` [entry_count (uint32)] @@ -456,9 +468,9 @@ Parameter | Definition `spec_id` | The SPIR-V `SpecId` decoration for the specialization constant that the post-link tool generated for this requirement. Note that the post-link tool will generate a series of `OpSpecConstantOp` ops -when the kernel has multiple requirements. However, each property list entry -contains only the `SpecId` of the `OpSpecConstantTrue` op that is associated -with a single requirement. +when the kernel or exported device function has multiple requirements. +However, each property list entry contains only the `SpecId` of the +`OpSpecConstantTrue` op that is associated with a single requirement. #### Modifications to the DPC++ runtime @@ -469,19 +481,26 @@ kernel's requirements, and it must raise an `errc::kernel_not_supported` exception if it does not. When a kernel is submitted to a device, the runtime finds the device image that -contains the kernel and also finds the kernel's entry in the "SYCL/kernel reqs" -property set. This entry tells the set of requirements for the kernel. If the -target device does not support all of these requirements, then the runtime -raises `errc::kernel_not_supported`. This check can be done before the device -image is JIT compiled, so the exception can be thrown synchronously. - -Assuming this check passes, the first attempt to submit a kernel from a device -image will cause it to be JIT compiled. The runtime must be modified to do the -following: - -* Compute the union of all requirements from all kernels in the - "SYCL/kernel reqs" property set and their associated specialization - constants. +contains the kernel and also finds the kernel's entry in the +"SYCL/requirements" property set. This entry tells the set of requirements for +the kernel. If the target device does not support all of these requirements, +then the runtime raises `errc::kernel_not_supported`. This check can be done +before the device image is JIT compiled, so the exception can be thrown +synchronously. + +If the kernel imports device function symbols from a shared library as defined +in [Device Code Dynamic Linking][6], the runtime first identifies all the +device images that define these exported device functions. Before attempting +to link them together, the runtime finds the entries for the exported device +functions in their "SYCL/requirements" property sets and checks that the device +supports all these requirements. If it does not, the runtime throws +`errc::kernel_not_supported`. + +Whenever the runtime submits a SPIR-V image to the backend for online +compilation, it must do the following additional steps: + +* Compute the union of all requirements from all entries in the image's + "SYCL/requirements" property set. * Query the target device to see whether it supports each of these requirements, yielding either `True` or `False` for each one. @@ -493,7 +512,7 @@ Note that the runtime's cache of compiled device images does not need any special modification because the cache already needs to know the values of all the specialization constants that were used to compile the device image. We just need to make sure the cache is also aware of the specialization constants -which correspond to the kernels' requirements. +which correspond to the requirements from the "SYCL/requirements" property set. #### Modifications to the GEN compiler @@ -507,18 +526,22 @@ functions. The AOT case uses exactly the same solution as the JIT case described above, but there is one extra steps. For the AOT case, the post-link tool must set -the values of the specialization constants that correspond to the kernel -requirements, using the device named in the "-fsycl-targets" command line -option. After doing this, the post-link tool calls the AOT compiler to -generate native code from SPIR-V as it normally does. If more than one target -device is specified, the post-link tool sets the specialization constants -separately for each device before generating native code for that device. - -Note that the native device image may not contain all kernels if there are -kernels that use optional features. Nevertheless, the "SYCL/kernel reqs" -property set still has entries for all kernel functions. If the application -attempts to invoke one of the discarded kernels on a device (which does not -support the kernel's features), the runtime will see that the kernel is not -supported by using information from the "SYCL/kernel reqs" property set, and -the runtime will raise an exception. Thus, the runtime will never attempt to -invoke one of these discarded kernels. +the values of the specialization constants that correspond to the requirements +for the kernel or exported device function, using the device named in the +"-fsycl-targets" command line option. After doing this, the post-link tool +calls the AOT compiler to generate native code from SPIR-V as it normally does. +If more than one target device is specified, the post-link tool sets the +specialization constants separately for each device before generating native +code for that device. + +Note that the native device image may not contain all kernels or all exported +device functions if they use optional features. Nevertheless, the +"SYCL/requirements" property set still has entries for all kernel functions and +all exported device functions. If the application attempts to invoke one of +the discarded kernels on a device (which does not support the kernel's +features), the runtime will see that the kernel is not supported by using +information from the "SYCL/requirements" property set, and the runtime will +raise an exception. Thus, the runtime will never attempt to invoke one of +these discarded kernels. Likewise, if a kernel imports a discarded device +function, the runtime will see that the device function is unsupported and +will raise an exception before attempting to perform the dynamic link. From eed9bc693b4d3ae3404c89245778e38b7e6733da Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 21 May 2021 17:47:00 -0400 Subject: [PATCH 07/22] Add appendix showing atomic_ref with attribute Address a code review comment asking how we can decorate `atomic_ref` with `[[sycl::requires()]]` only when the type is 8 bytes. Add an appendix showing how this can be done using partial specialization. --- sycl/doc/OptionalDeviceFeatures.md | 78 ++++++++++++++++++++++++++---- 1 file changed, 68 insertions(+), 10 deletions(-) diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index fbfa6dfbb0e0a..fb4d7e20a688a 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -145,13 +145,16 @@ In cases where the optional feature corresponds to use of a class (e.g. ``` template -class [[sycl::requires(has(aspect::fp64))]] atomic_ref { +class [[sycl::requires(has(aspect::atomic64))]] atomic_ref { /* ... */ }; ``` -(We can use partial specialization tricks to decorate `atomic_ref` with the -attribute only when the underlying type is 64-bits.) +(In reality, we can use partial specialization tricks to decorate `atomic_ref` +with the attribute only when the underlying type is 64-bits. See ["Appendix: +Adding an attribute to 8-byte `atomic_ref`"][4].) + +[4]: <#appendix-adding-an-attribute-to-8-byte-atomic_ref> In cases where the optional feature corresponds to a function, we can decorate the function's declaration with the attribute like so (demonstrating a @@ -271,10 +274,10 @@ they currently are, but we will introduce extra decorations into the generated SPIR-V that allow the JIT compiler to discard kernels and device functions which require aspects that the device does not support. Although this solution requires an extension to SPIR-V, we think it is the better direction because it -is aligned with the [device-if][4] feature, which will also requires this same +is aligned with the [device-if][5] feature, which will also requires this same SPIR-V extension. -[4]: +[5]: The idea is to emit a SPIR-V specialization constant for each aspect that is required by a kernel or device function in the device image. We then introduce @@ -328,10 +331,10 @@ OpDecorate %16 ConditionalINTEL %13 ; Says to discard the function OpFunctionEnd ``` -See the extension specification of [SpecConditional][5] for a full +See the extension specification of [SpecConditional][6] for a full description of this new SPIR-V decoration. -[5]: +[6]: #### Representation in LLVM IR @@ -349,9 +352,9 @@ over the static call tree for each kernel and each exported device function. **NOTE**: In this context, "exported device function" means a device function that is exported from a shared library as defined by [Device Code Dynamic -Linking][6]. +Linking][7]. -[6]: +[7]: The first pass operates on each kernel and each exported device function, iterating over all the functions in the static call tree of that kernel or @@ -489,7 +492,7 @@ before the device image is JIT compiled, so the exception can be thrown synchronously. If the kernel imports device function symbols from a shared library as defined -in [Device Code Dynamic Linking][6], the runtime first identifies all the +in [Device Code Dynamic Linking][7], the runtime first identifies all the device images that define these exported device functions. Before attempting to link them together, the runtime finds the entries for the exported device functions in their "SYCL/requirements" property sets and checks that the device @@ -545,3 +548,58 @@ raise an exception. Thus, the runtime will never attempt to invoke one of these discarded kernels. Likewise, if a kernel imports a discarded device function, the runtime will see that the device function is unsupported and will raise an exception before attempting to perform the dynamic link. + + +## Appendix: Adding an attribute to 8-byte `atomic_ref` + +As described above under ["Changes to DPC++ headers"][8], we need to decorate the +`atomic_ref` type with the `[[sycl::requires()]]` attribute only when it is +specialized with an 8-byte type. This can be accomplished with some template +partial specialization tricks. The following code snippet demonstrates (best +read from bottom to top): + +[8]: <#changes-to-dpc-headers> + +``` +namespace sycl { +namespace detail { + +template +class atomic_ref_impl_base { + public: + atomic_ref_impl_base(T x) : x(x) {} + + // All the member functions for atomic_ref go here + + private: + T x; +}; + +// Template class which can be specialized based on the size of the underlying +// type. +template +class atomic_ref_impl : public atomic_ref_impl_base { + public: + using atomic_ref_impl_base::atomic_ref_impl_base; +}; + +// Explicit specialization for 8-byte types. Only this specialization has the +// attribute. +template +class [[sycl::requires(has(aspect::atomic64))]] atomic_ref_impl : + public atomic_ref_impl_base { + public: + using atomic_ref_impl_base::atomic_ref_impl_base; +}; + +} // namespace detail + +// Publicly visible atomic_ref class. +template +class atomic_ref : public detail::atomic_ref_impl { + public: + atomic_ref(T x) : detail::atomic_ref_impl(x) {} +}; + +} // namespace sycl +``` From 2b51cf572fb0b92f7257968dcdad589639ab7bfb Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Thu, 27 May 2021 17:29:18 -0400 Subject: [PATCH 08/22] Address review comments * Clarify that the initial short list of requirements will be explained in detail later. * Clarify what "static call tree" means when there are function pointers. * Post-link tool now diagnoses an error if a `SYCL_EXTERNAL` function is incorrect decorated with `[[sycl::requires()]]`. * Use `OpSpecConstantFalse` instead of `OpSpecConstantTrue`. --- sycl/doc/OptionalDeviceFeatures.md | 186 +++++++++++++++++++---------- 1 file changed, 121 insertions(+), 65 deletions(-) diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index fb4d7e20a688a..f4b4a5d10e8bf 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -20,7 +20,8 @@ attributes"][2] and [section 5.8.2 "Device function attributes"][3]. ## Requirements -There are several categories of requirements covered by this design: +There are several categories of requirements covered by this design. Each of +these is described in more detail in the sections that follow: * The front-end compiler must issue a diagnostic in some cases when a kernel or device function uses an optional feature. However, the front-end compiler @@ -35,6 +36,31 @@ There are several categories of requirements covered by this design: speculative compilation of a kernel for a device, when the application does not specifically submit the kernel to that device. +### Clarification of a kernel's static call tree + +Some of the requirements below refer to the "static call tree" of a kernel. In +standard SYCL, device code is not allowed to contain function pointers, virtual +function, or indirect function calls. It is therefore easy to compute the +static call tree of a kernel. By starting at the kernel function itself (e.g. +the function passed to `parallel_for`), the compiler can identify all functions +called by that function, then it can find all functions called by those +functions, etc. Depending on the tool which does the analysis, the "static +call tree" could include only those functions that reside in the same +translation unit as the kernel, or it could include all functions that reside +in the same executable image (or shared library) as the kernel. In the +sections below, we try to make the distinction clear whenever we refer to a +kernel's static call tree. + +We are contemplating a DPC++ extension that would allow some limited use of +function pointers in device code. This feature is not yet fully defined or +supported. We expect that the semantics of this feature will include some way +for the compiler to deduce a limited set of possible targets for each indirect +function call. Therefore, it is still possible for the compiler to construct a +"static call tree" for each kernel, the only difference is that each call site +now adds a set of possible target functions to a kernel's static call tree. +The details about how this will work are expected to be included in the DPC++ +extension specification that enables indirect function calls. + ### Diagnostics from the front-end compiler By "front-end compiler", we mean the DPC++ compiler which parses DPC++ source @@ -53,6 +79,18 @@ device function is allowed to use only those optional features which are listed by the attribute. Therefore, the front-end compiler must issue a diagnostic if the kernel or device function uses any other optional kernel features. +The SYCL 2020 specification only mandates this error when a kernel or device +function that is decorated with `[[sycl::requires()]]` uses an optional kernel +feature (not listed in the attribute), **and** when that use is in the kernel's +static call tree as computed for the translation unit that contains the kernel +function. Thus, the compiler is not required to diagnose an error if the use +is in a `SYCL_EXTERNAL` function that is defined in another translation unit. + +It turns out, though, that DPC++ can diagnose this case at link time, when we +have visibility into device functions that are defined in other translation +units. Since the design proposed below allows this error to be checked with +minimal extra effort, it is desirable to do so. + Note that this behavior does not change when the compiler runs in AOT mode. Even if the user specifies a target device via "-fsycl-targets", that does not necessarily mean that the user expects all the code in the application to be @@ -71,38 +109,42 @@ kernel uses optional features that are not supported on that device. If the kernel uses an unsupported feature, the runtime must throw a synchronous `errc::kernel_not_supported` exception. -This exception must be thrown in the following circumstances: +When doing these checks, the runtime must consider all uses of optional +features in the kernel's static call tree, regardless of whether those uses are +in the same translation unit as the kernel and regardless of whether those uses +come from device code in a shared library. -* For a kernel that is not decorated with `[[sycl::requires()]]`: the exception - must be thrown if a device function in the kernel's static call tree uses a - feature that the device does not support. However, this only applies to - features that are exposed via a C++ type or function. Examples of this - include `sycl::half` or instantiating `sycl::atomic_ref` for a 64-bit type. - If the kernel relies on optional features that are more "notional" such as - sub-group independent forward progress - (`info::device::sub_group_independent_forward_progress`), no exception is - required. +This exception, however, is only required for features that are exposed via a +C++ type or function. Examples of this include `sycl::half` or instantiating +`sycl::atomic_ref` for a 64-bit type. If the kernel relies on optional +features that are more "notional" such as sub-group independent forward +progress (`info::device::sub_group_independent_forward_progress`), no exception +is required. -* For a kernel that is decorated with `[[sycl::requires()]]`: the exception +To further clarify, this exception must be thrown in the following +circumstances: + +* For a kernel that is not decorated with `[[sycl::requires()]]`, the exception + must be thrown if the kernel uses a feature that the device does not support. + +* For a kernel that is decorated with `[[sycl::requires()]]`, the exception must be thrown if the device does not have the aspects listed in that attribute. Note that the exception must be thrown even if the kernel does not actually use a feature corresponding to the aspect, and it must be thrown even if the aspect does not correspond to any optional feature. -* For a kernel that is decorated with `[[sycl::requires()]]`: the exception - must be thrown if a function in the kernel's static call tree uses a feature - that the device does not support even if the `[[sycl::requires()]]` attribute - is missing the corresponding aspect. This case can only occur if the kernel - calls a `SYCL_EXTERNAL` function in another translation unit and the function - uses an optional feature that is not listed in the `[[sycl::requires()]]` - attribute attached to the `SYCL_EXTERNAL` function declaration. (In any - other case, the front-end compiler would diagnose an error.) Although the - SYCL 2020 spec says that such applications are non-conformant, it is easy for - DPC++ to throw an exception in such a case. - -* The kernel is decorated with the `[[sycl::reqd_work_group_size(W)]]` or - `[[sycl::reqd_sub_group_size(S)]]` attribute, and the device does not support - the work group size `W` or the sub-group size `S`. +* For a kernel that is decorated with `[[sycl::requires()]]`, the compiler will + mostly check (at compile time) whether the kernel uses any features that are + not listed in the attribute. The only case not checked at compile time is + when a kernel calls a device function that is defined in a shared library. + Therefore, the runtime is responsible for throwing the exception if a + kernel's function (defined in a shared library) uses an optional feature + that the device does not support. + +* For a kernel that is decorated with the `[[sycl::reqd_work_group_size(W)]]` + or `[[sycl::reqd_sub_group_size(S)]]` attribute, the exception must be thrown + if the device does not support the work group size `W` or the sub-group size + `S`. Note that the exception must be thrown synchronously, not delayed and thrown on the queue's asynchronous handler. @@ -211,11 +253,11 @@ When the front-end compiler sees a kernel or device function that is decorated with `[[sycl::requires()]]`, it forms the set of allowed aspects for that kernel or device function using aspects listed in the attribute. Let's call this the `Allowed` set. The front-end then computes the static call tree of -that kernel or device function and forms the union of all aspects in any -`[[sycl::requires()]]` attributes that decorate any of these functions or any -of the types used inside these functions. Let's call this the `Used` set. If -the `Used` set contains any aspects not in the `Allowed` set, the front-end -issues a diagnostic. +that kernel or device function (examining only code within this translation +unit) and forms the union of all aspects in any `[[sycl::requires()]]` +attributes that decorate any of these functions or any of the types used inside +these functions. Let's call this the `Used` set. If the `Used` set contains +any aspects not in the `Allowed` set, the front-end issues a diagnostic. In order to be user-friendly, the diagnostic should point the user to the location of the problem. Therefore, the diagnostic message should include the @@ -301,8 +343,8 @@ OpDecorate %11 SpecId 1 ; External ID for spec const A1 OpDecorate %12 SpecId 2 ; External ID for spec const A2 %10 = OpTypeBool -%11 = OpSpecConstantTrue %10 ; Represents A1 -%12 = OpSpecConstantTrue %10 ; Represents A2 +%11 = OpSpecConstantFalse %10 ; Represents A1 +%12 = OpSpecConstantFalse %10 ; Represents A2 %13 = OpSpecConstantOp %10 LogicalAnd %11 %12 ; Represents A1 && A2 ``` @@ -320,8 +362,8 @@ OpDecorate %12 SpecId 2 ; External ID for spec const A2 OpDecorate %16 ConditionalINTEL %13 ; Says to discard the function ; below when (A1 && A2) is False %10 = OpTypeBool -%11 = OpSpecConstantTrue %10 ; Represents A1 -%12 = OpSpecConstantTrue %10 ; Represents A2 +%11 = OpSpecConstantFalse %10 ; Represents A1 +%12 = OpSpecConstantFalse %10 ; Represents A2 %13 = OpSpecConstantOp %10 LogicalAnd %11 %12 ; Represents A1 && A2 %14 = OpTypeVoid %15 = OpTypeFunction %14 @@ -349,6 +391,10 @@ The post-link tool must be modified to add the SPIR-V `ConditionalINTEL` decorations to the appropriate functions and to emit the specialization constants that these decorations reference. This can be done with two passes over the static call tree for each kernel and each exported device function. +When this phase computes the static call tree, it considers all code in any of +the translation units that are being linked together. This may not be the +complete call tree, however, in cases where a kernel calls out to a device +function that is defined in a different shared library. **NOTE**: In this context, "exported device function" means a device function that is exported from a shared library as defined by [Device Code Dynamic @@ -356,42 +402,52 @@ Linking][7]. [7]: -The first pass operates on each kernel and each exported device function, -iterating over all the functions in the static call tree of that kernel or -exported device function to form the union of all aspects it requires. If the -kernel or exported device function is decorated with the -`[[sycl::requires()]]`, those aspects are also added to the union. - -**NOTE**: This first pass traverses the static call tree even for kernels that -are decorated with the `[[sycl::requires()]]` attribute. Although the -front-end compiler has already verified that the kernel doesn't require any -aspects beyond those listed in the attribute, the front-end compiler was only -able to verify this for the device functions that reside in the same -translation unit as the kernel. Therefore, we might still find more required -aspects in the post-link tool because we have visibility to all device -functions in the kernel's static call tree, even those that are defined in -other translation units. - -Once we have the full set of aspects used by each kernel and exported device -function, we do the following for each: - -* For each required aspects, emit an `OpSpecConstantTrue` instruction to - represent this requirement. We maintain a set of "required specialization - constants" for each kernel or exported device function, which is used later. - Add this specialization constant to that set. In addition, add an "aspect" - entry to the device image's "SYCL/requirements" property set, as described - below. (We could instead emit `OpSpecConstantFalse`. It doesn't matter - because the runtime will always provide a value for these specialization - constants.) +The first pass operates on the static call tree for each kernel and each +exported device function, propagating the aspects that are used up from the +leaves of the call tree. The result of this pass is that each function in +the call tree is labeled with the union of all aspects that are used in that +function or in any of the functions it calls. We call this the `Used` set of +aspects. + +The error checking in the front-end of the compiler has already verified that +a function decorated with the `[[sycl::requires()]]` attribute does not use +any optional features other than those listed in the attribute. However, the +static call tree constructed by the front-end may not be as complete as the +call tree constructed by the post-link tool, for example when a kernel calls +a `SYCL_EXTERNAL` device function that is defined in another translation unit. +Since the effort is minimal, we do the error checking again in the post-link +tool in order to catch more errors. + +If any of the device functions was annotated with the `[[sycl::requires()]]` +attribute, we call the set of aspects in that attribute the `Allowed` set. +If the `Used` set contains any aspects not in the `Allowed` set, we issue a +diagnostic. Note that we do this analysis for every device function, not +just the ones that correspond to kernels or to exported device functions. + +**TODO**: Can this diagnostic include the source position of the attribute and +the source position of the code that uses optional feature? It think this +depends on the information in the LLVM IR, which is not defined yet. + +After checking for diagnostics, we compute the union of the `Used` set and the +`Allowed` set (if any) for each kernel and each exported device function. We +call this the function's `Required` set of aspects. We then do the following +for each kernel and each exported device function: + +* For each aspect in the `Required` set, emit an `OpSpecConstantFalse` + instruction to represent this requirement. We maintain a set of "required + specialization constants" for each kernel or exported device function, which + is used later. Add this specialization constant to that set. In addition, + add an "aspect" entry to the device image's "SYCL/requirements" property set, + as described below. * If the kernel function is decorated with the `[[reqd_work_group_size()]]` - attribute, emit an `OpSpecConstantTrue` op to represent this requirement and + attribute, emit an `OpSpecConstantFalse` op to represent this requirement and add this also to the kernel's set of required specialization constants. In addition, add a "reqd\_work\_group\_size" entry to the device image's "SYCL/requirements" property set. * If the kernel function is decorated with the `[[reqd_sub_group_size()]]` - attribute, emit an `OpSpecConstantTrue` op to represent this requirement and + attribute, emit an `OpSpecConstantFalse` op to represent this requirement and add this also to the kernel's set of required specialization constants. In addition, add a "reqd\_sub\_group\_size" entry to the device image's "SYCL/requirements" property set. @@ -473,7 +529,7 @@ Parameter | Definition Note that the post-link tool will generate a series of `OpSpecConstantOp` ops when the kernel or exported device function has multiple requirements. However, each property list entry contains only the `SpecId` of the -`OpSpecConstantTrue` op that is associated with a single requirement. +`OpSpecConstantFalse` op that is associated with a single requirement. #### Modifications to the DPC++ runtime From f67816befcf8bdb4afc68e26cedc68759eedbf16 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Tue, 1 Jun 2021 15:24:03 -0400 Subject: [PATCH 09/22] Address more review comments * Clarify that the new "SYCL/requirements" property list has properties of type `BYTE_ARRAY`, which is one of the existing property types. --- sycl/doc/OptionalDeviceFeatures.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index f4b4a5d10e8bf..b3a887f6c0ad2 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -487,7 +487,7 @@ aspects that each kernel or exported device function requires and the work-group or sub-group sizes that each kernel requires. This property set is named "SYCL/requirements". The name of each property in the set is the name of a kernel or the name of an exported device function in the device image. The -value of each property has the following form: +value of each property is a `BYTE_ARRAY` that has the following form: ``` [entry_count (uint32)] From 7a6e0797a5634f5bc771f41866cf1b11b8ae5c90 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Mon, 7 Jun 2021 15:28:34 -0400 Subject: [PATCH 10/22] Several large changes in design * After some internal discussions, we decided to use an alternate design that does not rely on an extension to SPIR-V. Instead, we change the device code split algorithm in the post-link tool to bundle two kernels together only if they have the same requirements (same aspects and same required sub-group size). We feel this will make it easier to support other backends in the future. * In the section describing changes to the front-end, add a more precise definition for what it means to "use an aspect". * Add a description of the LLVM IR changes that are needed in order to pass information from the front-end to the post-link tool. * Add more information about changes needed in order to support AOT mode, but more work is still required here. * We decided that the front-end should not be responsible for diagnosing errors with the `[[sycl::requires()]]` attribute, and that a separate clang static analysis pass should do this instead. Start a description of this pass, but more work is required here too. --- sycl/doc/OptionalDeviceFeatures.md | 853 +++++++++--------- .../SPIRV/SPV_INTEL_spec_conditional.asciidoc | 694 -------------- 2 files changed, 423 insertions(+), 1124 deletions(-) delete mode 100644 sycl/doc/extensions/SPIRV/SPV_INTEL_spec_conditional.asciidoc diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index b3a887f6c0ad2..7c64cf416d657 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -36,31 +36,33 @@ these is described in more detail in the sections that follow: speculative compilation of a kernel for a device, when the application does not specifically submit the kernel to that device. -### Clarification of a kernel's static call tree - -Some of the requirements below refer to the "static call tree" of a kernel. In -standard SYCL, device code is not allowed to contain function pointers, virtual -function, or indirect function calls. It is therefore easy to compute the -static call tree of a kernel. By starting at the kernel function itself (e.g. -the function passed to `parallel_for`), the compiler can identify all functions -called by that function, then it can find all functions called by those -functions, etc. Depending on the tool which does the analysis, the "static -call tree" could include only those functions that reside in the same + +### Clarification of a kernel's static call graph + +Some of the requirements below refer to the "static call graph" of a kernel. +In standard SYCL, device code is not allowed to contain function pointers, +virtual functions, or indirect function calls. It is therefore easy to compute +the static call graph of a kernel. By starting at the kernel function itself +(e.g. the function passed to `parallel_for`), the compiler can identify all +functions called by that function, then it can find all functions called by +those functions, etc. Depending on the tool which does the analysis, the +"static call graph" could include only those functions that reside in the same translation unit as the kernel, or it could include all functions that reside in the same executable image (or shared library) as the kernel. In the sections below, we try to make the distinction clear whenever we refer to a -kernel's static call tree. +kernel's static call graph. We are contemplating a DPC++ extension that would allow some limited use of function pointers in device code. This feature is not yet fully defined or supported. We expect that the semantics of this feature will include some way for the compiler to deduce a limited set of possible targets for each indirect function call. Therefore, it is still possible for the compiler to construct a -"static call tree" for each kernel, the only difference is that each call site -now adds a set of possible target functions to a kernel's static call tree. +"static call graph" for each kernel, the only difference is that each call site +now adds a set of possible target functions to a kernel's static call graph. The details about how this will work are expected to be included in the DPC++ extension specification that enables indirect function calls. + ### Diagnostics from the front-end compiler By "front-end compiler", we mean the DPC++ compiler which parses DPC++ source @@ -79,17 +81,18 @@ device function is allowed to use only those optional features which are listed by the attribute. Therefore, the front-end compiler must issue a diagnostic if the kernel or device function uses any other optional kernel features. -The SYCL 2020 specification only mandates this error when a kernel or device -function that is decorated with `[[sycl::requires()]]` uses an optional kernel -feature (not listed in the attribute), **and** when that use is in the kernel's -static call tree as computed for the translation unit that contains the kernel -function. Thus, the compiler is not required to diagnose an error if the use -is in a `SYCL_EXTERNAL` function that is defined in another translation unit. +The SYCL 2020 specification only mandates this diagnostic when a kernel or +device function that is decorated with `[[sycl::requires()]]` uses an optional +kernel feature (not listed in the attribute), **and** when that use is in the +kernel's static call graph as computed for the translation unit that contains +the kernel function. Thus, the compiler is not required to issue a diagnostic +if the use is in a `SYCL_EXTERNAL` function that is defined in another +translation unit. It turns out, though, that DPC++ can diagnose this case at link time, when we have visibility into device functions that are defined in other translation -units. Since the design proposed below allows this error to be checked with -minimal extra effort, it is desirable to do so. +units. Since the design proposed below allows this diagnostic to be checked +with minimal extra effort, it is desirable to do so. Note that this behavior does not change when the compiler runs in AOT mode. Even if the user specifies a target device via "-fsycl-targets", that does not @@ -101,6 +104,7 @@ library without getting errors. Therefore the AOT compiler must not fail simply because the middleware header contains device code for devices that are not being compiled for. + ### Runtime exception if device doesn't support feature When the application submits a kernel to a device via one of the kernel @@ -110,9 +114,9 @@ kernel uses an unsupported feature, the runtime must throw a synchronous `errc::kernel_not_supported` exception. When doing these checks, the runtime must consider all uses of optional -features in the kernel's static call tree, regardless of whether those uses are -in the same translation unit as the kernel and regardless of whether those uses -come from device code in a shared library. +features in the kernel's static call graph, regardless of whether those uses +are in the same translation unit as the kernel and regardless of whether those +uses come from device code in a shared library. This exception, however, is only required for features that are exposed via a C++ type or function. Examples of this include `sycl::half` or instantiating @@ -135,11 +139,10 @@ circumstances: * For a kernel that is decorated with `[[sycl::requires()]]`, the compiler will mostly check (at compile time) whether the kernel uses any features that are - not listed in the attribute. The only case not checked at compile time is - when a kernel calls a device function that is defined in a shared library. - Therefore, the runtime is responsible for throwing the exception if a - kernel's function (defined in a shared library) uses an optional feature - that the device does not support. + not listed in the attribute. However, this check only results in a warning, + so the runtime is still responsible for throwing the exception if any of the + functions called by the kernel uses an optional feature that the device does + not support. * For a kernel that is decorated with the `[[sycl::reqd_work_group_size(W)]]` or `[[sycl::reqd_sub_group_size(S)]]` attribute, the exception must be thrown @@ -149,19 +152,20 @@ circumstances: Note that the exception must be thrown synchronously, not delayed and thrown on the queue's asynchronous handler. + ### No runtime exception for speculative compilation It is currently common for the runtime to speculatively compile some kernels. For example, DPC++ may bundle all kernels from the same translation unit -together into a single device image. When the application submits one kernel K -to a device D, the runtime actually compiles all kernels in K's device image -for device D. Let's assume in this example that the kernel K uses only -features that are supported by D. It would be illegal for the runtime to throw -an exception in such a case just because some other kernel in the same device -image uses a feature that is not supported by device D. +together into a single device image. When the application submits one kernel +*K* to a device *D*, the runtime actually compiles all kernels in *K*'s device +image for device *D*. Let's assume in this example that the kernel *K* uses +only features that are supported by *D*. It would be illegal for the runtime +to throw an exception in such a case just because some other kernel in the same +device image uses a feature that is not supported by device *D*. -## Design to implement required diagnostics +## Design ### Changes to DPC++ headers @@ -173,448 +177,437 @@ is an optional feature which is only supported on devices that have the describe the set of optional features that a kernel uses (with the exception of the required work-group or sub-group size). -As we will see later, it will be very helpful to decorate all APIs in DPC++ -headers that correspond to optional kernel features with the -`[[sycl::requires()]]` attribute. For example, the declaration of the -`sycl::half` type would look like this: +As we will see later, it is helpful to decorate all APIs in DPC++ headers that +correspond to optional kernel features with a C++ attribute that identifies the +associated aspect. We cannot use the `[[sycl::requires()]]` attribute for this +purpose, though, because that attribute is allowed only for functions. +Instead, we invent a new internal attribute `[[sycl_detail::uses_aspects()]]` +that can be used to decorate both functions and types. This attribute is not +documented for user code; instead it is an internal implementation detail of +DPC++. + +Like all use of C++ attributes in the DPC++ headers, the headers should spell +the attribute using initial and trailing double underscores +(`[[__sycl_detail__::__uses_aspects__()]]`). We show that form in the code +samples below, but this design document uses the form without the underbars +elsewhere. Both forms refer to the same attribute. + +To illustrate, the type `sycl::half` is an optional feature whose associated +aspect is `aspect::fp16`. We therefore decorate the declaration like this: ``` -using half [[sycl::requires(has(aspect::fp16))]] = cl::sycl::detail::half_impl::half; +using half [[__sycl_detail__::__uses_aspects__(has(aspect::fp16))]] = + cl::sycl::detail::half_impl::half; ``` -In cases where the optional feature corresponds to use of a class (e.g. -`sycl::atomic_ref`), the declaration can look like this: +If an optional feature is expressed as a class type, it can be similarly +decorated (here illustrating a hypothetical AMX type): ``` -template -class [[sycl::requires(has(aspect::atomic64))]] atomic_ref { +class [[__sycl_detail__::__uses_aspects__(has(aspect::ext_intel_amx))]] amx_type { /* ... */ }; ``` -(In reality, we can use partial specialization tricks to decorate `atomic_ref` -with the attribute only when the underlying type is 64-bits. See ["Appendix: -Adding an attribute to 8-byte `atomic_ref`"][4].) - -[4]: <#appendix-adding-an-attribute-to-8-byte-atomic_ref> - -In cases where the optional feature corresponds to a function, we can decorate -the function's declaration with the attribute like so (demonstrating a -hypothetical AMX multiplication extension): +This attribute is also used to decorate function declarations that correspond +to optional features. Again, illustrating a hypothetical AMX extension: ``` -[[sycl::requires(has(aspect::ext_intel_amx))]] +[[__sycl_detail__::__uses_aspects__(has(aspect::ext_intel_amx))]] void amx_multiply(); ``` -These attributes provide an explicit mapping between each optional kernel -feature and its associated aspect. +This attribute can also be used to decorate class templates where only certain +instantiations correspond to optional features. See ["Appendix: Adding an +attribute to 8-byte `atomic_ref`"][4] for an illustration of how this attribute +can be used in conjunction with partial specialization to mark only certain +instantiations of `sycl::atomic_ref` as an optional feature. + +[4]: <#appendix-adding-an-attribute-to-8-byte-atomic_ref> + +As you can see from the examples above, the syntax for the parameter to the +`[[sycl_detail::uses_aspects()]]` attribute is identical to the syntax for the +standard `[[sycl::requires()]]` attribute. Unfortunately, the fundamental type `double` is also an optional kernel feature. Since there is no type alias for `double`, there is no convenient place to add an attribute. Instead, the front-end device compiler must behave -as though there was an implicit `[[sycl::requires(has(aspect::fp64))]]` -attribute for any device code that uses the `double` type. - -Note that the usage of `[[sycl::requires()]]` is slightly expanded here beyond -the specified usage in the SYCL 2020 specification because we allow the -attribute also on type alias declarations and class declarations. If a device -function does any of the following with a type alias or class that was so -decorated, the function is assumed to "use the aspects" listed in the -attribute: - -* Declares a variable of that type. -* Has a formal parameter declared with that type. -* Returns that type. - -This also includes any qualified version of the type. - -**TODO**: This language is not very precise. The intent is to include most -uses of the type, except for cases like `sizeof(T)` or `decltype(T)`. Help -appreciated on tightening the wording here. - -**TODO**: Would it be better to use a different attribute name when decorating -types, rather than expanding the meaning of `[[sycl::requires()]]`? If we did -this, the new attribute would become an internal DPC++ implementation detail; -we would not add it to the SYCL specification. - -### Implementing diagnostics in the DPC++ front-end - -As noted above, the front-end device compiler must behave as though there is an -implicit `[[sycl::requires(has(aspect::fp64))]]` attribute on any use of the -`double` type in device code. - -Aside from this, the front-end compiler can implement the required diagnostics -purely from the C++ attributes that have been added to the DPC++ headers. -There is no need for the front-end compiler to know which device features are -optional. - -When the front-end compiler sees a kernel or device function that is decorated -with `[[sycl::requires()]]`, it forms the set of allowed aspects for that -kernel or device function using aspects listed in the attribute. Let's call -this the `Allowed` set. The front-end then computes the static call tree of -that kernel or device function (examining only code within this translation -unit) and forms the union of all aspects in any `[[sycl::requires()]]` -attributes that decorate any of these functions or any of the types used inside -these functions. Let's call this the `Used` set. If the `Used` set contains -any aspects not in the `Allowed` set, the front-end issues a diagnostic. - -In order to be user-friendly, the diagnostic should point the user to the -location of the problem. Therefore, the diagnostic message should include the -following information: - -* The source position of the `[[sycl::requires()]]` attribute that decorates - the kernel or device function which provides the `Allowed` aspect set. This - tells the user which aspects the kernel or device function intends to use. - -* The source position of the call to a function that is decorated with - `[[sycl::requires()]]` or the source position of the use of a type that is - decorated with `[[sycl::requires()]]`. This tells the user where in the - kernel a particular aspect is actually used. - -Note that this analysis can be done in the front-end compiler even when a -kernel makes a call to a function that is in another translation unit. -Language rules require the application to declare such a function with -`SYCL_EXTERNAL` in the calling TU, and the `SYCL_EXTERNAL` declaration must be -decorated with the `[[sycl::requires()]]` attribute. Therefore, the front-end -can diagnose errors with aspect usage even without seeing the definition of the -`SYCL_EXTERNAL` function. - - -## Design to raise required exceptions (and avoid forbidden errors) - -As described above the runtime must raise an `errc::kernel_not_supported` -exception when a kernel is submitted to a device that does not support the -optional features that the kernel uses. Likewise, the runtime must **not** -raise an exception (or otherwise produce an error) due to speculative -compilation of a kernel for a device, unless the application actually submits -the kernel to that device. The solution is largely the same for both JIT and -AOT cases. - -### JIT case - -The JIT case requires some change to the way kernels and device functions are -bundled together into device images. Currently, kernels and device functions -are bundled together regardless of the features they use, and this can lead to -JIT errors due to speculative compilation. Consider a device image that -contains two kernels: `K1` uses no optional features and `K2` uses an optional -feature that corresponds to aspect `A`. Now consider that the application -submits kernel `K1` to a device that does not support aspect `A`. Since the -two kernels are bundled together into one device image, the runtime really -compiles both kernels for the device. Currently, this will raise a JIT -exception because the compilation of kernel `K2` will fail when compiled for a -device that does not support aspect `A`. - -There are two ways to solve this problem. One is to change the way kernels and -device functions are bundled into device images such that we never bundled two -kernels or device functions together unless they require exactly the same set -of device aspects. Doing this would avoid the error described above. However, -we have elected for a different solution. - -Instead, we will allow kernels and device functions to be bundled together as -they currently are, but we will introduce extra decorations into the generated -SPIR-V that allow the JIT compiler to discard kernels and device functions -which require aspects that the device does not support. Although this solution -requires an extension to SPIR-V, we think it is the better direction because it -is aligned with the [device-if][5] feature, which will also requires this same -SPIR-V extension. - -[5]: - -The idea is to emit a SPIR-V specialization constant for each aspect that is -required by a kernel or device function in the device image. We then introduce -a new SPIR-V "decoration" that tells the JIT compiler to discard a function if -a specialization constant is `False`. The DPC++ runtime will set the values of -the specialization constants according to the target device, thus the JIT -compiler discards (and does not compile) any kernels or device functions that -use features which are not supported on that device. This avoids errors due to -speculative compilation of kernels or device functions. - -#### Representation in SPIR-V - -To illustrate how kernels using optional features are represented in SPIR-V, -consider a kernel `K` that requires aspects `A1` and `A2`. The SPIR-V module -will contain three boolean specialization constants: one representing `A1`, one -representing `A2`, and one representing the expression `A1 && A2`. All of -these can be represented without any extension to SPIR-V. +as though there was an implicit +`[[sycl_detail::uses_aspects(has(aspect::fp64))]]` attribute for any device +code that uses the `double` type. -``` -OpDecorate %11 SpecId 1 ; External ID for spec const A1 -OpDecorate %12 SpecId 2 ; External ID for spec const A2 -%10 = OpTypeBool -%11 = OpSpecConstantFalse %10 ; Represents A1 -%12 = OpSpecConstantFalse %10 ; Represents A2 -%13 = OpSpecConstantOp %10 LogicalAnd %11 %12 ; Represents A1 && A2 +### New LLVM IR metadata + +In order to communicate the information from `[[sycl::requires()]]` and +`[[sycl_detail::uses_aspects()]]` attributes to the DPC++ post-link tool, we +introduce two new LLVM IR metadata that can be attached to a function +definition, similar to the existing `!intel_reqd_sub_group_size`. + +These new metadata are named `!intel_allowed_aspects` and +`!intel_used_aspects`. In each case, the parameter is an (unnamed) metadata +node, and the value of the metadata node is a list of `i32` constants, where +each constant is a value from `enum class aspect`. For example, the following +illustrates the IR that corresponds to a function `foo` that is decorated with +`[[sycl::requires()]]` where the required aspects have the numerical values +`8` and `9`. In addition, the function uses an optional feature that +corresponds to an aspect with numerical value `8`. + +``` +define void @foo() !intel_allowed_aspects !1 !intel_used_aspects !2 {} +!1 = !{i32 8, i32 9} +!2 = !{i32 8} ``` -In order to make it easy for the JIT compiler to discard all functions in a -kernel, each function in the kernel's static call tree (including the function -representing the kernel's entry point) is decorated with a new extended SPIR-V -decoration `ConditionalINTEL` whose operand is the `` of the specialization -constant representing `A1 && A2`. The semantic of this decoration is that the -JIT compiler must discard the function unless the value of the specialization -constant is `True`. Augmenting the example from above: + +### Changes to the DPC++ front-end + +The front-end of the device compiler is responsible for parsing the +`[[sycl::requires()]]` and `[[sycl_detail::uses_aspects()]]` attributes and +transferring the information to the LLVM IR `!intel_allowed_aspects` and +`!intel_used_aspects` metadata. Processing the `[[sycl::requires()]]` +attribute is straightforward. When a device function is decorated with +this attribute, the front-end emits an `!intel_allowed_aspects` metadata +on the function definition with the numerical values of the aspects in +the attribute. + +The front-end also emits an `!intel_uses_aspects` metadata for a function *F* +listing all the aspects that the function "uses". A function "uses" an aspect +in the following cases: + +* The function *F* contains a potentially evaluated expression that makes a + direct call (i.e. not through a function pointer) to some other function *C* + that is decorated with the `[[sycl_detail::uses_aspects()]]` attribute, and + that expression is not in a statement that is discarded through + `constexpr if`. In this case, the function *F* uses all of the aspects named + in that attribute. + +* The function *F* contains a potentially evaluated expression that does any of + the following with a type *T* that is decorated with the + `[[sycl_detail::uses_aspects()]]` attribute, and that expression is not in a + statement that is discarded through `constexpr if`. In this case, the + function *F* uses all of the aspects named in that attribute: + + - Defines an object (including a temporary object) of type *U*. + - Calls a "new expression" of type *U*. + - Throws an expression of type *U*. + - Contains a cast to type *U*. + - References a literal of type *T*. + + Where the type *U* is any of the following: + + - The type *T*. + - A cv-qualified version of type *T*. + - An array of, pointer to, or reference to type *T*. + - A type that derives from type *T*. + - A class type that contains a non-static member object of type *T*. + - Any type that applies these rules recursively to type *T* (e.g. array of + pointers to type *T*, etc.) + + When applying these rules, the front-end treats any use of the `double` type + as though it was implicitly decorated with + `[[sycl_detail::uses_aspects(has(aspect::fp64))]]`. + +If the `[[sycl_detail::uses_aspects()]]` attribute decorates a base class +member function and a derived class overrides the member function, the +overriding member function does not automatically inherit the attribute. +Therefore, when the front-end considers the set of aspects used by a call to a +member function, it need not consider any `[[sycl_detail::uses_aspects()]]` +attributes that decorate overridden versions of the function. + +As noted earlier, standard SYCL does not allow indirect function calls or +virtual functions in device code, although a DPC++ extension that adds some +limited form of indirect function call is being contemplated. If this +extension allows virtual functions, we expect that when +`[[sycl_detail::uses_aspects()]]` decorates a virtual function, it applies only +to the static type of the class. Therefore, when the front-end considers the +set of aspects used by a virtual function call like the following: ``` -OpDecorate %11 SpecId 1 ; External ID for spec const A1 -OpDecorate %12 SpecId 2 ; External ID for spec const A2 -OpDecorate %16 ConditionalINTEL %13 ; Says to discard the function - ; below when (A1 && A2) is False -%10 = OpTypeBool -%11 = OpSpecConstantFalse %10 ; Represents A1 -%12 = OpSpecConstantFalse %10 ; Represents A2 -%13 = OpSpecConstantOp %10 LogicalAnd %11 %12 ; Represents A1 && A2 -%14 = OpTypeVoid -%15 = OpTypeFunction %14 - -%16 = OpFunction %14 None %15 ; Definition of function that is -... ; discarded when (A1 && A2) is False -OpFunctionEnd +void foo(Base *b) { + b->bar(); +} ``` -See the extension specification of [SpecConditional][6] for a full -description of this new SPIR-V decoration. +It considers only the `[[sycl_detail::uses_aspects()]]` attribute that may +decorate the definition of `Base::foo()` even though the application may pass a +pointer to a derived class which decorates `foo()` differently. -[6]: -#### Representation in LLVM IR +### Changes to other phases of clang -**TODO**: I need some help here on how to represent the `[[sycl::requires()]]` -attributes in LLVM IR. I suspect there is already some mechanism for -representing SYCL attributes in LLVM IR, so hopefully we can mostly reuse that -mechanism. +Any clang phases that do function inlining will need to be changed, so that the +`!intel_allowed_aspects` and `!intel_uses_aspects` metadata are transferred +from the inlined function to the function that receives the inlined function +body. Presumably, there is already similar logic for the existing +`!reqd_work_group_size` metadata, which already decorates device functions. -#### Modifications to the post-link tool -The post-link tool must be modified to add the SPIR-V `ConditionalINTEL` -decorations to the appropriate functions and to emit the specialization -constants that these decorations reference. This can be done with two passes -over the static call tree for each kernel and each exported device function. -When this phase computes the static call tree, it considers all code in any of -the translation units that are being linked together. This may not be the -complete call tree, however, in cases where a kernel calls out to a device -function that is defined in a different shared library. +### Changes to the post-link tool + +As noted in the requirements section above, DPC++ currently bundles kernels +together regardless of the optional features they use, and this can lead to +problems resulting from speculative compilation. To illustrate, consider +kernel *K1* that uses no optional features and kernel *K2* that uses a feature +corresponding to aspect *A*, and consider the case when *K1* and *K2* are +bundled together in the same device image. Now consider an application that +submits *K1* to a device that does not have aspect *A*. The application should +expect this to work, but DPC++ currently fails because JIT-compiling *K1* +causes the entire bundle to be compiled, and this fails when trying to compile +*K2* for a device that does not have aspect *A*. + +We solve this problem by changing the post-link tool to bundle kernels +according to the aspects that they use. + +The post-link tool is also a convenient place to issue a diagnostic when a +function uses aspects that it is not allowed to use (i.e. the function is +decorated with `[[sycl::requires()]]` and it uses some optional feature that +corresponds to an aspect that is not in the `[[sycl::requires()]]` list). + +The post-link tool achieves this by examining the static call graph of each +kernel and each exported device function. When this tool computes the static +call graph, it considers all code in any of the translation units that are +being linked together. This may not be the complete call graph, however, in +cases where a kernel calls out to a device function that is defined in a +different shared library. **NOTE**: In this context, "exported device function" means a device function that is exported from a shared library as defined by [Device Code Dynamic -Linking][7]. - -[7]: - -The first pass operates on the static call tree for each kernel and each -exported device function, propagating the aspects that are used up from the -leaves of the call tree. The result of this pass is that each function in -the call tree is labeled with the union of all aspects that are used in that -function or in any of the functions it calls. We call this the `Used` set of -aspects. - -The error checking in the front-end of the compiler has already verified that -a function decorated with the `[[sycl::requires()]]` attribute does not use -any optional features other than those listed in the attribute. However, the -static call tree constructed by the front-end may not be as complete as the -call tree constructed by the post-link tool, for example when a kernel calls -a `SYCL_EXTERNAL` device function that is defined in another translation unit. -Since the effort is minimal, we do the error checking again in the post-link -tool in order to catch more errors. - -If any of the device functions was annotated with the `[[sycl::requires()]]` -attribute, we call the set of aspects in that attribute the `Allowed` set. -If the `Used` set contains any aspects not in the `Allowed` set, we issue a -diagnostic. Note that we do this analysis for every device function, not -just the ones that correspond to kernels or to exported device functions. - -**TODO**: Can this diagnostic include the source position of the attribute and -the source position of the code that uses optional feature? It think this -depends on the information in the LLVM IR, which is not defined yet. - -After checking for diagnostics, we compute the union of the `Used` set and the -`Allowed` set (if any) for each kernel and each exported device function. We -call this the function's `Required` set of aspects. We then do the following -for each kernel and each exported device function: - -* For each aspect in the `Required` set, emit an `OpSpecConstantFalse` - instruction to represent this requirement. We maintain a set of "required - specialization constants" for each kernel or exported device function, which - is used later. Add this specialization constant to that set. In addition, - add an "aspect" entry to the device image's "SYCL/requirements" property set, - as described below. - -* If the kernel function is decorated with the `[[reqd_work_group_size()]]` - attribute, emit an `OpSpecConstantFalse` op to represent this requirement and - add this also to the kernel's set of required specialization constants. In - addition, add a "reqd\_work\_group\_size" entry to the device image's - "SYCL/requirements" property set. - -* If the kernel function is decorated with the `[[reqd_sub_group_size()]]` - attribute, emit an `OpSpecConstantFalse` op to represent this requirement and - add this also to the kernel's set of required specialization constants. In - addition, add a "reqd\_sub\_group\_size" entry to the device image's - "SYCL/requirements" property set. - -* If the kernel or exported device function's set of required specialization - constants is not empty, emit a series of `OpSpecConstantOp` ops with the - `OpLogicalAnd` opcode to compute the expression `S1 && S2 && ...`, where - `S1`, `S2`, etc. are the specialization constants in that set. In addition, - emit a `ConditionalINTEL` decoration for the function which references the - `S1 && S2 && ...` specialization constant. - -The second pass propagates each kernel or exported device function's required -specialization constants back down the static call tree. This pass starts by -assigning each each kernel entry function and each exported device function the -set of required specialization constants that were computed above. The set of -required specialization constants for each remaining function `F` is computed -as `P1 || P2 || ...`, where `P1`, `P2`, etc. are the parent functions of `F` in -the static call tree. (Obviously, a `Pn` term can be omitted if the parent -function has no required specialization constants.) Once we have this -information, we do the following for each function `F` that has a non-empty set -of required specialization constants: - -* Emit a series of `OpSpecConstantOp` ops with the `OpLogicalAnd` and - `OpLogicalOr` opcodes to compute the expression `P1 || P2 || ...` described - above. - -* Emit a `ConditionalINTEL` decoration for the function, referencing this - computed specialization constant. - -In all cases above, we should keep track of the specialization constants that -are emitted and reuse them when possible, rather than emitting duplicates. - -#### New device image property set - -A new device image property set is needed to inform the DPC++ runtime of the -aspects that each kernel or exported device function requires and the -work-group or sub-group sizes that each kernel requires. This property set is -named "SYCL/requirements". The name of each property in the set is the name of -a kernel or the name of an exported device function in the device image. The -value of each property is a `BYTE_ARRAY` that has the following form: +Linking][5]. + +[5]: + +#### Pass to identify aspects used by each device function + +This pass operates on the static call graph for each kernel and each exported +device function, propagating the aspects from the `!intel_used_aspects` and +`!intel_allowed_aspects` metadata from the leaves of the call graph up to their +callers. The result of this pass is that each device function is labeled with +a *Used* set of aspects which is computed as the union of the following: + +* The aspects in the function's `!intel_used_aspects` metadata (if any). +* The aspects in the function's `!intel_allowed_aspects` metadata (if any). +* The aspects in the *Used* set of all functions called by this function. + +Once the *Used* set of aspects is known for each function, the post-link tool +compares this set of aspects with the aspects from any `!intel_allowed_aspects` +metadata. If the function has this metadata and if the *Used* set contains +aspects not in that set, it issues a warning indicating that the function uses +aspects that are not in the `[[sycl::requires()]]` list. Unfortunately, the +post-link tool is unable to include the source position of the code that uses +the aspect in question. To compensate, the warning message must include +instructions telling the user how to run the clang static analyzer which +provides a better diagnostic. This analysis phase is described in more detail +below. + +#### Changes to the device code split algorithm + +The algorithm for splitting device functions into images must be changed to +account for the *Used* aspects of each kernel or exported device function. The +goal is to ensure that two kernels or exported device functions are only +bundled together into the same device image if their *Used* sets are identical. + +We must also split two kernels into different device images if they have +different `[[sycl::reqd_sub_group_size()]]` or different +`[[sycl::reqd_work_group_size()]]` values. The reasoning is similar as the +aspect case. The JIT compiler currently raises an error if it tries to compile +a kernel that has a required sub-group size if the size isn't supported by the +target device. The behavior for required work-group size is less clear. The +Intel implementation does not raise a JIT compilation error when compiling a +kernel that uses an unsupported work-group size, but other backends might. +Therefore, it seems safest to split device code based required work-group size +also. + +Therefore, two kernels or exported device functions are only bundled together +into the same device image if all of the following are true: + +* They share the same set of *Used* aspects, +* They either both have no required sub-group size or both have the same + required sub-group size, and +* They either both have no required work-group size or both have the same + required work-group size. + +These criteria are an additional filter applied to the device code split +algorithm after taking into account the `-fsycl-device-code-split` command line +option. If the user requests `per_kernel` device code split, then each kernel +is already in its own device image, so no further splitting is required. If +the user requests any other option, device code is first split according to +that option, and then another split is performed to ensure that each device +image contains only kernels or exported device functions that meet the criteria +listed above. + +#### Create the "SYCL/image-requirements" property set + +The DPC++ runtime needs some way to know about the *Used* aspects, required +sub-group size, and required work-group size of an image. Therefore, the +post-link tool provides this information in a new property set named +"SYCL/image-requirements". + +The following table lists the properties that this set may contain and their +types: + +Property Name | Property Type +------------- | ------------- +"aspect" | `PI_PROPERTY_TYPE_BYTE_ARRAY` +"reqd\_sub\_group\_size" | `PI_PROPERTY_TYPE_BYTE_ARRAY` +"reqd\_work\_group\_size" | `PI_PROPERTY_TYPE_BYTE_ARRAY` + +There is an "aspect" property if the image's *Used* set is not empty. The +value of the property is an array of `uint32` values, where each `uint32` value +is the numerical value of an aspect from `enum class aspect`. The size of the +property (which is always divisible by `4`) tells the number of aspects in the +array. + +There is a "reqd\_sub\_group\_size" property if the image contains any kernels +with a required sub-group size. The value of the property is a `uint32` value +that tells the required size. (The device code split algorithm ensures that +there are never two kernels with different required sub-group sizes in the same +image.) + +There is a "reqd\_work\_group\_size" property if the image contains any kernels +with a required work-group size. The value of the property is a `BYTE_ARRAY` +with the following layout: ``` -[entry_count (uint32)] -[entry_type (uint32)] -[entry_type (uint32)] -... -[entry_type (uint32)] + ... ``` -Where `entry_count` tells the number of subsequent entries. Each entry has a -variable number of parameters according to its type. The allowable types are: +Where `dim_count` is the number of work group dimensions (i.e. 1, 2, or 3), and +`dim0 ...` are the values of the dimensions from the +`[[reqd_work_group_size()]]` attribute, in the same order as they appear in the +attribute. -``` -enum { - aspect, - reqd_work_group_size, - reqd_sub_group_size -}; -``` +**NOTE**: One may wonder why the type of the "reqd\_sub\_group\_size" property +is not `PI_PROPERTY_TYPE_UINT32` since its value is always 32-bits. The +reason is that we may want to expand this property in the future to contain a +list of required sub-group sizes. Likewise, the "reqd\_work\_group\_size" +property may be expanded in the future to contain a list of required work-group +sizes. + + +### Changes specific to AOT mode + +In AOT mode, DPC++ normally invokes either the `ocloc` command or the +`opencl-aot` command on each SPIR-V device image to compile the SPIR-V into +native code for the devices specified by the `-fsycl-targets` command line +option. This causes a problem, though, for device images that use optional +features because these commands could fail if they attempt to compile SPIR-V +using an optional feature that is not supported by the target device. We +therefore need some way to avoid calling these commands in these cases. -The format of each entry type is as follows: +The overall design is as follows. The DPC++ installation includes a +configuration file that has one entry for each device that we support. Each +entry lists the set of aspects that the device supports and a list of the +sub-group sizes that it supports. DPC++ then consults this configuration +file to decide whether to invoke `ocloc` or `opencl-aot` on each SPIR-V device +image, using the information from the device image's "SYCL/image-requirements" +property set. + +#### Format of the configuration file + +The configuration file uses a simple YAML format where each top-level key is +the name of a device. There are sub-keys under each device for the supported +aspects and sub-group sizes. For example: ``` -[aspect (uint32)] [aspect_id (uint32)] [spec_id (uint32)] -[reqd_work_group_size (uint32)] [dim_count (uint32)] [dim0 (uint32)] ... [spec_id (uint32)] -[reqd_sub_group_size (uint32)] [dim (uint32)] [spec_id (uint32)] +gen9: + aspects: [1, 2, 3] + sub-group-sizes: [8, 16] +avx512: + aspects: [1, 2, 3, 9, 11] + sub-group-sizes: [8, 32] ``` -Where the parameter names have the following meaning: - -Parameter | Definition ---------- | ---------- -`aspect_id` | The value of the aspect from the `enum class aspect` enumeration. -`dim_count` | The number of work group dimensions (1, 2, or 3). -`dim0` ... | The value of a dimension from the `[[reqd_work_group_size]]` attribute. -`dim` | The value of the sub-group size from the `[[reqd_sub_group_size]]` attribute. -`spec_id` | The SPIR-V `SpecId` decoration for the specialization constant that the post-link tool generated for this requirement. - -Note that the post-link tool will generate a series of `OpSpecConstantOp` ops -when the kernel or exported device function has multiple requirements. -However, each property list entry contains only the `SpecId` of the -`OpSpecConstantFalse` op that is associated with a single requirement. - -#### Modifications to the DPC++ runtime - -Modifications are also required to the DPC++ runtime in order to set the values -of the specialization constants that correspond to each kernel requirement. In -addition, the runtime needs to check if the target device supports each of the -kernel's requirements, and it must raise an `errc::kernel_not_supported` -exception if it does not. - -When a kernel is submitted to a device, the runtime finds the device image that -contains the kernel and also finds the kernel's entry in the -"SYCL/requirements" property set. This entry tells the set of requirements for -the kernel. If the target device does not support all of these requirements, -then the runtime raises `errc::kernel_not_supported`. This check can be done -before the device image is JIT compiled, so the exception can be thrown -synchronously. - -If the kernel imports device function symbols from a shared library as defined -in [Device Code Dynamic Linking][7], the runtime first identifies all the -device images that define these exported device functions. Before attempting -to link them together, the runtime finds the entries for the exported device -functions in their "SYCL/requirements" property sets and checks that the device -supports all these requirements. If it does not, the runtime throws -`errc::kernel_not_supported`. - -Whenever the runtime submits a SPIR-V image to the backend for online -compilation, it must do the following additional steps: - -* Compute the union of all requirements from all entries in the image's - "SYCL/requirements" property set. - -* Query the target device to see whether it supports each of these - requirements, yielding either `True` or `False` for each one. - -* Set the value of each associated specialization constant when JIT compiling - the device image for this target device. - -Note that the runtime's cache of compiled device images does not need any -special modification because the cache already needs to know the values of all -the specialization constants that were used to compile the device image. We -just need to make sure the cache is also aware of the specialization constants -which correspond to the requirements from the "SYCL/requirements" property set. - -#### Modifications to the GEN compiler - -The GEN compiler, of course, needs to be modified to implement the new -`ConditionalINTEL` SPIR-V decoration. It must discard any function with this -decoration (unless the corresponding specialization constant is `True`), and it -must not raise any sort of error due to compilation of these discarded -functions. - -### AOT case - -The AOT case uses exactly the same solution as the JIT case described above, -but there is one extra steps. For the AOT case, the post-link tool must set -the values of the specialization constants that correspond to the requirements -for the kernel or exported device function, using the device named in the -"-fsycl-targets" command line option. After doing this, the post-link tool -calls the AOT compiler to generate native code from SPIR-V as it normally does. -If more than one target device is specified, the post-link tool sets the -specialization constants separately for each device before generating native -code for that device. - -Note that the native device image may not contain all kernels or all exported -device functions if they use optional features. Nevertheless, the -"SYCL/requirements" property set still has entries for all kernel functions and -all exported device functions. If the application attempts to invoke one of -the discarded kernels on a device (which does not support the kernel's -features), the runtime will see that the kernel is not supported by using -information from the "SYCL/requirements" property set, and the runtime will -raise an exception. Thus, the runtime will never attempt to invoke one of -these discarded kernels. Likewise, if a kernel imports a discarded device -function, the runtime will see that the device function is unsupported and -will raise an exception before attempting to perform the dynamic link. +The values of the aspects in this configuration file are just the numerical +values from the `enum class aspect` enumeration. + +One advantage to encoding this information in a textual configuration file is +that customers can update the file if necessary. This could be useful, for +example, if a new device is released before there is a new DPC++ release. In +fact, the DPC++ driver supports a command line option which allows the user +to select an alternate configuration file. + +**TODO**: Add more sections here describing the changes to the DPC++ driver +and related tools. Other things to describe are: + +* The names of the devices in the configuration file. +* The name of the DPC++ driver option that selects an alternate configuration + file. + + +### Changes to the DPC++ runtime + +The DPC++ runtime must be changed to check if a kernel uses any optional +features that the device does not support. If this happens, the runtime must +raise a synchronous `errc::kernel_not_supported` exception. + +When the application submits a kernel to a device, the runtime identifies all +the other device images that export device functions that are needed by the +kernel as described in [Device Code Dynamic Linking][5]. Before the runtime +actually links these images together, it compares each image's +"SYCL/image-requirements" against the features provided by the target +device. If any of the following checks fail, the runtime throws +`errc::kernel_not_supported`: + +* The "aspect" property contains an aspect that is not provided by the device, + or +* The "reqd\_sub\_group\_size" property contains a sub-group size that the + device does not support. + +There is no way currently for the runtime to query the work-group sizes that a +device supports, so the "reqd\_work\_group\_size" property is not checked. We +include this property in the set nonetheless for possible future use. + +If the runtime throws an exception, it happens even before the runtime tries to +access the contents of the device image. + + +### Clang static analyzer to diagnose unexpected aspect usage + +When a device function is decorated with the `[[sycl::requires()]]` attribute, +it is an assertion that the function (and all of the functions it calls) do not +use any optional features beyond those listed in the attribute. The post-link +tool diagnoses a warning if the function does use additional aspects, but this +diagnostic is not user-friendly for two reasons: + +* It does not contain the source position of the offending code, so it is + difficult for the user to identify the location of the problem. + +* The diagnostic happens at link time instead of compile-time, so the user + doesn't learn about it until late in the build process. + +Ideally, we would diagnose these cases in the compiler front-end, but we +believe this is contrary to the clang design principles because +inter-procedural analysis is required to identify these cases, and the clang +front-end does not normally do inter-procedural analysis. Instead, clang +normally uses a static analyzer phase to diagnose cases like this which require +deeper analysis. + +The downside is that clang does not execute the static analyzer by default, so +the user won't see these diagnostics unless they ask for the analysis. This is +one reason we issue these diagnostics in the post-link tool, which is always +run for SYCL applications. + +If this static analyzer phase is enabled, it operates on a single translation +unit, analyzing the static call graph of any device function that is decorated +with `[[sycl::requires()]]`. + +**TODO**: More information here about how it works. Presumably, the static +analyzer operates on LLVM IR, but the IR we generate in the front end doesn't +have information about source location of "used" aspects. ## Appendix: Adding an attribute to 8-byte `atomic_ref` -As described above under ["Changes to DPC++ headers"][8], we need to decorate the -`atomic_ref` type with the `[[sycl::requires()]]` attribute only when it is -specialized with an 8-byte type. This can be accomplished with some template -partial specialization tricks. The following code snippet demonstrates (best -read from bottom to top): +As described above under ["Changes to DPC++ headers"][6], we need to decorate +any SYCL type representing an optional device feature with the +`[[sycl_detail::uses_aspects()]]` attribute. This is somewhat tricky for +`atomic_ref`, though, because it is only an optional feature when specialized +for a 8-byte type. However, we can accomplish this by using partial +specialization techniques. The following code snippet demonstrates (best read +from bottom to top): -[8]: <#changes-to-dpc-headers> +[6]: <#changes-to-dpc-headers> ``` namespace sycl { @@ -642,8 +635,8 @@ class atomic_ref_impl : public atomic_ref_impl_base { // Explicit specialization for 8-byte types. Only this specialization has the // attribute. template -class [[sycl::requires(has(aspect::atomic64))]] atomic_ref_impl : - public atomic_ref_impl_base { +class [[__sycl_detail__::__uses_aspects__(has(aspect::atomic64))]] + atomic_ref_impl : public atomic_ref_impl_base { public: using atomic_ref_impl_base::atomic_ref_impl_base; }; diff --git a/sycl/doc/extensions/SPIRV/SPV_INTEL_spec_conditional.asciidoc b/sycl/doc/extensions/SPIRV/SPV_INTEL_spec_conditional.asciidoc deleted file mode 100644 index ee68e6cdd0d77..0000000000000 --- a/sycl/doc/extensions/SPIRV/SPV_INTEL_spec_conditional.asciidoc +++ /dev/null @@ -1,694 +0,0 @@ -= SPV_INTEL_spec_conditional - -== Name Strings - -SPV_INTEL_spec_conditional - -== Contact - -To report problems with this extension, please open a new issue at: - -https://github.com/intel/llvm/issues - -== Contributors - -- Greg Lueck, Intel - -== Notice - -Copyright (c) 2021 Intel Corporation. All rights reserved. - -== Status - -Draft - -== Version - -[width="40%",cols="25,25"] -|======================================== -| Last Modified Date | 2021-05-18 -| Revision | 1 -|======================================== - -== Dependencies - -This extension is written against the SPIR-V Specification, -Version 1.5 Revision 5. - -This extension requires SPIR-V 1.0. - - -== Overview - -This extension introduces a way to have conditional branches in code, where the -condition is the value of a specialization constant. Since the condition will -be know at the time when SPIR-V is compiled into native code, the conditional -code is guaranteed to be removed if the condition is false, conceptually -similar to the C++ `if constexpr` statement. - - -== Extension Name - -To use this extension within a SPIR-V module, the following *OpExtension* must -be present in the module: - ----- -OpExtension "SPV_INTEL_spec_conditional" ----- - - -== Motivation - -The primary motivation for this extension is to support languages for offload -compute such as SYCL. However, we feel this extension could also support -similar use cases for shaders and other tools that use SPIR-V. - -=== Conditional code within a kernel - -Offload compute languages have a need to write kernels that have conditional -code based on the value of a specialization constant. The syntax could vary -from one language to another, but one hypothetical syntax might look like this: - -``` -void fancy() { - /* use features specific to this fancy GPU */ -} - -void fallback() { - /* use generic features */ -} - -void foo() { - specconstexpr bool isFancyGpu = /* get value of specialization constant */; - if specconstexpr (isFancyGpu) { - fancy(); - } - else { - fallback(); - } - specconstexpr int subGroupSize = /* get value of specialization constant */; - if speconstexpr (subGroupSize == 8) { - /* algorithm specific to device with sub group size of 8 */ - } -} -``` - -In this example, the offload kernel has two `if` statements that do something -conditionally based on the features that the target device provides. Since -these device features may not correspond to SPIR-V "capabilities", it's more -flexible to use specialization constants for the conditions rather than relying -on some extension to SPIR-V capabilities. The host runtime has greater -knowledge of the device features, and it can set the values of specialization -constants accordingly. - -It is important that the compiler consuming the SPIR-V is guaranteed to remove -the conditional code (in the case when the condition is false) because that -code may call intrinsic functions or make use of SPIR-V capabilities that are -not available on the target device. If the code was not removed, the SPIR-V -client compiler might fail to compile the code even if the control flow of the -kernel ensures it is never executed. - -=== Entire kernels that are conditional - -There are also cases when a SPIR-V module may contain entire kernels that use -features that are specific to certain devices. When such a module is compiled -for a device that does _not_ support these features, we need a way to exclude -these kernels from the compilation. Obviously, these kernels could not be run -on a device that does not support them, but the need to remove them prior to -compilation goes beyond the desire to optimize the compilation time. Rather, -we need to ensure that the compilation process doesn't fail while attempting to -compile a kernel for a device that does not support its features. - -The following code snippet illustrates a hypothetical, scenario: - -``` -void kernel1() { - /* uses generic features */ -} - -[[conditional(fancy)]] -void kernel2() { - /* uses features available only on "fancy" devices */ -} -``` - -In this hypothetical example, the kernel `kernel2()` is decorated with a C++ -attribute that associates the kernel with the specialization constant `fancy`. -The host runtime can now control whether this this kernel is compiled into the -module by setting the value of that specialization constant before compiling -the SPIR-V. - -One may ask why we cannot solve this problem instead by creating two modules: -one with `kernel1()` and the other with `kernel2()`. This is a fair criticism, -since this would also solve the problem without any extension to SPIR-V. -However, a SPIR-V extension that solves the first motivating example -(conditional code within a kernel) also provides almost everything we need for -this case too. It is more convenient (and less engineering effort) to use the -same solution for both cases. - - -== High level description - -As a general strategy, this extension adds new instructions that represent the -`if specconstexpr` statements in the hypothetical code snippets above as SPIR-V -control flow instructions, rather than as `#ifdef` like instructions. We feel -this strategy makes it easier to validate SPIR-V modules that use this -extension. At the same time, the new instructions have been designed such that -a tool can easily specialize (or partially specialize) a module with a simple -algorithm that replaces the extended instructions with normal SPIR-V control -flow instructions. Such a tool need not understand the control flow graph of -the module. - -=== Branching on specialization constants - -We add three new instructions to represent control flow that is conditioned on -a specialization constant: *OpBranchSpecConstantINTEL*, -*OpBranchSpecConstantWithElseINTEL*, and *OpPhiSpecConstantINTEL*. The first -two are similar to *OpBranchConditional* except that the condition is the -__ of a specialization constant. They also identify a range of control -flow blocks that must be removed when the condition is false (or that must be -removed when the condition is true for *OpBranchSpecConstantWithElseINTEL*). - -The *OpPhiSpecConstantINTEL* instruction is similar to *OpPhi*, except that it -is used when at least one of the merged values flows from an -*OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL* condition. - -Like *OpPhi*, *OpPhiSpecConstantINTEL* has a set of parameters for every parent -block. Each parent has three parameters: the __ of the parent block, the -__ of a variable that is defined when control flows from that parent block, -and an __ of a specialization constant that provides a condition that gates -the merged value. The first two __ parameters have the same meaning as a -regular *OpPhi*. The value of the condition parameter depends on whether the -parent comes from an *OpBranchSpecConstantINTEL* or -*OpBranchSpecConstantWithElseINTEL* condition: - -* If the parent is a block contained by the "then" range of - *OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL*, the - condition parameter is the same specialization constant __ as the - *OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL* - instruction. - -* If the parent is the _False Label_ in *OpBranchSpecConstantINTEL*, then the - condition parameter is the __ of a specialization constant that is the - logical negation of the specialization constant used by - *OpBranchSpecConstantINTEL*. - -* If the parent is a block contained by the "else" range of - *OpBranchSpecConstantWithElseINTEL*, then the condition parameter is the - __ of a specialization constant that is the logical negation of the - specialization constant used by *OpBranchSpecConstantWithElseINTEL*. - -* Otherwise, the condition parameter's value is zero. (The value zero is not a - legal __, so the value zero indicates that there is no associated - specialization constant for this parent.) - -This condition parameter allows tools to specialize SPIR-V more efficiently. - -A tool that specializes SPIR-V can do so by looking at each of these -instructions individually, without considering their context in the control -flow graph. Occurrences of *OpBranchSpecConstantINTEL* and -*OpBranchSpecConstantWithElseINTEL* are replaced with *OpBranch* to either the -_True Label_ or the _False Label_, according to the value of the specialization -constant. The specializing tool is also responsible for removing either the -"then" range of blocks or the "else" range of blocks that are associated with -the *OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL* -instruction. These ranges are easy to identify because they are guaranteed to -be contiguous in the SPIR-V module and the *OpBranchSpecConstantINTEL* / -*OpBranchSpecConstantWithElseINTEL* instructions provide pointers to the -ranges. - -Occurrences of *OpPhiSpecConstantINTEL* are replaced with regular *OpPhi*. -When constructing the parent parameters to *OpPhi*, the specialization tool -uses the parent's condition parameter: - -* If the condition parameter's value is zero, this parent has no associated - specialization constant and the parent's other two parameters are retained - in the *OpPhi*. Otherwise, the parent is associated with a specialization - constant: - -* If that specialization constant's value is *true*, the parent's other two - parameters are retained in the *OpPhi*. - -* If that specialization constant's value is *false*, the parent is omitted - from the *OpPhi*. - -The following example demonstrates this process. Consider this unspecialized -SPIR-V: - -``` - %int = OpTypeInt 32 0 - %bool = OpTypeBool - %spec = OpSpecConstantTrue %bool -%notspec = OpSpecConstantOp %bool LogicalNot %spec - - ... - - %top = OpLabel - %10 = OpIAdd %int .... - OpBranchSpecConstantINTEL %spec %true %true %false - %true = OpLabel - %11 = OpIAdd %int .... - OpBranch %false - %false = OpLabel - %12 = OpPhiSpecConstantINTEL %int %notspec %10 %top %spec %11 %true - - ... -``` - -Specializing this code such that *%spec* is *false* yields: - -``` - %int = OpTypeInt 32 0 - %bool = OpTypeBool - %spec = OpConstantFalse %bool -%notspec = OpConstantTrue %bool - - ... - - %top = OpLabel - %10 = OpIAdd %int .... - OpBranch %false - %false = OpLabel - %12 = OpPhi %int %10 %top - - ... -``` - -Note that the process of specialization can sometimes lead to blocks that have -only one parent and *OpPhi* instructions that have only one parent, as shown in -the example above. Normal optimizations in tools that consume SPIR-V can -optimize these cases, but such optimizations are not necessary for the -correctness of the specialized code. - -=== Conditional capabilities, functions, types, etc. - -Since the specialization process will remove blocks from the control flow graph -in some cases, it may be desirable to also remove functions that are called -only from these blocks. Likewise, it may be desirable to remove variables, -types, or constants that are used only in these blocks. This may be necessary -for correctness, for example, if a function that is called only from the -removed blocks uses device features or SPIR-V capabilities that are unavailable -on the device. (See the `fancy()` function in the motivation section for an -example of this.) - -To support this case, the extension adds a new *OpConditionalCapabilityINTEL* -instruction and a new *ConditionalINTEL* decoration. If a tool that generates -SPIR-V wants to guarantee that a function, variable, type, or constant is -removed when a specialization constant has a certain value, it must use the -*ConditionalINTEL* decoration to do this. Specializing tools have no -requirement to automatically find and remove these instructions, even if the -only reference is from blocks that the specializing tool removes. Likewise, if -a tool that generates SPIR-V wants to express that a capability is only -required when a specialization constant has a certain value, it must use the -*OpConditionalCapabilityINTEL* instruction. - -The *OpConditionalCapabilityINTEL* instruction is like *OpCapability* except it -has an __ parameter which references a specialization constant. This -instruction adds a requirement for the capability only if that specialization -constant's value is *true*. - -The *ConditionalINTEL* decoration instruction takes an __ operand that -references a specialization constant. The decorated instruction will be -removed during specialization if that specialization constant's value is -*false*. - -The specialization process is very straightforward for -*OpConditionalCapabilityINTEL*. This instruction is either removed or replaced -with *OpCapability* depending on the value of the specialization constant. - -When specializing an instruction that is decorated with *ConditionalINTEL* the -*ConditionalINTEL* decoration itself is always removed. In addition, the -following happens if the specialization constant is *false*: - -* If the decorated instruction is *OpFunction*, the function and all of its - instructions are removed. All decorations for the function and its - instructions are removed. If the *OpFunction* has an associated - *OpEntryPoint*, that is also removed. Any *OpName* or *OpMemberName* that - references the *OpFunction* or any of its instructions are also removed. - -* Otherwise, the decorated instruction is removed, all decorations for the - instruction are removed, and any *OpName* or *OpMemberName* referencing the - instruction are removed. - -Tools that generate SPIR-V are responsible for ensuring that the -*ConditionalINTEL* decoration is used such that an instruction that defines an -SSA __ is never removed unless all the references to that SSA __ are -also removed regardless of the values assigned to the specialization -constants. - -Since *ConditionalINTEL* may be applied to an instruction that defines a -specialization constant, there is the possibility of ambiguity. What if -specialization constant `A` is decorated with *ConditionalINTEL*, but `A` is -also used as the _Condition_ for *OpConditionalCapabilityINTEL*, -*OpBranchSpecConstantINTEL*, *OpBranchSpecConstantWithElseINTEL*, -*OpPhiSpecConstantINTEL*, or as the _Condition_ for another *ConditionalINTEL* -decoration? We avoid these ambiguities by making this situation illegal. -If a specialization constant __ is decorated with *ConditionalINTEL*, it -may not be used as a _Condition_ for any of these instructions or for the -_Condition_ in a *ConditionalINTEL* decoration. - -=== Validation - -In order to validate a module that uses this extension, we first apply the -normal validation rules assuming that either branch of -*OpBranchSpecConstantINTEL*, or *OpBranchSpecConstantWithElseINTEL* could be -taken at runtime. This essentially means that we treat these instructions as -though they were *OpBranchConditional*, we treat *OpPhiSpecConstantINTEL* as -though it was *OpPhi*, and we treat *OpConditionalCapabilityINTEL* as though it -was *OpCapability*. We then apply some additional validation rules to ensure -that the extension's instructions and decorations are used in a way that -results in consistent code. - -These additional validation rules start by computing a specialization constant -expression `G(i)` that gates usage of each instruction `i`. The value of -`G(i)` is computed with the following rules - -* Start with `G(i) = true`. - -* If the instruction resides in a "then" range of *OpBranchSpecConstantINTEL* - or *OpBranchSpecConstantWithElseINTEL*, let `G(i) = G(i) && S` where `S` is - the specialization constant referenced by *OpBranchSpecConstantINTEL* or - *OpBranchSpecConstantWithElseINTEL*. - -* If the instruction resides in an "else" range of - *OpBranchSpecConstantWithElseINTEL*, let `G(i) = G(i) && !S` where `S` is the - specialization constant referenced by *OpBranchSpecConstantWithElseINTEL*. - -* If the instruction resides in an *OpFunction* that is decorated with - *ConditionalINTEL*, let `G(i) = G(i) && S` where `S` is the specialization - constant referenced by the *ConditionalINTEL* decoration. - -* If the instruction itself is decorated with *ConditionalINTEL*, let - `G(i) = G(i) && S` where `S` is the specialization constant referenced by the - *ConditionalINTEL* decoration. - -We then apply the following validation rules: - -* If a module requires a capability `C` that can be statically checked, and if - that capability is required only through *OpConditionalCapabilityINTEL* - instructions, we compute the specialization constant expression `G(c)` that - is the logical "or" of the specialization constants used by each of these - *OpConditionalCapabilityINTEL* instructions. We then scan through the code - looking for instructions that use capability `C`. For each such instruction - `i`, validate that `G(i)` can never be true unless `G(c)` is also true. - -* For each instruction `idef` that defines an SSA __, search for all other - instructions `i` that use __. Compute `G(iuse)` as: -+ --- - - Let `G(iuse) = G(i)`. - - If instruction `i` is *OpPhiSpecConstantINTEL*, compute `G(iuse)` - separately for each _Parent i_ as `G(iuse) = G(i) && S` where `S` is the - specialization constant _Cond i_ associated with _Parent i_. --- -+ -Validate that each `G(iuse)` can never be true unless `G(idef)` is also true. - -* For each block that is contained by the "then" range of - *OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL*, verify - that each parent block is also contained by that same "then" range (allowing, - of course, that the *OpBranchSpecConstantINTEL* or - *OpBranchSpecConstantWithElseINTEL* instruction is a parent of the first - block in that range). - -* For each block that is contained by the "else" range of - *OpBranchSpecConstantWithElseINTEL*, verify that each parent block is also - contained by that same "else" range (allowing, of course, that the - *OpBranchSpecConstantWithElseINTEL* instruction is a parent of the first - block in that range). - -* For each (_Condition i_, _Variable i_, _Parent i_) triplet of - *OpPhiSpecConstantINTEL*: - - If _Parent i_ resides in the "then" range of *OpBranchSpecConstantINTEL* or - *OpBranchSpecConstantWithElseINTEL*, verify that _Condition i_ is the same - specialization constant __ as the *OpBranchSpecConstantINTEL* or - *OpBranchSpecConstantWithElseINTEL* instruction. - - - If _Parent i_ is the _False Label_ of *OpBranchSpecConstantINTEL* or - _Parent i_ resides in the "else" range of - *OpBranchSpecConstantWithElseINTEL*, verify that _Condition i_ is the - __ of a specialization constant that is the logical negation of the - specialization constant __ used by the *OpBranchSpecConstantINTEL* or - *OpBranchSpecConstantWithElseINTEL* instruction. - - - Otherwise, verify that _Condition i_ has the value zero. - - - If _Variable i_ is defined by a block in the "then" range of - *OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL*, or if - _Variable i_ is defined by a block in the "else" range of - *OpBranchSpecConstantWithElseINTEL*, verify that _Parent i_ is contained by - that same "then" or "else" range. - -* For each specialization constant (*OpSpecConstantXXX*) that is decorated with - *ConditionalINTEL*, verify that the specialization constant's __ is not - also used as a _Condition_ for *OpConditionalCapabilityINTEL*, - *OpBranchSpecConstantINTEL*, *OpBranchSpecConstantWithElseINTEL*, - *OpPhiSpecConstantINTEL*, or *ConditionalINTEL*. - -*TODO*: I'm looking for feedback on the validation rules that involve -comparison of abstract specialization constants such as "validate that `G(i)` -can never be true unless `G(c)` is also true". These validations are likely -very difficult in the general case, but most cases will be very easy. (In most -cases, I think that `G(i)` and `G(c)` will both be so simple that it will be -easy to prove whether one implies the other.) How should the spec handle this? -Are the validation rules _requirements_ that a validation tool must perform, or -are they just rules about what SPIR-V is not valid? If they are not -requirements, then we could leave the wording as I have it and then each -validation tool would be free to implement the validation checks as thoroughly -as it wants. - - -== New tokens defined by this extension - -=== New capabilities - -The module must declare that it uses the following capability in order to use -any of the decorations or instructions defined in the sections below. If -declared, this capability must be unconditionally declared via *OpCapability*. -It may not be conditionally declared via *OpConditionalCapabilityINTEL*. - -[cols="1,15,5",options="header",width="100%"] -|=== -2+^| Capability | Implicitly Declares -| ???? -| *SpecConditionalINTEL* + -Module conditionally enables code based on the value of a specialization -constant. -| -|=== - -=== New decorations - -[cols="1,10,5,5",options="header",width="100%"] -|==== -2+^| Decoration | Extra Operands | Enabling Capabilities -| ???? -| *ConditionalINTEL* + -May be applied only to *OpFunction*, global (module scope) *OpVariable*, type -declarations (*OpTypeXXX*), or constant instructions (*OpConstantXXX* or -*OpSpecConstantXXX*). Indicates that the decorated instruction must be removed -if the value of the specialization constant identified by _Condition_ is -*false*. The _Condition_ must be a _Boolean type_ scalar. - -If the decorated instruction is *OpFunction*, the function and all of the -instructions it contains are removed when the specialization constant is -*false*. If the function has an associated *OpEntryPoint*, that is also -removed. -| __ _Condition_ -|*SpecConditionalINTEL* -|==== - -=== New instructions - -[cols="1,1,2*3",width="100%"] -|=== -3+|*OpConditionalCapabilityINTEL* + - + -Declare a capability that is conditionally used by this module, depending on -the value of a specialization constant. - -The _Capability_ is used by this module only if the specialization constant -identified by _Condition_ is *true*. The _Condition_ must be a _Boolean type_ -scalar. - -1+|Capability: + -*SpecConditionalINTEL* -| 3 -| ???? -| __ _Condition_ -| _Capability_ -|=== - -[cols="1,1,4*3",width="100%"] -|=== -5+|*OpBranchSpecConstantINTEL* + - + -If the specialization constant _Condition_ is *true*, branch to _True Label_, -otherwise branch to _False Label_. The _Condition_ must be a _Boolean type_ -scalar. - -The consecutive blocks from _True Label_ to _True End_ (inclusive) are called -the "then" range of this instruction. If the _Condition_ is *false*, this -range of blocks is removed from the module. The module need not declare any -capabilities used by these instructions if they are removed. - -No block in the "then" range may have a parent that is outside of that range, -except for the _True Label_ reference from this *OpBranchSpecConstantINTEL* -instruction. - -This instruction must be the last instruction in a block. - -1+|Capability: + -*SpecConditionalINTEL* -| 5 -| ???? -| __ _Condition_ -| __ _True Label_ -| __ _True End_ -| __ _False Label_ -|=== - -[cols="1,1,5*3",width="100%"] -|=== -6+|*OpBranchSpecConstantWithElseINTEL* + - + -If the specialization constant _Condition_ is *true*, branch to _True Label_, -otherwise branch to _False Label_. The _Condition_ must be a _Boolean type_ -scalar. - -The consecutive blocks from _True Label_ to _True End_ (inclusive) are called -the "then" range of this instruction. The consecutive blocks from -_False Label_ to _False End_ (inclusive) are called the "else" range of this -instruction. If the _Condition_ is *false*, the "then" range is removed from -the module. If the _Condition_ is *true*, the "else" range is removed from the -module. The module need not declare any capabilities used by these -instructions if they are removed. - -No block in the "then" range may have a parent that is outside of that range, -except for the _True Label_ reference from this -*OpBranchSpecConstantWithElseINTEL* instruction. No block in the "else" range -may have a parent that is outside of that range, except for the _False Label_ -reference from this *OpBranchSpecConstantWithElseINTEL* instruction. - -This instruction must be the last instruction in a block. - -1+|Capability: + -*SpecConditionalINTEL* -| 6 -| ???? -| __ _Condition_ -| __ _True Label_ -| __ _True End_ -| __ _False Label_ -| __ _False End_ -|=== - -[cols="1a,1,3*3",width="100%"] -|=== -4+|*OpPhiSpecConstantINTEL* + - + -The SSA phi function, when one or more of the merged values is conditionally -gated by a specialization constant. This instruction must be used instead of -*OpPhi* when any of the following are true: - -* At least one of the parent blocks is the _False Label_ of an - *OpBranchSpecConstantINTEL* instruction. - -* At least one of the parent blocks is contained by the "then" range of an - *OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL* - instruction or is contained by the "else" range of an - *OpBranchSpecConstantWithElseINTEL* instruction. - -* At least one of the _Variable i_ is defined by a block that is contained by - the "then" range of an *OpBranchSpecConstantINTEL* or - *OpBranchSpecConstantWithElseINTEL* instruction or is defined by a block that - is contained by the "else" range of an *OpBranchSpecConstantWithElseINTEL* - instruction. - -The result is selected based on control flow: If control reached the current -block from _Parent i_, _Result Id_ gets the value that _Variable i_ had at the -end of _Parent i_. - -_Result Type_ can be any type. - -Operands are a sequence of triplets: (_Cond 1_, _Variable 1_, _Parent 1_ -block), (_Cond 2_, _Variable 2_, _Parent 2_ block), ... Each _Parent i_ block -is the label of an immediate predecessor in the CFG of the current block. -There must be exactly one _Parent i_ for each parent block of the current block -in the CFG. If _Parent i_ is reachable in the CFG and _Variable i_ is defined -in a block, that defining block must dominate _Parent i_. All Variables must -have a type matching _Result Type_. - -If _Variable i_ is defined by a block that is contained by the "then" or "else" -range of an *OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL* -instruction, then the associated _Parent i_ must be contained by that same -"then" or "else" range. - -Each _Cond i_ is the __ of a specialization constant that gates the -associated _Variable i_ definition. This parameter must be set as follows: - -* If the _Parent i_ is the _False Label_ of an *OpBranchSpecConstantINTEL* - instruction, _Cond i_ must be the logical negation of the specialization - constant used by that *OpBranchSpecConstantINTEL* instruction. - -* If the _Parent i_ is contained by the "then" range of an - *OpBranchSpecConstantINTEL* or *OpBranchSpecConstantWithElseINTEL* - instruction, _Cond i_ must be the same specialization __ used by that - instruction. - -* If the _Parent i_ is contained by the "else" range of an - *OpBranchSpecConstantWithElseINTEL* instruction, _Cond i_ must be the logical - negation of the specialization constant used by that - *OpBranchSpecConstantWithElseINTEL* instruction. - -* Otherwise, _Variable i_ is not gated by a specialization constant, and - _Cond i_ must have the value zero (which is not a legal value for any - __). - -Within a block, this instruction must appear before all other instructions -aside from *OpPhi*, other instances of *OpPhiSpecConstantINTEL*, *OpLine*, or -*OpNoLine*. - -1+|Capability: + -*SpecConditionalINTEL* -| 3 + variable -| ???? -| __ _Result Type_ -| _Result _ -| {__ \| _0_}, __, __, ... + - _Cond_, _Variable_, _Parent_, ... -|=== - - -== Modifications to the SPIR-V Specification - -*TODO*: Exact wording changes for the SPIR-V specification will be proposed -once there is agreement on the semantics of this extension. - - -== Issues - -1) The motivation section currently lists only the use cases that are important - to SYCL. However, there have been previous proposals for conditional code - in SPIR-V that were motivated by shader use cases. Should these use cases - also be listed in the motivation section? Some of the shader use cases - would require small additions to this extension, so if we add those - motivating use cases, we would also need to add a few more instructions to - this extension proposal (instructions that would not be useful for SYCL). - -2) Some of the validation rules listed above would be difficult to implement in - the general case. (See *TODO* comment above.) How should this be resolved? - -3) Need to assign real numbers to the new tokens, replacing the "????" - placeholders. - - -== Revision History - -[cols="5,15,15,70"] -[grid="rows"] -[options="header"] -|======================================== -|Rev|Date|Author|Changes -|1|2021-05-18|Greg Lueck|*First public draft* -|======================================== From 212ef89504a0f7c75bacdeab8f9e78c53823f5bd Mon Sep 17 00:00:00 2001 From: kbobrovs Date: Sat, 19 Jun 2021 16:38:30 -0700 Subject: [PATCH 11/22] [SYCL] Fixed AOT details in optional feature support design. - There are other AOT compilers beside of ocloc and opencl-aot - AOT input can be bitcode as well - Configuration file should allow symbolic identification of aspects - Each entry in the config file should have an ID of the associated AOT compiler Signed-off-by: kbobrovs --- sycl/doc/OptionalDeviceFeatures.md | 56 ++++++++++++++++++++---------- 1 file changed, 37 insertions(+), 19 deletions(-) diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index 7c64cf416d657..b44ef55538219 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -487,39 +487,59 @@ sizes. ### Changes specific to AOT mode -In AOT mode, DPC++ normally invokes either the `ocloc` command or the -`opencl-aot` command on each SPIR-V device image to compile the SPIR-V into -native code for the devices specified by the `-fsycl-targets` command line -option. This causes a problem, though, for device images that use optional -features because these commands could fail if they attempt to compile SPIR-V -using an optional feature that is not supported by the target device. We -therefore need some way to avoid calling these commands in these cases. +In AOT mode, for each AOT target specified by the `-fsycl-targets` command +line option, DPC++ normally invokes the AOT compiler for each device IR module +resulting from the sycl-post-link tool. For example, this is `ocloc` command +for Intel Gen AOT target and `opencl-aot` command for the x86 AOT target with +SPIR-V as the input, or other specific tools for the PTX target with LLVMIR +bitcode input. This causes a problem, though, for IR modules that use optional +features because these commands could fail if they attempt to compile IR using +an optional feature that is not supported by the target device. We therefore +need some way to avoid calling these commands in these cases. The overall design is as follows. The DPC++ installation includes a -configuration file that has one entry for each device that we support. Each +configuration file that has one entry for each device that it supports. Each entry lists the set of aspects that the device supports and a list of the sub-group sizes that it supports. DPC++ then consults this configuration -file to decide whether to invoke `ocloc` or `opencl-aot` on each SPIR-V device -image, using the information from the device image's "SYCL/image-requirements" +file to decide whether to invoke a particular AOT compiler on each device IR +module, using the information from the module's "SYCL/image-requirements" property set. -#### Format of the configuration file +#### Device configuration file The configuration file uses a simple YAML format where each top-level key is -the name of a device. There are sub-keys under each device for the supported -aspects and sub-group sizes. For example: +a name of a device architecture. These names correspond to SYCL aspect enum +identifiers as defined in the [TBD] API header. There are sub-keys under each +device for the supported aspects, sub-group sizes and AOT compiler ID. For +example: ``` -gen9: +gen11_1: aspects: [1, 2, 3] sub-group-sizes: [8, 16] -avx512: + aot-compiler-id: gen-spir64 +gen_icl: + aspects: [2, 3] + sub-group-sizes: [8, 16] + aot-compiler-id: gen-spir64 +x86_64_avx512: aspects: [1, 2, 3, 9, 11] sub-group-sizes: [8, 32] + aot-compiler-id: x86-spir64 ``` -The values of the aspects in this configuration file are just the numerical -values from the `enum class aspect` enumeration. +The values of the aspects in this configuration file can be the numerical +values from the `enum class aspect` enumeration or the enum identifer itself. +For each valid AOT compiler ID the driver has a built-in rule how to construct +an AOT compilation command line based on given architecture name. For example, +for the `gen11_1` and `gen_icl` architectures, the driver sees `gen-spir64` +as the AOT compiler ID, so it knows that the `ocloc` tool must be used, and it +also knows how to translate the `gen11_1` or `gen_icl` to proper `ocloc` +architecture specification option. + +*NOTE: new kinds of AOT compilers are expected to appear very rarely, so +developing some kind of "AOT compiler plugin" mechanism is impractical, and +hardcoding AOT compiler types in the driver is resonable.* One advantage to encoding this information in a textual configuration file is that customers can update the file if necessary. This could be useful, for @@ -534,7 +554,6 @@ and related tools. Other things to describe are: * The name of the DPC++ driver option that selects an alternate configuration file. - ### Changes to the DPC++ runtime The DPC++ runtime must be changed to check if a kernel uses any optional @@ -561,7 +580,6 @@ include this property in the set nonetheless for possible future use. If the runtime throws an exception, it happens even before the runtime tries to access the contents of the device image. - ### Clang static analyzer to diagnose unexpected aspect usage When a device function is decorated with the `[[sycl::requires()]]` attribute, From 23b649d74d96f384dbab62bf1de5255e3e3c211f Mon Sep 17 00:00:00 2001 From: kbobrovs Date: Sun, 20 Jun 2021 02:06:06 -0700 Subject: [PATCH 12/22] [SYCL] Design of optional features support in clang driver. - Updated clang action graph. - Introduced a new 'aspect-filter' tool Signed-off-by: kbobrovs --- sycl/doc/OptionalDeviceFeatures.md | 88 +- sycl/doc/images/DeviceLinkAOTAndWrap.svg | 5821 ++++++++++++++++++++++ 2 files changed, 5904 insertions(+), 5 deletions(-) create mode 100644 sycl/doc/images/DeviceLinkAOTAndWrap.svg diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index b44ef55538219..f93f2108b5db4 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -547,12 +547,90 @@ example, if a new device is released before there is a new DPC++ release. In fact, the DPC++ driver supports a command line option which allows the user to select an alternate configuration file. -**TODO**: Add more sections here describing the changes to the DPC++ driver -and related tools. Other things to describe are: - +**TODO**: * The names of the devices in the configuration file. -* The name of the DPC++ driver option that selects an alternate configuration - file. +* Location of the default device configuration file + +#### New features in clang compilation driver and tools + +NOTE: the term *device binary image* or *image* is used to refer to a device +code form consumable by the SYCL runtime. Earlier device code forms are referred +to as *device code module* or *device IR module*. In case of AOT, device binary +image is a natively compiled binary, and IR module - either a SPIR-V or LLVMIR +bitcode module. + +##### Overview +After the `sycl-post-link` performs necessary aspect usage analysis and splits +the incoming monolythic device code module into pieces - smaller device code +modules - it outputs a file table as a result. Each row in the table corresponds +to an individual output module, and each element of a row is a name of a file +containing necessary information about the module, such as the code itself, its +properties. + +At the action graph building stage for each requested AOT compilation target - +SPIR-V-based (such as Gen targets) and/or non-SPIR-V-based (such as PTX) - the +driver adds an `aspect-filter` action which filters out input file table rows +with device code modules using features unsupported on current target. Then the +output table goes as input into the AOT stage, and the prior filtering +guarantees that the AOT compiler will not encounter device code it can't +compile. In the extreme case when all device code +modules use unsupported aspects, the input file table will be empty. The picture +below illustrates the action graph built by the clang driver along with file +lists and tables generated and consumed by various nodes of the graph. The +example set of targets used for the illustration is 4 targets +- spir64 (runtime JITted SPIR-V) +- AOT targets + - non-SPIR-V based + - nvptx (PTX) + - SPIR-V based + - gen_11 (Intel Gen) + - x86_64_avx512 (AVX512) +
+
+ +![Device SPIRV translation and AOT compilation](images/DeviceLinkAOTAndWrap.svg) + +##### Aspect filter tool + +This tool transforms an input file table by removing rows with device code files +that use features unsupported for the target architecture given as tool's +argument. + +*Input*: +- file table, normally coming out of `sycl-post-link` or `file-table-tform` + tools + +*Command line arguments* +- target device architecture to filter for +- path to the device configuration file + +*Output* +- the input file table filtered as needed + +In more details, the tool performs the following actions: +1) Checks if the input file table contains "Properties" column. If not, copies + the input file table to output and exits without error. +1) Reads in the device configuration file and finds some entry `E` corresponding + to the architecture given on the command line. If there is no such entry - + reports and error and exits. +1) For each row in the input file table: + - loads the properties file from the "Properties" column + - checks if there is the `SYCL/image-requirements` property + - if no, copies current row to the output file table and goes to the next + - if yes, checks if all the requirements listed in the property are supported + by the target architecture as specified in the device configuration file + - if yes, copies current row to the output file table and goes to the + next + - otherwise skips this row + +##### Configuration file location and driver option + +A default device configuration file is supplied as a part of OneAPI SDK. It is +located in the TBD directory. Users may override the defalt using the +``` +-fsycl-device-config-file= +``` +option. ### Changes to the DPC++ runtime diff --git a/sycl/doc/images/DeviceLinkAOTAndWrap.svg b/sycl/doc/images/DeviceLinkAOTAndWrap.svg new file mode 100644 index 0000000000000..0f2c27a1f0579 --- /dev/null +++ b/sycl/doc/images/DeviceLinkAOTAndWrap.svg @@ -0,0 +1,5821 @@ + + + + + + image/svg+xml + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + (f r o m l lv m-l i n k)(t o h o s t l i n k e r) + + + + + + + + + + + + + + + + + + + + + + + + + + CodeProps + + … + + + asp1.bc + + + props1.txt + + + asp2.bc + + + props2.txt + + + asp3.bc + + + props3.txt + + + asp4.bc + + + props4.txt + + + + + + + + + + + + + + Device config file-fsycl-targets=spir64,ptx,gen_11,x86_64_avx512 + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + CodeProps + + … + + + asp1_gen.bin + + + props1.txt + + + asp2_gen.bin + + + props2. + + + txt + + + + + + + + + + + + + + + + + + + CodeProps + + … + + + asp1 + + + .spv + + + props1.txt + + + asp2 + + + .spv + + + props2. + + + txt + + + + + + + + + + + + + + + + + + + CodeProps + + … + + + asp1.bc + + + props1.txt + + + asp3.bc + + + props3.txt + + + + + + + + + + + + + + + + + + + + + + + + + + + CodeProps + + … + + + asp1 + + .spv + + props1.txt + + + asp2 + + .spv + + props2.txt + + + asp3 + + .spv + + props3.txt + + + asp4 + + .spv + + props4.txt + + + + + + + + + + + + + + + + + + + CodeProps + + … + + + asp2.spv + + + props2. + + + txt + + + asp3.spv + + + props3. + + + txt + + + + + + + + + + + + + + + + + + + CodeProps + + … + + + asp2_x86.bin + + + props2. + + + txt + + + asp3_x86.bin + + + props3. + + + txt + + + + + + + + + + + + + + + + + + + + Code + + + Props + + + … + + + asp1.ptx + + + props1.txt + + + asp3.ptx + + + props3. + + + txt + + + + + + + + + + asp1.spv + + + asp2.spv + + + + + + + + + + asp1.bc + + + asp3.bc + + + + + + + + + + + + + + asp1.bc + + + asp2.bc + + + asp3.bc + + + asp4.bc + + + + + + + + + + + + + + asp1.spv + + + asp2.spv + + + asp3.spv + + + asp4.spv + + + + + + + + + + asp1. + + + ptx + + + asp3. + + + ptx + + + + + + + + + + asp2.spv + + + asp3.spv + + + + + + + + + + asp1_gen. + + bin + + asp2_gen. + + bin + + + + + + + + + asp2_x86. + + bin + + asp3_x86. + + bin + + From 4326dfdb1028c234f61410b847c24486333b712d Mon Sep 17 00:00:00 2001 From: kbobrovs Date: Tue, 22 Jun 2021 12:52:10 -0700 Subject: [PATCH 13/22] Revert "[SYCL] Design of optional features support in clang driver." This reverts commit 23b649d74d96f384dbab62bf1de5255e3e3c211f. Separate patch for the clang driver changes will be created. --- sycl/doc/OptionalDeviceFeatures.md | 88 +- sycl/doc/images/DeviceLinkAOTAndWrap.svg | 5821 ---------------------- 2 files changed, 5 insertions(+), 5904 deletions(-) delete mode 100644 sycl/doc/images/DeviceLinkAOTAndWrap.svg diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index f93f2108b5db4..b44ef55538219 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -547,90 +547,12 @@ example, if a new device is released before there is a new DPC++ release. In fact, the DPC++ driver supports a command line option which allows the user to select an alternate configuration file. -**TODO**: +**TODO**: Add more sections here describing the changes to the DPC++ driver +and related tools. Other things to describe are: + * The names of the devices in the configuration file. -* Location of the default device configuration file - -#### New features in clang compilation driver and tools - -NOTE: the term *device binary image* or *image* is used to refer to a device -code form consumable by the SYCL runtime. Earlier device code forms are referred -to as *device code module* or *device IR module*. In case of AOT, device binary -image is a natively compiled binary, and IR module - either a SPIR-V or LLVMIR -bitcode module. - -##### Overview -After the `sycl-post-link` performs necessary aspect usage analysis and splits -the incoming monolythic device code module into pieces - smaller device code -modules - it outputs a file table as a result. Each row in the table corresponds -to an individual output module, and each element of a row is a name of a file -containing necessary information about the module, such as the code itself, its -properties. - -At the action graph building stage for each requested AOT compilation target - -SPIR-V-based (such as Gen targets) and/or non-SPIR-V-based (such as PTX) - the -driver adds an `aspect-filter` action which filters out input file table rows -with device code modules using features unsupported on current target. Then the -output table goes as input into the AOT stage, and the prior filtering -guarantees that the AOT compiler will not encounter device code it can't -compile. In the extreme case when all device code -modules use unsupported aspects, the input file table will be empty. The picture -below illustrates the action graph built by the clang driver along with file -lists and tables generated and consumed by various nodes of the graph. The -example set of targets used for the illustration is 4 targets -- spir64 (runtime JITted SPIR-V) -- AOT targets - - non-SPIR-V based - - nvptx (PTX) - - SPIR-V based - - gen_11 (Intel Gen) - - x86_64_avx512 (AVX512) -
-
- -![Device SPIRV translation and AOT compilation](images/DeviceLinkAOTAndWrap.svg) - -##### Aspect filter tool - -This tool transforms an input file table by removing rows with device code files -that use features unsupported for the target architecture given as tool's -argument. - -*Input*: -- file table, normally coming out of `sycl-post-link` or `file-table-tform` - tools - -*Command line arguments* -- target device architecture to filter for -- path to the device configuration file - -*Output* -- the input file table filtered as needed - -In more details, the tool performs the following actions: -1) Checks if the input file table contains "Properties" column. If not, copies - the input file table to output and exits without error. -1) Reads in the device configuration file and finds some entry `E` corresponding - to the architecture given on the command line. If there is no such entry - - reports and error and exits. -1) For each row in the input file table: - - loads the properties file from the "Properties" column - - checks if there is the `SYCL/image-requirements` property - - if no, copies current row to the output file table and goes to the next - - if yes, checks if all the requirements listed in the property are supported - by the target architecture as specified in the device configuration file - - if yes, copies current row to the output file table and goes to the - next - - otherwise skips this row - -##### Configuration file location and driver option - -A default device configuration file is supplied as a part of OneAPI SDK. It is -located in the TBD directory. Users may override the defalt using the -``` --fsycl-device-config-file= -``` -option. +* The name of the DPC++ driver option that selects an alternate configuration + file. ### Changes to the DPC++ runtime diff --git a/sycl/doc/images/DeviceLinkAOTAndWrap.svg b/sycl/doc/images/DeviceLinkAOTAndWrap.svg deleted file mode 100644 index 0f2c27a1f0579..0000000000000 --- a/sycl/doc/images/DeviceLinkAOTAndWrap.svg +++ /dev/null @@ -1,5821 +0,0 @@ - - - - - - image/svg+xml - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - (f r o m l lv m-l i n k)(t o h o s t l i n k e r) - - - - - - - - - - - - - - - - - - - - - - - - - - CodeProps - - … - - - asp1.bc - - - props1.txt - - - asp2.bc - - - props2.txt - - - asp3.bc - - - props3.txt - - - asp4.bc - - - props4.txt - - - - - - - - - - - - - - Device config file-fsycl-targets=spir64,ptx,gen_11,x86_64_avx512 - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - CodeProps - - … - - - asp1_gen.bin - - - props1.txt - - - asp2_gen.bin - - - props2. - - - txt - - - - - - - - - - - - - - - - - - - CodeProps - - … - - - asp1 - - - .spv - - - props1.txt - - - asp2 - - - .spv - - - props2. - - - txt - - - - - - - - - - - - - - - - - - - CodeProps - - … - - - asp1.bc - - - props1.txt - - - asp3.bc - - - props3.txt - - - - - - - - - - - - - - - - - - - - - - - - - - - CodeProps - - … - - - asp1 - - .spv - - props1.txt - - - asp2 - - .spv - - props2.txt - - - asp3 - - .spv - - props3.txt - - - asp4 - - .spv - - props4.txt - - - - - - - - - - - - - - - - - - - CodeProps - - … - - - asp2.spv - - - props2. - - - txt - - - asp3.spv - - - props3. - - - txt - - - - - - - - - - - - - - - - - - - CodeProps - - … - - - asp2_x86.bin - - - props2. - - - txt - - - asp3_x86.bin - - - props3. - - - txt - - - - - - - - - - - - - - - - - - - - Code - - - Props - - - … - - - asp1.ptx - - - props1.txt - - - asp3.ptx - - - props3. - - - txt - - - - - - - - - - asp1.spv - - - asp2.spv - - - - - - - - - - asp1.bc - - - asp3.bc - - - - - - - - - - - - - - asp1.bc - - - asp2.bc - - - asp3.bc - - - asp4.bc - - - - - - - - - - - - - - asp1.spv - - - asp2.spv - - - asp3.spv - - - asp4.spv - - - - - - - - - - asp1. - - - ptx - - - asp3. - - - ptx - - - - - - - - - - asp2.spv - - - asp3.spv - - - - - - - - - - asp1_gen. - - bin - - asp2_gen. - - bin - - - - - - - - - asp2_x86. - - bin - - asp3_x86. - - bin - - From 105ff570131dc9a64694112260175216bc8c31d9 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Tue, 22 Jun 2021 10:12:21 -0400 Subject: [PATCH 14/22] Update section on changes to front-end Update the section "Changes to the DPC++ front-end" after meeting with the front-end team: * Add a TODO capturing the front-end team's concerns and outlining our current thoughts. We plan to mull this over and then meet again before deciding if this is the right solution. * Remove the requirement for the front-end to look at the target of each call to see if the called function is decorated with `[[sycl_detail::uses_aspects()]]`. The post-link tool can propagate this information up the call graph, just as it propagates information about other use aspects. * Simplify the language describing when a type is "used". We now say that a type is used if an expression "creates an object" of that type. This includes all of the following ways to create an object: - Defines an object of that type (e.g. declares a variable) - Calls a "new expression" of that type. - Throws an expression of that type. --- sycl/doc/OptionalDeviceFeatures.md | 123 ++++++++++++++--------------- 1 file changed, 59 insertions(+), 64 deletions(-) diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index b44ef55538219..c8a93cdf756bd 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -265,73 +265,68 @@ define void @foo() !intel_allowed_aspects !1 !intel_used_aspects !2 {} The front-end of the device compiler is responsible for parsing the `[[sycl::requires()]]` and `[[sycl_detail::uses_aspects()]]` attributes and transferring the information to the LLVM IR `!intel_allowed_aspects` and -`!intel_used_aspects` metadata. Processing the `[[sycl::requires()]]` -attribute is straightforward. When a device function is decorated with -this attribute, the front-end emits an `!intel_allowed_aspects` metadata -on the function definition with the numerical values of the aspects in -the attribute. +`!intel_used_aspects` metadata according to the following rules: -The front-end also emits an `!intel_uses_aspects` metadata for a function *F* -listing all the aspects that the function "uses". A function "uses" an aspect -in the following cases: - -* The function *F* contains a potentially evaluated expression that makes a - direct call (i.e. not through a function pointer) to some other function *C* - that is decorated with the `[[sycl_detail::uses_aspects()]]` attribute, and - that expression is not in a statement that is discarded through - `constexpr if`. In this case, the function *F* uses all of the aspects named - in that attribute. - -* The function *F* contains a potentially evaluated expression that does any of - the following with a type *T* that is decorated with the - `[[sycl_detail::uses_aspects()]]` attribute, and that expression is not in a - statement that is discarded through `constexpr if`. In this case, the - function *F* uses all of the aspects named in that attribute: - - - Defines an object (including a temporary object) of type *U*. - - Calls a "new expression" of type *U*. - - Throws an expression of type *U*. - - Contains a cast to type *U*. - - References a literal of type *T*. - - Where the type *U* is any of the following: - - - The type *T*. - - A cv-qualified version of type *T*. - - An array of, pointer to, or reference to type *T*. - - A type that derives from type *T*. - - A class type that contains a non-static member object of type *T*. - - Any type that applies these rules recursively to type *T* (e.g. array of - pointers to type *T*, etc.) - - When applying these rules, the front-end treats any use of the `double` type - as though it was implicitly decorated with - `[[sycl_detail::uses_aspects(has(aspect::fp64))]]`. - -If the `[[sycl_detail::uses_aspects()]]` attribute decorates a base class -member function and a derived class overrides the member function, the -overriding member function does not automatically inherit the attribute. -Therefore, when the front-end considers the set of aspects used by a call to a -member function, it need not consider any `[[sycl_detail::uses_aspects()]]` -attributes that decorate overridden versions of the function. - -As noted earlier, standard SYCL does not allow indirect function calls or -virtual functions in device code, although a DPC++ extension that adds some -limited form of indirect function call is being contemplated. If this -extension allows virtual functions, we expect that when -`[[sycl_detail::uses_aspects()]]` decorates a virtual function, it applies only -to the static type of the class. Therefore, when the front-end considers the -set of aspects used by a virtual function call like the following: +* If a function is decorated with the `[[sycl::requires()]]` attribute, the + front-end emits an `!intel_allowed_aspects` metadata on the function's LLVM + IR definition with the numerical values of the aspects listed in the + attribute. -``` -void foo(Base *b) { - b->bar(); -} -``` +* If a function is decorated with the `[[sycl_detail::uses_aspects()]]` + attribute, the front-end emits an `!intel_uses_aspects` metadata on the + function's LLVM IR definition with the numerical values of the aspects + listed in the attribute. -It considers only the `[[sycl_detail::uses_aspects()]]` attribute that may -decorate the definition of `Base::foo()` even though the application may pass a -pointer to a derived class which decorates `foo()` differently. +In both of these cases, if the attribute decorates a base class member function +and a derived class overrides the member function, the overriding member +function does not automatically inherit the attribute. Therefore, the +front-end need not consider any overridden base class functions when generating +these metadata. + +The front-end also emits an `!intel_uses_aspects` metadata for a function *F* +(or augments the existing `!intel_uses_aspects` metadata for function *F*) if +the function's body "uses" a type *T* that is decorated with the +`[[sycl_detail::uses_aspects()]]` attribute. When such a type is "used", the +function *F* uses all the aspects in the attribute that decorates the type. A +function "uses" a type *T* if it contains a potentially evaluated expression +that is not discarded through `constexpr if` and if that expression does any of +the following: + +* Creates an object of type *U* (including a temporary object). +* Contains a cast to type *U*. +* References a literal of type *T*. + +Where the type *U* is any of the following: + +* The type *T*. +* A cv-qualified version of type *T*. +* An array of, pointer to, or reference to type *T*. +* A type that derives from type *T*. +* A class type that contains a non-static member object of type *T*. +* Any type that applies these rules recursively to type *T* (e.g. array of + pointers to type *T*, etc.) + +When applying these rules, the front-end treats any use of the `double` type as +though it was implicitly decorated with +`[[sycl_detail::uses_aspects(has(aspect::fp64))]]`. + +**TODO**: We are still discussing this section with the front-end team. One of +their main concerns is that they want to avoid adding a new pass over the AST. +However, it's difficult to construct the `!intel_uses_aspects` metadata as the +source is parsed for two reasons. One challenge comes from the fact that we +only consider types that are used in *potentially evaluated* expressions. This +means that the front-end must ignore types used in unevaluated expressions such +as `sizeof`. However, the parser doesn't know whether an expression is +potentially evaluated when it encounters the expression. The other challenge +is that we must not consider types used in statements that are discarded +through `constexpr if`. Again, the parser doesn't know whether a statement is +discarded at the time it encounters the statement. + +We think we can solve both of these problems by generating the +`!intel_uses_aspects` metadata during the front-end's code generation phase +(the point at which the front-end converts the AST into LLVM IR). At this +point we have already discarded statements due to `constexpr if`, and we +already know whether an expression is potentially evaluated. ### Changes to other phases of clang From 35f866acae7f7f690f3d90143a9053281dcd3c8f Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Wed, 23 Jun 2021 09:09:42 -0400 Subject: [PATCH 15/22] Minor editorial changes Fix spelling mistakes, make formatting consistent, etc. --- sycl/doc/OptionalDeviceFeatures.md | 18 ++++++++---------- 1 file changed, 8 insertions(+), 10 deletions(-) diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index c8a93cdf756bd..3a70959d7be36 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -524,7 +524,7 @@ x86_64_avx512: ``` The values of the aspects in this configuration file can be the numerical -values from the `enum class aspect` enumeration or the enum identifer itself. +values from the `enum class aspect` enumeration or the enum identifier itself. For each valid AOT compiler ID the driver has a built-in rule how to construct an AOT compilation command line based on given architecture name. For example, for the `gen11_1` and `gen_icl` architectures, the driver sees `gen-spir64` @@ -532,9 +532,9 @@ as the AOT compiler ID, so it knows that the `ocloc` tool must be used, and it also knows how to translate the `gen11_1` or `gen_icl` to proper `ocloc` architecture specification option. -*NOTE: new kinds of AOT compilers are expected to appear very rarely, so +**NOTE**: New kinds of AOT compilers are expected to appear very rarely, so developing some kind of "AOT compiler plugin" mechanism is impractical, and -hardcoding AOT compiler types in the driver is resonable.* +hard coding AOT compiler types in the driver is reasonable. One advantage to encoding this information in a textual configuration file is that customers can update the file if necessary. This could be useful, for @@ -542,12 +542,10 @@ example, if a new device is released before there is a new DPC++ release. In fact, the DPC++ driver supports a command line option which allows the user to select an alternate configuration file. -**TODO**: Add more sections here describing the changes to the DPC++ driver -and related tools. Other things to describe are: +**TODO**: More information will be inserted here when we merge +[this separate PR][6] into this design document. -* The names of the devices in the configuration file. -* The name of the DPC++ driver option that selects an alternate configuration - file. +[6]: ### Changes to the DPC++ runtime @@ -612,7 +610,7 @@ have information about source location of "used" aspects. ## Appendix: Adding an attribute to 8-byte `atomic_ref` -As described above under ["Changes to DPC++ headers"][6], we need to decorate +As described above under ["Changes to DPC++ headers"][7], we need to decorate any SYCL type representing an optional device feature with the `[[sycl_detail::uses_aspects()]]` attribute. This is somewhat tricky for `atomic_ref`, though, because it is only an optional feature when specialized @@ -620,7 +618,7 @@ for a 8-byte type. However, we can accomplish this by using partial specialization techniques. The following code snippet demonstrates (best read from bottom to top): -[6]: <#changes-to-dpc-headers> +[7]: <#changes-to-dpc-headers> ``` namespace sycl { From f64acdafdaa9f05f8287e3aa88da935bb5a20774 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Wed, 23 Jun 2021 10:45:01 -0400 Subject: [PATCH 16/22] Rename "!intel_allowed_aspects" Rename the `!intel_allowed_aspects` LLVM IR metadata to `!intel_declared_aspects`. --- sycl/doc/OptionalDeviceFeatures.md | 36 +++++++++++++++--------------- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index 3a70959d7be36..219810aaba8be 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -244,7 +244,7 @@ In order to communicate the information from `[[sycl::requires()]]` and introduce two new LLVM IR metadata that can be attached to a function definition, similar to the existing `!intel_reqd_sub_group_size`. -These new metadata are named `!intel_allowed_aspects` and +These new metadata are named `!intel_declared_aspects` and `!intel_used_aspects`. In each case, the parameter is an (unnamed) metadata node, and the value of the metadata node is a list of `i32` constants, where each constant is a value from `enum class aspect`. For example, the following @@ -254,7 +254,7 @@ illustrates the IR that corresponds to a function `foo` that is decorated with corresponds to an aspect with numerical value `8`. ``` -define void @foo() !intel_allowed_aspects !1 !intel_used_aspects !2 {} +define void @foo() !intel_declared_aspects !1 !intel_used_aspects !2 {} !1 = !{i32 8, i32 9} !2 = !{i32 8} ``` @@ -264,11 +264,11 @@ define void @foo() !intel_allowed_aspects !1 !intel_used_aspects !2 {} The front-end of the device compiler is responsible for parsing the `[[sycl::requires()]]` and `[[sycl_detail::uses_aspects()]]` attributes and -transferring the information to the LLVM IR `!intel_allowed_aspects` and +transferring the information to the LLVM IR `!intel_declared_aspects` and `!intel_used_aspects` metadata according to the following rules: * If a function is decorated with the `[[sycl::requires()]]` attribute, the - front-end emits an `!intel_allowed_aspects` metadata on the function's LLVM + front-end emits an `!intel_declared_aspects` metadata on the function's LLVM IR definition with the numerical values of the aspects listed in the attribute. @@ -332,7 +332,7 @@ already know whether an expression is potentially evaluated. ### Changes to other phases of clang Any clang phases that do function inlining will need to be changed, so that the -`!intel_allowed_aspects` and `!intel_uses_aspects` metadata are transferred +`!intel_declared_aspects` and `!intel_uses_aspects` metadata are transferred from the inlined function to the function that receives the inlined function body. Presumably, there is already similar logic for the existing `!reqd_work_group_size` metadata, which already decorates device functions. @@ -376,24 +376,24 @@ Linking][5]. This pass operates on the static call graph for each kernel and each exported device function, propagating the aspects from the `!intel_used_aspects` and -`!intel_allowed_aspects` metadata from the leaves of the call graph up to their -callers. The result of this pass is that each device function is labeled with -a *Used* set of aspects which is computed as the union of the following: +`!intel_declared_aspects` metadata from the leaves of the call graph up to +their callers. The result of this pass is that each device function is labeled +with a *Used* set of aspects which is computed as the union of the following: * The aspects in the function's `!intel_used_aspects` metadata (if any). -* The aspects in the function's `!intel_allowed_aspects` metadata (if any). +* The aspects in the function's `!intel_declared_aspects` metadata (if any). * The aspects in the *Used* set of all functions called by this function. Once the *Used* set of aspects is known for each function, the post-link tool -compares this set of aspects with the aspects from any `!intel_allowed_aspects` -metadata. If the function has this metadata and if the *Used* set contains -aspects not in that set, it issues a warning indicating that the function uses -aspects that are not in the `[[sycl::requires()]]` list. Unfortunately, the -post-link tool is unable to include the source position of the code that uses -the aspect in question. To compensate, the warning message must include -instructions telling the user how to run the clang static analyzer which -provides a better diagnostic. This analysis phase is described in more detail -below. +compares this set of aspects with the aspects from any +`!intel_declared_aspects` metadata. If the function has this metadata and if +the *Used* set contains aspects not in that set, it issues a warning indicating +that the function uses aspects that are not in the `[[sycl::requires()]]` list. +Unfortunately, the post-link tool is unable to include the source position of +the code that uses the aspect in question. To compensate, the warning message +must include instructions telling the user how to run the clang static analyzer +which provides a better diagnostic. This analysis phase is described in more +detail below. #### Changes to the device code split algorithm From da9d05e9a9462006e64953b476be663457b14dea Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Tue, 29 Jun 2021 12:22:19 -0400 Subject: [PATCH 17/22] More updates to front-end design More updates after meeting (again) with the front-end team: * Reduce the responsibilities of the front-end proper. The front-end now does only simple transcription of attributes into IR metadata. * Create a new LLVM IR pass that: - Propagates "used aspect" information from types into the containing function (moved from front-end responsibility). - Propagates "used aspect" information through the static call graph (moved from post-link responsibility). - Issues warning messages when "used aspects" do not match `[[sycl::requires()]]` attributes. * Remove the proposed static analysis tool. The new LLVM IR pass can provide the same diagnostics, though the source location information may not be quite a good. * Eliminate the diagnostic in post-link if the used aspects in the definition of a `SYCL_EXTERNAL` function does not match the used aspects in its declaration. I think this would be difficult without changing the LLVM IR linker. * Add a new section defining terms. --- sycl/doc/OptionalDeviceFeatures.md | 510 ++++++++++++++++------------- 1 file changed, 276 insertions(+), 234 deletions(-) diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index 219810aaba8be..c5431290cc1e6 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -18,14 +18,59 @@ attributes"][2] and [section 5.8.2 "Device function attributes"][3]. [3]: +## Definition of terms + +### A kernel's static call graph + +The term "static call graph" of a kernel means the set of all functions that +a kernel may call, including functions that are called transitively from other +functions. In standard SYCL, device code is not allowed to contain function +pointers, virtual functions, or indirect function calls. It is therefore easy +to compute the static call graph of a kernel. By starting at the kernel +function itself (e.g. the function passed to `parallel_for`), the compiler can +identify all functions called by that function, then it can find all functions +called by those functions, etc. Depending on the tool which does the analysis, +the "static call graph" could include only those functions that reside in the +same translation unit as the kernel, or it could include all functions that +reside in the same executable image (or shared library) as the kernel. In the +sections below, we try to make the distinction clear whenever we refer to a +kernel's static call graph. + +We are contemplating a DPC++ extension that would allow some limited use of +function pointers in device code. This feature is not yet fully defined or +supported. We expect that the semantics of this feature will include some way +for the compiler to deduce a limited set of possible targets for each indirect +function call. Therefore, it is still possible for the compiler to construct a +"static call graph" for each kernel, the only difference is that each call site +now adds a set of possible target functions to a kernel's static call graph. +The details about how this will work are expected to be included in the DPC++ +extension specification that enables indirect function calls. + +### An exported device function + +The term "exported device function" means a device function that is exported +from a shared library as defined by [Device Code Dynamic Linking][4]. + +[4]: + +### The FE compiler + +The term "FE compiler" refers to the entire DPC++ compiler chain that runs +when the user executes the `clang++` command. This includes the clang +front-end itself, all passes over LLVM IR, the post-link tool, and any AOT +compilation phases (when the user compiles in AOT mode). The FE compiler does +not include the JIT compiler which translates SPIR-V (or another IL format) +into native code when the application executes. + + ## Requirements There are several categories of requirements covered by this design. Each of these is described in more detail in the sections that follow: -* The front-end compiler must issue a diagnostic in some cases when a kernel or - device function uses an optional feature. However, the front-end compiler - must **not** generate a diagnostic in other cases. +* The FE compiler must issue a diagnostic in some cases when a kernel or device + function uses an optional feature. However, the FE compiler must **not** + generate a diagnostic in other cases. * The runtime must raise an exception when a kernel using optional features is submitted to a device that does not support those features. This @@ -37,49 +82,20 @@ these is described in more detail in the sections that follow: not specifically submit the kernel to that device. -### Clarification of a kernel's static call graph - -Some of the requirements below refer to the "static call graph" of a kernel. -In standard SYCL, device code is not allowed to contain function pointers, -virtual functions, or indirect function calls. It is therefore easy to compute -the static call graph of a kernel. By starting at the kernel function itself -(e.g. the function passed to `parallel_for`), the compiler can identify all -functions called by that function, then it can find all functions called by -those functions, etc. Depending on the tool which does the analysis, the -"static call graph" could include only those functions that reside in the same -translation unit as the kernel, or it could include all functions that reside -in the same executable image (or shared library) as the kernel. In the -sections below, we try to make the distinction clear whenever we refer to a -kernel's static call graph. +### Diagnostics from the FE compiler -We are contemplating a DPC++ extension that would allow some limited use of -function pointers in device code. This feature is not yet fully defined or -supported. We expect that the semantics of this feature will include some way -for the compiler to deduce a limited set of possible targets for each indirect -function call. Therefore, it is still possible for the compiler to construct a -"static call graph" for each kernel, the only difference is that each call site -now adds a set of possible target functions to a kernel's static call graph. -The details about how this will work are expected to be included in the DPC++ -extension specification that enables indirect function calls. - - -### Diagnostics from the front-end compiler - -By "front-end compiler", we mean the DPC++ compiler which parses DPC++ source -code, not the JIT compiler that translates SPIR-V into native code. - -In general, the front-end compiler does not know which kernels the application -will submit to which devices. Therefore, the front-end compiler does not -generally know which optional features a kernel can legally use. Thus, in -general, the front-end compiler must not issue any diagnostic simply because a -kernel uses an optional feature. +In general, the FE compiler does not know which kernels the application will +submit to which devices. Therefore, the FE compiler does not generally know +which optional features a kernel can legally use. Thus, in general, the FE +compiler must not issue any diagnostic simply because a kernel uses an optional +feature. The only exception to this rule occurs when the application uses the C++ attribute `[[sycl::requires()]]`. When the application decorates a kernel or device function with this attribute, it is an assertion that the kernel or device function is allowed to use only those optional features which are listed -by the attribute. Therefore, the front-end compiler must issue a diagnostic if -the kernel or device function uses any other optional kernel features. +by the attribute. Therefore, the FE compiler must issue a diagnostic if the +kernel or device function uses any other optional kernel features. The SYCL 2020 specification only mandates this diagnostic when a kernel or device function that is decorated with `[[sycl::requires()]]` uses an optional @@ -89,11 +105,6 @@ the kernel function. Thus, the compiler is not required to issue a diagnostic if the use is in a `SYCL_EXTERNAL` function that is defined in another translation unit. -It turns out, though, that DPC++ can diagnose this case at link time, when we -have visibility into device functions that are defined in other translation -units. Since the design proposed below allows this diagnostic to be checked -with minimal extra effort, it is desirable to do so. - Note that this behavior does not change when the compiler runs in AOT mode. Even if the user specifies a target device via "-fsycl-targets", that does not necessarily mean that the user expects all the code in the application to be @@ -137,12 +148,12 @@ circumstances: not actually use a feature corresponding to the aspect, and it must be thrown even if the aspect does not correspond to any optional feature. -* For a kernel that is decorated with `[[sycl::requires()]]`, the compiler will - mostly check (at compile time) whether the kernel uses any features that are - not listed in the attribute. However, this check only results in a warning, - so the runtime is still responsible for throwing the exception if any of the - functions called by the kernel uses an optional feature that the device does - not support. +* For a kernel that is decorated with `[[sycl::requires()]]`, the FE compiler + will mostly check (at compile time) whether the kernel uses any features that + are not listed in the attribute. However, this check only results in a + warning, so the runtime is still responsible for throwing the exception if + any of the functions called by the kernel uses an optional feature that the + device does not support. * For a kernel that is decorated with the `[[sycl::reqd_work_group_size(W)]]` or `[[sycl::reqd_sub_group_size(S)]]` attribute, the exception must be thrown @@ -219,11 +230,11 @@ void amx_multiply(); This attribute can also be used to decorate class templates where only certain instantiations correspond to optional features. See ["Appendix: Adding an -attribute to 8-byte `atomic_ref`"][4] for an illustration of how this attribute +attribute to 8-byte `atomic_ref`"][5] for an illustration of how this attribute can be used in conjunction with partial specialization to mark only certain instantiations of `sycl::atomic_ref` as an optional feature. -[4]: <#appendix-adding-an-attribute-to-8-byte-atomic_ref> +[5]: <#appendix-adding-an-attribute-to-8-byte-atomic_ref> As you can see from the examples above, the syntax for the parameter to the `[[sycl_detail::uses_aspects()]]` attribute is identical to the syntax for the @@ -231,27 +242,59 @@ standard `[[sycl::requires()]]` attribute. Unfortunately, the fundamental type `double` is also an optional kernel feature. Since there is no type alias for `double`, there is no convenient -place to add an attribute. Instead, the front-end device compiler must behave -as though there was an implicit -`[[sycl_detail::uses_aspects(has(aspect::fp64))]]` attribute for any device -code that uses the `double` type. +place to add an attribute. Instead, the FE device compiler must behave as +though there was an implicit `[[sycl_detail::uses_aspects(has(aspect::fp64))]]` +attribute for any device code that uses the `double` type. ### New LLVM IR metadata In order to communicate the information from `[[sycl::requires()]]` and `[[sycl_detail::uses_aspects()]]` attributes to the DPC++ post-link tool, we -introduce two new LLVM IR metadata that can be attached to a function -definition, similar to the existing `!intel_reqd_sub_group_size`. - -These new metadata are named `!intel_declared_aspects` and -`!intel_used_aspects`. In each case, the parameter is an (unnamed) metadata -node, and the value of the metadata node is a list of `i32` constants, where -each constant is a value from `enum class aspect`. For example, the following -illustrates the IR that corresponds to a function `foo` that is decorated with -`[[sycl::requires()]]` where the required aspects have the numerical values -`8` and `9`. In addition, the function uses an optional feature that -corresponds to an aspect with numerical value `8`. +introduce several new LLVM IR metadata. + +The named metadata `!intel_types_that_use_aspects` conveys information about +types that are decorated with `[[sycl_detail::uses_aspects()]]`. This metadata +is not referenced by any instruction in the module, so it must be looked up by +name. The format looks like this: + +``` +!intel_types_that_use_aspects = !{!0, !1, !2} +!0 = !{!"class.cl::sycl::detail::half_impl::half", i32 8} +!1 = !{!"class.cl::sycl::amx_type", i32 9} +!2 = !{!"class.cl::sycl::other_type", i32 8, i32 9} +``` + +The value of the `!intel_types_that_use_aspects` metadata is a list of unnamed +metadata nodes, each of which describes one type that is decorated with +`[[sycl_detail::uses_aspects()]]`. The value of each unnamed metadata node +starts with a string giving the name of the type which is followed by a list of +`i32` constants where each constant is a value from `enum class aspect` telling +the numerical value of an aspect from the type's +`[[sycl_detail::uses_aspects()]]` attribute. In the example above, the type +`cl::sycl::detail::half_impl::half` uses an aspect whose numerical value is +`8` and the type `cl::sycl::other_type` uses two aspects `8` and `9`. + +**NOTE**: The reason we choose this representation is because LLVM IR does not +allow metadata to be attached directly to types. This representation works +around that limitation by creating global named metadata that references the +type's name. + +We also introduce two metadata that can be attached to a function definition +similar to the existing `!intel_reqd_sub_group_size`. The +`!intel_declared_aspects` metadata is used for functions that are decorated +with `[[sycl::requires()]]`, and the `!intel_used_aspects` metadata is used to +store the propagated information about all aspects used by a kernel or exported +device function. + +In each case, the metadata's parameter is an unnamed metadata node, and the +value of the metadata node is a list of `i32` constants, where each constant is +a value from `enum class aspect`. + +For example, the following illustrates the IR that corresponds to a function +`foo` that is decorated with `[[sycl::requires()]]` where the required aspects +have the numerical values `8` and `9`. In addition, the function uses an +optional feature that corresponds to an aspect with numerical value `8`. ``` define void @foo() !intel_declared_aspects !1 !intel_used_aspects !2 {} @@ -264,78 +307,140 @@ define void @foo() !intel_declared_aspects !1 !intel_used_aspects !2 {} The front-end of the device compiler is responsible for parsing the `[[sycl::requires()]]` and `[[sycl_detail::uses_aspects()]]` attributes and -transferring the information to the LLVM IR `!intel_declared_aspects` and -`!intel_used_aspects` metadata according to the following rules: - -* If a function is decorated with the `[[sycl::requires()]]` attribute, the - front-end emits an `!intel_declared_aspects` metadata on the function's LLVM - IR definition with the numerical values of the aspects listed in the - attribute. - -* If a function is decorated with the `[[sycl_detail::uses_aspects()]]` - attribute, the front-end emits an `!intel_uses_aspects` metadata on the - function's LLVM IR definition with the numerical values of the aspects - listed in the attribute. - -In both of these cases, if the attribute decorates a base class member function -and a derived class overrides the member function, the overriding member -function does not automatically inherit the attribute. Therefore, the -front-end need not consider any overridden base class functions when generating -these metadata. - -The front-end also emits an `!intel_uses_aspects` metadata for a function *F* -(or augments the existing `!intel_uses_aspects` metadata for function *F*) if -the function's body "uses" a type *T* that is decorated with the -`[[sycl_detail::uses_aspects()]]` attribute. When such a type is "used", the -function *F* uses all the aspects in the attribute that decorates the type. A -function "uses" a type *T* if it contains a potentially evaluated expression -that is not discarded through `constexpr if` and if that expression does any of -the following: - -* Creates an object of type *U* (including a temporary object). -* Contains a cast to type *U*. -* References a literal of type *T*. - -Where the type *U* is any of the following: - -* The type *T*. -* A cv-qualified version of type *T*. -* An array of, pointer to, or reference to type *T*. -* A type that derives from type *T*. -* A class type that contains a non-static member object of type *T*. -* Any type that applies these rules recursively to type *T* (e.g. array of - pointers to type *T*, etc.) - -When applying these rules, the front-end treats any use of the `double` type as -though it was implicitly decorated with -`[[sycl_detail::uses_aspects(has(aspect::fp64))]]`. - -**TODO**: We are still discussing this section with the front-end team. One of -their main concerns is that they want to avoid adding a new pass over the AST. -However, it's difficult to construct the `!intel_uses_aspects` metadata as the -source is parsed for two reasons. One challenge comes from the fact that we -only consider types that are used in *potentially evaluated* expressions. This -means that the front-end must ignore types used in unevaluated expressions such -as `sizeof`. However, the parser doesn't know whether an expression is -potentially evaluated when it encounters the expression. The other challenge -is that we must not consider types used in statements that are discarded -through `constexpr if`. Again, the parser doesn't know whether a statement is -discarded at the time it encounters the statement. - -We think we can solve both of these problems by generating the -`!intel_uses_aspects` metadata during the front-end's code generation phase -(the point at which the front-end converts the AST into LLVM IR). At this -point we have already discarded statements due to `constexpr if`, and we -already know whether an expression is potentially evaluated. - - -### Changes to other phases of clang - -Any clang phases that do function inlining will need to be changed, so that the -`!intel_declared_aspects` and `!intel_uses_aspects` metadata are transferred -from the inlined function to the function that receives the inlined function -body. Presumably, there is already similar logic for the existing -`!reqd_work_group_size` metadata, which already decorates device functions. +transferring the information to the LLVM IR metadata described above according +to the following rules: + +* If the translation unit contains any type definitions that are decorated with + `[[sycl_detail::uses_aspects()]]`, the front-end creates an + `!intel_types_that_use_aspects` metadata describing the aspects used by all + such types. + +* If a function is decorated with `[[sycl_detail::uses_aspects()]]`, the + front-end adds an `!intel_used_aspects` metadata to the function's definition + listing the aspects from that attribute. + +* If a function is decorated with `[[sycl::requires()]]`, the front-end adds + an `!intel_declared_aspects` metadata to the function's definition listing + the aspects from that attribute. + + +### New LLVM IR pass to propagate aspect usage + +We add a new IR phase to the device compiler which does the following: + +* Creates (or augments) a function's `!intel_used_aspects` metadata with + aspects that come from references to types in the + `intel_types_that_use_aspects` list. + +* Propagates each function's `!intel_used_aspects` metadata up the static call + graph so that each function lists the aspects used by that function and by + any functions it calls. + +* Diagnoses a warning if any function that has `!intel_declared_aspects` uses + an aspect not listed in that declared set. + +It is important that this IR phase runs before any other optimization phase +that might eliminate a reference to a type or inline a function call because +such optimizations will cause us to miss information about aspects that are +used. Therefore, it is recommended that this new phase run first, before all +other IR phases. + +Implementing the first bullet point is straightforward. The implementation can +scan the IR for each function looking for instructions that reference a type. +It can then see if that type is in the `!intel_types_that_use_aspects` set; if +so it adds the type's aspects to the function's `!intel_used_aspects` set. +While doing this, the implementation must have a special case for the `double` +type because the front-end does not include that type in the +`!intel_types_that_use_aspects` set. If a function references the `double` +type, the implementation implicitly assumes that the function uses +`aspect::fp64` and adds that aspect to the function's `!intel_used_aspects` +set. + +The second bullet point requires building the static call graph, but the +implementation need not scan the instructions in each function. Instead, it +need only look at the `!intel_used_aspects` metadata for each function, +propagating the aspects used by each function up to it callers and augmenting +the caller's `!intel_used_aspects` set. + +Diagnosing warnings is then straightforward. The implement looks for functions +that have `!intel_declared_aspects` and compares that set with the +`!intel_used_aspects` set (if any). If a function uses an aspect that is not +in the declared set, the implementation issues a warning. + +One weakness of this design is that the warning message will only be able to +contain the source location of the problem if the compiler was invoked with +`-g` because this is the only time when the front-end propagates source +location information into the IR. To compensate, the warning message displays +the static call chain that leads to the problem. For example: + +``` +warning: function 'foo' uses aspect 'fp64' not listed in 'sycl::requires' +use is from this call chain: + foo() + bar() + boo() +compile with '-g' to get source location +``` + +Including the call chain in the warning message will require maintaining some +additional information during the traversal of the static call graph described +above. + +When the compiler is invoked with `-g`, the implementation uses the +`!DILocation` metadata to improve the warning message with source file, line, +and column information like so: + +``` +hw.cpp:27:4: warning: function 'foo' uses aspect 'fp64' not listed in 'sycl::requires' +use is from this call chain: + foo() + bar() hw.cpp:15:3 + boo() hw.cpp:25:5 +``` + +In the example above, the location `hw.cpp:27:4` gives the source location of +the code that uses the `fp64` aspect, in this case somewhere in the `boo()` +function. The location `hw.cpp:15:3` tells the location in `foo()` of the call +to `bar()`, etc. + +**NOTE**: Issuing this warning message from an IR pass is a compromise. We +would get better source location if the front-end diagnosed this warning. +However, we feel that the analysis required to diagnose this warning would be +too expensive in the front-end because it requires an additional pass over the +AST. By contrast, we can diagnose the warning more efficiently in an IR pass +because traversal of the IR is much more efficient than traversal of the AST. +The downside, though, is that the warning message is less informative. + + +### Assumptions on other phases of clang + +The post-link tool (described below) uses the `!intel_used_aspects` and +`!intel_declared_aspects` metadata, so this metadata must be retained by any +other clang passes. However, post-link only uses this metadata when it +decorates the definition of a kernel function or the definition of an exported +device function, so it does not matter if intervening clang passes discard the +metadata on other device functions. + +We think this is a safe assumption for two reasons. First, the existing design +must already preserve the `!reqd_work_group_size` metadata that decorates +kernel functions. Second, the kernel functions and exported device functions +always have external linkage, so there is no possibility that a clang phase +will optimize them away. + +**NOTE**: Ideally, we would change the llvm-link tool to somehow preserve the +`!intel_declared_aspects` and `!intel_used_aspects` metadata for functions +marked `SYCL_EXTERNAL` so that we could compare the declared aspects (in the +module that imports the function) with the used aspects (in the module the +exports the function). This would allow us to diagnose errors where the +importing translation unit's declared aspects do not match the aspects actually +used by the function. + +We do not propose this change as part of this design, though. We expect that +this will not be a common error because applications can avoid this problem by +declaring the `SYCL_EXTERNAL` function in a common header that is included by +both the importing and the exporting translation unit. If the declaration (in +the header) is decorated with `[[sycl::requires()]]`, the shared declaration +will ensure that the definition stays in sync with the declaration. ### Changes to the post-link tool @@ -351,56 +456,23 @@ expect this to work, but DPC++ currently fails because JIT-compiling *K1* causes the entire bundle to be compiled, and this fails when trying to compile *K2* for a device that does not have aspect *A*. -We solve this problem by changing the post-link tool to bundle kernels -according to the aspects that they use. - -The post-link tool is also a convenient place to issue a diagnostic when a -function uses aspects that it is not allowed to use (i.e. the function is -decorated with `[[sycl::requires()]]` and it uses some optional feature that -corresponds to an aspect that is not in the `[[sycl::requires()]]` list). - -The post-link tool achieves this by examining the static call graph of each -kernel and each exported device function. When this tool computes the static -call graph, it considers all code in any of the translation units that are -being linked together. This may not be the complete call graph, however, in -cases where a kernel calls out to a device function that is defined in a -different shared library. - -**NOTE**: In this context, "exported device function" means a device function -that is exported from a shared library as defined by [Device Code Dynamic -Linking][5]. - -[5]: - -#### Pass to identify aspects used by each device function - -This pass operates on the static call graph for each kernel and each exported -device function, propagating the aspects from the `!intel_used_aspects` and -`!intel_declared_aspects` metadata from the leaves of the call graph up to -their callers. The result of this pass is that each device function is labeled -with a *Used* set of aspects which is computed as the union of the following: - -* The aspects in the function's `!intel_used_aspects` metadata (if any). -* The aspects in the function's `!intel_declared_aspects` metadata (if any). -* The aspects in the *Used* set of all functions called by this function. - -Once the *Used* set of aspects is known for each function, the post-link tool -compares this set of aspects with the aspects from any -`!intel_declared_aspects` metadata. If the function has this metadata and if -the *Used* set contains aspects not in that set, it issues a warning indicating -that the function uses aspects that are not in the `[[sycl::requires()]]` list. -Unfortunately, the post-link tool is unable to include the source position of -the code that uses the aspect in question. To compensate, the warning message -must include instructions telling the user how to run the clang static analyzer -which provides a better diagnostic. This analysis phase is described in more -detail below. +We solve this problem by changing the post-link tool to bundle kernels and +exported device functions according to the aspects that they use. #### Changes to the device code split algorithm The algorithm for splitting device functions into images must be changed to -account for the *Used* aspects of each kernel or exported device function. The +account for the aspects used by each kernel or exported device function. The goal is to ensure that two kernels or exported device functions are only -bundled together into the same device image if their *Used* sets are identical. +bundled together into the same device image if they use exactly the same set +of aspects. + +For the purposes of this analysis, the set of *Used* aspects is computed by +taking the union of the aspects listed in the kernel's (or device function's) +`!intel_used_aspects` and `!intel_declared_aspects` sets. This is consistent +with the SYCL specification, which says that a kernel decorated with +`[[sycl::requires()]]` may only be submitted to a device that provides the +listed aspects, regardless of whether the kernel actually uses those aspects. We must also split two kernels into different device images if they have different `[[sycl::reqd_sub_group_size()]]` or different @@ -484,17 +556,18 @@ sizes. In AOT mode, for each AOT target specified by the `-fsycl-targets` command line option, DPC++ normally invokes the AOT compiler for each device IR module -resulting from the sycl-post-link tool. For example, this is `ocloc` command -for Intel Gen AOT target and `opencl-aot` command for the x86 AOT target with -SPIR-V as the input, or other specific tools for the PTX target with LLVMIR -bitcode input. This causes a problem, though, for IR modules that use optional -features because these commands could fail if they attempt to compile IR using -an optional feature that is not supported by the target device. We therefore -need some way to avoid calling these commands in these cases. +resulting from the sycl-post-link tool. For example, this is the `ocloc` +command for Intel Gen AOT target and the `opencl-aot` command for the x86 AOT +target with SPIR-V as the input, or other specific tools for the PTX target +with LLVM IR bitcode input. This causes a problem, though, for IR modules that +use optional features because these commands could fail if they attempt to +compile IR using an optional feature that is not supported by the target +device. We therefore need some way to avoid calling these commands in these +cases. The overall design is as follows. The DPC++ installation includes a -configuration file that has one entry for each device that it supports. Each -entry lists the set of aspects that the device supports and a list of the +configuration file that has one entry for each device that it supports. Each +entry contains the set of aspects that the device supports and the set of sub-group sizes that it supports. DPC++ then consults this configuration file to decide whether to invoke a particular AOT compiler on each device IR module, using the information from the module's "SYCL/image-requirements" @@ -503,10 +576,12 @@ property set. #### Device configuration file The configuration file uses a simple YAML format where each top-level key is -a name of a device architecture. These names correspond to SYCL aspect enum -identifiers as defined in the [TBD] API header. There are sub-keys under each -device for the supported aspects, sub-group sizes and AOT compiler ID. For -example: +a name of a device architecture. We expect to define a set of device +architecture names that are used consistently in many places (in this +configuration file, in the names of device-specific aspects, as parameters for +the `-fsycl-targets` command line option, etc.) However, we have not yet +agreed on these architecture names. There are sub-keys under each device for +the supported aspects, sub-group sizes and AOT compiler ID. For example: ``` gen11_1: @@ -547,6 +622,7 @@ to select an alternate configuration file. [6]: + ### Changes to the DPC++ runtime The DPC++ runtime must be changed to check if a kernel uses any optional @@ -554,8 +630,8 @@ features that the device does not support. If this happens, the runtime must raise a synchronous `errc::kernel_not_supported` exception. When the application submits a kernel to a device, the runtime identifies all -the other device images that export device functions that are needed by the -kernel as described in [Device Code Dynamic Linking][5]. Before the runtime +the other device images that export device functions which are needed by the +kernel as described in [Device Code Dynamic Linking][4]. Before the runtime actually links these images together, it compares each image's "SYCL/image-requirements" against the features provided by the target device. If any of the following checks fail, the runtime throws @@ -573,40 +649,6 @@ include this property in the set nonetheless for possible future use. If the runtime throws an exception, it happens even before the runtime tries to access the contents of the device image. -### Clang static analyzer to diagnose unexpected aspect usage - -When a device function is decorated with the `[[sycl::requires()]]` attribute, -it is an assertion that the function (and all of the functions it calls) do not -use any optional features beyond those listed in the attribute. The post-link -tool diagnoses a warning if the function does use additional aspects, but this -diagnostic is not user-friendly for two reasons: - -* It does not contain the source position of the offending code, so it is - difficult for the user to identify the location of the problem. - -* The diagnostic happens at link time instead of compile-time, so the user - doesn't learn about it until late in the build process. - -Ideally, we would diagnose these cases in the compiler front-end, but we -believe this is contrary to the clang design principles because -inter-procedural analysis is required to identify these cases, and the clang -front-end does not normally do inter-procedural analysis. Instead, clang -normally uses a static analyzer phase to diagnose cases like this which require -deeper analysis. - -The downside is that clang does not execute the static analyzer by default, so -the user won't see these diagnostics unless they ask for the analysis. This is -one reason we issue these diagnostics in the post-link tool, which is always -run for SYCL applications. - -If this static analyzer phase is enabled, it operates on a single translation -unit, analyzing the static call graph of any device function that is decorated -with `[[sycl::requires()]]`. - -**TODO**: More information here about how it works. Presumably, the static -analyzer operates on LLVM IR, but the IR we generate in the front end doesn't -have information about source location of "used" aspects. - ## Appendix: Adding an attribute to 8-byte `atomic_ref` From a818156a5c94c178791334b76bca83bb59d70e56 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Tue, 29 Jun 2021 17:37:49 -0400 Subject: [PATCH 18/22] Support `automatic` / `primary` sub-group sizes Describe how we will bundle kernels that have a "named" required sub-group size (i.e. decorated with `[[intel::named_sub_group_size(NAME)]]`. --- sycl/doc/OptionalDeviceFeatures.md | 18 +++++++++++------- 1 file changed, 11 insertions(+), 7 deletions(-) diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index c5431290cc1e6..b5146543e0b97 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -489,10 +489,15 @@ Therefore, two kernels or exported device functions are only bundled together into the same device image if all of the following are true: * They share the same set of *Used* aspects, -* They either both have no required sub-group size or both have the same - required sub-group size, and * They either both have no required work-group size or both have the same - required work-group size. + required work-group size, and +* They either both have the same numeric value for their required sub-group + size or neither has a numeric value for a required sub-group size. (Note + that this implies that kernels decorated with + `[[intel::named_sub_group_size(automatic)]]` can be bundled together with + kernels that are decorated with `[[intel::named_sub_group_size(primary)]]` + and that either of these kernels could be bundled with a kernel that has no + required sub-group size.) These criteria are an additional filter applied to the device code split algorithm after taking into account the `-fsycl-device-code-split` command line @@ -526,10 +531,9 @@ property (which is always divisible by `4`) tells the number of aspects in the array. There is a "reqd\_sub\_group\_size" property if the image contains any kernels -with a required sub-group size. The value of the property is a `uint32` value -that tells the required size. (The device code split algorithm ensures that -there are never two kernels with different required sub-group sizes in the same -image.) +with a numeric required sub-group size. (I.e. this excludes kernels where the +required sub-group size is a named value like `automatic` or `primary`.) The +value of the property is a `uint32` value that tells the required size. There is a "reqd\_work\_group\_size" property if the image contains any kernels with a required work-group size. The value of the property is a `BYTE_ARRAY` From 722e4ca76464489ee4062209e481d212d33aeef2 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Wed, 30 Jun 2021 14:29:03 -0400 Subject: [PATCH 19/22] Remove "aot-compiler-id" from config file We decided that the `aot-compiler-id` entry in the config file did not add enough value, so remove it. We can add it later if we want without breaking backward compatibility. --- sycl/doc/OptionalDeviceFeatures.md | 13 ------------- 1 file changed, 13 deletions(-) diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index b5146543e0b97..a53b6be841e9b 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -591,29 +591,16 @@ the supported aspects, sub-group sizes and AOT compiler ID. For example: gen11_1: aspects: [1, 2, 3] sub-group-sizes: [8, 16] - aot-compiler-id: gen-spir64 gen_icl: aspects: [2, 3] sub-group-sizes: [8, 16] - aot-compiler-id: gen-spir64 x86_64_avx512: aspects: [1, 2, 3, 9, 11] sub-group-sizes: [8, 32] - aot-compiler-id: x86-spir64 ``` The values of the aspects in this configuration file can be the numerical values from the `enum class aspect` enumeration or the enum identifier itself. -For each valid AOT compiler ID the driver has a built-in rule how to construct -an AOT compilation command line based on given architecture name. For example, -for the `gen11_1` and `gen_icl` architectures, the driver sees `gen-spir64` -as the AOT compiler ID, so it knows that the `ocloc` tool must be used, and it -also knows how to translate the `gen11_1` or `gen_icl` to proper `ocloc` -architecture specification option. - -**NOTE**: New kinds of AOT compilers are expected to appear very rarely, so -developing some kind of "AOT compiler plugin" mechanism is impractical, and -hard coding AOT compiler types in the driver is reasonable. One advantage to encoding this information in a textual configuration file is that customers can update the file if necessary. This could be useful, for From 29a2e9f504372d40c1487a569f95b4079d1ab4ba Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Thu, 15 Jul 2021 08:36:59 -0400 Subject: [PATCH 20/22] Add a note about comparing types Alexey Sachkov proposed an optimization for comparing the types in each IR instruction with the `!intel_types_that_use_aspects` set. Capture this in a note. --- sycl/doc/OptionalDeviceFeatures.md | 9 +++++++++ 1 file changed, 9 insertions(+) diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index a53b6be841e9b..d1e5268f92ced 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -356,6 +356,15 @@ type, the implementation implicitly assumes that the function uses `aspect::fp64` and adds that aspect to the function's `!intel_used_aspects` set. +**NOTE**: This scan of the IR will require comparing the type referenced by +each IR instruction with the names of the types in the +`!intel_types_that_use_aspects` metadata. It would be very inefficient if we +did a string comparison each time. As an optimization, the implementation can +first lookup up each type name in the `!intel_types_that_use_aspects` metadata +set, finding the "type pointer" that corresponds to each type name. Then the +pass over the IR can compare the type pointer in each IR instruction with the +type pointers from the `!intel_types_that_use_aspects` metadata set. + The second bullet point requires building the static call graph, but the implementation need not scan the instructions in each function. Instead, it need only look at the `!intel_used_aspects` metadata for each function, From 0e479767de1486f42f36c621cee5e2f3a18208fb Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 23 Jul 2021 08:07:33 -0400 Subject: [PATCH 21/22] Rename property set Rename the property set from "SYCL/image-requirements" to "SYCL/device-requirements" to avoid confusion with the SYCL types `unsampled_image` and `sampled_image`. --- sycl/doc/OptionalDeviceFeatures.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index d1e5268f92ced..57dfb9f0fe3cd 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -517,12 +517,12 @@ that option, and then another split is performed to ensure that each device image contains only kernels or exported device functions that meet the criteria listed above. -#### Create the "SYCL/image-requirements" property set +#### Create the "SYCL/device-requirements" property set The DPC++ runtime needs some way to know about the *Used* aspects, required sub-group size, and required work-group size of an image. Therefore, the post-link tool provides this information in a new property set named -"SYCL/image-requirements". +"SYCL/device-requirements". The following table lists the properties that this set may contain and their types: @@ -583,7 +583,7 @@ configuration file that has one entry for each device that it supports. Each entry contains the set of aspects that the device supports and the set of sub-group sizes that it supports. DPC++ then consults this configuration file to decide whether to invoke a particular AOT compiler on each device IR -module, using the information from the module's "SYCL/image-requirements" +module, using the information from the module's "SYCL/device-requirements" property set. #### Device configuration file @@ -633,7 +633,7 @@ When the application submits a kernel to a device, the runtime identifies all the other device images that export device functions which are needed by the kernel as described in [Device Code Dynamic Linking][4]. Before the runtime actually links these images together, it compares each image's -"SYCL/image-requirements" against the features provided by the target +"SYCL/device-requirements" against the features provided by the target device. If any of the following checks fail, the runtime throws `errc::kernel_not_supported`: From 1e927c107a7ac5c486a5620fba4ce80912701785 Mon Sep 17 00:00:00 2001 From: Greg Lueck Date: Fri, 23 Jul 2021 08:30:54 -0400 Subject: [PATCH 22/22] Simplify "[[sycl::uses_aspects()]]" We are planning to change `[[sycl::requires()]]` in the SYCL 2020 spec to omit the `has()` clause, which means the parameter to that attribute will just be a list of aspects. Make the same change to `[[sycl::uses_aspects()]]` in this design document. Also add a note that we plan to change the name of `[[sycl::requires()]]` to `[[sycl::device_has()]]`. We can update this design document again once that change is adopted into the SYCL 2020 spec. --- sycl/doc/OptionalDeviceFeatures.md | 46 +++++++++++++++++++----------- 1 file changed, 29 insertions(+), 17 deletions(-) diff --git a/sycl/doc/OptionalDeviceFeatures.md b/sycl/doc/OptionalDeviceFeatures.md index 57dfb9f0fe3cd..549010f51e8a8 100644 --- a/sycl/doc/OptionalDeviceFeatures.md +++ b/sycl/doc/OptionalDeviceFeatures.md @@ -17,6 +17,14 @@ attributes"][2] and [section 5.8.2 "Device function attributes"][3]. [2]: [3]: +**NOTE**: At the time this document was written, there is a +[proposed change][4] to the SYCL 2020 specification that will rename +`[[sycl::requires()]]` to `[[sycl::device_has()]]`. Since that proposal has +not yet been adopted, this design document continues to use the +`[[sycl::requires()]]` name. + +[4]: + ## Definition of terms @@ -49,9 +57,9 @@ extension specification that enables indirect function calls. ### An exported device function The term "exported device function" means a device function that is exported -from a shared library as defined by [Device Code Dynamic Linking][4]. +from a shared library as defined by [Device Code Dynamic Linking][5]. -[4]: +[5]: ### The FE compiler @@ -207,7 +215,7 @@ To illustrate, the type `sycl::half` is an optional feature whose associated aspect is `aspect::fp16`. We therefore decorate the declaration like this: ``` -using half [[__sycl_detail__::__uses_aspects__(has(aspect::fp16))]] = +using half [[__sycl_detail__::__uses_aspects__(aspect::fp16)]] = cl::sycl::detail::half_impl::half; ``` @@ -215,7 +223,7 @@ If an optional feature is expressed as a class type, it can be similarly decorated (here illustrating a hypothetical AMX type): ``` -class [[__sycl_detail__::__uses_aspects__(has(aspect::ext_intel_amx))]] amx_type { +class [[__sycl_detail__::__uses_aspects__(aspect::ext_intel_amx)]] amx_type { /* ... */ }; ``` @@ -224,26 +232,30 @@ This attribute is also used to decorate function declarations that correspond to optional features. Again, illustrating a hypothetical AMX extension: ``` -[[__sycl_detail__::__uses_aspects__(has(aspect::ext_intel_amx))]] +[[__sycl_detail__::__uses_aspects__(aspect::ext_intel_amx)]] void amx_multiply(); ``` This attribute can also be used to decorate class templates where only certain instantiations correspond to optional features. See ["Appendix: Adding an -attribute to 8-byte `atomic_ref`"][5] for an illustration of how this attribute +attribute to 8-byte `atomic_ref`"][6] for an illustration of how this attribute can be used in conjunction with partial specialization to mark only certain instantiations of `sycl::atomic_ref` as an optional feature. -[5]: <#appendix-adding-an-attribute-to-8-byte-atomic_ref> +[6]: <#appendix-adding-an-attribute-to-8-byte-atomic_ref> -As you can see from the examples above, the syntax for the parameter to the -`[[sycl_detail::uses_aspects()]]` attribute is identical to the syntax for the -standard `[[sycl::requires()]]` attribute. +Although the examples above show only a single aspect parameter to the +`[[sycl_detail::uses_aspects()]]` attribute, this attribute should support a +list of aspects, similar to the `[[sycl::requires()]]` attribute. This will +allow us to support future features that depend on a conjunction of aspects +(e.g. a feature that does atomic operations on 64-bit floating point values +might be decorated with +`[[sycl_detail::uses_aspects(aspect::fp64, aspect::atomic64)]]`). Unfortunately, the fundamental type `double` is also an optional kernel feature. Since there is no type alias for `double`, there is no convenient place to add an attribute. Instead, the FE device compiler must behave as -though there was an implicit `[[sycl_detail::uses_aspects(has(aspect::fp64))]]` +though there was an implicit `[[sycl_detail::uses_aspects(aspect::fp64)]]` attribute for any device code that uses the `double` type. @@ -618,9 +630,9 @@ fact, the DPC++ driver supports a command line option which allows the user to select an alternate configuration file. **TODO**: More information will be inserted here when we merge -[this separate PR][6] into this design document. +[this separate PR][7] into this design document. -[6]: +[7]: ### Changes to the DPC++ runtime @@ -631,7 +643,7 @@ raise a synchronous `errc::kernel_not_supported` exception. When the application submits a kernel to a device, the runtime identifies all the other device images that export device functions which are needed by the -kernel as described in [Device Code Dynamic Linking][4]. Before the runtime +kernel as described in [Device Code Dynamic Linking][5]. Before the runtime actually links these images together, it compares each image's "SYCL/device-requirements" against the features provided by the target device. If any of the following checks fail, the runtime throws @@ -652,7 +664,7 @@ access the contents of the device image. ## Appendix: Adding an attribute to 8-byte `atomic_ref` -As described above under ["Changes to DPC++ headers"][7], we need to decorate +As described above under ["Changes to DPC++ headers"][8], we need to decorate any SYCL type representing an optional device feature with the `[[sycl_detail::uses_aspects()]]` attribute. This is somewhat tricky for `atomic_ref`, though, because it is only an optional feature when specialized @@ -660,7 +672,7 @@ for a 8-byte type. However, we can accomplish this by using partial specialization techniques. The following code snippet demonstrates (best read from bottom to top): -[7]: <#changes-to-dpc-headers> +[8]: <#changes-to-dpc-headers> ``` namespace sycl { @@ -688,7 +700,7 @@ class atomic_ref_impl : public atomic_ref_impl_base { // Explicit specialization for 8-byte types. Only this specialization has the // attribute. template -class [[__sycl_detail__::__uses_aspects__(has(aspect::atomic64))]] +class [[__sycl_detail__::__uses_aspects__(aspect::atomic64)]] atomic_ref_impl : public atomic_ref_impl_base { public: using atomic_ref_impl_base::atomic_ref_impl_base;