[MLIR][OpenMP] Add omp.simd operation #79843
This patch introduces the `omp.simd` operation. In contrast to the existing `omp.simdloop` operation, it is intended to hold SIMD information within worksharing loops rather than to represent a SIMD-only loop. Examples of such loops are "omp do/for simd", "omp distribute simd", "omp target teams distribute parallel do/for simd", etc. This operation must always be nested within an `omp.wsloop` operation as its only non-terminator child, and it follows the same approach as the `omp.distribute` operation by serving as a simple wrapper operation holding clause information. For more context on this work, refer to PR #79559.
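The "only non-terminator child" constraint described above can be sketched in isolation. The snippet below uses simplified stand-in types (`OpInfo`, `validWsLoopBody` are hypothetical names for illustration, not the MLIR API):

```cpp
#include <string>
#include <vector>

// Simplified stand-in for an operation inside a block: just its name and
// whether it is a terminator. Illustration only, not the MLIR API.
struct OpInfo {
  std::string name;
  bool isTerminator;
};

// Returns true when the block either contains no "omp.simd" op, or contains
// exactly one "omp.simd" and no other non-terminator ops, mirroring the
// check this patch adds to WsLoopOp::verify().
bool validWsLoopBody(const std::vector<OpInfo> &ops) {
  int simdCount = 0;
  bool hasOther = false;
  for (const OpInfo &op : ops) {
    if (op.name == "omp.simd")
      ++simdCount;
    else if (!op.isTerminator)
      hasOther = true;
  }
  if (simdCount > 1)
    return false;                     // multiple omp.simd children
  return simdCount == 0 || !hasOther; // if present, it must be alone
}
```

This matches the three invalid cases exercised by the new tests: two `omp.simd` children, and a sibling op before or after a single `omp.simd`.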
@llvm/pr-subscribers-flang-openmp @llvm/pr-subscribers-mlir

Author: Sergio Afonso (skatrak)

Patch is 28.95 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/79843.diff. 4 Files Affected:
diff --git a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
index 96c15e775a3024b..50b316a21554e0b 100644
--- a/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
+++ b/mlir/include/mlir/Dialect/OpenMP/OpenMPOps.td
@@ -505,6 +505,9 @@ def WsLoopOp : OpenMP_Op<"wsloop", [AttrSizedOperandSegments,
/// Returns the number of reduction variables.
unsigned getNumReductionVars() { return getReductionVars().size(); }
+
+ /// Returns its nested 'omp.simd' operation, if present.
+ SimdOp getNestedSimd();
}];
let hasCustomAssemblyFormat = 1;
let assemblyFormat = [{
@@ -617,11 +620,84 @@ def SimdLoopOp : OpenMP_Op<"simdloop", [AttrSizedOperandSegments,
let hasVerifier = 1;
}
+def SimdOp : OpenMP_Op<"simd",
+ [AttrSizedOperandSegments, MemoryEffects<[MemWrite]>,
+ HasParent<"WsLoopOp">]> {
+ let summary = "simd construct";
+ let description = [{
+ The simd construct can be applied to a loop to indicate that the loop can be
+ transformed into a SIMD loop (that is, multiple iterations of the loop can
+ be executed concurrently using SIMD instructions).
+
+ This operation is intended to hold SIMD information for a worksharing loop
+ (i.e. "omp for simd"), so it must always be nested inside of a parent
+ "omp.wsloop" operation as its only child. For SIMD loops not combined with a
+ worksharing loop (i.e. "omp simd"), the "omp.simdloop" operation is used
+ instead.
+
+ The body region can contain any number of blocks. The region is terminated
+ by an "omp.yield" instruction without operands.
+
+ The `alignment_values` attribute additionally specifies alignment of each
+ corresponding aligned operand. Note that `aligned_vars` and
+ `alignment_values` should contain the same number of elements.
+
+ When an if clause is present and evaluates to false, the preferred number of
+ iterations to be executed concurrently is one, regardless of whether
+ a simdlen clause is specified.
+
+ The optional `nontemporal` attribute specifies variables which have low
+ temporal locality across the iterations where they are accessed.
+
+ The optional `order` attribute specifies which order the iterations of the
+ associated loops are executed in. Currently the only option for this
+ attribute is "concurrent".
+
+ When a simdlen clause is present, the preferred number of iterations to be
+ executed concurrently is the value provided to the simdlen clause.
+
+ The safelen clause specifies that no two concurrent iterations within a
+ SIMD chunk can have a distance in the logical iteration space that is
+ greater than or equal to the value given in the clause.
+ ```
+ omp.wsloop for (%i) : index = (%c0) to (%c10) step (%c1) {
+ omp.simd <clauses> {
+ // block operations
+ omp.yield
+ }
+ omp.yield
+ }
+ ```
+ }];
+
+ // TODO: Add other clauses
+ let arguments = (ins Variadic<OpenMP_PointerLikeType>:$aligned_vars,
+ OptionalAttr<I64ArrayAttr>:$alignment_values,
+ Optional<I1>:$if_expr,
+ Variadic<OpenMP_PointerLikeType>:$nontemporal_vars,
+ OptionalAttr<OrderKindAttr>:$order_val,
+ ConfinedAttr<OptionalAttr<I64Attr>, [IntPositive]>:$simdlen,
+ ConfinedAttr<OptionalAttr<I64Attr>, [IntPositive]>:$safelen
+ );
+
+ let regions = (region AnyRegion:$region);
+ let assemblyFormat = [{
+ oilist(`aligned` `(`
+ custom<AlignedClause>($aligned_vars, type($aligned_vars),
+ $alignment_values) `)`
+ |`if` `(` $if_expr `)`
+ |`nontemporal` `(` $nontemporal_vars `:` type($nontemporal_vars) `)`
+ |`order` `(` custom<ClauseAttr>($order_val) `)`
+ |`simdlen` `(` $simdlen `)`
+ |`safelen` `(` $safelen `)`
+ ) $region attr-dict
+ }];
+
+ let hasVerifier = 1;
+}
def YieldOp : OpenMP_Op<"yield",
[Pure, ReturnLike, Terminator,
ParentOneOf<["WsLoopOp", "ReductionDeclareOp",
- "AtomicUpdateOp", "SimdLoopOp"]>]> {
+ "AtomicUpdateOp", "SimdLoopOp", "SimdOp"]>]> {
let summary = "loop yield and termination operation";
let description = [{
"omp.yield" yields SSA values from the OpenMP dialect op region and
diff --git a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
index 13cc16125a2733e..caa888d030f7019 100644
--- a/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
+++ b/mlir/lib/Dialect/OpenMP/IR/OpenMPDialect.cpp
@@ -1131,28 +1131,33 @@ void printLoopControl(OpAsmPrinter &p, Operation *op, Region &region,
}
//===----------------------------------------------------------------------===//
-// Verifier for Simd construct [2.9.3.1]
+// Verifier for Simd constructs [2.9.3.1]
//===----------------------------------------------------------------------===//
-LogicalResult SimdLoopOp::verify() {
- if (this->getLowerBound().empty()) {
- return emitOpError() << "empty lowerbound for simd loop operation";
- }
- if (this->getSimdlen().has_value() && this->getSafelen().has_value() &&
- this->getSimdlen().value() > this->getSafelen().value()) {
- return emitOpError()
+template <typename OpTy>
+static LogicalResult verifySimdOp(OpTy op) {
+ if (op.getSimdlen().has_value() && op.getSafelen().has_value() &&
+ op.getSimdlen().value() > op.getSafelen().value()) {
+ return op.emitOpError()
<< "simdlen clause and safelen clause are both present, but the "
"simdlen value is not less than or equal to safelen value";
}
- if (verifyAlignedClause(*this, this->getAlignmentValues(),
- this->getAlignedVars())
+ if (verifyAlignedClause(op, op.getAlignmentValues(), op.getAlignedVars())
.failed())
return failure();
- if (verifyNontemporalClause(*this, this->getNontemporalVars()).failed())
+ if (verifyNontemporalClause(op, op.getNontemporalVars()).failed())
return failure();
return success();
}
+LogicalResult SimdLoopOp::verify() {
+ if (this->getLowerBound().empty())
+ return emitOpError() << "empty lowerbound for simd loop operation";
+ return verifySimdOp(*this);
+}
+
+LogicalResult SimdOp::verify() { return verifySimdOp(*this); }
+
//===----------------------------------------------------------------------===//
// Verifier for Distribute construct [2.9.4.1]
//===----------------------------------------------------------------------===//
@@ -1329,7 +1334,34 @@ void WsLoopOp::build(OpBuilder &builder, OperationState &state,
state.addAttributes(attributes);
}
+SimdOp WsLoopOp::getNestedSimd() {
+ auto ops = this->getOps<SimdOp>();
+ assert(std::distance(ops.begin(), ops.end()) <= 1 &&
+ "there can be at most one omp.simd child");
+ return ops.empty() ? SimdOp() : *ops.begin();
+}
+
LogicalResult WsLoopOp::verify() {
+ // Check that, if it has an omp.simd child, it must be the only one.
+ bool hasSimd = false, hasOther = false;
+ for (auto &op : this->getOps()) {
+ if (isa<SimdOp>(op)) {
+ if (hasSimd)
+ return emitOpError() << "cannot have multiple 'omp.simd' child ops";
+ hasSimd = true;
+
+ if (hasOther)
+ break;
+ } else if (!op.hasTrait<OpTrait::IsTerminator>()) {
+ hasOther = true;
+ if (hasSimd)
+ break;
+ }
+ }
+ if (hasSimd && hasOther)
+ return emitOpError() << "if 'omp.simd' is a child, it must be the only "
+ "non-terminator child op";
+
return verifyReductionVarList(*this, getReductions(), getReductionVars());
}
diff --git a/mlir/test/Dialect/OpenMP/invalid.mlir b/mlir/test/Dialect/OpenMP/invalid.mlir
index 812b79e35595f04..29a6a078fad8344 100644
--- a/mlir/test/Dialect/OpenMP/invalid.mlir
+++ b/mlir/test/Dialect/OpenMP/invalid.mlir
@@ -192,6 +192,50 @@ llvm.func @test_omp_wsloop_dynamic_wrong_modifier3(%lb : i64, %ub : i64, %step :
// -----
+llvm.func @test_omp_wsloop_simd_multiple(%lb : i64, %ub : i64, %step : i64) -> () {
+ // expected-error @+1 {{op cannot have multiple 'omp.simd' child ops}}
+ omp.wsloop for (%iv) : i64 = (%lb) to (%ub) step (%step) {
+ omp.simd {
+ omp.yield
+ }
+ omp.simd {
+ omp.yield
+ }
+ omp.yield
+ }
+ llvm.return
+}
+
+// -----
+
+llvm.func @test_omp_wsloop_simd_invalid_before(%lb : i64, %ub : i64, %step : i64) -> () {
+ // expected-error @+1 {{op if 'omp.simd' is a child, it must be the only non-terminator child op}}
+ omp.wsloop for (%iv) : i64 = (%lb) to (%ub) step (%step) {
+ %c1 = arith.constant 1 : i32
+ omp.simd {
+ omp.yield
+ }
+ omp.yield
+ }
+ llvm.return
+}
+
+// -----
+
+llvm.func @test_omp_wsloop_simd_invalid_after(%lb : i64, %ub : i64, %step : i64) -> () {
+ // expected-error @+1 {{op if 'omp.simd' is a child, it must be the only non-terminator child op}}
+ omp.wsloop for (%iv) : i64 = (%lb) to (%ub) step (%step) {
+ omp.simd {
+ omp.yield
+ }
+ %c1 = arith.constant 1 : i32
+ omp.yield
+ }
+ llvm.return
+}
+
+// -----
+
func.func @omp_simdloop(%lb : index, %ub : index, %step : i32) -> () {
// expected-error @below {{op failed to verify that all of {lowerBound, upperBound, step} have same type}}
"omp.simdloop" (%lb, %ub, %step) ({
@@ -205,6 +249,18 @@ func.func @omp_simdloop(%lb : index, %ub : index, %step : i32) -> () {
// -----
+func.func @omp_simd(%lb : index, %ub : index, %step : i32) -> () {
+ // expected-error @below {{'omp.simd' op expects parent op 'omp.wsloop'}}
+ "omp.simd" () ({
+ ^bb0(%iv: index):
+ omp.yield
+ }) {operandSegmentSizes = array<i32: 0, 0, 0>} : () -> ()
+
+ return
+}
+
+// -----
+
func.func @omp_simdloop_pretty_aligned(%lb : index, %ub : index, %step : index,
%data_var : memref<i32>) -> () {
// expected-error @below {{expected '->'}}
@@ -217,6 +273,20 @@ func.func @omp_simdloop_pretty_aligned(%lb : index, %ub : index, %step : index,
// -----
+func.func @omp_simd_pretty_aligned(%lb : index, %ub : index, %step : index,
+ %data_var : memref<i32>) -> () {
+ omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+ // expected-error @below {{expected '->'}}
+ omp.simd aligned(%data_var : memref<i32>) {
+ omp.yield
+ }
+ omp.yield
+ }
+ return
+}
+
+// -----
+
func.func @omp_simdloop_aligned_mismatch(%arg0 : index, %arg1 : index,
%arg2 : index, %arg3 : memref<i32>,
%arg4 : memref<i32>) -> () {
@@ -231,6 +301,22 @@ func.func @omp_simdloop_aligned_mismatch(%arg0 : index, %arg1 : index,
// -----
+func.func @omp_simd_aligned_mismatch(%arg0 : index, %arg1 : index,
+ %arg2 : index, %arg3 : memref<i32>,
+ %arg4 : memref<i32>) -> () {
+ omp.wsloop for (%arg5) : index = (%arg0) to (%arg1) step (%arg2) {
+ // expected-error @below {{op expected as many alignment values as aligned variables}}
+ "omp.simd"(%arg3, %arg4) ({
+ "omp.yield"() : () -> ()
+ }) {alignment_values = [128],
+ operandSegmentSizes = array<i32: 2, 0, 0>} : (memref<i32>, memref<i32>) -> ()
+ omp.yield
+ }
+ return
+}
+
+// -----
+
func.func @omp_simdloop_aligned_negative(%arg0 : index, %arg1 : index,
%arg2 : index, %arg3 : memref<i32>,
%arg4 : memref<i32>) -> () {
@@ -244,6 +330,21 @@ func.func @omp_simdloop_aligned_negative(%arg0 : index, %arg1 : index,
// -----
+func.func @omp_simd_aligned_negative(%arg0 : index, %arg1 : index,
+ %arg2 : index, %arg3 : memref<i32>,
+ %arg4 : memref<i32>) -> () {
+ omp.wsloop for (%arg5) : index = (%arg0) to (%arg1) step (%arg2) {
+ // expected-error @below {{op alignment should be greater than 0}}
+ "omp.simd"(%arg3, %arg4) ({
+ "omp.yield"() : () -> ()
+ }) {alignment_values = [-1, 128], operandSegmentSizes = array<i32: 2, 0, 0>} : (memref<i32>, memref<i32>) -> ()
+ omp.yield
+ }
+ return
+}
+
+// -----
+
func.func @omp_simdloop_unexpected_alignment(%arg0 : index, %arg1 : index,
%arg2 : index, %arg3 : memref<i32>,
%arg4 : memref<i32>) -> () {
@@ -257,6 +358,21 @@ func.func @omp_simdloop_unexpected_alignment(%arg0 : index, %arg1 : index,
// -----
+func.func @omp_simd_unexpected_alignment(%arg0 : index, %arg1 : index,
+ %arg2 : index, %arg3 : memref<i32>,
+ %arg4 : memref<i32>) -> () {
+ omp.wsloop for (%arg5) : index = (%arg0) to (%arg1) step (%arg2) {
+ // expected-error @below {{unexpected alignment values attribute}}
+ "omp.simd"() ({
+ "omp.yield"() : () -> ()
+ }) {alignment_values = [1, 128], operandSegmentSizes = array<i32: 0, 0, 0>} : () -> ()
+ omp.yield
+ }
+ return
+}
+
+// -----
+
func.func @omp_simdloop_aligned_float(%arg0 : index, %arg1 : index,
%arg2 : index, %arg3 : memref<i32>,
%arg4 : memref<i32>) -> () {
@@ -270,6 +386,21 @@ func.func @omp_simdloop_aligned_float(%arg0 : index, %arg1 : index,
// -----
+func.func @omp_simd_aligned_float(%arg0 : index, %arg1 : index,
+ %arg2 : index, %arg3 : memref<i32>,
+ %arg4 : memref<i32>) -> () {
+ omp.wsloop for (%arg5) : index = (%arg0) to (%arg1) step (%arg2) {
+ // expected-error @below {{failed to satisfy constraint: 64-bit integer array attribute}}
+ "omp.simd"(%arg3, %arg4) ({
+ "omp.yield"() : () -> ()
+ }) {alignment_values = [1.5, 128], operandSegmentSizes = array<i32: 2, 0, 0>} : (memref<i32>, memref<i32>) -> ()
+ omp.yield
+ }
+ return
+}
+
+// -----
+
func.func @omp_simdloop_aligned_the_same_var(%arg0 : index, %arg1 : index,
%arg2 : index, %arg3 : memref<i32>,
%arg4 : memref<i32>) -> () {
@@ -283,6 +414,21 @@ func.func @omp_simdloop_aligned_the_same_var(%arg0 : index, %arg1 : index,
// -----
+func.func @omp_simd_aligned_the_same_var(%arg0 : index, %arg1 : index,
+ %arg2 : index, %arg3 : memref<i32>,
+ %arg4 : memref<i32>) -> () {
+ omp.wsloop for (%arg5) : index = (%arg0) to (%arg1) step (%arg2) {
+ // expected-error @below {{aligned variable used more than once}}
+ "omp.simd"(%arg3, %arg3) ({
+ "omp.yield"() : () -> ()
+ }) {alignment_values = [1, 128], operandSegmentSizes = array<i32: 2, 0, 0>} : (memref<i32>, memref<i32>) -> ()
+ omp.yield
+ }
+ return
+}
+
+// -----
+
func.func @omp_simdloop_nontemporal_the_same_var(%arg0 : index,
%arg1 : index,
%arg2 : index,
@@ -297,6 +443,22 @@ func.func @omp_simdloop_nontemporal_the_same_var(%arg0 : index,
// -----
+func.func @omp_simd_nontemporal_the_same_var(%arg0 : index,
+ %arg1 : index,
+ %arg2 : index,
+ %arg3 : memref<i32>) -> () {
+ omp.wsloop for (%arg5) : index = (%arg0) to (%arg1) step (%arg2) {
+ // expected-error @below {{nontemporal variable used more than once}}
+ "omp.simd"(%arg3, %arg3) ({
+ "omp.yield"() : () -> ()
+ }) {operandSegmentSizes = array<i32: 0, 0, 2>} : (memref<i32>, memref<i32>) -> ()
+ omp.yield
+ }
+ return
+}
+
+// -----
+
func.func @omp_simdloop_order_value(%lb : index, %ub : index, %step : index) {
// expected-error @below {{invalid clause value: 'default'}}
omp.simdloop order(default) for (%iv): index = (%lb) to (%ub) step (%step) {
@@ -307,6 +469,19 @@ func.func @omp_simdloop_order_value(%lb : index, %ub : index, %step : index) {
// -----
+func.func @omp_simd_order_value(%lb : index, %ub : index, %step : index) {
+ omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+ // expected-error @below {{invalid clause value: 'default'}}
+ omp.simd order(default) {
+ omp.yield
+ }
+ omp.yield
+ }
+ return
+}
+
+// -----
+
func.func @omp_simdloop_pretty_simdlen(%lb : index, %ub : index, %step : index) -> () {
// expected-error @below {{op attribute 'simdlen' failed to satisfy constraint: 64-bit signless integer attribute whose value is positive}}
omp.simdloop simdlen(0) for (%iv): index = (%lb) to (%ub) step (%step) {
@@ -317,6 +492,19 @@ func.func @omp_simdloop_pretty_simdlen(%lb : index, %ub : index, %step : index)
// -----
+func.func @omp_simd_pretty_simdlen(%lb : index, %ub : index, %step : index) -> () {
+ omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+ // expected-error @below {{op attribute 'simdlen' failed to satisfy constraint: 64-bit signless integer attribute whose value is positive}}
+ omp.simd simdlen(0) {
+ omp.yield
+ }
+ omp.yield
+ }
+ return
+}
+
+// -----
+
func.func @omp_simdloop_pretty_safelen(%lb : index, %ub : index, %step : index) -> () {
// expected-error @below {{op attribute 'safelen' failed to satisfy constraint: 64-bit signless integer attribute whose value is positive}}
omp.simdloop safelen(0) for (%iv): index = (%lb) to (%ub) step (%step) {
@@ -327,6 +515,19 @@ func.func @omp_simdloop_pretty_safelen(%lb : index, %ub : index, %step : index)
// -----
+func.func @omp_simd_pretty_safelen(%lb : index, %ub : index, %step : index) -> () {
+ omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+ // expected-error @below {{op attribute 'safelen' failed to satisfy constraint: 64-bit signless integer attribute whose value is positive}}
+ omp.simd safelen(0) {
+ omp.yield
+ }
+ omp.yield
+ }
+ return
+}
+
+// -----
+
func.func @omp_simdloop_pretty_simdlen_safelen(%lb : index, %ub : index, %step : index) -> () {
// expected-error @below {{'omp.simdloop' op simdlen clause and safelen clause are both present, but the simdlen value is not less than or equal to safelen value}}
omp.simdloop simdlen(2) safelen(1) for (%iv): index = (%lb) to (%ub) step (%step) {
@@ -337,6 +538,19 @@ func.func @omp_simdloop_pretty_simdlen_safelen(%lb : index, %ub : index, %step :
// -----
+func.func @omp_simd_pretty_simdlen_safelen(%lb : index, %ub : index, %step : index) -> () {
+ omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+ // expected-error @below {{'omp.simd' op simdlen clause and safelen clause are both present, but the simdlen value is not less than or equal to safelen value}}
+ omp.simd simdlen(2) safelen(1) {
+ omp.yield
+ }
+ omp.yield
+ }
+ return
+}
+
+// -----
+
// expected-error @below {{op expects initializer region with one argument of the reduction type}}
omp.reduction.declare @add_f32 : f64
init {
diff --git a/mlir/test/Dialect/OpenMP/ops.mlir b/mlir/test/Dialect/OpenMP/ops.mlir
index ccf72ae31d439ed..2487f0eb5654e13 100644
--- a/mlir/test/Dialect/OpenMP/ops.mlir
+++ b/mlir/test/Dialect/OpenMP/ops.mlir
@@ -176,6 +176,21 @@ func.func @omp_wsloop(%lb : index, %ub : index, %step : index, %data_var : memre
}) {operandSegmentSizes = array<i32: 1,1,1,0,0,0,0>, nowait, schedule_val = #omp<schedulekind auto>} :
(index, index, index) -> ()
+ // CHECK: omp.wsloop
+ // CHECK-SAME: for (%{{.*}}) : index = (%{{.*}}) to (%{{.*}}) step (%{{.*}})
+ // CHECK-NEXT: omp.simd
+ // CHECK-NEXT: omp.yield
+ // CHECK-NEXT: }
+ // CHECK-NEXT: omp.yield
+ "omp.wsloop" (%lb, %ub, %step) ({
+ ^bb0(%iv: index):
+ "omp.simd" () ({
+ omp.yield
+ }) : () -> ()
+ omp.yield
+ }) {operandSegmentSizes = array<i32: 1,1,1,0,0,0,0>} :
+ (index, index, index) -> ()
+
return
}
@@ -339,6 +354,19 @@ func.func @omp_simdloop(%lb : index, %ub : index, %step : index) -> () {
return
}
+// CHECK-LABEL: omp_simd
+func.func @omp_simd(%lb : index, %ub : index, %step : index) -> () {
+ omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+ // CHECK: omp.simd {
+ "omp.simd" () ({
+ omp.yield
+ }) {operandSeg...
[truncated]
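The other rule the shared `verifySimdOp` template enforces, that `simdlen` must not exceed `safelen` when both clauses are present, can be sketched as a standalone predicate (`simdLenSafeLenValid` is a hypothetical name for illustration, not the MLIR API):

```cpp
#include <cstdint>
#include <optional>

// Sketch of the simdlen/safelen consistency check: when both clauses are
// given, the preferred SIMD width (simdlen) must be less than or equal to
// the maximum safe distance between concurrent iterations (safelen).
bool simdLenSafeLenValid(std::optional<int64_t> simdlen,
                         std::optional<int64_t> safelen) {
  if (simdlen && safelen)
    return *simdlen <= *safelen;
  return true; // either clause absent: nothing to verify
}
```

This is the condition behind the `omp_simd_pretty_simdlen_safelen` negative test, where `simdlen(2) safelen(1)` is rejected.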
@@ -307,6 +469,19 @@ func.func @omp_simdloop_order_value(%lb : index, %ub : index, %step : index) {
// -----
+func.func @omp_simd_order_value(%lb : index, %ub : index, %step : index) {
+ omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+ // expected-error @below {{invalid clause value: 'default'}}
+ omp.simd order(default) {
+ omp.yield
+ }
+ omp.yield
+ }
+ return
+}
+
+// -----
+
func.func @omp_simdloop_pretty_simdlen(%lb : index, %ub : index, %step : index) -> () {
// expected-error @below {{op attribute 'simdlen' failed to satisfy constraint: 64-bit signless integer attribute whose value is positive}}
omp.simdloop simdlen(0) for (%iv): index = (%lb) to (%ub) step (%step) {
@@ -317,6 +492,19 @@ func.func @omp_simdloop_pretty_simdlen(%lb : index, %ub : index, %step : index)
// -----
+func.func @omp_simd_pretty_simdlen(%lb : index, %ub : index, %step : index) -> () {
+ omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+ // expected-error @below {{op attribute 'simdlen' failed to satisfy constraint: 64-bit signless integer attribute whose value is positive}}
+ omp.simd simdlen(0) {
+ omp.yield
+ }
+ omp.yield
+ }
+ return
+}
+
+// -----
+
func.func @omp_simdloop_pretty_safelen(%lb : index, %ub : index, %step : index) -> () {
// expected-error @below {{op attribute 'safelen' failed to satisfy constraint: 64-bit signless integer attribute whose value is positive}}
omp.simdloop safelen(0) for (%iv): index = (%lb) to (%ub) step (%step) {
@@ -327,6 +515,19 @@ func.func @omp_simdloop_pretty_safelen(%lb : index, %ub : index, %step : index)
// -----
+func.func @omp_simd_pretty_safelen(%lb : index, %ub : index, %step : index) -> () {
+ omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+ // expected-error @below {{op attribute 'safelen' failed to satisfy constraint: 64-bit signless integer attribute whose value is positive}}
+ omp.simd safelen(0) {
+ omp.yield
+ }
+ omp.yield
+ }
+ return
+}
+
+// -----
+
func.func @omp_simdloop_pretty_simdlen_safelen(%lb : index, %ub : index, %step : index) -> () {
// expected-error @below {{'omp.simdloop' op simdlen clause and safelen clause are both present, but the simdlen value is not less than or equal to safelen value}}
omp.simdloop simdlen(2) safelen(1) for (%iv): index = (%lb) to (%ub) step (%step) {
@@ -337,6 +538,19 @@ func.func @omp_simdloop_pretty_simdlen_safelen(%lb : index, %ub : index, %step :
// -----
+func.func @omp_simd_pretty_simdlen_safelen(%lb : index, %ub : index, %step : index) -> () {
+ omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+ // expected-error @below {{'omp.simd' op simdlen clause and safelen clause are both present, but the simdlen value is not less than or equal to safelen value}}
+ omp.simd simdlen(2) safelen(1) {
+ omp.yield
+ }
+ omp.yield
+ }
+ return
+}
+
+// -----
+
// expected-error @below {{op expects initializer region with one argument of the reduction type}}
omp.reduction.declare @add_f32 : f64
init {
diff --git a/mlir/test/Dialect/OpenMP/ops.mlir b/mlir/test/Dialect/OpenMP/ops.mlir
index ccf72ae31d439ed..2487f0eb5654e13 100644
--- a/mlir/test/Dialect/OpenMP/ops.mlir
+++ b/mlir/test/Dialect/OpenMP/ops.mlir
@@ -176,6 +176,21 @@ func.func @omp_wsloop(%lb : index, %ub : index, %step : index, %data_var : memre
}) {operandSegmentSizes = array<i32: 1,1,1,0,0,0,0>, nowait, schedule_val = #omp<schedulekind auto>} :
(index, index, index) -> ()
+ // CHECK: omp.wsloop
+ // CHECK-SAME: for (%{{.*}}) : index = (%{{.*}}) to (%{{.*}}) step (%{{.*}})
+ // CHECK-NEXT: omp.simd
+ // CHECK-NEXT: omp.yield
+ // CHECK-NEXT: }
+ // CHECK-NEXT: omp.yield
+ "omp.wsloop" (%lb, %ub, %step) ({
+ ^bb0(%iv: index):
+ "omp.simd" () ({
+ omp.yield
+ }) : () -> ()
+ omp.yield
+ }) {operandSegmentSizes = array<i32: 1,1,1,0,0,0,0>} :
+ (index, index, index) -> ()
+
return
}
@@ -339,6 +354,19 @@ func.func @omp_simdloop(%lb : index, %ub : index, %step : index) -> () {
return
}
+// CHECK-LABEL: omp_simd
+func.func @omp_simd(%lb : index, %ub : index, %step : index) -> () {
+ omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
+ // CHECK: omp.simd {
+ "omp.simd" () ({
+ omp.yield
+ }) {operandSeg...
[truncated]
The OpenMP standard (https://www.openmp.org/spec-html/5.0/openmpsu42.html) adds some restrictions for the `ordered`, `linear` and `firstprivate` clauses of `omp do simd`. I see no test for these restrictions. Could you add them to the verifier? `ordered`, `linear` and `firstprivate` relate to `omp do`: https://www.openmp.org/spec-html/5.0/openmpsu41.html
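
If it helps, a verifier test for one of those restrictions could follow the same pattern as the tests in this patch. The diagnostic text below is hypothetical (the verifier would have to define it), and it assumes the OpenMP 5.0 restriction that an `ordered` clause with a parameter may not appear on `do simd`:

```mlir
// -----

func.func @omp_simd_with_ordered(%lb : index, %ub : index, %step : index) -> () {
  omp.wsloop ordered(1) for (%iv) : index = (%lb) to (%ub) step (%step) {
    // expected-error @below {{hypothetical: ordered clause with a parameter is not allowed on composite 'do simd'}}
    omp.simd {
      omp.yield
    }
    omp.yield
  }
  return
}
```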
@@ -1131,28 +1131,33 @@ void printLoopControl(OpAsmPrinter &p, Operation *op, Region &region,
 }

 //===----------------------------------------------------------------------===//
-// Verifier for Simd construct [2.9.3.1]
+// Verifier for Simd constructs [2.9.3.1]
Nit: 2.9.3.1 refers to the `omp simd` construct. Could you add a referral to `omp do simd` as well? It's 2.9.3.2.
Thanks Kiran for the feedback, I'll try to answer your concerns below.
The section describing this in the 5.0 spec is 2.9.3.2, as @DominikAdamski pointed out. I'll add references to it to the relevant places.
I think these are some of the options that we were considering, so it's a matter of choosing what seems more reasonable to everyone. The representation proposed here is something like the following:

```mlir
omp.wsloop for (%i) : index = (%lb) to (%up) step (%step) <workshare-loop-specific clauses> {
  omp.simd <simd-specific clauses> {
    ...
    omp.yield
  }
  omp.yield
}
```

The idea behind that is to follow the same approach as the `omp.distribute` operation. One other thing that could be done is to extend the `omp.wsloop` operation itself:

```mlir
omp.wsloop for simd (%i) : index = (%lb) to (%ub) step (%step) <workshare-loop + simd clauses> {
  ...
  omp.yield
}
```

The addition of "simd" after "for" is a way to convey there being an MLIR [...]

I'm not against using ops to represent composite constructs, since they have their own unique behavior. So, in this case, my understanding is that it should look similar to this:

```mlir
omp.wsloop_simd for (%i) : index = (%lb) to (%ub) step (%step) <workshare-loop-simd-specific clauses> {
  ...
  omp.yield
}
```

I guess the main reason not to go this route is that it is not done for any other composite construct. For this proposal I went with the alternative for which there is some precedent in the OpenMP dialect. Though it may make sense to do this for all composite constructs, assuming the number of them isn't too large and it doesn't make the dialect unnecessarily complicated/redundant.

These are just the clauses that were supported by [...]

Thinking about it a bit more, it would be possible to make some sort of 2-level representation system for composite constructs. The first level would be to just represent each allowed composite construct as its own MLIR operation, which is what the frontend would produce. Then, there would be an OpenMP dialect MLIR pass to split them up according to their semantics. In this case, from the [...]:

```mlir
omp.wsloop (%ii) : index = (%lb) to (%ub) step (%block_size) <worksharing-loop-specific clauses> {
  <calculate bounds of SIMD loop>
  omp.simdloop (%i): index = (%block_lb) to (%block_ub) step (%block_step) <simd-specific clauses> {
    ...
    omp.yield
  }
  omp.yield
}
```

The good thing about something like this is that when reaching the MLIR to LLVM IR translation, we will get the same input MLIR for equivalent source code regardless of whether app developers used a set of composite/combined constructs or they defined each one of the constructs into various nested loops. Then, the problem would be to recognize these patterns again to be able to target the OpenMP runtime, where certain construct combinations can already be targeted independently.
While it's not strictly related to this PR, since this is a short-term implementation that will likely be replaced by the final implementation, I'd be careful with this in the long run. The OpenMP API has several composite constructs and the list might be growing with 6.0. If each of them needs its own MLIR operation, we will end up with many MLIR operations. So, we need to find a way to come up with a scalable representation that can accommodate all combinations possible w/o polluting the MLIR operations space too much.
I think the representation of composite constructs in the dialect would be necessary. Whether we can split in the MLIR layer without inserting runtime calls, I am not 100% sure. If we have to insert runtime calls then that has to be delayed till we interface with OpenMPIRBuilder. Also, the existing lowering of worksharing-loops in the OpenMPIRBuilder/Translation might already be forming the inner loops. So this might not directly apply. We also have other transformations like collapse that can affect the structure of the loop.
I've gone through all the legal OpenMP combined and composite constructs (I hope I didn't miss any) to see which additional operations would be needed in order to represent composite constructs independently, and also to compare how they would look in MLIR compared to "wrapper" ops similar to what's been proposed for [...].

The table below lists all these combined/composite constructs as they would be represented using composite ops or wrapper ops (taking some liberties with ops that aren't in the dialect yet), but as a summary what I have found out is that, as of OpenMP 5.2, we'd only need to add 5 of these operations: [...]

The main problem I see with these is redundancy. The set of accepted arguments for these operations would be the union of all accepted arguments for their leaf constructs, and also they all turned out to represent loops (with indices, ranges, etc). I suppose the second concern would be addressed by keeping all loop-related information in the upcoming [...].

So I think that it might be a good idea to indeed create these composite MLIR operations, since they represent distinct behaviors and their shortcomings seem solvable in principle. With regards to the options of de-composing operations in an MLIR pass or leaving that to the MLIR to LLVM IR translation stage, I guess we could figure that out over time. Probably the second approach would be easiest to start.

It'd be interesting to hear if @mjklemm, @jsjodin, @DominikAdamski, @kparzysz or @agozillon have any comments on this.

P.S. Sorry for the massive table, couldn't make it smaller 😅.
Wow! That's impressive, thanks for compiling this. I have two comments: [...]
@skatrak Thanks for the great work summarizing the OpenMP constructs. Let me express my thoughts: [...]
Thanks for giving it a look. With regards to the first point, the 5 composite constructs that I was able to identify are in the first 5 rows of the table. The rest are combined constructs which in some cases contain one of these composite constructs inside. This can be seen by looking at the second column, where nesting of an operation inside another represents 'combination' and 'composition' is represented by an individual operation including the name of multiple leaf constructs. Maybe the table can be split after the first 5 rows, but it seems misleading to me saying that e.g. [...]

Concerning the increase in the number of these constructs, what are these additions related to? Are they loop transformations or are there going to be significant additions to parallelism generation/control and work distribution constructs? Combined constructs can already be represented in a scalable way, through nesting of ops, so I was thinking that we'd only have to represent composite constructs related to the last two categories, because loop transformations would be handled independently as well. I don't know whether there are caveats to that, but my thinking was that after introducing [...]:

```mlir
%cli = omp.canonical_loop %i... {
  BODY
}
// loop transformations (e.g. %1 = omp.tile %cli...) here, before execution resulting in a single loop nest stored in %loop

// !$omp do simd (composite construct)
omp.wssimdloop %loop <do,simd-clauses>

// !$omp parallel do (combined construct)
omp.parallel <parallel-clauses> {
  omp.wsloop %loop <do-clauses>
}

// !$omp teams distribute parallel do (combined construct with composite construct inside)
omp.target <target-clauses> {
  omp.teams <teams-clauses> {
    omp.distparwsloop %loop <distribute,parallel,do-clauses>
  }
}
```

In that case, we hopefully shouldn't have to add many new operations: only the set of 5 above and those related to new parallelism generation/control and work distribution constructs. How [...]

Maybe we could instead split even these composite constructs into their leaf ops. Here, the composite ops are built in advance by chaining the loop returned by one as the input of the next, but I'm not sure if that's even an improvement:

```mlir
%cli = omp.canonical_loop %i... {
  BODY
}
// loop transformations (e.g. %1 = omp.tile %cli...) here, before execution resulting in a single loop nest stored in %loop

// !$omp do simd (composite construct)
%1 = omp.wsloop %loop <do-clauses>
%2 = omp.simdloop %1 <simd-clauses>
omp.execute %2

// !$omp parallel do (combined construct)
%1 = omp.wsloop %loop <do-clauses>
omp.parallel <parallel-clauses> {
  omp.execute %1
}

// !$omp target teams distribute parallel do (combined construct with composite construct inside)
%1 = omp.distribute %loop <distribute-clauses>
%2 = omp.parwsloop %1 <parallel,do-clauses> // Not sure if this could be expressed in a better way
omp.target <target-clauses> {
  omp.teams <teams-clauses> {
    omp.execute %2
  }
}
```
Ok, that makes sense to me.
Sort of. So, there will be a set of rules that enable many more combinations of combined constructs. While the debate has not quite settled yet and the OpenMP ARB is looking at the number of new combinations, I'd expect that there will be quite a few added, e.g. [...]. Plus, there will be new loop transforming directives, such as loop splitting, loop reversal, etc. Those you will be able to freely combine, for instance:
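
As a sketch of how such free combination might look in the dialect (purely illustrative: `omp.unroll` and the attribute names are hypothetical; only `omp.tile` appears elsewhere in this discussion):

```mlir
// Each transformation consumes a loop handle and yields a new one,
// so directives compose by simple chaining.
%tiled = omp.tile %cli { sizes = [32] } : (!omp.loop) -> (!omp.loop)
%unrolled = omp.unroll %tiled { factor = 2 } : (!omp.loop) -> (!omp.loop)
```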
I do not know either. If things are already implemented such that new (loop) constructs and combinations thereof can easily be added, it's great!
Using wrapper ops seems to be a better option imo since it should be easier to extend and we avoid the combinatorial explosion. I have one question about whether there are cases where writing something on two separate omp lines vs a single line would change the semantics, e.g. in one case it is a combined construct but in the other it isn't?

I believe another issue with not using wrapper ops is that it may not be possible to create the combined ops directly if there are loop transformation ops present, which means the wrapper ops will have to be used anyway. To clarify, if the loop transformations are handled by the OpenMPIRBuilder, then there is no point in the compilation where combined ops could be created. I'm not convinced this is the right approach though; it would make sense to me to do the loop transformations first and then do the lowering.

There is another problem which @DominikAdamski was mentioning and that I've been trying to get some clarity on, which is how to communicate information between op lowerings, since there are dependencies between e.g. distribute and parallel/wsloop; this will have to be solved. In the case of loops there is a proposal to use CLIs, but this is only one kind of information that needs to be communicated; another is reduction information, and there are other cases as well (collapse?). I think it makes sense that the OpenMPIRBuilder keeps track of these things. @kiranchandramohan suggested using block arguments for CLIs, which probably fits better with this approach, and the solution would be more uniform compared to special handling for CLIs. Another option would be to have other kinds of lowering information represented in MLIR as values like CLIs, but it might degenerate, if CLIs get expanded to hold other information, into having a single value linking all the omp ops, which would just represent the state in the OpenMPIRBuilder.
This is why I'm leaning towards wrapper ops with a simple recursive traversal and the OpenMPIRBuilder keeping the information that needs to be passed between the op lowerings.
Thanks Dominik for sharing your thoughts on this, and excuse the delay in getting back to you. I'll try to share what I think about these.
Thanks Jan for adding to the discussion.
Here I think the main factor would be how many new composite constructs would be added, because combined constructs are already and should remain represented as wrapper ops, and they make up most of the table above. If there's a chance that many new composite constructs are added (or even one or two new single constructs are added that can be composite with all existing ones) then the option of having composite ops really makes no sense. As it stands right now, both alternatives still make sense.
Yes, so a combined construct can be split into two constructs where the second is nested directly inside of the first. The first is, as far as I can tell, always a leaf construct (i.e. not combined or composite). So, for example,
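
As an illustration of that split (my own example, since the original one was cut off): `omp parallel do` is equivalent to a leaf `omp parallel` with an `omp do` nested directly inside, which in the current dialect reads:

```mlir
// "omp parallel do" (combined) = leaf "parallel" region directly
// containing the worksharing loop.
omp.parallel {
  omp.wsloop for (%iv) : index = (%lb) to (%ub) step (%step) {
    omp.yield
  }
  omp.terminator
}
```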
I'm not as familiar with the discussions on representing loop transformations in the OpenMP dialect, but I think that whatever it ends up looking like, it should be in a way independent from the parallelism-generating and work-distributing operations. By that I mean that it shouldn't matter whether a loop has been transformed or not by the time we want to run it. We should try to split the what from the how (what loop to run vs how to execute the loop).
While I don't currently get the full picture, the dependencies you mention between distribute and parallel/wsloop seem to stem from the fact that [...]
Let me add some points to the discussion:

1. Exactness of representation of the wrapper approach. [...] denotes: [...] User is not allowed to specify another pragma after [...] and: [...]
2. Upstream definition of [...]
3. We have pretty good support for `omp.wsloop` inside OpenMPIRBuilder.
4. [...]
      if (hasOther)
        break;
} else if (!op.hasTrait<OpTrait::IsTerminator>()) { |
Does the `omp.wsloop` support other terminators than `omp.yield`? If not, should we verify that it is indeed a `yield`?
By looking at the OpenMP dialect definition and existing verifiers, it appears that `omp.terminator` is currently a valid terminator for `omp.wsloop` (`omp.yield` does restrict which ops can be a parent, but there's no such thing for `omp.terminator`), so rather than introducing restrictions that currently don't exist (not sure whether intentionally or not) I decided to just accept any terminator here.
Below is one example that hopefully illustrates the various issues. [...]

is equivalent to: [...]

If the lowering of the [...]
Thanks for the clarification. However, I still can't quite understand how using composite operations to represent distribute-parallel-for in that example vs wrapper ops is any different with regards to this loop transformation problem. If we're able to represent the loop transformations in MLIR, even if we only actually produce these loops in the OMPIRBuilder due to it being the place where loop transformations are eventually applied, we should be able to then apply the corresponding parallelization/worksharing constructs to the resulting top-level loop. Using your example above, and making assumptions as to how loops and transformations might be represented in MLIR:

```mlir
// Define loop body
%loop = omp.canonical_loop %i : i32 context(...) {
  // Use additional block args to access context values defined in the execution site
  llvm.call @code(%i) : (i32) -> ()
  omp.terminator
} (...) -> !omp.loop

// Represent transformations
%tloop = omp.tile %loop { sizes=[2] } : (!omp.loop) -> (!omp.loop)

// OPTION 1: Composite ops
omp.target {
  %n = ... : i32
  %c0 = arith.constant 0 : i32
  %c1 = arith.constant 1 : i32
  omp.teams {
    // Execute loop, specifying pre-transformations loop range
    omp.distparwsloop %tloop from(%c0) to(%n) step(%c1) context(...) : (!omp.loop, i32, i32, i32, ...) -> ()
    omp.terminator
  }
  omp.terminator
}

// OPTION 2: Wrapper ops
omp.target {
  %n = ... : i32
  %c0 = arith.constant 0 : i32
  %c1 = arith.constant 1 : i32
  omp.teams {
    omp.distribute {
      omp.parallel {
        // Execute loop, specifying pre-transformations loop range
        omp.wsloop %tloop from(%c0) to(%n) step(%c1) context(...) : (!omp.loop, i32, i32, i32, ...) -> ()
        omp.terminator
      }
      omp.terminator
    }
    omp.terminator
  }
  omp.terminator
}
```

In the example above, there are probably some challenges related to how to lower the loop body and transformations to it, since it is defined outside of the place where it is eventually executed. Maybe it should be ignored at first and then processed when lowering the ops that run it ([...]). From the MLIR to LLVM IR translation perspective, the wrapper approach would make the following calls: [...]
Whereas the composite approach would make the following calls: [...]
It seems to me that [...]
We already have some lowering (OpenMP+LLVM dialect -> LLVM IR) where the loop transformation is not directly applied to the immediate loop. The best example of this is collapse followed by worksharing (see links below). There will also be a similar one for simd.

- llvm-project/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp, line 949 in b5a273a
- llvm-project/mlir/lib/Target/LLVMIR/Dialect/OpenMP/OpenMPToLLVMIRTranslation.cpp, line 960 in b5a273a

Basically there will be some canonical loop infos that will be processed by various loop transformations (collapse, tile, unroll etc). The CanonicalLoopInfo transformation in the OpenMPIRBuilder is already set up for this purpose, and I believe it already supports tile, unroll, collapse. So for these transformations going ahead with the OpenMPIRBuilder approach might be the easiest for you. I would anticipate that the wrapper operation codegen (openmp + llvm dialect -> llvm ir) will generate canonical loop infos and deposit them on the stack if it has a parent operation that will further modify it.
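
For reference, a collapsed worksharing loop of the kind handled at those lines looks roughly like this in the current dialect (a sketch with clauses omitted):

```mlir
// Two induction variables in a single omp.wsloop: during translation each
// level yields a CanonicalLoopInfo; these are collapsed into one loop
// before the worksharing schedule is applied.
omp.wsloop for (%i, %j) : index = (%lb0, %lb1) to (%ub0, %ub1) step (%s0, %s1) {
  omp.yield
}
```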
Yes, I think I misunderstood a bit how to interpret the composite ops. If they simply are collapsed versions of the nested ones, both are equivalent. It would also be possible to keep the wrapper ops and have lowering functions that would identify and lower multiple ops in one step.
If we can keep the information sharing within a single lowering function, that would be nice. Adding more state to the OMPIRBuilder is a concern, because it is fairly easy to create an inconsistent state if data is not added and cleared in the right places during lowering. If we decide to use wrapper ops, we could still have lowering functions that handle composite constructs to avoid adding state to the OMPIRBuilder. That way the representation is more flexible with no risk of a combinatorial explosion, but the implementation of the lowering would have this problem to some extent, although it can be minimized by having functions to share the implementation as you mentioned.
Discussion about combined/composite constructs continued here: https://discourse.llvm.org/t/rfc-representing-combined-composite-constructs-in-the-openmp-dialect/76986
This patch is replaced by: #87365. The wrapper approach will allow the use of the same [...]