[MLIR][Linalg] Introduce transpose/broadcast semantic to linalg.batch… #130944

Merged
merged 7 commits into from
May 12, 2025
@@ -1717,76 +1717,6 @@ structured_op: !LinalgStructuredOpConfig
- !ScalarExpression
scalar_arg: BZp
--- !LinalgOpConfig
metadata: !LinalgOpMetadata
name: batch_reduce_matmul
cpp_class_name: BatchReduceMatmulOp
doc: |-
Performs a batch-reduce matrix multiplication of two 3D inputs.
The partial multiplication results are reduced into a 2D output.

Numeric casting is performed on the operands to the inner multiply, promoting
them to the same data type as the accumulator/output.
implements:
- LinalgContractionOpInterface
structured_op: !LinalgStructuredOpConfig
args:
- !LinalgOperandDefConfig
name: A
kind: input_tensor
type_var: T1
shape_map: affine_map<()[s0, s1, s2, s3] -> (s0, s1, s2)>
- !LinalgOperandDefConfig
name: B
kind: input_tensor
type_var: T2
shape_map: affine_map<()[s0, s1, s2, s3] -> (s0, s2, s3)>
- !LinalgOperandDefConfig
name: C
kind: output_tensor
type_var: U
shape_map: affine_map<()[s0, s1, s2, s3] -> (s1, s3)>
indexing_maps: !LinalgIndexingMapsConfig
static_indexing_maps:
- affine_map<(d0, d1, d2, d3)[s0, s1, s2, s3] -> (d0, d1, d3)>
- affine_map<(d0, d1, d2, d3)[s0, s1, s2, s3] -> (d0, d3, d2)>
- affine_map<(d0, d1, d2, d3)[s0, s1, s2, s3] -> (d1, d2)>
iterator_types:
- reduction
- parallel
- parallel
- reduction
assignments:
- !ScalarAssign
arg: C
value: !ScalarExpression
scalar_fn:
kind: binary
fn_name: add
operands:
- !ScalarExpression
scalar_arg: C
- !ScalarExpression
scalar_fn:
kind: binary
fn_name: mul
operands:
- !ScalarExpression
scalar_fn:
kind: type
fn_name: cast_signed
type_var: U
operands:
- !ScalarExpression
scalar_arg: A
- !ScalarExpression
scalar_fn:
kind: type
fn_name: cast_signed
type_var: U
operands:
- !ScalarExpression
scalar_arg: B
--- !LinalgOpConfig
metadata: !LinalgOpMetadata
name: matvec
cpp_class_name: MatvecOp
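
For reference, the `batch_reduce_matmul` YAML definition in the hunk above (dropped from this file, judging by the hunk header) can be read as a four-deep loop nest. The following is a minimal standalone C++ sketch of that reading, not the code MLIR generates. It assumes row-major flat buffers, and the names `batchReduceMatmulRef`, `In`, and `Acc` are hypothetical. `In` stands in for `T1`/`T2`, `Acc` stands in for `U`, and the `static_cast` calls play the role of `cast_signed`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Reference loop nest for the default indexing maps in the YAML above:
//   A: (d0, d1, d2, d3) -> (d0, d1, d3)   i.e. A[b][m][k]
//   B: (d0, d1, d2, d3) -> (d0, d3, d2)   i.e. B[b][k][n]
//   C: (d0, d1, d2, d3) -> (d1, d2)       i.e. C[m][n]
// Hypothetical helper for illustration only; buffers are row-major.
template <typename In, typename Acc>
void batchReduceMatmulRef(const std::vector<In> &A, const std::vector<In> &B,
                          std::vector<Acc> &C, std::size_t batch,
                          std::size_t M, std::size_t N, std::size_t K) {
  assert(A.size() == batch * M * K && "A must be batch x M x K");
  assert(B.size() == batch * K * N && "B must be batch x K x N");
  assert(C.size() == M * N && "C must be M x N");
  for (std::size_t b = 0; b < batch; ++b)        // d0: reduction
    for (std::size_t m = 0; m < M; ++m)          // d1: parallel
      for (std::size_t n = 0; n < N; ++n)        // d2: parallel
        for (std::size_t k = 0; k < K; ++k) {    // d3: reduction
          // Promote both operands to the accumulator type before multiplying.
          Acc lhs = static_cast<Acc>(A[(b * M + m) * K + k]);
          Acc rhs = static_cast<Acc>(B[(b * K + k) * N + n]);
          C[m * N + n] += lhs * rhs;
        }
}
```

Because both `d0` and `d3` are reduction iterators, every batch contributes to the same 2D output, which is what distinguishes this op from `batch_matmul`.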
209 changes: 166 additions & 43 deletions mlir/include/mlir/Dialect/Linalg/IR/LinalgStructuredOps.td
@@ -690,34 +690,32 @@ def MatmulOp : LinalgStructuredBase_Op<"matmul", [

Example Transpose:
```mlir
linalg.matmul indexing_maps = [
affine_map<(d0, d1, d2) -> (d2, d0)>, // transpose
affine_map<(d0, d1, d2) -> (d2, d1)>,
affine_map<(d0, d1, d2) -> (d0, d1)>
]
ins(%arg0, %arg1 : memref<5x3xf32>,memref<5x7xf32>)
outs(%arg2: memref<3x7xf32>)
linalg.matmul
indexing_maps = [affine_map<(m, n, k) -> (k, m)>, // transpose
affine_map<(m, n, k) -> (k, n)>,
affine_map<(m, n, k) -> (m, n)>]
ins(%arg0, %arg1 : memref<5x3xf32>,memref<5x7xf32>)
outs(%arg2: memref<3x7xf32>)
```

Example Broadcast:
```mlir
linalg.matmul indexing_maps = [
affine_map<(d0, d1, d2) -> (d2)>, // broadcast
affine_map<(d0, d1, d2) -> (d2, d1)>,
affine_map<(d0, d1, d2) -> (d0, d1)>
]
ins(%arg0, %arg1 : memref<5xf32>, memref<5x7xf32>)
outs(%arg2: memref<3x7xf32>)
```mlir
linalg.matmul
indexing_maps = [affine_map<(m, n, k) -> (k)>, // broadcast
affine_map<(m, n, k) -> (k, n)>,
affine_map<(m, n, k) -> (m, n)>]
ins(%arg0, %arg1 : memref<5xf32>, memref<5x7xf32>)
outs(%arg2: memref<3x7xf32>)
```

Example Broadcast and transpose:
```mlir
linalg.matmul indexing_maps = [
affine_map<(d0, d1, d2) -> (d2, d0)>, // transpose
affine_map<(d0, d1, d2) -> (d2)>, // broadcast
affine_map<(d0, d1, d2) -> (d0, d1)>
]
ins(%arg0, %arg1 : memref<5x3xf32>, memref<5xf32>) outs(%arg2: memref<3x7xf32>)
linalg.matmul
indexing_maps = [affine_map<(m, n, k) -> (k, m)>, // transpose
affine_map<(m, n, k) -> (k)>, // broadcast
affine_map<(m, n, k) -> (m, n)>]
ins(%arg0, %arg1 : memref<5x3xf32>, memref<5xf32>)
outs(%arg2: memref<3x7xf32>)
```
}];

@@ -775,7 +773,7 @@ def MatmulOp : LinalgStructuredBase_Op<"matmul", [
static void regionBuilder(ImplicitLocOpBuilder &b,
Block &block, ArrayRef<NamedAttribute> attrs);

/// Returns a list of AffineMap with the typical matmul indexing characteristics.
/// Returns a list of AffineMap with the default matmul indexing characteristics.
static SmallVector<AffineMap> getDefaultIndexingMaps(MLIRContext *context);

/// Returns true if the given broadcast map \p bcastMap is valid for this op.
@@ -954,35 +952,32 @@ def BatchMatmulOp : LinalgStructuredBase_Op<"batch_matmul", !listconcat([AttrSiz

Example Transpose:
```mlir
linalg.batch_matmul indexing_maps = [
affine_map<(d0, d1, d2, d3) -> (d0, d3, d1)>, // transpose
affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>,
affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>
]
ins(%arg0, %arg1 : memref<2x5x3xf32>,memref<2x5x7xf32>)
outs(%arg2: memref<2x3x7xf32>)
linalg.batch_matmul
indexing_maps = [affine_map<(batch, m, n, k) -> (batch, k, m)>, // transpose
affine_map<(batch, m, n, k) -> (batch, k, n)>,
affine_map<(batch, m, n, k) -> (batch, m, n)>]
ins(%arg0, %arg1 : memref<2x5x3xf32>,memref<2x5x7xf32>)
outs(%arg2: memref<2x3x7xf32>)
```

Example Broadcast:
```mlir
linalg.batch_matmul indexing_maps = [
affine_map<(d0, d1, d2, d3) -> (d3)>, // broadcast
affine_map<(d0, d1, d2, d3) -> (d0, d3, d2)>,
affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>
]
ins(%arg0, %arg1 : memref<5xf32>, memref<2x5x7xf32>)
outs(%arg2: memref<2x3x7xf32>)
linalg.batch_matmul
indexing_maps = [affine_map<(batch, m, n, k) -> (k)>, // broadcast
affine_map<(batch, m, n, k) -> (batch, k, n)>,
affine_map<(batch, m, n, k) -> (batch, m, n)>]
ins(%arg0, %arg1 : memref<5xf32>, memref<2x5x7xf32>)
outs(%arg2: memref<2x3x7xf32>)
```

Example Broadcast and Transpose:
```mlir
linalg.batch_matmul indexing_maps = [
affine_map<(d0, d1, d2, d3) -> (d1, d3)>, // broadcast
affine_map<(d0, d1, d2, d3) -> (d0, d2, d3)>, // transpose
affine_map<(d0, d1, d2, d3) -> (d0, d1, d2)>
]
ins(%arg0, %arg1 : memref<3x5xf32>, memref<2x7x5xf32>)
outs(%arg2: memref<2x3x7xf32>)
linalg.batch_matmul
indexing_maps = [affine_map<(batch, m, n, k) -> (m, k)>, // broadcast
affine_map<(batch, m, n, k) -> (batch, n, k)>, // transpose
affine_map<(batch, m, n, k) -> (batch, m, n)>]
ins(%arg0, %arg1 : memref<3x5xf32>, memref<2x7x5xf32>)
outs(%arg2: memref<2x3x7xf32>)
```
}];

@@ -1065,6 +1060,134 @@ def BatchMatmulOp : LinalgStructuredBase_Op<"batch_matmul", !listconcat([AttrSiz
}


//===----------------------------------------------------------------------===//
// Op definition for BatchReduceMatmulOp
//===----------------------------------------------------------------------===//

def BatchReduceMatmulOp : LinalgStructuredBase_Op<"batch_reduce_matmul", [
AttrSizedOperandSegments,
LinalgContractionOpInterface]> {

let summary = [{Performs a batch-reduce matrix multiplication on two inputs.
The partial multiplication results are reduced into a 2D output.}];
let description = [{
Numeric casting is performed on the operands to the inner multiply,
promoting them to the same data type as the accumulator/output.

Broadcast and Transpose semantics can be applied by specifying the explicit attribute
'indexing_maps' as shown below. Because this is a list attribute, it must include maps
for all arguments when specified.

Example Transpose:
```mlir
linalg.batch_reduce_matmul
indexing_maps = [affine_map<(batch, m, n, k) -> (batch, k, m)>, // transpose
affine_map<(batch, m, n, k) -> (batch, k, n)>,
affine_map<(batch, m, n, k) -> (m, n)>]
ins(%arg0, %arg1 : memref<2x5x3xf32>,memref<2x5x7xf32>)
outs(%arg2: memref<3x7xf32>)
```

Example Broadcast:
```mlir
linalg.batch_reduce_matmul
indexing_maps = [affine_map<(batch, m, n, k) -> (k)>, // broadcast
affine_map<(batch, m, n, k) -> (batch, k, n)>,
affine_map<(batch, m, n, k) -> (m, n)>]
ins(%arg0, %arg1 : memref<5xf32>, memref<2x5x7xf32>)
outs(%arg2: memref<3x7xf32>)
```

Example Broadcast and Transpose:
```mlir
linalg.batch_reduce_matmul
indexing_maps = [affine_map<(batch, m, n, k) -> (m, k)>, // broadcast
affine_map<(batch, m, n, k) -> (batch, n, k)>, // transpose
affine_map<(batch, m, n, k) -> (m, n)>]
ins(%arg0, %arg1 : memref<3x5xf32>, memref<2x7x5xf32>)
outs(%arg2: memref<3x7xf32>)
```
}];

let arguments = (ins
Variadic<AnyType>:$inputs,
Variadic<AnyShaped>:$outputs,
DefaultValuedOptionalAttr<
AffineMapArrayAttr,
"BatchReduceMatmulOp::getDefaultIndexingMaps($_builder.getContext())"
>:$indexing_maps,
DefaultValuedOptionalAttr<TypeFnAttr, "TypeFn::cast_signed">:$cast
);
let results = (outs Variadic<AnyRankedTensor>:$result_tensors);
let regions = (region AnyRegion:$region);

let skipDefaultBuilders = 1;
let builders = [
OpBuilder<
(ins "ValueRange":$inputs, "ValueRange":$outputs,
CArg<"ArrayRef<NamedAttribute>", "{}">:$attributes),
[{
buildBatchReduceMatmulOp($_builder, $_state, std::nullopt, inputs, outputs,
attributes, BatchReduceMatmulOp::getRegionBuilder(),
BatchReduceMatmulOp::getDefaultIndexingMaps($_builder.getContext()));
}]>,
OpBuilder<
(ins "TypeRange":$resultTensorTypes, "ValueRange":$inputs,
"ValueRange":$outputs,
CArg<"ArrayRef<NamedAttribute>", "{}">:$attributes),
[{
buildBatchReduceMatmulOp($_builder, $_state, resultTensorTypes,
inputs, outputs, attributes, BatchReduceMatmulOp::getRegionBuilder(),
BatchReduceMatmulOp::getDefaultIndexingMaps($_builder.getContext()));
}]>,
OpBuilder<
(ins "TypeRange":$resultTensorTypes, "ValueRange":$inputs,
"ValueRange":$outputs,
"Attribute":$cast, CArg<"ArrayRef<NamedAttribute>", "{}">:$attributes),
[{
$_state.addAttribute("cast", cast);
buildBatchReduceMatmulOp($_builder, $_state, resultTensorTypes, inputs, outputs,
attributes, BatchReduceMatmulOp::getRegionBuilder(),
BatchReduceMatmulOp::getDefaultIndexingMaps($_builder.getContext()));
}]>

];
let hasCustomAssemblyFormat = 1;
let hasFolder = 1;
let hasVerifier = 1;

let extraClassDeclaration = structuredOpsBaseDecls # [{
SmallVector<utils::IteratorType> getIteratorTypesArray();

/// Implements the block region builder.
static void regionBuilder(ImplicitLocOpBuilder &b,
Block &block, ArrayRef<NamedAttribute> attrs);

/// Returns a list of AffineMap with the default batch_reduce_matmul indexing characteristics.
static SmallVector<AffineMap> getDefaultIndexingMaps(MLIRContext *context);

/// Returns true if the given broadcast map \p bcastMap is valid for this op.
bool isValidLhsRhsBroadcastMap(AffineMap bcastMap, bool isLHS = true);

static std::function<void(ImplicitLocOpBuilder &,
Block &, ArrayRef<NamedAttribute>)>
getRegionBuilder() {
return regionBuilder;
}

::mlir::MutableOperandRange getDpsInitsMutable() {
return getOutputsMutable();
}

// Generic methods.
static unsigned getNumRegionArgs();
std::string getLibraryCallName();
bool hasDynamicIndexingMaps() { return true; };
/// Returns true if the user defined indexing maps are not equal to default maps.
bool hasUserDefinedMaps();
}];
}
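
As a rough illustration of what the user-supplied `indexing_maps` in the `batch_reduce_matmul` description change, the sketch below treats each input map as a function from an iteration point `(batch, m, n, k)` to a flat offset into a row-major buffer. This is a standalone C++ model of the semantics under that assumption, not the op's implementation. `Dims`, `batchReduceMatmulWithMaps`, `broadcastAMap`, `transposedAMap`, and `defaultBMap` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Iteration-space extents, named as in the description's affine maps.
struct Dims {
  std::size_t batch, M, N, K;
};

// A map (batch, m, n, k) -> (k): A is a 1-D buffer of length K (broadcast).
inline auto broadcastAMap() {
  return [](std::size_t, std::size_t, std::size_t, std::size_t k) { return k; };
}

// A map (batch, m, n, k) -> (batch, k, m): A is batch x K x M (transpose).
inline auto transposedAMap(Dims d) {
  return [d](std::size_t b, std::size_t m, std::size_t, std::size_t k) {
    return (b * d.K + k) * d.M + m;
  };
}

// Default B map (batch, m, n, k) -> (batch, k, n): B is batch x K x N.
inline auto defaultBMap(Dims d) {
  return [d](std::size_t b, std::size_t, std::size_t n, std::size_t k) {
    return (b * d.K + k) * d.N + n;
  };
}

// The output map is fixed to (batch, m, n, k) -> (m, n) in this sketch;
// only the two input maps vary.
template <typename AIdx, typename BIdx>
void batchReduceMatmulWithMaps(const std::vector<float> &A,
                               const std::vector<float> &B,
                               std::vector<float> &C, Dims d, AIdx aIdx,
                               BIdx bIdx) {
  assert(C.size() == d.M * d.N && "C must be M x N");
  for (std::size_t b = 0; b < d.batch; ++b)
    for (std::size_t m = 0; m < d.M; ++m)
      for (std::size_t n = 0; n < d.N; ++n)
        for (std::size_t k = 0; k < d.K; ++k)
          C[m * d.N + n] += A[aIdx(b, m, n, k)] * B[bIdx(b, m, n, k)];
}
```

With `broadcastAMap()`, the same length-`K` vector is reused for every `batch` and `m`, mirroring the "Example Broadcast" case. With `transposedAMap(d)`, the loop nest is unchanged and only the address computation for `A` differs, mirroring the "Example Transpose" case.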

//===----------------------------------------------------------------------===//
// Named Linalg ops, implemented as declarative configurations of generic ops.
//===----------------------------------------------------------------------===//