# [mlir][ArmSVE] Add -arm-sve-legalize-vector-storage pass (#68794)
Conversation
Note: Only the last commit is from this PR! This PR is not as big as it seems 😄

How about splitting into 2 changes?

The alignment change is very small, and both features of the pass are needed to run the integration test.

(In case it was not clear, I don't expect PRs marked as "draft" to be seriously reviewed until I mark them as "ready")
This patch adds a pass that ensures that loads, stores, and allocations of SVE vector types will be legal in the LLVM backend. It does this at the memref level, so this pass must be applied before lowering all the way to LLVM.

This pass currently fixes two issues.

## Loading and storing predicate types

It is only legal to load/store predicate types equal to (or greater than) a full predicate register, which in MLIR is `vector<[16]xi1>`. Smaller predicate types (`vector<[1|2|4|8]xi1>`) must be converted to/from a full predicate type (referred to as an `svbool`) before storing and after loading, respectively. This pass does this by widening allocations and inserting conversion intrinsics.

For example:

```mlir
%alloca = memref.alloca() : memref<vector<[4]xi1>>
%mask = vector.constant_mask [4] : vector<[4]xi1>
memref.store %mask, %alloca[] : memref<vector<[4]xi1>>
%reload = memref.load %alloca[] : memref<vector<[4]xi1>>
```

Becomes:

```mlir
%alloca = memref.alloca() {alignment = 1 : i64} : memref<vector<[16]xi1>>
%mask = vector.constant_mask [4] : vector<[4]xi1>
%svbool = arm_sve.convert_to_svbool %mask : vector<[4]xi1>
memref.store %svbool, %alloca[] : memref<vector<[16]xi1>>
%reload_svbool = memref.load %alloca[] : memref<vector<[16]xi1>>
%reload = arm_sve.convert_from_svbool %reload_svbool : vector<[4]xi1>
```

## Relax alignments for SVE vector allocas

The storage for SVE vector types only needs an alignment that matches the element type (for example, 4-byte alignment for `f32`s). However, the LLVM backend currently defaults to aligning to `base size` x `element size` bytes. For a non-legal vector type like `vector<[8]xf32>`, this results in 8 x 4 = 32-byte alignment, but the backend only supports up to 16-byte alignment for SVE vectors on the stack. Explicitly setting a smaller alignment prevents this issue.
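The alignment arithmetic behind the second issue can be sketched in a few lines; this is only an illustration of the numbers quoted above (the helper name and constant are illustrative, not from the patch):

```python
def llvm_default_alignment(base_size: int, element_size_bytes: int) -> int:
    """The LLVM backend's current default for SVE stack slots:
    base size x element size bytes."""
    return base_size * element_size_bytes

# Backend limit for SVE stack objects (bytes).
MAX_SVE_STACK_ALIGNMENT = 16

# vector<[8]xf32>: base size 8, f32 is 4 bytes -> 32-byte default alignment,
# which exceeds the 16-byte limit, so the pass sets a smaller alignment.
assert llvm_default_alignment(8, 4) == 32
assert llvm_default_alignment(8, 4) > MAX_SVE_STACK_ALIGNMENT
```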
@llvm/pr-subscribers-mlir-vector @llvm/pr-subscribers-mlir

Author: Benjamin Maxwell (MacDue)

Patch is 33.02 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/68794.diff

9 Files Affected:
diff --git a/mlir/include/mlir/Dialect/ArmSVE/CMakeLists.txt b/mlir/include/mlir/Dialect/ArmSVE/CMakeLists.txt
index f33061b2d87cffc..9f57627c321fb0c 100644
--- a/mlir/include/mlir/Dialect/ArmSVE/CMakeLists.txt
+++ b/mlir/include/mlir/Dialect/ArmSVE/CMakeLists.txt
@@ -1 +1,2 @@
add_subdirectory(IR)
+add_subdirectory(Transforms)
diff --git a/mlir/include/mlir/Dialect/ArmSVE/Transforms/CMakeLists.txt b/mlir/include/mlir/Dialect/ArmSVE/Transforms/CMakeLists.txt
new file mode 100644
index 000000000000000..7226642daf86172
--- /dev/null
+++ b/mlir/include/mlir/Dialect/ArmSVE/Transforms/CMakeLists.txt
@@ -0,0 +1,5 @@
+set(LLVM_TARGET_DEFINITIONS Passes.td)
+mlir_tablegen(Passes.h.inc -gen-pass-decls -name ArmSVE)
+add_public_tablegen_target(MLIRArmSVEPassIncGen)
+
+add_mlir_doc(Passes ArmSVEPasses ./ -gen-pass-doc)
diff --git a/mlir/include/mlir/Dialect/ArmSVE/Transforms/Passes.h b/mlir/include/mlir/Dialect/ArmSVE/Transforms/Passes.h
new file mode 100644
index 000000000000000..317fb9021b3c577
--- /dev/null
+++ b/mlir/include/mlir/Dialect/ArmSVE/Transforms/Passes.h
@@ -0,0 +1,33 @@
+//===- Passes.h - Pass Entrypoints ------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef MLIR_DIALECT_ARMSVE_TRANSFORMS_PASSES_H
+#define MLIR_DIALECT_ARMSVE_TRANSFORMS_PASSES_H
+
+#include "mlir/Conversion/LLVMCommon/TypeConverter.h"
+#include "mlir/Pass/Pass.h"
+
+namespace mlir::arm_sve {
+
+#define GEN_PASS_DECL
+#include "mlir/Dialect/ArmSVE/Transforms/Passes.h.inc"
+
+/// Pass to legalize the types of mask stores.
+std::unique_ptr<Pass> createLegalizeVectorStoragePass();
+
+//===----------------------------------------------------------------------===//
+// Registration
+//===----------------------------------------------------------------------===//
+
+/// Generate the code for registering passes.
+#define GEN_PASS_REGISTRATION
+#include "mlir/Dialect/ArmSVE/Transforms/Passes.h.inc"
+
+} // namespace mlir::arm_sve
+
+#endif // MLIR_DIALECT_ARMSVE_TRANSFORMS_PASSES_H
diff --git a/mlir/include/mlir/Dialect/ArmSVE/Transforms/Passes.td b/mlir/include/mlir/Dialect/ArmSVE/Transforms/Passes.td
new file mode 100644
index 000000000000000..35c49607181da0c
--- /dev/null
+++ b/mlir/include/mlir/Dialect/ArmSVE/Transforms/Passes.td
@@ -0,0 +1,67 @@
+//===-- Passes.td - ArmSVE pass definition file ------------*- tablegen -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef MLIR_DIALECT_ARMSVE_TRANSFORMS_PASSES_TD
+#define MLIR_DIALECT_ARMSVE_TRANSFORMS_PASSES_TD
+
+include "mlir/Pass/PassBase.td"
+
+def LegalizeVectorStorage
+ : Pass<"arm-sve-legalize-vector-storage", "mlir::func::FuncOp"> {
+ let summary = "Ensures stores of SVE vector types will be legal";
+ let description = [{
+ This pass ensures that loads, stores, and allocations of SVE vector types
+ will be legal in the LLVM backend. It does this at the memref level, so this
+ pass must be applied before lowering all the way to LLVM.
+
+ This pass currently fixes two issues.
+
+ ## Loading and storing predicate types
+
+ It is only legal to load/store predicate types equal to (or greater than) a
+ full predicate register, which in MLIR is `vector<[16]xi1>`. Smaller
+ predicate types (`vector<[1|2|4|8]xi1>`) must be converted to/from a full
+ predicate type (referred to as a `svbool`) before and after storing and
+ loading respectively. This pass does this by widening allocations and
+ inserting conversion intrinsics.
+
+ For example:
+
+ ```mlir
+ %alloca = memref.alloca() : memref<vector<[4]xi1>>
+ %mask = vector.constant_mask [4] : vector<[4]xi1>
+ memref.store %mask, %alloca[] : memref<vector<[4]xi1>>
+ %reload = memref.load %alloca[] : memref<vector<[4]xi1>>
+ ```
+ Becomes:
+ ```mlir
+ %alloca = memref.alloca() {alignment = 1 : i64} : memref<vector<[16]xi1>>
+ %mask = vector.constant_mask [4] : vector<[4]xi1>
+ %svbool = arm_sve.convert_to_svbool %mask : vector<[4]xi1>
+ memref.store %svbool, %alloca[] : memref<vector<[16]xi1>>
+ %reload_svbool = memref.load %alloca[] : memref<vector<[16]xi1>>
+ %reload = arm_sve.convert_from_svbool %reload_svbool : vector<[4]xi1>
+ ```
+
+ ## Relax alignments for SVE vector allocas
+
+ The storage for SVE vector types only needs to have an alignment that
+ matches the element type (for example 4 byte alignment for `f32`s). However,
+ the LLVM backend currently defaults to aligning to `base size` x
+ `element size` bytes. For non-legal vector types like `vector<[8]xf32>` this
+ results in 8 x 4 = 32-byte alignment, but the backend only supports up to
+ 16-byte alignment for SVE vectors on the stack. Explicitly setting a smaller
+ alignment prevents this issue.
+ }];
+ let constructor = "mlir::arm_sve::createLegalizeVectorStoragePass()";
+ let dependentDialects = ["func::FuncDialect",
+ "memref::MemRefDialect", "vector::VectorDialect",
+ "arm_sve::ArmSVEDialect"];
+}
+
+#endif // MLIR_DIALECT_ARMSVE_TRANSFORMS_PASSES_TD
diff --git a/mlir/include/mlir/InitAllPasses.h b/mlir/include/mlir/InitAllPasses.h
index 5489a13a8040bdb..7301905954f56d8 100644
--- a/mlir/include/mlir/InitAllPasses.h
+++ b/mlir/include/mlir/InitAllPasses.h
@@ -19,6 +19,7 @@
#include "mlir/Dialect/Affine/Passes.h"
#include "mlir/Dialect/Arith/Transforms/Passes.h"
#include "mlir/Dialect/ArmSME/Transforms/Passes.h"
+#include "mlir/Dialect/ArmSVE/Transforms/Passes.h"
#include "mlir/Dialect/Async/Passes.h"
#include "mlir/Dialect/Bufferization/Pipelines/Passes.h"
#include "mlir/Dialect/Bufferization/Transforms/Passes.h"
@@ -82,6 +83,7 @@ inline void registerAllPasses() {
transform::registerTransformPasses();
vector::registerVectorPasses();
arm_sme::registerArmSMEPasses();
+ arm_sve::registerArmSVEPasses();
// Dialect pipelines
bufferization::registerBufferizationPipelines();
diff --git a/mlir/lib/Dialect/ArmSVE/Transforms/CMakeLists.txt b/mlir/lib/Dialect/ArmSVE/Transforms/CMakeLists.txt
index 2f1c43fae240d51..a70c489a51fea9a 100644
--- a/mlir/lib/Dialect/ArmSVE/Transforms/CMakeLists.txt
+++ b/mlir/lib/Dialect/ArmSVE/Transforms/CMakeLists.txt
@@ -1,8 +1,10 @@
add_mlir_dialect_library(MLIRArmSVETransforms
LegalizeForLLVMExport.cpp
+ LegalizeVectorStorage.cpp
DEPENDS
MLIRArmSVEConversionsIncGen
+ MLIRArmSVEPassIncGen
LINK_LIBS PUBLIC
MLIRArmSVEDialect
diff --git a/mlir/lib/Dialect/ArmSVE/Transforms/LegalizeVectorStorage.cpp b/mlir/lib/Dialect/ArmSVE/Transforms/LegalizeVectorStorage.cpp
new file mode 100644
index 000000000000000..e7edcf6ec789235
--- /dev/null
+++ b/mlir/lib/Dialect/ArmSVE/Transforms/LegalizeVectorStorage.cpp
@@ -0,0 +1,308 @@
+//===- LegalizeVectorStorage.cpp - Ensures SVE loads/stores are legal -----===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "mlir/Dialect/ArmSVE/IR/ArmSVEDialect.h"
+#include "mlir/Dialect/ArmSVE/Transforms/Passes.h"
+#include "mlir/Dialect/Func/IR/FuncOps.h"
+#include "mlir/Dialect/MemRef/IR/MemRef.h"
+#include "mlir/Dialect/Vector/IR/VectorOps.h"
+#include "mlir/Transforms/GreedyPatternRewriteDriver.h"
+
+namespace mlir::arm_sve {
+#define GEN_PASS_DEF_LEGALIZEVECTORSTORAGE
+#include "mlir/Dialect/ArmSVE/Transforms/Passes.h.inc"
+} // namespace mlir::arm_sve
+
+using namespace mlir;
+using namespace mlir::arm_sve;
+
+constexpr StringLiteral kPassLabel("__arm_sve_legalize_vector_storage__");
+
+namespace {
+
+/// A (legal) SVE predicate mask that has a logical size, i.e. the number of
+/// bits match the number of lanes it masks (such as vector<[4]xi1>), but is too
+/// small to be stored to memory.
+bool isLogicalSVEPredicateType(VectorType type) {
+ return type.getRank() > 0 && type.getElementType().isInteger(1) &&
+ type.getScalableDims().back() && type.getShape().back() < 16 &&
+ llvm::isPowerOf2_32(type.getShape().back()) &&
+ !llvm::is_contained(type.getScalableDims().drop_back(), true);
+}
+
+VectorType widenScalableMaskTypeToSvbool(VectorType type) {
+ assert(isLogicalSVEPredicateType(type));
+ return VectorType::Builder(type).setDim(type.getRank() - 1, 16);
+}
+
+template <typename TOp, typename TLegalizerCallback>
+void replaceOpWithLegalizedOp(PatternRewriter &rewriter, TOp op,
+ TLegalizerCallback callback) {
+ // Clone the previous op to preserve any properties/attributes.
+ auto newOp = op.clone();
+ rewriter.insert(newOp);
+ rewriter.replaceOp(op, callback(newOp));
+}
+
+template <typename TOp, typename TLegalizerCallback>
+void replaceOpWithUnrealizedConversion(PatternRewriter &rewriter, TOp op,
+ TLegalizerCallback callback) {
+ replaceOpWithLegalizedOp(rewriter, op, [&](TOp newOp) {
+ // Mark our `unrealized_conversion_casts` with a pass label.
+ return rewriter.create<UnrealizedConversionCastOp>(
+ op.getLoc(), TypeRange{op.getResult().getType()},
+ ValueRange{callback(newOp)},
+ NamedAttribute(rewriter.getStringAttr(kPassLabel),
+ rewriter.getUnitAttr()));
+ });
+}
+
+/// Extracts the legal memref value from the `unrealized_conversion_casts` added
+/// by this pass.
+static FailureOr<Value> getLegalMemRef(Value illegalMemref) {
+ Operation *definingOp = illegalMemref.getDefiningOp();
+ if (!definingOp || !definingOp->hasAttr(kPassLabel))
+ return failure();
+ auto unrealizedConversion =
+ llvm::cast<UnrealizedConversionCastOp>(definingOp);
+ return unrealizedConversion.getOperand(0);
+}
+
+/// The default alignment of an alloca may request overaligned sizes for SVE
+/// types, which will fail during stack frame allocation. This rewrite
+/// explicitly adds a reasonable alignment to allocas of scalable types.
+struct RelaxScalableVectorAllocaAlignment
+ : public OpRewritePattern<memref::AllocaOp> {
+ using OpRewritePattern::OpRewritePattern;
+
+ LogicalResult matchAndRewrite(memref::AllocaOp allocaOp,
+ PatternRewriter &rewriter) const override {
+ auto memrefElementType = allocaOp.getType().getElementType();
+ auto vectorType = llvm::dyn_cast<VectorType>(memrefElementType);
+ if (!vectorType || !vectorType.isScalable() || allocaOp.getAlignment())
+ return failure();
+
+ // Set alignment based on the defaults for SVE vectors and predicates.
+    unsigned alignment = vectorType.getElementType().isInteger(1) ? 2 : 16;
+    allocaOp.setAlignment(alignment);
+
+ return success();
+ }
+};
+
+/// Replaces allocations of SVE predicates smaller than an svbool with a wider
+/// allocation and a tagged unrealized conversion.
+///
+/// Example
+/// ```
+/// %alloca = memref.alloca() : memref<vector<[4]xi1>>
+/// ```
+/// is rewritten into:
+/// ```
+/// %widened = memref.alloca() {alignment = 1 : i64} : memref<vector<[16]xi1>>
+/// %alloca = builtin.unrealized_conversion_cast %widened
+/// : memref<vector<[16]xi1>> to memref<vector<[4]xi1>>
+/// {__arm_sve_legalize_vector_storage__}
+/// ```
+template <typename AllocLikeOp>
+struct LegalizeAllocLikeOpConversion : public OpRewritePattern<AllocLikeOp> {
+ using OpRewritePattern<AllocLikeOp>::OpRewritePattern;
+
+ LogicalResult matchAndRewrite(AllocLikeOp allocLikeOp,
+ PatternRewriter &rewriter) const override {
+ auto vectorType =
+ llvm::dyn_cast<VectorType>(allocLikeOp.getType().getElementType());
+
+ if (!vectorType || !isLogicalSVEPredicateType(vectorType))
+ return failure();
+
+ // Replace this alloc-like op of an SVE mask with one of a (storable)
+ // svbool_t mask. A temporary unrealized_conversion_cast is added to the old
+ // type to allow local rewrites.
+ replaceOpWithUnrealizedConversion(
+ rewriter, allocLikeOp, [&](AllocLikeOp newAllocLikeOp) {
+ newAllocLikeOp.getResult().setType(
+ llvm::cast<MemRefType>(newAllocLikeOp.getType().cloneWith(
+ {}, widenScalableMaskTypeToSvbool(vectorType))));
+ return newAllocLikeOp;
+ });
+
+ return success();
+ }
+};
+
+/// Replaces vector.type_casts of unrealized conversions to illegal memref types
+/// with legal type casts, followed by unrealized conversions.
+///
+/// Example:
+/// ```
+/// %alloca = builtin.unrealized_conversion_cast %widened
+/// : memref<vector<[16]xi1>> to memref<vector<[8]xi1>>
+/// {__arm_sve_legalize_vector_storage__}
+/// %cast = vector.type_cast %alloca
+/// : memref<vector<3x[8]xi1>> to memref<3xvector<[8]xi1>>
+/// ```
+/// is rewritten into:
+/// ```
+/// %widened_cast = vector.type_cast %widened
+/// : memref<vector<3x[16]xi1>> to memref<3xvector<[16]xi1>>
+/// %cast = builtin.unrealized_conversion_cast %widened_cast
+/// : memref<3xvector<[16]xi1>> to memref<3xvector<[8]xi1>>
+/// {__arm_sve_legalize_vector_storage__}
+/// ```
+struct LegalizeVectorTypeCastConversion
+ : public OpRewritePattern<vector::TypeCastOp> {
+ using OpRewritePattern::OpRewritePattern;
+
+ LogicalResult matchAndRewrite(vector::TypeCastOp typeCastOp,
+ PatternRewriter &rewriter) const override {
+ auto resultType = typeCastOp.getResultMemRefType();
+ auto vectorType = llvm::dyn_cast<VectorType>(resultType.getElementType());
+
+ if (!vectorType || !isLogicalSVEPredicateType(vectorType))
+ return failure();
+
+ auto legalMemref = getLegalMemRef(typeCastOp.getMemref());
+ if (failed(legalMemref))
+ return failure();
+
+ // Replace this vector.type_cast with one of a (storable) svbool_t mask.
+ replaceOpWithUnrealizedConversion(
+ rewriter, typeCastOp, [&](vector::TypeCastOp newTypeCast) {
+ newTypeCast.setOperand(*legalMemref);
+ newTypeCast.getResult().setType(
+ llvm::cast<MemRefType>(newTypeCast.getType().cloneWith(
+ {}, widenScalableMaskTypeToSvbool(vectorType))));
+ return newTypeCast;
+ });
+
+ return success();
+ }
+};
+
+/// Replaces stores to unrealized conversions to illegal memref types with
+/// `arm_sve.convert_to_svbool`s followed by (legal) wider stores.
+///
+/// Example:
+/// ```
+/// memref.store %mask, %alloca[] : memref<vector<[8]xi1>>
+/// ```
+/// is rewritten into:
+/// ```
+/// %svbool = arm_sve.convert_to_svbool %mask : vector<[8]xi1>
+/// memref.store %svbool, %widened[] : memref<vector<[16]xi1>>
+/// ```
+struct LegalizeMemrefStoreConversion
+ : public OpRewritePattern<memref::StoreOp> {
+ using OpRewritePattern::OpRewritePattern;
+
+ LogicalResult matchAndRewrite(memref::StoreOp storeOp,
+ PatternRewriter &rewriter) const override {
+ auto loc = storeOp.getLoc();
+
+ Value valueToStore = storeOp.getValueToStore();
+ auto vectorType = llvm::dyn_cast<VectorType>(valueToStore.getType());
+
+ if (!vectorType || !isLogicalSVEPredicateType(vectorType))
+ return failure();
+
+ auto legalMemref = getLegalMemRef(storeOp.getMemref());
+ if (failed(legalMemref))
+ return failure();
+
+ auto legalMaskType = widenScalableMaskTypeToSvbool(
+ llvm::cast<VectorType>(valueToStore.getType()));
+ auto convertToSvbool = rewriter.create<arm_sve::ConvertToSvboolOp>(
+ loc, legalMaskType, valueToStore);
+ // Replace this store with a conversion to a storable svbool_t mask,
+ // followed by a wider store.
+ replaceOpWithLegalizedOp(rewriter, storeOp,
+ [&](memref::StoreOp newStoreOp) {
+ newStoreOp.setOperand(0, convertToSvbool);
+ newStoreOp.setOperand(1, *legalMemref);
+ return newStoreOp;
+ });
+
+ return success();
+ }
+};
+
+/// Replaces loads from unrealized conversions to illegal memref types with
+/// (legal) wider loads, followed by `arm_sve.convert_from_svbool`s.
+///
+/// Example:
+/// ```
+/// %reload = memref.load %alloca[] : memref<vector<[4]xi1>>
+/// ```
+/// is rewritten into:
+/// ```
+/// %svbool = memref.load %widened[] : memref<vector<[16]xi1>>
+/// %reload = arm_sve.convert_from_svbool %svbool : vector<[4]xi1>
+/// ```
+struct LegalizeMemrefLoadConversion : public OpRewritePattern<memref::LoadOp> {
+ using OpRewritePattern::OpRewritePattern;
+
+ LogicalResult matchAndRewrite(memref::LoadOp loadOp,
+ PatternRewriter &rewriter) const override {
+ auto loc = loadOp.getLoc();
+
+ Value loadedMask = loadOp.getResult();
+ auto vectorType = llvm::dyn_cast<VectorType>(loadedMask.getType());
+
+ if (!vectorType || !isLogicalSVEPredicateType(vectorType))
+ return failure();
+
+ auto legalMemref = getLegalMemRef(loadOp.getMemref());
+ if (failed(legalMemref))
+ return failure();
+
+ auto legalMaskType = widenScalableMaskTypeToSvbool(vectorType);
+ // Replace this load with a legal load of an svbool_t type, followed by a
+ // conversion back to the original type.
+ replaceOpWithLegalizedOp(rewriter, loadOp, [&](memref::LoadOp newLoadOp) {
+ newLoadOp.setMemRef(*legalMemref);
+ newLoadOp.getResult().setType(legalMaskType);
+ return rewriter.create<arm_sve::ConvertFromSvboolOp>(
+ loc, loadedMask.getType(), newLoadOp);
+ });
+
+ return success();
+ }
+};
+
+struct LegalizeVectorStorage
+ : public arm_sve::impl::LegalizeVectorStorageBase<LegalizeVectorStorage> {
+
+ void runOnOperation() override {
+ RewritePatternSet patterns(&getContext());
+ patterns.add<RelaxScalableVectorAllocaAlignment,
+ LegalizeAllocLikeOpConversion<memref::AllocaOp>,
+ LegalizeAllocLikeOpConversion<memref::AllocOp>,
+ LegalizeVectorTypeCastConversion,
+ LegalizeMemrefStoreConversion, LegalizeMemrefLoadConversion>(
+ patterns.getContext());
+ if (failed(applyPatternsAndFoldGreedily(getOperation(),
+ std::move(patterns)))) {
+ signalPassFailure();
+ }
+ ConversionTarget target(getContext());
+ target.addDynamicallyLegalOp<UnrealizedConversionCastOp>(
+ [](UnrealizedConversionCastOp unrealizedConversion) {
+ return !unrealizedConversion->hasAttr(kPassLabel);
+ });
+ // This detects if we failed to completely legalize the IR.
+ if (failed(applyPartialConversion(getOperation(), target, {})))
+ signalPassFailure();
+ }
+};
+
+} // namespace
+
+std::unique_ptr<Pass> mlir::arm_sve::createLegalizeVectorStoragePass() {
+ return std::make_unique<LegalizeVectorStorage>();
+}
diff --git a/mlir/test/Dialect/ArmSVE/legalize-vector-storage.mlir b/mlir/test/Dialect/ArmSVE/legalize-vector-storage.mlir
new file mode 100644
index 000000000000000..61879c48712f4d2
--- /dev/null
+++ b/mlir/test/Dialect/ArmSVE/legalize-vector-storage.mlir
@@ -0,0 +1,160 @@
+// RUN: mlir-opt %s -allow-unregistered-dialect -arm-sve-legalize-vector-storage -split-input-file -verify-diagnostics | FileCheck %s
+
+/// This tests the basic functionality of the -arm-sve-legalize-vector-storage pass.
+
+// -----
+
+// CHECK-LABEL: @store_and_reload_sve_predicate_nxv1i1(
+// CHECK-SAME: %[[MASK:.*]]: vector<[1]xi1>)
+func.func @store_and_reload_sve_predicate_nxv1i1(%mask: vector<[1]xi1>) -> vector<[1]xi1> {
+ // CHECK-NEXT: %[[ALLOCA:.*]] = memref.alloca() {alignment = 2 : i64} : memref<vector<[16]xi1>>
+ %alloca = memref.alloca() : memref<vector<[1]xi1>>
+ // CHECK-NEXT: %[[SVBOOL:.*]] = arm_sve.convert_to_svbool %[[MASK]] : vector<[1]xi1>
+ // CHECK-NEXT: memref.store %[[SVBOOL]], %[[ALLOCA]][] : memref<vector<[16]xi1>>
+ memref.store %mask, %alloca[]...
[truncated]
This will be needed in IREE.
The overall logic makes perfect sense to me, thanks a ton Ben!

I don't really have any major suggestions regarding the implementation, but I think that it would be good to clarify a few key things in comments:

- what's "legal" and what's "illegal" in this context, and why?
- what's the definition of `svbool_t`?
- what about predicates like `vector<[7]xi1>`?

It would also be good to mention somewhere that `__arm_sve_legalize_vector_storage__` should only be present during the legalizer pass (i.e. it's the pass that creates tagged ops and then consumes them).

Hope this makes sense :)
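To the question about predicates like `vector<[7]xi1>`: the pass's `isLogicalSVEPredicateType` helper only matches scalable predicates whose trailing dimension is a power of two smaller than 16, so `vector<[7]xi1>` is simply left unmatched. A minimal Python sketch of that lane-count check (function names here are illustrative, not from the patch):

```python
def is_storable_predicate(num_lanes: int) -> bool:
    """A predicate can be loaded/stored directly only at full svbool
    width, i.e. vector<[16]xi1> (or greater)."""
    return num_lanes >= 16

def needs_svbool_widening(num_lanes: int) -> bool:
    """True for the sub-svbool predicate widths the pass legalizes:
    vector<[1|2|4|8]xi1> (power-of-two trailing dim, smaller than 16)."""
    return 0 < num_lanes < 16 and (num_lanes & (num_lanes - 1)) == 0

# vector<[1|2|4|8]xi1> are widened; vector<[7]xi1> is not matched at all.
assert [n for n in range(1, 17) if needs_svbool_widening(n)] == [1, 2, 4, 8]
assert not needs_svbool_widening(7)
assert is_storable_predicate(16)
```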
Thanks for addressing my comments 🙏🏻 ! A few more suggestions. Mostly asking for some clarifications.
LGTM, thank you!
I've made a few more tiny suggestions re comments. The code looks rock solid to me 🙏🏻
Adds an end-to-end test for scalable vectorization of linalg.matmul. This is the most basic case, where the dimension along which we vectorize fits perfectly within SVE registers. I will be extending this to more generic cases in the forthcoming patches. Depends on llvm#68794.
Adds an end-to-end test for `vector.contract` that targets SVE (i.e. scalable vectors). Note that this requires lifting the restriction on `vector.outerproduct` (to which `vector.contract` is lowered) that would deem the following invalid by the op verifier (*):

```
vector.outerproduct %27, %28, %26 {kind = #vector.kind<add>} : vector<3xf32>, vector<[2]xf32>
```

This is indeed valid, as the end-to-end test demonstrates (at least when compiling for SVE). Depends on llvm#68794
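For intuition, the `vector.contract`-to-`vector.outerproduct` lowering mentioned above decomposes a matmul into a sum of rank-1 updates over the reduction dimension. A plain-Python sketch of that decomposition (no MLIR specifics, names are illustrative):

```python
def matmul_via_outer_products(A, B):
    """Compute C = A @ B as a sum of outer products:
    C += outer(A[:, k], B[k, :]) for each reduction index k."""
    m, k = len(A), len(A[0])
    n = len(B[0])
    C = [[0.0] * n for _ in range(m)]
    for kk in range(k):
        col = [A[i][kk] for i in range(m)]   # A[:, kk]
        row = B[kk]                          # B[kk, :]
        for i in range(m):
            for j in range(n):
                C[i][j] += col[i] * row[j]   # rank-1 update
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
assert matmul_via_outer_products(A, B) == [[19.0, 22.0], [43.0, 50.0]]
```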