[Snippets][CPU] Support runtime offsets in ARM load/store emitters #31112

Status: Open. aobolensk wants to merge 2 commits into master.


@aobolensk aobolensk commented Jun 25, 2025

Details:

  • Add support for dynamic offsets in load/store emitters on the aarch64 platform
  • Add "SoftmaxAdd" test

Tickets:

  • 169427

@aobolensk aobolensk requested review from a team as code owners June 25, 2025 13:03
@aobolensk aobolensk added the platform: arm OpenVINO on ARM / ARM64 label Jun 25, 2025
@github-actions github-actions bot added the category: CPU OpenVINO CPU plugin label Jun 25, 2025
@aobolensk aobolensk requested review from a team as code owners June 25, 2025 16:30
@github-actions github-actions bot added the category: IE Tests OpenVINO Test: plugins and common label Jun 25, 2025
compiled_byte_offset = memory_access->get_output_offset();
buffer_cluster_id = get_consumer_buffer_cluster_id(expr);
} else {
std::cout << "in_out_type: " << in_out_type_ << std::endl;

Please remove the debug print.

}

size_t jit_memory_emitter::get_parent_buffer_cluster_id(const ov::snippets::lowered::ExpressionPtr& expr) {
OV_CPU_JIT_EMITTER_ASSERT(expr->get_input_port_connectors().size() == 1, "MemoryAccess must have one parent");

Suggested change
- OV_CPU_JIT_EMITTER_ASSERT(expr->get_input_port_connectors().size() == 1, "MemoryAccess must have one parent");
+ OV_CPU_JIT_EMITTER_ASSERT(expr->get_input_count() == 1, "MemoryAccess must have one parent");

}

size_t jit_memory_emitter::get_consumer_buffer_cluster_id(const ov::snippets::lowered::ExpressionPtr& expr) {
OV_CPU_JIT_EMITTER_ASSERT(expr->get_output_port_connectors().size() == 1, "MemoryAccess must have one consumer");

Suggested change
- OV_CPU_JIT_EMITTER_ASSERT(expr->get_output_port_connectors().size() == 1, "MemoryAccess must have one consumer");
+ OV_CPU_JIT_EMITTER_ASSERT(expr->get_output_count() == 1, "MemoryAccess must have one output");

@@ -99,11 +168,11 @@ void jit_load_broadcast_emitter::emit_isa(const std::vector<size_t>& in, const s
auto src = XReg(in[0]);
auto dst = TReg(out[0]);

- h->uni_ld1rw(dst.s, src, byte_offset);
+ h->uni_ld1rw(dst.s, src, compiled_byte_offset);

What happens if compiled_byte_offset is dynamic? load_broadcast needs to support dynamic offsets as well. Could you cover that case, please?

@@ -129,7 +196,26 @@ template <cpu_isa_t isa>
void jit_store_memory_emitter::emit_isa(const std::vector<size_t>& in, const std::vector<size_t>& out) const {
OV_CPU_JIT_EMITTER_ASSERT(store_emitter != nullptr, "Store CPU emitter isn't initialized!");

if (is_offset_runtime) {

The load and store JIT emitters share the same offset-increment logic for dynamic offsets (broadcast_load should follow the same logic). Can we encapsulate this algorithm in the jit_memory_emitter base class, as is done on x64?
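
To make the request concrete, here is a minimal sketch of how the shared sequence (load the runtime offset, add it to the base pointer, emit the access, subtract it again) could live in a base class. It is only a model: `MockRegs`, `MockCallArgs`, `MemoryEmitterBase`, and `RecordingStoreEmitter` are hypothetical names invented for illustration, and the real emitters generate AArch64 instructions rather than mutating an array. It does not claim to match the x64 implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the general-purpose register file.
struct MockRegs {
    std::array<std::uint64_t, 32> x{};
};

// Hypothetical stand-in for the runtime arguments holding per-cluster buffer offsets.
struct MockCallArgs {
    std::vector<std::uint64_t> buffer_offsets;
};

// Hypothetical base class that owns the "apply offset, emit, restore" sequence.
class MemoryEmitterBase {
public:
    MemoryEmitterBase(bool is_offset_runtime, std::size_t buffer_cluster_id)
        : is_offset_runtime_(is_offset_runtime), buffer_cluster_id_(buffer_cluster_id) {}
    virtual ~MemoryEmitterBase() = default;

    // One extra GPR is requested only when the offset is resolved at run time.
    std::size_t get_aux_gprs_count() const { return is_offset_runtime_ ? 1 : 0; }

    void emit(MockRegs& regs, const MockCallArgs& args, std::size_t base_idx, std::size_t aux_idx) const {
        if (!is_offset_runtime_) {
            emit_access(regs, base_idx);
            return;
        }
        assert(buffer_cluster_id_ < args.buffer_offsets.size());
        regs.x[aux_idx] = args.buffer_offsets[buffer_cluster_id_];  // load the runtime offset
        regs.x[base_idx] += regs.x[aux_idx];                        // shift the base pointer
        emit_access(regs, base_idx);                                // load, store, or broadcast load
        regs.x[base_idx] -= regs.x[aux_idx];                        // restore the base pointer
    }

protected:
    // Derived emitters implement only the access itself.
    virtual void emit_access(MockRegs& regs, std::size_t base_idx) const = 0;

private:
    bool is_offset_runtime_;
    std::size_t buffer_cluster_id_;
};

// Example derived emitter that records the address it would access.
class RecordingStoreEmitter : public MemoryEmitterBase {
public:
    using MemoryEmitterBase::MemoryEmitterBase;
    mutable std::uint64_t last_address = 0;

protected:
    void emit_access(MockRegs& regs, std::size_t base_idx) const override {
        last_address = regs.x[base_idx];
    }
};
```

With this shape, a broadcast-load emitter would only override `emit_access` and inherit the dynamic-offset handling, which is the point of the request above.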

@@ -129,7 +196,26 @@ template <cpu_isa_t isa>
void jit_store_memory_emitter::emit_isa(const std::vector<size_t>& in, const std::vector<size_t>& out) const {
OV_CPU_JIT_EMITTER_ASSERT(store_emitter != nullptr, "Store CPU emitter isn't initialized!");

if (is_offset_runtime) {
XReg aux_reg(aux_gpr_idxs.back());

To use aux_gpr registers, you first need to update the get_aux_gprs_count() method, which controls the size of the aux_gpr_idxs vector.

Please make sure that aux_gpr isn't corrupted before the offset is subtracted below (that is, during store_emitter execution).
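
The hazard can be illustrated with a small model. This is a sketch under assumptions, not the PR's code: `Regs`, `with_offset_trusting_aux`, and `with_offset_reloading_aux` are hypothetical names, and registers are modeled as a plain array. The first helper restores the base pointer from the aux register and so breaks if the nested emitter overwrites it. The second reloads the offset before subtracting, which is one possible guard; reserving the aux register so the nested emitter never receives it would be another.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>

// Hypothetical model of the general-purpose registers; not the real JIT API.
using Regs = std::array<std::uint64_t, 32>;

// Adds the offset to regs[base], runs `body`, then subtracts regs[aux].
// Correct only if `body` leaves regs[aux] untouched.
inline void with_offset_trusting_aux(Regs& regs, std::size_t base, std::size_t aux,
                                     std::uint64_t runtime_offset,
                                     const std::function<void(Regs&)>& body) {
    regs[aux] = runtime_offset;
    regs[base] += regs[aux];
    body(regs);
    regs[base] -= regs[aux];  // wrong result if body overwrote regs[aux]
}

// Variant that reloads the offset before restoring the base pointer,
// so a clobbered aux register does not affect the result.
inline void with_offset_reloading_aux(Regs& regs, std::size_t base, std::size_t aux,
                                      std::uint64_t runtime_offset,
                                      const std::function<void(Regs&)>& body) {
    regs[aux] = runtime_offset;
    regs[base] += regs[aux];
    body(regs);
    regs[aux] = runtime_offset;  // reload in case body reused the register
    regs[base] -= regs[aux];
}
```

If the body passed to the first helper writes to the aux register, the base pointer is left at a wrong address after the call; the second helper restores it regardless, at the cost of one extra load.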

XReg aux_reg(aux_gpr_idxs.back());
XReg base_reg(out[0]);
// load the runtime offset from args.buffer_offsets[buffer_cluster_id]
XReg reg_runtime_params = XReg(Operand::X0);

Suggested change
- XReg reg_runtime_params = XReg(Operand::X0);
+ XReg reg_runtime_params = abi_param1;

@@ -63,6 +63,16 @@ class TransposeSoftmaxEltwiseFunction : public TransposeSoftmaxFunction {
std::shared_ptr<ov::Model> initOriginal() const override;
};

class SoftmaxAddFunction : public SnippetsFunctionBase {

Could you please rename the related classes as follows?

Suggested change
- class SoftmaxAddFunction : public SnippetsFunctionBase {
+ class SoftmaxSumFunction : public SnippetsFunctionBase {

In my opinion, this is a clearer name for the current pattern.

Comment on lines +121 to +123
if (!configuration.count("SNIPPETS_MODE")) {
configuration.insert({"SNIPPETS_MODE", "IGNORE_CALLBACK"});
}

Did we merge your changes for this setting? If so, please reuse your helper.
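
For reference, a helper of the kind mentioned here might look like the sketch below. The name `set_default_snippets_mode` and the `Config` alias are hypothetical and chosen only for illustration; the actual helper in the repository, if it was merged, may have a different name, signature, and configuration type.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for the test configuration map; the real type may differ.
using Config = std::map<std::string, std::string>;

// Hypothetical helper: sets SNIPPETS_MODE only if the test has not set it already.
inline void set_default_snippets_mode(Config& configuration,
                                      const std::string& mode = "IGNORE_CALLBACK") {
    // emplace leaves an existing entry untouched, matching the
    // "insert only if absent" check in the three lines under review.
    configuration.emplace("SNIPPETS_MODE", mode);
}
```

Each test would then call the helper once on its configuration instead of repeating the `count`/`insert` pair.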

Labels
category: CPU OpenVINO CPU plugin category: IE Tests OpenVINO Test: plugins and common platform: arm OpenVINO on ARM / ARM64
3 participants