-
Notifications
You must be signed in to change notification settings - Fork 2.6k
[Snippets][CPU] Support runtime offsets in ARM load/store emitters #31112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
compiled_byte_offset = memory_access->get_output_offset(); | ||
buffer_cluster_id = get_consumer_buffer_cluster_id(expr); | ||
} else { | ||
std::cout << "in_out_type: " << in_out_type_ << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove debug print
} | ||
|
||
size_t jit_memory_emitter::get_parent_buffer_cluster_id(const ov::snippets::lowered::ExpressionPtr& expr) { | ||
OV_CPU_JIT_EMITTER_ASSERT(expr->get_input_port_connectors().size() == 1, "MemoryAccess must have one parent"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OV_CPU_JIT_EMITTER_ASSERT(expr->get_input_port_connectors().size() == 1, "MemoryAccess must have one parent"); | |
OV_CPU_JIT_EMITTER_ASSERT(expr->get_input_count() == 1, "MemoryAccess must have one parent"); |
} | ||
|
||
size_t jit_memory_emitter::get_consumer_buffer_cluster_id(const ov::snippets::lowered::ExpressionPtr& expr) { | ||
OV_CPU_JIT_EMITTER_ASSERT(expr->get_output_port_connectors().size() == 1, "MemoryAccess must have one consumer"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OV_CPU_JIT_EMITTER_ASSERT(expr->get_output_port_connectors().size() == 1, "MemoryAccess must have one consumer"); | |
OV_CPU_JIT_EMITTER_ASSERT(expr->get_output_count() == 1, "MemoryAccess must have one output"); |
@@ -99,11 +168,11 @@ void jit_load_broadcast_emitter::emit_isa(const std::vector<size_t>& in, const s | |||
auto src = XReg(in[0]); | |||
auto dst = TReg(out[0]); | |||
|
|||
h->uni_ld1rw(dst.s, src, byte_offset); | |||
h->uni_ld1rw(dst.s, src, compiled_byte_offset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is happed if compiled_byte_offset
is dynamic? We need to support load_broadcast
with dynamic offset as well. Could you cover it please?
@@ -129,7 +196,26 @@ template <cpu_isa_t isa> | |||
void jit_store_memory_emitter::emit_isa(const std::vector<size_t>& in, const std::vector<size_t>& out) const { | |||
OV_CPU_JIT_EMITTER_ASSERT(store_emitter != nullptr, "Store CPU emitter isn't initialized!"); | |||
|
|||
if (is_offset_runtime) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
load
and store
JIT emitters have the same logic about offset increment if it's dynamic (broadcast_load
should have the same logic). Can we encapsulate this algorithm in jit_memory_emitter
, in base class as it's done on x64?
@@ -129,7 +196,26 @@ template <cpu_isa_t isa> | |||
void jit_store_memory_emitter::emit_isa(const std::vector<size_t>& in, const std::vector<size_t>& out) const { | |||
OV_CPU_JIT_EMITTER_ASSERT(store_emitter != nullptr, "Store CPU emitter isn't initialized!"); | |||
|
|||
if (is_offset_runtime) { | |||
XReg aux_reg(aux_gpr_idxs.back()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To use aux_gpr
registers, you need to update the method get_aux_gprs_count()
firstly. It controls the size of the vector aux_gpr_idxs
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please be sure that aux_gpr
won't be corrupted before sub
offset below (during store_emitter
execution)
XReg aux_reg(aux_gpr_idxs.back()); | ||
XReg base_reg(out[0]); | ||
// load the runtime offset from args.buffer_offsets[buffer_cluster_id] | ||
XReg reg_runtime_params = XReg(Operand::X0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
XReg reg_runtime_params = XReg(Operand::X0); | |
XReg reg_runtime_params = abi_param1; |
@@ -63,6 +63,16 @@ class TransposeSoftmaxEltwiseFunction : public TransposeSoftmaxFunction { | |||
std::shared_ptr<ov::Model> initOriginal() const override; | |||
}; | |||
|
|||
class SoftmaxAddFunction : public SnippetsFunctionBase { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please rename the related classed so?
class SoftmaxAddFunction : public SnippetsFunctionBase { | |
class SoftmaxSumFunction : public SnippetsFunctionBase { |
In my opinion, this is more clear name for current pattern
if (!configuration.count("SNIPPETS_MODE")) { | ||
configuration.insert({"SNIPPETS_MODE", "IGNORE_CALLBACK"}); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we merge you changes for this setting? If we did, please reuse your helper
Details:
Tickets: