Summary: Add Stateful FC Cortex-m linearOps #14252
Conversation
🔗 Helpful Links
🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/14252
Note: Links to docs will display an error until the docs builds have been completed.
✅ No Failures
As of commit 5309a25 with merge base 0e9d871.
This comment was automatically generated by Dr. CI and updates every 15 minutes.
input_zero_point: int,
input_multiplier: int,
input_shift: int,
Do we want to just take a Tensor here (even for a single element)? The rationale is to support per-token-like quantization later, rather than only per-tensor as we have today.
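For illustration, a sketch of what accepting a Tensor could enable (names and shapes here are hypothetical, not this PR's actual signature): a numel() == 1 tensor reproduces today's per-tensor behavior, while a length-N tensor gives per-token scales via broadcasting.

import torch

def requantize(
    acc: torch.Tensor,         # int32 accumulator, shape [tokens, out_features]
    multiplier: torch.Tensor,  # numel() == 1 -> per-tensor, numel() == tokens -> per-token
    shift: torch.Tensor,       # same numel() convention as multiplier
) -> torch.Tensor:
    # Same multiplier * 2**(-shift) convention as in the snippets below.
    scale = multiplier.double() * torch.exp2(-shift.double())
    if scale.numel() > 1:
        scale = scale.view(-1, 1)  # per-token: one scale per row
    return torch.clamp((acc * scale).round(), -128, 127).to(torch.int8)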
bias_multiplier: torch.Tensor,
bias_shift: torch.Tensor,
scratch_buffer: torch.Tensor,
output_zero_point: int,
Same question here: do we want a Tensor for this one too?
) * weight_scales.unsqueeze(1)
if bias is not None:
    if bias_multiplier.numel() == 1:
        bias_scale = bias_multiplier.item() * (2.0 ** (-bias_shift.item()))
Why do we need the .item() specialization for numel() == 1?
.item() is needed for single-element tensors to extract a Python scalar, so the downstream math runs on plain floats. Isn't it?
float() also works with numel() == 1; then you can get a float at the point where you actually consume it, or keep passing it down as a tensor.
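A minimal sketch of the two options under discussion (names taken from the snippet above): extracting a Python scalar with .item(), versus keeping everything as tensors so the same expression covers per-tensor and per-channel without a numel() == 1 branch.

import torch

def bias_scale_scalar(bias_multiplier: torch.Tensor, bias_shift: torch.Tensor) -> float:
    # Option 1: .item() pulls a Python scalar out of a 1-element tensor,
    # so downstream math runs on plain floats.
    return bias_multiplier.item() * (2.0 ** (-bias_shift.item()))

def bias_scale_tensor(bias_multiplier: torch.Tensor, bias_shift: torch.Tensor) -> torch.Tensor:
    # Option 2: no specialization; the same expression works for
    # numel() == 1 and numel() == C alike, and float(result) is still
    # available later wherever a scalar is actually consumed.
    return bias_multiplier * torch.exp2(-bias_shift.to(torch.float32))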
Great starting point. Thanks @psiddh for all the back and forth.
Left some comments.
}

// start of cmsis buffer
ctx.buf = scratch_ptr;
kernel_sum_state->get_scratch_ptr()
I've encapsulated everything in a simple helper class now.
ctx.buf = scratch_ptr;
ctx.size = scratch_buffer.size(0) - sizeof(kernel_sum_state);

for (int32_t b = 0; b < batch_size; b++) {
Did we verify that this is the right way to call this API?
Verify as in: I get a success result from the CMSIS API call. Also, reading the implementation code, we must loop over the batch dimension and call the function once per input vector. I think the fully connected (FC) functions like arm_fully_connected_s8 and arm_fully_connected_per_channel_s8 process a single input vector at a time.
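A reference sketch of the batching contract described above (Python, with assumed shapes): one kernel invocation per batch entry, each consuming a single input vector, which should match a whole-batch linear.

import torch
import torch.nn.functional as F

def fc_per_vector(x: torch.Tensor, weight: torch.Tensor, bias: torch.Tensor) -> torch.Tensor:
    # x: [batch, in_features]; weight: [out_features, in_features]
    # Mirrors calling arm_fully_connected_s8 once per batch entry:
    # each call sees a single [in_features] vector.
    rows = [F.linear(x[b], weight, bias) for b in range(x.shape[0])]
    return torch.stack(rows)

x, w, b = torch.randn(4, 8), torch.randn(3, 8), torch.randn(3)
assert torch.allclose(fc_per_vector(x, w, b), F.linear(x, w, b))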
@AdrianLundell - Just FYI. We have more work to do here, but I wanted to give you a heads up.
 public:
  CMSISScratchBufferContext(
      Tensor& scratch_buffer,
      const cmsis_nn_dims& filter_dims)
Nit: just take the weight tensor ref as an arg?
BINARY_DIR CMSIS_NN_BINARY_DIR
)
set(CMSIS_NN_INCLUDE_DIR "${CMSIS_NN_SOURCE_DIR}/Include")
set(CMSIS_NN_LIB "${CMSIS_NN_BINARY_DIR}/libcmsis-nn.a")
This is an absolute path into the binary tree, which causes portability issues when installing. Could you take another look at this and try again to use only the cmsis-nn target when building?
@@ -114,6 +117,27 @@ inline void validate_quantization_params(
      "Single quant Output");
}

inline bool validate_per_channel_quant_params(
Would be nice to document where these constraints come from.
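For reference, one common convention such constraints tend to trace back to (an assumption here, not necessarily exactly what this PR checks) is decomposing a float scale into a quantized multiplier and a power-of-two shift:

import math

def quantize_scale(scale: float) -> tuple[int, int]:
    # Decompose scale ~= multiplier * 2**(-shift). Normalizing the mantissa
    # to [0.5, 1) puts the int32 multiplier in [2**30, 2**31), which is the
    # usual source of range constraints on the multiplier/shift pair.
    if scale == 0.0:
        return 0, 0
    mantissa, exponent = math.frexp(scale)  # scale = mantissa * 2**exponent
    multiplier = round(mantissa * (1 << 31))
    shift = 31 - exponent                   # scale = multiplier * 2**(-shift)
    if multiplier == (1 << 31):             # rounding edge case at the top
        multiplier //= 2
        shift -= 1
    return multiplier, shift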
//    ^                     ^                              ^
//    scratch_ptr (start)   scratch_ptr + cmsis_scratch    scratch_ptr + total_size
//
// - CMSIS-NN workspace: used by CMSIS-NN kernels for temporary data
Nice comment, though I feel like it is missing a description of the kernel_sum_state struct?
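For illustration, a tiny sketch of the layout being documented (the struct name comes from the diff; the state size here is an assumed placeholder): the CMSIS-NN workspace occupies the front of the flat scratch tensor and the kernel_sum_state sits at the tail, matching ctx.size = scratch_buffer.size(0) - sizeof(kernel_sum_state) above.

SIZEOF_KERNEL_SUM_STATE = 16  # assumed stand-in for sizeof(kernel_sum_state)

def split_scratch(total_size: int) -> tuple[range, range]:
    # [0, workspace_end)          -> CMSIS-NN workspace (ctx.buf / ctx.size)
    # [workspace_end, total_size) -> kernel_sum_state (cached kernel sums)
    workspace_end = total_size - SIZEOF_KERNEL_SUM_STATE
    return range(0, workspace_end), range(workspace_end, total_size)

workspace, state = split_scratch(1024)
assert len(workspace) == 1024 - SIZEOF_KERNEL_SUM_STATE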
return out;
}

// Functional variant (stub, not used at runtime)
I feel like we should work toward removing the need for these stub functions
@@ -297,6 +303,20 @@ def forward(self, x: torch.Tensor, y: torch.Tensor):
    can_delegate = True


class QuantLinearTest(torch.nn.Module):
We are trying to move away from adding testing logic to the aot_arm_compiler; please see #13902 for my suggestions on the cortex_m testing strategy.
fc_params.output_offset = output_zp;
fc_params.activation.min = std::numeric_limits<int8_t>::min();
fc_params.activation.max = std::numeric_limits<int8_t>::max();
cmsis_nn_dims input_dims = {1, 1, 1, in_feat};
This is channels-last, whereas PyTorch defaults to channels-first; how are you handling that?
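One way to see why {1, 1, 1, in_feat} can still be safe for a 2-D linear input (a sketch, assuming a contiguous [batch, in_features] tensor): with H = W = 1, channels-last and channels-first describe the same memory layout, so no permute is needed.

import torch

x = torch.randn(4, 8)           # [batch, in_features], contiguous
nhwc = x.view(4, 1, 1, 8)       # N, H, W, C with H = W = 1
nchw = x.view(4, 8, 1, 1)       # N, C, H, W with H = W = 1

# Both 4-D views reinterpret the same storage: for degenerate H = W = 1
# shapes, the two orderings coincide element for element.
assert torch.equal(nhwc.reshape(-1), nchw.reshape(-1))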
@@ -223,3 +223,220 @@ def quantized_add_out_impl(
    out.copy_(result_quantized)

    return out
Should we start splitting these definitions into separate files?
self._cleanup_nodes(graph)
return fusion_count

def _find_original_input_placeholder(self, dq_node: Node) -> Node:
not used?
Summary

Integrate with CMSIS-NN with per-channel quantization support.

Test plan

Run e2e test on the FVP simulator:
./examples/arm/run_mcu_models_fvp.sh --target=cortex-m55 --models=qlinear