-
Notifications
You must be signed in to change notification settings - Fork 136
Add CFI directives to armv8-mont #2584
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #2584 +/- ##
=======================================
Coverage 78.69% 78.69%
=======================================
Files 645 645
Lines 110744 110743 -1
Branches 15662 15662
=======================================
Hits 87150 87150
Misses 22903 22903
+ Partials 691 690 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
ldr x29,[sp],#64 | ||
.cfi_restore x29 | ||
.cfi_def_cfa_offset 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see x30 restored?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh that is weird, it looks like bn_mul_mont
stores x30
, never changes/restores it, and can calls ret
to that original x30
address. It also calls __bn_sqr8x_mont
/__bn_mul4x_mont
which also store x30
, and they restore it before returning. But those functions also never use/change x30
. So it seems like I should load it in bn_mul_mont
before returning, or should I update everything to not save it on the stack at all?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out x30 is modified in the 4x and 8x functions but not in the top level bn_mul_mont, I was looking at uses in the perl code and missed my ($cnt,$carry,$topmost)=("x27","x28","x30");
in the 8x and 4x versions. Looking at armv8-mont.S it is clear x30 is modified during execution in the
I did an experiment to remove the store x30 in the top level bn_mul_mont but that caused all the uses of it to fail. I may have messed up the store/restore logic. I tried to remove the store x30 and shift all the other stores over by 8 bytes
bn_mul_mont:
.cfi_startproc
AARCH64_SIGN_LINK_REGISTER
tst $num,#7
b.eq __bn_sqr8x_mont
tst $num,#3
b.eq __bn_mul4x_mont
.Lmul_mont:
stp x29,x19,[sp,#-56]!
.cfi_def_cfa_offset 56
.cfi_offset x19, -48
.cfi_offset x29, -56
add x29,sp,#0
.cfi_def_cfa x29, 56
stp x20,x21,[sp,#16]
.cfi_offset x20, -40
.cfi_offset x21, -32
stp x22,x23,[sp,#32]
.cfi_offset x22, -24
.cfi_offset x23, -16
str x24,[sp,#48]
.cfi_offset x24, -8
...
ldp x20,x21,[x29,#16]
.cfi_restore x20
.cfi_restore x21
mov sp,x29
.cfi_def_cfa sp, 56
ldp x22,x23,[x29,#32]
.cfi_restore x22
.cfi_restore x23
mov x0,#1
ldr x24,[x29,#48]
.cfi_restore x24
ldp x29,x19,[sp],#56
.cfi_restore x19
.cfi_restore x29
.cfi_def_cfa_offset 0
AARCH64_VALIDATE_LINK_REGISTER
ret
.cfi_endproc
.size bn_mul_mont,.-bn_mul_mont
@@ -1007,6 +1043,7 @@ | |||
.Lsqr8x8_post_condition: | |||
adc $carry,xzr,xzr | |||
ldr x30,[x29,#8] // pull return address | |||
.cfi_restore x30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related to your change but this ldr
is weirdly placed. shouldn't it be after the .Lsqr8x_done
label?
@@ -1485,6 +1550,7 @@ | |||
// $acc0-3,$carry hold result, $m0-7 hold modulus | |||
subs $a0,$acc0,$m0 | |||
ldr x30,[x29,#8] // pull return address | |||
.cfi_restore x30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same as above, seems to me this load is in a weird spot?
Are there any CI checks we can add to ensure CFI directives are added or retained in future assembly changes? I can't think of anything off the top of my head... |
We already have an ABI test for this, there are two possible problems:
CHECK_ABI eventually calls down to abi_test_trampoline Lines 495 to 504 in 1dd4333
Looking at the ARMv8 version it doesn't support performing the unwind part aws-lc/crypto/test/asm/trampoline-armv8.pl Lines 51 to 56 in 1dd4333
|
This aws-lc/crypto/test/asm/trampoline-x86_64.pl Lines 250 to 253 in 1dd4333
sets the trap flag in x86 CPU flags: https://en.wikipedia.org/wiki/Trap_flag. When this flag is set then the CPU executes a single instruction and stops. In our case that will be the instruction just before the function call ( call *%rax ). aws-lc/crypto/test/asm/trampoline-x86_64.pl Line 257 in 1dd4333
So that's a programatic way to make to trigger a breakpoint that x86 CPUs support. I'm not sure if Arm supports something like that. |
Issues:
Addresses #2459
Description of changes:
Some of our assembly functions are missing Call Frame Information (CFI) directives. These are used to generate DWARF (Debug With Arbitrary Record Formats) entries that are used for stack unwinding. This is also important for profilers that rely on stack unwinding to identify code host spots. At a high level we need:
.cfi_startproc
/.cfi_endproc
is required to mark the start/end of each function.cfi_def_cfa_offset
is required to track when and how the stack pointer moves.cfi_offset
is optional and tells the debugger what variables are on the stack and whereCall-outs:
This change does not introduce any new CPU instructions or impact the performance at all. These CFI directives are assembler directives that generate metadata that explains how to unwind the stack.
Testing:
Looking at the flamegraphs things are better but not perfect yet. I figured out why the bn_div_consttime was showing up in the wrong stack: other assembly functions in bn-armv8.pl were missing CFI directives. With that fixed RSA is almost perfect, the last issue appears to be a few helper functions from s2n-bignum that need to be fixed upstream.
Before
After
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.