Skip to content

Add CFI directives to armv8-mont #2584

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from
Open

Conversation

andrewhop
Copy link
Contributor

@andrewhop andrewhop commented Jul 29, 2025

Issues:

Addresses #2459

Description of changes:

Some of our assembly functions are missing Call Frame Information (CFI) directives. These are used to generate DWARF (Debug With Arbitrary Record Formats) entries that are used for stack unwinding. This is also important for profilers that rely on stack unwinding to identify code host spots. At a high level we need:

  • .cfi_startproc/.cfi_endproc is required to mark the start/end of each function
  • .cfi_def_cfa_offset is required to track when and how the stack pointer moves
  • .cfi_offset is optional and tells the debugger what variables are on the stack and where

Call-outs:

This change does not introduce any new CPU instructions or impact the performance at all. These CFI directives are assembler directives that generate metadata that explains how to unwind the stack.

Testing:

Looking at the flamegraphs things are better but not perfect yet. I figured out why the bn_div_consttime was showing up in the wrong stack: other assembly functions in bn-armv8.pl were missing CFI directives. With that fixed RSA is almost perfect, the last issue appears to be a few helper functions from s2n-bignum that need to be fixed upstream.

Before

image

After

image

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license and the ISC license.

@codecov-commenter
Copy link

codecov-commenter commented Jul 29, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 78.69%. Comparing base (1dd4333) to head (aa40eb5).

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #2584   +/-   ##
=======================================
  Coverage   78.69%   78.69%           
=======================================
  Files         645      645           
  Lines      110744   110743    -1     
  Branches    15662    15662           
=======================================
  Hits        87150    87150           
  Misses      22903    22903           
+ Partials      691      690    -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@andrewhop andrewhop marked this pull request as ready for review July 29, 2025 22:00
@andrewhop andrewhop requested a review from a team as a code owner July 29, 2025 22:00
ldr x29,[sp],#64
.cfi_restore x29
.cfi_def_cfa_offset 0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see x30 restored?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh that is weird, it looks like bn_mul_mont stores x30, never changes/restores it, and can calls ret to that original x30 address. It also calls __bn_sqr8x_mont/__bn_mul4x_mont which also store x30, and they restore it before returning. But those functions also never use/change x30. So it seems like I should load it in bn_mul_mont before returning, or should I update everything to not save it on the stack at all?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out x30 is modified in the 4x and 8x functions but not in the top level bn_mul_mont, I was looking at uses in the perl code and missed my ($cnt,$carry,$topmost)=("x27","x28","x30");in the 8x and 4x versions. Looking at armv8-mont.S it is clear x30 is modified during execution in the

I did an experiment to remove the store x30 in the top level bn_mul_mont but that caused all the uses of it to fail. I may have messed up the store/restore logic. I tried to remove the store x30 and shift all the other stores over by 8 bytes

bn_mul_mont:
.cfi_startproc
	AARCH64_SIGN_LINK_REGISTER
	tst	$num,#7
	b.eq	__bn_sqr8x_mont
	tst	$num,#3
	b.eq	__bn_mul4x_mont
.Lmul_mont:
	stp	x29,x19,[sp,#-56]!
.cfi_def_cfa_offset 56
.cfi_offset x19, -48
.cfi_offset x29, -56
	add	x29,sp,#0
.cfi_def_cfa x29, 56
	stp	x20,x21,[sp,#16]
.cfi_offset x20, -40
.cfi_offset x21, -32
	stp	x22,x23,[sp,#32]
.cfi_offset x22, -24
.cfi_offset x23, -16
	str	x24,[sp,#48]
.cfi_offset x24, -8

...

	ldp	x20,x21,[x29,#16]
.cfi_restore x20
.cfi_restore x21
	mov	sp,x29
.cfi_def_cfa sp, 56
	ldp	x22,x23,[x29,#32]
.cfi_restore x22
.cfi_restore x23
	mov	x0,#1
	ldr	x24,[x29,#48]
.cfi_restore x24
	ldp	x29,x19,[sp],#56
.cfi_restore x19
.cfi_restore x29
.cfi_def_cfa_offset 0
	AARCH64_VALIDATE_LINK_REGISTER
	ret
.cfi_endproc
.size	bn_mul_mont,.-bn_mul_mont

@@ -1007,6 +1043,7 @@
.Lsqr8x8_post_condition:
adc $carry,xzr,xzr
ldr x30,[x29,#8] // pull return address
.cfi_restore x30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not related to your change but this ldr is weirdly placed. shouldn't it be after the .Lsqr8x_done label?

@@ -1485,6 +1550,7 @@
// $acc0-3,$carry hold result, $m0-7 hold modulus
subs $a0,$acc0,$m0
ldr x30,[x29,#8] // pull return address
.cfi_restore x30
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as above, seems to me this load is in a weird spot?

@WillChilds-Klein
Copy link
Contributor

Are there any CI checks we can add to ensure CFI directives are added or retained in future assembly changes? I can't think of anything off the top of my head...

@andrewhop
Copy link
Contributor Author

Are there any CI checks we can add to ensure CFI directives are added or retained in future assembly changes? I can't think of anything off the top of my head...

We already have an ABI test for this, there are two possible problems:

  1. Nothing is running that particular test with libunwind in our CI
  2. CHECK_ABI might now be able to catch this issue

CHECK_ABI eventually calls down to abi_test_trampoline

// abi_test_trampoline loads callee-saved registers from |state|, calls |func|
// with |argv|, then saves the callee-saved registers into |state|. It returns
// the result of |func|. If |unwind| is non-zero, this function triggers unwind
// instrumentation.
//
// We give |func| type |crypto_word_t| to avoid tripping MSVC's warning 4191.
crypto_word_t abi_test_trampoline(crypto_word_t func,
abi_test::internal::CallerState *state,
const crypto_word_t *argv, size_t argc,
crypto_word_t unwind);

// abi_test_trampoline loads callee-saved registers from |state|, calls |func|
// with |argv|, then saves the callee-saved registers into |state|. It returns
// the result of |func|. If |unwind| is non-zero, this function triggers unwind
// instrumentation.

Looking at the ARMv8 version it doesn't support performing the unwind part

// abi_test_trampoline loads callee-saved registers from |state|, calls |func|
// with |argv|, then saves the callee-saved registers into |state|. It returns
// the result of |func|. The |unwind| argument is unused.
// uint64_t abi_test_trampoline(void (*func)(...), CallerState *state,
// const uint64_t *argv, size_t argc,
// uint64_t unwind);
It looks like the only platform that supports unwind testing is the x86_64 implementation. Briefly looking at the x86_64 version it depends on a trap flag I'm not familiar with and we'd need to find something that would work on ARM. @dkostic and @nebeid do you have any idea what the ARM version could use?

@dkostic
Copy link
Contributor

dkostic commented Aug 6, 2025

It looks like the only platform that supports unwind testing is the x86_64 implementation. Briefly looking at the x86_64 version it depends on a trap flag I'm not familiar with and we'd need to find something that would work on ARM. @dkostic and @nebeid do you have any idea what the ARM version could use?

This

# Set the trap flag.
pushfq
orq \$0x100, 0(%rsp)
popfq
:
sets the trap flag in x86 CPU flags: https://en.wikipedia.org/wiki/Trap_flag. When this flag is set then the CPU executes a single instruction and stops. In our case that will be the instruction just before the function call (call *%rax). .

So that's a programatic way to make to trigger a breakpoint that x86 CPUs support. I'm not sure if Arm supports something like that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants