-
Notifications
You must be signed in to change notification settings - Fork 3k
SPE: Fix up atomic usage #9617
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SPE: Fix up atomic usage #9617
Conversation
PSA SPE code was using atomics, but not consistently. On the assumption that the atomics are needed, correct the code to be properly atomic. * Tighten up table full case - new_handle was left with a bogus value, and surrounding code (as a result?) used loop index to assert success. Make handle the sole output of the loop, and correct and use it. * Ensure handle in table is cleared last, with atomic store to release. * Make skipping of the invalid handle in handle generator loop atomic. * Use atomic load on state assert, and don't re-read on failure. * Tighten up types, and avoid casts by using new signed atomics.
Requires preceding PR #9600 for atomic updates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kjbracey-arm , thanks for taking a look at our code.
I have couple of small remarks.
uint32_t tmp_handle; | ||
do { | ||
tmp_handle = core_util_atomic_incr_u16(&(handle_mgr->handle_generator), 1); | ||
} while (tmp_handle == PSA_HANDLE_MGR_INVALID_HANDLE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PSA_HANDLE_MGR_INVALID_HANDLE
defined to be uint16_t
.
i think it is better to compare to PSA_NULL_HANDLE
instead.
PSA_HANDLE_MGR_INVALID_HANDLE
is only used for static handle_generator
initialisation.
} while (tmp_handle == PSA_HANDLE_MGR_INVALID_HANDLE); | |
} while (tmp_handle == PSA_NULL_HANDLE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this one was debatable - all those constants are so coupled in their assumptions, it's a bit hard to tell which to use. I went for the PSA_HANDLE_MGR_INVALID_HANDLE
as it is the direct 16-bit value of the handle_generator
, despite having just popped it into a uint32_t
ready for composition.
I think in principle PSA_NULL_HANDLE
doesn't have to be PSA_HANDLE_MGR_INVALID_HANDLE
zero-extended - if it wasn't my form is correct. (It might have been simpler if the invalidness was represented by a too-big value in the slot indicator, rather than a special generator count value, but maybe having the null value be 0 is convenient).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PSA_NULL_HANDLE
is part of PSA FF spec. The purpose of having it as it simplifies programming model. For example it allows us to call psa_close()
regardless of psa_connect()
status, which simplifies error handling logic.
PSA_HANDLE_MGR_INVALID_HANDLE is zero as it is initialises our handle_generator.
The connection between them is rather occasional.
handle_mgr->handles_pool[pool_ix].handle_owner = PSA_HANDLE_MGR_INVALID_FRIEND_OWNER; | ||
handle_mgr->handles_pool[pool_ix].handle_friend = PSA_HANDLE_MGR_INVALID_FRIEND_OWNER; | ||
core_util_atomic_store_s32(&(handle_mgr->handles_pool[pool_ix].handle), PSA_NULL_HANDLE); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i do agree that .handle_owner
should be set last, as it returns the handle to the pool. And your change is actually fixing a bug in our implementation. Great catch!
But IMO the atomic store is redundant here. .handle
is an aligned 32bit value thus single bus transaction will be sufficient for storing the value.
The code here written with an assumption that handle is owned as long as it not set to PSA_NULL_HANDLE
. Thus only the owner will be accessing it. If this assumption is wrong we need to rewrite the whole code. Otherwise single store instruction is sufficient.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The handle must be written last. Only the atomic store will guarantee that - it has a memory ordering semantic. Without the atomic store, the compiler is free to reorder those 3 stores, so might not perform the handle write last.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for explaining it. You are correct!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from the memory ordering, which is the most likely practical issue, there's the formal issue that doing a non-atomic write to NULL at the same time as a cas
reads it is undefined behaviour. In theory there could be tearing effects. (On a 16-bit CPU this 32-bit handle would tear. On ARM a 64-bit variable would tear).
Plain reads and writes are only permitted if there is no possibility of a write and a read happening at the same time - either due being ordered against other synchronised accesses, like the handle_friend
here is ordered by the atomic handle
accesses, or by mutexes and the like.
// Handle generator uses only 16 bits, and wraps. | ||
// The reason for this is that we use the 16 upper bits to store the handle's index in the handles pool (for performance reasons) | ||
uint16_t handle_generator; /* A counter supplying handle numbers. */ | ||
uint16_t pool_size; /* The maximum number of handles that pool can contain. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i would really prefer to have uint32_t
for a size.
This will waste 4B in RAM but i will sleep better when next time this code will be modified/maintained.
Alternatively, please add casting in the for loop compare statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a fundamental assumption throughout all the shifting and masking that these are both 16 bits, so I felt that making very sure that we don't have more than 16 bits in the state variables seemed like a good plan in itself, aside from the memory saving.
A cast isn't necessary - there's no problem comparing different width types of the same signedness - but the loop could certainly be changed to using uint16_t
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, no, pool_ix
needs to be 32-bit, else casted before the left shift. Bleurgh.
{ | ||
if (*current_state != expected_state) { | ||
uint8_t actual_state = core_util_atomic_load_u8(current_state); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced atomic load is necessary here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It only wouldn't be necessary if you know for certain no other threads could be modifying it. If you knew you were the "owner" due to a previous cas
, so you knew that you were in a state that no-one else would modify, you'd be fine.
But as this is an assert
, a lot of your assertable cases would actually then make this undefined behaviour if it was a plain load. Getting the atomic load gets you to the assert in a defined way in the event of failure.
I don't actually know if the atomics are needed at all here - I guess it's a sort of mechanism where exactly 1 thread can transition from IDLE to ACTIVE, and then it gets control?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that the atomic load isn't actually particularly expensive on current Mbed OS platforms - it will just be inlined as a normal LDRB, with the compiler being a bit stricter about ordering.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
atomic operations at handler allocations are needed for sure.
state change is also required to be atomic since it changed in SPM context and asserted in secure partition context.
atomic for write in destroy handle operation, you have convinced me it is needed to make sure compiler wont optimize stuff.
Atomic read still looks redundant to me.
perhaps may only be valid when using buffered writes...
Better safe than sorry :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
state change is also required to be atomic since it changed in SPM context and asserted in secure partition context.
So those are different threads, not just the other side of the security barrier in the same thread? If they are different threads, then the non-atomic read is undefined behaviour.
I think you can construct a convincing argument why the undefined behaviour won't cause a problem in this case, based on assumptions about the platform and how limited the compiler's hostility is, but compilers really are getting quite mean about exploiting undefined behaviour these days, so I really don't want to go anywhere near it. Making the code legal doesn't cost much, and serves as a documentation reminder. Plus atomics are hard enough to reason about as it is without trying to do "unsynchronised atomics".
Even if you never see a problem in practice on current hardware, one day some mean person is going to run you against a testing tool like this which would chew you straight out for that unsynchronised load.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok
@0xc0170, @NirSonnenschein travis failure seems to be irrelevant to the changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
starting CI |
@pan- Would you be able to look at this any time soon? |
@cmonr Not today, but tomorrow I can. |
@pan- review please |
While we wait for the final review, CI started |
Test run: SUCCESSSummary: 12 of 12 test jobs passed |
Description
PSA SPE code was using atomics, but not consistently. On the assumption
that the atomics are needed, correct the code to be properly atomic.
and surrounding code (as a result?) used loop index to assert success.
Make handle the sole output of the loop, and correct and use it.
Pull request type