Skip to content

SPE: Fix up atomic usage #9617

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Feb 19, 2019
Merged

SPE: Fix up atomic usage #9617

merged 1 commit into from
Feb 19, 2019

Conversation

kjbracey
Copy link
Contributor

@kjbracey kjbracey commented Feb 5, 2019

Description

PSA SPE code was using atomics, but not consistently. On the assumption
that the atomics are needed, correct the code to be properly atomic.

  • Tighten up table full case - new_handle was left with a bogus value,
    and surrounding code (as a result?) used loop index to assert success.
    Make handle the sole output of the loop, and correct and use it.
  • Ensure handle in table is cleared last, with atomic store to release.
  • Make skipping of the invalid handle in handle generator loop atomic.
  • Use atomic load on state assert, and don't re-read on failure.
  • Tighten up types, and avoid casts by using new signed atomics.

Pull request type

[ ] Fix
[X] Refactor
[ ] Target update
[ ] Functionality change
[ ] Docs update
[ ] Test update
[ ] Breaking change

PSA SPE code was using atomics, but not consistently. On the assumption
that the atomics are needed, correct the code to be properly atomic.

* Tighten up table full case - new_handle was left with a bogus value,
  and surrounding code (as a result?) used loop index to assert success.
  Make handle the sole output of the loop, and correct and use it.
* Ensure handle in table is cleared last, with atomic store to release.
* Make skipping of the invalid handle in handle generator loop atomic.
* Use atomic load on state assert, and don't re-read on failure.
* Tighten up types, and avoid casts by using new signed atomics.
@kjbracey
Copy link
Contributor Author

kjbracey commented Feb 5, 2019

Requires preceding PR #9600 for atomic updates.

@orenc17
Copy link
Contributor

orenc17 commented Feb 5, 2019

@alzix @mikisch81 @danny4478 FYI

Copy link
Contributor

@alzix alzix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kjbracey-arm , thanks for taking a look at our code.
I have couple of small remarks.

uint32_t tmp_handle;
do {
tmp_handle = core_util_atomic_incr_u16(&(handle_mgr->handle_generator), 1);
} while (tmp_handle == PSA_HANDLE_MGR_INVALID_HANDLE);
Copy link
Contributor

@alzix alzix Feb 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PSA_HANDLE_MGR_INVALID_HANDLE defined to be uint16_t.
i think it is better to compare to PSA_NULL_HANDLE instead.
PSA_HANDLE_MGR_INVALID_HANDLE is only used for static handle_generator initialisation.

Suggested change
} while (tmp_handle == PSA_HANDLE_MGR_INVALID_HANDLE);
} while (tmp_handle == PSA_NULL_HANDLE);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this one was debatable - all those constants are so coupled in their assumptions, it's a bit hard to tell which to use. I went for the PSA_HANDLE_MGR_INVALID_HANDLE as it is the direct 16-bit value of the handle_generator, despite having just popped it into a uint32_t ready for composition.

I think in principle PSA_NULL_HANDLE doesn't have to be PSA_HANDLE_MGR_INVALID_HANDLE zero-extended - if it wasn't my form is correct. (It might have been simpler if the invalidness was represented by a too-big value in the slot indicator, rather than a special generator count value, but maybe having the null value be 0 is convenient).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PSA_NULL_HANDLE is part of PSA FF spec. The purpose of having it as it simplifies programming model. For example it allows us to call psa_close() regardless of psa_connect() status, which simplifies error handling logic.
PSA_HANDLE_MGR_INVALID_HANDLE is zero as it is initialises our handle_generator.
The connection between them is rather occasional.

handle_mgr->handles_pool[pool_ix].handle_owner = PSA_HANDLE_MGR_INVALID_FRIEND_OWNER;
handle_mgr->handles_pool[pool_ix].handle_friend = PSA_HANDLE_MGR_INVALID_FRIEND_OWNER;
core_util_atomic_store_s32(&(handle_mgr->handles_pool[pool_ix].handle), PSA_NULL_HANDLE);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i do agree that .handle_owner should be set last, as it returns the handle to the pool. And your change is actually fixing a bug in our implementation. Great catch!
But IMO the atomic store is redundant here. .handle is an aligned 32bit value thus single bus transaction will be sufficient for storing the value.
The code here written with an assumption that handle is owned as long as it not set to PSA_NULL_HANDLE. Thus only the owner will be accessing it. If this assumption is wrong we need to rewrite the whole code. Otherwise single store instruction is sufficient.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The handle must be written last. Only the atomic store will guarantee that - it has a memory ordering semantic. Without the atomic store, the compiler is free to reorder those 3 stores, so might not perform the handle write last.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for explaining it. You are correct!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from the memory ordering, which is the most likely practical issue, there's the formal issue that doing a non-atomic write to NULL at the same time as a cas reads it is undefined behaviour. In theory there could be tearing effects. (On a 16-bit CPU this 32-bit handle would tear. On ARM a 64-bit variable would tear).

Plain reads and writes are only permitted if there is no possibility of a write and a read happening at the same time - either due being ordered against other synchronised accesses, like the handle_friend here is ordered by the atomic handle accesses, or by mutexes and the like.

// Handle generator uses only 16 bits, and wraps.
// The reason for this is that we use the 16 upper bits to store the handle's index in the handles pool (for performance reasons)
uint16_t handle_generator; /* A counter supplying handle numbers. */
uint16_t pool_size; /* The maximum number of handles that pool can contain. */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would really prefer to have uint32_t for a size.
This will waste 4B in RAM but i will sleep better when next time this code will be modified/maintained.
Alternatively, please add casting in the for loop compare statement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a fundamental assumption throughout all the shifting and masking that these are both 16 bits, so I felt that making very sure that we don't have more than 16 bits in the state variables seemed like a good plan in itself, aside from the memory saving.

A cast isn't necessary - there's no problem comparing different width types of the same signedness - but the loop could certainly be changed to using uint16_t.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, no, pool_ix needs to be 32-bit, else casted before the left shift. Bleurgh.

{
if (*current_state != expected_state) {
uint8_t actual_state = core_util_atomic_load_u8(current_state);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced atomic load is necessary here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It only wouldn't be necessary if you know for certain no other threads could be modifying it. If you knew you were the "owner" due to a previous cas, so you knew that you were in a state that no-one else would modify, you'd be fine.

But as this is an assert, a lot of your assertable cases would actually then make this undefined behaviour if it was a plain load. Getting the atomic load gets you to the assert in a defined way in the event of failure.

I don't actually know if the atomics are needed at all here - I guess it's a sort of mechanism where exactly 1 thread can transition from IDLE to ACTIVE, and then it gets control?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that the atomic load isn't actually particularly expensive on current Mbed OS platforms - it will just be inlined as a normal LDRB, with the compiler being a bit stricter about ordering.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

atomic operations at handler allocations are needed for sure.
state change is also required to be atomic since it changed in SPM context and asserted in secure partition context.
atomic for write in destroy handle operation, you have convinced me it is needed to make sure compiler wont optimize stuff.
Atomic read still looks redundant to me.
perhaps may only be valid when using buffered writes...
Better safe than sorry :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

state change is also required to be atomic since it changed in SPM context and asserted in secure partition context.

So those are different threads, not just the other side of the security barrier in the same thread? If they are different threads, then the non-atomic read is undefined behaviour.

I think you can construct a convincing argument why the undefined behaviour won't cause a problem in this case, based on assumptions about the platform and how limited the compiler's hostility is, but compilers really are getting quite mean about exploiting undefined behaviour these days, so I really don't want to go anywhere near it. Making the code legal doesn't cost much, and serves as a documentation reminder. Plus atomics are hard enough to reason about as it is without trying to do "unsynchronised atomics".

Even if you never see a problem in practice on current hardware, one day some mean person is going to run you against a testing tool like this which would chew you straight out for that unsynchronised load.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok

@alzix
Copy link
Contributor

alzix commented Feb 6, 2019

@0xc0170, @NirSonnenschein travis failure seems to be irrelevant to the changes.

Copy link
Contributor

@alzix alzix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@NirSonnenschein
Copy link
Contributor

starting CI

@cmonr
Copy link
Contributor

cmonr commented Feb 14, 2019

@pan- Would you be able to look at this any time soon?

@pan-
Copy link
Member

pan- commented Feb 14, 2019

@cmonr Not today, but tomorrow I can.

@orenc17
Copy link
Contributor

orenc17 commented Feb 18, 2019

@pan- review please

@0xc0170
Copy link
Contributor

0xc0170 commented Feb 19, 2019

While we wait for the final review, CI started

@mbed-ci
Copy link

mbed-ci commented Feb 19, 2019

Test run: SUCCESS

Summary: 12 of 12 test jobs passed
Build number : 1
Build artifacts

@cmonr cmonr merged commit 19474fc into ARMmbed:master Feb 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants