-
Notifications
You must be signed in to change notification settings - Fork 900
Add simple opal atomic store/load? #9722
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I think the assumption is that single-elements loads and stores are atomic, yet not synchronizing. OPAL provides |
I'm thinking of cases where one process is constantly updating (think sequence number style) a value and another is reading it. (not the case where I want to ensure ordering between two variables' acceses, where the barriers are useful). If the load/stores are not atomic, a partial value could theoretically be received. I think writing 64-bit values on arm, or writing to non-64-bit-aligned ones on x86 is not atomic. |
According to https://developer.arm.com/documentation/ddi0487/gb/, B2.2.1 64bit load/stores are atomic on 64-bit arm. Unaligned 64bit load/store is UB in C so I doubt the compiler will come to the rescue here (http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf page 74, 7) Having said that, I am not opposed to introducing atomic load/store to opal. But what would be the memory ordering? Sequential consistency likely will perform worse than |
I'd definitely say relaxed ordering. My specific use-case is about the closest thing I can get to ordinary C writes/reads, without being prone to errors. A variant with other-than-relaxed ordering could also be created, as is also bound to be useful somewhere. Edit: Another point where simple writes/reads may be problematic are 64-bit writes on 32-bit x86 (?). To be honest, for my case I only need atomic loads/stores for size_t, so it's possible (as it turns out!) that for the major architectures I'm absolutely fine without explicit atomics. In any way, it would be nice for future insterested parties if these atomics where implemented, even if just via a simple C assignment (or |
I think we'd be open to a pull request, but I'm dubious they're actually useful. We still support compilers with limited atomic support and there's not a great way to express what you're asking for without compiler support. Yes, you could provide wrappers which are essentially just load and store, and rely on the behaviors we rely on today, but I'd almost rather we make the developer think about it, rather than letting the developer assume the compiler will do things that it won't do (like enforce all the atomicity rules). |
Given the new (to me!) revelations that the compiler automatically naturally aligns types (?) (since it would be UB if it didn't), which did not occur to me at the time, and since more operations than I thought are "tear-free" (eg. 64-bit ops on arm) -- though I don't have a complete image for archs beyond x86 and arm --, I wouldn't be as hard-pressed to add these. However, even if just one case exists (64-bit op on 32-bit x86?) where simple assignments do not cut it, the ideally portable code would optimally use explicit atomic operations. I agree that the developer should be aware of what happens behind the scenes, but from an atomics library viewpoint, the role of which is (?) to take some of this burden of the developer, these basic ops seem like a basic component. Of course the additions would also serve as documentation for future eyes. In any case, it looks to me like they are a good fit and are non-intrusive. For example, there exist a bunch of similar
Apart from If you also agree, I can go ahead and open a PR and take it from there. |
Just to note. Open MPI no longer supports 32-bit platforms (x86, ppc32, arm32, etc). Also keep in mind an atomic load can be done as fetch add with a 0 operand and store with atomic swap. |
Also keep in mind that Open MPI intends to move to using C11 atomics exclusively at some point in the future. Most of the custom atomic code will go away at that point. |
I am not opposed to having simple atomic load/stores in opal. However, I don't think there is a use-case for it right now because inter-thread synchronization is synchronized through locks and/or fences, so it would essentially be dead code. |
@gkatev If you need such infrastructure you can of course add it to your local branch. If that ever gets upstreamed we will have a use-case for atomic load/store and will include the them. I am also planning to revisit our current use of fences and atomics, if I come across a possible use-case I would sync up with you. |
Yes this is all reasonable, and I've already taken care of my local code, so all is well for me. Since it looks like an overhaul will be taking place at some point I do suggest their future addition! (use case: single-writer-based synchronization, in an numerically incremental manner, without extra ordering flags) (not sure if we/you want the issue closed -- feel free to do it) |
Turns out, all increment/decrement/assignment accesses to an I am working on a patch to introduce @gkatev Was that motivating your initial question about loads and stores? This struck me while doing some profiling and I didn't realize that was an issue earlier. |
So, you refer to Are there any opal atomics implementations that rely on the implicit ordering of My initial request would indeed be solved by functions that call atomic_load/store_explicit with relaxed ordering! You mean if my motivation was for opal_atomic_store/load with non-seqcst-implied ordering? From what I remember there where (are?) no opal atomic functions that only store or load (but e.g. there are add_fetch). My motivation here for the existence of opal_atomic_store/load with relaxed ordering was portability. Because while in most architectures a non-atomic write/read would be equivalent, maybe on some bizarre architecture it won't be (??). Essentially, the same way that one uses |
That's right, it's supposed to be non-atomic addition and assignment.
IIRC, the reason we included while (atomic_var == 0) { ... } Not sure where such patterns are used. Alternatively, using My understanding is that throughout the code base we rely on explicit memory barriers ( IMHO, C11 |
Hi, I'm looking through OPAL's atomic utilities (opal/include/opal/sys/atomic.h), and I'm noticing that while there's a large collection of atomic operations, there exist no simple store/load functions (eg.
opal_atomic_store
oropal_atomic_load
).Would it be a good idea / is it possible to add these operations?
I can see some alternative solutions, like using another operation (eg.
opal_atomic_add_fetch
-- but what about performance?), or rolling my own version (gcc/asm), but it definitely sounds nice if the solution was integrated in opal's atomic library (?).The text was updated successfully, but these errors were encountered: