i64.atomic.wait has no matching futex syscall on Linux #135

programmerjake · 2019-02-24T08:57:44Z

I noticed that the i64.atomic.wait instruction can't be translated directly to the futex syscall on Linux, since Linux only supports 32-bit futex operations. I'm concerned that non-web embeddings on Linux will need to implement their own wait/notify functionality instead of being able to use Linux's directly.

Web embeddings will have to implement their own wait/notify functionality anyway to support primitives like Atomics.waitAsync

Related:
#7
StackOverflow: Linux futex for 64-bits

The text was updated successfully, but these errors were encountered:

conrad-watt · 2019-02-24T10:36:40Z

I believe, based on the discussion in #7, that it's possible to implement i64.atomic.wait as a 64-bit atomic read (to confirm the expected value), conditionally followed by a linux 32-bit futex operation on the same address, expecting the low 32-bits of the value read (modulo endianness).

programmerjake · 2019-02-24T17:37:38Z

that doesn't actually work, there can be a lost wakeup:

assuming little endian:

thread 1	thread 2
write 0x100000000
start i64.atomic.wait 0x100000000
read & compare
	write 0x0
	futex wake
futex wait

conrad-watt · 2019-02-24T18:36:00Z

Ah, true. Sorry for the rookie mistake.

lars-t-hansen · 2019-02-25T08:34:09Z

I don't think you can use raw futexes even for the 32-bit wait. The wait/notify mechanism has some guarantees, such as ordered wakeup, that raw futexes do not seem to provide.

binji · 2019-02-25T20:31:44Z

Right, I don't think it was ever the intention that you could directly replace these instructions with raw futex syscalls.

rianhunter · 2020-03-18T08:22:05Z

Another modest plea to drop memory.atomic.wait64 (f.k.a. i64.atomic.wait) for the sake of allowing a straightforward futex()-based implementation. An additional modest plea to ease the requirements of memory.atomic.wait32 to allow a futex()-based implementation, in particular allowing spurious wakeups. Some rationale:

This will ease implementation for futex()-based non-web embeddings
- OpenBSD also has a native futex() calls: https://man.openbsd.org/futex
- Other systems may follow suit, after the addition of C++20 atomic_wait https://en.cppreference.com/w/cpp/atomic/atomic_wait
This will potentially improve performance for futex()-based non-web embeddings
64-bit wait doesn't seem to be widespread in existing native code (C/C++)
Interop with JS Atomics doesn't require 64-bit wait

Additionally, allowing spurious wakeups eases low-level implementations of memory.atomic.wait32 in general, since the sleep condition is only on the initial check of the atomic. Disallowing spurious wakeups makes more sense when the semantics of the wait call are continuously dependent on the value of the atomic, e.g. in C++20 atomic_wait(). Low-level implementations that have no way to prevent spurious wakeups are unable to decide if they should continue to sleep or not after one occurs (even if they weren't notified).

I believe the ordering issue can be resolved by placing manual seqcst fences around the futex calls, though syscalls already provide a memory barrier so that would be redundant. @lars-t-hansen could you elaborate what you mean by "ordered wakeup"?

tlively · 2020-03-18T09:05:09Z

I'm also curious about why we decided to disallow spurious wakeups. WebAssembly/design#1019 (comment) suggests it is because JS doesn't allow them, but I don't know why that is.

lars-t-hansen · 2020-03-18T09:30:40Z

Interop with JS Atomics doesn't require 64-bit wait

JS supports 64-bit TypedArray in wait and notify: https://tc39.es/ecma262/#sec-atomics.wait handles BigInt64Array, and notify is size-agnostic but allows that type of view.

It was never (except maybe the very early days, when we were starting with something similar to what I believe NaCl had) intended for futexes to be able to directly express the semantics of wait and notify.

The prohibition on spurious wakeups and other design artifacts are there to push complexity into the implementation and away from the user (after all this is/was JS). Sticking to seq_cst for the first version is similarly motivated. Simply adding shared memory and atomics to JS was highly controversial, and compromises that reduced nondeterminism at the cost of some performance were necessary.

The intent of the ordered wakeup rule is fairness and (again) managing nondeterminism: if an observer can determine that one thread blocked in a wait before another, then waking one thread on that location will for sure wake the first one.

lars-t-hansen · 2020-03-18T09:38:07Z

I should add: Compatibility with JS is a big deal for wasm on the web and since JS is still much more important than wasm, JS controls the agenda to some extent. But that fact does not prevent there from being additional instructions that work better in other environments. IMO we should be very open to proposals that add instructions that allow for better performance at the expense of nondeterminism across systems and implementations. But I do think those should be framed as new proposals.

The situation is similar to SIMD: At the moment, wasm SIMD has no nondeterminism and SIMD instructions work the same way (with some perf compromises) on different hardware platforms. But it is expected that the next version of wasm SIMD will have instructions that work (and are known to work) well on some hardware and less well on other hardware, or instructions that have variable amount of imprecision and therefore add nondeterminism, all in the interest of performance.

rianhunter · 2020-03-18T19:10:30Z

@lars-t-hansen your prompt response is appreciated and well understood. While it's unfortunate that the semantics of JS atomics wait/notify diverges from kernel-level wait/notify primitives for the sake of eliminating non-determinism, I understand the design constraint of maintaining semantic compatibility with JS.

@tlively Here's one example (of which you may or may not be aware) where disallowing spurious wakeups in the primitive comes in handy:

// thread 1
val = atomic_load_seqcst(&var);
if (!cond(val))
    atomic_wait(&var, val);
assert(cond(val));

// thread 2
set_condition_seqcst(&var);
atomic_notify(&var);

This avoids a recheck of val after the atomic_wait() in thread 1, which is slightly more efficient if the atomic_wait() primitive also efficiently avoids spurious wakeups. I think the tradeoff here is slightly dubious but that's only because I can't think of other examples where disallowing spurious wakeups simplifies user-level code. Would love to see more examples of algorithms that are much expressed much more succintly and/or efficiently with a deterministic/non-spurious wait, if only for my own edification.

tlively · 2020-03-18T19:33:20Z

@rianhunter I also depend on the absence of spurious wakes in the code that initializes memory when threads are enabled: https://github.com/llvm/llvm-project/blob/adcd02683856c30ba6f349279509acecd90063df/lld/wasm/Writer.cpp#L728-L768. That's not user-level code, but it would be expressible in C. However, that code could be trivially made to handle spurious wakeups by just waiting in a loop.

lars-t-hansen · 2020-03-19T08:28:43Z

In general I think that there are some warts in the current JS design. Disallowing spurious wakeups at the JS API level just pushes that checking into the implementation, and it's not obvious that it really simplifies user code all that much -- user code has to be aware of the complicated semantics of waking up in any case. (Though it's probable that fairness is a useful property and it's possible fairness and no-spurious-wakeup are connected.) It's been pointed out that the return value from notify is of questionable utility, if any (tc39/ecma262#1492). And isLockFree() should almost certainly never have been part of the proposal (and is not in wasm).

rianhunter · 2020-03-23T18:41:05Z

Disallowing spurious wakeups at the JS API level just pushes that checking into the implementation, and it's not obvious that it really simplifies user code all that much

Just for clarity's sake, the reason to allow spurious wakeups in the primitive is that not all systems have an efficient way of preventing spurious wakeups. This is less applicable to big systems and more applicable to small low-level / embedded systems. The impact consideration on user code isn't that allowing spurious wakeups simplifies user code but rather that it doesn't significantly complicate user code (e.g. @tlively's example).

rodgert · 2022-09-01T20:04:07Z

Another modest plea to drop memory.atomic.wait64 (f.k.a. i64.atomic.wait) for the sake of allowing a straightforward futex()-based implementation. An additional modest plea to ease the requirements of memory.atomic.wait32 to allow a futex()-based implementation, in particular allowing spurious wakeups. Some rationale:

This will ease implementation for futex()-based non-web embeddings

OpenBSD also has a native futex() calls: https://man.openbsd.org/futex

Other systems may follow suit, after the addition of C++20 atomic_wait https://en.cppreference.com/w/cpp/atomic/atomic_wait

I am the implementer of libstdc++'s atomic wait. libstdc++ currently supports using futex() on platforms which have it, I also anticipate supporting platforms that have __ulock_wait/wake in a future release (possibly GCC13). For platforms which do not have such a mechanism, there is a fallback implementation built on mutex/condvar. For those types which fit into the underlying platform's wait/notify mechanism the futex() is used directly. If the atomic type T does not fit, it is proxied through another waited address. There are limited number of such addresses and they are shared, so there is an additional potential for spurious wakeups in this case.

The standard allows for spurious wakeups and there are no guarantees on ordering.

This will potentially improve performance for futex()-based non-web embeddings

64-bit wait doesn't seem to be widespread in existing native code (C/C++)

Interop with JS Atomics doesn't require 64-bit wait

Additionally, allowing spurious wakeups eases low-level implementations of memory.atomic.wait32 in general, since the sleep condition is only on the initial check of the atomic. Disallowing spurious wakeups makes more sense when the semantics of the wait call are continuously dependent on the value of the atomic, e.g. in C++20 atomic_wait(). Low-level implementations that have no way to prevent spurious wakeups are unable to decide if they should continue to sleep or not after one occurs (even if they weren't notified).

I believe the ordering issue can be resolved by placing manual seqcst fences around the futex calls, though syscalls already provide a memory barrier so that would be redundant. @lars-t-hansen could you elaborate what you mean by "ordered wakeup"?

CharlieTap · 2024-11-26T09:04:16Z

Hey, is waking up waiters in the order they waited still an invariant? I couldn't see this in the spec?

Also how are people approaching this since 64 bit wait (and timeouts) are not available on all platforms? I'm thinking of just using the address to index into a table with a condvar

conrad-watt · 2024-11-26T09:17:50Z

Hey, is waking up waiters in the order they waited still an invariant? I couldn't see this in the spec?

Yes - see the explanatory note here.

The relaxed memory model also describes the concurrent behaviour of WebAssembly’s wait and notify operations. Each memory location is associated with a queue of waiting threads. A thread suspending as the result of a wait operation enters the queue, and a notify operation to that location will attempt to wake up as many threads as possible from the head of the associated queue, up to the maximum specified by the arguments of the notify operation.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

i64.atomic.wait has no matching futex syscall on Linux #135

i64.atomic.wait has no matching futex syscall on Linux #135

programmerjake commented Feb 24, 2019

conrad-watt commented Feb 24, 2019 •

edited

Loading

Uh oh!

programmerjake commented Feb 24, 2019

Uh oh!

conrad-watt commented Feb 24, 2019

Uh oh!

lars-t-hansen commented Feb 25, 2019

Uh oh!

binji commented Feb 25, 2019

Uh oh!

rianhunter commented Mar 18, 2020

Uh oh!

tlively commented Mar 18, 2020

Uh oh!

lars-t-hansen commented Mar 18, 2020

Uh oh!

lars-t-hansen commented Mar 18, 2020

Uh oh!

rianhunter commented Mar 18, 2020 •

edited

Loading

Uh oh!

tlively commented Mar 18, 2020

Uh oh!

lars-t-hansen commented Mar 19, 2020

Uh oh!

rianhunter commented Mar 23, 2020

Uh oh!

rodgert commented Sep 1, 2022

Uh oh!

CharlieTap commented Nov 26, 2024

Uh oh!

conrad-watt commented Nov 26, 2024

Uh oh!

i64.atomic.wait has no matching futex syscall on Linux #135

i64.atomic.wait has no matching futex syscall on Linux #135

Comments

programmerjake commented Feb 24, 2019

conrad-watt commented Feb 24, 2019 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

programmerjake commented Feb 24, 2019

Uh oh!

conrad-watt commented Feb 24, 2019

Uh oh!

lars-t-hansen commented Feb 25, 2019

Uh oh!

binji commented Feb 25, 2019

Uh oh!

rianhunter commented Mar 18, 2020

Uh oh!

tlively commented Mar 18, 2020

Uh oh!

lars-t-hansen commented Mar 18, 2020

Uh oh!

lars-t-hansen commented Mar 18, 2020

Uh oh!

rianhunter commented Mar 18, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tlively commented Mar 18, 2020

Uh oh!

lars-t-hansen commented Mar 19, 2020

Uh oh!

rianhunter commented Mar 23, 2020

Uh oh!

rodgert commented Sep 1, 2022

Uh oh!

CharlieTap commented Nov 26, 2024

Uh oh!

conrad-watt commented Nov 26, 2024

Uh oh!

conrad-watt commented Feb 24, 2019 •

edited

Loading

rianhunter commented Mar 18, 2020 •

edited

Loading