Alignment / undefined behavior for loads & stores #30

kg · 2015-05-04T21:57:18Z

Related to #23 , the current proposed bytecode has a single load op and a single store op.
The load/store ops as proposed accept addresses in bytes (not elements).

Discussion with @lukewagner and @kripken in IRC suggests that the currently intended behavior is for aligned loads/stores to work, and for unaligned loads/stores to have undefined behavior, most likely one of the following:

The address is silently truncated to be aligned
The operation fails (unaligned store is discarded; unaligned load returns garbage)
All loads/stores involve a byte-wise memcpy to/from a temporary aligned buffer
The operation just works because we're compiling down to a native instruction set where aligned/unaligned are (at least in terms of instruction set, if not performance) equivalent

As proposed this would mean one of two things for the polyfill:

The polyfill fails to run any applications that perform unaligned loads/stores
All applications are slow in the polyfill because it has to generate unaligned handling (i.e. memcpy) for all heap accesses.

I believe this is unacceptable. I think we need a reasonable solution for this problem, so that the vast majority of applications can work in the polyfill and have acceptable performance. In the long run people will be running wasm applications via a native VM, at which point this won't matter, but it's my belief that the polyfill will stick around for years so it will still be important that these basic scenarios work acceptably.

My preferred solution would be to have 'aligned' and 'unaligned' load/store operations, similarly to movups/movaps on x86 and mirroring what is currently expressible in JS (HEAPU32[i] vs HEAPU32[i >> 2] vs (memcpy(temp, i, 4), temp[0])). This would allow compilers to produce code that will be functional in all cases, and efficient in scenarios where unaligned and aligned accesses are equivalent. In scenarios where they are not, this 'safe' code would still run, at some performance cost. In cases where performance is desirable, a compiler can generate aligned accesses that would have undefined behavior in edge cases, as we have in the current C/C++ spec.

Essentially this would allow the high performance of native C/C++ to be maintained by generating fast, unsafe 'aligned' operations, which matches the current behavior of emscripten and is conformant to the spec. Simultaneously it would allow languages without this UB black hole, like C#, to generate code that uses aligned stores in most cases and unaligned stores in the places where it needs them.

There are some alternate solutions here to consider, for example:

A 'fast'/'safe' mode switch for the polyfill, where 'fast' mode introduces silent undefined behavior for loads/stores that happen to not be aligned, and 'safe' mode makes all heap operations intolerably-slow-but-correct. This would at least provide a path forward where applications with unaligned accesses will function.
Users ship their own polyfill that does the right thing.

The proposed behavior isn't a divergence from current JS, since unaligned accesses against typed arrays don't exist. On the other hand, there are enough scenarios where unaligned accesses matter (file IO, byte-wise unpacking - like the proposed binary format - and GPU buffers among others) that I think we should handle it with care.

The text was updated successfully, but these errors were encountered:

kripken · 2015-05-04T22:30:00Z

Let me try to summarize this, to check if I understand you:

The current proposal has e.g. a load32. It is going to work if the pointer is unaligned in native wasm, but fail in the polyfill, which will emit HEAP32[ptr >> 2]. The chance of this happening is rare, as it is undefined behavior in C and C++.
In other languages, unaligned reads/writes are possible and are not undefined. This is also true in C and C++, with extensions, like __packed__.
We could add e.g. a loadUnaligned32, which is a load that is implied to be, with high probability, actually unaligned. Native wasm would treat it 100% the same as load32. But the polyfill could emit something that works, like HEAP8[ptr] | (HEAP8[ptr+1] << 8) | .. (manually read bytes and join them). In practice, this would let the polyfill be both fast - as most code would use load32 - and work properly on unaligned data, albeit more slowly, but only more slowly exactly there. (Otherwise, a safe thing would be to do the slow thing all the time, if there is just one load instr.)

Is that right? If so, I like it.

lukewagner · 2015-05-04T22:37:10Z

Remember, the polyfill is just JS so we can do whatever we want and use this flexibility to avoid adding a wart to WebAssembly that would live on in perpetuity. For example, we could have a non-standard (and thus ignored by native impls) "polyfill" section of the module that listed info that is only useful to the polyfill. I already have an example of this: the exact length of the decoded asm.js chars (so that a right-size buffer can be malloc'd up front by the polyfill). To address the issue at hand, we could extend the polyfill section to store a sorted list of offsets of unaligned-safe loads/stores which the polyfill would then use to easily generate byte-loads-and-ors.

kg · 2015-05-04T22:37:27Z

@kripken Precisely. My ideal solution is to go from two opcodes:

load
store

to four:

loadAligned
storeAligned
loadUnaligned
storeUnaligned

Where the aligned/unaligned hints are promises to the VM/runtime, much like they are on x86. In practice once the polyfill is no longer in use, the aligned/unaligned opcodes will be effectively equivalent.

IIRC the aligned-only restrictions on JS are in service of janky architectures like (old) ARM, so we definitely shouldn't shackle ourselves to them, especially in a scenario like this where it can negatively impact production code and hinder adoption.

kg · 2015-05-04T22:39:38Z

@lukewagner I agree with you on principle, but TBH I can't imagine an uglier wart than 'unaligned loads/stores are undefined behavior, sometimes'. Generating a sorted list of offsets sounds extremely complicated compared to two new opcodes, especially given that the opcodes can effectively be aliases in a native implementation. That table will get pretty big for applications with many unaligned ops...

lukewagner · 2015-05-04T22:43:39Z

The proposed semantics are "unaligned loads/stores work" (see V1.md#heap). The polyfill just doesn't implement the spec correctly (although, as Alon's PR for the high-level-goals points out, it could).

kripken · 2015-05-04T22:53:33Z

@lukewagner: I am not sure this would be limited to the polyfill. There is a reason LLVM IR has that information on each load and store (even the amount of alignment, e.g. a 4-byte load could be 2-byte aligned). Aren't there CPU instructions that are faster if pointers are aligned at runtime, and others that are slower, but that don't care between aligned and unaligned loads and stores? I believe @sunfishcode was telling me about those, but also possible I misunderstood. If I did get that right, it seems like this information in wasm would allow us to emit better code in the native impl, not just the polyfill.

kg · 2015-05-04T22:58:35Z

It's my understanding that movaps on x86 is still faster than movups (and the equivalents for other types), though modern chips are a lot better at the unaligned operations.

A quick search showed discussions as recently as two months ago about the performance impact of unaligned operations vs aligned ones (like https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51017 ).

I can believe that we eventually would regret having aligned/unaligned operations in this spec, but the cost of them is pretty minimal - they become aliases, so we're just eating two opcodes, and they're very simple aliases. Since we intend to have a table that maps named opcodes to integers, applications that don't use the unaligned stores won't pay any actual cost for their existence. Essentially, the burden of implementing these lies in the polyfill, along with some implementation effort in a compiler if you decide to use them. If you don't care about making unaligned ops work consistently, your compiler can just always emit aligned stores.

lukewagner · 2015-05-04T23:43:11Z

Yes, movaps/movups discussion came up (a lot) in the context of SIMD.js. To wit, Dart and Odin both emit movups now so having a separate loadUnaligned wouldn't buy anything unless there was a strong assumption that emitters of loadAligned really expected the load to be aligned (b/c movaps means trap handler on unaligned access).

That being said, Dan reminded me that the old-ARM-fault-on-unaligned case is symmetric with the polyfill case: you don't want to emit byte-loads because this is suboptimal on x86/new-ARM but you don't want to run 100x slower on old-ARM. So that at least gives us a non-polyfill use case (even if it also is diminishing over time) which makes me more positive on having a loadUnaligned.

(On a bikeshed-y note: since the assumed-aligned loads wouldn't throw on unaligned, I think it'd be better to just name them 'load'.)

kg · 2015-05-04T23:47:51Z

Yeah, I don't have any particular attachment to 'loadAligned' or anything, just the distinction.

I don't think there's any reason for them to have different semantics on native - both loadAligned and loadUnaligned generating movups is fine. My intent here is focused on behavior, not performance - using movups everywhere isn't observably different from using movaps for aligned loads & movups for unaligned loads, as long as you do it correctly. Having the webasm failure case for 'you did an aligned load from an unaligned address' be undefined behavior (masking, etc) instead of an invalid instruction trap or whatever is pretty sensible too, I think.

The performance argument re: ARM is interesting though. I was under the impression old-ARM wasn't considered important anymore, but maybe it's shipping in more modern devices than I realized.

lukewagner · 2015-05-05T00:02:11Z

Just to be really clear: the proposal was never to have undefined behavior on unaligned access in the spec and, even with proposed new (load|store)Unaligned opcodes, if C++ code falls into UB which causes Emscripten to emit a load (aligned) opcodes, they will still be silently masked by the polyfill.

I think the best way to summarize the addition here is that we've added an op for the cases where you know you want to do unaligned access but you don't want to sacrifice perf on x86/new-ARM by emitting the byte loads that Emscripten does now. Basically, we push this choice from the developer-side compiler into the client-side compiler.

I already have a todo to update the heap section (move stuff from V1 into AstSemantics now that it exists, include discussions about out-of-bounds), so I can add a cset in the PR to make this change if that sounds good to you.

kg · 2015-05-05T00:03:37Z

That sounds good. Maybe the root of our disconnect here is that I'm assuming the polyfill is intended to be spec-compliant. It sounds like you're saying that UB on unaligned access was never intended to be spec-compliant, but the intent was for the polyfill to do that? In which case, updating the heap section of V1/astsemantics sounds sufficient, and adding an op for this should be more than enough to make it possible to get guaranteed correct behavior in both polyfill & native.

lukewagner · 2015-05-05T00:11:14Z

Yes, as Alon also pointed out, it was non-obvious that the polyfill's goals weren't to be 100% correct wrt the spec, but just "good enough to work for most use, as well as asm.js works today" misalignment/oob aren't really considered "working today". Alon's PR (which I'll merge after others have had a chance to comment) should fix that by explicitly including such text in the high-level goals.

jfbastien · 2015-05-05T00:45:51Z

FWIW I don't think we should target pre-v7 ARM. Alignment shouldn't be an issue in v7/v8 when the kernel is properly configured.

kripken · 2015-05-05T01:16:20Z

How about sparc and power hardware?

jfbastien · 2015-05-05T01:19:43Z

Endianness would be bigger issues there ;-)
But in their little-endian versions it would be an issue. Are they targets? I guess that's worth figuring out early.

kg · 2015-05-05T01:23:04Z

I seem to remember seeing some advertisements that indicate Imagination is trying to push MIPS now, too:
http://www.imgtec.com/mips/

sunfishcode · 2015-05-05T02:26:16Z

I like the plan discussed here. We shouldn't pessimize our favorite platforms to support armv6 and similar processors, but in this case it looks like it'll be quite easy and painless, so I think it's worth doing.

Also, SIMD use cases for unaligned accesses are mostly distinct from scalar use cases, so we should have a separate discussion when it comes to adding SIMD (post-v1 as it's currently scheduled).

kripken · 2015-05-05T02:45:15Z

Is there any benefit to doing the full LLVM thing where e.g. a 4-byte load can be 2-byte aligned, or just have "fully aligned" vs "fully unaligned" for each size?

lukewagner · 2015-05-05T14:07:39Z

Another potential use case for this I was wondering about (but no evidence yet; I can ping Intel/ARM contacts) is whether various new low-power chips (say x86 Quark) will "support" unaligned loads/stores, but very slowly, much slower than byte-loads + or. With the unending push for smaller devices (IoT... sorry, I had to say it), it does seem likely we'll regress on unaligned loads just as we did back when mobile started. Anyhow, argument by itself wouldn't stand, but it does serve to "pile on".

jfbastien · 2015-05-05T14:49:52Z

Agreed with @kripken: at this point it makes sense to let any load/store specify an alignment. If the load/store lied then it gets UB (which will be platform dependent).

If no alignment is specified then we assume natural alignment (and again UB on under-alignment).

lukewagner · 2015-05-05T15:02:52Z

Whoa there, I don't think we want any UB in the spec. The worst that happens if you lie about alignment is your loads/stores go way slower (b/c take faults or other slow paths). There is the separate issue that the polyfill will mask your aligned loads, but this is being treated as a performance cheat in the polyfill (one that can be turned off, at the cost of running slower), not UB in the spec.

sunfishcode · 2015-05-05T15:10:22Z

The benefit at the WebAssembly level of specifying a specific alignment would be on platforms that require alignment and on the asm.js polyfill that e.g. 2-byte alignment on a 4-byte load can be implemented by 2 2-byte loads, rather than 4 1-byte loads.

That's a nice benefit, though it's not huge. On the other hand, the cost is just a few more variants of loads to define, which is a mild burden, but a pretty small one beyond what we already have just by having the basic unaligned access constructs. I'm ok either way.

lukewagner · 2015-05-05T15:33:59Z

How common is it to have a non-natural-word-aligned load/store in LLVM? Does Emscripten currently optimize such loads/stores (e.g., into 2 2-bytes)?

titzer · 2015-05-05T15:38:43Z

I agree that we should avoid UB like the plague. There is a philosophical
question maybe underlying this discussion, which is, are there performance
cliffs in the platform? Given the choice between performance cliffs and
behavior cliffs, I'll take the performance cliffs. Those can be mitigated
with tooling and optimization, whereas you can't really fix UB after it's
there, since there will be implementations that "conform" to the old spec
and do random things.

On Tue, May 5, 2015 at 5:02 PM, Luke Wagner [email protected]
wrote:

Whoa there, I don't think we want any UB in the spec. The worst that
happens if you lie about alignment is your loads/stores go way slower (b/c
take faults or other slow paths). There is the separate issue that the
polyfill will mask your aligned loads, but this is being treated as a
performance cheat in the polyfill (one that can be turned off, at the cost
of running slower), not UB in the spec.

—
Reply to this email directly or view it on GitHub
WebAssembly/spec#30 (comment).

kripken · 2015-05-05T15:51:44Z

Does Emscripten currently optimize such loads/stores (e.g., into 2 2-bytes)?

Yes, Emscripten will use the highest alignment LLVM says is ok to use, which can be more than 1 but less than the full size of the load/store.

I don't know if it's common. I suppose it depends on how often people use packed structs...

lukewagner · 2015-05-05T16:18:20Z

It's hard to have a strong opinion (on this sub-natural alignment sub-issue): we have the general goal of minimalism on the one hand and the simplicity/obviousness of the ops on the other hand and pretty mild use cases.

jfbastien · 2015-05-05T16:35:07Z

How slow of an implementation is acceptable? A faulting memory access can be handled with a signal handler by the implementation :-)

If we ship aligned{load,store} and unaligned{load,store} then there is implementation-defined behavior when an aligned{load,store} isn't actually aligned!

That's why I would rather have developer-supplied alignment specification, with implementation-defined behavior when the developer lied. We can decide to align to a natural boundary (some HW does this automtically), or catch and handle the fault on HW that faults, and on other HW it'll just be slower. I think mandating one or the other will lead to regret as we deploy wasm on other ISAs.

lukewagner · 2015-05-05T16:45:30Z

@jfbastien Yes, on the few archs that fault, the intention is that the faulting access is handled by a signal handler. The only difference between aligned/unaligned is (from a spec pov) non-normative: the unaligned shouldn't go 100x slower if you feed it unaligned addresses.

jfbastien · 2015-05-05T16:49:28Z

That would also be doable with developer-specified alignment on every load/store. Furthermore (this one is for @titzer) having better alignment information is wonderful for auto-vectorization (which is definitely not something we aim to do any time soon).

titzer · 2015-05-05T16:51:37Z

I'm not against specifying alignment; it might actually be really useful to
the optimizer. The question is, as you pointed out, what happens when they
lied? Performance punishment or semantic punishment?

On Tue, May 5, 2015 at 6:49 PM, JF Bastien [email protected] wrote:

That would also be doable with developer-specified alignment on every
load/store. Furthermore (this one is for @titzer
https://github.com/titzer) having better alignment information is
wonderful for auto-vectorization (which is definitely not something we
aim to do any time soon).

—
Reply to this email directly or view it on GitHub
WebAssembly/spec#30 (comment).

jfbastien · 2015-05-05T17:03:33Z

I'd go with implementation-defined for now, with a list of acceptable effects. Tentatively:

Slow
Very slow
Force-align to what was promised
Force-align to something
Non-recoverable fault
Recoverable fault (when we have signal handling)

We can use this to later constrain what's an acceptable implementation! Leave it open for now, and once we have implementation experience and developer feedback we'll know what the right thing to do is.

lukewagner · 2015-05-05T17:03:40Z

@titzer That is the central question, but I think the answer has to be "punish performance". Why? Because if we say "punish semantics" that means "throw". That means that on platforms that have no problem with unaligned loads/stores (x86, new ARM), we'd have to add extra guards unless we specified the behavior to be non-deterministic (which sounds bad). With "punish performance", you only get the perf cliff if you perform an unaligned load with a load-aligned and the "fix" is to use load-unaligned. The devtools profiler should point a big red arrow right to this and this is C++ UB territory anyway, so I don't anticipate this being a real problem in practice.

lukewagner · 2015-05-05T17:05:47Z

@jfbastien Shrinking the set of allowable behaviors is not a backwards compatible operation. I think we should start with fully deterministic since it's not even clear if the platforms that fault on unaligned loads matter.

jfbastien · 2015-05-05T17:07:43Z

It is backward compatible for developers: they were told one of these could occur, so they can't rely on a specific one occurring.

lukewagner · 2015-05-05T17:15:17Z

Ben has already stated why this isn't simply the case. Also, I think we have an (implicit, though perhaps it should be made explicit) high-level goal of minimizing non-determinacy. Given that this alignment issue isn't even an issue on all major platforms, this seems like a clear case for fully-deterministic.

titzer · 2015-05-05T18:45:30Z

My original point is that punishing semantics, with either "undefined
behavior" or an immediate exception is kind of a one-way street. Early
implementations might exploit the undefined behavior and then there's a
defacto precedent that can't really be fixed, unless we add a new bytecode
set with different semantics.

On Tue, May 5, 2015 at 7:15 PM, Luke Wagner [email protected]
wrote:

Ben has already stated why this isn't simply the case. Also, I think we
have an (implicit, though perhaps it should be made explicit) high-level
goal of minimizing non-determinacy. Given that this alignment issue
isn't even an issue on all major platforms, this seems like a clear case
for fully-deterministic.

—
Reply to this email directly or view it on GitHub
WebAssembly/spec#30 (comment).

jfbastien · 2015-05-05T19:21:50Z

IIUC I'm proposing to embrace implementation-defined, you don't want that, but the current proposal has it (developer says access is aligned, and it isn't). Is that correct?

We should have a wider discussion about implementation-defined behavior, so I'll be happy if we circle back to this specific issue, but I definitely want to understand what you're proposing we do!

lukewagner · 2015-05-05T19:52:08Z

Again: the current proposal does not have implementation-defined behavior: load and loadUnaligned would be semantically identical (performance is not part of the specification). We can have sections explaining perf cliffs etc, but they'd be non-normative.

jfbastien · 2015-05-05T20:24:42Z

OK. I'm not sure I want us to standardize these separate aligned/unaligned opcodes then. If they're the same then they should be the same. It's almost like adding branch or inlining hints.

lukewagner · 2015-05-05T21:20:37Z

There are use cases which they address (I won't restate them again here), so it would be more effective to argue why the use cases don't matter or how there is a better solution.

sunfishcode · 2015-05-09T09:01:46Z

#34 added alignment attributes to the load and store operands. I believe that addresses the concern here, so we can close this issue. Feel free to file new issues if there are further concerns.

lukewagner mentioned this issue May 5, 2015

Import more recent discussion on heap accesses #34

Merged

sunfishcode closed this as completed May 9, 2015

sunfishcode mentioned this issue Jun 3, 2015

Should WebAssembly have spooky action at a distance? #107

Closed

jfbastien mentioned this issue Mar 31, 2016

Appropriateness for wasm tc39/proposal-ecmascript-sharedmem#59

Closed

MaxGraey mentioned this issue Nov 7, 2018

Add DataView to standard library AssemblyScript/assemblyscript#316

Merged

2 tasks

Alignment / undefined behavior for loads & stores #30

Alignment / undefined behavior for loads & stores #30

Comments

kg commented May 4, 2015

kripken commented May 4, 2015

Uh oh!

lukewagner commented May 4, 2015

Uh oh!

kg commented May 4, 2015

Uh oh!

kg commented May 4, 2015

Uh oh!

lukewagner commented May 4, 2015

Uh oh!

kripken commented May 4, 2015

Uh oh!

kg commented May 4, 2015

Uh oh!

lukewagner commented May 4, 2015

Uh oh!

kg commented May 4, 2015

Uh oh!

lukewagner commented May 5, 2015

Uh oh!

kg commented May 5, 2015

Uh oh!

lukewagner commented May 5, 2015

Uh oh!

jfbastien commented May 5, 2015

Uh oh!

kripken commented May 5, 2015

Uh oh!

jfbastien commented May 5, 2015

Uh oh!

kg commented May 5, 2015

Uh oh!

sunfishcode commented May 5, 2015

Uh oh!

kripken commented May 5, 2015

Uh oh!

lukewagner commented May 5, 2015

Uh oh!

jfbastien commented May 5, 2015

Uh oh!

lukewagner commented May 5, 2015

Uh oh!

sunfishcode commented May 5, 2015

Uh oh!

lukewagner commented May 5, 2015

Uh oh!

titzer commented May 5, 2015

Uh oh!

kripken commented May 5, 2015

Uh oh!

lukewagner commented May 5, 2015

Uh oh!

jfbastien commented May 5, 2015

Uh oh!

lukewagner commented May 5, 2015

Uh oh!

jfbastien commented May 5, 2015

Uh oh!

titzer commented May 5, 2015

Uh oh!

jfbastien commented May 5, 2015

Uh oh!

lukewagner commented May 5, 2015

Uh oh!

lukewagner commented May 5, 2015

Uh oh!

jfbastien commented May 5, 2015

Uh oh!

lukewagner commented May 5, 2015

Uh oh!

titzer commented May 5, 2015

Uh oh!

jfbastien commented May 5, 2015

Uh oh!

lukewagner commented May 5, 2015

Uh oh!