Alignment will probably require implementation-defined behavior #105

Closed
titzer opened this issue Jun 2, 2015 · 68 comments
@titzer

titzer commented Jun 2, 2015

It seems that some ARM implementations may ignore the low order bits of unaligned memory accesses and thus round down to the next aligned address. That would mean that every access that the engine cannot prove is properly aligned would need a dynamic check (since these processors won't cause a hardware fault). That may be too slow or too much code.

Would it be reasonable to spec aligned/unaligned accesses thusly?

  • All accesses require alignment to be specified.
  • Load/Store[aligned=true] have implementation-defined behavior when the offset is not actually aligned.
  • Load/Store[aligned=unknown] never have implementation-defined behavior, but may be slow on some architectures when the offset is not actually aligned.

For both kinds of accesses we could specify a sanitizer mode that will trap on Load/Store[aligned=true](actually not aligned) and profile or warn on Load/Store[aligned=unknown](actually not aligned).

The above would allow the engine to omit checks for the [aligned=true] case, accepting whatever the hardware does, but still require it to emit checks for [aligned=unknown] on these processors.
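
To make the hazard concrete, here is a minimal Python sketch of the two hardware behaviors described above, modeled over a `bytearray` standing in for linear memory. The function names `load_u32_exact` and `load_u32_rounding` are hypothetical, little-endian byte order is assumed, and the rounding model simply follows the description in this post rather than any particular ARM manual.

```python
import struct

# Hypothetical 16-byte linear memory holding the bytes 0x00..0x0f.
memory = bytearray(range(16))

def load_u32_exact(mem, addr):
    """Model of hardware that honors unaligned addresses (e.g. x86)."""
    return struct.unpack_from("<I", mem, addr)[0]

def load_u32_rounding(mem, addr):
    """Model of a core that ignores the low two address bits,
    so the access silently rounds down to a 4-byte boundary."""
    return struct.unpack_from("<I", mem, addr & ~3)[0]

aligned = 4
unaligned = 5

# Aligned accesses agree on both models.
same = load_u32_exact(memory, aligned) == load_u32_rounding(memory, aligned)

# Unaligned accesses diverge: bytes 05 06 07 08 versus bytes 04 05 06 07.
exact_value = load_u32_exact(memory, unaligned)
rounded_value = load_u32_rounding(memory, unaligned)
```

An engine that emits a plain load for `[aligned=true]` would observe `exact_value` on one class of hardware and `rounded_value` on the other, which is the implementation-defined behavior under discussion.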

@sunfishcode
Member

Is there any documentation available on these ARM architectures? I'm interested in learning more.

@kripken
Member

kripken commented Jun 2, 2015

Me too. Specifically, I wonder if those ARM implementations just silently do the rounding (that would be exactly what JS typed arrays do, ironically :) ? Or do they trap?

@kg
Contributor

kg commented Jun 2, 2015

I seem to remember your proposal roughly being the consensus from prior discussions. Obligatory aligned/unaligned distinction, with unaligned operations Always Working but possibly being slow, and aligned-with-unaligned-address being potentially undefined seems good to me, albeit a little gross.

That distinction is already really important for the polyfill to be remotely usable without breaking applications that do unaligned loads/stores.

The last time I shipped ARM code (on a particular handheld console), it trapped on unaligned accesses in some scenarios (non-32-bit load/store) and was Just Slow in other cases. I think in some cases you can configure the behavior, so it might depend on the OS/host application and not just the hardware.

@jfbastien
Member

I thought we had agreed to have explicit alignment to a specific byte number (not just true/unknown). The rest is what I recall: if the program lied then implementation-defined behavior occurs.

I wouldn't spec the sanitizers: they can either be done by the developer-side compiler, or by the implementation (maybe behind a flag). I see sanitizers as tools that should "just work", so there's no need to spec them.

The ARM specs aren't accessible publicly, but you can get the PDF for free by registering. This behavior, IIRC, is pre-ARMv7 and in some R and M profile CPUs. Most ARM CPUs sold in consumer devices recently are ARMv7 A profile, or ARMv8, but it would be nice for Web Assembly to work on these other CPUs which are often used in smaller IoT devices (you know we want Web Assembly to be IoT compliant!!!).

@titzer
Author

titzer commented Jun 2, 2015

Here's a link to a section in the ARM architecture reference manual:

https://books.google.de/books?id=O5G-6WX1xWsC&pg=PT57&lpg=PT57&dq=unaligned+access+on+arm+ignore+lower+bits&source=bl&ots=_d6f1Osah6&sig=RO95auOcu78sxqzgsHY4KpmEwxE&hl=en&sa=X&ei=tOttVYHFC-XuyQOozICgCQ&ved=0CCEQ6AEwAA#v=onepage&q=unaligned%20access%20on%20arm%20ignore%20lower%20bits&f=false


@sunfishcode
Member

By my reading of the documentation:

ARMv5 and earlier have the alignment-rounding problem.

ARMv6 has multiple configuration modes. The "Legacy" mode behaves like ARMv5. However, many popular ARMv6 implementations, such as Linux on Raspberry Pi, seem to use one of the newer modes that don't have the problem.

In ARMv7 and ARMv8, documentation I have says that the "Legacy" configuration mode is no longer present, and they don't have the problem.

Assuming I didn't miss anything, this appears to come down to a question of the limits of portability (#38). Is ARMv5 or ARMv6-in-legacy-mode worth supporting, at the cost of weakening the spec wrt alignment?

@pizlonator
Contributor

Thanks for summarizing this!

ARMv5 is pretty old. I think we'd have to have a super good argument in its favor if we wanted to complicate the spec with it.

-Fil


@MikeHolman
Member

For us, only ARMv7 THUMB/THUMB2 matter. Of course we aren't in a vacuum so I'm fine making concessions where necessary, but it doesn't sound like ARMv5/legacy mode is important enough to weaken the spec.

@titzer
Author

titzer commented Jun 3, 2015

Good catch, Dan.

I also just verified that the arm64 specification only requires alignment
for ordered and exclusive loads and stores; others are fine to be
unaligned. The processor does have a strict alignment checking mode that
will trap on unaligned accesses, so it's got that going for it, which is
nice.

V8 cares about architectures in roughly this order: X64, ia32, arm, arm64,
mips, mips64, ppc.

I'll do some digging into those few at the end and see if there are any
issues with alignment that impact this.


@titzer
Author

titzer commented Jun 4, 2015

I've checked with some MIPS and PPC experts and the result is this: no problem on PPC (should be Intel-fast), and MIPS cores trap to the kernel for emulation, but chips are coming that just do it in hardware. So it looks like we're all good if we make the reasonable decision to ignore 10-year-old ARM cores. I'll double check with the folks at ARM, though.

@jfbastien
Member

@titzer it's not just older ARM cores: it's low-power / embedded ones too. I've talked to folks running node.js on tiny chips inside lightbulbs. Do we care about this type of user? To what degree?

I'm probably OK saying: we expect fully compliant Web Assembly implementations to have behavior X, but some not-too-compliant implementations could do Y.

I'd rather not ban this behavior outright because I think the usecase matters. It would be nice to have a compliance suite, and implementations can list how they diverge from the spec. When it's "benign" divergences like this I think it's fine.

@kripken
Member

kripken commented Jun 4, 2015

Would those older ARM cores and tiny low-power embedded chips have larger divergences from "normal" behavior than the polyfill will? Given wasm code that properly annotates the alignment of loads and stores (never says they are aligned when they aren't), both those chips and the polyfill will perform properly, is my understanding correct?

@titzer
Author

titzer commented Jun 4, 2015

On Thu, Jun 4, 2015 at 8:10 PM, Alon Zakai wrote:

> Would those older ARM cores and tiny low-power embedded chips have larger
> divergences from "normal" behavior than the polyfill will? Given wasm code
> that properly annotates the alignment of loads and stores (never says they
> are aligned when they aren't), both those chips and the polyfill will
> perform properly, is my understanding correct?

Cores that drop the lower bits from unaligned accesses will require checks
inserted by the wasm engine, with emulation code done in user land. All
code on those cores pays, even if they always stay aligned.

Cores that trap will go to the kernel and the user program only pays when
they actually go unaligned.
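
As a rough illustration of the first case, the following Python sketch shows the kind of check an engine would have to insert on a core that drops the low address bits. The function name `load_u32_checked` is hypothetical and little-endian byte order is assumed; a real engine would emit machine code, not Python.

```python
def load_u32_checked(mem, addr):
    """Load a little-endian u32 the way an engine might on a core that
    drops low address bits: branch on alignment, emulate when misaligned."""
    if addr & 3 == 0:
        # Fast path: a single word access is safe.
        return int.from_bytes(mem[addr:addr + 4], "little")
    # Slow path: four byte loads merged with shifts, done in user land.
    value = 0
    for i in range(4):
        value |= mem[addr + i] << (8 * i)
    return value

memory = bytearray(range(16))
aligned_value = load_u32_checked(memory, 8)
misaligned_value = load_u32_checked(memory, 9)
```

The `addr & 3` test runs on every access, including those that are always aligned, which is why all code on such cores pays.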



@sunfishcode
Member

@jfbastien Can you be more specific about which models of ARM cores these are? I've checked ARMv7-R and ARMv7-M documentation and both are ok here.

@sunfishcode
Member

Looks like ARMv6-M is good too.

@kripken
Member

kripken commented Jun 4, 2015

@titzer: not sure I follow? If a load/store is marked as aligned, then it doesn't need to pay any cost, does it? The VM can emit an aligned access, and if the code lied and it turns out unaligned, it's ok that it drops the lower bits - just like the polyfill does.

And if the load/store is marked as unaligned, then a slow path would be taken, definitely paying a cost, but likewise, around the same as the polyfill pays. And in practice we hope little code would be marked as unaligned, so both polyfill and older/smaller CPUs would be ok.

I feel like the older/smaller CPU case is very similar to the polyfill, overall. Am I missing something?

@titzer
Author

titzer commented Jun 4, 2015

On Thu, Jun 4, 2015 at 8:51 PM, Alon Zakai wrote:

> @titzer: not sure I follow? If a load/store
> is marked as aligned, then it doesn't need to pay any cost, does it? The VM
> can emit an aligned access, and if the code lied and it turns out
> unaligned, it's ok that it drops the lower bits - just like the polyfill
> does.
>
> And if the load/store is marked as unaligned, then a slow path would be
> taken, definitely paying a cost, but likewise, around the same as the
> polyfill pays. And in practice we hope little code would be marked as
> unaligned, so both polyfill and older/smaller CPUs would be ok.

That's OK; marking unaligned accesses is a kind of opt-in to may-be-slow.

> I feel like the older/smaller CPU case is very similar to the polyfill,
> overall. Am I missing something?

You are requiring masking for aligned accesses. See first post. I was
assuming that aligned accesses would not be masked.



@kripken
Member

kripken commented Jun 4, 2015

I still don't understand why a claimed-aligned access would require a mask. Why not just emit an access without a mask, on these old/small CPUs? (It might silently drop some bits, but that's what the mask would have done anyhow?)

@titzer
Author

titzer commented Jun 4, 2015

On Thu, Jun 4, 2015 at 9:03 PM, Alon Zakai wrote:

> I still don't understand why a claimed-aligned access would require a
> mask. Why not just emit an access without a mask, on these old/small CPUs?
> (It might silently drop some bits, but that's what the mask would have done
> anyhow?)

Because on Intel and processors that support unaligned access properly, it
will read/write unaligned memory, and you will get different results than
on these older CPUs, or if you had dropped the lower bits in the engine
with a mask.



@sunfishcode
Member

We specifically don't want to be bound by present-day limitations of JS semantics in the long term, so we don't want to get too accustomed to saying "the polyfill did XYZ, so it's ok if other implementations do that too".

@kripken
Member

kripken commented Jun 4, 2015

@titzer: Yes, but that is exactly as in the polyfill, and we allow it, don't we?

I may have a big misunderstanding here. I was under the impression that if one lied about alignment, claiming it was aligned when it wasn't, then we said that was not fully specified. And the polyfill would then be free to do the "wrong" thing by dropping the lower bits, thus letting it remain fast (otherwise, each load would need to support the case of it being unaligned). In practice, this is fine because the compiler should know what is aligned and what might not be, and we can mark the rare loads which might not be, as unaligned. But 99% of them would be aligned, and fast in the polyfill, and correct in the polyfill.

Did I get that wrong? Are we not saying that claiming alignment but lying leads to implementation-defined behavior?

@kripken
Member

kripken commented Jun 4, 2015

@sunfishcode: I 100% agree. I wasn't saying that the polyfill does it so it's fine. I am saying that I understood what the polyfill did to be fine because of reason X, and that reason X is valid in itself, and it looks like X applies to old/weak CPUs too. Unless I have misunderstood X all this time?

@titzer
Author

titzer commented Jun 4, 2015

Just in case it wasn't clear from the start, the goal here was:

1.) If you promise an access is aligned, and it is, you pay nothing, not
even a mask.
2.) If you promise an access is aligned and you lied, you get something
strange (not nasal demons, but maybe slow, maybe a trap, or maybe you get
forcibly aligned).
2b.) In sanitizer mode, if you promise an access is aligned and you lied,
you get a trap.
3.) If you said an access is unaligned, it will work on all engines and
give you the exact same results. It might be really slow, though.
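
A minimal Python sketch of these goals, assuming a hypothetical `load_u32` helper with a `promised_aligned` hint, a `sanitizer` flag, and a `core_drops_low_bits` flag that models the underlying hardware. None of these names come from the spec; they only illustrate which outcomes each goal permits.

```python
import struct

class AlignmentTrap(Exception):
    """Raised in sanitizer mode when an alignment promise is broken."""

def load_u32(mem, addr, promised_aligned, sanitizer=False,
             core_drops_low_bits=False):
    misaligned = addr % 4 != 0
    if promised_aligned and misaligned:
        if sanitizer:
            # Goal 2b: a broken promise traps.
            raise AlignmentTrap(f"address {addr} is not 4-byte aligned")
        if core_drops_low_bits:
            # Goal 2: one permitted outcome, forcible alignment.
            addr &= ~3
    # Goals 1 and 3: no mask is applied and the bytes at addr are read.
    return struct.unpack_from("<I", mem, addr)[0]

memory = bytearray(range(16))
honest_aligned = load_u32(memory, 4, promised_aligned=True)
honest_unaligned = load_u32(memory, 5, promised_aligned=False,
                            core_drops_low_bits=True)
lied_on_rounding_core = load_u32(memory, 5, promised_aligned=True,
                                 core_drops_low_bits=True)
lied_on_exact_core = load_u32(memory, 5, promised_aligned=True)
```

Only the two "lied" results differ between cores; an access declared unaligned returns the same value regardless of the hardware model.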


@kripken
Member

kripken commented Jun 4, 2015

@titzer: Yes! :) And is not (2) covered by emitting a load without a mask on those old/small CPUs? You get a forcibly aligned result, which is one of the options you listed.

That's all I've been saying here: aligned loads/stores do not need masks in the polyfill nor on old/small CPUs, assuming those CPUs just ignore the lower bits. So both can be fast on aligned code, and also correct if actually aligned, so they are quite similar in that respect.

(edit: by "masks in the polyfill" I mean "written in the JS code". Of course the VM itself must emit a mask, because it is JS and has precise semantics. But if the underlying CPU were a weak/old one which itself drops the lower bits and force-aligns, then the VM could actually avoid that, as if the hardware were specialized for typed arrays being aligned ;)

@titzer
Author

titzer commented Jun 4, 2015

On Thu, Jun 4, 2015 at 9:30 PM, Alon Zakai wrote:

> @titzer: Yes! :) And is not (2) covered by
> emitting a load without a mask on those old/small CPUs? You get a
> forcibly aligned result, which is one of the options you listed.
>
> That's all I've been saying here: aligned loads/stores do not need masks
> in the polyfill nor on old/small CPUs, assuming those CPUs just ignore the
> lower bits. So both can be fast on aligned code, and also correct if
> actually aligned, so they are quite similar in that respect.

Yes, I realized on a closer reading of your comments that we're
basically in agreement. That does mean that we do have
implementation-defined behavior for that [aligned=true]/lied case, which
actually I was kind of hoping we could find a way around.


@sunfishcode
Member

The other side here is that we have yet to actually name a CPU here which we really care about which actually needs implementation-defined behavior. Unless this changes, it'd be great to just stick with our current rules, which don't have the implementation-defined behavior part.

@kripken
Member

kripken commented Jun 4, 2015

@titzer: Ok, good, now I think we are on the same page.

Given

  1. The rarity of the problem, as supported by both theoretical arguments (unaligned is undefined behavior in C/C++) and practical experience (many codebases ported to typed array semantics, almost no issues; and sanitizer tools fix the few that do),
  2. As @jfbastien says, tiny CPUs exist, not just old ones,
  3. The polyfill will not just matter for a few months but for a very long time.

Then in practice, what difference does it make if [aligned=true]/lied is described as implementation-defined behavior, or not? It seems a philosophical point. Regardless of how we call it, those tiny CPUs and the polyfill will still be able to run wasm codebases just fine, and they will be used to run those codebases.

Is there a practical, concrete benefit to not calling this implementation-defined behavior?

@sunfishcode
Member

Every bit of implementation-specific behavior we add is an opportunity for applications to behave differently across different implementations. I'm not opposed to all implementation-specific behavior, but it'd be nice if someone could name something more interesting than ARMv5 before we accept it here.

@titzer
Author

titzer commented Jun 4, 2015

Actually a second round with MIPS folks was less promising. Apparently some
devices ship with a mode where unaligned accesses aren't handled by the
kernel and they cause a bus error; that hurts and puts the engine back in
the situation of emulating the unaligned access in userland. They were also
pretty uncomfortable with the performance penalty. Masking might be the
best option on those processors. I asked ARM for some clarification about
how prevalent ARM chips with the bit-ignoring behavior are; waiting to hear
back.

I'm not clear on why we want an alignment annotation if it doesn't make any
semantic difference; if [aligned=true]/lied gives exactly the same results
as [aligned=false]/not_aligned, then why have it? Is it just to make the
latter case fast by always emulating it in userland to avoid kernel traps
on crappy hardware?


@jfbastien
Member

The worry on these platforms is that regular accesses either need to be split up into byte accesses and then merged, or signal handling must be used. This isn't a "pay for what you use" approach to performance: you may have no unaligned accesses and performance will suffer, or you'll need to use a signal handler which folks have said they don't want to mandate. See the Linux MIPS docs for details.

@lukewagner
Member

@jfbastien Yes, but what is the nondeterminism buying us in those cases? If you have to branch on misaligned access anyway then you can just as well implement Just Works as something else nondeterministic. The only case I can see nondeterminism buying something is for auto-aligning platforms which would not otherwise have to branch. Is this the MIPS use case?

@lukewagner
Member

... and that is just from the performance perspective. From the perspective of "I want apps that run on other platforms correctly to also run on my auto-aligning platform correctly", then you don't want to be the one oddball platform that auto-aligns; of course apps are going to randomly break for you. That's why I was saying above (and iiuc @pizlonator was also saying) that, even if nondeterminism was a choice, I'd still want to implement Just Works semantics just to minimize bustage.

@pizlonator
Contributor

Do we have data on what the penalty for misaligned-accesses-do-weird-things platforms will be, if we require misaligned accesses to just work, but then also roll up our sleeves and actually optimize that case? I’ve been pondering this a bit. If you have profiling that tells you what the low bits of a pointer tend to look like, then you can emit optimized code that is biased for either aligned or misaligned, and you could even speculate that the pointer was already aligned which allows you to blow away repeated alignment checks on that pointer - and probably alignment checks on most pointers derived from that one, if the derivatives are just “ptr + C” where C is a multiple of the appropriate word size.
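
The "ptr + C" observation can be sketched in a few lines of Python. The `classify` helper and its three labels are hypothetical names for illustration only: once a base pointer has been checked (or speculated) to be aligned, every access at a constant offset from it can be resolved without a further dynamic check.

```python
def classify(base_known_aligned, constant_offset, alignment):
    """Decide statically what is known about the address base + C."""
    if not base_known_aligned:
        return "unknown"      # nothing proven, a dynamic check is needed
    if constant_offset % alignment == 0:
        return "aligned"      # C is a multiple of the access size
    return "misaligned"       # provably off the boundary, use the slow path

# After one check on the base pointer, four 4-byte field accesses
# at offsets 0, 4, 6 and 12 need no further dynamic checks.
plan = [classify(True, offset, 4) for offset in [0, 4, 6, 12]]

# Without that check, every derived access is still unknown.
unchecked = classify(False, 8, 4)
```

How often the base-pointer speculation holds in real programs is exactly the data this comment says is missing.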

Since we probably do not have such data, it seems we have the following to choose from, and the following mitigations in a subsequent version if the performance isn’t good enough:
1) MVP only has access modes that Just Work when misaligned, old ARM and MIPS be damned. Future versions introduce new access modes, which allow for better performance on old ARM and MIPS.
2) MVP only has access modes that Trap when misaligned, x86 and ARM64 be damned. Future versions introduce new access modes, which allow for better performance on x86.
3) MVP only has access modes that are undef when misaligned. Future versions nail down the undef to mean either “Just Work” or “Trap”, depending on our empirical findings.

I prefer (1) because it’s the most forward-looking. I like (2) more than (3) because undef has a high likelihood of causing confusion for developers.

-Filip


@sunfishcode
Member

C/C++ developers can also catch misaligned accesses by using UBSan (aka -fsanitize=undefined) (clang, GCC).

@sunfishcode
Member

I agree with what's said above; nondeterminism in anything other than trapping-or-not doesn't help much because it just converts applications that were slow on said architectures to applications that behave wrong on the same architectures.

I still believe "it's nondeterministic whether misaligned accesses trap" (misaligned means dynamic alignment is less than static alignment) is worth considering if we can't do "everything always just works". Implementations on MIPS/etc. might then choose to have two modes, "fast" (traps) and "slow" (branches). "fast" could be the default, and when a program traps (which should be rare), the implementation could, for example, automatically restart the program, blacklisting it to "slow" mode thereafter. Blessing this in the spec means that spec conformance can remain something which is done by default. And this approach would mean that there's no mandate to catch and handle signals, and it would permit "pay for what you use", addressing two of @jfbastien's concerns above.
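
A minimal Python sketch of the two-mode idea, under the assumption that a trap can be modeled as an exception and that restarting a program from scratch is acceptable. `MisalignedTrap`, `run`, and `slow_programs` are hypothetical names, not part of any engine.

```python
class MisalignedTrap(Exception):
    """Stand-in for a hardware alignment fault surfaced to the engine."""

def load_u32(mem, addr, mode):
    if addr % 4 != 0:
        if mode == "fast":
            # Real "fast" code would contain no branch; the hardware traps.
            raise MisalignedTrap(addr)
        # "slow" mode: assemble the word from individual bytes.
        return sum(mem[addr + i] << (8 * i) for i in range(4))
    return int.from_bytes(mem[addr:addr + 4], "little")

def run(name, program, mem, slow_programs):
    """Run in fast mode unless this program trapped before; on a trap,
    remember the program and restart it from scratch in slow mode."""
    mode = "slow" if name in slow_programs else "fast"
    try:
        return program(mem, mode), mode
    except MisalignedTrap:
        slow_programs.add(name)
        return program(mem, "slow"), "slow"

def sample_program(mem, mode):
    # One aligned load and one misaligned load.
    return load_u32(mem, 0, mode) + load_u32(mem, 5, mode)

memory = bytearray(range(16))
slow_programs = set()
first = run("app", sample_program, memory, slow_programs)
second = run("app", sample_program, memory, slow_programs)
```

The restart here is only safe because `sample_program` has no side effects; a real implementation would have to discard or avoid externally visible effects before restarting.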

ARMv5 would just have to do "slow" mode, but there's a fair amount of agreement here that ARMv5 is old and not worth complicating the spec for.

sunfishcode added this to the MVP milestone Jul 28, 2015

@titzer
Author

titzer commented Jul 28, 2015

The other important implementation that does masking (i.e. forcible
alignment) is the polyfill to asm.js. If we go with "always works", then
the polyfill is going to be incorrect for misaligned accesses. How strongly
do we value the correctness of the polyfill? Or conversely, how specially
do we treat the polyfill in comparison to any other implementation? When a
spec comes, will we need to add special exceptions for it, or will it
remain non-compliant with the spec?


@sunfishcode
Member

There is a plan for the polyfill. It's a little awkward, but it's an attempt at a practical strategy to break with JS semantics in certain key areas.

If an implementor is thinking "the polyfill masks addresses, so why shouldn't I do it too?", we'll remind them that any time the polyfill's alignment masking actually affects anything, then the program doesn't work right under the polyfill. "Program doesn't work right" isn't something that we anticipate implementors should need to emulate [0].

[0] And we aren't worried about programs coming to depend on the polyfill semantics either, because we already know that popular native wasm implementations won't be masking.

@paul99

paul99 commented Jul 29, 2015

@titzer asked me to comment here; I work at MIPS/Imgtec on V8.

As discussed above, existing MIPS cores trap on unaligned accesses. Any remotely modern kernel will fix up the unaligned load/store (same result as x86). It just works, but these accesses are slow.

Newer cores (in development) will support unaligned accesses in hardware.

Of course, code that claims [aligned=true] but lies could tank performance.

Detection and deoptimization to safe accesses would be trivial with a signal handler (though we have avoided those due to concerns with sandboxing, etc.). There are pure software methods discussed by others above.

So MIPS does not introduce nondeterminism, and the performance impact of 'Just Work when misaligned' can be mitigated over time.

The debug-mode dev tool support would be excellent.

@lukewagner
Member

@paul99 Just to be clear, though: on all the MIPS archs you're considering, it is possible to trap on unaligned access in user-mode? That would make the MIPS case equivalent to the slow-ARM case we've already been considering.

@sunfishcode's comment suggests a hybrid solution that doesn't require any semantically-visible modes: the engine optimistically compiles with trap-on-misaligned and, after a significant number of traps, recompiles into branching (dynamically, swapping out on-stack or, if nothing else, between turns of the event loop).
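
A rough Python sketch of that hybrid, assuming a per-site trap counter and an arbitrary threshold. `TieredLoadSite` and `TRAP_LIMIT` are hypothetical, and the alignment test in the optimistic path stands in for a hardware trap that the engine would catch.

```python
class TieredLoadSite:
    """One load site that starts optimistic and is swapped to a branching
    version after too many alignment traps. TRAP_LIMIT is an arbitrary
    illustrative threshold."""
    TRAP_LIMIT = 3

    def __init__(self):
        self.traps = 0
        self.branching = False

    def load_u32(self, mem, addr):
        if not self.branching and addr % 4 != 0:
            # Stand-in for a hardware trap handled by the engine.
            self.traps += 1
            if self.traps >= self.TRAP_LIMIT:
                self.branching = True  # "recompile" this site
        # Either way the program observes the exact bytes at addr.
        return int.from_bytes(mem[addr:addr + 4], "little")

memory = bytearray(range(16))
site = TieredLoadSite()
values = [site.load_u32(memory, 5) for _ in range(4)]
```

Unlike a restart-based scheme, the program never sees a difference here: all four loads return the same value, and only the engine's internal state changes.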

@jfbastien
Member

@lukewagner IIUC you don't need to trap in usermode, the kernel traps and fixes up the access for you and usermode goes on without knowing about this.

@lukewagner
Member

@jfbastien I realize that, but I was asking if it was possible since that enables several of the things we've been talking about.

@sunfishcode
Member

On Linux MIPS, according to the docs linked to above, a process can easily choose which it wants.

@paul99

paul99 commented Jul 29, 2015

@lukewagner Yes, it's possible to disable the kernel fixups, and then install a signal handler to catch the alignment errors in user-mode. (As @sunfishcode just said :)

@lukewagner
Member

Ok, thanks. So given all the above, I'm still not seeing how nondeterministic fault-on-misaligned access would help out MIPS here. What is the desired codegen?

@paul99

paul99 commented Jul 29, 2015

I may be missing your point, but I don't see any nondeterminism here. Code known to be unaligned could use byte accesses and construct the larger words. Code presumed to be aligned but with rare unaligned accesses would just work. Code with frequent unaligned accesses would also work, but would have terrible performance. I would like to detect that case and fall back to byte accesses.

I don't see this as essential for MVP, but desirable in the longer term, as we see how rare or common these unaligned accesses are. I think this mostly agrees with Dan's #105 (comment)

@lukewagner
Member

@paul99 The big question being discussed in this issue is whether we should weaken the specified semantics of loads/stores from always Just Working to possibly faulting (i.e., the wasm app crashes). It sounds like that's not what you're asking for, though, which is good.

@lukewagner
Member

Actually, it's possible I misread you. When you say "I would like to detect that case and fall back to byte accesses.", I assumed you meant "dynamically and transparently". That is, you'd somehow (user-mode signal handler? perf counter?) detect a lot of this misaligned access going on and then generate a new version of code that uses byte accesses and swap in this new code. Is that what you meant?

@paul99

paul99 commented Jul 29, 2015

> I assumed you meant "dynamically and transparently". That is, you'd somehow (user-mode signal handler? perf counter?) detect a lot of this misaligned access going on and then generate a new version of code that uses byte accesses and swap in this new code. Is that what you meant?

Yes, this is exactly what I meant.

From #105 (comment) there is the statement "Obligatory aligned/unaligned distinction, with unaligned operations Always Working but possibly being slow".

The aligned/unaligned distinction seems very valuable for MIPS: knowing ahead of time that accesses will be unaligned lets us generate reasonable code for that case, and only the "promised aligned but lied" case would take the perf hit (which we could detect dynamically and fix by generating replacement code).
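The "detect dynamically and generate replacement code" idea could be modelled roughly as below. This is only a sketch of the bookkeeping: the `AccessSite` struct, the threshold value, and the function name are all hypothetical, and a real engine would swap in regenerated machine code rather than flip a flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical threshold: how many misaligned hits an access site
   tolerates before it is switched to the byte-access path. */
#define MISALIGNED_THRESHOLD 4

/* Hypothetical per-site state for one load that was declared aligned. */
typedef struct {
    unsigned misaligned_hits;
    bool use_byte_path;
} AccessSite;

/* Load a 32-bit little-endian value at linear-memory offset addr.
   The result is always correct; only the bookkeeping changes. In a
   real engine the flag flip would stand for regenerating the code. */
static uint32_t site_load_u32(AccessSite *site, const uint8_t *memory,
                              uint32_t addr) {
    if (!site->use_byte_path && (addr & 3u) != 0) {
        site->misaligned_hits++;
        if (site->misaligned_hits >= MISALIGNED_THRESHOLD) {
            site->use_byte_path = true;
        }
    }
    const uint8_t *p = memory + addr;
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

The point of the sketch is that the program never observes a difference: a site that "lied" about alignment only pays a performance cost until it is switched over.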

@lukewagner
Member

@paul99 Great, then it also sounds like you're happy with the current state of the design.

Are there any more outstanding reasons to consider weakening the semantics of misaligned loads from Just Working, or can we close this issue?

@paul99

paul99 commented Jul 29, 2015

@lukewagner Now I see where we were misunderstanding each other.

In the spec I see no mention of "All accesses require alignment to be specified." The discussions here all seemed to assume a promise of alignment intent, plus various methods for handling actual alignment that differs from the promised alignment. Having such an attribute would be super helpful for generating the right code up front for data that is known a priori to be unaligned, and I had presumed it would exist.

Then the exceptions to that can be dealt with dynamically/transparently. All cases would just work, but the places that would tank performance would be far fewer.

But if there is no distinction on alignment, I would think it would be uniformly ignored, and no one would even notice the small performance hit on Intel, for example.

@lukewagner
Member

@paul99 The first sentence of the alignment section says "Each linear memory access operation also has an immediate positive integer power of 2 alignment attribute." Is this not what you mean?
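For concreteness, a power-of-2 alignment attribute makes the "is this address actually aligned?" question a single mask operation. A minimal sketch in C, with hypothetical helper names that do not come from the spec:

```c
#include <stdbool.h>
#include <stdint.h>

/* True if x is a positive power of 2, i.e. a valid alignment value. */
static bool is_power_of_two(uint32_t x) {
    return x != 0 && (x & (x - 1)) == 0;
}

/* True if addr is a multiple of align. Assumes align is a power of 2,
   so the low bits selected by (align - 1) must all be zero. */
static bool address_meets_alignment(uint32_t addr, uint32_t align) {
    return (addr & (align - 1)) == 0;
}
```

An alignment of 1 is satisfied by every address, which is one way to express "no alignment promised".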

@paul99

paul99 commented Jul 30, 2015

@lukewagner My apologies, I previously misread the first paragraph. I now see that the alignment attribute is precisely what I was looking for. The whole section sounds good to me as written. Sorry for the churn!

@lukewagner
Member

No problem, it's good to talk through these issues to make sure we understand. Think we can close this issue @titzer?

@rodolph

rodolph commented Aug 6, 2015

Hi,

@titzer asked me to comment on the ARM architecture; I work for ARM on V8. The current proposal works on all modern ARM cores: they either support unaligned accesses directly or provide mechanisms that enable emulation (with OS support).

The beginning of this thread touches on the behaviour of older ARM cores. Those cores (ARMv5 and earlier) have a peculiar behaviour that would introduce nondeterminism under this proposal. For example, for a word load from an address that is not aligned, the loaded value is rotated right by 8 times the value of bits[1:0] of the address. Those cores can also be either little or big endian.

All those cores have been superseded a while ago so it may be acceptable to ignore those.
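To make the rotation concrete, here is a small C model of that legacy word load. It is only an illustration: the function names are hypothetical, a little-endian configuration is assumed, and memory is modelled as a plain byte array.

```c
#include <stdint.h>

/* Rotate a 32-bit value right by n bits (n is reduced modulo 32). */
static uint32_t rotate_right32(uint32_t v, unsigned n) {
    n &= 31u;
    return n == 0 ? v : (v >> n) | (v << (32u - n));
}

/* Model of a word load on an ARMv5-or-earlier core, little-endian:
   the low two address bits are dropped for the access itself, and the
   aligned word is then rotated right by 8 * addr[1:0] bits. */
static uint32_t legacy_arm_load_u32(const uint8_t *memory, uint32_t addr) {
    const uint8_t *p = memory + (addr & ~3u);
    uint32_t word = (uint32_t)p[0]
                  | ((uint32_t)p[1] << 8)
                  | ((uint32_t)p[2] << 16)
                  | ((uint32_t)p[3] << 24);
    return rotate_right32(word, 8u * (addr & 3u));
}
```

With bytes 00 11 22 33 44 at offsets 0 to 4, a load at offset 1 yields 0x00332211 in this model, whereas a load that "just works" would yield 0x44332211. That gap is the nondeterminism at issue.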

Regards,
Rodolph.

@lukewagner
Member

Thanks for that information! So this also suggests we don't need any changes from what's in the spec. @titzer any other issues to consider before closing?

@titzer
Author

titzer commented Aug 10, 2015

No, it sounds good. I'm OK with taking the high road and specifying that misaligned accesses just work, so we can close this issue.

I'll probably do some experiments when we've got some example workloads,
and will only bring this up again if it turns out to be a showstopper.

