Maximum alignment? #217


Closed
jfbastien opened this issue Jan 26, 2016 · 34 comments

@jfbastien
Member

What's the maximum alignment for load/store?

We should probably specify something, e.g. log2(address_space_bytes), and test for it.
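A minimal sketch of that range check (the function name and the wasm32 bound are assumptions for illustration, not spec text): since the binary format stores the alignment as a log2 exponent, the check is just a bound on the exponent.

```python
# Hypothetical range check: cap the alignment exponent at
# log2(address_space_bytes), i.e. 32 for a 4 GiB wasm32 address space.
MAX_ALIGN_EXPONENT = 32

def validate_align_exponent(align_log2: int) -> bool:
    """Reject out-of-range alignment immediates during validation."""
    return 0 <= align_log2 <= MAX_ALIGN_EXPONENT

assert validate_align_exponent(2)       # e.g. a naturally aligned i32 access
assert not validate_align_exponent(33)  # exceeds the address space
```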

@mbodart

mbodart commented Jan 26, 2016

By "test for it", I assume you mean just a range check on the alignment value.

If I'm interpreting AstSemantics.md correctly, the alignment is just a performance hint, not a "trappable" guarantee. Is this a correct interpretation?

@jfbastien
Member Author

Correct, check it in the implementation, and have corresponding tests in the spec test suite.

Your interpretation is correct.

@ghost

ghost commented Jan 26, 2016

This came up before somewhere, and I recall some support for the page size being the maximum. If the minimum is one then this could be encoded in 4 bits. But perhaps 256 is also a reasonable limit, encoded in 3 bits, or even 16 encoded in 2 bits. The v8 encoding only has one bit for the alignment - the natural alignment flag.

@kg
Contributor

kg commented Jan 26, 2016

Don't bother worrying about bitpacking. Every stream compressor will do it for you. :-)

@ghost

ghost commented Jan 26, 2016

@kg That's not my experience. It helps the compressor to pack and avoid unnecessary redundancy. Even brotli only looks back one or two bytes. We should decide on the limits: the lower limit seems to be one, and the upper limit at most the page size. We are only interested in powers of 2, so only encode the power of two. That gives four bits at most. We might be able to do even better given knowledge of the operation's natural alignment - brotli only looks at 6 prior bits and might not even notice the difference between the opcodes, so it cannot focus its codes.

@ghost

ghost commented Jan 26, 2016

Sorry, if the lower limit is one, that is 2^0, so four bits only gets up to 2^15, which is half of the wasm page size. The other suggestions have a similar off-by-one.

@kg
Contributor

kg commented Jan 26, 2016

> @kg That's not my experience. It helps the compressor to pack and avoid unnecessary redundancy. Even brotli only looks back one or two bytes. We should decide on the limits, the lower limit seems to be one and the upper limit at most the page size. We are only interested in powers of 2, so only encode the power of two. The gives fours bits at most. We might be able to do even better given knowledge of the operations natural alignment - brotli only looks at 6 prior bits and might not even notice the difference between the opcodes to focus the codes.

I tested this specific question and talked to one of the designers of Brotli. Once you split streams, the alignment values will be a stream of 1 (or 4) byte values which can be arithmetic coded or otherwise represented very efficiently. If we bitpack the alignment into 3-5 bits with other unrelated data in the other bits, what we're actually doing is destroying some of the potential for the compressor to efficiently encode either chunk of data.

Encoding the alignment as a power of two is a perfectly reasonable idea, but if you cap the alignment at some arbitrary threshold so you can pack bits, you may regret it. Vectors >256bytes in size are an inevitability, as are user-defined structures of size >256bytes.

@ghost

ghost commented Jan 26, 2016

@kg I really doubt that is correct, but we can settle it with a test. Here's why: most of the alignments will be the natural alignment. In a separate stream they will jump around and there is no context. Whereas inline, there is the context of the operator, which could predict the natural alignment most of the time; most of the time it would be just one bit. What we might be able to do is optimize the encoding so that 0 means the natural alignment; then, when packed in a separate stream, it would compress very well.

@kg
Contributor

kg commented Jan 26, 2016

> @kg I really doubt that is correct, but we can settle it in a test.

I tested it. It's in js-astcompressor, and you can also see some general things to this effect in the JSZap paper. If you are confused I would be happy to explain it to you in depth offline.

> In a separate stream they will jump around and there is no context. Whereas in inline there is the context of the operator which could predict the natural alignment most of the time, most of the time it would be just one bit.

This does bring up an important detail. If your stream splitting is coarse-grained, like 'here is all the uint8s', you're correct that the values will be harder to predict. That's not the right way to do stream splitting (though it is, in fact, better than nothing). You want to split streams semantically, so your individual streams are things like 'opcodes', 'int8 constant immediates' and 'memory load alignments'.
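The semantic splitting described above can be sketched as a lossless transform (all names and the opcode values here are invented for illustration); each per-field stream can then be handed to the compressor on its own:

```python
def split_streams(records):
    """Split interleaved (opcode, align) records into semantic streams,
    so each stream holds one kind of value and compresses independently."""
    opcodes = bytes(op for op, _ in records)
    aligns = bytes(al for _, al in records)
    return opcodes, aligns

def merge_streams(opcodes, aligns):
    """Inverse transform: reinterleave the per-field streams."""
    return list(zip(opcodes, aligns))

# Hypothetical (opcode, log2-alignment) pairs
records = [(0x28, 2), (0x29, 3), (0x28, 2), (0x2C, 0)]
ops, als = split_streams(records)
assert merge_streams(ops, als) == records  # round trip is exact
```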

@ghost

ghost commented Jan 26, 2016

@kg I think I understand your point, but have not explored it. Here's a test.

  • A: split all the load/store alignments out into a separate binary file, one byte for each alignment (e.g. hex dump 04 04 04 04 04 04 ...), taken from the zlib benchmark. Compressed to the max with brotli, this gives 393 bytes.
  • B: the same, but with natural alignments written as zero and non-natural ones as-is, one byte for each. The same compression gives 13 bytes.

The context helps. Another option would be to keep the one bit flag, in the v8 encoding, and have a separate stream for the non-natural cases, and in that case since most of the alignments are natural this side stream would be very small.
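The A/B experiment above can be reproduced in sketch form with the standard library's zlib (the comment used brotli; the corpus here is synthetic, with mixed natural alignments and about 1% under-aligned accesses, so only the relative sizes are meaningful):

```python
import random
import zlib

random.seed(42)

# Synthetic corpus: each access has a natural alignment of 1, 2, 4, or 8
# bytes, and ~1% of accesses are under-aligned relative to natural.
natural = [random.choice([1, 2, 4, 8]) for _ in range(100_000)]
actual = [1 if (n > 1 and random.random() < 0.01) else n for n in natural]

stream_a = bytes(actual)                            # A: raw alignment bytes
stream_b = bytes(0 if a == n else a
                 for a, n in zip(actual, natural))  # B: natural coded as zero

size_a = len(zlib.compress(stream_a, 9))
size_b = len(zlib.compress(stream_b, 9))
assert size_b < size_a  # the natural-as-zero stream compresses far better
```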

@mbodart

mbodart commented Jan 26, 2016

Is there any data to show that natural alignment is common?

Yes, data is typically naturally aligned when allocated. But given a pointer parameter, say "int *p", a C/C++ compiler often cannot guarantee that p is well aligned, as it most likely hasn't seen its allocation. Are we expecting C/C++ compilers to optimistically mark references with natural alignment, and only specify lesser alignment when that is clear to them?

Given that alignment is only a hint, its utility is not clear to me.

@sunfishcode
Member

We do expect C/C++ compilers to optimistically mark references with natural alignment, even when they haven't seen the allocation, and only specify lesser alignments when that is clear to them. Natural alignment is the overwhelming majority.

@kg
Contributor

kg commented Jan 26, 2016

@mbodart

> Given that alignment is only a hint, its utility is not clear to me.

The importance of the alignment hint is largely for the polyfill, but it matters on some native architectures too: If we want to do unaligned reads/writes in asm.js we have to jump through some obscene hoops to make them work. So if we don't know the alignment status of any memory operations, we eat a performance hit on all memory traffic.

The alignment hint allows a compiler to say 'it's okay to break me if I lie about my alignment' in the common case so that you only get bad performance in the spots where it's needed.

I agree with @sunfishcode that natural alignment is probably the overwhelming majority in practice. I do think it's the case that some compilers will not trivially be able to flag loads/stores as aligned without doing a full global analysis, though...

@mbodart

mbodart commented Jan 26, 2016

I think we're in agreement, as long as when you say "it's okay to break me" you really mean "it's okay to lessen my performance, but not crash".

I'm not so concerned about the MVP. But when SIMD comes along, the current alignment definition does not allow an engine to choose movaps instead of movups (on x86), as movaps would fault if unaligned. This may be moot, as movups performs well on more recent CPUs when the data is aligned.

@sunfishcode
Member

I believe wasm's current alignment hints do permit implementations to use movaps for SIMD, provided they're capable of catching the hardware trap and fixing things up.

@ghost

ghost commented Mar 27, 2016

This seems to have been resolved in the binary format, which has a maximum alignment equal to the access's natural alignment. Or is it still open to discussion to increase this maximum?

@lukewagner
Member

Agreed.

@sunfishcode
Member

sunfishcode commented Apr 22, 2016

This is not yet implemented in ml-proto. For example, memory.wast tests that (module (memory 0) (func (i32.load align=8 (i32.const 0)))) validates.
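The missing check can be sketched as follows (the opcode table and function names are illustrative, not ml-proto's actual code). The memory.wast case above, i32.load align=8, would be rejected because 8 exceeds i32's natural 4-byte alignment:

```python
# log2 of each access's natural alignment (illustrative subset of opcodes)
NATURAL_EXPONENT = {
    "i32.load": 2, "i64.load": 3, "f32.load": 2, "f64.load": 3,
    "i32.load8_s": 0, "i32.load16_u": 1,
}

def check_alignment(op: str, align: int) -> bool:
    """Alignment must be a power of two no greater than the access width."""
    is_pow2 = align > 0 and (align & (align - 1)) == 0
    return is_pow2 and align <= (1 << NATURAL_EXPONENT[op])

assert check_alignment("i32.load", 4)
assert not check_alignment("i32.load", 8)  # the case memory.wast currently accepts
```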

@sunfishcode sunfishcode reopened this Apr 22, 2016
@sunfishcode sunfishcode added this to the MVP milestone Apr 22, 2016
@dschuff
Member

dschuff commented Apr 22, 2016

Actually, would it make sense to at least allow say 4 or 8, even for smaller-sized memory references? I could totally imagine there being plenty of e.g. byte-sized loads with alignments of 4, and I don't see any reason to pessimize those. Is there some particular reason the binary format doesn't allow that?

@dschuff
Member

dschuff commented Apr 22, 2016

(By the way I noticed that @sunfishcode committed an LLVM patch limiting the alignment on loads and stores to natural alignment. I guess that was just so that its output could be properly encoded with the 0xb format today?)

@flagxor
Member

flagxor commented Apr 22, 2016

Was size savings the only argument for capping at natural alignment?
Seems a shame to throw away information we have in LLVM.
Though I admit implementations probably won't get around to using it for a bit.
Dan, Ben, others, is there any particular reason not to allow up to page size?

@titzer
Contributor

titzer commented Apr 25, 2016

As a hint, it can't do any damage.

@jfbastien
Member Author

Could we allow alignment hints on load / store up to page size?

@kripken
Member

kripken commented Apr 25, 2016

Could that be encoded in a way that doesn't increase the binary size noticeably?

@jfbastien
Member Author

If it's frequent enough to matter then macro-compression will pick it up. If a user is OK dropping hints to lose potential perf then it can be stripped away.

The hint is useful in guiding speculative optimizations by the JIT (test and recompile, otherwise main code assumes alignment is correct). You can still do that speculation without the hint, but why throw away information LLVM has?

@titzer
Contributor

titzer commented Apr 25, 2016

> If it's frequent enough to matter then macro-compression will pick it up. If a user is OK dropping hints to lose potential perf then it can be stripped away.
>
> The hint is useful in guiding speculative optimizations by the JIT (test and recompile, otherwise main code assumes alignment is correct). You can still do that speculation without the hint, but why throw away information LLVM has?

The question is whether encoding page size alignment (a 3-byte LEB) is worth it, versus just specifying natural alignment.

@kripken
Member

kripken commented Apr 25, 2016

> If it's frequent enough to matter then macro-compression will pick it up.

Sure, but perhaps to the detriment of other things the compression could have reduced. I think this is worth measuring.

@sunfishcode
Member

What kinds of optimizations would supernatural alignment hints enable? LLVM already does the main alignment-related optimizations, such as merging adjacent memory accesses into larger accesses.

@jfbastien
Member Author

> What kinds of optimizations would supernatural alignment hints enable? LLVM already does the main alignment-related optimizations, such as merging adjacent memory accesses into larger accesses.

Depends on the architecture. The basic one is to speculate that accesses are aligned: vectorize or merge some operations, while avoiding the scalar entry/exit loops as well as the isn't-aligned loop. Test, and fall back to the interpreter on failure; recompile if that happens too often. Less code is executed dynamically and less is generated statically. That has paid off in my experience.

@sunfishcode
Member

sunfishcode commented Apr 26, 2016

@jfbastien LLVM already does that optimization. If it has a bunch of sequential memory accesses and it knows the alignment of the whole group, and it's not too much work to marshal the data around for a merged access, it'll merge them itself.

@jfbastien
Member Author

> @jfbastien LLVM already does that optimization. If it has a bunch of sequential memory accesses and it knows the alignment of the whole group, and it's not too much work to marshal the data around for a merged access, it'll merge them itself.

I don't think we're talking of the same thing: I'm talking about speculative optimizations which lead to recompilation on speculation failure. One of the big costs of speculation is when the optimizer is over-eager and assumes things that aren't true, that's where the extra info would pay off.

@sunfishcode
Member

LLVM's own alignment information is never speculative in nature. At the LLVM level, it's full-octopus UB if the alignment isn't correct.

In general, wasm's alignment hint works well for these kinds of "highly confident" cases. If we change wasm's alignment hint to be used for more speculative use cases, such as humans decorating their own code, then VMs will have to be more conservative with it, which would weaken it for the high-confidence use cases.

@jfbastien
Member Author

> LLVM's own alignment information is never speculative in nature. At the LLVM level, it's full-octopus UB if the alignment isn't correct.

Indeed, but wasm can't take that for granted. Using that information for speculation will therefore always be correct, unless the developer gives incorrect information.

> In general, wasm's alignment hint works well for these kinds of "highly confident" cases. If we change wasm's alignment hint to be used for more speculative use cases, such as humans decorating their own code, then VMs will have to be more conservative with it, which would weaken it for the high-confidence use cases.

That's not what I'm suggesting: I'm saying that the VM can speculate because the developer hinted that the alignment value was correct. These types of high-confidence cases are exactly where we want to avoid throwing away information.

@sunfishcode
Member

If the producer has high confidence, it can just do the optimization itself. LLVM already does this. Other optimizing compilers commonly do such things. It wouldn't be too difficult to implement this in a relatively lightweight JIT library. Doing this on the producer side reduces time, complexity, and unpredictable performance on the consumer side, so it's worth encouraging.

While speculative optimizations with bailouts and recompilations may be easy to support in some of today's JS-engine-based wasm implementations, there are advantages to building wasm in a way that minimizes its dependence on them to achieve good performance. Greater alignment hints to support greater speculative optimization is one path we might take, but from a broader perspective there are other paths as well.

@lukewagner
Member

lukewagner commented Apr 27, 2016

> The question is whether encoding page size alignment (a 3-byte LEB) is worth it, versus just specifying natural alignment.

@titzer Since alignment in flags is currently encoded as log2(alignment), 64k-alignment would still fit in 1 byte.

One thing that bugs me about the current design is that the number of reserved bits depends on the load/store width. That seems bug-prone for everyone involved; it means that if we, e.g., use the following bits for ordering annotations, they start at a different bit for i32.load than for i64.load. Simply reserving a fixed number of bits for the max alignment we think we might ever need seems simpler. 2 bits only buys 8-byte alignment, which is already too small for i32x4; 3 bits bumps that way up to 128-byte alignment, but that is "only" 2x bigger than AVX-512, so it still seems shortsighted. 4 bits provides 32k-alignment, which seems like truly enough alignment for anyone. So how about just fixing alignment to 4 bits in flags?
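A sketch of that fixed-width layout (the helper names are invented; wasm does store log2(alignment) in the flags immediate, but the 4-bit reservation here is the proposal, not the current format): with 4 bits the exponent runs 0..15, i.e. up to 32 KiB alignment, and any later flag bits start at bit 4 regardless of access width.

```python
ALIGN_BITS = 4
ALIGN_MASK = (1 << ALIGN_BITS) - 1  # low 4 bits: log2(alignment), up to 2**15 = 32 KiB

def encode_flags(align_log2: int, other_flags: int = 0) -> int:
    """Pack log2(alignment) into a fixed low-bit field of the flags immediate."""
    assert 0 <= align_log2 <= ALIGN_MASK
    return (other_flags << ALIGN_BITS) | align_log2

def decode_flags(flags: int):
    """Recover (align_log2, other_flags); later bits sit at a fixed position."""
    return flags & ALIGN_MASK, flags >> ALIGN_BITS

assert decode_flags(encode_flags(4, other_flags=1)) == (4, 1)
assert encode_flags(15) == 15  # max exponent: 2**15 = 32k-alignment
```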

(I'm pretty dubious about autovectorization opportunities in wasm, though. For one thing, I think it benefits the whole platform if wasm engines aren't trying to do fancy optimizations since these add cost (time and predictability) and, once the race begins, engines can be forced into a situation where they end up hurting the common case for the benefit of benchmarketting; we've seen this with JS. Instead, we should strive to expose the underlying mechanism (viz., SIMD ops) so that wasm generators can be explicit about what they want and not rely on magic. So I'd wish we could strike autovectorization as an argument for super-natural alignment (or anything else, for that matter).)

@ghost

ghost commented Apr 27, 2016

@lukewagner Four bits was my suggestion way above too. Could you please also consider assigning the natural alignment the value zero, to help compression?

@lukewagner
Member

If we can fix immediates in the opcode table (we're working on a prototype impl so we can make a real proposal soon), there won't be any immediate in the common case, so it seems like we won't have to worry about compression effects of the immediate.

@rossberg
Member

Check now implemented.
