Maximum alignment? #217
By "test for it", I assume you mean just a range check on the alignment value. If I'm interpreting AstSemantics.md correctly, the alignment is just a performance hint, not a "trappable" guarantee. Is this a correct interpretation? |
Correct, check it in the implementation, and have corresponding tests in the spec test suite. Your interpretation is correct. |
This came up before somewhere, and I recall some support for the page size being the maximum. If the minimum is one then this could be encoded in 4 bits. But perhaps 256 is also a reasonable limit, encoded in 3 bits, or even 16 encoded in 2 bits. The v8 encoding only has one bit for the alignment - the natural alignment flag. |
Don't bother worrying about bitpacking. Every stream compressor will do it for you. :-) |
@kg That's not my experience. It helps the compressor to pack and avoid unnecessary redundancy. Even brotli only looks back one or two bytes. We should decide on the limits; the lower limit seems to be one and the upper limit at most the page size. We are only interested in powers of 2, so only encode the power of two. That gives four bits at most. We might be able to do even better given knowledge of the operation's natural alignment - brotli only looks at 6 prior bits and might not even notice the difference between the opcodes to focus the codes.
Sorry, if the lower limit is one, that would be 2^0, so four bits only gets to 2^15, which is half of the wasm page size. Similar error in the other suggestions.
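The arithmetic above can be sketched briefly (assuming the wasm page size of 64 KiB = 2^16 bytes; the function name is ours, not from the thread):

```python
# Sketch: encode alignment as the exponent of a power of two. Exponents
# 0..16 need 5 bits; a 4-bit field tops out at 2^15, half a page.
def encode_alignment(align):
    # alignment must be a power of two, at least 1
    assert align > 0 and (align & (align - 1)) == 0
    return align.bit_length() - 1  # log2 of a power of two

assert encode_alignment(1) == 0
assert encode_alignment(65536) == 16        # page size needs a 5-bit field
assert 2 ** 15 == 65536 // 2                # 4 bits stop at half a page
```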
I tested this specific question and talked to one of the designers of Brotli. Once you split streams, the alignment values will be a stream of 1 (or 4) byte values which can be arithmetic coded or otherwise represented very efficiently. If we bitpack the alignment into 3-5 bits with other unrelated data in the other bits, what we're actually doing is destroying some of the potential for the compressor to efficiently encode either chunk of data. Encoding the alignment as a power of two is a perfectly reasonable idea, but if you cap the alignment at some arbitrary threshold so you can pack bits, you may regret it. Vectors >256 bytes in size are an inevitability, as are user-defined structures of size >256 bytes.
@kg I really doubt that is correct, but we can settle it in a test. Here's why: most of the alignments will be the natural alignment. In a separate stream they will jump around and there is no context, whereas inline there is the context of the operator, which could predict the natural alignment; most of the time it would be just one bit. What we might be able to do is optimize the encoding to be 0 for the natural alignment; then when packed in a separate stream it would compress very well.
I tested it. It's in js-astcompressor, and you can also see some general things to this effect in the JSZap paper. If you are confused I would be happy to explain it to you in depth offline.
This does bring up an important detail. If your stream splitting is coarse-grained, like 'here is all the uint8s', you're correct that the values will be harder to predict. That's not the right way to do stream splitting (though it is, in fact, better than nothing). You want to split streams semantically, so your individual streams are things like 'opcodes', 'int8 constant immediates' and 'memory load alignments'. |
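Semantic stream splitting as described above can be sketched as follows (the token representation is an illustrative assumption; 0x28 and 0x36 are the i32.load and i32.store opcodes in the final wasm encoding):

```python
# Illustrative sketch: group values by semantic role so a general-purpose
# compressor sees homogeneous, highly predictable streams instead of one
# interleaved byte stream.
from collections import defaultdict

def split_streams(tokens):
    """tokens: list of (kind, value) pairs in program order."""
    streams = defaultdict(list)
    for kind, value in tokens:
        streams[kind].append(value)
    return dict(streams)

tokens = [("opcode", 0x28), ("align", 2), ("opcode", 0x36), ("align", 2)]
streams = split_streams(tokens)
assert streams["align"] == [2, 2]            # alignments form one long run
assert streams["opcode"] == [0x28, 0x36]
```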
@kg I think I understand your point, but have not explored it. Here's a test.
The context helps. Another option would be to keep the one-bit flag in the v8 encoding and have a separate stream for the non-natural cases; since most of the alignments are natural, this side stream would be very small.
Is there any data to show that natural alignment is common? Yes, data is typically naturally aligned when allocated. But given a pointer parameter, say "int *p", the alignment is not known to the compiler. Given that alignment is only a hint, its utility is not clear to me.
We do expect C/C++ compilers to optimistically mark references with natural alignment, even when they haven't seen the allocation, and only specify lesser alignments when that is clear to them. Natural alignment is the overwhelming majority.
The importance of the alignment hint is largely for the polyfill, but it matters on some native architectures too: If we want to do unaligned reads/writes in asm.js we have to jump through some obscene hoops to make them work. So if we don't know the alignment status of any memory operations, we eat a performance hit on all memory traffic. The alignment hint allows a compiler to say 'it's okay to break me if I lie about my alignment' in the common case so that you only get bad performance in the spots where it's needed. I agree with @sunfishcode that natural alignment is probably the overwhelming majority in practice. I do think it's the case that some compilers will not trivially be able to flag loads/stores as aligned without doing a full global analysis, though... |
I think we're in agreement, as long as when you say "it's okay to break me" you really mean "it's okay to lessen my performance, but not crash". I'm not so concerned about MVP. But when simd comes along, the current alignment definition does not allow an engine to choose movaps instead of movups (on X86), as movaps would fault if unaligned. This may be moot as movups performs well on more recent cpus when the data is aligned. |
I believe wasm's current alignment hints do permit implementations to use movaps for SIMD, provided they're capable of catching the hardware trap and fixing things up. |
This seems to have been resolved in the binary format, which has a maximum alignment of the access's natural alignment. Or is it still open to discussion to increase this maximum?
Agreed. |
This is not yet implemented in ml-proto. For example, memory.wast tests that |
Actually, would it make sense to at least allow say 4 or 8, even for smaller-sized memory references? I could totally imagine there being plenty of e.g. byte-sized loads with alignments of 4, and I don't see any reason to pessimize those. Is there some particular reason the binary format doesn't allow that? |
(By the way I noticed that @sunfishcode committed an LLVM patch limiting the alignment on loads and stores to natural alignment. I guess that was just so that its output could be properly encoded with the 0xb format today?) |
Was size savings the only argument for capping at natural alignment? |
As a hint, it can't do any damage. |
Could we allow alignment hints on load / store up to page size? |
Could that be encoded in a way that doesn't increase the binary size noticeably? |
If it's frequent enough to matter then macro-compression will pick it up. If a user is OK dropping hints to lose potential perf then it can be stripped away. The hint is useful in guiding speculative optimizations by the JIT (test and recompile, otherwise main code assumes alignment is correct). You can still do that speculation without the hint, but why throw away information LLVM has? |
Sure, but perhaps to the detriment of other things the compression could have reduced. I think this is worth measuring. |
What kinds of optimizations would supernatural alignment hints enable? LLVM already does the main alignment-related optimizations, such as merging adjacent memory accesses into larger accesses. |
Depends on the architecture. The basic one is to speculate is-aligned, vectorize or merge some operations but avoid scalar entry+exit loops as well as isn't-aligned loop. Test-and-interp on failure, recompile if too often. Less code is executed dynamically and less is generated statically. That's paid off in my experience. |
@jfbastien LLVM already does that optimization. If it has a bunch of sequential memory accesses and it knows the alignment of the whole group, and it's not too much work to marshal the data around for a merged access, it'll merge them itself. |
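The merging optimization under discussion can be sketched as an equivalence that the alignment hint licenses (names are illustrative; on hardware that faults on unaligned word access, the merged form needs the guarantee):

```python
# Sketch: with a guarantee that addr is 4-byte aligned, four byte loads
# can become one 32-bit load that yields the same bytes.
import struct

def load4_bytes(mem, addr):
    return [mem[addr + i] for i in range(4)]

def load4_merged(mem, addr):
    assert addr % 4 == 0                     # relies on the alignment hint
    word = struct.unpack_from("<I", mem, addr)[0]
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

mem = bytearray(b"\x01\x02\x03\x04\x05\x06\x07\x08")
assert load4_bytes(mem, 4) == load4_merged(mem, 4)
```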
I don't think we're talking of the same thing: I'm talking about speculative optimizations which lead to recompilation on speculation failure. One of the big costs of speculation is when the optimizer is over-eager and assumes things that aren't true, that's where the extra info would pay off. |
LLVM's own alignment information is never speculative in nature. At the LLVM level, it's full-octopus UB if the alignment isn't correct. In general, wasm's alignment hint works well for these kinds of "highly confident" cases. If we change wasm's alignment hint to be used for more speculative use cases, such as humans decorating their own code, then VM's will have to be more conservative with it, which would weaken it for the high-confidence use cases. |
Indeed, but wasm can't take that for granted. Using that information for speculation will therefore always be correct, unless the developer gives incorrect information.
That's not what I'm suggesting: I'm saying that the VM can speculate because the developer hinted that the alignment value was correct. These types of high-confidence cases are exactly where we want to avoid throwing away information. |
If the producer has high confidence, it can just do the optimization itself. LLVM already does this. Other optimizing compilers commonly do such things. It wouldn't be too difficult to implement this in a relatively lightweight JIT library. Doing this on the producer side reduces time, complexity, and unpredictable performance on the consumer side, so it's worth encouraging. While speculative optimizations with bailouts and recompilations may be easy to support in some of today's JS-engine-based wasm implementations, there are advantages to building wasm in a way that minimizes its dependence on them to achieve good performance. Greater alignment hints to support greater speculative optimization is one path we might take here, but there are other paths we can take here as well, from a broader perspective. |
@titzer Since alignment is only a hint, we could give every load/store the same fixed four-bit alignment field, regardless of access width. One thing that bugs me about the current design is that the number of reserved bits depends on the load/store width. That seems bug-prone for everyone involved; it means that if we, e.g., use the bits after the alignment field for ordering annotations, they start at a different bit for each access width.
(I'm pretty dubious about autovectorization opportunities in wasm, though. For one thing, I think it benefits the whole platform if wasm engines aren't trying to do fancy optimizations, since these add cost (time and predictability) and, once the race begins, engines can be forced into a situation where they end up hurting the common case for the benefit of benchmarketing; we've seen this with JS. Instead, we should strive to expose the underlying mechanism (viz., SIMD ops) so that wasm generators can be explicit about what they want and not rely on magic. So I'd wish we could strike autovectorization as an argument for super-natural alignment (or anything else, for that matter).)
@lukewagner Four bits was my suggestion way above too. Could you please also consider assigning the natural alignment the value zero, to help compression.
If we can fix immediates in the opcode table (we're working on a prototype impl so we can make a real proposal soon), there won't be any immediate in the common case, so it seems like we won't have to worry about compression effects of the immediate. |
Check now implemented. |
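The check as resolved (alignment capped at the access's natural alignment) can be sketched as follows; the NATURAL_LOG2 table is an illustrative assumption, not the full opcode table:

```python
# Sketch: validation-time check that the encoded alignment exponent does
# not exceed the access's natural alignment exponent.
NATURAL_LOG2 = {"i32.load": 2, "i64.load": 3, "i32.load8_s": 0}

def validate_align(op, align_log2):
    if align_log2 > NATURAL_LOG2[op]:
        raise ValueError("alignment must not exceed natural alignment")
    return align_log2

assert validate_align("i64.load", 3) == 3    # natural alignment: ok
try:
    validate_align("i32.load8_s", 2)         # 4-byte hint on a byte load
except ValueError:
    pass
else:
    raise AssertionError("expected a validation error")
```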
It is implemented as 1000000 in both V8 and Firefox.
V8: https://github.com/v8/v8/blob/64aeabbc95576892b61d3e99e60ea74bbf734f80/src/wasm/wasm-limits.h#L36
Firefox: https://github.com/mozilla/gecko-dev/blob/7b9d23ece4835bf355e5195f30fef942d376a1c7/js/src/wasm/WasmConstants.h#L1087
Fixes WebAssembly#212.
What's the maximum alignment for load/store?
We should probably specify something, e.g. log2(address_space_bytes), and test for it.