Are 64-bit addresses a mode or separate opcodes? #255
Comments
I think we should have separate bytecodes, since loads/stores already have
Having separate bytecodes has the nice advantage that the "signatures" i.e. On Tue, Jul 7, 2015 at 3:11 PM, Dan Gohman [email protected] wrote:
I should also mention the "bells and whistles" I was talking about for

    // Functionality related to encoding memory accesses.
    struct MemoryAccess {
      // Atomicity annotations for access to the memory and globals.
      enum Atomicity {
      };
      // Alignment annotations for memory accesses.
      enum Alignment { kAligned = 0, kUnaligned = 1 };
      // Memory width for integer accesses.
      enum IntWidth { kInt8 = 0, kInt16 = 1, kInt32 = 2, kInt64 = 3 };
      // Bitfields for the various annotations for memory accesses.
      typedef BitField<IntWidth, 0, 2> IntWidthField;
      typedef BitField<bool, 2, 1> SignExtendField;
      typedef BitField<Alignment, 3, 1> AlignmentField;
      typedef BitField<Atomicity, 4, 2> AtomicityField;
    };

That means that loads/stores are 2-byte opcodes, with the first bytecode
We don't have to use the exact bits above, just pasting for reference.
On Tue, Jul 7, 2015 at 3:15 PM, Ben L. Titzer [email protected] wrote:
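The bitfield layout pasted above can be tried out with a small sketch. The `BitField` template below is a minimal stand-in written for illustration (its definition is not shown in the thread), and `EncodeAccessByte` is a hypothetical helper that assumes the annotations fit in a single byte. The atomicity field is left out because its enum values are not shown in the thread.

```cpp
#include <cstdint>

// Assumed stand-in for the BitField<T, shift, size> helper referenced above.
// The real definition is not part of this thread.
template <typename T, int kShift, int kSize>
struct BitField {
  static constexpr uint32_t kMask = ((1u << kSize) - 1u) << kShift;
  // Places `value` into bits [kShift, kShift + kSize).
  static constexpr uint32_t encode(T value) {
    return (static_cast<uint32_t>(value) << kShift) & kMask;
  }
  // Extracts the field from a packed word.
  static constexpr T decode(uint32_t bits) {
    return static_cast<T>((bits & kMask) >> kShift);
  }
};

// Enum values and field positions copied from the snippet in the thread.
enum Alignment { kAligned = 0, kUnaligned = 1 };
enum IntWidth { kInt8 = 0, kInt16 = 1, kInt32 = 2, kInt64 = 3 };

typedef BitField<IntWidth, 0, 2> IntWidthField;
typedef BitField<bool, 2, 1> SignExtendField;
typedef BitField<Alignment, 3, 1> AlignmentField;

// Hypothetical helper: packs width, sign extension, and alignment into one
// annotation byte. Atomicity (bits 4-5 in the thread's layout) is omitted.
inline uint8_t EncodeAccessByte(IntWidth width, bool sign_extend,
                                Alignment alignment) {
  return static_cast<uint8_t>(IntWidthField::encode(width) |
                              SignExtendField::encode(sign_extend) |
                              AlignmentField::encode(alignment));
}
```

With this layout, a sign-extending unaligned 16-bit access packs to bits `1 | 4 | 8`, and each field can be read back independently with the matching `decode`.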
Implementations are free to have separate opcodes internally. Is the utility to developers of being able to mix pointer sizes within a process worth the potential complexity to developers?
Outside of implementation convenience, I think there are two advantages On Tue, Jul 7, 2015 at 3:25 PM, Dan Gohman [email protected] wrote:
Since we expect the binary format to have an opcode name table, all this requires is that implementations check the mode when doing the initial opcode name lookups. For the rest of the decoding, there is no additional burden. The question here is whether the benefits of allowing mixed 32-bit/64-bit applications outweigh the ecosystem complexity. A hybrid option is also possible; we could have separate opcode names, but still prohibit them from being used within the same module. This would give us most of the ecosystem simplicity of having modes, while still leaving the door open for mixed-mode operation in the future.
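A minimal sketch of the mode check described above, assuming each module carries a single address-size mode and the name table uses plain `load`/`store` names. `AddressMode`, `InternalOp`, and `ResolveOpcodeName` are hypothetical names chosen for illustration, not part of any agreed design.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical per-module address-size mode.
enum class AddressMode { k32, k64 };

// Hypothetical internal opcodes; an implementation is free to keep
// separate 32-bit and 64-bit address variants internally.
enum class InternalOp : uint8_t {
  kLoadAddr32,
  kLoadAddr64,
  kStoreAddr32,
  kStoreAddr64
};

// Resolves one entry of the opcode name table. The mode is consulted only
// here, so later decoding works on already-specialized internal opcodes.
std::optional<InternalOp> ResolveOpcodeName(const std::string& name,
                                            AddressMode mode) {
  const bool is64 = (mode == AddressMode::k64);
  if (name == "load") {
    return is64 ? InternalOp::kLoadAddr64 : InternalOp::kLoadAddr32;
  }
  if (name == "store") {
    return is64 ? InternalOp::kStoreAddr64 : InternalOp::kStoreAddr32;
  }
  return std::nullopt;  // Not a memory-access name handled by this sketch.
}
```

The point of the sketch is that the mode affects only the name-to-opcode mapping; nothing after that lookup needs to know which mode the module was compiled for.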
Is this something LLVM would/could do? If so, that would certainly give a nice performance motivation (since you could potentially eliminate all bounds checks on the 32-bit accesses). Last week I talked with @sunfishcode over IRC about this and it didn't seem like this is something LLVM would take advantage of. From the wasm VM perspective I don't see much difference in complexity whether we allow mixed access or not, but there might be some things that come up (e.g. with dynamic linking) that make mixed access more difficult.
This is something I would not want to see. It sounds like a lot of work, added complexity, terrible performance implications, and is something developers might overlook. I'd rather have a hard limit for 32-bit heaps. That way, if developers need more space, they don't inadvertently (albeit likely with some console warning) have their app cross that threshold, and instead they have to compile for 64-bit.
I see the aesthetic argument for having the signature/semantics of ops be independent of any global mode, and so I like the hybrid @sunfishcode proposed above: have separate <4GiB and >4GiB ops, and validation only accepts one kind. With module-local opcode tables, there shouldn't be any index space wasted and the implementation will stay simple and not have to worry about mixing.
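The hybrid rule can be sketched as a validation pass over a module-local opcode table. This is only an illustration: the opcode names (`load_addr32`, `store_addr64`, and so on) and the functions `ClassifyOpcodeName` and `ValidateAddressKinds` are hypothetical, since the thread does not fix any naming.

```cpp
#include <string>
#include <vector>

// Which address size an opcode name implies, if any.
enum class AddressKind { kNone, k32, k64 };

// Hypothetical naming scheme: separate names for <4GiB and >4GiB accesses.
AddressKind ClassifyOpcodeName(const std::string& name) {
  if (name == "load_addr32" || name == "store_addr32") return AddressKind::k32;
  if (name == "load_addr64" || name == "store_addr64") return AddressKind::k64;
  return AddressKind::kNone;  // Opcode does not address linear memory.
}

// Accepts a module's opcode table only if every memory-access opcode uses
// the same address size. Loosening this later just means returning true
// in more cases; modules that pass today would keep passing.
bool ValidateAddressKinds(const std::vector<std::string>& opcode_table) {
  AddressKind seen = AddressKind::kNone;
  for (const std::string& name : opcode_table) {
    const AddressKind kind = ClassifyOpcodeName(name);
    if (kind == AddressKind::kNone) continue;
    if (seen == AddressKind::kNone) {
      seen = kind;  // First memory-access opcode fixes the module's kind.
    } else if (seen != kind) {
      return false;  // Mixed 32-bit and 64-bit address opcodes.
    }
  }
  return true;
}
```

Because the check runs once over the opcode table, the rest of the implementation can assume a single address size per module.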
Let's make sure dynamic linking also works. Sidenote: what's nice about separate ops is also that we can loosen the rules later if we want to (something that used to fail validation because it mixed accesses could now be made to pass).
I agree with JF in that separate bytecode leaves more options for the On Tue, Jul 7, 2015 at 10:21 PM, JF Bastien [email protected]
Ok, so you're saying that VMs may want to generate 32-bit code that only operates on data in the low 4GiB while the VM code uses the full 64-bit address space? I think I can see this use case. For MVP, I like the idea of starting with not allowing this mixing and, as @jfbastien said, considering loosening later on.
Ok, then it sounds like we have consensus for the hybrid approach: separate opcode names, but prohibit them from being used at the same time for now. Allowing both at the same time can be a Future Feature. I'll make a PR. And in the LLVM port we need to change the target triple again ;-)
This is now answered.
In #245, @titzer asked whether 64-bit addressing (for >4GiB linear memory) should be a mode or just separate opcodes.
In practice, C/C++ and similar code is going to require one form or the other. Several people have looked at address-size abstractions, but ultimately decided to just accept it as a mode at the C/C++ ABI level, so we're currently expecting to have two ABIs anyway.
Also, the utility of having a >4GiB heap with some code in the process that can only access the low 4GiB of it seems marginal, and the potential for confusion and mistakes under such circumstances is significant.
Having two modes at the platform level seems indicated. Does anyone have other opinions?