Memory w/ 64-bit indexes #1325


Closed
binji opened this issue Jan 30, 2020 · 17 comments

Comments

@binji
Member

binji commented Jan 30, 2020

Currently wasm only provides up to 4 GiB of linear memory. The JS API currently restricts this further to 2 GiB. As mentioned in that issue, some applications are already hitting the 2 GiB limit, and will almost certainly hit the 4 GiB limit soon after.

As an alternative to wasm64 (which is still very much vaporware), we can introduce 64-bit memories. (See previous discussion) We can extend the current format as follows:

  • The limits structure would now contain an index type, either i32 or i64. (This would also allow us to extend to tables w/ 64-bit indexes later.) If it is i64, then the min and max fields have the type u64.
  • We could choose to allow for a u64 offset in memarg. Perhaps this is not necessary though.
  • Limits validation is extended to 2**48 (since it is measured in 2**16-byte pages), and would require matching index types.
  • All memory instructions are now validated with their index type (i32 or i64).
  • Data segment offset expressions must validate to the index type of the memory.
  • Memory instances are extended to allow 64-bit vectors (the current vec has a maximum of 2**32).
  • Growing memory instances must be extended to 2**48.
  • Assuming we build on top of the bulk-memory-proposal, then in module instantiation the memory.init instructions will use i64 for 64-bit memories.
  • The limits binary encoding will be extended to allow 64-bit indexes. We'll want to use 0x4, since 0x2 is currently used for indicating shared memory.
  • If we allow u64 offsets in memarg, we can extend the binary format to be u64, then validate that they are in bounds later.
  • In the text format, we can mark a 64-bit memory by allowing an optional i64 before the min/max values of the limits.
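The limits-validation changes in the list above can be sketched roughly as follows. This is a toy illustration, not code from the spec; the function and constant names are invented for this sketch.

```python
# A rough sketch of limits validation with a per-memory index type, as
# described above. Names and structure are illustrative, not from the spec.

PAGE_SIZE = 2 ** 16  # wasm page size in bytes

# Maximum page counts per index type: 2**16 pages (4 GiB) for i32 memories,
# 2**48 pages for i64 memories (since limits are measured in 2**16-byte pages).
MAX_PAGES = {"i32": 2 ** 16, "i64": 2 ** 48}

def validate_limits(index_type, minimum, maximum=None):
    """Return True if {min, max} limits are valid for the given index type."""
    bound = MAX_PAGES[index_type]
    if minimum > bound:
        return False
    if maximum is not None and (maximum > bound or maximum < minimum):
        return False
    return True

assert validate_limits("i32", 1, 2 ** 16)
assert not validate_limits("i32", 2 ** 16 + 1)  # too many pages for i32
assert validate_limits("i64", 2 ** 16 + 1)      # fine with 64-bit indexes
assert not validate_limits("i64", 2, 1)         # max below min is invalid
```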

In the JS spec, we'll have to decide how to expose this new memory to JS. Is there a maximum ArrayBuffer size (or perhaps maximum array index)? If so, we may need to suggest allowing bigints as an index.

I think that covers everything...? It seems like it should be a pretty minor spec change. It also means we don't need to add any new instructions or sections. What does everyone think?

@kmiller68
Contributor

It would be good to see the performance difference between 64-bit memories and 32-bit memories. Browsers rely heavily on VM page faults + clamping to enforce bounds checks. IIRC, disabling this was roughly a 10-15% performance regression. Unless there's an optimization I haven't seen yet that fixes this problem.

This proposal, as is, forces engines to explicitly bounds check. I'd expect this to be a significant enough performance cliff to be a deal breaker for most apps...

@binji
Member Author

binji commented Jan 30, 2020

Good point. @titzer mentioned back in 2018 a potential trick that might help here:

I think there are better tricks available to the engine for WASM64. E.g. the engine can allocate a second guard region of > 1TiB and emit one additional memory access to guard[index >> 24] before the normal access of the memory, memory[index+base+offset]. The second guard region can be arranged so that any upper bits of the index being set results in an access to an unmapped page in the guard region.
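A toy model of the quoted trick may make it concrete: one extra load, guard[index >> 24], turns any 64-bit index with out-of-range high bits into a fault in the unmapped part of the guard region, so the main access needs no explicit branch. All parameters and the mapping scheme below are illustrative assumptions, not a real implementation.

```python
# Toy model of the guard-region trick: guard offsets below the mapped prefix
# succeed; anything beyond it stands in for a hardware page fault.

GUARD_SHIFT = 24
GUARD_SIZE = 2 ** 40  # > 1 TiB of reserved (mostly unmapped) address space

class Trap(Exception):
    pass

def guarded_access(mem_size, index):
    # Model: guard offsets below ceil(mem_size / 2**24) are mapped; the rest
    # of the 2**40-byte reservation is unmapped and faults when touched.
    mapped = (mem_size + (1 << GUARD_SHIFT) - 1) >> GUARD_SHIFT
    guard_off = index >> GUARD_SHIFT
    if guard_off >= mapped:       # hardware: page fault in the guard region
        raise Trap("guard fault")
    # ... followed by the normal access of memory[index + base + offset].
    return True

mem_size = 6 * 2 ** 30                  # a hypothetical 6 GiB 64-bit memory
assert guarded_access(mem_size, 5 * 2 ** 30)
try:
    guarded_access(mem_size, 1 << 40)   # high bits set -> guard fault
    assert False
except Trap:
    pass
```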

@aardappel

As an alternative to wasm64...

Can you describe what the main thing missing is in what you propose compared to a "full" wasm64? Given that this requires compilers to output code that stores/computes i64 address operands for loads/stores, doesn't this get us most of the way?

@binji
Member Author

binji commented Jan 30, 2020

can you describe what the main thing missing is in what you propose compared to a "full" wasm64

Perhaps this is what folks meant by wasm64. I always thought it meant changing all of the indexes to be 64-bit. So for example, you could have > 2**32 functions, a section could be > 2**32 bytes, etc. Perhaps no one mentioned this because it's not as useful, generally. That said, it did come up w.r.t. custom sections.

@aardappel

Ahh.. yes, most of these don't sound particularly urgent to me :)
Totally in favor of going with this minimal set of changes first.
Sounds like relaxing the size of sections could be done independently?

@backes
Member

backes commented Jan 31, 2020

Good point. @titzer mentioned back in 2018 a potential trick that might help here:

I think there are better tricks available to the engine for WASM64. E.g. the engine can allocate a second guard region of > 1TiB and emit one additional memory access to guard[index >> 24] before the normal access of the memory, memory[index+base+offset]. The second guard region can be arranged so that any upper bits of the index being set results in an access to an unmapped page in the guard region.

An alternative would be a multi-level memory lookup, similar to how page tables work on Linux. In a 64-bit address space, that would require at least three levels to keep the tables at each level at a manageable size. Two levels could still be implemented by only reserving (but not committing) the huge tables, but it would waste a lot of virtual address space.
Inaccessible regions can be memory-protected at each level (whole pages within each level), and pointers to inaccessible regions within an otherwise accessible page at a given level (in the last accessible page) would just point to a (global) inaccessible region.

That scheme would require two additional loads for each memory operation. I could imagine that still being faster than a dynamic branch.
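A two-level variant of that lookup can be sketched as follows. The index split, table layout, and names are assumptions for illustration; the key property is that unmapped regions resolve to a shared inaccessible entry, so the check is a fault rather than a branch.

```python
# Sketch of a page-table-style two-level lookup: each access does one extra
# load into a top-level table; missing entries model a protected page.

L2_BITS = 24                      # illustrative split of the index
L2_SIZE = 1 << L2_BITS

class Trap(Exception):
    pass

INACCESSIBLE = None               # stands in for a memory-protected entry

def build_tables(mem_size):
    # Map one top-level entry per 2**24-byte chunk of accessible memory.
    return {i: i << L2_BITS
            for i in range((mem_size + L2_SIZE - 1) >> L2_BITS)}

def access(top, index):
    # Load 1: top-level entry; load 2 (modelled): the chunk itself.
    entry = top.get(index >> L2_BITS, INACCESSIBLE)
    if entry is INACCESSIBLE:     # hardware: fault on the protected page
        raise Trap("inaccessible")
    return entry + (index & (L2_SIZE - 1))

top = build_tables(2 ** 32)       # a hypothetical 4 GiB 64-bit memory
assert access(top, 2 ** 31) == 2 ** 31
try:
    access(top, 2 ** 33)          # beyond the mapped chunks -> trap
    assert False
except Trap:
    pass
```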

@sunfishcode
Member

"wasm64" is indeed just a tool convention, and the main feature it depends on is linear memories with 64-bit indices.

For 64-bit section sizes or data initializer sizes, many programs wouldn't need them, and adding them in the future wouldn't break compatibility.

If we wanted tables with 64-bit indices for "wasm64" function pointers, that would be harder to add without breaking compatibility, so I opened WebAssembly/tool-conventions#136 to discuss function pointers.

@lukewagner
Member

The proposal makes sense and indeed coincides with what I was vaguely assuming "wasm64" meant: basically, making 32-vs-64 a memory-by-memory choice by having it be part of the memory type. This has the added benefit of, with multi-memory, allowing a single module to use both 32-bit and 64-bit memories, which could be necessary for various fancy uses of wasm.

@binji
Member Author

binji commented Feb 1, 2020

Cool, I'm glad this is generally what folks were thinking of. :-)

A few questions:

  • Does the memory type have to match exactly when importing memory? Is importing a 32-bit memory as 64-bit allowed? Is the reverse allowed, perhaps only allowing access to the lower 32-bit indexes? If so, how does memory.grow work?
  • Do we want to allow a u64 offset in the memarg? We probably do, but given that the offset is generally used for offsets into a structure, perhaps it's not necessary.
  • How do we want to handle effective address calculation? We currently require that the offset+index is calculated as a 33-bit sum (i.e. it doesn't wrap). Do we also require a 65-bit sum for offset+index, or is it OK to wrap?
  • How can we access this memory from JS? Does an ArrayBuffer index have to be 32-bit, or can we assume it will work up to MAX_SAFE_INTEGER (2^53-1)? Is it OK that we can't access indexes larger than that (8 PiB)?

@lars-t-hansen
Contributor

I think that for the time being, we should assume that the memory types have to match exactly. It's plausible that we will want to use different techniques to avoid OOB checks for 32-bit and 64-bit memories, and these techniques may require the memories to be allocated differently (e.g. with different buffer zones). I think that if we allow non-matching memory types we should justify it well.

ArrayBuffers are limited by the range of Index values, [0 .. 2^53-1]. It does not seem to me to be much of a practical problem for a JS embedding to limit a wasm memory to this size.

Would it be meaningful to allow 64-bit memories to be accessed with i32 addresses? Even on 64-bit architectures there's an advantage (instruction size, instruction count) to avoiding 64-bit data, and a toolchain that places static data in the low 4GB would benefit from the size and performance boost, modest though it may be.

@jakobkummerow

JS ArrayBuffers are spec'ed to be capped at MAX_SAFE_INTEGER (via ToLength being performed on the constructor arg).
Of course that doesn't imply that all implementations actually support that much. (V8 does.)

That said, current x86_64 platforms only support 2^48 bits of virtual address space, and only very few computers in existence today have more than 2^35 bytes (32GB) of physical memory [1]. I think it might make sense to define a limit lower than 2^64 for Wasm, to reflect this reality.

One could even go as far as starting with a max of 2^33, and expecting to bump it by one bit every few years as the need arises. Once the underlying types are spec'ed and implemented as i64/u64, bumping the limit should be trivially easy in terms of implementation effort. Or we could set it to, say, 2^48, and expect to leave it at that until that value becomes a limiting factor (= a couple of decades, probably?).
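For reference, the arithmetic behind the sizes discussed above:

```python
# Quick arithmetic on the candidate limits (values taken from the comments).
GiB = 2 ** 30
assert 2 ** 33 == 8 * GiB             # a conservative initial cap: 8 GiB
assert 2 ** 35 == 32 * GiB            # "more than 32GB ... rare"
assert 2 ** 48 == 256 * 2 ** 40       # 2^48 bytes = 256 TiB of address space
assert 2 ** 48 // 2 ** 16 == 2 ** 32  # 2^48 bytes = 2^32 wasm pages
```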

[1] https://store.steampowered.com/hwsurvey/Steam-Hardware-Software-Survey-Welcome-to-Steam says "6.8% have more than 16GB", and doesn't even bother with a "more than 32GB" bucket. I know that >32GB machines exist (I'm typing this on one), but I also know that they are rare.

@aardappel

Would it be meaningful to allow 64-bit memories to be accessed with i32 addresses? Even on 64-bit architectures there's an advantage (instruction size, instruction count) to avoiding 64-bit data, and a toolchain that places static data in the low 4GB would benefit from the size and performance boost, modest though it may be.

That actually sounds like a lot of complexity for a relatively small gain. Static addresses in an i32.const are already a LEB, so they are equally small for small static data on 32 and 64-bit (assuming the expanded LEBs needed for relocs have been undone). Pointers to static data inside other static data (e.g. a static array of static strings) could use 32-bit pointers, but now LLVM needs to carry these mixed pointer sizes around.. and then it somehow needs to be undone if during linking static data goes >4GB? Maybe LTO-only? Sounds tricky to me.
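The LEB point above is easy to verify: LEB128 encoding sizes depend only on the magnitude of the value, not on the declared pointer width. A minimal unsigned-LEB encoder (wasm const immediates are actually signed LEB, but the size behavior is the same):

```python
# Minimal unsigned LEB128 encoder: 7 value bits per byte, high bit set on
# all bytes except the last.

def uleb128(value):
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)

assert len(uleb128(100)) == 1        # a small static address: 1 byte
assert len(uleb128(2 ** 20)) == 3    # 1 MiB: 3 bytes, on wasm32 or wasm64
assert len(uleb128(2 ** 33)) == 5    # only genuinely large addresses pay more
```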

@billti

billti commented Feb 6, 2020

Per @jakobkummerow's comments: there has to be a practical limit that is quite a bit lower. Commonly on Windows x64 (and several other common platforms, I believe), the user-space limit on memory is 8TB (i.e. 43 bits, from 0x000'00000000 through 0x7FF'FFFFFFFF - see https://docs.microsoft.com/en-us/windows-hardware/drivers/gettingstarted/virtual-address-spaces). However, you obviously have to share the space with the actual process hosting the Wasm engine.

As Wasm expects a "linear" address space, a range is typically reserved up front, but not committed, then gradually committed as the memory grows. Being that you already have binaries, stack-space, heap, etc... mapped somewhere into that 8TB range, you're not going to be able to reserve anywhere near that amount generally - and that's ignoring the fact you might have multiple memories (or VM contexts/isolates in a browser) in the same process.

So being that any "growable" memory needs to reserve a range up front to be able to grow into, each engine is going to need to pick a practical limit well below 8TB if the standard doesn't specify one. (Unless of course they choose to NOT reserve a maximum amount up front, but attempt to reallocate a new range and copy all memory if they outgrow a range, but that would be very slow, and you still have the 8TB cap on total possible user-mode memory to deal with).

EDIT: Minor correction: Since Windows 8.1 there is ~128TB of user-mode address space available on x64 (i.e. 47 bits). But the meta-point of how much "linear space" to allow/reserve per Wasm memory on Wasm64 still stands.

@binji
Member Author

binji commented Feb 6, 2020

Fortunately, this proposal doesn't change the current wasm behavior of memory.grow. So if an implementation can't grow memory, it is still fine to return -1 (which would be an i64 now). Agreed that we'll want a JS API limit to encourage compatibility, which initially might be very low (2^33, say, as @jakobkummerow suggests) and which we can gradually increase.

I'm curious if anyone has thoughts about this point:

How do we want to handle effective address calculation? We currently require that the offset+index is calculated as a 33-bit sum (i.e. it doesn't wrap). Do we also require a 65-bit sum for offset+index, or is it OK to wrap?

In particular, we can currently calculate the effective address (i32 offset + i32 index) by using 64-bit arithmetic on most machines. If we have an i64 offset and index, then the sum is 65 bits, which can't be easily calculated AFAIK. It seems this would make a trap-handling solution much more difficult. Any ideas?
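To make the concern above concrete, here is a small sketch (values invented for illustration) of why the plain 64-bit add that works for wasm32 effective addresses is no longer safe with 64-bit operands:

```python
# With 32-bit index and offset, the true sum fits in 33 bits, so engines can
# compute it exactly in 64-bit arithmetic. With 64-bit operands the true sum
# needs 65 bits, so a wrapped 64-bit add can silently alias a low address
# instead of trapping.

MASK64 = 2 ** 64 - 1

def ea_wasm32(index, offset):
    # Exact 33-bit sum; never wraps in 64-bit arithmetic.
    return index + offset              # both < 2**32, sum < 2**33

def ea_wasm64_wrapping(index, offset):
    # What a plain 64-bit hardware add would produce.
    return (index + offset) & MASK64

index = 2 ** 64 - 8                    # a huge, clearly out-of-bounds index
offset = 16
assert ea_wasm64_wrapping(index, offset) == 8  # wraps to a *small* address!
assert index + offset > MASK64         # the true 65-bit sum is out of range
```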

@lars-t-hansen
Contributor

@aardappel, I see the complexity for current toolchains. And it's conservative not to allow 32-bit indices for 64-bit memories initially. Perhaps something to investigate empirically further in the future, if other types of toolchains and wasm binaries appear.

sunfishcode added a commit to bytecodealliance/target-lexicon that referenced this issue Feb 27, 2020
wasm64 is an early prototyping phase, but the name of the target is
sufficiently stable that we can add it here.

See also WebAssembly/design#1325.
@lukewagner
Member

A module importing a 32-bit memory may want to assume the memory has guard pages and thus the module can elide bounds checks. However, 64-bit memories generally cannot use the guard page technique and so they may not want to have any guard pages at all. You could imagine giving 64-bit memories guard pages when the memories were smaller than the guard page size, but this seems an unfortunate impl requirement. So +1 for not having 64-bit memories satisfy 32-bit memory imports.

@binji
Member Author

binji commented Mar 3, 2020

I created a new repo for this: https://github.com/WebAssembly/memory64. Let's continue the discussion there.

@binji binji closed this as completed Mar 3, 2020
sunfishcode added a commit to bytecodealliance/target-lexicon that referenced this issue Mar 5, 2020
wasm64 is an early prototyping phase, but the name of the target is
sufficiently stable that we can add it here.

See also WebAssembly/design#1325.
9 participants