Loads, stores, memory types, and conversions #326


Closed
rossberg opened this issue Sep 2, 2015 · 15 comments

@rossberg
Member

rossberg commented Sep 2, 2015

The current AstSemantics is somewhat baroque and inconsistent regarding loads and stores:

  • Both loads and stores specify the type twice, once as the local type and once as the memory type. This is redundant, because the two can only differ in size (see the example after this list).
  • Loads and stores duplicate the semantics of various conversion operators. This inflates the number of opcodes and is redundant, because most combinations can already be expressed by composing a plain access with a suitable conversion (and the cases where these combinations are needed are probably not very common either).
  • At the same time, some combinations are missing. See e.g. issue Extending FP loads, truncating FP stores #316.
  • The load instruction for globals does not allow choosing sign-extension behaviour.
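
To make the first two points concrete: in a notation along the lines of the one quoted later in this thread (the exact spellings are illustrative only), a sign-extending int8-to-int64 load annotates both the local and the memory type, and duplicates what a byte load composed with an existing conversion operator already expresses:

int64.load_signed[int8]      ;; local type (int64) and memory type (int8) both spelled out

int32.load_signed[int8]      ;; the same memory access...
int64.extend_signed[int32]   ;; ...followed by an existing conversion operator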

The way I see it, there would be two possible approaches for a more consistent design:

1. Loads and stores with memory types, but minimal set of conversions

In this approach,

  • MemoryType is an extension of LocalType with additional small sizes.
  • Globals have a memory type.
  • Memory loads and stores have a memory type.
  • The only conversions built into loads and stores are for memory types that are not local types. Accesses at these types always convert to/from the smallest possible local type.
  • Thus, the local type is always uniquely determined by the memory type and doesn't need an extra annotation.

That is, there would be opcodes

int8.load_s/u    ;; results in an int32
int16.load_s/u   ;; results in an int32
int32.load
int64.load
float32.load
float64.load
load_global_s/u  ;; for globals of type int8/16, results in int32
load_global      ;; for all others

int8.store       ;; takes an int32 and truncates
int16.store      ;; takes an int32 and truncates
int32.store
int64.store
float32.store
float64.store
store_global     ;; truncates for globals of type int8/16

In the (presumably rarer) case that other combinations are needed, the respective conversion operators are readily available.
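
As a sketch of such a composition (mixing the opcode names from this approach with the conversion naming used elsewhere in this thread, so treat the spellings as illustrative), loading an int8 from memory into an int64 would pair the int8 load with an explicit widening conversion:

int8.load_s                  ;; int8 in memory, results in an int32
int64.extend_signed[int32]   ;; widen the int32 result to an int64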

2. Loads and stores with local types + memory sizes, all conversions built-in

This is a variation of what was discussed in #82.

In this approach

  • There are no memory types.
  • Globals are declared with a local type and a size.
  • Memory loads and stores have a local type and a size.
  • In both cases, the size has to be smaller than or equal to the local type's natural size.
  • The access performs all necessary extensions or truncations.

The opcodes would be

int32.load8_s/u
int32.load16_s/u
int32.load32
int64.load8_s/u
int64.load16_s/u
int64.load32_s/u
int64.load64
float32.load32
float64.load32
float64.load64
load_global_s/u   ;; if the global's size is smaller than its type's natural size
load_global       ;; for all others

int32.store8
int32.store16
int32.store32
int64.store8
int64.store16
int64.store32
int64.store64
float32.store32
float64.store32
float64.store64
store_global

This scheme requires more opcodes, but saves extra uses of conversion operators.
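
For contrast with the sketch under approach 1, the same int8-to-int64 access is a single built-in op here, since all extensions are folded into the access itself:

int64.load8_s    ;; int8 in memory, sign-extended directly to an int64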

Additional types

In both schemes it would be easy to add support for additional types, e.g. a float16 memory type. In approach 1, it would introduce opcodes

float16.load    ;; extends to float32
float16.store   ;; truncates a float32

In approach 2:

float32.load16
float64.load16
float32.store16
float64.store16

Comparison

Approach 1 needs fewer opcodes (especially when adding more types) and has less redundancy with conversion operators. Approach 2 is a bit more explicit about the sizes and closer to what is currently in AstSemantics. Personally, I think approach 1 is more attractive and less bloated.

@AndrewScheidecker

A variation on approach 2 would be most consistent with the conversion operators as they are presently described in AstSemantics:

int32.load_signed[int8]
int64.extend_signed[int32]

The convention used by the conversion operators is output.convert[input], so if we go with approach 1 that should be changed to input.convert[output].

@jfbastien
Member

Overall, I like solution 1 that you propose. A few issues, though:

  • I see two problems with float16.load implicitly widening to float32:
    1. We can't have them widen to float64 directly (see the sketch after this list).
    2. We can't later decide to add a float16 local type cleanly. That type is very useful for low-precision computations, especially for a progressive refinement algorithm that then uses float32 or float64 after getting close enough to a solution. Real hardware has support for float16, so this isn't theoretical.
  • The same applies to int8 and int16, but I think it's unlikely we'll find a reason to add them as local types in the future. It does seem like we may want to be able to extend them to int64, though?
  • It's worth discussing whether we'll ever want wider integer types, and making sure the design can handle the addition of int128.
  • Global variables and atomics #314 discussed a few possibilities for globals, including making them array-able. I still like this idea.
  • As in Global variables and atomics #314, we should make sure this proposal will work when we add atomics, both for the heap as well as for globals. The plan of record (POR) is to do something akin to the proposal for JavaScript.
  • This proposal doesn't discuss the indexing capabilities of load/store. Is that intentional? It's somewhat orthogonal to your change, but I want to make sure that you agree/disagree on keeping indexing.
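
To illustrate the first float16 concern (the opcode names extrapolate from approach 1 and the promote/demote conversions mentioned later in this thread, so they are illustrative only): if float16.load always widens to float32, reaching float64 takes an extra conversion, and that baked-in widening would sit awkwardly next to a float16 local type added later.

float16.load               ;; under approach 1, always produces a float32
float64.promote[float32]   ;; extra step needed to obtain a float64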

@lukewagner
Member

IIUC, the differences between Approach 2 and what is currently in AstSemantics.md are:

  • syntax: int32.load_sx[int8] => int32.load8_s
  • added float32-to-float64 loads and float64-to-float32 stores.

and the point of the second bullet is that otherwise we're being asymmetric between int and float types. That seems fair.

I also like proposal 1 better for the reasons you gave. For the "far" conversions (int8-to-int64, float16-to-float64), I agree it makes sense to "require using an explicit conversion". Beefy compilers should have no problem optimizing; dumb compilers can pattern-match if necessary (which seems unlikely). Adding int8/int16 as local types would break the regularity of Proposal 1, but I don't see that happening.

Agreed with @AndrewScheidecker that, even if it's technically redundant, for symmetry with other op names, the result local type should prefix the opcode's name. I'd suggest you use the syntax in Proposal 2, e.g., int32.load8_s. That's just the opcode name, though; the Load/Store AST nodes could exclusively contain the memory_type.

Lastly, both proposals imply adding int8/int16 globals. I'm not necessarily opposed (though I don't really see the advantage, assuming apps don't have millions of globals), but this doesn't seem relevant to the load/store+conversion discussion so perhaps we could have that separately.

@jfbastien
Member

@lukewagner
Member

Ah, I hadn't noticed that; I had inferred from recent discussion that they had local type, sorry.

@sunfishcode added this to the MVP milestone Sep 3, 2015
@lukewagner
Member

Ok, starting with solution 1, ignoring globals (which I hope we can remove from MVP anyway (#154)), and applying the convention of prefixing with result type (mentioned above) produces this set of load/store ops.

int32.load8_s/u
int32.load16_s/u
int32.load
int64.load
float32.load
float64.load
int32.store8
int32.store16
int32.store
int64.store
float32.store
float64.store

Look good to everyone (incl. @sunfishcode, @titzer)? I like that this achieves a reasonable minimalism: to remove the 8->32 and 16->32 coercive ops, we'd have to introduce int8/int16 local types, which is more work.

@jfbastien
Member

@lukewagner this mostly lgtm, except:

  • We should discuss the f16 issue I mention above.
  • This should also mention the alignment and indexing that load/store can have.
  • We should agree that atomics are a simple addition to this proposal.

@titzer

titzer commented Sep 11, 2015

lgtm


@kg
Contributor

kg commented Sep 11, 2015

lgtm

@lukewagner
Member

@jfbastien

  • I thought there was some agreement that we'd likely want both f16 loads and float16 local types at the same time, so the addition would just be float16.load/float16.store; no coercions.
  • Alignment seems out of scope for this issue's narrow focus on naming and conversions. The PR can explicitly mention that there is an optional /n suffix to specify n-byte alignment, though. I'm not sure what "indexing" means, but I think it would be out of scope too.
  • I also think atomics are out of scope.

Let's just look at the PR which I'll write in a bit, which should just be a minimal change to the Linear Memory section.

@jfbastien
Member

@lukewagner it's not clear that we'll want f16 memory and local types on all architectures: some architectures only support promote/demote, whereas others support full arithmetic. With feature testing, that means that in the former case the only valid uses would be load+promote / demote+store, assuming there's an f16 local type that can only be loaded/stored/promoted/demoted. Another option would be to not have a local type for f16 and fold the promote/demote into the load/store (akin to i8/i16).

I think you're suggesting we go with the latter?

Indexing: I mean base+offset on load/store.

I'm asking about these other issues because @rossberg-chromium wants to revisit load/store. I want to make sure we do so for real! I'm OK with these, and want to make sure we're keeping them knowingly.

Assuming all of the above, the corresponding PR would lgtm.

@lukewagner
Member

@jfbastien I'd like to keep the specific topic that @rossberg-chromium filed this issue about, and that we discussed and agreed on, separate from float16 and indexing.

@jfbastien
Member

Regarding f16, I just want to avoid painting ourselves into a corner with this change! But it sounds like we agree on what we would do, so all is good.

@ghost

ghost commented Sep 11, 2015

The 'indexing' in these memory access operations might need some more elaboration. When bounds checking is considered, it can become important to distinguish whether the pre-offset value is expected to be positive (for performance reasons).

For example, some code generates a signed index (say, as the natural result of a computation), then adds an offset to make its range positive, and then looks up this index in a table which might be at a fixed address. Wasm has no concept of pointers, so it does not know whether an added offset is applied to a potentially signed index or to a pointer that is expected to be positive. It could hurt performance if a wasm code generator folded these two offsets together, and perhaps the load/store index could be defined as expecting a positive source argument; otherwise a slow path might be taken in bounds checking.

@lukewagner
Member

I think this issue is resolved now.
