Skip to content

BinaryEncoding.md changes implied by #682 #727

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 15 commits into from

Conversation

lukewagner
Copy link
Member

@lukewagner lukewagner commented Jul 18, 2016

This PR makes the binary format changes to import, export, memory, table, global, code, data and element sections as implied by #682. The PR is currently a branch off master but I'm expecting to rebase it against whatever binary_0x* branch we choose when we're ready to merge.

The existing s-expr text format can produce this new binary format, but new s-expr forms are necessary to activate the new functionality of #682:

  • (import "m" "f" memory <min> <max?>) / (import "m" "f" table <min> <max?>) to import (the default) module/table
  • (export "x" memory)/(export "y" table) to export (the default) module/table
  • (elem <off> x y z) to specify an element segment that writes functions x, y and z into the default table with offset specified by the initializer s-expr
  • (data <off> "data") to specify a data segment that writes data into the default memory with offset specified by the initializer s-expr
  • (table x y z) would now produce both a table section and an elem section
  • some new syntax is needed to define a table with initial/maximum (without elements); for SM prototype/testing purposes I used (table (resizable initial maximum)), but happy to have alternatives

This PR makes one change to Modules.md: the elements of the element section are changed to simply be indices (into the appropriate index space); the per-element type was redundant.

@lukewagner lukewagner added this to the MVP milestone Jul 19, 2016
Note that the initial/maximum fields are specified in units of
[WebAssembly pages](AstSemantics.md#linear-memory).

### Global section
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you indicate in the High Level structure section where this section is located? (and add a link to this anchor)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that in this patch, the Global section is placed after the Import section. I assume there is a single Global index space; so this means that in the index space, all the imported global indexes will be strictly less than the first locally defined global, is that right? Independently of the final choice (imported global being before or after the locally defined globals), it might be worth telling something about it here or somewhere else (Modules.md), maybe?

Actually, if call_import is removed, function calls also share the functions index space and the same precision could be useful too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's right; that's how it works for all the importable definitions (functions, memories, tables, globals) as well as arguments+locals. This is documented in Modules.md in the "X index space" sections which I've tried to link to, where appropriate, from this doc. But I could try to add a few more links.

@rossberg
Copy link
Member

On 19 July 2016 at 00:33, Luke Wagner [email protected] wrote:

The existing s-expr text format can produce this new binary format, but
new s-expr forms are necessary to activate the new functionality of #682
#682:

  • (import "m" "f" memory) / (import "m" "f" table) to import (the
    default) module/table
  • (export "x" memory)/(export "y" table) to export (the default)
    module/table
  • (elem off x y z) to initialize functions x, y and z int the default
    table at offset
  • (table x y z) would now produce both a table section and an elem
    section
  • some new syntax is needed to define a table with initial/maximum
    (without elements); for SM prototype/testing purposes I used (table
    (resizable initial maximum)), but happy to have alternatives

I was just starting to implement an S-expression format for this the other
day. My thoughts so far:

  • Table syntax and memory syntax should be symmetric. So table should
    become (table <min> <max>? <table_segment>*).
  • Correspondingly, an element segment has the form (segment <offset> <var>*), where is an S-expr now (same generalisation for memory
    segments).
  • We may keep the current table syntax as a shorthand.
  • But in fact it may make sense to bring the S-expression format closer to
    the current section format, i.e., pull out the segments from memory and
    table into their own data and elem declarations.
  • Import/export memory/table syntax like you propose, but imports also need
    to include the limits. In accordance with the above, I would simply suggest
    (import "a" "b" table <min> <max>?), similarly for memory.
  • We should make the syntax future proof wrt to multi-tables/memories.
    Allowing identifier binders with their definitions and imports for that
    purpose seems straightforward. For the default, maybe it's easiest to
    introduce separate declarations (default table <var>) and (default memory <var>). These are only required if there is more than one of those
    (but then you can't represent a program with just one table but no default,
    hm...).

Come to think about it, would it make sense to declare defaults in their
own "default" section even in the binary format? It's a bit awkward to
mangle and distribute that flag in several possible places, because it's
kind of orthogonal. In particular, it's a bit weird that import specs have
this flag, since it has no relevance to the import semantics itself.

@lukewagner
Copy link
Member Author

  • So table should become (table <min> <max>? <table_segment>*).
  • We may keep the current table syntax as a shorthand.

I think these two are in conflict since you won't be able to tell whether (table 1 2) is a table with min/max 1/2 or elements 1 and 2. (That's why I put in its own nested expression.)

But in fact it may make sense to bring the S-expression format closer to the current section format, i.e., pull out the segments from memory and table into their own data and elem declarations.

Agreed; we actually need to do this so that we can initialize imported table/memory. (This is what we've done in SM's proto s-expr format which I forgot to mention above.)

Import/export memory/table syntax like you propose, but imports also need to include the limits. In accordance with the above, I would simply suggest (import "a" "b" table <min> <max>?), similarly for memory.

Oops, yes, that's what we've implemented in our s-expr proto, I forgot to mention it above; I'll update it.

We should make the syntax future proof wrt to multi-tables/memories.

When we allow multiple tables/memories (which I think will necessarily block on GC, so it'll be a while, and by then I'm thinking we'll have a non-s-expr text format), I was thinking we could introduce it in a backwards-compatible way by saying: if you specify a name in the import/definition of a table/memory, then it's non-default. Then to refer to a non-default memory/table you write (memory $foo) instead of memory.

Come to think about it, would it make sense to declare defaults in their own "default" section even in the binary format? It's a bit awkward to mangle and distribute that flag in several possible places, because it's kind of orthogonal.

I kinda see the abstract point, but I don't see any practical downside to putting it on the import/def or practical upside to having a separate "defaults" section. It might actually be a hair more complicated since it produces a window, in the decoding process, where you have a memory/table but don't yet know if it's default or not. Until we allow non-default memories/tables it is definitely more work (b/c atm you trivially just check that the bit is set),

@rossberg
Copy link
Member

On 19 July 2016 at 17:40, Luke Wagner [email protected] wrote:

So table should become (table ? <table_segment>*).

We may keep the current table syntax as a shorthand.

I think these two are in conflict since you won't be able to tell whether (table
1 2) is a table with min/max 1/2 or elements 1 and 2. (That's why I put
in its own nested expression.)

Aw right. In that case, I propose removing the current syntax, given that
it no longer matches the binary structure. We don't need to optimise this
construct for terseness.

We should make the syntax future proof wrt to multi-tables/memories.

When we allow multiple tables/memories (which I think will necessarily
block on GC, so it'll be a while, and by then I'm thinking we'll have a
non-s-expr text format), I was thinking we could introduce it in a
backwards-compatible way by saying: if you specify a name in the
import/definition of a table/memory, then it's non-default. Then to refer
to a non-default memory/table you write (memory $foo) instead of memory.

Yeah, that has the disadvantage that you cannot represent a case that
actually exists, and for which you'd probably like to write tests.

Come to think about it, would it make sense to declare defaults in their
own "default" section even in the binary format? It's a bit awkward to
mangle and distribute that flag in several possible places, because it's
kind of orthogonal.

I kinda see the abstract point, but I don't see any practical downside to
putting it on the import/def or practical upside to having a separate
"defaults" section. It might actually be a hair more complicated since it
produces a window, in the decoding process, where you have a memory/table
but don't yet know if it's default or not. Until we allow non-default
memories/tables it is definitely more work (b/c atm you trivially just
check that the bit is set),

Actually, to the contrary: it would completely avoid the need for
introducing any default flags/decls now. Even later you'd only require the
default section for "disambiguation" in modules that actually define/import
multiple tables/memories -- if a module has only one, that is automatically
taken as the default. Wouldn't that factor things more nicely?

It's a bit weird right now that the default flag is part of the
resizable_limits, although it has nothing to do with them -- it's rather a
coincidence that only resizable things have defaults. With the above, the
flag field could be removed from the limits for good.

(in the MVP, *every* memory/table import/definition must be flagged as default)
* `0x2` : indicates that the memory/table has a specified maximum

### init_expr
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, are all uses of init_expr evaluated in order? What happens if a get_global in one of them refers to a not-yet-initialised global?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should also state the requirement here but Modules.md#initializer-expression requires that the get_global only refer to an immutable import which will necessarily precede all the uses.

@lukewagner
Copy link
Member Author

lukewagner commented Jul 19, 2016

Aw right. In that case, I propose removing the current syntax

sgtm. The current hack in SM is just to allow existing .wasts to work unchanged (so it's easy to test on existing demo builds), but I'm happy to unhack.

Actually, to the contrary: it would completely avoid the need for introducing any default flags/decls now.

Oh, then I guess I misunderstood what you were proposing; so you're just saying that, when there's only 1, it's the default by, well, default. Then post-MVP, when we support >1, we can decide how to pick which is default. That sounds good to me and keeps our options open.

@rossberg
Copy link
Member

Btw, shouldn't this be a PR against the binary_0xc (or binary_0xd) branch?

@lukewagner
Copy link
Member Author

Right, some binary_0x* branch; I'm not expecting to merge this into master. As I noted in the OP, I'm waiting to see what people want to do for either folding this into binary_0xc or waiting until binary_0xd.

@rossberg
Copy link
Member

@lukewagner, actually, shouldn't the same have applied to #682? I just realised it got merged into master. ;)

@lukewagner
Copy link
Member Author

@rossberg-chromium Well, the nascent protocol we've been using is to only use the binary_0x* branch for BinaryEncoding.md changes (hence "binary_" :) for the purpose of coordinating landing the breaking binary format changes in engines. Post-MVP, breaking changes will cease, so probably we should shift to a more regular "everything related to a proposed feature is on a branch until the whole thing merges".

| `call` | `0x16` | argument_count : `varuint1`, function_index : `varuint32` | call a function by its index |
| `get_global` | `0x18` | global_index : `varuint32` | read a global variable |
| `set_global` | `0x19` | global_index : `varuint32` | write a global variable |
| `call` | `0x16` | argument_count : `varuint1`, function_index : `varuint32` | call a function by its [index](Modules.md#function-index-space) |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thought I had commented on that already, but can't find the comment anymore: fyi, this conflicts with #711, will have to resolve somehow.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we're going to defragment this opcode space anyway, I'll change these codes.

@flagxor
Copy link
Member

flagxor commented Jul 27, 2016

Luke this seems good, can you resolve conflicts and merge?

sunfishcode and others added 3 commits July 27, 2016 12:25
…686)

* Clarify that wasm may be viewed as either an AST or a stack machine.

* Reword the introductory paragraph.

* Add parens, remove "typed".
Make opcode 0x00 `unreachable`, and move `nop` to a non-zero opcode.

All-zeros is one of the more common patterns of corrupted data. This
change makes it more likely that code that is accidentally zeroed, in
whole or in part, will be noticed when executed rather than silently
running through a nop slide.

Obviously, this doesn't matter when an opcode table is present, but
if there is a default opcode table, it would presumably use the
opcodes defined here.
@lukewagner lukewagner force-pushed the table-memory-binary-changes branch from 0419cc1 to fd4268f Compare July 27, 2016 19:31
@lukewagner
Copy link
Member Author

Righto. Merged to binary_0xc (not master, as discussed above). (GH doesn't let me simply retarget this PR, so I'll just close this now.)

@lukewagner lukewagner closed this Jul 27, 2016
@lukewagner lukewagner deleted the table-memory-binary-changes branch July 27, 2016 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants