BinaryEncoding.md changes implied by #682 #727

lukewagner · 2016-07-18T22:33:44Z

This PR makes the binary format changes to import, export, memory, table, global, code, data and element sections as implied by #682. The PR is currently a branch off master but I'm expecting to rebase it against whatever binary_0x* branch we choose when we're ready to merge.

The existing s-expr text format can produce this new binary format, but new s-expr forms are necessary to activate the new functionality of #682:

(import "m" "f" memory <min> <max?>) / (import "m" "f" table <min> <max?>) to import (the default) module/table
(export "x" memory)/(export "y" table) to export (the default) module/table
(elem <off> x y z) to specify an element segment that writes functions x, y and z into the default table with offset specified by the initializer s-expr
(data <off> "data") to specify a data segment that writes data into the default memory with offset specified by the initializer s-expr
(table x y z) would now produce both a table section and an elem section
some new syntax is needed to define a table with initial/maximum (without elements); for SM prototype/testing purposes I used (table (resizable initial maximum)), but happy to have alternatives

This PR makes one change to Modules.md: the elements of the element section are changed to simply be indices (into the appropriate index space); the per-element type was redundant.

bnjbvr · 2016-07-19T09:34:22Z

BinaryEncoding.md

+Note that the initial/maximum fields are specified in units of 
+[WebAssembly pages](AstSemantics.md#linear-memory).
+
+### Global section


Can you indicate in the High Level structure section where this section is located? (and add a link to this anchor)

It appears that in this patch, the Global section is placed after the Import section. I assume there is a single Global index space; so this means that in the index space, all the imported global indexes will be strictly less than the first locally defined global, is that right? Independently of the final choice (imported global being before or after the locally defined globals), it might be worth telling something about it here or somewhere else (Modules.md), maybe?

Actually, if call_import is removed, function calls also share the functions index space and the same precision could be useful too.

That's right; that's how it works for all the importable definitions (functions, memories, tables, globals) as well as arguments+locals. This is documented in Modules.md in the "X index space" sections which I've tried to link to, where appropriate, from this doc. But I could try to add a few more links.

rossberg · 2016-07-19T12:06:06Z

On 19 July 2016 at 00:33, Luke Wagner [email protected] wrote:

The existing s-expr text format can produce this new binary format, but
new s-expr forms are necessary to activate the new functionality of #682
#682:

(import "m" "f" memory) / (import "m" "f" table) to import (the
default) module/table

(export "x" memory)/(export "y" table) to export (the default)
module/table

(elem off x y z) to initialize functions x, y and z int the default
table at offset

(table x y z) would now produce both a table section and an elem
section

some new syntax is needed to define a table with initial/maximum
(without elements); for SM prototype/testing purposes I used (table
(resizable initial maximum)), but happy to have alternatives

I was just starting to implement an S-expression format for this the other
day. My thoughts so far:

Table syntax and memory syntax should be symmetric. So table should
become (table <min> <max>? <table_segment>*).
Correspondingly, an element segment has the form (segment <offset> <var>*), where is an S-expr now (same generalisation for memory
segments).
We may keep the current table syntax as a shorthand.
But in fact it may make sense to bring the S-expression format closer to
the current section format, i.e., pull out the segments from memory and
table into their own data and elem declarations.
Import/export memory/table syntax like you propose, but imports also need
to include the limits. In accordance with the above, I would simply suggest
(import "a" "b" table <min> <max>?), similarly for memory.
We should make the syntax future proof wrt to multi-tables/memories.
Allowing identifier binders with their definitions and imports for that
purpose seems straightforward. For the default, maybe it's easiest to
introduce separate declarations (default table <var>) and (default memory <var>). These are only required if there is more than one of those
(but then you can't represent a program with just one table but no default,
hm...).

Come to think about it, would it make sense to declare defaults in their
own "default" section even in the binary format? It's a bit awkward to
mangle and distribute that flag in several possible places, because it's
kind of orthogonal. In particular, it's a bit weird that import specs have
this flag, since it has no relevance to the import semantics itself.

lukewagner · 2016-07-19T15:39:47Z

So table should become (table <min> <max>? <table_segment>*).

We may keep the current table syntax as a shorthand.

I think these two are in conflict since you won't be able to tell whether (table 1 2) is a table with min/max 1/2 or elements 1 and 2. (That's why I put in its own nested expression.)

But in fact it may make sense to bring the S-expression format closer to the current section format, i.e., pull out the segments from memory and table into their own data and elem declarations.

Agreed; we actually need to do this so that we can initialize imported table/memory. (This is what we've done in SM's proto s-expr format which I forgot to mention above.)

Import/export memory/table syntax like you propose, but imports also need to include the limits. In accordance with the above, I would simply suggest (import "a" "b" table <min> <max>?), similarly for memory.

Oops, yes, that's what we've implemented in our s-expr proto, I forgot to mention it above; I'll update it.

We should make the syntax future proof wrt to multi-tables/memories.

When we allow multiple tables/memories (which I think will necessarily block on GC, so it'll be a while, and by then I'm thinking we'll have a non-s-expr text format), I was thinking we could introduce it in a backwards-compatible way by saying: if you specify a name in the import/definition of a table/memory, then it's non-default. Then to refer to a non-default memory/table you write (memory $foo) instead of memory.

Come to think about it, would it make sense to declare defaults in their own "default" section even in the binary format? It's a bit awkward to mangle and distribute that flag in several possible places, because it's kind of orthogonal.

I kinda see the abstract point, but I don't see any practical downside to putting it on the import/def or practical upside to having a separate "defaults" section. It might actually be a hair more complicated since it produces a window, in the decoding process, where you have a memory/table but don't yet know if it's default or not. Until we allow non-default memories/tables it is definitely more work (b/c atm you trivially just check that the bit is set),

rossberg · 2016-07-19T16:21:02Z

On 19 July 2016 at 17:40, Luke Wagner [email protected] wrote:

So table should become (table ? <table_segment>*).

We may keep the current table syntax as a shorthand.

I think these two are in conflict since you won't be able to tell whether (table
1 2) is a table with min/max 1/2 or elements 1 and 2. (That's why I put
in its own nested expression.)

Aw right. In that case, I propose removing the current syntax, given that
it no longer matches the binary structure. We don't need to optimise this
construct for terseness.

We should make the syntax future proof wrt to multi-tables/memories.

When we allow multiple tables/memories (which I think will necessarily
block on GC, so it'll be a while, and by then I'm thinking we'll have a
non-s-expr text format), I was thinking we could introduce it in a
backwards-compatible way by saying: if you specify a name in the
import/definition of a table/memory, then it's non-default. Then to refer
to a non-default memory/table you write (memory $foo) instead of memory.

Yeah, that has the disadvantage that you cannot represent a case that
actually exists, and for which you'd probably like to write tests.

Come to think about it, would it make sense to declare defaults in their
own "default" section even in the binary format? It's a bit awkward to
mangle and distribute that flag in several possible places, because it's
kind of orthogonal.

I kinda see the abstract point, but I don't see any practical downside to
putting it on the import/def or practical upside to having a separate
"defaults" section. It might actually be a hair more complicated since it
produces a window, in the decoding process, where you have a memory/table
but don't yet know if it's default or not. Until we allow non-default
memories/tables it is definitely more work (b/c atm you trivially just
check that the bit is set),

Actually, to the contrary: it would completely avoid the need for
introducing any default flags/decls now. Even later you'd only require the
default section for "disambiguation" in modules that actually define/import
multiple tables/memories -- if a module has only one, that is automatically
taken as the default. Wouldn't that factor things more nicely?

It's a bit weird right now that the default flag is part of the
resizable_limits, although it has nothing to do with them -- it's rather a
coincidence that only resizable things have defaults. With the above, the
flag field could be removed from the limits for good.

rossberg · 2016-07-19T16:47:47Z

BinaryEncoding.md

+          (in the MVP, *every* memory/table import/definition must be flagged as default)
+* `0x2` : indicates that the memory/table has a specified maximum
+
+### init_expr


Btw, are all uses of init_expr evaluated in order? What happens if a get_global in one of them refers to a not-yet-initialised global?

I should also state the requirement here but Modules.md#initializer-expression requires that the get_global only refer to an immutable import which will necessarily precede all the uses.

lukewagner · 2016-07-19T17:48:46Z

Aw right. In that case, I propose removing the current syntax

sgtm. The current hack in SM is just to allow existing .wasts to work unchanged (so it's easy to test on existing demo builds), but I'm happy to unhack.

Actually, to the contrary: it would completely avoid the need for introducing any default flags/decls now.

Oh, then I guess I misunderstood what you were proposing; so you're just saying that, when there's only 1, it's the default by, well, default. Then post-MVP, when we support >1, we can decide how to pick which is default. That sounds good to me and keeps our options open.

rossberg · 2016-07-21T15:47:43Z

Btw, shouldn't this be a PR against the binary_0xc (or binary_0xd) branch?

lukewagner · 2016-07-21T17:31:22Z

Right, some binary_0x* branch; I'm not expecting to merge this into master. As I noted in the OP, I'm waiting to see what people want to do for either folding this into binary_0xc or waiting until binary_0xd.

rossberg · 2016-07-22T12:28:36Z

@lukewagner, actually, shouldn't the same have applied to #682? I just realised it got merged into master. ;)

lukewagner · 2016-07-22T15:14:57Z

@rossberg-chromium Well, the nascent protocol we've been using is to only use the binary_0x* branch for BinaryEncoding.md changes (hence "binary_" :) for the purpose of coordinating landing the breaking binary format changes in engines. Post-MVP, breaking changes will cease, so probably we should shift to a more regular "everything related to a proposed feature is on a branch until the whole thing merges".

rossberg · 2016-07-22T19:39:21Z

BinaryEncoding.md

-| `call` | `0x16` | argument_count : `varuint1`, function_index : `varuint32` | call a function by its index |
+| `get_global` | `0x18` | global_index : `varuint32` | read a global variable |
+| `set_global` | `0x19` | global_index : `varuint32` | write a global variable |
+| `call` | `0x16` | argument_count : `varuint1`, function_index : `varuint32` | call a function by its [index](Modules.md#function-index-space) |


Thought I had commented on that already, but can't find the comment anymore: fyi, this conflicts with #711, will have to resolve somehow.

Since we're going to defragment this opcode space anyway, I'll change these codes.

flagxor · 2016-07-27T13:16:05Z

Luke this seems good, can you resolve conflicts and merge?

…686) * Clarify that wasm may be viewed as either an AST or a stack machine. * Reword the introductory paragraph. * Add parens, remove "typed".

Make opcode 0x00 `unreachable`, and move `nop` to a non-zero opcode. All-zeros is one of the more common patterns of corrupted data. This change makes it more likely that code that is accidentally zeroed, in whole or in part, will be noticed when executed rather than silently running through a nop slide. Obviously, this doesn't matter when an opcode table is present, but if there is a default opcode table, it would presumably use the opcodes defined here.

…ble going before Memory

…entation of globals

lukewagner · 2016-07-27T19:37:42Z

Righto. Merged to binary_0xc (not master, as discussed above). (GH doesn't let me simply retarget this PR, so I'll just close this now.)

lukewagner added this to the MVP milestone Jul 19, 2016

bnjbvr reviewed Jul 19, 2016
View reviewed changes

rossberg reviewed Jul 19, 2016
View reviewed changes

flagxor added the binary format label Jul 20, 2016

This was referenced Jul 21, 2016

RFC: Add support for multiple tables WebAssembly/binaryen#642

Closed

Implement globals WebAssembly/spec#313

Merged

lukewagner mentioned this pull request Jul 21, 2016

Updating import/export tables in the binary format #522

Closed

rossberg mentioned this pull request Jul 22, 2016

Binary format support for wasm globals WebAssembly/binaryen#652

Merged

rossberg reviewed Jul 22, 2016
View reviewed changes

sunfishcode and others added 3 commits July 27, 2016 12:25

Clarify that wasm may be viewed as either an AST or a stack machine. (#…

f32008b

…686) * Clarify that wasm may be viewed as either an AST or a stack machine. * Reword the introductory paragraph. * Add parens, remove "typed".

BinaryEncoding.md changes implied by #682

e3c2bcc

Luke Wagner added 12 commits July 27, 2016 12:29

Fix thinko in import section

b1c24ba

Rename definition_kind to external_kind for precision

b3ea227

Rename resizable_definition to resizable_limits

d9c0434

Add opcode delimiter to init_expr

c48bdb3

Add Elem section to ToC and move it before Data section to reflect Ta…

08b1fd6

…ble going before Memory

Add missing init_expr to global variables and undo the grouped repres…

30c66b9

…entation of globals

Note that only immutable globals can be exported

0b014b5

Change the other 'mutability' flag to 'varuint1'

7b03fd4

Give 'anyfunc' its own opcode

21562ee

Add note about immutable global import requirement

b15d041

Remove explicit 'default' flag; make memory/table default by default

63ef7f7

Change (get|set)_global opcodes

fd4268f

lukewagner force-pushed the table-memory-binary-changes branch from 0419cc1 to fd4268f Compare July 27, 2016 19:31

lukewagner closed this Jul 27, 2016

lukewagner deleted the table-memory-binary-changes branch July 27, 2016 19:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BinaryEncoding.md changes implied by #682 #727

BinaryEncoding.md changes implied by #682 #727

lukewagner commented Jul 18, 2016 •

edited

Loading

bnjbvr Jul 19, 2016

bnjbvr Jul 19, 2016

lukewagner Jul 19, 2016

rossberg commented Jul 19, 2016

lukewagner commented Jul 19, 2016

rossberg commented Jul 19, 2016

So table should become (table ? <table_segment>*).

rossberg Jul 19, 2016

lukewagner Jul 19, 2016

lukewagner commented Jul 19, 2016 •

edited

Loading

rossberg commented Jul 21, 2016

lukewagner commented Jul 21, 2016

rossberg commented Jul 22, 2016

lukewagner commented Jul 22, 2016

rossberg Jul 22, 2016

lukewagner Jul 22, 2016

flagxor commented Jul 27, 2016

lukewagner commented Jul 27, 2016

BinaryEncoding.md changes implied by #682 #727

BinaryEncoding.md changes implied by #682 #727

Conversation

lukewagner commented Jul 18, 2016 • edited Loading

bnjbvr Jul 19, 2016

Choose a reason for hiding this comment

bnjbvr Jul 19, 2016

Choose a reason for hiding this comment

lukewagner Jul 19, 2016

Choose a reason for hiding this comment

rossberg commented Jul 19, 2016

lukewagner commented Jul 19, 2016

rossberg commented Jul 19, 2016

So table should become (table ? <table_segment>*).

rossberg Jul 19, 2016

Choose a reason for hiding this comment

lukewagner Jul 19, 2016

Choose a reason for hiding this comment

lukewagner commented Jul 19, 2016 • edited Loading

rossberg commented Jul 21, 2016

lukewagner commented Jul 21, 2016

rossberg commented Jul 22, 2016

lukewagner commented Jul 22, 2016

rossberg Jul 22, 2016

Choose a reason for hiding this comment

lukewagner Jul 22, 2016

Choose a reason for hiding this comment

flagxor commented Jul 27, 2016

lukewagner commented Jul 27, 2016

lukewagner commented Jul 18, 2016 •

edited

Loading

lukewagner commented Jul 19, 2016 •

edited

Loading