
Commit d992a1a

Binary encoder & decoder (#268)
Add a working (post-order) encoder & decoder, and several additions to make it useful. For example, you can now invoke the interpreter like

```
wasm module.wast -o module.wasm
```

to convert from text to binary. In a while, the same will also be possible in the inverse direction.

I also extended the script language with `(input <file>)` and `(output <file>)` commands. Both are supposed to handle both wast and wasm eventually. The former allows including other scripts or binary modules; the latter allows conversion as part of a script.

Finally, the command line now supports an `-e <script>` option, which allows giving commands directly. This is useful, for example, when intermingled with binary module arguments, e.g., to invoke exports:

```
wasm module.wasm -e '(invoke "foo")'
```

Furthermore, this extends the test runner with encoding & decoding of all test files via a wast->wasm transcoding and loading of the resulting modules. This works for all current tests.

There are still a number of smaller TODOs left, to be addressed in follow-ups. Other follow-up work: wasm->wast reverse transcoding, and more aggressive tests and testing capabilities. We'll probably also need to revise the use of different integer types throughout the spec, which is not always consistent with what the binary format supports.
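The message calls the new encoder "post-order": an operator is emitted after its operands, so a decoder can rebuild the expression tree using nothing more than a stack of already-decoded subexpressions. The sketch below illustrates that idea on a toy expression type. The `expr` type, the opcode values, and the `to_bytes`/`of_bytes` functions are all hypothetical; they correspond neither to ml-proto's AST nor to real WebAssembly opcodes.

```ocaml
(* A toy expression type standing in for the real AST (hypothetical). *)
type expr =
  | Const of int            (* small constant, 0 <= n < 256 in this sketch *)
  | Add of expr * expr
  | Neg of expr

(* Hypothetical one-byte opcodes, for illustration only. *)
let op_const = 0x10
let op_add = 0x40
let op_neg = 0x41

(* Post-order: emit the operands first, then the operator that consumes them. *)
let rec encode buf e =
  match e with
  | Const n ->
    Buffer.add_char buf (Char.chr op_const);
    Buffer.add_char buf (Char.chr n)   (* Char.chr raises if n > 255 *)
  | Add (e1, e2) ->
    encode buf e1;
    encode buf e2;
    Buffer.add_char buf (Char.chr op_add)
  | Neg e1 ->
    encode buf e1;
    Buffer.add_char buf (Char.chr op_neg)

let to_bytes e =
  let buf = Buffer.create 16 in
  encode buf e;
  Buffer.contents buf

(* Decoding needs only a stack: each operator pops its operands and
   pushes the rebuilt node. *)
let of_bytes s =
  let rec go i stack =
    if i >= String.length s then stack
    else
      let op = Char.code s.[i] in
      if op = op_const then go (i + 2) (Const (Char.code s.[i + 1]) :: stack)
      else if op = op_add then
        (match stack with
         | e2 :: e1 :: rest -> go (i + 1) (Add (e1, e2) :: rest)
         | _ -> failwith "stack underflow")
      else if op = op_neg then
        (match stack with
         | e1 :: rest -> go (i + 1) (Neg e1 :: rest)
         | _ -> failwith "stack underflow")
      else failwith "unknown opcode"
  in
  match go 0 [] with
  | [e] -> e
  | _ -> failwith "malformed input"
```

The byte stream for `Add (Const 1, Neg (Const 2))` is `10 01 10 02 41 40`: it ends with the `Add` opcode, and by the time the decoder reaches it, both operands are already on the stack.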
1 parent 573a177 commit d992a1a

27 files changed (+1401 / -202 lines)

ml-proto/README.md

Lines changed: 39 additions & 11 deletions
@@ -5,10 +5,13 @@ This repository implements a prototypical reference interpreter for WebAssembly.
 Currently, it can

 * *parse* a simple S-expression format,
+* *decode* the binary format (work in progress),
 * *validate* modules defined in it,
-* *execute* invocations to functions exported by a module.
+* *execute* invocations to functions exported by a module,
+* *encode* the binary format,
+* *prettyprint* the S-expression format (work in progress).

-The file format is a (very dumb) form of *script* that cannot just define a module, but also batch a sequence of invocations.
+The S-expression format is a (very dumb) form of *script* that cannot just define a module, but in fact a sequence of them, and a batch of invocations, assertions, and conversions to each one. As such it is different from the binary format, with the additional functionality purely intended as testing infrastructure. (See [below](#scripts) for details.)

 The interpreter can also be run as a REPL, allowing to enter pieces of scripts interactively.

@@ -61,17 +64,34 @@ Either way, in order to run the test suite you'll need to have Python installed.
 You can call the executable with

 ```
-wasm [option] [file ...]
+wasm [option | file ...]
 ```

-where `file` is a script file (see below) to be run. If no file is given, you'll get into the REPL and can enter script commands interactively. You can also get into the REPL by explicitly passing `-` as a file name. You can do that in combination to giving a module file, so that you can then invoke its exports interactively, e.g.:
+where `file`, depending on its extension, either should be an S-expression script file (see below) to be run, or a binary module file to be loaded.
+
+A file prefixed by `-o` is taken to be an output file. Depending on its extension, this will write out the preceding module definition in either S-expression or binary format. This option can be used to convert between the two in both directions, e.g.:

 ```
-./wasm module.wast -
+wasm -d module.wast -o module.wasm
+wasm -d module.wasm -o module.wast
 ```
-Note however that the REPL currently is too dumb to allow multi-line input. :)

-See `wasm -h` for (the few) options.
+The `-d` option selects "dry mode" and ensures that the input module is not run, even if it has a start section.
+In the second case, the produced script contains exactly one module definition (work in progress).
+
+Finally, the option `-e` allows to provide arbitrary script commands directly on the command line. For example:
+
+```
+wasm module.wasm -e '(invoke "foo")'
+```
+
+If neither a file nor any of the previous options is given, you'll land in the REPL and can enter script commands interactively. You can also get into the REPL by explicitly passing `-` as a file name. You can do that in combination to giving a module file, so that you can then invoke its exports interactively, e.g.:
+
+```
+wasm module.wast -
+```
+
+See `wasm -h` for (the few) additional options.


 ## S-Expression Syntax
@@ -168,9 +188,13 @@ cmd:
 ( assert_return_nan (invoke <name> <expr>* )) ;; assert return with floating point nan result of invocation
 ( assert_trap (invoke <name> <expr>* ) <failure> ) ;; assert invocation traps with given failure string
 ( assert_invalid <module> <failure> ) ;; assert invalid module with given failure string
+( input <string> ) ;; read script or module from file
+( output <string> ) ;; output module to file
 ```

-Invocation is only possible after a module has been defined.
+Commands are executed in sequence. Invocation, assertions, and output apply to the most recently defined module (the _current_ module), and are only possible after a module has been defined. Note that there only ever is one current module, the different module definitions cannot interact.
+
+The input and output commands determine the requested file format from the file name extension. They can handle both `.wast` and `.wasm` files. In the case of input, a `.wast` script will be recursively executed.

 Again, this is only a meta-level for testing, and not a part of the language proper.
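The new README text states that invocations, assertions, and output always apply to the most recently defined (current) module, and that no command may refer to a module before one exists. A minimal sketch of that rule, assuming a drastically simplified command type: `modul`, `Define`, `Invoke`, `Output`, and `run` are hypothetical stand-ins, not the definitions in `script.ml` or `run.ml`.

```ocaml
(* Hypothetical stand-ins for the script AST in script.ml. *)
type modul = { name : string; exports : (string * int) list }

type cmd =
  | Define of modul          (* a module definition *)
  | Invoke of string         (* (invoke "name") *)
  | Output of string         (* (output "file") *)

(* The only state carried between commands is the current module. *)
type state = { current : modul option; log : string list }

let exec st cmd =
  match cmd, st.current with
  | Define m, _ -> { st with current = Some m }   (* replaces the previous module *)
  | (Invoke _ | Output _), None -> failwith "no module defined"
  | Invoke f, Some m ->
    let v = List.assoc f m.exports in   (* raises Not_found if f is not exported *)
    { st with log = Printf.sprintf "%s.%s = %d" m.name f v :: st.log }
  | Output file, Some m ->
    { st with log = Printf.sprintf "write %s to %s" m.name file :: st.log }

(* Commands are executed in sequence; returns what each command did. *)
let run cmds =
  let final = List.fold_left exec { current = None; log = [] } cmds in
  List.rev final.log
```

Because `Define` simply overwrites `current`, the second `(invoke "foo")` in a script that defines modules `a` and then `b` reaches `b` only, which is why distinct module definitions cannot interact.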

@@ -202,11 +226,15 @@ The implementation consists of the following parts:

 * *Parser* (`lexer.mll`, `parser.mly`, `desguar.ml[i]`). Generated with ocamllex and ocamlyacc. The lexer does the opcode encoding (non-trivial tokens carry e.g. type information as semantic values, as declared in `parser.mly`), the parser the actual S-expression parsing. The parser generates a full AST that is desugared into the kernel AST in a separate pass.

+* *Pretty Printer* (`prettyprint.ml[i]`). Turns a module AST back into the textual S-expression format. (Work in progress)
+
+* *Decoder*/*Encoder* (`decode.ml[i]`, `encode.ml[i]`). The former (work in progress) parses the binary format and turns it into an AST, the latter does the inverse.
+
 * *Validator* (`check.ml[i]`). Does a recursive walk of the kernel AST, passing down the *expected* type for expressions, and checking each expression against that. An expected empty type can be matched by any result, corresponding to implicit dropping of unused values (e.g. in a block).

 * *Evaluator* (`eval.ml[i]`, `values.ml`, `arithmetic.ml[i]`, `int.ml`, `float.ml`, `memory.ml[i]`, and a few more). Evaluation of control transfer (`br` and `return`) is implemented using local exceptions as "labels". While these are allocated dynamically in the code and addressed via a stack, that is merely to simplify the code. In reality, these would be static jumps.

-* *Driver* (`main.ml`, `script.ml[i]`, `error.ml`, `print.ml[i]`, `flags.ml`). Executes scripts, reports results or errors, etc.
+* *Driver* (`main.ml`, `run.ml[i]`, `script.ml[i]`, `error.ml`, `print.ml[i]`, `flags.ml`). Executes scripts, reports results or errors, etc.

 The most relevant pieces are probably the validator (`check.ml`) and the evaluator (`eval.ml`). They are written to look as much like a "specification" as possible. Hopefully, the code is fairly self-explanatory, at least for those with a passing familiarity with functional programming.
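The Decoder/Encoder bullet describes the two modules as inverses of each other. To illustrate that round-trip property on the smallest building block, here is an unsigned LEB128 encoder and decoder. The WebAssembly binary format uses LEB128 for most of its integers, but `encode_u32` and `decode_u32` are a standalone sketch, not the code in `encode.ml` or `decode.ml`, and they assume a 64-bit OCaml where `int` can hold any 32-bit unsigned value.

```ocaml
(* Unsigned LEB128: 7 payload bits per byte, least significant group first;
   the high bit is set on every byte except the last.
   Assumes a 64-bit OCaml, where int can hold any 32-bit unsigned value. *)
let encode_u32 (n : int) : string =
  if n < 0 || n > 0xFFFF_FFFF then invalid_arg "encode_u32: out of range";
  let buf = Buffer.create 5 in
  let rec loop n =
    let byte = n land 0x7f in
    let rest = n lsr 7 in
    if rest = 0 then Buffer.add_char buf (Char.chr byte)
    else begin
      Buffer.add_char buf (Char.chr (byte lor 0x80));
      loop rest
    end
  in
  loop n;
  Buffer.contents buf

(* Inverse of encode_u32: returns the decoded value and the position just
   past it. Rejects encodings longer than 5 bytes but, for brevity, does
   not check the unused high bits of the final byte. *)
let decode_u32 (s : string) (pos : int) : int * int =
  let rec loop pos shift acc =
    if shift > 28 then failwith "decode_u32: integer representation too long";
    let byte = Char.code s.[pos] in
    let acc = acc lor ((byte land 0x7f) lsl shift) in
    if byte land 0x80 = 0 then (acc, pos + 1)
    else loop (pos + 1) (shift + 7) acc
  in
  loop pos 0 0
```

For instance, 624485 encodes to the three bytes `e5 8e 26`, and decoding those bytes from position 0 yields 624485 together with the next position, 3.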

@@ -215,6 +243,6 @@ In typical FP convention (and for better readability), the code tends to use sin

 ## What Next?

-* Binary format as input and output.
+* More tests.

-* Compilation to JS/asm.js.
+* Compilation to JS/asm.js?
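Per the README changes, the driver treats each command-line argument according to its position and file extension: `.wast` files are run as scripts, `.wasm` files are loaded as binary modules, `-o <file>` writes the preceding module in the format implied by its extension, `-e <script>` supplies commands inline, and `-` enters the REPL. The following sketch models only that argument classification. The `action` type, `actions_of_args`, and `plan` are hypothetical and ignore everything else `main.ml` and `run.ml` actually do; in particular, `-d` is merely skipped here.

```ocaml
(* Hypothetical model of the argument handling described in the README. *)
type format = Text | Binary

type action =
  | Load of format * string    (* run a .wast script or load a .wasm module *)
  | Write of format * string   (* -o: write out the preceding module *)
  | Eval of string             (* -e: script commands given inline *)
  | Repl                       (* "-": enter the interactive REPL *)

(* The format is chosen purely from the file name extension. *)
let format_of_file file =
  if Filename.check_suffix file ".wast" then Text
  else if Filename.check_suffix file ".wasm" then Binary
  else invalid_arg ("unrecognized file extension: " ^ file)

let rec actions_of_args = function
  | [] -> []
  | "-d" :: rest -> actions_of_args rest  (* dry mode is a global flag; not modelled *)
  | "-" :: rest -> Repl :: actions_of_args rest
  | "-o" :: file :: rest -> Write (format_of_file file, file) :: actions_of_args rest
  | "-e" :: script :: rest -> Eval script :: actions_of_args rest
  | file :: rest -> Load (format_of_file file, file) :: actions_of_args rest

(* With nothing else to do, fall back to the REPL. *)
let plan args =
  match actions_of_args args with
  | [] -> [ Repl ]
  | actions -> actions
```

Keeping the arguments as an ordered list of actions (rather than a set of flags) is what lets `-o` refer to "the preceding module" and lets `-e` commands run against a module loaded earlier on the same command line.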

ml-proto/given/lib.ml

Lines changed: 3 additions & 0 deletions
@@ -1,5 +1,8 @@
 module List =
 struct
+  let rec make n x =
+    if n = 0 then [] else x :: make (n - 1) x
+
   let rec take n xs =
     match n, xs with
     | 0, _ -> []

ml-proto/given/lib.mli

Lines changed: 1 addition & 0 deletions
@@ -2,6 +2,7 @@

 module List :
 sig
+  val make : int -> 'a -> 'a list
   val take : int -> 'a list -> 'a list
   val drop : int -> 'a list -> 'a list
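The new `List.make n x` builds a list of `n` copies of `x`. Because it stops only when `n` reaches exactly 0, it never reaches its base case for a negative `n`, and it is not tail-recursive. If either matters to a caller, a variant can delegate to the standard library's `List.init` (OCaml 4.06 or later), which rejects a negative length with `Invalid_argument`. `make'` below is a hypothetical alternative, not part of the commit.

```ocaml
(* Copied from the diff: n copies of x. Not tail-recursive, and it never
   reaches its base case when n is negative. *)
let rec make n x =
  if n = 0 then [] else x :: make (n - 1) x

(* Hypothetical alternative: List.init raises Invalid_argument for a
   negative length instead of recursing without bound. *)
let make' n x = List.init n (fun _ -> x)
```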

ml-proto/given/source.ml

Lines changed: 8 additions & 3 deletions
@@ -1,6 +1,6 @@
 type pos = {file : string; line : int; column : int}
 type region = {left : pos; right : pos}
-type 'a phrase = { at : region; it : 'a}
+type 'a phrase = {at : region; it : 'a}


 (* Positions and regions *)
@@ -9,9 +9,14 @@ let no_pos = {file = ""; line = 0; column = 0}
 let no_region = {left = no_pos; right = no_pos}

 let string_of_pos pos =
-  string_of_int pos.line ^ "." ^ string_of_int (pos.column + 1)
+  if pos.line = -1 then
+    string_of_int pos.column
+  else
+    string_of_int pos.line ^ "." ^ string_of_int (pos.column + 1)
+
 let string_of_region r =
-  r.left.file ^ ":" ^ string_of_pos r.left ^ "-" ^ string_of_pos r.right
+  r.left.file ^ ":" ^ string_of_pos r.left ^
+  (if r.right = r.left then "" else "-" ^ string_of_pos r.right)

 let before region = {left = region.left; right = region.left}
 let after region = {left = region.right; right = region.right}
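With this change, `string_of_pos` prints a position whose `line` is `-1` as a bare number, and `string_of_region` collapses a region whose two ends coincide into a single position. A plausible reading, assumed here rather than stated in the commit, is that `line = -1` marks a byte offset into a binary `.wasm` file, as produced by the new decoder. The snippet copies the definitions from the diff so that their output can be checked; the file names and positions in the usage note are made up.

```ocaml
(* Copied from source.ml as changed in this commit. *)
type pos = {file : string; line : int; column : int}
type region = {left : pos; right : pos}

(* line = -1: print the column alone (assumed to be a byte offset);
   otherwise print "line.column" with a 1-based column. *)
let string_of_pos pos =
  if pos.line = -1 then
    string_of_int pos.column
  else
    string_of_int pos.line ^ "." ^ string_of_int (pos.column + 1)

(* An empty region (left = right) prints as a single position. *)
let string_of_region r =
  r.left.file ^ ":" ^ string_of_pos r.left ^
  (if r.right = r.left then "" else "-" ^ string_of_pos r.right)
```

For example, a one-point region at line 3, column 4 of `m.wast` prints as `m.wast:3.5`, the same line extended to column 9 prints as `m.wast:3.5-3.10`, and a binary region from offset 26 to 30 of `m.wasm` prints as `m.wasm:26-30`.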
