Invalid unreachable loop in text format #1541

kripken · 2018-05-10T14:49:46Z

E.g.

(module
 (func $0
  (drop
   (block $label$1 (result i32)
    (loop $label$2
     (unreachable)
    )
   )
  )
 )
)

The loop is unreachable, but we need to emit an unreachable after the loop too, for the block, wabt says

w.wast:6:7: error: type mismatch in block, expected [i32] but got []

Not sure why this is just a bug in the text format - the binary for it is fine.

The text was updated successfully, but these errors were encountered:

MaxGraey · 2018-05-10T15:30:12Z

Hmm, what if just remove return result for such kind of code? Like:

(module
 (func $0
   (block $label$1
    (loop $label$2
     (unreachable)
    )
   )
 )
)

Or this sintetic example?

Because this doesn't depend on unreachable. Current code behave (failed) the same:

(module
 (func $0
  (drop
   (block $label$1 (result i32)
    (loop $label$2
     (nop)
    )
   )
  )
 )
)

kripken · 2018-05-10T15:54:38Z

It's true that removing the result would make this example work, but what would the new rules be for when to do so - how would that fit in the current typing rules? I'm not sure offhand how that could be done, and this is tricky stuff with many corner cases I'm afraid...

In general, if there isn't a simple fix for something like this, it may just be something we document as a text format difference between binaryen and wat - there are already a few in the list. While we need binary format support to be perfect - that's the point of binaryen :) - the text format is meant to represent our internal IR, which has been diverging from wasm. We may want to just stop claiming to support wat, and instead have our own byn text format as has been discussed before.

dcodeIO · 2018-05-11T10:06:07Z

For reference, the following is the saved binary from

var mod = new binaryen.Module();
var funcType = mod.addFunctionType("v", binaryen.none, []);
var func = mod.addFunction("0", funcType, [],
  mod.drop(
    mod.block("label$1", [
      mod.loop("label$2",
        mod.unreachable()
      )
    ], binaryen.i32)
  )
);
mod.addExport("0", "0");

When reading it back, it becomes the original (apparently invalid) text format:

(module
 (type $0 (func))
 (export "0" (func $0))
 (func $0 (; 0 ;) (type $0)
  (drop
   (block $label$1 (result i32)
    (loop $label$2
     (unreachable)
    )
   )
  )
 )
)

So...

Not sure why this is just a bug in the text format - the binary for it is fine.

is this just the text printer missing something the binary writer adds implicitly? Asking because when looking at the comments above, it appears to be more complicated than that?

kripken · 2018-05-11T14:46:06Z

I believe the binary writer adds the extra unreachable after the loop. The reason is that Binaryen IR has an unreachable type for loops, but wasm binaries do not. To fix that, after it emits a loop that is unreachable it also emits an unreachable opcode, which ensures we are in an unreachable zone in the binary. And when reading wasm we do the opposite, in effect.

It's not clear how to fix this for the text format, though:

A "proper" fix would be to remove the unreachable type from the IR. See previous discussions on wasm diverging from Binaryen after wasm became a stack machine and removed unreachable loops, blocks and ifs. The core issue, iirc, is that unreachability makes optimization more easy - you can replace an unreachable loop with a br, and know it is valid because it has the same type, while if the loop has the empty type then it's only clear from the outside context whether the replacement would be ok, which is messy.
Rename Binaryen's current text format ".byt" ("binaryen text"), and add another input and output format, the .wat format. Then Binaryen would have three input/output formats, .wasm, .wat, .byt.

dcodeIO · 2018-05-12T09:30:30Z

What if we'd just do

    if (curr->type != none && curr->body->type == unreachable) {
      o << maybeNewLine;
      !minify && doIndent(o, indent);
      printOpening(o, "unreachable");
      o << ')';
    }

right below this line, similar to what the binary writer does?

$> wasm-opt the-file.wat --print

(module
 (type $0 (func))
 (func $0 (; 0 ;) (type $0)
  (drop
   (block $label$1 (result i32)
    (loop $label$2
     (unreachable)
    )
    (unreachable)
   )
  )
 )
)

kripken · 2018-05-12T15:15:03Z

We could do that, but we'd also need to make changes in the text input code to handle that, and the bigger issue is that by making such changes the text format would no longer represent Binaryen IR as it currently does. And having a good way to print the internal IR is necessary for debugging and development. So we'd need to support two text formats, .byt and .wat as mentioned before.

I'm not saying that's a bad idea, but it would take some work to do, and I'm not sure it's worth it. How important is it that binaryen can emit the wasm text format correctly?

dcodeIO · 2018-05-12T16:22:33Z

How important is it that binaryen can emit the wasm text format correctly?

For AssemblyScript it's quite convenient to have a text format printer provided by Binaryen (while there isn't really a need to read text format), so it can be its sole dependency. I've tried to avoid adding WABT for example because it adds another megabyte or so of JS just for printing text format.

kripken · 2018-05-12T18:43:11Z

I see, good point @dcodeIO

So, if emitting wasm text is important but reading it is not, then that's not too hard to do, we can make a variant of the --print pass, maybe --print-wat, and internally Print would have a parameter whether to conform to proper wat syntax.

Before going down that route, is there anyone else that does care about reading proper wat as well?

binji · 2018-05-12T20:06:32Z

I've tried to avoid adding WABT for example because it adds another megabyte or so of JS just for printing text format.

Currently the JS build of wabt includes a lot of extra cruft -- it wouldn't be too hard to separate some of it out to slim down the library.

MaxGraey · 2018-05-12T21:32:26Z

@binji it will be great if wasm-objdump provide an opportunity to optionally emit machine code representation (x86-64 for example) of wasm file. This allowed by capstone and one of standalone VM for example and Web Aseembly Studio used this. WDYT?

binji · 2018-05-12T23:17:39Z

@MaxGraey that's a bit outside the scope of wasm-objdump, in my opinion. What you'd probably want is something more like compiler explorer where you can provide wasm input and choose the VM to produce assembly output.

MaxGraey · 2018-05-12T23:36:11Z

Yah, may be better to create another separate tool for this routine. The main idea to determine which is codegen quality better for specific wasm operations or optimizations and how this changes during evolution some VM to get informative feedback and estimation about some binaryen performance optimizations.

PS Sorry for offtop

kripken · 2018-05-14T20:35:01Z

@binji "it wouldn't be too hard to separate some of it out to slim down the library."

Yeah, we could make a wabt build with just the binary loading and text printing code included, perhaps? Might be as easy as only specifying those functions in the list of exported functions.

binji · 2018-05-14T22:53:29Z

@kripken Hm, not so sure now that I take a quick look. It seems at least 350k or so is the re2c generated lexing function! Haven't really dug into the rest, but it looks like the callback-based parsing might be bloating the resulting library too.

kripken added the help wanted label May 10, 2018

WebAssembly deleted a comment from lovepasswords May 19, 2018

MaxGraey mentioned this issue Jan 3, 2020

return early AssemblyScript/assemblyscript#1044

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Invalid unreachable loop in text format #1541

Invalid unreachable loop in text format #1541

kripken commented May 10, 2018

MaxGraey commented May 10, 2018 •

edited

Loading

kripken commented May 10, 2018

dcodeIO commented May 11, 2018 •

edited

Loading

kripken commented May 11, 2018

dcodeIO commented May 12, 2018 •

edited

Loading

kripken commented May 12, 2018

dcodeIO commented May 12, 2018

kripken commented May 12, 2018

binji commented May 12, 2018

MaxGraey commented May 12, 2018 •

edited

Loading

binji commented May 12, 2018

MaxGraey commented May 12, 2018 •

edited

Loading

kripken commented May 14, 2018

binji commented May 14, 2018

Invalid unreachable loop in text format #1541

Invalid unreachable loop in text format #1541

Comments

kripken commented May 10, 2018

MaxGraey commented May 10, 2018 • edited Loading

kripken commented May 10, 2018

dcodeIO commented May 11, 2018 • edited Loading

kripken commented May 11, 2018

dcodeIO commented May 12, 2018 • edited Loading

kripken commented May 12, 2018

dcodeIO commented May 12, 2018

kripken commented May 12, 2018

binji commented May 12, 2018

MaxGraey commented May 12, 2018 • edited Loading

binji commented May 12, 2018

MaxGraey commented May 12, 2018 • edited Loading

kripken commented May 14, 2018

binji commented May 14, 2018

MaxGraey commented May 10, 2018 •

edited

Loading

dcodeIO commented May 11, 2018 •

edited

Loading

dcodeIO commented May 12, 2018 •

edited

Loading

MaxGraey commented May 12, 2018 •

edited

Loading

MaxGraey commented May 12, 2018 •

edited

Loading