Simplify BinaryenIRWriter #3110

tlively · 2020-09-10T09:00:15Z

BinaryenIRWriter was previously inconsistent about whether or not it
emitted an instruction if that instruction was not reachable.
Instructions that produced values were not emitted if they were
unreachable, but instructions that did not produce values were always
emitted. Additionally, blocks continued to emit their children even
after emitting an unreachable child.

Since it was not possible to tell whether an unreachable instruction's
parent would be emitted, BinaryenIRWriter had to be very defensive and
emit many extra unreachable instructions around unreachable code to
avoid type errors.

This PR unifies the logic for emitting all non-control flow
instructions and changes the behavior of BinaryenIRWriter so that it
never emits instructions that cannot be reached due to having
unreachable children. This means that extra unreachable instructions
now only need to be emitted after unreachable control flow
constructs. BinaryenIRWriter now also stops emitting instructions
inside blocks after the first unreachable instruction as an extra
optimization.
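
As a rough illustration of the emission rule described above, here is a minimal sketch over a toy IR. The `Expr` struct, the `Type` enum, and the `emit` function are hypothetical stand-ins, not Binaryen's actual classes or the code in this PR, and a block is the only control flow construct modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy IR for illustration only.
enum class Type { none, i32, unreachable };

struct Expr {
  std::string name;                  // mnemonic to emit, e.g. "i32.add"
  Type type;                         // type of the expression
  bool isBlock;                      // true for the one control flow construct here
  std::vector<const Expr*> children; // value children, or the block's list
};

// Emits `curr` and its children in post-order into `out`.
void emit(const Expr& curr, std::vector<std::string>& out) {
  if (curr.isBlock) {
    out.push_back("block");
    for (const Expr* child : curr.children) {
      emit(*child, out);
      if (child->type == Type::unreachable) {
        break; // nothing after this child can execute, so stop emitting
      }
    }
    out.push_back("end");
    if (curr.type == Type::unreachable) {
      // Control flow constructs are the one place where an extra
      // `unreachable` is still emitted, to satisfy the enclosing scope's type.
      out.push_back("unreachable");
    }
    return;
  }
  assert(!curr.name.empty() && "instructions need a mnemonic");
  for (const Expr* child : curr.children) {
    emit(*child, out);
    if (child->type == Type::unreachable) {
      return; // `curr` can never execute, so it is not emitted at all
    }
  }
  out.push_back(curr.name);
}
```

In this sketch, an `i32.add` whose first operand is `unreachable` produces only the `unreachable` instruction, and a block stops listing its children once one of them is unreachable.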

This change will also simplify Poppy IR stackification (see #3059) by
guaranteeing that instructions with unreachable children will not be
emitted into the stackifier. This makes satisfying the Poppy IR rule
against unreachable Pops trivial, whereas previously satisfying this
rule would have required about 700 additional lines of code to
recompute the types of all unreachable children for any instruction.

tlively · 2020-09-10T09:01:04Z

So far this has done 7000 iterations on the fuzzer, but I will leave it running overnight.

tlively · 2020-09-10T17:20:22Z

Alright, I just stopped the fuzzer at 67892 iterations, so I'd say this is stable.

kripken

Very nice!

Two things though:

1. Does this have a performance impact? Using an Iterator instead of a direct walk may be slower. Worth measuring.
2. Might be good to run the emscripten test suite before landing (maybe at least other, wasm0, wasm2).

kripken · 2020-09-10T20:46:12Z

src/ir/iteration.h

+    if (Properties::isControlFlowStructure(curr)) {
+      // If conditions are the only stack children of control flow structures
+      if (auto* if_ = curr->dynCast<If>()) {
+        self->pushTask(SubType::scan, &if_->condition);


We use `iff` for this type of thing in more places.

kripken · 2020-09-10T20:49:32Z

src/wasm-stack.h

-    // similar to in visitBlock, here we could skip emitting the block itself,
-    // but must still end the 'block' (the contents, really) with an unreachable
-    emitUnreachable();
+    if (child->type == Type::unreachable) {


maybe a comment here about why we do this and why it is valid? specifically that this is enough because an unreachable block must have an unreachable child.

kripken · 2020-09-10T20:54:56Z

src/ir/iteration.h

-class ChildIterator {
+//   ChildIterator - Iterates over all children
+//
+//   StackChildIterator - Iterates over all children that produce values used by


"Stack" seems not clear enough to me. If this is just for Poppy IR then "Poppy" (or "Stacky" if we decided on that name?). But that doesn't seem clear enough either. How about PoppedChildIterator - indicating that these are children that are popped?

This isn't particular to Poppy IR, and I'm concerned that having "Popped" in the name might give the wrong impression. Maybe ValueChildIterator?

Yeah, maybe "value child" vs "structural child" or such is good terminology. +1 for ValueChildIterator
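
To make the "value child" versus "structural child" distinction concrete, here is a minimal sketch over a toy node type. `Kind`, `Node`, `isControlFlowStructure`, and `valueChildren` are hypothetical names for illustration and are not the iterator added in this PR. The sketch assumes, as the diff comment above says, that an `if` condition is the only value child of a control flow structure.

```cpp
#include <cassert>
#include <vector>

// Hypothetical toy node kinds; the real Expression hierarchy is much larger.
enum class Kind { Block, Loop, If, Binary, Drop, Const };

struct Node {
  Kind kind;
  std::vector<Node*> children; // for If: {condition, ifTrue, ifFalse}
};

bool isControlFlowStructure(const Node& node) {
  return node.kind == Kind::Block || node.kind == Kind::Loop ||
         node.kind == Kind::If;
}

// Returns only the children whose values `node` consumes directly.
// The bodies of control flow structures are structural children and are
// skipped.
std::vector<Node*> valueChildren(const Node& node) {
  if (isControlFlowStructure(node)) {
    if (node.kind == Kind::If) {
      assert(!node.children.empty() && "an if needs a condition");
      return {node.children.front()}; // the condition only
    }
    return {}; // blocks and loops consume no values directly
  }
  return node.children; // every child of a plain instruction is a value child
}
```

A plain iterator over all children would return every entry of `children` regardless of `kind`; the value-only variant is what a writer needs when deciding whether an instruction has an unreachable operand.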

tlively · 2020-09-10T23:22:52Z

I checked the performance of roundtrip on a 1.3MB binary. Times are averages over 5 runs.

Before this PR: 0.122s
After this PR: 0.151s
This PR with SmallVector: 0.135s

So with the SmallVector change we are taking a roughly 10% hit on our baseline parse + emit performance. This is not great, but I think it is worth it for the simplicity wins. I don't feel strongly about that, though, and I would be willing to make the same behavioral change without unifying all the logic. Let me know what you think.

Also, the emscripten tests passed.
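
For reference, the relative slowdown implied by the timings above can be checked with a small helper. `relativeSlowdown` is a hypothetical name used only for this calculation.

```cpp
#include <cassert>

// Relative slowdown of `after` compared with `before`, as a fraction
// (0.10 means 10% slower).
double relativeSlowdown(double before, double after) {
  assert(before > 0.0 && "baseline time must be positive");
  return (after - before) / before;
}
```

With the numbers reported here, `relativeSlowdown(0.122, 0.135)` comes to roughly 11% for the SmallVector variant, and `relativeSlowdown(0.122, 0.151)` to roughly 24% for the `std::vector` variant.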

kripken · 2020-09-10T23:43:08Z

(What's the SmallVector change? In the Iterator?)

I think 10% might be acceptable here. We are moving to make unoptimized builds not run binaryen at all, and in optimized builds binary writing is pretty small compared to optimization work. But might be worth a TODO to look into optimizing this more, as in theory the iterator could be as fast as a walk.

tlively · 2020-09-10T23:52:15Z

Yeah, sorry, I should have explained the SmallVector change. That's with a `SmallVector<Expression*, 4>` instead of a `std::vector<Expression*>` in the iterator.

What we really need is a kind of Walker where tasks can return a value that makes `walk` return without emptying the task stack, plus a `continueWalk` method to pick up where the previous walk left off. The real trick would be to integrate that nicely into the existing walker framework without making anything more expensive.
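
A minimal sketch of that idea, assuming a plain task stack of callables. `ResumableWalker`, `pushTask`, and `hasPendingTasks` are hypothetical names, and this does not reflect how the existing walker framework is structured or how such a feature would be integrated into it.

```cpp
#include <cassert>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical sketch of a walker whose tasks can pause the walk.
class ResumableWalker {
public:
  // A task returns true to ask the walk to pause right after it runs.
  using Task = std::function<bool()>;

  void pushTask(Task task) {
    assert(task && "task must be callable");
    stack.push_back(std::move(task));
  }

  // Runs pending tasks, most recently pushed first, until one asks to pause
  // or the stack is empty. Returns true if a task paused the walk; the
  // remaining tasks stay on the stack so a later call resumes from there.
  bool continueWalk() {
    while (!stack.empty()) {
      Task task = std::move(stack.back());
      stack.pop_back(); // pop first so the task may push follow-up tasks
      if (task()) {
        return true;
      }
    }
    return false;
  }

  bool hasPendingTasks() const { return !stack.empty(); }

private:
  std::vector<Task> stack;
};
```

The sketch uses `std::function` for brevity, which adds overhead of its own; keeping such a mechanism as cheap as the current walker is exactly the hard part mentioned above.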

aheejin · 2020-09-11T13:14:51Z

Late to the review, but it is really a nice simplifying change!

This test seems to be added in #2266 to test custom unreachable generation in `BinaryenIRWriter`, but given that the `fromBinary` files only contain a single `unreachable` for the whole function, I don't think this test serves a lot of purpose. Also the custom unreachable generation logic in #2266 was largely replaced in #3110.

tlively requested review from kripken and aheejin September 10, 2020 09:00

kripken approved these changes Sep 10, 2020

tlively added 3 commits September 10, 2020 17:11

Update names, vector, and comments

f3e34b2

More comments

2358503

Add TODO

c45b35d

tlively merged commit cd6f0d9 into WebAssembly:master Sep 11, 2020

tlively deleted the binaryen-ir-writer-unreachables branch September 11, 2020 00:49

aheejin mentioned this pull request Jan 11, 2021

Remove extra-unreachable.wast #3480

Merged
