-
Notifications
You must be signed in to change notification settings - Fork 786
Poppify pass #3541
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Poppify pass #3541
Conversation
Add a utility for calculating the least upper bounds of two StackSignatures, taking into account polymorphic unreachable behavior. This will important in the finalization and validation of Poppy IR blocks, where a block is allowed to directly produce fewer values than the branches that target it carry if the difference can be made up for by polymorphism due to an unreachable instruction in the block. Also removes the Poppy IR documentation from stack-utils.h because that documentation is moved to the Stackify pass in WebAssembly#3541.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe I've already suggested calling this pass Poppify
- seems clearer as we use the term "Poppy IR". What did we decide there?
src/passes/Stackify.cpp
Outdated
@@ -0,0 +1,497 @@ | |||
/* | |||
* Copyright 2020 WebAssembly Community Group participants |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Copyright 2020 WebAssembly Community Group participants | |
* Copyright 2021 WebAssembly Community Group participants |
src/passes/Stackify.cpp
Outdated
// ) | ||
// | ||
// Notice that the sequence of instructions in the block is now identical to the | ||
// sequence of instructions in raw WebAssembly. Also note that Poppy IR's |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// sequence of instructions in raw WebAssembly. Also note that Poppy IR's | |
// sequence of instructions in a WebAssembly binary. Also note that Poppy IR's |
src/passes/Stackify.cpp
Outdated
// sequence of instructions in raw WebAssembly. Also note that Poppy IR's | ||
// validation rules are largely additional on top of the normal Binaryen IR | ||
// validation rules, with the only exceptions being block body validation and | ||
// block unreahchability rules. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// block unreahchability rules. | |
// block unreachability rules. |
src/passes/Stackify.cpp
Outdated
// unreachable behavior in WebAssembly may have unreachable type. Blocks may | ||
// be unreachable when they are not branch targets and when they have an | ||
// unreachable child. Note that this means a block may be unreachable even | ||
// if it would otherwise have a concrete type, unlike in Binaryen IR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add an example of such a block - this is not clear to me.
src/passes/Stackify.cpp
Outdated
auto types = func->getLocalType(curr->index); | ||
const auto& elems = tupleVars[curr->index]; | ||
// Add the unconditional sets | ||
for (size_t i = types.size() - 1; i >= 1; --i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm confused as to why 0
is special here. It seems like for tee you emit
set 3
set 2
set 1
tee 0
get 1
get 2
get 3
Why is that correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, wait, I see it, nevermind...
Adds a Poppify ("--poppify") pass for converting normal Binaryen IR to Poppy IR. Like the existing construction of Stacky IR, Poppify depends on the BinaryenIRWriter to drive the emitting of instructions in correct stack machine order. As instructions are "emitted," Poppify replaces their children with pops and collects them in a list. At the end of each scope, Poppify creates a block containing all the collected instructions for that scope and injects that block into the enclosing scope. All tuple globals and instructions dealing with tuples are also expanded to remove all tuples from the program. The validator currently fails to validate many valid Poppy IR patterns produced in the tests, but fixing that is left as follow-on work to keep this PR focused on the Poppify pass itself. For now the tests simply skip validation.
8f134d8
to
fe2b3af
Compare
Oops, I had forgotten about that. I personally prefer "Stackify," but I also suspect that objectively you're right that "Poppify" is a better name and that I will get used to it quickly. I've gone ahead and renamed everything to use the poppify terminology. |
// unreachable child. Note that this means a block may be unreachable even | ||
// if it would otherwise have a concrete type, unlike in Binaryen IR. For | ||
// example, this block could have unreachable type in Poppy IR but would | ||
// have to have type i32 in Binaryen IR: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, now I understand the difference.
But why is this so? It's not clear to me why it has to be this way. (If it doesn't, removing differences would be good - but I assume there is a reason...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TBH I don't remember the original concrete motivation for making this change in Poppy IR. Allowing more things to be unreachable
allows DCE to be more aggressive, but I don't remember if this was an opportunistic change or if doing things the Binaryen IR way caused problems. I'll remove this documentation change from the PR for now and if it was there for a good reason I will find out (again) soon enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh wait, this documentation is not new in this PR; it already existed in stack-utils.h. Would it be ok with you if we revisited this point once things are working end-to-end? I'm concerned that changes we make in this half-built state will come back to bite us before we finish.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it can speed things up I'm not opposed. But I'd like to understand how this can work, though. The point of Poppy IR is to model the stack machine format more closely, and that format doesn't have unreachable blocks. So shouldn't it be the opposite, that is, the example block here would always have type i32 in Poppy IR?
Or am I misunderstanding Poppy IR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd say the point of Poppy IR is to make it easy to write optimizations and other transformations that take the stack machine structure of instructions into account. Part of that is necessarily modeling the stack machine syntax more closely, but unreachable
types still give optimizations useful flexibility and information, so they are worth preserving in Poppy IR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! The comments are mostly nits or questions:
src/passes/Poppify.cpp
Outdated
// 1. Function bodies and children of control flow (except If conditions) must | ||
// be blocks. | ||
// | ||
// 2. Blocks may have any Expressions except for Pops as children. The sequence |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this an inherent characteristic of Poppy IR or it will change after we support blocks taking values?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we support blocks taking values, I expect that more blocks will have pop children. Right now catch blocks are somewhat limited because this binary cannot be directly expressed in Binaryen or Poppy IR because our we need to have some instruction to consume the pop of the caught value:
try i32
do
i32.const 0
throw $e ;; throws i32
catch $e
;; i32 on stack
end
Eventually I would like to implement that catch with this Poppy IR block:
(block (params i32) (results i32)
(pop i32)
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once we support blocks taking values, I expect that more blocks will have pop children.
Then this sounds like not an inherent characteristic of Poppy IR but a tentative restriction that will be removed later..? How about clarifying that in the comment?
Right now catch blocks are somewhat limited because this binary cannot be directly expressed in Binaryen or Poppy IR because our we need to have some instruction to consume the pop of the caught value:
try i32 do i32.const 0 throw $e ;; throws i32 catch $e ;; i32 on stack end
I'm not sure if I understand what you mean by 'limited' and we have to 'consume' the caught value. Given that we don't support block inputs at the moment, catch
seems to be an exception, but why is it limited? And I think it is possible not to consume the caught value if we flow it out of try
, as in your example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will update this comment to make clear that this restriction might be changed in the future.
I double checked and I was wrong about this example being invalid in Binaryen IR; sorry about that. There is no limitation in Binaryen IR after all. Right now the validator does improperly reject the Poppy IR version of that example, but I will fix that in a follow-up along with other validation issues with Poppy IR.
src/passes/Poppify.cpp
Outdated
// (block | ||
// (i32.const 42) | ||
// (i32.const 5) | ||
// (i32.add | ||
// (i32.pop) | ||
// (i32.pop) | ||
// ) | ||
// ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// (block | |
// (i32.const 42) | |
// (i32.const 5) | |
// (i32.add | |
// (i32.pop) | |
// (i32.pop) | |
// ) | |
// ) | |
// (i32.const 42) | |
// (i32.const 5) | |
// (i32.add | |
// (pop i32) | |
// (pop i32) | |
// ) |
- From looking at your test cases, we don't seem to add a new block?
pop
printing format fix
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We do generate the block, but then the normal printing logic that elides the blocks that make up function bodies kicks in so the block is not printed. You know it's there because there are multiple instructions in the function body, though. For the purposes of this documentation, I think it's important to show that the block is there even if it won't be explicitly printed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah but that block, which is not printed, exists before the Poppify pass, and your comment looks like Poppify generates it because it doesn't show in the 'before' example but it does in the 'after' example. I think it'd be consistent to show it in both or show it in neither.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case because the function body is just a single i32.add
expression, there won't be a block containing the body before the Poppify pass. Poppify really does generate a new block in this case.
src/passes/Poppify.cpp
Outdated
// scope and clear the current scope. | ||
void patchInstrs(Expression*& expr); | ||
|
||
// BinaryIRWriter methods |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// BinaryIRWriter methods | |
// BinaryenIRWriter methods |
src/passes/Poppify.cpp
Outdated
auto* loop = curr->cast<Loop>(); | ||
if (loop->name.is()) { | ||
// TODO: support loop parameters | ||
labelTypes[loop->name] = Type::none; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is labelTypes
used? Is there any reason we don't assign the loop's type here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, it is not. I briefly went through a period where I was recomputing types as I traversed the module, but it's much simpler to just use ReFinalize
like any other pass would.
@@ -0,0 +1,411 @@ | |||
;; NOTE: Assertions have been generated by update_lit_checks.py and should not be edited. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to add some tests that contains more nested blocks as expressions, such as
(func $nested (result i32)
(block (result i32)
(i32.add
(block (result i32)
(i32.const 3)
)
(i32.const 0)
)
)
)
src/passes/Poppify.cpp
Outdated
void run(PassRunner* runner, Module* module) { | ||
PassRunner subRunner(runner); | ||
subRunner.add(std::make_unique<PoppifyFunctionsPass>()); | ||
subRunner.add(std::make_unique<ReFinalize>()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need refinalization?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider this Binaryen IR block:
(block $l i32
(br $l
(unreachable)
(i32.const 0)
)
In Binaryen IR, this block would have to have type i32
because it ends in an i32.const
. However, the BinaryenIRWriter
would not emit anything after the unreachable, so in Poppy IR we would end up with:
(block $l i32
(unreachable)
)
We want to refinalize this block to give it type unreachable
because there is no reason to keep it typed as i32
. It's better to give additional optimizations the flexibility and power to do things to this block that do not preserve its i32
type.
src/passes/Poppify.cpp
Outdated
|
||
// Generate names for the elements of tuple globals | ||
Name getGlobalElem(Name global, Index i) { | ||
return Name(std::string(global.c_str()) + '$' + std::to_string(i)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the original module already has two globals such as myglobal
and myglobal$1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, this code will do the wrong thing in that case. I will make this more robust.
auto& instrs = scopeStack.back().instrs; | ||
if (curr->value->type.isTuple()) { | ||
auto types = module->getGlobal(curr->name)->type; | ||
for (Index i = types.size(); i > 0; --i) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Do we need to iterate in the reverse order? (emitLocalSet
does though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Consider this:
(global.set $tuple
(tuple.make
(i32.const 0)
(i32.const 1)
(i32.const 2)
)
)
The tuple.make
is eliminated by Poppification, so we end up with 0, 1, and 2 on the stack. Since the 2 is at the top of stack, it needs to be consumed first, so we need to work through the values from last pushed to first pushed.
src/passes/Poppify.cpp
Outdated
std::vector<std::unique_ptr<Global>> newGlobals; | ||
for (int g = module->globals.size() - 1; g >= 0; --g) { | ||
const auto& global = *module->globals[g]; | ||
if (global.type.isTuple()) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Maybe we can make this an early exit in case it is not a tuple, given that this loop has also other for and ifs below
) | ||
|
||
;; CHECK: (func $local-get-tuple (result i32 i64) | ||
;; CHECK-NEXT: (local $x (i32 i64)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When splitting a global, we seem to remove the original tuple global. Would it be good to do that for locals too? (Not necessarily in this PR though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan to use existing passes like CoalesceLocals
to do that as a follow-up optimization. I guess in principle we could use RemoveUnusedModuleElements
to clean up the globals as well, but that's simple enough to do here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I posted a few remaining questions but they are mostly minor. LGTM!
Adds a Poppify ("--poppify") pass for converting normal Binaryen IR to Poppy IR.
Like the existing construction of Stacky IR, Poppify depends on the
BinaryenIRWriter to drive the emitting of instructions in correct stack machine
order. As instructions are "emitted," Poppify replaces their children with pops
and collects them in a list. At the end of each scope, Poppify creates a block
containing all the collected instructions for that scope and injects that block
into the enclosing scope. All tuple globals and instructions dealing with tuples
are also expanded to remove all tuples from the program.
The validator currently fails to validate many valid Poppy IR patterns produced
in the tests, but fixing that is left as follow-on work to keep this PR focused
on the Poppify pass itself. For now the tests simply skip validation.