-
Notifications
You must be signed in to change notification settings - Fork 786
Stack utils #3083
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stack utils #3083
Conversation
Implement and test utilities for manipulating and analyzing a new stacky form of Binaryen IR that is able to express arbitrary stack machine code. This new Stacky IR will eventually replace Stack IR, and new optimization passes will be built with these utilities. See WebAssembly#3059.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't read the PR in detail yet, but I am a little confused, and have some questions:
- I think it is better to write somewhere what a Poppy IR is briefly, because I don't think it's written anywhere in this repo.
- When we talked about Poppy IR in your internal design doc, the one thing I kept asking was while I understood why this "structural" form of IR that had blocks and loops was better for some optimizations such as CodeFolding, I didn't get why we needed to make the IR "poppy", with placeholder pops. I thought the form was not very intuitive and adding a pseudo instruction
pop
everywhere seemed weird. Wouldn't adding block hierarchies to the current linear stack IR be sufficient? And your answer was that we can reuse Binaryen IR structures and utilities, which I couldn't understand either, because I didn't really understand what reusing traversing utilities would help, because, if we don't have this Binaryen IR-like format and just use a straight-line IR (with blocks), wouldn't traversing be just as simple as linear scan anyway? Why do we need to make a lot of efforts to make Poppy IR look like Binaryen IR to reuse its utilities, while straight line traversal is just simpler?
Sorry for bringing the old discussion again, but I just didn't understand the motivation for all those pops everywhere even then and also now.
And looking at this PR, apparently that Poppy IR seems to need a fair amount of utilities anyway..? Maybe I'm just confused about what all these classes do. In case I'm mistaken, some explanation of what these utility classes do, and why we need them, would be helpful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OMG, sorry, I didn't mean to approve the PR just yet (As I said, I haven't even finished reading it yet 😂) There doesn't seem to be a way to 'unapprove' a PR?... I'm not requesting any changes, but clicking this option just to cancel my approval... Sorry.
@@ -4,6 +4,7 @@ set(ir_SOURCES | |||
ExpressionManipulator.cpp | |||
LocalGraph.cpp | |||
ReFinalize.cpp | |||
stack-utils.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lowercase looks inconsistent here, how about StackUtils.cpp
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Surveying the subdirectories of src, we have asmjs, emscripten-optimizer, support, tools, and wasm using (mostly) kebab-case
(with some snake_case
thrown in) for their .cpp files and only ir and passes using PascalCase
. I kind of understand passes using PascalCase because each .cpp is named for the pass class it contains, but I think generally it would be good to standardize on kebab-case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't feel strongly about this I guess. I agree we have existing inconsistency anyhow, so maybe this doesn't matter.
src/ir/stack-utils.cpp
Outdated
|
||
namespace StackUtils { | ||
|
||
void compact(Block* block) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps the name could mention that it is just for nops, like removeNops
?
That would avoid a possible future confusion with the Vacuum pass's visitBlock implementation which strips nops as well as some other things - that function could be moved out eventually and called something generic like compact
I think.
(One option might be to do that refactoring now, and call that method that removes nops as well as does other things - unless the goal is to only do nops and nothing else?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, removeNops
would be a fine name here. This is used in Poppy IR passes that remove instructions. To avoid quadratic behavior, they transform the removed instructions into nops, then remove all the nops linearly at the end. So it doesn't need to do anything besides remove nops.
}; | ||
|
||
// Calculates stack machine data flow, associating the sources and destinations | ||
// of all values in a block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is just for a block, perhaps the name could be BlockStackFlow
or BlockStackInfo
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since there is no other kind of stack flow, I would prefer the shorter name. Let me know if that seems unreasonable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds ok to me.
src/ir/stack-utils.h
Outdated
// (produced), representing the `index`th input into (output from) instruction | ||
// `expr`. `unreachable` is true iff the corresponding value is consumed | ||
// (produced) by the polymorphic behavior of an unreachable instruction | ||
// without being used by that instruction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to handle the polymorphic case? If we DCE before Stacky IR, I think we'd never have such a situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Even with DCE, I think we would be able to encounter a case that looks like this:
(block (result f32 i64)
(i32.const 0)
(block [unreachable]
(unreachable)
)
)
In this example, we will want to be able to infer an (i32) -> (f32, f64)
type for the unreachable
inner block. To do that, we need to calculate the data flow into and out of the polymorphic inner block.
We could also prevent that situation by having more constraints and requiring more optimizations to be run before entering Stacky IR, that would make Stacky IR much less robust and discovering all the necessary constraints in the first place would be difficult. Ultimately we want Stacky IR to represent and facilitate reasoning about raw WebAssembly as closely as possible, and that includes reasoning about polymorphic unreachables.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I assume that code fragment is poppy IR?
I don't think dce-d binaryen IR can lead to cases like that, and as we'll only get poppy IR from binaryen IR, I think that means we are ok to ignore unnecessary unreachable code (and I think unnecessary unreachable code is what's relevant here). But I may be missing something at several levels here...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that code fragment is Poppy IR.
Looking forward to using wasp to parse WebAssembly directly to Poppy IR and unstackifying it in a separate step, we will need to be able to represent unreachable code in Poppy IR because it can be presesent in WebAssembly.
Overall I am uncomfortable with making the assumption that Poppy IR has already been DCEd. That just seems brittle to me because then all Poppy IR optimizations have to avoid accidentally introducing new unreachable code. It also limits Poppy IR's ability to faithfully represent any WebAssembly module.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we start out with the Binaryen IR
(block (result f32 i64)
(i32.add
(... side effects 1 ...) [i32]
(block [unreachable]
(unreachable)
(.... side effects 2 ...)
)
)
)
Then DCE should be able simplify this to something like
(block [unreachable]
(... side effects 1 ...) [i32]
(block [unreachable]
(unreachable)
)
)
Which is similar to the fragment from the prior post. Type inference still needs to be able to figure out what concrete type to assign the unreachable blocks here, which requires being able to calculate the outflow from a polymorphic unreachable instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, no wonder the DCE pass is so complicated. Maybe we can simplify it by making it use Poppy IR ;)
Actually, since these utilities give us the ability to do type inference for unreachable blocks, we might be able to simplify normal Binaryen IR's unreachability rules to match Poppy IR's rather than the other way around. Specifically, this would allow us to make any block unreachable if it is not branched to and it contains an unreachable instruction. This would get rid of the restriction that the final expression of the block cannot have a concrete type. That would be a follow-up investigation, of course, but it would make unreachability more powerful as well as simplify and unify the unreachability rules.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting idea about unreachability! Needs more thought but I don't see an obvious reason not to remove that restriction, if it helps.
I do think we should think about the other two arguments before moving forward here, though. Specifically, we should have an idea if we want to use poppy IR during read and not just during write (the latter is all I had in mind), and whether poppy passes will need to introduce new unreachable code (I was actually hoping they'd be very simple and not do that - they would focus on stack-machine specific stuff, all done after dead code was already removed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Places we've considered (with any amount of seriousness) using Poppy IR:
-
Pre-write optimizations: We can impose arbitrary structural restrictions (i.e. require arbitrary optimizations to already have run), but doing so makes testing, fuzzing, and general reasoning slightly more complicated.
-
Text parsing: Parsing to Poppy IR (probably via wasp) would allow us to support the full standard text format for the first time and to test the interpreter on the upstream spec tests. We would need to be able to represent arbitrary WebAssembly, including unreachable code. The unstackifier could be slightly simpler if stack utilities could reason correctly about this unreachable code, although the same logic could also be built into the unstackifier itself.
-
Binary parsing: We could outsource binary parsing to wasp as well, with the same considerations as text parsing.
-
Simplifying existing passes: It would be good to take a look at simplifying DCE, MergeBlocks, and any pass that expects flat IR by having them operate on Poppy IR, either having them run stackification and unstackification as subpasses or depending on the pass runner to insert conversions. Clearly we can't require running DCE before we run DCE, but more generally it is more robust to limit dependencies between optimization passes.
-
Unknown future passes: There's not much to say here except that the fewer limitations we put on Poppy IR, the more useful it can be.
Also note that Stack IR has no restrictions on unreachable code, so placing such restrictions on Poppy IR would make them more difficult to compare and would be a regression in capabilities.
I also don't think there is much upside to disallowing unreachable code in Poppy IR, since it behaves identically to unreachable instructions in the underlying WebAssembly that people should probably be familiar with anyways. This seems like a clear choice to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also don't think there is much upside to disallowing unreachable code in Poppy IR
I guess this is the bottom line. If there isn't much cost to support it now, then I'm not opposed.
So I think we agree on the key issue for now. But other notes:
Simplifying existing passes: It would be good to take a look at simplifying DCE, MergeBlocks, and any pass that expects flat IR by having them operate on Poppy IR, either having them run stackification and unstackification as subpasses or depending on the pass runner to insert conversions.
I'm skeptical about this (I wrote a similar comment here just now actually, but this expands on it). Converting IR back and forth would have a cost in speed. Binaryen's optimization pipeline is designed to be minimal and fast, with most optimizations just removing or altering code (i.e. no allocations and no major restructurings which could affect locality). Basic optimizations like DCE and MergeBlocks should have that behavior, as well as every pass we run in -O1
(higher levels do some much heavier things, of course).
Note also that I'm unsure how much Poppy would help with them. They do more than natively remove dead code or naively merge blocks - they also do a bunch of things related to those which are easier and more efficient to do at the same time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very reasonable. Even if the code could be much simpler if it used Poppy IR (and I am curious about that), I agree that the current complexity is probably worth saving the overhead of performing the translations to and from Poppy IR.
Will do. In the code I plan to use the name "Stacky IR," because it is more descriptive and after we remove Stack IR we won't need to worry about ambiguity. I plan to keep using "Poppy IR" when discussing it for the time being, though.
Pops are important for representing type information for each instruction. See
Without having that child available, we would have to look up the enclosing function to figure out the input type of the It would be possible to do away with pops for most instructions that always have the same stack signature, e.g. all unary or binary operations, but having pops on these instructions as well maintains the Binaryen IR invariant that these children are not null, which means less code needs to be changed to be reused.
These utilities do not correspond to any existing utilities for Binaryen IR because they deal with stack machine semantics that are not expressible in normal Binaryen IR. Besides these, Poppy IR will only reuse normal Binaryen IR utilities. What these utilities do is fairly well documented in the header and the reason we need them will be clearer in follow up PRs that actually use them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delayed reply. I haven't finished reading everything, but I think getting answers for some of these questions would be helpful.
// WebAssembly. That means that any expression may have a concrete type, not | ||
// just the final expression in the block. | ||
// | ||
// 3. All other children must be Pops, which are used to determine the input |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I kind of like "Poppy" as a name because of this. But up to you...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Drive-by comment: I like stuff with weird but perfectly cromulent names. Embiggens its meaning.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good :) Since folks like the name, I will use it in the code as well. Using one name consistently will certainly simplify discussion, too.
Co-authored-by: Heejin Ahn <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have not understood every part of this, but at least for my reading style it will be easier once there is code that uses it, so I'm deferring some of that to later. But what I see so far looks good.
Thanks! Now I think I understand better. I posted some questions above. They are about 1. block end proxy and 2. whether unreachable expressions' stack signature should be different depending on their surrounding instructions. I deleted some questions I posted earlier after I understand more about the code, so there might be some questions in your mail notification that aren't here anymore; please ignore them. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your patient explanation. I appreciate new comments. LGTM!
Implement and test utilities for manipulating and analyzing a new
stacky form of Binaryen IR that is able to express arbitrary stack
machine code. This new Poppy IR will eventually replace Stack IR, and
new optimization passes will be built with these utilities. See #3059.