Grand Unified Flow Analysis (GUFA) #4598
I think we can actually move all the checking routines to `getDroppedUnconditionalChildrenAndAppend` and remove `canRemove` and `canRemoveStructurally`. Checking things in three different places was a little hard to understand, and I think `getDroppedUnconditionalChildrenAndAppend` actually needs to exclude more things that `canRemoveStructurally` covers, if it is to be used in other places as well. Additionally, I think it'd be better to move these two dropping-children functions (`getDroppedChildrenAndAppend` and `getDroppedUnconditionalChildrenAndAppend`) to a cpp file and make `getDroppedChildrenAndAppend` an internal function, inaccessible to outer passes, given that it looks like all places should use `getDroppedUnconditionalChildrenAndAppend` instead. But this can be a follow-up. Not sure why the test signatures change. May need to investigate..?
(func $test
  ;; The only place this type is created is with a default value, and so we
  ;; can optimize the get into a constant (note that no drop of the
  ;; ref is needed: the optimizer can see that the struct.get cannot trap, as
  ;; its reference is non-nullable).
  (drop
    (struct.get $struct 0
      (struct.new_default_with_rtt $struct
        (rtt.canon $struct)
      )
    )
  )
)
The function body is different from its counterpart in cfp.wast:
binaryen/test/lit/passes/cfp.wast
Lines 55 to 74 in 44fa122
  (func $test
    ;; The only place this type is created is with a default value, and so we
    ;; can optimize the later get into a constant (plus a drop of the ref).
    ;;
    ;; (Note that the allocated reference is dropped here, so it is not actually
    ;; used anywhere, but this pass does not attempt to trace the paths of
    ;; references escaping and being stored etc. - it just thinks at the type
    ;; level.)
    (drop
      (struct.new_default_with_rtt $struct
        (rtt.canon $struct)
      )
    )
    (drop
      (struct.get $struct 0
        (ref.null $struct)
      )
    )
  )
)
Also, there are more functions that differ between the two. GUFA's version seems to be a condensed version of CFP's. Is this intentional?
The specific reason in the test you mentioned is that dropping each result leads to GUFA seeing that the result does not reach anywhere. Also, the null is an opaque placeholder for CFP, but GUFA sees that it will trap. So we need to connect results to uses, which "compresses" the test.
More details here:
binaryen/test/lit/passes/gufa-vs-cfp.wast
Lines 6 to 22 in adef90c
;; This is almost identical to cfp.wast, and is meant to facilitate comparisons
;; between the passes - in particular, gufa should do everything cfp can do,
;; although it may do it differently. Changes include:
;;
;;  * Tests must avoid things gufa optimizes away that would make the test
;;    irrelevant. In particular, parameters to functions that are never called
;;    will be turned to unreachable by gufa, so instead make those calls to
;;    imports. Gufa will also realize that passing ref.null as the reference of
;;    a struct.get/set will trap, so we must actually allocate something.
;;  * Gufa optimizes in a more general way. Cfp will turn a struct.get whose
;;    value it infers into a ref.as_non_null (to preserve the trap if the ref is
;;    null) followed by the constant. Gufa has no special handling for
;;    struct.get, so it will use its normal pattern there, of a drop of the
;;    struct.get followed by the constant. (Other passes can remove the
;;    dropped operation, like vacuum in trapsNeverHappen mode).
;;  * Gufa's more general optimizations can remove more unreachable code, as it
;;    checks for effects (and removes effectless code).
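To make the difference between the two replacement patterns described above concrete, here is a minimal sketch in C++. It only builds text in the WebAssembly text format; it does not use Binaryen's IR or builder API. The helper names `cfpStyle` and `gufaStyle` are hypothetical, and the exact block nesting is an assumption based on the test comment, not a copy of what either pass prints.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers for illustration only: they concatenate text and are
// not part of Binaryen.

// CFP-style replacement, as described in the test comment: keep only a null
// check on the reference (so a null still traps), then produce the constant.
std::string cfpStyle(const std::string& ref, const std::string& constant) {
  assert(!ref.empty() && !constant.empty());
  return "(block (drop (ref.as_non_null " + ref + ")) " + constant + ")";
}

// GUFA-style replacement, as described in the test comment: drop the entire
// original expression (keeping all of its effects), then produce the
// constant. Nothing here is specific to struct.get.
std::string gufaStyle(const std::string& original,
                      const std::string& constant) {
  assert(!original.empty() && !constant.empty());
  return "(block (drop " + original + ") " + constant + ")";
}
```

For instance, `gufaStyle("(struct.get $struct 0 (local.get $x))", "(i32.const 0)")` yields `(block (drop (struct.get $struct 0 (local.get $x))) (i32.const 0))`, which a later pass may shrink further if it can prove the dropped `struct.get` has no effects.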
Very impressive framework! 😮 LGTM and sorry for the super delayed review! 😅
Thanks for the thorough review @aheejin!
Umm, I think I messed up something while merging...
Thanks @aheejin!
Friendly ping @tlively, did you want to take a look here?
Thanks for the ping. I'll take a look today.
Half-rubberstamp LGTM. I don't want to hold this up any longer 😞
test/lit/passes/gufa-no_names.wast
Outdated
;; Two tags with different values. Names are added by text format parsing, which
;; would inhibit optimizations, hence this pass requires unused names to be
;; removed.
How can names inhibit optimizations?
(This was a very old pending comment, feel free to ignore)
This is far from ready for review, but @tlively was curious to see the current status, so posting.
This tracks the possible contents in the entire program all at once using a single IR. That is in contrast to, say, DeadArgumentElimination or LocalRefining etc., all of which look at one particular aspect of the program (function params and returns in DAE, locals in LocalRefining). The cost is to build up an entire new IR, which takes a lot of code - ~2000 lines atm, but should be close to done. At least all that code is separable from everything else and could fit entirely in the new pass. Another cost is that this new IR is very big and requires a lot of time and memory to process. The benefit is that this can find opportunities that are only obvious when looking at the entire program, and also it can track information that is more specialized than the normal type system in the IR - in particular, this can track an ExactType, which is the case where we know the value is of a particular type exactly and not a subtype. Both may end up useful, but it's too early to tell.
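As a rough illustration of what "tracking possible contents" with an exact type can look like, here is a toy sketch in C++. It is not the PR's implementation: the `Contents` struct, its three states, and the `combine` rule are assumptions made for this example, and a real analysis would track more states (constants, for example) and take subtyping into account.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Toy model only: names and states are illustrative, not Binaryen's classes.
struct Contents {
  enum Kind { None, Exact, Many };

  Kind kind;
  std::string type; // meaningful only when kind == Exact

  Contents(Kind k = None, std::string t = "") : kind(k), type(std::move(t)) {}

  // Nothing has been seen to reach this location yet.
  static Contents none() { return Contents(None); }

  // The value is known to be exactly this type, not a subtype.
  static Contents exact(std::string t) {
    assert(!t.empty());
    return Contents(Exact, std::move(t));
  }

  // Anything may be here; the analysis has given up on this location.
  static Contents many() { return Contents(Many); }

  // Merge in contents flowing here from another location.
  // Returns true if this location's contents changed, which in a flow
  // analysis means its successors need to be processed again.
  bool combine(const Contents& other) {
    if (other.kind == None || kind == Many) {
      return false; // nothing new arrives, or we already know nothing useful
    }
    if (kind == None) {
      *this = other; // first information to reach this location
      return true;
    }
    // We are Exact; the other side is Exact or Many.
    if (other.kind == Exact && other.type == type) {
      return false; // same exact type, so nothing to update
    }
    *this = many(); // conflicting information: fall back to "anything"
    return true;
  }
};
```

In this sketch, a location that only ever receives `Contents::exact("$A")` stays exact no matter how many places write to it, which is the kind of fact the declared type alone would not express. As soon as a second type arrives it becomes `Many`, which is deliberately cruder than what a production analysis would do.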
This passes fuzzing (but we don't fuzz `--nominal` well atm, so that is somewhat limited) and a large amount of new tests, but removes too much code in dart2wasm and breaks things there. It is also too large/slow to run on j2wasm atm.
edit: now passes on dart2wasm. It removes 2% of code size and 5% of vars.
edit edit: This speeds up j2wasm microbenchmarks by 20%, and is ready for review.