Skip to content

Have br_if return its value #703

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Have br_if return its value #703

wants to merge 1 commit into from

Conversation

lukewagner
Copy link
Member

Currently, AstSemantics.md and ml-proto say that br_if doesn't return a value. However, V8 and SM wasm impls currently both have br_if return its value (mostly as an oversight, but also b/c that's what happens with an stack-based iterative algorithm unless you take special steps to pop and void the value). I think br_if-returning its value is more natural for a couple of reasons:

  • it's weird for a value computed before a branch to only be usable along one of its successor edges
  • from a stack-based POV (i.e., the current validator/compiler of V8/SM), it's weird for br_if to conditionally pop the value
  • the whole point of branch values is to "pass along" a value, so from an expression POV, it seems natural for br_if to pass along its value to the parent expression

(This PR is currently based on #694.)

@ghost
Copy link

ghost commented Jun 7, 2016

This PR should not be bundling in a controversial change discussed in length elsewhere, and it should not be bundling in the tee_local proposal.

Expressions fit some restricted but common patterns. They do not address the general cases of passing around values. tee_local has support because it is common. No numbers have been offered to support a real need for br_if returning value(s)?

I see two different paths to addressing the general cases:

  1. Just use local variables. In this case br_if should accept no values to pass to the target because the values are already in local variables. Local variables are currently needed for many cases, such as when the order of evaluation does not allow the value to be placed in the br_if value position, or when the value is used multiple times such as being part of the condition, or for multiple values.
  2. Embracing the get_value proposal in which case the values can be computed before the br_if and will be on the values stack. The br_if could then accept an immediate sequence of stack indexes for the values to pick to pass to the target allowing multiple values to be chosen and in any order, and the fall-through successor would already have these on the stack.

I just don't see a plan or the stats to evaluate these changes?

@@ -243,6 +242,7 @@ of the arguments passed to the function.

* `get_local`: read the current value of a local variable
* `set_local`: set the current value of a local variable
* `tee_local`: like `set_local`, but also returns the set value
Copy link

@qwertie qwertie Jun 7, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tee_local - I'm guessing this is meant to be in a different PR? Edit - sorry for the noise, I see it's "based on #694"

@qwertie
Copy link

qwertie commented Jun 7, 2016

@lukewagner The points you've made make sense, but if drops are made explicit, wouldn't this change increase file size by mandating more drops?

@rossberg
Copy link
Member

rossberg commented Jun 7, 2016

As the original comment says, the base line is #694 -- I think the PR that Luke actually meant to create would be for this.

@titzer
Copy link

titzer commented Jun 7, 2016

I agree with this PR (as discussed previously). Can we separate out just the br_if value change and land that independently? (btw I didn't see the appropriate changes to the AstSemantics in this PR, but maybe I am looking at the wrong revision somehow....)

@@ -285,13 +284,16 @@ before any others.

### Yielding values from control constructs

The `nop`, `if`, `br`, `br_if`, and `return` constructs do not yield values.
The `nop`, `br`, `br_if`, `br_table`, and `return` constructs do not yield values.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should drop br_if from this sentence.

@rossberg
Copy link
Member

rossberg commented Jun 7, 2016

It might be worth gathering some data first. I assume that the case of actually using the return value is not very common, such that the majority of uses of br_if will now require an additional drop. That might easily lead to a measurable increase in code size.

@lukewagner
Copy link
Member Author

I only based on #694 to avoid the pending merge conflict (assuming people would click "Commits" and then look at the one new commit), but happy to rebase to master if that makes it simpler for people.

@lukewagner lukewagner reopened this Jun 7, 2016
@lukewagner
Copy link
Member Author

(Sorry, GH shenanigans when I "rebased" by recreating a same-named branch)

@lukewagner
Copy link
Member Author

At a high level, I don't think we should be worried about the extra drop: if we today or later determine it is a size issue, we can mitigate by adding void-returning br_if, just like set_local/tee_local in #694. Second, providing the value to both successors (instead of arbitrarily killing it along one successor edge) could provide opportunities for savings, so it's not even clear what the net effect would be after optimizers had a chance to use this fully. Also, to be clear: br_if with arity 0 (no value) does not return a value.

Measuring the AngryBots demo right now (which is 12M bytes), there are 32K br_ifs. Now, that could be an artifact of the asm.js2wasm conversion (which may not be maximally emitting br_if); for comparison there are 326K ifs and 80K brs. Now, only a fraction of those br_ifs will have values. And, only some of those will need a drop (b/c the value isn't used along both successors). So overhead from explicit drop starts at a fraction of .3% of total binary size and could, in the extreme case (no use of if), go up to a fraction of 3%.

So overall I don't think we need to worry about code size issues here. This mostly feels like a symmetry/regularity fix.

@ghost
Copy link

ghost commented Jun 8, 2016

@lukewagner Without a plan for multiple values it is surely not possible to judge 'symmetry/regularity'. An 'arity' count has been added, so lets say there are three values to pass to the target, and lets say that the target and the fall-through need different sets, then what?? There is just no plan here.

  1. Using local variables is a plan. Multiple values are passed in multiple local variables.
  2. Using get_value is a partial plan that needs more work, but it would also address multiple values here as each value could be a separate values stack element. In some ways this PR could be seen as picking the top-of-stack value (rather than popping it), but it could be generalized to pick any value from the stack and to pick multiple values from the stack. But it's not a complete plan yet. If runtime compilers can benefit from knowing the live range of values in a single pass, and having the producer encode this is not too much of a burden on the encoded size, then this might require a different design again.

@rossberg
Copy link
Member

rossberg commented Jun 8, 2016

Honestly, I'd be very surprised if even 10% of br_if's with value wanted to pass that value to both their continuations. The two continuations are highly asymmetric! The usual pattern for br_if is to occur statement-like in the middle of some block:

;; ...
br_if $l (condition)
;; ...

If the target requires a value, the operator will take an extra argument. But that's almost always entirely independent from the local context. In fact, it would be rather irregular if you had to wrap br_if into a drop only because the target takes an argument -- to see how weird that is, just imagine we did something analogous with calls (somehow necessitating a drop if a call takes arguments).

Also, we should avoid a proliferation of duplicated void/non-void operators. If that happened then I'd regard explicit drop semantics a failure. ;) So for most relevant operators, we should pick the semantics that matches the majority use case (instead of penalizing it). We decided to do so for stores, why choose differently here?

@lukewagner
Copy link
Member Author

lukewagner commented Jun 8, 2016

@kripken was actually making a similar point (having performed an optimization in Binaryen that used branch values to optimize branches to join points); still waiting to see what % of those branch-with-values were br vs. br_if, though. I expect most are unconditional br.

Really what motivates me here is symmetry (why pop the already-computed-and-pushed value only along one path?), not size wins. But, talking about this more, there is also a symmetry with store, set_local and tee_local to consider: with store and set_local, we've defaulted to returning void, only adding tee_local b/c of its demonstrated wins and reserving the option to add tee_store in the future if we can show a win. Following this precedent suggests keeping br_if returning void (closing the PR) but reserving the option to add a tee_br_if in the future if we can show a win. I'd be fine with that, given there is now an argument for why the current state is actually symmetric (assuming #694; without #694 there's no practical downside nor do we have the set_local/tee_local split).

@rossberg
Copy link
Member

rossberg commented Jun 8, 2016

@lukewagner, fully agreed.

@sunfishcode
Copy link
Member

sunfishcode commented Jun 10, 2016

To me, br_if is a lot more symmetric than is being acknowledged here. It's a conditional branch that transfers control to one of two possible successors. With this symmetry, having a construct that transmits values along one edge and not along the other edge is still awkward.

@titzer
Copy link

titzer commented Jun 10, 2016

It seems symmetric to me to have it produce a value in both situations, but if it means drops in the common case, then it feels awkward somehow.

@ghost
Copy link

ghost commented Jun 10, 2016

@sunfishcode The value might not even be live in the fall-through path, this is just the reality of the data flow and control flow - I just don't see this 'symmetric' argument. There might be a lot of definitions on the values stack that are discarded if the branch is taken, and these are values 'that transmits values along one edge and not along the other edge', and I don't see people complaining about this not being 'symmetric'.

@lukewagner
Copy link
Member Author

I definitely agree it's symmetric to think of br_if as returning values on both successors. And I also see it as being practically useful to have a void-returning variant symmetric with set_local/tee_local if drop would be common.

But here's the thing, I don't think we even know what the common case is yet. In previous discussions it had sounded like there were a bunch of br_if-with-value, but actually the measurements ("4% reduction in set_local") were for an entire optimization pass doing much more than br_if-with-value. In fact, looking at an optimized build-suite/BananaBread build, there are exactly zero br_if-with-value. So I don't think we know which is the common case at this point in time.

We could be conservative here by simply requiring the arity of br_if to be 0 in the MVP while we take the time to actually write optimizations that take full advantage of this and make an informed decision.

@kripken
Copy link
Member

kripken commented Jun 10, 2016

Yes, the 4% figure was for return value optimizations in general, which includes ifs and blocks, and for blocks, includes br, br_if, br_table and the fallthrough at the end of the block. And looking at the details, indeed br_if with value is not currently generated by the binaryen optimizer, due to some current limitations.

So I agree that we don't know yet for sure if returning the value or not is the more efficient thing to do.

I can look into doing more work on the binaryen optimizer for this very soon, so if we want to wait on data it shouldn't be long.

However, I believe @rossberg-chromium is right, and not returning the value is the more efficient form (similar to store), since

  1. br_if not returning the value corresponds directly to a phi: a branch with a value associated with it. For a phi, the value is part of the branch.
  2. And on the other hand, to be able to use the returned value, it must happen to be used in both the branch target and the code right after it. That's less likely to happen than either event by itself.

@lukewagner
Copy link
Member Author

I can look into doing more work on the binaryen optimizer for this very soon, so if we want to wait
on data it shouldn't be long.

It'd be useful to see how often br_if-with-value is even used at all, but unless you're doing the full analysis to detect when the value of br_if can be used along both successors (instead of just looking at phis), then the ratio of droped-br_if to not won't be meaningful.

br_if not returning the value corresponds directly to a phi

That makes sense for br, not so clear for br_if.

@kripken
Copy link
Member

kripken commented Jun 11, 2016

@lukewagner: Sorry, maybe I wasn't clear: br_if with an unreturned value corresponds to a phi on the target of the branch.

Whereas if the value is returned, it could correspond to two phis, one on the branch target and one on the fallthrough block after the branch, where both phis happen to have the exact same value.

@ghost
Copy link

ghost commented Jun 11, 2016

@kripken It will not create a phi on the fall-through because it is not a control flow merge point. In this respect the paths are not symmetric. http://ssabook.gforge.inria.fr/latest/book.pdf

@kripken
Copy link
Member

kripken commented Jun 17, 2016

I analyzed all br_if appearances in asm2wasm output on libc++. 3% of br_ifs can use a value, which seems low, but the reason is that almost all br_ifs are used in loops (for the condition at the beginning or the end). Of those 3%, all are returning either a get_local or a const, and in some cases the get_local is also used on the other side (the fallthrough), so a return value could be used sometimes, but in most it would be better to not have a return value.

I agree with what others said that more data could be useful here. Overall, I think it's fair to say we don't know for sure if having a return value or not is the best choice, even if the data so far does show not having a return value is better. So I guess we could either work on gathering more data now, or defer the decision, by not allowing br_if with a value, as mentioned in this discussion and earlier by @sunfishcode in #681

I'm ok to defer since we'd be fine for the MVP: if it would be important to have a specific br_if with a value (e.g. if many brs exist to a block, and if only the br_if that also goes there had a value too then we could use a value for them all), then we could do a br with a value inside an if.

@kripken
Copy link
Member

kripken commented Jun 18, 2016

I also did a similar analysis on libc++ but on the wasm backend, here is what it looks like after binaryen's block return value opts:

1249 brs
3469 br_ifs
32% of brs have a value (!)
6% of br_ifs have a value (double from asm2wasm)
over 1% of total nodes decreased due to this pass

Most of those differences from asm2wasm, and that there are more br_ifs than brs, appear to be due to the wasm backend not emitting if and if_else.

The values returned from br and br_ifs look similar to asm2wasm: many consts and get_locals, but also a significant amount of other stuff as well. Going through them, I don't see obvious opportunities for psis, i.e., the values cannot be used as return values in the fallthrough path.

@lukewagner
Copy link
Member Author

Closing in favor of #709.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants