Make dropping of values explicit #694

Merged: rossberg merged 2 commits into master from explicit-drop on Jun 15, 2016

Conversation

rossberg (Member)

Luke, Ben, Dan, and I have been discussing this change for slightly simplifying Wasm, and allowing a more natural interpretation of Wasm as a stack machine. The main changes are:

  • Values can no longer be discarded implicitly.
  • Instead, there is an explicit drop operator.
  • To compensate, store operators no longer return a value.

The constructs affected by this are mainly blocks and block-like sequences. Before, all expressions in a block could yield a value, and it would just be dropped on the floor implicitly (except the last one). Now we require explicit drop operations in those cases. With stores no longer returning a value, the only places where we expect drops to actually arise are calls to side-effecting functions that also return a value, but where this value isn't used.
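As a concrete sketch (using a hypothetical side-effecting function $log that returns an i32; not an example from the PR itself), a block that previously discarded a result implicitly would now need an explicit drop:

;; before this change: the i32 returned by $log is discarded implicitly
(block (call $log (i32.const 1)) (i32.const 0))

;; after this change: the unused result must be dropped explicitly
(block (drop (call $log (i32.const 1))) (i32.const 0))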

(Also fixing a bunch of other out-of-date text on the way.)

The corresponding spec changes can be found here for more details:

https://github.com/WebAssembly/spec/compare/void

@ghost commented May 23, 2016

Are the block's top-level expressions still not known until the end of the block? Or is drop only usable as the first operation in a top-level expression, so that drop flushes all values on the block's value stack, or flushes one value and gives a validation error if there are any remaining?

It is not possible to validate that drop has an expression unless it is encoded as a block with an end operator, in which case it is just sugar for (block ... (nop)); yet this proposal adds only a single opcode for this. What does (drop (drop (nop))) do? If there are still void node stack entries, then hasn't this failed to solve the problem of eliminating the void type? What does (block (i32.const 1) (drop)) do, and how is it different from (block (drop (i32.const 1)))?

This seems to fundamentally conflict with the proposal in #685, which would allow expression values on the values stack to be re-used. I believe this re-use of definitions is compelling and will need to be supported, and thus this drop proposal is the wrong approach. If this re-use is supported then the store and set_local operators need not return values anyway, as there is a cleaner way to re-use their argument.

I think what should be done is to push one value on the values stack for each value in a result expression. Then an expression returning zero values pushes zero values on the stack, and a multiple-values call pushes multiple result values on the stack. Blocks might just need to define the number of values from the stack that they return for the fall-through, or end in br if they return values, etc. If you want an operator to clear values from the top of the stack, then it should do that rather than accepting an expression, and it should be limited to the stack at the start of the block, so it would be a validation error to drop values not on the block's value stack. But with the proposal in #685 there is no need to be able to clear values from the stack. What if a call returns three values and only the second is needed? A drop operator is not sufficient to discard all the unused values; it's just not a general solution. I suggest solving the general case, so that values at any position on the values stack can be declared no-longer-used, and then seeing whether the drop operator is still a compelling high-frequency case.

@rossberg (Member, Author)

@JSStats, I don't follow half of your questions, but some concrete answers: drop is an ordinary unary operator consuming a single value; it cannot be encoded by block, because a block cannot drop anything anymore; (drop (nop)) is a type error, at least for now; (block (i32.const 1) (drop)) is a syntax error, because drop requires an operand; (block (drop (i32.const 1))) is perfectly valid and returns void.
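Restated as annotated snippets, those cases are:

(drop (nop))                  ;; type error: nop yields no value to drop
(block (i32.const 1) (drop))  ;; syntax error: drop requires an operand
(block (drop (i32.const 1)))  ;; valid: the block yields void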

@lukewagner (Member)

The PRs lgtm. A concrete benefit that motivated us is that this change avoids pushing a lot of implicitly-discarded values that a single-pass baseline compiler (which we're building for first-tier codegen) would otherwise have to hold onto (thereby avoiding a lot of stack spilling). Also, with this change (and set_local/store returning void), pretty much the only use of drop would be for non-void calls whose return value is ignored.

A separate question is whether we should add a tee, defined to be equivalent to set_local as it was before this PR; for that, it seems like we should just measure the size win.
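A sketch of the distinction, assuming tee keeps the pre-PR set_local semantics (spelled tee_local here, the name later adopted in this thread):

(set_local $x (i32.const 1))                          ;; stores, yields no value
(i32.add (tee_local $x (i32.const 1)) (i32.const 2))  ;; stores and also yields the value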

@kripken (Member) commented May 23, 2016

(and set_local/store returning void)

This is by far the biggest code change brought about by this PR. Previously all implementations supported set_local return values, and binaryen and the wasm-backend used that to get rid of get_locals (if a get has a single relevant set, and reordering is valid, then we can replace the get with the set, thus removing the get).
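For illustration, a minimal sketch of that rewrite (hypothetical function $f):

;; before: one set followed by its single relevant get
(set_local $x (call $f))
(i32.add (get_local $x) (i32.const 1))

;; after: the get is replaced by the value-returning set (pre-PR semantics)
(i32.add (set_local $x (call $f)) (i32.const 1))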

We discussed this previously here, and there are some numbers there that show around half of set_locals use their return value when we optimize, so taking that out could be significant - more get_locals and larger blocks. I am working to get more precise numbers now for this change.

@kripken (Member) commented May 24, 2016

Numbers on BB: without set_local return values we have 6.2% more nodes, 5.1% larger binary, 4.8% larger after gzip.

There would also be some loss from not having store return values, but I don't have a practical way to test that. It's probably much smaller though.

This suggests that removing return values from set_local and store can significantly increase our size. Nodes and binary size could be recovered with tee_local (and tee_store?), since we basically split one opcode into two. For gzip, numbers from the link before suggest the hit could be decreased to closer to 1.1%, but depending on the distribution of set_local and tee_local it could be higher.

On the other hand there are arguments for taking some % of regression here,

  • Without tee_local code is flatter and more readable. Optimizing using set return values leads to very nested code.
  • I agree it is nice and simple to remove return values from both set_local and store.
  • @lukewagner suggests the code size might be addressed in compression at a higher level. Possible; however, if we are doing this for baseline compiler speed, then that higher-level compression could be exactly what we have now (code with implicit return values, i.e. one opcode for both returning and not returning), and decompression would generate either tee_local or the non-returning set_local from the context. So a baseline compiler could just do that directly for higher total efficiency, I suspect.

@ghost commented May 24, 2016

@rossberg-chromium How does this validate that (block (i32.const 1) (nop)) or (block (i32.const 1) (i32.const 2)) is invalid? Will a post-order decoder check that there are no values left on the block's value stack at the end of the block, or allow just one value for the fall-through expression result?

This PR lacks a plan for multiple-value calls. You could argue that multiple-value calls need to write their results to local variables, but the same could be done for all operators (the expressionless encoding I explored), and that experiment suggested that the values stack is an important resource and that avoiding local variables speeds SSA decoding. So I think there may be compelling reasons to have multiple-value calls write their results to the values stack too.

What is your proposal for multiple values?

This is a critical question for this PR. If multiple-value call results are to be written to separate slots of the values stack, then it would be natural for nop to push no values on this stack, and it would significantly change the semantics of drop. If they are to be kept in tuples on this values stack, then I expect that a single-pass decoder or compiler would want to rewrite these to use separate value stack slots anyway, so that it is easy to manage register allocation.

If multiple-value tuples are single entries on the values stack, then drop can only drop the entire tuple, and it solves neither the problem of discarding some of these values nor that of declaring some unused.

Being able to reuse values up the stack is also a compelling feature, a far more general and cleaner solution than tee, and it leads to code that is far easier to read and write. I have not had the time to get numbers on the encoding implications. If values up the stack are reusable then they could be dropped later, but only in stack order, which still frustrates code generation; drop is not a general solution in this case either.

@kripken If people are going to give up on the wasm encoding being space efficient, optimize for quick decode and code generation, and leave it to a compression layer to be space efficient, then we can optimize for other criteria: fast decode and compilation. If so, then the compiler needs the live ranges of definitions, so let's solve this general problem. The compiler does not need to drop any values from the values stack to solve it; rather, it could leave all definitions on the values stack and flag their last use or unused definitions, which would solve the general problem. Being able to drop the top stack value can then be seen as an optimization for a common case.

I think the single-pass compiler experiment is very valuable and should be pushed hard, and perhaps some experimental encodings explored for it. For example, I see no reason why it could not also do type derivation and optimize away some of the bounds checks, all in a single pass. The implications for register allocation are very interesting. What would be the best encoding for such a compiler, an encoding that supports multiple-value calls efficiently too, and how much would this cost in encoding space efficiency? I think that is where the effort should be, where we will get some insights.

@kripken (Member) commented May 24, 2016

I did some further measurements by writing a binaryen pass that can remove reliance on set_local and store return values. The motivation for the pass was to be able to convert current wasts to the new format, should we go forward with this.

First thing is that it looks like there is no usage of store return values in BB. Looks like the asm.js toolchain never does it, and in binaryen we never wrote opts to use it either. (In theory asm.js might have it, though, and asm2wasm would accept it.)

For set_local, we replace a set that must return with a block that sets and then gets. This might introduce a block in places we can't eliminate it (though we merge when we can), so this is different than the previous measurements, which just didn't run the pass that did 99% of optimizations that use set_local returns. As a result these numbers are more accurate (however, doing this at the end of optimizations might also not be the best). The numbers are 9.1% more nodes, 8.0% more binary size and 6.3% more gzip size, which is somewhat worse than before.

@titzer commented May 24, 2016

I had proposed that we have separate set_local and store_local bytecodes, with set_local retaining the current semantics and store_local including an implicit drop, to be symmetric with store_memory which will now also have an implicit drop.
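A sketch of that pairing (store_local as proposed in this comment, not an adopted opcode):

(i32.add (set_local $x (i32.const 1)) (i32.const 2))  ;; set_local keeps returning its value
(store_local $x (i32.const 1))                        ;; store_local drops the value implicitly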

@rossberg (Member, Author)

Thanks @kripken, that's very useful data. I agree that this suggests that we should probably have a tee_local (or whatever the name), too.

@ghost commented May 24, 2016

@kripken I don't think your results are good. Consider your test example:

(i32.add (set_local $x (i32.const 10)) (i32.const 20))
=>
(i32.add (block (set_local $x (i32.const 10)) (get_local $x)) (i32.const 20))

this could be transformed into the more compact:

(set_local $x (i32.const 10))
(i32.add (get_local $x) (i32.const 20))

It appears to cost a get_local for each instance.

@qwertie commented May 24, 2016

Aww 😢 set_local and store returning values was one of my favorite features - I'm not surprised that its removal harms optimized code size significantly. Is implicit drop measured to be harmful to compiler performance - especially compared to the opportunity cost of disabling the compiler optimization enabled by stores-returning-values?

The name "tee" is completely foreign to me (I prefer titzer's name idea). Where did it come from?

@kripken What's BB?

@titzer commented May 24, 2016

To be clear, stores-returning-values and set-local-returning-values are no problem for any optimizing compiler; they only make a difference to interpreters and baseline compilers.

@qwertie commented May 24, 2016

@titzer Forgive me, what's the definition of "baseline compiler"? (And to be clear the performance I was talking about was back-end compiler, not front-end optimizer.)

@ghost commented May 24, 2016

@kripken Also, I recall seeing code making good use of the result of store expressions. For example, storing a constant to multiple addresses.
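For example, under the old store-returns-value semantics, one constant can flow through several stores (a sketch with hypothetical locals $a and $b):

;; stores 0 to address $b, then the returned 0 to address $a
(i32.store (get_local $a) (i32.store (get_local $b) (i32.const 0)))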

@ghost commented May 24, 2016

@qwertie "baseline compiler" - a very quick compilation to machine code, perhaps generating code in a single pass while decoding.

I wonder if it really makes a difference to the baseline compiler, as it could keep the value in a register, and the following get_local would resolve to this same definition and register. The definition is in the local variable now anyway, so the single-pass compiler would have to consider it live until another value is written to the local variable. The get_value proposal might avoid the need for a local, and that could help.

@qwertie commented May 24, 2016

I expect wasm to be optimized for real life, so give me a thumbs up if I have this straight: you want to remove implicit drop in order to benefit the first phase of two-phase JITs, in which the first phase ("baseline compiler") needs to be ultra-fast and the second phase is the kind of optimizing JIT that uses on-stack replacement?

@taisel commented May 24, 2016

@kripken Has it been discussed to have multiple simultaneously valid variations of WASM, that add/leave out these sort of aspects? I'm wondering if there could be profiles that could be manually selected for performance vs. size vs. compile time, and if they're different enough to be worthwhile.

@ghost commented May 24, 2016

@qwertie The 'baseline compiler' is a complete single-phase compiler, not a 'two-phase JIT'. It might be best to take a look at the actual code, which is quite interesting; see https://bugzilla.mozilla.org/show_bug.cgi?id=1232205

This PR helps this single pass compiler by dropping expression results from the values stack earlier, rather than all at the end of the block. So it knows they are not live and can avoid flushing them to the memory stack. Personally I think there might be better solutions to this challenge.

@lukewagner (Member)

Adding tee_local sgtm ("tee" coming from Unix). (Personally, I don't think "store" vs. "set" really makes the distinction clear.)

@kripken Thanks for the measurements! What is the size increase for brotli and lzma?

I think that if layer 0 size stays mostly unchanged (as it sounds like it would be if we add tee_local), we can accept a little layer 2 size loss since the ultimate effect (after a long period of layer 1 experimentation) is not yet known.

@qwertie commented May 24, 2016

So far I'm really worried about this one. Be careful about pulling the trigger!

  • If set_local becomes store_local without re-adding set_local (because I don't like the name tee), the extra effort needed in 'layer 2' to get the savings back would likely negate the benefit you got in the baseline compiler.
  • Re-adding set_local would help, but it would also bloat the size of the standard and the number of cases that all Wasm consumers (and most Wasm producers) must deal with. Perhaps we could solve this conundrum by enhancing the opcode table to support "macros", i.e. store_local x y = (drop (set_local x y))

So is the baseline compiler intended to be the (only) final production compiler for wasm in FF?

@lukewagner (Member)

No, it's a first tier. But on weak mobile/IoT hardware, it may end up running code for a "long" time as the full optimizing compilation completes in the background. Moreover, I think we'll see a variety of compilation schemes for wasm over time, not just beefy browsers, so when low-hanging fruit is available like this at low cost (assuming tee_local), I think we should take it. Lastly, with postorder, we almost have a very plain interpretation of wasm bytecode as a simple stack machine; this change takes us there.

@kripken (Member) commented May 24, 2016

@JSStats: you are correct, the second measurements are not fully optimal. I think both sets of numbers are useful, for that reason. Each has limitations, the truth is probably in the middle. (The transformation you suggest should be added as a binaryen optimization pass, but it doesn't exist yet.)

@qwertie: BB is BananaBread, the codebase I measure a lot on. It's a fairly large (~100K) game engine with lots of different types of code in it (C style, C++ style, float, int, parsing, physics, AI, rendering, etc. etc.). I think it's big and varied enough to be representative, and small enough that it's easy to get measurements on quickly.

@taisel: I don't think I remember that being suggested. Perhaps worth opening an issue for discussion.

@kripken (Member) commented May 24, 2016

I think @qwertie is right about the efficiency. We know a single set (that either drops or returns, depending on context) is good for transmission: it uses fewer opcodes, and the data shown above supports that it shrinks downloads. A single set is a natural, efficient encoding. Given that, two options are

  1. A higher layer that transforms set into set-drop and set-return, then the baseline compiler knows what drops and what doesn't.
  2. The baseline compiler transforms set into set-drop and set-return as it works.

2 seems like it has to be more efficient. And 2 is possible now.

But I'm not opposed to this change. I just think baseline benefits do not justify it. There might be other benefits though:

  1. This change would make us more like a stack machine, but we haven't actually explained in this PR why that is good. What practical benefits do we expect from that?
  2. s-exprs are simpler to read without heavily nested set_local return value optimizations. Perhaps the final text format will make this moot, but also this might make the final text format simpler to design and implement (I think @sunfishcode's experiments on the text format support that).

@lukewagner (Member)

This change would make us more like a stack machine, but we haven't actually explained in this PR why that is good. What practical benefits do we expect from that?

Other than the aesthetic benefit of being able to think of execution as both evaluation of an AST and interpretation of a stack machine (and having both mean the same thing), with this change, we are at a point where, post-MVP, and only after significant measurement and experimentation, we have the option to loosen the validation predicate on the postorder bytecode and allow more general stack machine operations (pick, dup, swap, etc) that could potentially allow far more removal of get-local/set-local. It's too early to know what's the optimal point in this space, but with this PR, we reserve ourselves the option in the future while, at the same time, providing the immediate concrete benefit of useful liveness info to single-pass compilers.

@qwertie commented May 24, 2016

@lukewagner You have a compelling point about cheap IoT devices. In your most recent message, I think what you're saying is that the switch to explicit-drop could allow the addition of certain features in the future, after we've had ample time to study what is most useful to add. Is that so? Can you describe any specific feature(s) that could be done with explicit drop that would be incompatible with an implicit-drop regime?

@kripken (Member) commented May 24, 2016

@JSStats: I wrote a pass to get rid of the extra blocks you noticed, WebAssembly/binaryen#540, and with that, we go from around 9% more to around 6% more, which is in line with the previous numbers. So overall it looks like the first set of numbers is now confirmed (6% more nodes, 5% larger binary, 5% larger after gzip).

@lukewagner: On the new data just mentioned, LZMA is 4.8% bigger, brotli is 7.3% bigger.

@ghost commented May 25, 2016

@qwertie A use case for an operator table ('An operator table could allow a choice to discard the expression results of an operator.') is noted in #678

@kripken Thank you for the numbers and the extra pass.

@lukewagner I don't see how this PR moves to a more 'stack machine' style or unblocks a future path to this. Could you give a specific example of how a move to a stack machine style is blocked currently? Having some operators push back results onto the values stack, as currently done, seems consistent with a stack machine.

Getting repetitive, but I really think people need to consider the multiple value use case in such decisions.

Further, perhaps shoehorning the encoding into a stack ordering is part of the problem. How much better could the code generated by a single-pass compiler be without the stack ordering restrictions? The stack ordering might not be the best order to minimise register pressure, and reordering might not be practical for a single-pass compiler, but perhaps the producer could take this into account if not constrained to encode in the stack order?

For example:

(func $f1 (param $l1 i32) (param $l2 i32) (result i32)
  (call (i32.const 1) (i32.const 2) (i32.const 3) (i32.const 4) (i32.add (get_local $l1) (get_local $l2))))

Here it would help reduce register pressure to compute the i32.add first, and then load the constants. The single pass compiler might be able to delay materializing the constants in this example, but these might be more general operations that could still be better ordered to minimize register pressure. Perhaps a single pass compiler could delay code generation even further to be able to shuffle operations around, but it would be working around the stack order and perhaps this suggests a fundamental problem with the wasm stack order constraints.

The sexpr-wasm single pass compiler to a stack machine was very useful for me in exploring some ideas, and if we had something similar that targeted a simple virtual register based machine then perhaps it would be easier to explore some of these questions.

@lukewagner (Member)

@kripken Not throughput data yet, just the coarse measurement from @lars-t-hansen that this removes 687,251 spills on AngryBots. Naively assuming 4 bytes for each (ARM32), that's >2MiB of code, which is significant.

@JSStats That is validation-time; with this PR, the runtime semantics do not need to push anything for void-returning ops. If validation constraints were relaxed in the future, then even validation wouldn't need to push anything on the stack for void-returning ops.

@kripken (Member) commented May 25, 2016

@lukewagner: 2MB out of how much?

I also think it would be good to measure a 1.5 pass baseline. The 2MB figure is without that, I assume, i.e., it's doing the worst-case behavior on every set (assuming it can return)?

@lukewagner (Member)

@lars-t-hansen Do you have the total code size for AngryBots?

@kripken Based on rough compile-time data we have so far, the baseline is fast enough that raw decoding is a significant % of the total cost, meaning that taking a second pass of any kind will significantly increase overall compile-time.

@lars-t-hansen (Contributor)

@lukewagner I do not have any total code size numbers, but I'll try to collect some. I expect that, though it may be 2MB code, it's 2MB out of a lot of MB. (And it's not 2MB on x64, but probably rather less, since push-one-register is less than 1.5 bytes on average.)

@kripken Adding the drop analysis to the baseline compiler is probably not very interesting because it'll slow down the compiler and that defeats its main purpose. It's probably better to just live with the extra memory traffic in that case. (All the traffic will be writes to the stack top, probably not the most expensive operation in practice.) That said, I think it would not be a huge cost to perform the experiment.

As a general remark, I can absolutely live without the change, certainly until better data are available about the impact on the generated code's size and performance. At the moment, our baseline compiler generates ho-hum code, meaning it does not optimize things it could, does not keep values in registers when it should, and pushes values when it might have avoided it (within the limit of none of the necessary analyses costing much): in short, the data we're using for the decision are not the best. It seems to me that the discussion here is fruitful but from my point of view we do not have to rush a decision, so long as we do not close off the avenue for a drop opcode in the future.

@binji (Member) commented May 25, 2016

It may be useful to look at the sexpr-wasm interpreter here. It runs two passes over the data, where the first pass performs type checking and determines which values are discarded, and the second pass emits the stack machine opcodes.

I added some instrumentation to the release build and ran it over AngryBots:

pass 0: 0.098574 sec
pass 1: 0.194209 sec

It's not easy to tell how much would be saved with explicit drops, but I wouldn't be surprised if it was a significant amount of that first pass.

@lars-t-hansen (Contributor)

Thanks for the data; that looks like a ~5% cost increase for the initial pass (plus some book-keeping data). If the extra stack traffic looks bad, it may be worth paying 5% at compile time to get rid of it.

@kripken (Member) commented May 25, 2016

Very interesting data and analysis, thanks @binji, @lars-t-hansen, @lukewagner. That's more than enough to convince me, for this PR + tee.

Without tee, we do have a simpler design (which I like!), but the % code size increase (5% on total nodes, binary size, and gzip/lzma/brotli sizes, as mentioned above) is worrying. So overall I think I prefer this PR with tee.

@lukewagner (Member)

(noting addition of tee_local) lgtm

@kripken (Member) commented Jun 6, 2016

Some data: I am writing a pass to convert modules from the pre-change format to the post-change format. It looks like this causes a 1% increase in the number of AST nodes (measured on BB), which is more than I had expected. The culprit is mainly calls, things like libc memcpy which return a value that is almost never used.
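Under the new rules each such call needs a wrapping drop; a sketch, assuming $memcpy returns its destination pointer as in libc:

(drop (call $memcpy (get_local $dst) (get_local $src) (i32.const 64)))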

@ghost commented Jun 7, 2016

I don't think tee_local should be dependent on all the other changes here; rather, it might be useful even without explicit dropping of values. Perhaps tee_local could be split out first, and there might be consensus to add it now. A small step.

There are reports above that the results of store operators are not frequently used. If this has been investigated and understood and still holds, then a separate PR might change them to not return values too. Another small step.

@sunfishcode (Member)

The LLVM wasm back end has an optimization for memcpy et al. in which it replaces uses of the first operand with uses of the return value. It would be interesting to see how that affects the metrics here.
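A sketch of the effect (hypothetical $memcpy and a void-returning $use): later uses consume the returned pointer instead of re-reading the first operand, so the drop disappears:

;; without the optimization: result dropped, destination re-read from the local
(drop (call $memcpy (get_local $dst) (get_local $src) (get_local $n)))
(call $use (get_local $dst))

;; with it: the return value replaces a use of the first operand
(call $use (call $memcpy (get_local $dst) (get_local $src) (get_local $n)))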

@rossberg (Member, Author) commented Jun 7, 2016

@JSStats, tee_local is the same operator as set_local was before this change.

@ghost commented Jun 7, 2016

@rossberg-chromium Yes, I understood this. But adding a set_local variant that does not return its value seems independent of the 'drop' PR, and there seems to be general consensus to make that change. It could be split out and landed separately first, perhaps also changing the store operators to not return a value, if the data supports that. These would change the landscape significantly. It would then just be the call operators to consider.

@rossberg (Member, Author) commented Jun 7, 2016

@JSStats, without this PR there's no point in adding a variant of set_local that doesn't return anything.

@ghost commented Jun 7, 2016

@rossberg-chromium I don't see it that way. One of the use cases for the drop PR is to eliminate the excess values on the values stack, and adding a variant of set_local that does not return a value significantly reduces the number of values on the values stack, so it has some benefit. It will diminish the case for the 'drop' PR, and combined with changing the store operators to not return values (if this is warranted) will go even further. There seems to be consensus that separate tee_local/set_local is useful; it can be landed first as a separate PR, and it should show a useful improvement for the baseline compiler, as it will not need to flush as many values to memory.

It will help a lot for an expressionless style of code where every operator with a result sets a local variable!

@lukewagner (Member)

Ok, so several lgtms here; can we merge this and create a separate PR on binary_0xc branch for the BinaryEncoding.md changes of adding drop and tee_local?

@ghost commented Jun 12, 2016

@lukewagner I don't see any harm in adding the drop operator, an operator that consumes a single value argument and returns zero values. If people want to add that, then it does not appear to limit the future path of wasm; it might have limited scope, but if you really want it, then at worst it is a baggage operator.

However, the following change is very significant and makes a distinct choice on the future path of wasm: 'In all constructs containing block-like sequences of expressions, all expressions but the last must not yield a value.'

Blocks currently have the potentially useful property of discarding excess values, and this could well be a strength for wasm. Being able to push values on the values stack that are not dropped in last-in-first-out order would significantly extend the expressiveness of the wasm language: it would allow a constant to be pushed and reused many times, it would allow multiple values from call operators to be pushed in the order received and used selectively in any order, and it would allow expressions to be ordered to minimise register pressure rather than having to fit the stack ordering constraints, and so on.

I don't know which is the best path, but this change does appear to be making a distinct choice for wasm, and I don't think we have the data. With this proposed change, wasm appears to be moving towards depending heavily on local variables, towards the expression-less style of coding for code patterns that do not fit the restricted stack expression pattern, and this might also be ok.

Does a decision on this really need to be made now, and do we have the data? Or could this change be removed from the PR for now, adding only the set_local, tee_local, drop, and 'store operators do not produce a value' changes, and taking another look at this later?

@lukewagner (Member)

This change is actually the conservative one in that the validation rules could be backwards-compatibly relaxed in the future (re-allowing implicit drop), if there were a strong reason.

@ghost commented Jun 13, 2016

@lukewagner That assumes that further changes are not made taking advantage of this limitation. You still have not articulated a plan wrt multiple values, which makes it impossible to assess this matter; do you still have no plan? Would two weeks be a reasonable time frame for you to articulate a concrete plan in this area, or to concede that you have no plan here?

@qwertie commented Jun 13, 2016

@JSStats no plan wrt multiple values is needed as a prerequisite for this change. If someone later proposes "further changes" you don't like, it would make more sense to complain about those instead.

@ghost commented Jun 14, 2016

@qwertie Perhaps, but I can see it now: people will point back here and say 'the decision was made here, and if you didn't like it then you should have objected; bad luck'. Also, it is more difficult to explore alternative development paths if one is closed here, and if there is no plan then more development is needed. I am not objecting to people adding the drop operator, the tee_local and set_local changes, and even the 'store' change: these give us more scope to explore, so we can try different styles of code, run them through the various runtimes, and weigh the pros and cons.

@lukewagner (Member)

"The plan" is to have a lot more data and experience to know what the right route forward post-MVP re: multiple values. This PR preserves flexibility of future options while providing immediate short-term benefits to single-pass compilers.

@rossberg (Member, Author)

Merging with LGTMs above and general consent that this is the right direction.

@rossberg rossberg merged commit 78644db into master Jun 15, 2016
@rossberg rossberg deleted the explicit-drop branch June 15, 2016 09:04