Skip to content

Add end opcode to functions #666

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 7 commits into from
Closed

Add end opcode to functions #666

wants to merge 7 commits into from

Conversation

rossberg
Copy link
Member

Now that #641 has landed, here's the PR for the last of the changes I suggested in #623.

Function bodies are implicitly delimited by their byte size in the current encoding. This is unfortunate for a couple of reasons:

  • With the current format, any stream decoder needs to perform two end checks on every opcode: against end-of-stream and against body size. (When streaming, you generally don't know the length of the stream up-front, nor whether it's well-formed, so neither check subsumes the other.) With this PR, only the end-of-stream check is needed.
  • This is the only piece of the binary format that stands in the way of formulating the entire format unambiguously as a grammar. Because it depends on non-local (and lower-level) context information. It would be rather sad to get 99% there but lose it on the last meter.

This PR hence adds an explicit end opcode to functions. Pros:

  • The binary format is fully structured and can be entirely parsed linearly from an abstract byte stream, without paying attention to any size information.
  • Sizes would only be needed to (a) seek through the stream if desired, (b) validating that they are consistent. That decouples concerns nicely.

Cons:

  • One extra redundant byte per function. Previous measurements suggest e.g. a 0.3% size increase on AngryBots.

titzer and others added 7 commits April 5, 2016 16:52
* Prettify section names

* Restructure encoding of function signatures

* Revert "[Binary 11] Update the version number to 0xB."

* Leave index space for growing the number of base types

* Comments addressed

* clarify how export/import names convert to JS strings (#569) (#573)

* When embedded in the web, clarify how export/import names convert to JS strings (#569)

* Fixes suggested by @jf

* Address more feedback

Added a link to http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html.  Simplified the decoding algorithm thanks to Luke's feedback.

* Access to proprietary APIs apart from HTML5 (#656)

* comments
@yurydelendik
Copy link

Some information can be stored after end of the function. For example, it can be used to add padding to each function for patching the binary data in the future. There shall be a statement about what to do with it (e.g. forbid any extra data after first function end).

@titzer
Copy link

titzer commented Apr 21, 2016

Another con is that it adds a redundant check in the decoder, since a decoder must always check the bounds of the bytes, anyway.

@rossberg
Copy link
Member Author

@yurydelendik, I don't see how the current format allows that. If so, it's a bug.

@titzer, see the first bullet point of the description. ;) At least for streaming decoders, I claim this change actually removes tons of checks.

@lukewagner
Copy link
Member

@rossberg-chromium I was excited that it might be possible to remove the end-of-body branch when validating unknown wasm too (you start by checking that body_length is in-bounds then that the last opcode is end), but I don't think that scheme is valid: the end opcode could be slurped up as part of an immediate of a preceding opcode so you can't rely on hitting it.

Now, when you are decoding known-valid wasm (as is in the case in SM for our parallel compiler threads), there is no body_end check to begin with and so this proposal would remove a per-top-level-iteration end-of-body check: you simply have the end case in the switch be responsible for breaking the loop so the loop becomes while(true).

So if we're doing bean-counting (none of this is remotely significant in overall perf), I think the latter saves more branches than the former.

Anyhow, +1 from me.

@rossberg
Copy link
Member Author

@lukewagner, sure, you still have to verify the size at the end of the body, and that the end opcode is there. You simply check that once you've finished decoding the body.

But only there. You don't have to check for the size limit at every operator anymore. So it's one check per function vs one check per operator in a streaming parser.

@lukewagner
Copy link
Member

@rossberg-chromium What I was trying to explain is that, if you are validating unknown wasm bytes, you still need to check some size limit (body or end of whole module) on every operator, just to make sure you're not running off the end.

@rossberg
Copy link
Member Author

@lukewagner, yes, you always need to check against end-of-stream. But without explicit end opcode you additionally need to check against the size limit. When decoding from a stream, you can't know which condition becomes true first, so you always have to check both, for each opcode. With this change, however, the latter check is no longer necessary.

@lukewagner
Copy link
Member

Ah hah, I see what you're getting at. I had been assuming the whole module was downloaded such that body-end always came first (MVP). Now technically, with streaming, you could check once against min(stream-end, body-end) etc etc, but that's a lot more complexity. Even our existing postorder code, now that I look at it, could get simpler/faster with the end opcode, so yes, double +1.

@titzer
Copy link

titzer commented Apr 21, 2016

I don't see why it's more complexity to check against a stream end. In the V8 decoder, it just uses one limit, and if there's an error, it sets the limit to the start, which terminates decoding. One check in the loop. Adding end only adds additional checks after decoding is finished; e.g. that one and only one end occurred. That seems totally redundant to me.

@ghost
Copy link

ghost commented Apr 21, 2016

Just a thought, but might some of the loss be gained back by making this an end operator rather than an end market and giving it return semantics. There are a lot of small functions in AngryBots that have a return as the last statement. It could even accept an expression with a value to return in the case that the function was defined to return a value. This might remove the encoding efficiency issue which seems to be the only objection?

@lukewagner
Copy link
Member

@titzer We're considering the streaming case where stream-end might be less than body-end (because you need to wait for the next chunk). Yes, you can still use 1 check in this case (taking the min of the two), but it seems simpler (and no more expensive) to just iterate until end. Anyhow, these are super-predictable branches so I doubt this is worth discussing from a perf angle. I do see how end would allow simpler code, though, in this streaming case.

@lukewagner
Copy link
Member

@JSStats Because wasm doesn't require a final return statement—it just uses the last expression if non-void—in some sense that is already the case for the end proposal: it's like a special "final return" opcode. Actually, I kindof like it more for that reason; it explains why it is natural to terminate your iterative loop from the end case of the switch.

@ghost
Copy link

ghost commented Apr 21, 2016

@lukewagner Yes, thank you, sorry for the noise. Making the end an explicit return probably does not address the size issue because the trailing return statements in AngryBots are redundant and the value of the last expression could be returned. I was pondering it from the perspective of a possible expressionless encoding where there is no last expression and it would need to interpret the end as an implicit return accepting values from registers.

@titzer
Copy link

titzer commented Apr 22, 2016

@lukewagner

There are a billion gotchas with a streaming decoder, like what happens when the stream ends in the middle of a LEB or between an opcode and its immediate, in the middle of a br_table, etc. I am not convinced end does anything to help here.

@rossberg
Copy link
Member Author

@titzer, the nice thing about a streaming decoder is that it keeps you honest: you have to stick to the stream abstraction, no shortcuts allowed. That actually makes it very simple. The spec decoder uses a stream for that reason.

@yurydelendik
Copy link

yurydelendik commented Jul 11, 2016

Looks like vendors might choose offset in the binary format as pc. This might be a small pro to have end opcode at the end of functions -- it will provide additional point for instrumentation/debug operations at the end of a function (e.g. play role of a physical offset of the function epilogue) or just set non-empty ranges/boundaries for empty functions.

@flagxor
Copy link
Member

flagxor commented Jul 27, 2016

I think this should get merged. Andreas, can you resolve the conflicts and land?

@rossberg
Copy link
Member Author

rossberg commented Aug 1, 2016

Merged manually to 0xC branch.

@rossberg rossberg closed this Aug 1, 2016
@rossberg rossberg deleted the body-end branch August 1, 2016 11:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants