Add end opcode to functions #666

rossberg · 2016-04-21T15:09:11Z

Now that #641 has landed, here's the PR for the last of the changes I suggested in #623.

Function bodies are implicitly delimited by their byte size in the current encoding. This is unfortunate for a couple of reasons:

With the current format, any stream decoder needs to perform two end checks on every opcode: against end-of-stream and against body size. (When streaming, you generally don't know the length of the stream up-front, nor whether it's well-formed, so neither check subsumes the other.) With this PR, only the end-of-stream check is needed.
This is the only piece of the binary format that stands in the way of formulating the entire format unambiguously as a grammar, because it depends on non-local (and lower-level) context information. It would be rather sad to get 99% there but lose it on the last meter.
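To make the first point concrete, here is a minimal sketch of the two decoding styles. The opcode values (`NOP`, `END`, `CONST`) and helper names are hypothetical toy stand-ins, not the actual encoding or any engine's decoder.

```python
from typing import Iterator, List, Tuple

# Hypothetical toy opcodes, NOT the real wasm encoding.
NOP, END, CONST = 0x00, 0x0F, 0x10  # CONST takes a one-byte immediate


class EndOfStream(Exception):
    """Raised when the stream runs out inside a function body."""


def next_byte(stream: Iterator[int]) -> int:
    # The end-of-stream check: needed in both styles.
    try:
        return next(stream)
    except StopIteration:
        raise EndOfStream("stream ended inside a function body") from None


def decode_body_sized(stream: Iterator[int], size: int) -> List[Tuple[int, ...]]:
    """Size-delimited body: every step checks the body size AND end-of-stream."""
    ops: List[Tuple[int, ...]] = []
    consumed = 0
    while consumed < size:          # check 1: body size
        op = next_byte(stream)      # check 2: end-of-stream
        consumed += 1
        if op == CONST:
            if consumed >= size:
                raise ValueError("immediate runs past the body size")
            ops.append((op, next_byte(stream)))
            consumed += 1
        else:
            ops.append((op,))
    return ops


def decode_body_end(stream: Iterator[int]) -> List[Tuple[int, ...]]:
    """End-delimited body: only the end-of-stream check remains per opcode."""
    ops: List[Tuple[int, ...]] = []
    while True:
        op = next_byte(stream)
        if op == END:               # the end case terminates the loop
            return ops
        if op == CONST:
            ops.append((op, next_byte(stream)))
        else:
            ops.append((op,))
```

In the second function the byte count never appears, which is what lets the body be described by a grammar alone.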

This PR hence adds an explicit end opcode to functions. Pros:

The binary format is fully structured and can be entirely parsed linearly from an abstract byte stream, without paying attention to any size information.
Sizes would only be needed to (a) seek through the stream if desired and (b) validate that they are consistent. That decouples concerns nicely.

Cons:

One extra redundant byte per function. Previous measurements suggest e.g. a 0.3% size increase on AngryBots.

yurydelendik · 2016-04-21T15:30:52Z

Some information could be stored after the end of the function. For example, that space could be used to add padding to each function for patching the binary data in the future. There should be a statement about what to do with it (e.g. forbid any extra data after the first function end).

titzer · 2016-04-21T15:39:55Z

Another con is that it adds a redundant check in the decoder, since a decoder must always check the bounds of the bytes, anyway.

rossberg · 2016-04-21T15:51:13Z

@yurydelendik, I don't see how the current format allows that. If so, it's a bug.

@titzer, see the first bullet point of the description. ;) At least for streaming decoders, I claim this change actually removes tons of checks.

lukewagner · 2016-04-21T16:35:31Z

@rossberg-chromium I was excited that it might be possible to remove the end-of-body branch when validating unknown wasm too (you start by checking that body_length is in-bounds then that the last opcode is end), but I don't think that scheme is valid: the end opcode could be slurped up as part of an immediate of a preceding opcode so you can't rely on hitting it.
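A small sketch of why the "just look at the last byte" shortcut fails. The opcode values are hypothetical toy values chosen for illustration, and both function names are made up for this example.

```python
# Hypothetical toy opcodes, NOT the real wasm encoding.
NOP, END, CONST = 0x00, 0x0F, 0x10  # CONST takes a one-byte immediate


def ends_with_end_byte(body: bytes) -> bool:
    """The tempting shortcut: only inspect the final byte of the body."""
    return len(body) > 0 and body[-1] == END


def reaches_end_opcode(body: bytes) -> bool:
    """Decode with a per-opcode bounds check.

    True only if an END opcode (not an immediate) is the last byte of the body.
    """
    pos = 0
    while pos < len(body):
        op = body[pos]
        pos += 1
        if op == END:
            return pos == len(body)
        if op == CONST:
            pos += 1  # skip the one-byte immediate
    return False


# The final byte equals END, but it is the immediate of CONST.
tricky = bytes([CONST, END])
```

For `tricky`, the shortcut accepts the body while the real decode never sees an end opcode, so a loop that relied on hitting `END` would run past the body.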

Now, when you are decoding known-valid wasm (as is the case in SM for our parallel compiler threads), there is no body_end check to begin with and so this proposal would remove a per-top-level-iteration end-of-body check: you simply have the end case in the switch be responsible for breaking the loop so the loop becomes while(true).

So if we're doing bean-counting (none of this is remotely significant in overall perf), I think the latter saves more branches than the former.

Anyhow, +1 from me.

rossberg · 2016-04-21T16:50:20Z

@lukewagner, sure, you still have to verify the size at the end of the body, and that the end opcode is there. You simply check that once you've finished decoding the body.

But only there. You don't have to check for the size limit at every operator anymore. So it's one check per function vs one check per operator in a streaming parser.
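A rough sketch of "one check per function": decode until the end opcode, then compare the consumed byte count against the declared size once. The toy opcodes are hypothetical, and the sketch assumes the declared size includes the end byte.

```python
from typing import Iterator, List, Tuple

# Hypothetical toy opcodes, NOT the real wasm encoding.
NOP, END, CONST = 0x00, 0x0F, 0x10  # CONST takes a one-byte immediate


def decode_then_check_size(stream: Iterator[int],
                           declared_size: int) -> List[Tuple[int, ...]]:
    """Decode up to END, then validate the declared size a single time.

    Assumption: declared_size counts the END byte as part of the body.
    """
    ops: List[Tuple[int, ...]] = []
    consumed = 0
    try:
        while True:
            op = next(stream)       # end-of-stream is the only per-opcode check
            consumed += 1
            if op == END:
                break
            if op == CONST:
                ops.append((op, next(stream)))
                consumed += 1
            else:
                ops.append((op,))
    except StopIteration:
        raise ValueError("stream ended before the end opcode") from None
    # The single size check, performed once per function body.
    if consumed != declared_size:
        raise ValueError(
            f"declared size {declared_size} but body used {consumed} bytes")
    return ops
```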

lukewagner · 2016-04-21T18:10:34Z

@rossberg-chromium What I was trying to explain is that, if you are validating unknown wasm bytes, you still need to check some size limit (body or end of whole module) on every operator, just to make sure you're not running off the end.

rossberg · 2016-04-21T18:27:54Z

@lukewagner, yes, you always need to check against end-of-stream. But without explicit end opcode you additionally need to check against the size limit. When decoding from a stream, you can't know which condition becomes true first, so you always have to check both, for each opcode. With this change, however, the latter check is no longer necessary.

lukewagner · 2016-04-21T19:37:25Z

Ah hah, I see what you're getting at. I had been assuming the whole module was downloaded such that body-end always came first (MVP). Now technically, with streaming, you could check once against min(stream-end, body-end) etc etc, but that's a lot more complexity. Even our existing postorder code, now that I look at it, could get simpler/faster with the end opcode, so yes, double +1.

titzer · 2016-04-21T20:13:35Z

I don't see why it's more complexity to check against a stream end. In the V8 decoder, it just uses one limit, and if there's an error, it sets the limit to the start, which terminates decoding. One check in the loop. Adding end only adds additional checks after decoding is finished; e.g. that one and only one end occurred. That seems totally redundant to me.
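As a loose illustration of the single-limit idea described here (not V8's actual code; the class, method names, and toy opcodes are all hypothetical), an error can collapse the limit so that the one loop condition also ends decoding:

```python
from typing import List, Optional, Tuple

# Hypothetical toy opcodes, NOT the real wasm encoding.
NOP, CONST = 0x00, 0x10  # CONST takes a one-byte immediate


class SingleLimitDecoder:
    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0
        self.limit = len(data)
        self.error: Optional[str] = None

    def fail(self, message: str) -> None:
        # Record the first error and move the limit to the start, so the
        # ordinary `pos < limit` test stops the loop with no extra check.
        if self.error is None:
            self.error = message
        self.limit = 0

    def run(self) -> List[Tuple[int, ...]]:
        ops: List[Tuple[int, ...]] = []
        while self.pos < self.limit:    # the one check in the loop
            op = self.data[self.pos]
            self.pos += 1
            if op == NOP:
                ops.append((op,))
            elif op == CONST:
                if self.pos >= self.limit:
                    self.fail("truncated immediate")
                    continue
                ops.append((op, self.data[self.pos]))
                self.pos += 1
            else:
                self.fail(f"unknown opcode {op:#x}")
        return ops
```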

ghost · 2016-04-21T20:47:19Z

Just a thought, but might some of the loss be gained back by making this an end operator rather than an end marker and giving it return semantics? There are a lot of small functions in AngryBots that have a return as the last statement. It could even accept an expression with a value to return in the case that the function was defined to return a value. This might remove the encoding efficiency issue, which seems to be the only objection?

lukewagner · 2016-04-21T20:57:55Z

@titzer We're considering the streaming case where stream-end might be less than body-end (because you need to wait for the next chunk). Yes, you can still use 1 check in this case (taking the min of the two), but it seems simpler (and no more expensive) to just iterate until end. Anyhow, these are super-predictable branches so I doubt this is worth discussing from a perf angle. I do see how end would allow simpler code, though, in this streaming case.
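The "min of the two" variant can be sketched as follows, under the simplifying and purely illustrative assumption that every opcode is a single byte; the function name and parameters are hypothetical.

```python
from typing import Tuple


def decode_available(buf: bytes, pos: int, body_end: int) -> Tuple[int, bool]:
    """Consume opcodes from buf[pos:] under one combined limit.

    Returns (new_pos, done), where done means the body end was reached.
    Toy assumption: every opcode is exactly one byte.
    """
    limit = min(len(buf), body_end)  # stream-end and body-end folded into one
    while pos < limit:
        pos += 1                     # consume one single-byte opcode
    return pos, pos == body_end
```

When `done` is false the caller has to wait for the next chunk and call again with the grown buffer. With multi-byte immediates the combined limit would also have to be respected inside each immediate, which is presumably where the extra complexity mentioned above comes from.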

lukewagner · 2016-04-21T21:01:17Z

@JSStats Because wasm doesn't require a final return statement (it just uses the last expression if non-void), in some sense that is already the case for the end proposal: it's like a special "final return" opcode. Actually, I kind of like it more for that reason; it explains why it is natural to terminate your iterative loop from the end case of the switch.

ghost · 2016-04-21T21:03:38Z

@lukewagner Yes, thank you, sorry for the noise. Making the end an explicit return probably does not address the size issue because the trailing return statements in AngryBots are redundant and the value of the last expression could be returned. I was pondering it from the perspective of a possible expressionless encoding where there is no last expression and it would need to interpret the end as an implicit return accepting values from registers.

titzer · 2016-04-22T13:37:12Z

@lukewagner

There are a billion gotchas with a streaming decoder, like what happens when the stream ends in the middle of a LEB or between an opcode and its immediate, in the middle of a br_table, etc. I am not convinced end does anything to help here.
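One of these gotchas, a LEB split across chunks, can be illustrated with a resumable unsigned LEB128 reader. This is only a sketch with hypothetical names; it omits the length and overflow limits a real decoder would enforce.

```python
from typing import Optional, Tuple


class Leb128Reader:
    """Incremental unsigned LEB128 decoder that keeps state between chunks."""

    def __init__(self) -> None:
        self.result = 0
        self.shift = 0

    def feed(self, chunk: bytes) -> Tuple[Optional[int], int]:
        """Feed bytes; return (value, consumed).

        value is None if the chunk ended in the middle of the number, in which
        case the partial state is kept for the next call.
        """
        for i, byte in enumerate(chunk):
            self.result |= (byte & 0x7F) << self.shift
            self.shift += 7
            if byte & 0x80 == 0:     # high bit clear: last byte of the number
                value = self.result
                self.result = 0
                self.shift = 0
                return value, i + 1
        return None, len(chunk)
```

For example, 624485 encodes as the bytes `E5 8E 26`; feeding `E5 8E` first yields no value, and feeding `26` afterwards completes it.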

rossberg · 2016-04-22T13:44:54Z

@titzer, the nice thing about a streaming decoder is that it keeps you honest: you have to stick to the stream abstraction, no shortcuts allowed. That actually makes it very simple. The spec decoder uses a stream for that reason.

yurydelendik · 2016-07-11T14:21:16Z

It looks like vendors might choose the offset in the binary format as the pc. That might be a small pro for having an end opcode at the end of functions: it would provide an additional point for instrumentation/debug operations at the end of a function (e.g. playing the role of a physical offset of the function epilogue), or just give empty functions non-empty ranges/boundaries.

flagxor · 2016-07-27T13:13:06Z

I think this should get merged. Andreas, can you resolve the conflicts and land?

rossberg · 2016-08-01T11:54:42Z

Merged manually to 0xC branch.

titzer and others added 7 commits April 5, 2016 16:52

Merge pull request #643 from WebAssembly/binary_0xb_version

2bece96

Binary 0xb version

Merge pull request #648 from WebAssembly/current_memory

a15a049

Add current_memory operator

Reorder section size field (#639)

9a7f8be

Prettify section names (#638)

07c9074

Merge pull request #641 from WebAssembly/postorder_opcodes

d18e5fd

Postorder opcodes

Add end opcode to functions

55c8149

sunfishcode mentioned this pull request Apr 26, 2016

Remove loop's bottom label. #652

Closed

lukewagner force-pushed the binary_0xb branch from aa114ef to a052764 Compare April 28, 2016 21:00

sunfishcode modified the milestone: MVP Jul 8, 2016

flagxor added the binary format label Jul 20, 2016

rossberg closed this Aug 1, 2016

rossberg deleted the body-end branch August 1, 2016 11:54
