Add end opcode to functions #666
Binary 0xb version
Add current_memory operator
* Prettify section names
* Restructure encoding of function signatures
* Revert "[Binary 11] Update the version number to 0xB."
* Leave index space for growing the number of base types
* Comments addressed
* clarify how export/import names convert to JS strings (#569) (#573)
* When embedded in the web, clarify how export/import names convert to JS strings (#569)
* Fixes suggested by @jf
* Address more feedback. Added a link to http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html. Simplified the decoding algorithm thanks to Luke's feedback.
* Access to proprietary APIs apart from HTML5 (#656)
* comments
Postorder opcodes
Some information can be stored after the end of the function. For example, it can be used to add padding to each function for patching the binary data in the future. There should be a statement about what to do with it (e.g. forbid any extra data after the first function end).
Another con is that it adds a redundant check in the decoder, since a decoder must always check the bounds of the bytes anyway.
@yurydelendik, I don't see how the current format allows that. If so, it's a bug. @titzer, see the first bullet point of the description. ;) At least for streaming decoders, I claim this change actually removes tons of checks.
@rossberg-chromium I was excited that it might be possible to remove the end-of-body branch when validating unknown wasm too (you start by checking that body_length is in-bounds then that the last opcode is Now, when you are decoding known-valid wasm (as is the case in SM for our parallel compiler threads), there is no body_end check to begin with and so this proposal would remove a per-top-level-iteration end-of-body check: you simply have the So if we're doing bean-counting (none of this is remotely significant in overall perf), I think the latter saves more branches than the former. Anyhow, +1 from me.
@lukewagner, sure, you still have to verify the size at the end of the body, and that the But only there. You don't have to check for the size limit at every operator anymore. So it's one check per function vs one check per operator in a streaming parser.
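To make the "one check per function" point concrete, here is a minimal sketch of a streaming decode loop for an end-terminated body. It is not taken from any engine or from the spec decoder: the opcode values, the three-opcode toy instruction set, and the name `decode_body` are all assumptions made up for illustration. The loop only ever asks the stream for the next byte (so end-of-stream is the only per-operator check), and the declared body size is compared once, after the terminating end has been seen.

```python
import io

# Hypothetical toy opcode values, chosen only for this sketch.
NOP, BLOCK, END = 0x01, 0x02, 0x0B


def decode_body(stream, body_size):
    """Decode one end-terminated function body from a byte stream.

    Returns the list of opcodes read. The declared body_size is
    verified exactly once, after the closing end opcode.
    """
    start = stream.tell()
    depth = 0  # number of currently open blocks
    ops = []
    while True:
        byte = stream.read(1)
        if not byte:
            # The only per-operator check: did the stream run out?
            raise EOFError("stream ended inside a function body")
        op = byte[0]
        ops.append(op)
        if op == BLOCK:
            depth += 1
        elif op == END:
            if depth == 0:
                break  # this end closes the function itself
            depth -= 1
        elif op != NOP:
            raise ValueError(f"unknown opcode {op:#x}")
    # One size check per function, not one per operator.
    consumed = stream.tell() - start
    if consumed != body_size:
        raise ValueError(
            f"body size mismatch: declared {body_size}, consumed {consumed}"
        )
    return ops
```

For example, `decode_body(io.BytesIO(bytes([NOP, BLOCK, NOP, END, END])), 5)` reads all five opcodes, while declaring a size of 3 for the bytes `NOP, END, NOP` is rejected because the function's end arrives after only two bytes.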
@rossberg-chromium What I was trying to explain is that, if you are validating unknown wasm bytes, you still need to check some size limit (body or end of whole module) on every operator, just to make sure you're not running off the end.
@lukewagner, yes, you always need to check against end-of-stream. But without explicit
Ah hah, I see what you're getting at. I had been assuming the whole module was downloaded such that body-end always came first (MVP). Now technically, with streaming, you could check once against
I don't see why it's more complexity to check against a stream end. In the V8 decoder, it just uses one limit, and if there's an error, it sets the limit to the start, which terminates decoding. One check in the loop. Adding end only adds additional checks after decoding is finished, e.g. that one and only one end occurred. That seems totally redundant to me.
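The "one limit" trick described above can be sketched roughly as follows. This is not the V8 code; the class name `ToyDecoder`, its fields, and the opcode values are hypothetical, and the instruction set is reduced to two valid opcodes. The point is only that the loop condition is the single check, and that reporting an error collapses the limit so the same condition also stops decoding.

```python
# Hypothetical toy opcode values, chosen only for this sketch.
NOP, END = 0x01, 0x0B


class ToyDecoder:
    """Decoder with a single limit that doubles as the error exit."""

    def __init__(self, data):
        self.data = data
        self.start = 0
        self.pc = 0
        self.limit = len(data)
        self.error_msg = None

    def fail(self, msg):
        # Keep the first error, then pull the limit back to the start
        # so the loop condition `pc < limit` becomes false.
        if self.error_msg is None:
            self.error_msg = msg
        self.limit = self.start

    def decode(self):
        """Return the number of opcodes decoded before stopping."""
        decoded = 0
        while self.pc < self.limit:  # the one check in the loop
            op = self.data[self.pc]
            self.pc += 1
            if op in (NOP, END):
                decoded += 1
            else:
                self.fail(f"unknown opcode {op:#x} at offset {self.pc - 1}")
        return decoded
```

With this shape, `ToyDecoder(bytes([NOP, NOP, END])).decode()` counts three opcodes, while an invalid byte in the middle stops the loop on the next iteration and leaves the message in `error_msg`.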
Just a thought, but might some of the loss be gained back by making this an end operator rather than an end marker and giving it
@titzer We're considering the streaming case where stream-end might be less than body-end (because you need to wait for the next chunk). Yes, you can still use 1 check in this case (taking the
@JSStats Because wasm doesn't require a final return statement (it just uses the last expression if non-void), in some sense that is already the case for the
@lukewagner Yes, thank you, sorry for the noise. Making the end an explicit return probably does not address the size issue because the trailing
There are a billion gotchas with a streaming decoder, like what happens when the stream ends in the middle of a LEB or between an opcode and its immediate, in the middle of a br_table, etc. I am not convinced end does anything to help here.
@titzer, the nice thing about a streaming decoder is that it keeps you honest: you have to stick to the stream abstraction, no shortcuts allowed. That actually makes it very simple. The spec decoder uses a stream for that reason.
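As an illustration of the "stream ends in the middle of a LEB" case, here is a small sketch of an incremental unsigned LEB128 reader that keeps its state between chunks. The class name `Leb128Reader` and its `feed` interface are assumptions for this example, not part of any existing decoder, and the sketch deliberately omits the maximum-length and overflow checks a real validator would need.

```python
class Leb128Reader:
    """Incremental unsigned LEB128 decoder that survives chunk boundaries."""

    def __init__(self):
        self.result = 0  # bits accumulated so far
        self.shift = 0   # bit position of the next 7-bit group

    def feed(self, chunk):
        """Consume bytes from chunk.

        Returns (value, bytes_used) once the final byte of the number
        is seen, or (None, len(chunk)) if more input is needed.
        """
        for i, byte in enumerate(chunk):
            self.result |= (byte & 0x7F) << self.shift
            self.shift += 7
            if byte & 0x80 == 0:  # high bit clear: last byte of the number
                value = self.result
                self.result, self.shift = 0, 0  # reset for the next number
                return value, i + 1
        return None, len(chunk)
```

Usage: feeding the three-byte encoding of 624485 (`0xE5 0x8E 0x26`) in two chunks, `feed(b"\xe5")` reports that more input is needed and `feed(b"\x8e\x26")` then returns `(624485, 2)`.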
Looks like vendors might choose the offset in the binary format as pc. This might be a small pro for having an end opcode at the end of functions: it would provide an additional point for instrumentation/debug operations at the end of a function (e.g. play the role of a physical offset of the function epilogue) or just set non-empty ranges/boundaries for empty functions.
I think this should get merged. Andreas, can you resolve the conflicts and land?
Merged manually to 0xC branch.
Now that #641 has landed, here's the PR for the last of the changes I suggested in #623.
Function bodies are implicitly delimited by their byte size in the current encoding. This is unfortunate for a couple of reasons:
This PR hence adds an explicit end opcode to functions.
Pros:
Cons: