-
Notifications
You must be signed in to change notification settings - Fork 696
Tweaks to binary section format? #623
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
If I understand correctly the whole string is going into the binary. Could each section type be mapped to an
If you have an end op could you drop the function body size? Could you reuse the |
+1 to all of these. For the last one, we'd be adding an extra byte, but removing the varuint32 for "body size", shouldn't that be an overall win? 1 byte vs. 1+. |
Agreed, I found it strange that the size of the section occurs before the section name as well.
+1. These are the basic sections that all modules will have, let's make them succinct.
OK with all of these except maybe
+1 |
This is how it used to be, but was changed to allow extending the format later with new section types.
The postorder format already adds an |
Does |
@rossberg-chromium I also find it a little odd that the function top level expressions are terminated by a stream byte count whereas other blocks are terminated differently (count or end-marker). I don't think one extra byte per function is a blocker. Making them consistent would appear to give a small reduction in complexity for the decoder. @kripken The function body size is still useful for the use case of skipping them quickly, for building a directory of the position and size of each function body, but I guess this could be in an optional directory section. Thus it seems possible to remove the function body size too. So a decoder reading an incorrectly balanced function body AST would just continue interpreting the following data, perhaps hitting a 'syntax' error, and at the end of the section checking that all functions had been decoded and that the section terminates on the correct section size? |
For fast skipping of function bodies, we could allow multiple function sections. However, I'm not sure we need to focus on optimizing skipping of bodies? Skipping entire sections is useful, but skipping bodies seems unnecessary given we now have signatures beforehand anyhow. Normal VM operation won't want to skip function bodies. |
@kripken we'll want to quickly skip function bodies for parallel decoding. |
Oh, right, thanks. |
Section names: it's a bikeshed of course, but I prefer the more descriptive names. In particular, "functions", "types", and "code" seem likely to confuse. |
Regarding the |
+1 That is actually how I implemented it initially in SM and I also found the switch to make the code a bit grosser.
Hah, this was also on my personal "tidy-up todo".
+1. Symmetric to this, there's also an implicit dependency on the whole module's byte size in the current proposal in #597 that we'd probably want to replace with, perhaps, an up-front section count (after magic+version, before first section). |
On Thu, Mar 24, 2016 at 3:36 PM, rossberg-chromium <[email protected]
Ok by me.
+1
|
@titzer Yes, I guess with 37k functions and one byte each which gives 0.3% of the wasm file size this is not insignificant for such a small matter and there seems no prospect of it being optimized in the operator table or with macros. Might question if the current AngryBots is representative as it seems to have a lot of small functions which might be trivial methods that should have been inlined anyway. |
[Back from vacation.] It seems there is mostly consensus for the first two suggestions, and sympathy for the latter. I'll create separate PRs for each suggestion, and we can iterate on the details there. @binji, I picked "code" and "data" to mirror the usual code/data dichotomy familiar from other binaries. Not attached to that, though. @jfbastien, yes, "string" is supposed to denote a (length, bytes) pair. @kripken, as others pointed out, we still have use for the function size. @JSStats, regarding the types vs signatures section, we already decided at the Feb summit to make the function type constructor explicit to support extensibility. It's just that nobody got around turning that one into a PR yet. I'll do. @titzer, the size bound introduces a secondary end-of-stream condition. With actual streaming, you cannot know upfront which end-of-stream condition will fire first (as the size value might be invalid), so you always have to check against both. |
Looking at the sexp-wasm single pass compiler-to-stack-machine, there are some frustrating pieces of code that seem relevant to the function end declaration matter. For each function top level expression it needs to know the number of expected values to decide if they are to be discarded, but not knowing this it emits a discard operation and then if at the end of the function it needs the values it removes this discard operation. This also blocks top-down typing in a single pass because the expected type of a function top level expression is not known when starting to build the expression. This all suggests that for the pre-order encoding having an expression count, just like for the blocks, would help simplify the decoder. This comment seems specific to the pre-order decoder, so might be of limited relevance moving forward. |
I think this is long done. @rossberg-chromium please reopen if that's not the case. |
Having finished a first iteration of the encoder & decoder for the spec, I'd like to make a couple of small suggestions regarding the structure of sections in the binary. Lumping the together here.
Section Headers
Instead of
(payload_size; name_string; payload)
as the overall section structure, can we swap that to
(name_string; payload_size; payload)
? Two minor advantages:
(name_string; payload_string)
,which is particularly natural wrt handling and skipping unknown sections.
With that change, skipping over sections requires skipping over their name first. But I have a hard time imagining a reason for skipping a section without even knowing what it is.
Section Names
Honestly, the current names are super verbose. Would anybody be opposed to shorten them a bit, to something nicer? I'd suggest:
Note that the
signatures
section may be generalised to contain other kinds of type definitions in the future, so the current name is not a good fit.Function Bodies
Function bodies are implicitly delimited by the byte size of their encoding. This is unfortunate for a couple of reasons:
I'd hence like to suggest adding an explicit
end
opcode to functions. Pros:I'm aware that there are concerns that the extra byte is "redundant", but is one byte per function a big deal? We recently saved much more by improving the representation of locals (or would by shortening section names :) ).
The text was updated successfully, but these errors were encountered: