
Padding/Byte alignment for binary file format #626


Closed
sailro opened this issue Mar 25, 2016 · 6 comments

@sailro

sailro commented Mar 25, 2016

Hi,

I'm playing a bit with the wasm binary file format, and I think something important is missing from the documentation (or perhaps I am missing something).

Let's take an example in the build suite:
https://github.com/WebAssembly/build-suite/blob/master/emscripten/hello_world/a.out.wasm

For (I think) padding and alignment purposes, the padding feature of the LEB128 encoding is used when writing section sizes.

In the file above, the first "signature" section has a size of 41 bytes.
Encoded canonically with LEB128, that size would be the single byte 0x29.
But in the file it is encoded as 0xA9 0x80 0x80 0x80 0x00.
Thanks to LEB128's padding support, this still decodes to the same value, 41: 0xA9 is 0x29 with the continuation bit set, each 0x80 is a continuation byte carrying zero payload bits (the padding), and the final 0x00 is the end marker.
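To make the equivalence concrete, here is a minimal unsigned LEB128 decoder sketch in Python. `decode_uleb128` is a hypothetical helper written for illustration, not part of any wasm tool; it does no bounds or overflow checking. Both the canonical byte and the padded 5-byte form from the file decode to 41, differing only in how many bytes are consumed.

```python
from typing import Tuple


def decode_uleb128(data: bytes, offset: int = 0) -> Tuple[int, int]:
    """Decode an unsigned LEB128 value starting at `offset`.

    Returns (value, number_of_bytes_consumed).
    Hypothetical helper: no bounds or overflow checks.
    """
    result = 0
    shift = 0
    pos = offset
    while True:
        byte = data[pos]
        pos += 1
        # The low 7 bits carry payload; padding bytes contribute zero here.
        result |= (byte & 0x7F) << shift
        shift += 7
        # A clear high bit marks the last byte of the value.
        if byte & 0x80 == 0:
            return result, pos - offset


canonical = bytes([0x29])
padded = bytes([0xA9, 0x80, 0x80, 0x80, 0x00])
print(decode_uleb128(canonical))  # (41, 1)
print(decode_uleb128(padded))     # (41, 5)
```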


It would be great to have the exact padding/alignment strategy explained in the documentation, so that we can write tools able to write byte-perfect wasm files.

Thanks for your help!

@binji
Member

binji commented Mar 25, 2016

binji closed this as completed Mar 25, 2016
@sailro
Author

sailro commented Mar 25, 2016

No, it is NOT. I tried to explain my issue and you just want to close any issue as fast as possible, without reading or understanding the whole thing.

OK, understood, I will never post again. You can go back to sleep...

binji reopened this Mar 25, 2016
@binji
Member

binji commented Mar 25, 2016

My apologies. I thought you were asking where this behavior was specified, but it sounds like you're asking to change the LEB128 encoding to require canonicalized LEB values? This was discussed in #562 and #564. Allowing padding makes it easier to write a single-pass encoder: it can reserve a fixed number of bytes for a size field before the size is known and fill them in afterward, without having to shift data down.
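A minimal sketch of that single-pass approach in Python, assuming a simplified, hypothetical section layout (a 1-byte section id followed by a varuint32 size, then the payload); the 2016 pre-MVP format differed in its details. `write_section` and `encode_uleb128_padded` are illustrative names, not any tool's API. Five bytes are reserved because 5 × 7 = 35 payload bits are enough for any 32-bit size.

```python
from typing import Callable


def encode_uleb128_padded(value: int, width: int) -> bytes:
    """Encode `value` as unsigned LEB128 using exactly `width` bytes."""
    if value < 0 or value >= 1 << (7 * width):
        raise ValueError("value does not fit in the requested width")
    out = bytearray()
    for i in range(width):
        byte = value & 0x7F
        value >>= 7
        if i < width - 1:
            byte |= 0x80  # continuation bit on every byte but the last
        out.append(byte)
    return bytes(out)


def write_section(buf: bytearray, section_id: int,
                  payload_writer: Callable[[bytearray], None]) -> None:
    """Single pass: reserve 5 bytes for the size, then back-patch it."""
    buf.append(section_id)
    size_pos = len(buf)
    buf.extend(b"\x00" * 5)  # placeholder; the size is not known yet
    start = len(buf)
    payload_writer(buf)      # contents are written straight into buf
    size = len(buf) - start
    # Overwrite the placeholder in place; no payload bytes have to move.
    buf[size_pos:size_pos + 5] = encode_uleb128_padded(size, 5)


module = bytearray()
write_section(module, 1, lambda b: b.extend(bytes(41)))  # 41-byte payload
print(module[:6].hex(" "))  # 01 a9 80 80 80 00
```

With a canonical encoding, the writer would either have to know the payload size up front or move the already-written payload once the size's byte length was known.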

@sailro
Author

sailro commented Mar 25, 2016

Well, I just want to know what formula is used to decide "OK, we'll pad with x bytes this time for this specific section". What are the rules? It is not related to LEB128 itself, it is related to the way wasm modules are written: why 3 bytes for one section, 4 bytes for another, and 2 bytes for the next one?

@binji
Member

binji commented Mar 25, 2016

There are no explicit rules, other than that padding a LEB128 value with zeroes is allowed. For example, https://github.com/WebAssembly/sexpr-wasm-prototype/ has a flag to pad unsigned LEB128 values or to canonicalize them. SpiderMonkey (which was used to generate the files in https://github.com/WebAssembly/build-suite) always pads section sizes to 5 bytes.
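Since the padding width is purely the encoder's choice, a sketch of an encoder that supports both modes might look like this. The `pad_to` parameter is a hypothetical name chosen for illustration; it is not the actual flag used by sexpr-wasm-prototype. Any width at least as long as the canonical encoding produces bytes that decode to the same value, which is why different tools (or settings) can emit 1, 3, or 5 bytes for the same size.

```python
from typing import Optional


def encode_uleb128(value: int, pad_to: Optional[int] = None) -> bytes:
    """Encode an unsigned LEB128 value.

    pad_to=None -> canonical (shortest) encoding
    pad_to=N    -> exactly N bytes, padded with zero-payload continuation bytes
    (`pad_to` is a hypothetical parameter, for illustration only.)
    """
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative values")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        # Keep going while payload bits remain or padding is still needed.
        more = value != 0 or (pad_to is not None and len(out) + 1 < pad_to)
        if more:
            byte |= 0x80
        out.append(byte)
        if not more:
            break
    if pad_to is not None and len(out) > pad_to:
        raise ValueError("value does not fit in pad_to bytes")
    return bytes(out)


print(encode_uleb128(41).hex())            # 29
print(encode_uleb128(41, pad_to=3).hex())  # a98000
print(encode_uleb128(41, pad_to=5).hex())  # a980808000 (SpiderMonkey-style width)
```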

sailro closed this as completed Mar 28, 2016
@lukewagner
Member

One interesting consequence of #601 is that some alternative varuint32 encodings don't support padding.


3 participants