Padding/Byte alignment for binary file format #626

sailro · 2016-03-25T21:14:56Z

Hi,

I'm playing a bit with the wasm binary file format, and I think something important is missing in the documentation. (Or perhaps I am missing something).

Let's take an example in the build suite:
https://github.com/WebAssembly/build-suite/blob/master/emscripten/hello_world/a.out.wasm

For (I think) padding and alignment purposes, the padding feature of the LEB128 encoding is used when writing section sizes.

In the file above, the first "signature" section has a size of 41 bytes.
Using LEB128, the canonical encoding would be the single byte 0x29.
But in the file it is encoded as 0xA9 0x80 0x80 0x80 0x00.
Thanks to the LEB128 padding support, both decode to the same value, 41 (each 0x80 is a continuation byte carrying zero payload bits, and the final 0x00 has the continuation bit clear, which ends the value).
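
The decoding described above can be checked with a short sketch. This is an illustrative unsigned LEB128 decoder, not code from any wasm tool; the function name `decode_varuint32` is made up for this example.

```python
def decode_varuint32(data, offset=0):
    """Decode an unsigned LEB128 value; return (value, bytes_consumed)."""
    result = 0
    for i in range(5):  # a varuint32 occupies at most 5 bytes
        byte = data[offset + i]
        result |= (byte & 0x7F) << (7 * i)  # low 7 bits carry the payload
        if (byte & 0x80) == 0:              # high bit clear: this is the last byte
            if result > 0xFFFFFFFF:
                raise ValueError("value does not fit in 32 bits")
            return result, i + 1
    raise ValueError("varuint32 longer than 5 bytes")


canonical = bytes([0x29])
padded = bytes([0xA9, 0x80, 0x80, 0x80, 0x00])
print(decode_varuint32(canonical))  # (41, 1)
print(decode_varuint32(padded))     # (41, 5)
```

Both byte sequences yield 41; only the number of bytes consumed differs.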

It would be great to have the exact padding/alignment strategy explained in the documentation, so that we can write tools able to write byte-perfect wasm files.

Thanks for your help!

binji · 2016-03-25T21:23:31Z

This is specified here: https://github.com/WebAssembly/design/blob/master/BinaryEncoding.md#varuint32

sailro · 2016-03-25T21:26:42Z

No it is NOT. I tried to explain my issue and you just want to close any issue as fast as possible, without reading or understanding the whole thing.

Ok understood, I will never post again. You can go back to sleep...

binji · 2016-03-25T21:31:44Z

My apologies. I thought you were asking where this behavior was specified, but it sounds like you're asking to change the LEB128 encoding to require canonicalized LEB values? This was discussed in #562 and #564. Allowing padding makes it easier to write a single-pass encoder: the size field can be reserved at a fixed width and filled in later, without having to shift the section data down.
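
To illustrate the single-pass point, here is a minimal sketch, assuming a simplified section that is just a size prefix followed by a body (this is not the actual wasm section layout). The names `encode_varuint32_fixed` and `write_section` are hypothetical. The writer reserves 5 bytes, emits the body, then backpatches the size in place, so nothing needs to move.

```python
def encode_varuint32_fixed(value, width=5):
    """Encode value as unsigned LEB128 padded to exactly `width` bytes."""
    if not 0 <= value < (1 << min(32, 7 * width)):
        raise ValueError("value does not fit in the requested width")
    out = bytearray()
    for i in range(width):
        byte = (value >> (7 * i)) & 0x7F
        if i < width - 1:
            byte |= 0x80  # continuation bit on every byte except the last
        out.append(byte)
    return bytes(out)


def write_section(buf, write_body):
    """Append a size-prefixed section to buf in a single pass."""
    size_pos = len(buf)
    buf.extend(b"\x00" * 5)      # reserve a fixed 5-byte size field
    body_start = len(buf)
    write_body(buf)              # the body goes directly after the placeholder
    size = len(buf) - body_start
    # Backpatch: same length as the placeholder, so the body never shifts.
    buf[size_pos:body_start] = encode_varuint32_fixed(size)


buf = bytearray()
write_section(buf, lambda b: b.extend(bytes(41)))  # a 41-byte dummy body
print([hex(b) for b in buf[:5]])
```

With canonical encoding, the writer would not know whether the size needs 1, 2, or more bytes until the body is finished, and would have to either buffer the body separately or move it afterwards.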

sailro · 2016-03-25T21:42:40Z

Well, I just want to know what formula is used to say "OK, we'll pad with x bytes this time for this specific section". What are the rules? It is not related to LEB128 itself; it is related to the way wasm modules are written. Why 3 bytes for one section, 4 bytes for another, and 2 bytes for the next one?

binji · 2016-03-25T21:47:37Z

There are no explicit rules, other than that padding a LEB128 value with zeroes is allowed. For example, https://github.com/WebAssembly/sexpr-wasm-prototype/ has a flag to pad unsigned LEB128 values or to canonicalize them. SpiderMonkey (which was used to generate the files in https://github.com/WebAssembly/build-suite) always pads section sizes to 5 bytes.
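
Since the width is a producer choice rather than something derivable from the value, one way a tool could reproduce a file byte for byte is to record the width it read and reuse it when writing. The sketch below assumes that approach; `read_leb` and `write_leb` are hypothetical helpers, not functions from any of the projects mentioned.

```python
def read_leb(data, offset):
    """Return (value, width) for the unsigned LEB128 at data[offset:]."""
    value = 0
    width = 0
    while True:
        byte = data[offset + width]
        value |= (byte & 0x7F) << (7 * width)
        width += 1
        if (byte & 0x80) == 0:  # continuation bit clear: last byte
            return value, width


def write_leb(value, width=None):
    """Encode value: canonical if width is None, else padded up to width bytes."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        # Keep going while payload remains or the requested width is not reached.
        more = value != 0 or (width is not None and len(out) + 1 < width)
        out.append(byte | (0x80 if more else 0))
        if not more:
            return bytes(out)


original = bytes([0xA9, 0x80, 0x80, 0x80, 0x00])
value, width = read_leb(original, 0)
assert write_leb(value, width) == original  # padded form round-trips exactly
print(write_leb(value).hex())               # 29 (canonical form)
```

A tool that only keeps the decoded value would emit the canonical form and change the file size, even though the module stays semantically identical.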

lukewagner · 2016-03-28T17:14:25Z

One interesting consequence of #601 is that some alternative varuint32 encodings don't support padding.

binji closed this as completed Mar 25, 2016

binji reopened this Mar 25, 2016

sailro closed this as completed Mar 28, 2016
