Note that leading zeros in leb128 values have no semantic effect. #564
Conversation
While this is implicit in the current usage of the values, this clause is intended to guard against future specification additions that distinguish wasm files based on redundant leading zeros, and against consumers treating files differently based on such choices. It may also serve to require canonicalization when hashing for resource integrity.
See also #562.
```diff
@@ -39,7 +39,7 @@ A two-byte little endian unsigned integer.
 A four-byte little endian unsigned integer.

 ### varuint32
-A [LEB128](https://en.wikipedia.org/wiki/LEB128) variable-length integer, limited to uint32 values. `varuint32` values may contain leading zeros.
+A [LEB128](https://en.wikipedia.org/wiki/LEB128) variable-length integer, limited to uint32 values. `varuint32` values may contain leading zeros. Leading zeros in `varuint32` values have no sematic effect and may be canonicalized away by tools or the communication pipeline.
```
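For illustration only, here is a minimal C++ sketch, not taken from the spec text or any engine, of what "leading zeros have no semantic effect" means in practice: the same uint32 can be encoded minimally or padded out to a fixed five bytes, and a decoder recovers the same number either way.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal LEB128 encoding: stop as soon as no significant bits remain.
std::vector<uint8_t> EncodeVaruint32(uint32_t value) {
  std::vector<uint8_t> out;
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (value != 0)
      byte |= 0x80;  // continuation bit
    out.push_back(byte);
  } while (value != 0);
  return out;
}

// Padded encoding: always emit exactly 5 bytes, filling the tail with
// zero payload bytes (the "leading zeros" of the number).
std::vector<uint8_t> EncodeVaruint32Padded(uint32_t value) {
  std::vector<uint8_t> out;
  for (int i = 0; i < 5; ++i) {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (i < 4)
      byte |= 0x80;  // continuation bit set on all but the last byte
    out.push_back(byte);
  }
  return out;
}

// A decoder that accepts both forms and yields the same value.
uint32_t DecodeVaruint32(const std::vector<uint8_t>& bytes) {
  uint32_t result = 0;
  int shift = 0;
  for (uint8_t byte : bytes) {
    result |= uint32_t(byte & 0x7f) << shift;
    shift += 7;
    if (!(byte & 0x80))
      break;
  }
  return result;
}

int main() {
  assert(DecodeVaruint32(EncodeVaruint32(3)) == 3);        // 1 byte:  0x03
  assert(DecodeVaruint32(EncodeVaruint32Padded(3)) == 3);  // 5 bytes: 0x83 0x80 0x80 0x80 0x00
}
```

The padded form simply keeps the continuation bit set on bytes whose payload is zero; only the encoded width differs, never the decoded value.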
"semantic"
We use leb128 in many places, so you can't just shorten an leb128 without updating "parent" lengths, e.g. a leb128-prefixed string's length is counted in the section's own length. What you wrote is technically correct, but slightly misleading IMO.
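To make the "parent lengths" point concrete, here is a small sketch; the layout below is hypothetical, just length-prefixed payloads in the style of the binary format. Re-encoding an inner varuint32 minimally changes the byte count that every enclosing size field must describe, so canonicalization is not a purely local rewrite.

```cpp
#include <cstdint>
#include <cstdio>

// Number of bytes in the *minimal* LEB128 encoding of a uint32.
int MinimalLebSize(uint32_t value) {
  int n = 0;
  do {
    value >>= 7;
    ++n;
  } while (value != 0);
  return n;
}

int main() {
  const int name_len = 3;        // e.g. the string "foo"
  const int padded_prefix = 5;   // name length written as a padded varuint32
  const int minimal_prefix = MinimalLebSize(name_len);  // 1 byte

  // The enclosing section counts every byte of its payload, including the
  // string's own length prefix, so rewriting that prefix changes the number
  // the section size field must encode -- and possibly its width too.
  int padded_payload = padded_prefix + name_len;    // 8 bytes
  int minimal_payload = minimal_prefix + name_len;  // 4 bytes
  std::printf("section size field must encode %d vs %d\n",
              padded_payload, minimal_payload);
}
```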
@jfbastien Thank you for the feedback. Could someone point me to producers that actually need a fixed-length leb128 to help me understand their perspective? At present it seems like a tooling issue to me that could be solved by a
@JSStats see https://dxr.mozilla.org/mozilla-central/source/js/src/asmjs/WasmBinary.h#471 and https://dxr.mozilla.org/mozilla-central/source/js/src/asmjs/WasmBinary.h#360. You could certainly pack it, but it's inconvenient.
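The general pattern those links point at, sketched here with made-up names rather than the actual SpiderMonkey API, is to reserve a fixed five-byte varuint32 slot during a single encoding pass and patch the final value in place once it is known; because the padded width never changes, no bytes have to move.

```cpp
#include <cstdint>
#include <vector>

class Encoder {
 public:
  // Reserve a fixed 5-byte varuint32 slot whose value is not yet known.
  size_t WritePatchableVaruint32() {
    size_t offset = bytes_.size();
    for (int i = 0; i < 5; ++i)
      bytes_.push_back(i < 4 ? 0x80 : 0x00);  // padded encoding of 0
    return offset;
  }

  // Fill the reserved slot once the value (e.g. a section's byte length)
  // is finally known. The width stays 5 bytes, so nothing shifts.
  void PatchVaruint32(size_t offset, uint32_t value) {
    for (int i = 0; i < 5; ++i) {
      uint8_t byte = value & 0x7f;
      value >>= 7;
      if (i < 4) byte |= 0x80;
      bytes_[offset + i] = byte;
    }
  }

  void WriteByte(uint8_t b) { bytes_.push_back(b); }
  size_t size() const { return bytes_.size(); }

 private:
  std::vector<uint8_t> bytes_;
};

int main() {
  Encoder e;
  size_t size_slot = e.WritePatchableVaruint32();  // section size, unknown yet
  e.WriteByte(0x01);                               // ...emit the section body...
  e.WriteByte(0x02);
  uint32_t body_size = uint32_t(e.size() - size_slot - 5);
  e.PatchVaruint32(size_slot, body_size);          // patch in place afterwards
}
```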
@mbebenita Thank you. This example appears to be an embedded wast-text-to-wasm-binary encoder, i.e. a wasm assembler. The need for patchable leb128s seems to stem entirely from this assembler being single-pass. I thought a principle of wasm was to offload work to producers, so it seems reasonable to expect assemblers to use multiple passes; I do not believe the demands of this assembler are reasonable, and the appropriate path is to improve the assembler. The source code itself notes: 'Patching is necessary due to the combination of a preorder encoding and a single-pass algorithm that only knows the precise opcode after visiting children. Switching to a postorder encoding will remove these methods.' I believe the problem is due entirely to the encoder being 'single-pass', and a multi-pass assembler would resolve it irrespective of the pre-order encoding.

I don't believe it is the right direction to fundamentally optimize the wasm binary encoding around the needs of single-pass offline tools; the disadvantages and complexity this imposes on every other tool do not seem like the right balance to me. Even a tool that just wants to copy these leb128s through without loss needs to decode not only the number but also the length of its encoding, and compressors that need to be lossless will likewise have to record that extra information.

This example appears to be an internal assembler targeting only the internal wasm decoder - or does it write to an array that is externally visible? Another option is for the internal decoder to also accept fixed-length numbers and for the internal assembler to target that layer with fixed-length encodings rather than leb128; in other words, use a custom non-standard wasm encoding internally rather than imposing on the web-facing standard.

If we accept this, what do we say to the next tool developer who wants the least burden or the best performance for their tool and asks the wasm spec to bend to their needs? I think we need to draw a line and point to the principle that offline tools are expected to work harder to reduce the burden on the wasm runtime and pipeline.
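The lossless-copy point can be illustrated with a short sketch (hypothetical helpers, not any tool discussed in this thread): once padding is permitted, a pass-through tool has to carry the encoded width alongside the decoded value in order to reproduce the bytes exactly.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// To copy a varuint32 losslessly when padding is allowed, a tool must
// track the encoded width, not just the value.
struct Varuint32 {
  uint32_t value;
  int width;  // bytes consumed; equals the minimal width only if unpadded
};

Varuint32 ReadVaruint32(const std::vector<uint8_t>& in, size_t& pos) {
  Varuint32 v{0, 0};
  int shift = 0;
  while (pos < in.size() && v.width < 5) {
    uint8_t byte = in[pos++];
    v.value |= uint32_t(byte & 0x7f) << shift;
    shift += 7;
    ++v.width;
    if (!(byte & 0x80)) break;
  }
  return v;
}

// Re-emit the value at exactly its original width so the output is
// byte-identical to the input.
void WriteVaruint32(std::vector<uint8_t>& out, Varuint32 v) {
  uint32_t value = v.value;
  for (int i = 0; i < v.width; ++i) {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (i + 1 < v.width) byte |= 0x80;
    out.push_back(byte);
  }
}

int main() {
  std::vector<uint8_t> in = {0x83, 0x80, 0x80, 0x80, 0x00};  // 3, padded
  std::vector<uint8_t> out;
  size_t pos = 0;
  WriteVaruint32(out, ReadVaruint32(in, pos));
  return out == in ? 0 : 1;  // byte-for-byte round trip
}
```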
Another obvious solution that I missed: a decoder accepting content from a single-pass embedded assembler could simply allow leading zeros in leb128 while rejecting them in code from other sources. That use case might then be addressed without the wasm spec itself accepting leading zeros in leb128.
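A sketch of that idea, with hypothetical names and no claim about any real engine's API: the decoder takes a flag controlling whether non-minimal encodings are tolerated, so an embedder could enable it only for its own internal assembler's output while rejecting leading zeros from the web.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Minimal number of bytes needed to LEB128-encode `value`.
static int MinimalLebSize(uint32_t value) {
  int n = 0;
  do { value >>= 7; ++n; } while (value != 0);
  return n;
}

// Decode a varuint32 at `pos`, advancing `pos`. When `allow_leading_zeros`
// is false, encodings longer than the minimal one are rejected.
std::optional<uint32_t> DecodeVaruint32(const std::vector<uint8_t>& in,
                                        size_t& pos,
                                        bool allow_leading_zeros) {
  uint32_t result = 0;
  int used = 0;
  while (pos < in.size() && used < 5) {
    uint8_t byte = in[pos++];
    result |= uint32_t(byte & 0x7f) << (7 * used);
    ++used;
    if (!(byte & 0x80)) {
      if (!allow_leading_zeros && used > MinimalLebSize(result))
        return std::nullopt;  // redundant leading zeros: reject external input
      return result;
    }
  }
  return std::nullopt;  // ran off the end or exceeded 5 bytes
}

int main() {
  std::vector<uint8_t> padded = {0x83, 0x80, 0x80, 0x80, 0x00};  // 3, padded
  size_t pos = 0;
  auto strict = DecodeVaruint32(padded, pos, /*allow_leading_zeros=*/false);
  pos = 0;
  auto lenient = DecodeVaruint32(padded, pos, /*allow_leading_zeros=*/true);
  // strict is rejected; lenient decodes to 3.
  return (!strict && lenient && *lenient == 3) ? 0 : 1;
}
```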