Skip to content

Note that leading zeros in leb128 values have no semantic effect. #564

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Note that leading zeros in leb128 values have no semantic effect. #564

wants to merge 1 commit into from

Conversation

ghost
Copy link

@ghost ghost commented Feb 28, 2016

While this is implict in the current usage of the values, this clause is intended to guard against future specification additions that disguish wasm files based on redundant leading zeros. Also to guard again consumers treating files differently based on such choices. Perhaps also to require canonicalization if hashing for resource integrity.

While this is implict in the current usage of the values, this clause
is intended to guard against future specification additions that
disguish wasm files based on redundant leading zeros. Also to guard
again consumers treating files differently based on such
choices. Perhaps also to require canonicalization if hashing for
resource integrity.
@ghost
Copy link
Author

ghost commented Feb 28, 2016

See also #562

@@ -39,7 +39,7 @@ A two-byte little endian unsigned integer.
A four-byte little endian unsigned integer.

### varuint32
A [LEB128](https://en.wikipedia.org/wiki/LEB128) variable-length integer, limited to uint32 values. `varuint32` values may contain leading zeros.
A [LEB128](https://en.wikipedia.org/wiki/LEB128) variable-length integer, limited to uint32 values. `varuint32` values may contain leading zeros. Leading zeros in `varuint32` values have no sematic effect and may be canonicalized away by tools or the communication pipeline.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"semantic"

@jfbastien
Copy link
Member

We use leb128 in many places, so you can't just shorten an leb128 without updating "parent" lengths, e.g. a leb128-prefixed string's length is counted in the section's own length. What you wrote is technically correct, but slightly misleading IMO.

@ghost
Copy link
Author

ghost commented Feb 29, 2016

@jfbastien Thank you for the feedback. Could someone point me to producers that actually need a fixed length leb128 to help me understand their perspective? At present it seems like a tooling issue to me that could be solved by a pack utility.

@mbebenita
Copy link
Contributor

@ghost
Copy link
Author

ghost commented Feb 29, 2016

@mbebenita Thank you. This example appears to be an embedded wast text to wasm binary encoder - it's a wasm assembly. The need for the patchable leb128s seems entirely due to the design of this assembly being single pass. I though a principle of wasm was to offload work to the producers and thus it seems very reasonable to expect assemblers to use multiple passes and thus I do not believe the demands of this assembler are reasonable, and I believe that the appropriate path is to improve the assembler.

Further I note in the source code 'Patching is necessary due to the combination of a preorder encoding and a single-pass algorithm that only knows the precise opcode after visiting children. Switching to a postorder encoding will remove these methods.'

I believe this problem is due entirely to this being 'single-pass', and that a multi-pass assembler would resolve the problem irrespective of the pre-order encoding.

I don't believe it is the right direction to be fundamentally optimizing the wasm binary encoding to suit the needs of single pass offline tools. It just does not seem the right balance to me.

I don't believe all the disadvantages and complexity that supporting such single pass assemblers will impose on the other tools is the right balance. Even a tool that just wants to copy thought these leb128's without loss needs to decode not only the number but also the length of the encoding. Compressors that need to be lossless will also need to encode this extra information.

This example appears to be an internal assembler, targeting only the internal wasm decoder. Or does it write to an array that is visible? Perhaps another option is for the internal decoder to also accept fixed length numbers and for this internal assembler to target this layer using fixed length encodings rather than leb128. In other words, use a custom non-standard wasm encoding internally rather than imposing on the web facing standard.

If we accept this then what do we say to the next tool developer who wants the least burden or the best performance for their tool and request the wasm spec bend to their needs? I think we need to draw a line and point to the principle that offline tools are expected to work harder to reduce the burden on the wasm runtime and pipeline.

@ghost
Copy link
Author

ghost commented Feb 29, 2016

Another obvious solution that I missed was that a decoder accepting content from a single pass embedded assembler could simply allow leading zeros in leb128 but block them from code from other sources. This use case might be addressed without the wasm spec accepting leading zeros in leb128.

@ghost ghost closed this Apr 16, 2016
@ghost ghost deleted the leb128-not-semantic branch April 16, 2016 08:15
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants