Note that leading zeros in leb128 values have no semantic effect. #564
Conversation
While this is implicit in the current usage of the values, this clause is intended to guard against future specification additions that distinguish wasm files based on redundant leading zeros, and against consumers treating files differently based on such choices. It may also serve to require canonicalization when hashing for resource integrity.
See also #562.
```diff
@@ -39,7 +39,7 @@ A two-byte little endian unsigned integer.
 A four-byte little endian unsigned integer.

 ### varuint32
-A [LEB128](https://en.wikipedia.org/wiki/LEB128) variable-length integer, limited to uint32 values. `varuint32` values may contain leading zeros.
+A [LEB128](https://en.wikipedia.org/wiki/LEB128) variable-length integer, limited to uint32 values. `varuint32` values may contain leading zeros. Leading zeros in `varuint32` values have no sematic effect and may be canonicalized away by tools or the communication pipeline.
```
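For illustration only, here is a minimal C++ sketch, not taken from the spec text or any engine, of what "leading zeros have no semantic effect" means in practice: the same uint32 can be encoded minimally or padded out to a fixed five bytes, and a decoder recovers the same number either way.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Minimal LEB128 encoding: stop as soon as no significant bits remain.
std::vector<uint8_t> EncodeVaruint32(uint32_t value) {
  std::vector<uint8_t> out;
  do {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (value != 0)
      byte |= 0x80;  // continuation bit
    out.push_back(byte);
  } while (value != 0);
  return out;
}

// Padded encoding: always emit exactly 5 bytes, filling the tail with
// zero payload bytes (the "leading zeros" of the number).
std::vector<uint8_t> EncodeVaruint32Padded(uint32_t value) {
  std::vector<uint8_t> out;
  for (int i = 0; i < 5; ++i) {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (i < 4)
      byte |= 0x80;  // continuation bit set on all but the last byte
    out.push_back(byte);
  }
  return out;
}

// A decoder that accepts both forms and yields the same value.
uint32_t DecodeVaruint32(const std::vector<uint8_t>& bytes) {
  uint32_t result = 0;
  int shift = 0;
  for (uint8_t byte : bytes) {
    result |= uint32_t(byte & 0x7f) << shift;
    shift += 7;
    if (!(byte & 0x80))
      break;
  }
  return result;
}

int main() {
  assert(DecodeVaruint32(EncodeVaruint32(3)) == 3);        // 1 byte:  0x03
  assert(DecodeVaruint32(EncodeVaruint32Padded(3)) == 3);  // 5 bytes: 0x83 0x80 0x80 0x80 0x00
}
```

The padded form simply keeps the continuation bit set on bytes whose payload is zero; only the encoded width differs, never the decoded value.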
"semantic"
We use leb128 in many places, so you can't just shorten an leb128 without updating "parent" lengths, e.g. a leb128-prefixed string's length is counted in the section's own length. What you wrote is technically correct, but slightly misleading IMO.
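To make the "parent lengths" point concrete, here is a small sketch; the layout below is hypothetical, just length-prefixed payloads in the style of the binary format. Re-encoding an inner varuint32 minimally changes the byte count that every enclosing size field must describe, so canonicalization is not a purely local rewrite.

```cpp
#include <cstdint>
#include <cstdio>

// Number of bytes in the *minimal* LEB128 encoding of a uint32.
int MinimalLebSize(uint32_t value) {
  int n = 0;
  do {
    value >>= 7;
    ++n;
  } while (value != 0);
  return n;
}

int main() {
  const int name_len = 3;        // e.g. the string "foo"
  const int padded_prefix = 5;   // name length written as a padded varuint32
  const int minimal_prefix = MinimalLebSize(name_len);  // 1 byte

  // The enclosing section counts every byte of its payload, including the
  // string's own length prefix, so rewriting that prefix changes the number
  // the section size field must encode -- and possibly its width too.
  int padded_payload = padded_prefix + name_len;    // 8 bytes
  int minimal_payload = minimal_prefix + name_len;  // 4 bytes
  std::printf("section size field must encode %d vs %d\n",
              padded_payload, minimal_payload);
}
```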
@jfbastien Thank you for the feedback. Could someone point me to producers that actually need a fixed-length leb128 to help me understand their perspective? At present it seems like a tooling issue to me that could be solved by a
@JSStats see https://dxr.mozilla.org/mozilla-central/source/js/src/asmjs/WasmBinary.h#471 and https://dxr.mozilla.org/mozilla-central/source/js/src/asmjs/WasmBinary.h#360. You could certainly pack it, but it's inconvenient.
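The general pattern those links point at, sketched here with made-up names rather than the actual SpiderMonkey API, is to reserve a fixed five-byte varuint32 slot during a single encoding pass and patch the final value in place once it is known; because the padded width never changes, no bytes have to move.

```cpp
#include <cstdint>
#include <vector>

class Encoder {
 public:
  // Reserve a fixed 5-byte varuint32 slot whose value is not yet known.
  size_t WritePatchableVaruint32() {
    size_t offset = bytes_.size();
    for (int i = 0; i < 5; ++i)
      bytes_.push_back(i < 4 ? 0x80 : 0x00);  // padded encoding of 0
    return offset;
  }

  // Fill the reserved slot once the value (e.g. a section's byte length)
  // is finally known. The width stays 5 bytes, so nothing shifts.
  void PatchVaruint32(size_t offset, uint32_t value) {
    for (int i = 0; i < 5; ++i) {
      uint8_t byte = value & 0x7f;
      value >>= 7;
      if (i < 4) byte |= 0x80;
      bytes_[offset + i] = byte;
    }
  }

  void WriteByte(uint8_t b) { bytes_.push_back(b); }
  size_t size() const { return bytes_.size(); }

 private:
  std::vector<uint8_t> bytes_;
};

int main() {
  Encoder e;
  size_t size_slot = e.WritePatchableVaruint32();  // section size, unknown yet
  e.WriteByte(0x01);                               // ...emit the section body...
  e.WriteByte(0x02);
  uint32_t body_size = uint32_t(e.size() - size_slot - 5);
  e.PatchVaruint32(size_slot, body_size);          // patch in place afterwards
}
```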
@mbebenita Thank you. This example appears to be an embedded wast-text-to-wasm-binary encoder, i.e. a wasm assembler. The need for patchable leb128s seems to stem entirely from this assembler being single-pass. I thought a principle of wasm was to offload work to producers, so it seems reasonable to expect assemblers to use multiple passes; I do not believe the demands of this assembler are reasonable, and the appropriate path is to improve the assembler. The source code itself notes: 'Patching is necessary due to the combination of a preorder encoding and a single-pass algorithm that only knows the precise opcode after visiting children. Switching to a postorder encoding will remove these methods.' I believe the problem is due entirely to the encoder being 'single-pass', and a multi-pass assembler would resolve it irrespective of the pre-order encoding.

I don't believe it is the right direction to fundamentally optimize the wasm binary encoding around the needs of single-pass offline tools; the disadvantages and complexity this imposes on every other tool do not seem like the right balance to me. Even a tool that just wants to copy these leb128s through without loss needs to decode not only the number but also the length of its encoding, and compressors that need to be lossless will likewise have to record that extra information.

This example appears to be an internal assembler targeting only the internal wasm decoder - or does it write to an array that is externally visible? Another option is for the internal decoder to also accept fixed-length numbers and for the internal assembler to target that layer with fixed-length encodings rather than leb128; in other words, use a custom non-standard wasm encoding internally rather than imposing on the web-facing standard.

If we accept this, what do we say to the next tool developer who wants the least burden or the best performance for their tool and asks the wasm spec to bend to their needs? I think we need to draw a line and point to the principle that offline tools are expected to work harder to reduce the burden on the wasm runtime and pipeline.
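The lossless-copy point can be illustrated with a short sketch (hypothetical helpers, not any tool discussed in this thread): once padding is permitted, a pass-through tool has to carry the encoded width alongside the decoded value in order to reproduce the bytes exactly.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// To copy a varuint32 losslessly when padding is allowed, a tool must
// track the encoded width, not just the value.
struct Varuint32 {
  uint32_t value;
  int width;  // bytes consumed; equals the minimal width only if unpadded
};

Varuint32 ReadVaruint32(const std::vector<uint8_t>& in, size_t& pos) {
  Varuint32 v{0, 0};
  int shift = 0;
  while (pos < in.size() && v.width < 5) {
    uint8_t byte = in[pos++];
    v.value |= uint32_t(byte & 0x7f) << shift;
    shift += 7;
    ++v.width;
    if (!(byte & 0x80)) break;
  }
  return v;
}

// Re-emit the value at exactly its original width so the output is
// byte-identical to the input.
void WriteVaruint32(std::vector<uint8_t>& out, Varuint32 v) {
  uint32_t value = v.value;
  for (int i = 0; i < v.width; ++i) {
    uint8_t byte = value & 0x7f;
    value >>= 7;
    if (i + 1 < v.width) byte |= 0x80;
    out.push_back(byte);
  }
}

int main() {
  std::vector<uint8_t> in = {0x83, 0x80, 0x80, 0x80, 0x00};  // 3, padded
  std::vector<uint8_t> out;
  size_t pos = 0;
  WriteVaruint32(out, ReadVaruint32(in, pos));
  return out == in ? 0 : 1;  // byte-for-byte round trip
}
```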
Another obvious solution that I missed: a decoder accepting content from a single-pass embedded assembler could simply allow leading zeros in leb128 while rejecting them in code from other sources. That use case might then be addressed without the wasm spec itself accepting leading zeros in leb128.
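A sketch of that idea, with hypothetical names and no claim about any real engine's API: the decoder takes a flag controlling whether non-minimal encodings are tolerated, so an embedder could enable it only for its own internal assembler's output while rejecting leading zeros from the web.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Minimal number of bytes needed to LEB128-encode `value`.
static int MinimalLebSize(uint32_t value) {
  int n = 0;
  do { value >>= 7; ++n; } while (value != 0);
  return n;
}

// Decode a varuint32 at `pos`, advancing `pos`. When `allow_leading_zeros`
// is false, encodings longer than the minimal one are rejected.
std::optional<uint32_t> DecodeVaruint32(const std::vector<uint8_t>& in,
                                        size_t& pos,
                                        bool allow_leading_zeros) {
  uint32_t result = 0;
  int used = 0;
  while (pos < in.size() && used < 5) {
    uint8_t byte = in[pos++];
    result |= uint32_t(byte & 0x7f) << (7 * used);
    ++used;
    if (!(byte & 0x80)) {
      if (!allow_leading_zeros && used > MinimalLebSize(result))
        return std::nullopt;  // redundant leading zeros: reject external input
      return result;
    }
  }
  return std::nullopt;  // ran off the end or exceeded 5 bytes
}

int main() {
  std::vector<uint8_t> padded = {0x83, 0x80, 0x80, 0x80, 0x00};  // 3, padded
  size_t pos = 0;
  auto strict = DecodeVaruint32(padded, pos, /*allow_leading_zeros=*/false);
  pos = 0;
  auto lenient = DecodeVaruint32(padded, pos, /*allow_leading_zeros=*/true);
  // strict is rejected; lenient decodes to 3.
  return (!strict && lenient && *lenient == 3) ? 0 : 1;
}
```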