Skip to content
This repository was archived by the owner on Nov 3, 2021. It is now read-only.

Incompatibility with long LEB128 form from the core spec #153

Closed
RReverser opened this issue Jun 3, 2020 · 12 comments
Closed

Incompatibility with long LEB128 form from the core spec #153

RReverser opened this issue Jun 3, 2020 · 12 comments

Comments

@RReverser
Copy link
Member

While implementing bulk memory proposal in my parser, I've noticed regressions in some core spec tests. In particular these two are very representative:

(module binary
"\00asm" "\01\00\00\00"
"\05\03\01" ;; Memory section with 1 entry
"\00\00" ;; no max, minimum 0
"\0b\07\01" ;; Data section with 1 entry
"\80\00" ;; Memory index 0, encoded with 2 bytes
"\41\00\0b\00" ;; (i32.const 0) with contents ""
)
(module binary
"\00asm" "\01\00\00\00"
"\04\04\01" ;; Table section with 1 entry
"\70\00\00" ;; no max, minimum 0, funcref
"\09\07\01" ;; Element section with 1 entry
"\80\00" ;; Table index 0, encoded with 2 bytes
"\41\00\0b\00" ;; (i32.const 0) with no elements
)

Here, a 0 table index (the only one allowed previously) is encoded in a suboptimal form of 0x80 0x00.

The new layout used by this proposal relies on the table index previously being always zero and encoded as 0x00, and assigns a different meaning to byte at the same position as 0x80 and expects it to be one of the following values:

image

image

Such long form is likely not something used by implementations in the wild, but it might be worth double-checking and either supporting such long form too, or at least making an explicit decision about this backward-incompatible binary representation change and removing (?) corresponding tests.

gahaas added a commit that referenced this issue Jun 4, 2020
The encoding of the table index in binary-leb128.wast is incorrect with the bulk-memory extensions, see #153. I saw and fixed the issue first in the reference types proposal (see WebAssembly/reference-types#95), but apparently it also exists here.
@gahaas
Copy link
Contributor

gahaas commented Jun 4, 2020

I saw and fixed this issue already in the reference-type proposal. I guess we should also apply the fix here. I created a PR for that.

@RReverser
Copy link
Member Author

@gahaas Note that the test above (for data section) exhibits the same problem, as well as 2 more other tests which previously used long form of zero.

@RReverser
Copy link
Member Author

RReverser commented Jun 4, 2020

Just looked up, these are the other 2 tests this is another test incompatible with new layout (aside from 2 above):

;; Data segment memory index can have non-minimal length
(module binary
"\00asm" "\01\00\00\00"
"\05\03\01" ;; Memory section with 1 entry
"\00\00" ;; no max, minimum 0
"\0b\07\01" ;; Data section with 1 entry
"\80\00" ;; Memory index 0, encoded with 2 bytes
"\41\00\0b\00" ;; (i32.const 0) with contents ""
)

@RReverser
Copy link
Member Author

@binji @gahaas Is this going to be resolved in one way or another before moving to stage 4? Seems dangerous from backward compatibility PoV.

@binji
Copy link
Member

binji commented Jun 22, 2020

I doubt the long form is used anywhere (it would only happen if someone did it manually), so the backward compatibility risk seems relatively low. But you're right that it would be good to mention and remove tests.

@RReverser
Copy link
Member Author

it would only happen if someone did it manually

It could in theory also happen if an encoder used a 5-byte-padded LEB128 form to start with (like e.g. LLVM does at initial stage), but then didn't optimise it away in post-processing.

@binji
Copy link
Member

binji commented Jun 23, 2020

Perhaps, though generally tools only do that for LEB128 values that represent a count/size that needs to be fixed-up.

@RReverser
Copy link
Member Author

I assumed that index encoding shares same routines as count/size?

@binji
Copy link
Member

binji commented Jun 24, 2020

Binaryen has different functions for fixed- and variable-length LEB128: https://github.com/WebAssembly/binaryen/blob/master/src/wasm-binary.h#L73-L104
So does wabt: https://github.com/WebAssembly/wabt/blob/master/src/leb128.cc#L49-L65

Maybe LLVM does it differently?

@RReverser
Copy link
Member Author

Thanks for checking! I guess if both Binaryen and LLVM don't do this, we should be on safe-ish side as that covers most toolchains.

But you're right that it would be good to mention and remove tests.

Let's just do this then (test removal and explicit documentation).

gahaas added a commit that referenced this issue Oct 12, 2020
The encoding of the table index in binary-leb128.wast is incorrect with the bulk-memory extensions, see #153. I saw and fixed the issue first in the reference types proposal (see WebAssembly/reference-types#95), but apparently it also exists here.
RReverser added a commit to GoogleChromeLabs/wasmbin that referenced this issue Oct 19, 2020
@rossberg
Copy link
Member

I believe this was resolved.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants