Skip to content

Extensible encoding of function signatures #636

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 6 additions & 2 deletions BinaryEncoding.md
Original file line number Diff line number Diff line change
Expand Up @@ -119,11 +119,15 @@ The type section declares all function signatures that will be used in the modul
| entries | `type_entry*` | repeated type entries as described below |

#### Type entry
| Field | Type | Description |
| Field | Type/Value | Description |
| ----- | ----- | ----- |
| constructor | `0x05` | the function type constructor |
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check my understanding of this interpretation of "constructor":

  • we're viewing i32 and the other local types as having nullary constructors that construct the local type
  • 0x5, the function constructor, constructs a function type given parameters/returns
  • in the local type declaration of a function body, we have a list of (lists of) locals that declare their type via integer index. In a future of richer (GC) types where we had an safe/opaque function-pointer type, the way you declare a local of func-ptr type would be to first emit the index of the function constructor, 0x5 which told the decoder to decode a second index which was an index into this types section array. At a higher level, you're saying first which constructor and then providing the arguments to the constructor.
  • If we had a "struct type" constructor, it could be, say, 0x6, and the same scheme above would apply and the validation rules would require that the index following a local type 0x5/0x6 had to match on the constructor field

If that's all right, then that seems great. Maybe it would be good to update GC.md to include this plan of record.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Tue, Apr 5, 2016 at 5:49 PM, Luke Wagner [email protected]
wrote:

In BinaryEncoding.md
#636 (comment):

| ----- | ----- | ----- |
+| constructor | 0x05 | the function type constructor |

Let me check my understanding of this interpretation of "constructor":

  • we're viewing i32 and the other local types as having nullary
    constructors that construct the local type
  • 0x5, the function constructor, constructs a function type given
    parameters/returns
  • in the local type declaration of a function body, we have a list of
    (lists of) locals that declare their type via integer index. In a future of
    richer (GC) types where we had an safe/opaque function-pointer type, the
    way you declare a local of func-ptr type would be to first emit the index
    of the function constructor, 0x5 which told the decoder to decode a
    second index which was an index into this types section array. At a higher
    level, you're saying first which constructor and then providing the
    arguments to the constructor.
  • If we had a "struct type" constructor, it could be, say, 0x6, and
    the same scheme above would apply and the validation rules would require
    that the index following a local type 0x5/0x6 had to match on the
    constructor field

If that's all right, then that seems great. Maybe it would be good to
update GC.md to include this plan of record.

I was thinking that we might unify the index spaces so that primitives are
low numbers (easily recognizable), and numbers larger than that would just
be indexes into the types section. Then there's no secondary validation
needed.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
https://github.com/WebAssembly/design/pull/636/files/bb8a6e43af9bc6b5b904a5124062b59ae06d1304#r58563196

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, so if the was greater-than a fixed base, you'd subtract the base and use that as the index into the types section? (Or we could do the analogous thing with signed varint32 and negatives.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And now I remember the original question that motivated my post: given this single index space, then why does the constructor field need to start at any other value than 0? Function (and later struct, array) constructors are not in the same index space as the primitive types nor the elements of the types section.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Tue, Apr 5, 2016 at 6:53 PM, Luke Wagner [email protected]
wrote:

In BinaryEncoding.md
#636 (comment):

| ----- | ----- | ----- |
+| constructor | 0x05 | the function type constructor |

And now I remember the original question that motivated my post: given
this single index space, then why does the constructor field need to
start at any other value than 0? Function (and later struct, array)
constructors are not in the same index space anymore.

Yep, good point.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub
https://github.com/WebAssembly/design/pull/636/files/bb8a6e43af9bc6b5b904a5124062b59ae06d1304#r58574302

| param_count | `varuint32` | the number of parameters to the function |
| return_type | `value_type?` | the return type of the function, with `0` indicating no return type |
| param_types | `value_type*` | the parameter types of the function |
| return_count | `uint8` | the number of results from the function (0 or 1) |
| return_type | `value_type?` | the result type of the function (if return_count is 1) |
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as with sections, a string might be needed for each class of type declaration to avoid name clashes, and a size will be needed to skip unknown type declarations.

Might it be prudent to just define the return_count as a varuint32?
Could the return_type be defined a having return_count repeated value_type entries?

The 0 and 1 return_count limitation might not even be a limit for some tools. For example a tool to convert to the text format need not be limited to just 0 and 1, so perhaps the limit should be noted elsewhere, perhaps in validation rules for the MVP, and not a property of the binary encoding?

This is adding one byte per function that seems unnecessary and elsewhere some people have objected to this.

The function signatures are ordered, in the same order as the function bodies are defined, and the count defines the number of functions which seems useful for loading a vector etc. The count now would be the number of type declarations.

Is there any technical reason for not just using separate sections for future classes of type declarations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On 5 April 2016 at 15:23, JSStats [email protected] wrote:

In BinaryEncoding.md
#636 (comment):

| param_types | value_type* | the parameter types of the function |
+| return_count | uint8 | the number of results from the function (0 or 1) |
+| return_type | value_type? | the result type of the function (if return_count is 1) |

Just as with sections, a string might be needed for each class of type
declaration to avoid name clashes, and a size will be needed to skip
unknown type declarations.

Unlike for unknown sections, which an implementation can ignore, there is
no sane way to handle unknown types, so that's not useful here.

Might it be prudent to just define the return_count as a varuint32?
Could the return_type be defined a having return_count repeated value_type
entries?

Note that uint8 is forward compatible with varuint32, and so is the
treatment of the count. That is intentional, of course.

The 0 and 1 return_count limitation might not even be a limit for some

tools. For example a tool to convert to the text format need not be limited
to just 0 and 1, so perhaps the limit should be noted elsewhere, perhaps in
validation rules for the MVP, and not a property of the binary encoding?

That would extend the AST, though. In particular, it would introduce
malformed types into the AST, which we currently don't have.

This is adding one byte per function that seems unnecessary and elsewhere

some people have objected to this.

No, this is one byte per different function type, typically a much
smaller number.

The function signatures are ordered, in the same order as the function

bodies are defined, and the count defines the number of functions which
seems useful for loading a vector etc. The count now would be the number of
type declarations.

I don't follow.

Is there any technical reason for not just using separate sections for
future classes of type declarations?

Yes: then you would need distinguished ways to reference the different
kinds of types, in places where either would be allowed. I.e., you'd put
the cost on each use site instead of the fewer definition sites.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, forgot that the functions just have a signature index, and see the utility of them all sharing an index space, thanks. Just a thought, but might it be useful to fit the base types into this same index space, so that where base type indexes are currently accepted these could potentially be other types? If not then how would a local be declared to have a struct type in future etc?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a thought, but might it be useful to fit the base types into this
same index space, so that where base type indexes are currently accepted
these could potentially be other types? If not then how would a local be
declared to have a struct type in future etc?

Yes, that's what the third bullet in the PR description is alluding to, and
the reason for having 0x05 as the code for the function type constructor,
which is disjoint from base types.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we might want to have a bit more index space for the primitive (and other nullary) type constructors. Or maybe that doesn't matter....

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I bet you to it on the retargetted PR. :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Understood that it's future compatible, but can we write the length to be a varuint32 just to avoid the question/answer altogether? I was planning to remove the only other uint8 (the exported field of the memory section) and then uint8 altogether to nail down this point that, except for a handful of careful special cases, all ints are variable-length.


(Note: In the future, this section may contain other forms of type entries as well.)

### Import section

Expand Down