add opcode definitions section #237

Merged
merged 7 commits into from
Jul 1, 2015
Conversation

MikeHolman
Member

No description provided.

* the generic section header
* a table containing, for each opcode-space, a standardized string literal
type name (where index defines its type), offset (within the section),
sorted by offset, followed by
Member

Could you make these sub-bullets?

What do you mean by "index defines its type"?

Member Author

I mean that wherever we need to reference a type (e.g. in function definitions), we don't want to write "int32"; we would rather write 0 (if, for example, the int32 opcodes were first in this list).
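
The index-instead-of-name idea described above can be sketched as follows. This is only an illustration under assumptions: the list `TYPE_NAMES`, its ordering, and the helper names `type_index` and `type_name` are hypothetical and are not taken from the design document.

```python
# Hypothetical opcode-space table: a type's position in this list is its index.
# The ordering here is an assumption made for illustration only.
TYPE_NAMES = ["int32", "int64", "float32", "float64"]


def type_index(name: str) -> int:
    """Return the index that stands in for a type name elsewhere in the module."""
    return TYPE_NAMES.index(name)


def type_name(index: int) -> str:
    """Map an index back to the standardized type name."""
    return TYPE_NAMES[index]


# A function definition then stores small integers rather than strings.
signature = {
    "params": [type_index("int32"), type_index("float64")],
    "result": type_index("int32"),
}
```

With this ordering, every reference to int32 is stored as 0, which is the case the comment describes.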

@jfbastien
Member

What does it mean to have multiple sections with functions in each? Can one section's function call a function in another section?

Or do we just have one code section for now?

How do I access different data sections?

It looks like we're close to a container format... This is related to #74 about using ELF.

The definition is also beginning to look like BNF!

@MikeHolman
Member Author

What does it mean to have multiple sections with functions in each? Can one section's function call a function in another section? Or do we just have one code section for now?

I don't see a logical reason to have more than one code section right now, so I think limiting to one should be ok.

How do I access different data sections?

I guess there are a couple of ways we could go about it. If the different data section types are all singletons (e.g. you can only have a single import section), then you can access it directly and the decoder will know where to look when you ask for import[0]. Otherwise you can indirect through the section list to find the section you want and access your data from there (not quite as efficient, though).
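
The two access strategies mentioned above might look roughly like this. It is a sketch under assumptions: the table contents, the set of singleton section types, and the function names are all hypothetical.

```python
# Hypothetical section table: (section_type, offset) pairs sorted by offset.
SECTION_TABLE = [("import", 64), ("code", 128), ("data", 512), ("data", 768)]

# Section types assumed (for this sketch only) to appear at most once.
SINGLETONS = {"import", "code"}


def build_singleton_index(table):
    """Direct access: record the single offset of each singleton section type."""
    index = {}
    for section_type, offset in table:
        if section_type in SINGLETONS:
            if section_type in index:
                raise ValueError(f"duplicate singleton section: {section_type}")
            index[section_type] = offset
    return index


def find_sections(table, section_type):
    """Indirect access: scan the section list for every section of one type."""
    return [offset for t, offset in table if t == section_type]


singleton_offsets = build_singleton_index(SECTION_TABLE)
data_offsets = find_sections(SECTION_TABLE, "data")
```

The direct path is one dictionary lookup per access, while the indirect path walks the section list each time, which matches the efficiency trade-off noted in the comment.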

It looks like we're close to a container format... This is related to #74 about using ELF.

I don't really know ELF, but the conversation made it sound like the format will not map well for us. So I'd like to move forward with something else as a hedge against it.

@sunfishcode
Member

ELF isn't yet ruled out. These various tables could just map to special sections/segments in ELF. But I don't think we need to worry about that now. Let's design what we want first, and figure out whether ELF makes sense once we have that.

* a table (sorted by offset) containing, for each section, its type and offset (within the module), followed by
* a sequence of sections.
* A module contains (in this order):
* A header
Member

Define header.

Member Author

You caught me. I don't know what the headers contain. At the very least, the module header contains the magic number, but besides that I don't really have anything in particular decided. Maybe some things like whether the heap is 32- or 64-bit, the source language (for ABI), and the entry point. This will need to be figured out, but based on the level of detail in the rest of our design docs I'm not sure how much detail to go into here.

Maybe I should just mention some things like this as "ideas" for what a header would contain?
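
For readers following the layout under review (a header, then a table of section types and offsets sorted by offset, then the sections), here is one possible sketch. Everything concrete in it is an assumption: the `MAGIC` value is a placeholder, and the field widths (a 32-bit section count, then one byte of type and a 32-bit offset per entry) are not specified anywhere in this thread.

```python
import struct

MAGIC = b"MODL"           # placeholder value; the thread does not fix a magic number
COUNT_FORMAT = "<I"       # assumed: little-endian 32-bit section count
ENTRY_FORMAT = "<BI"      # assumed: 1-byte section type, 32-bit offset in module


def encode_module(sections):
    """Build module bytes from a list of (type_id, payload) pairs."""
    header = MAGIC + struct.pack(COUNT_FORMAT, len(sections))
    table_size = len(sections) * struct.calcsize(ENTRY_FORMAT)
    offset = len(header) + table_size  # first section starts after the table
    table = b""
    body = b""
    for type_id, payload in sections:
        table += struct.pack(ENTRY_FORMAT, type_id, offset)
        body += payload
        offset += len(payload)
    return header + table + body


def decode_section_table(module):
    """Check the magic number and return the (type_id, offset) entries."""
    if module[: len(MAGIC)] != MAGIC:
        raise ValueError("bad magic number")
    (count,) = struct.unpack_from(COUNT_FORMAT, module, len(MAGIC))
    start = len(MAGIC) + struct.calcsize(COUNT_FORMAT)
    step = struct.calcsize(ENTRY_FORMAT)
    return [struct.unpack_from(ENTRY_FORMAT, module, start + i * step)
            for i in range(count)]


module = encode_module([(0, b"abc"), (1, b"defgh")])
entries = decode_section_table(module)
```

Because sections are appended in order, the table comes out sorted by offset without a separate sort step.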

Member

Yeah that sounds good. I like what you're adding overall, so I'll step back, take this improvement, and we can iterate later :-)

@jfbastien
Member

Maybe I'm going into too many details?

@MikeHolman
Member Author

@jfbastien Maybe a bit with null-terminated UTF-8 (which is what I had in mind, but I thought I was already bordering on too verbose), but you are right that "header" and "type" deserved some clarification.

@jfbastien
Member

lgtm

* A module contains (in this order):
- A header, containing:
+ The [magic number](https://en.wikipedia.org/wiki/Magic_number_%28programming%29)
+ Other data TBD (possibly entrypoint, memory bitness, source language, etc.)
Member

What is memory bitness?

At an initial glance, source language seems like something we'd specifically try to avoid including in the main header, because it suggests special magical per-source-language semantics.

Member Author

What is memory bitness?

I mean whether your linear memory has a 64 bit or 32 bit address space (i.e. whether ptr type is int32 or int64). Maybe not necessary here, but was just an idea for something we might want. I think we might need some version info for the module format as well. We may never need to break compat, but I can imagine a scenario where we eventually want to make format changes, and having a byte to allow for that would be useful.
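
As a rough illustration of the two ideas floated above (a version byte and a memory-bitness field), a header could be encoded and checked like this. It is a sketch only: the `MAGIC` placeholder, the one-byte fields, and `SUPPORTED_VERSION` are assumptions, and none of these fields were adopted in this pull request.

```python
import struct

MAGIC = b"MODL"          # placeholder value; not defined in this thread
SUPPORTED_VERSION = 1    # hypothetical highest format version this decoder accepts


def encode_header(version: int, address_bits: int) -> bytes:
    """Pack the magic number, a version byte, and the address width in bits."""
    if address_bits not in (32, 64):
        raise ValueError("address_bits must be 32 or 64")
    return MAGIC + struct.pack("<BB", version, address_bits)


def decode_header(data: bytes):
    """Return (version, pointer_type), rejecting unknown versions."""
    if data[: len(MAGIC)] != MAGIC:
        raise ValueError("bad magic number")
    version, address_bits = struct.unpack_from("<BB", data, len(MAGIC))
    if version > SUPPORTED_VERSION:
        raise ValueError(f"unsupported module format version: {version}")
    if address_bits not in (32, 64):
        raise ValueError("address_bits must be 32 or 64")
    pointer_type = "int32" if address_bits == 32 else "int64"
    return version, pointer_type


header = encode_header(1, 64)
decoded = decode_header(header)
```

The version check is what would let a later format change be detected by older decoders instead of being misread.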

Member Author

I'll just say other data TBD for now and remove the rest

In the AstSemantics.md, we've just been assuming that one can index the
linear memory with either 32-bit or 64-bit offsets. In the v8 native
prototype, there are different bytecode numbers for whether the memory
offset operand is an Int32 or an Int64.

On Tue, Jun 30, 2015 at 10:29 PM, Michael Holman [email protected]
wrote:

In BinaryEncoding.md
#237 (comment):

@@ -65,20 +65,40 @@ Yes:

Global structure

-* A module contains:
-  * a header followed by
-  * a table (sorted by offset) containing, for each section, its type and offset (within the module), followed by
-  * a sequence of sections.
+* A module contains (in this order):
+  - A header, containing:
+    + Other data TBD (possibly entrypoint, memory bitness, source language, etc.)

What is memory bitness?

I mean whether your linear memory has a 64 bit or 32 bit address space
(i.e. whether ptr type is int32 or int64). Maybe not necessary here, but
was just an idea for something we might want. I think we might need some
version info for the module format as well. We may never need to break
compat, but I can imagine a scenario where we eventually want to make
format changes, and having a byte to allow for that would be useful.



@jfbastien
Member

I think this is good to go, we can iterate on top.

jfbastien added a commit that referenced this pull request Jul 1, 2015
jfbastien merged commit 5f700c1 into master Jul 1, 2015
jfbastien deleted the definition-section branch July 1, 2015 03:10
6 participants