Organize the operators into a smaller more familiar set. #620

@ghost ghost commented Mar 23, 2016

Organizing the operators into a smaller set with familiar functionality might help the use case of a familiar text format (#617). For example, the text format could then use the familiar infix + operator with its familiar precedence. It seems that the current set of operators could be reversibly transformed into a smaller set, so this might not be necessary: a text formatter could just apply this transform, but the text format then becomes a little more remote from the binary format. Reducing the number of opcodes might also help compression, as a generic compressor would not understand the redundancy this eliminates. I have seen some objections to 'overloading', but I don't see a problem. CIL seems to overload some operations. If there is some rationale against it, then let's discuss it.

@qwertie

qwertie commented Mar 23, 2016

There was a reason why some operators were type-polymorphic (e.g. get_local) and others (*.add) were not: because the behavior of get_local can be described for all types we might define now and in the future, while the behavior of add cannot (and indeed there may be types for which add doesn't make sense).

Text-format infix operators like + can be syntactic sugar, so I am comfortable with the status quo. But if there's no major downside to merging the binary-format operators, it may as well be done for consistency. I look forward to hearing opinions on this.

@jfbastien
Member

IMO the text format discussion isn't relevant to the opcodes that WebAssembly contains: a text format can represent operators without spelling out their types.

If we ignore the text format, then I'm not sure this change has solid reasoning behind it. You mention overloading. I'd like you to provide a strong rationale for why you think it's not a problem. How does it affect implementation complexity and potential compression gains? I think this change requires data before we move forward.

@ghost
Author

ghost commented Mar 23, 2016

@jfbastien We all seem to agree that a text format could represent multiple operators by a single label, for example i32.const(1) + i32.trunc_s(f32.const(1.2) + f32.const(0.8)), but this adds a translation stage between the text format and the binary encoding specification that might be unnecessary.

I don't think 'overloading' is a problem because it is a static decision: the argument types of an operator are needed anyway for single-pass validation, so it seems a small matter for the runtime decoder to choose a specialized operator in a single pass. Runtimes may do the same for the existing parametric operators anyway, or would do so later when lowering them.

There might be some difference between pre-order and post-order encodings. I have suggested combining some operators that have the same result type but different argument types, and this might suit a post-order encoding more than a pre-order encoding because the operator type can be determined from the arguments alone. For example, it might be helpful to use <, > and == for some of the comparison operators, which have this property.
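A minimal sketch of the single-pass, bottom-up resolution described above, assuming a post-order stream in which constants carry their type and operators are generic. The token shapes, the `ARITHMETIC` and `COMPARISON` tables, and `resolve_post_order` are hypothetical names chosen for illustration; they are not part of any WebAssembly tool or of this proposal's patch.

```python
# Hypothetical sketch: pick the typed opcode for each generic operator from
# the operand types alone, in one pass over a post-order stream.

# Arithmetic operators keep the operand type as their result type.
ARITHMETIC = {"add", "sub", "mul"}
# Comparisons take any operand type but always produce an i32.
COMPARISON = {"eq", "ne"}


def resolve_post_order(stream):
    """Return the typed opcode names for a generic post-order stream.

    Each token is either ("const", type, value) or ("op", name).
    Raises TypeError when the two operands of an operator disagree.
    """
    type_stack = []  # types of the values produced so far
    typed = []       # resolved opcode names, in stream order
    for token in stream:
        if token[0] == "const":
            _, ty, _value = token
            type_stack.append(ty)
            typed.append(f"{ty}.const")
            continue
        _, name = token
        rhs = type_stack.pop()
        lhs = type_stack.pop()
        if lhs != rhs:
            raise TypeError(f"{name}: operand types differ ({lhs} vs {rhs})")
        if name in ARITHMETIC:
            type_stack.append(lhs)
        elif name in COMPARISON:
            type_stack.append("i32")
        else:
            raise ValueError(f"unknown generic operator: {name}")
        # The specialization is fully determined by the operand type.
        typed.append(f"{lhs}.{name}")
    return typed


example = [
    ("const", "f32", 1.2),
    ("const", "f32", 0.8),
    ("op", "add"),
    ("const", "f32", 2.0),
    ("op", "eq"),
    ("const", "i32", 1),
    ("op", "add"),
]
resolved = resolve_post_order(example)
```

In this sketch no conversion is ever inserted: a mismatch between operand types is an error, so the choice of typed opcode is a pure lookup on the type already tracked for validation. A pre-order stream would meet the operator before its operands, so the same lookup would have to be deferred until the operands had been decoded.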

Personally I am happy with s-exp, but I expect most web developers would find infix syntax more familiar. In either case, being able to use familiar labels improves the information density. The operators can be translated, so it's not a showstopper; I just wanted to test people's opinions.

Regarding compression: reducing the set of opcodes removes redundancy, which should help the compression ratio. I don't have any results yet, but I can explore this. A suitably clever compression predictor would do the same thing: look at the history of post-order argument operators, use it to predict the limited set of operators that can validly follow, and assign them shorter codes. Removing this redundancy in the encoding should help generic compression algorithms.

@jfbastien
Member

A priority that I think is worth communicating is that we're not optimizing for convenience when it comes to translating from binary to text, or when it comes to non-browser tooling (i.e. outside the VM). I think we'll always lean towards solutions which make the binary format better and the VM simpler, even if other aspects such as the text format suffer.

This of course isn't absolute, but it factors heavily into how we figure out where added complexity is acceptable. Yes we want developers to feel things are familiar, but I don't think how we encode opcodes factors into this since it's pretty transparent to the text format (you seem to agree?).

@ghost
Author

ghost commented Mar 23, 2016

@jfbastien Personally I'm prepared to go to great lengths in the text formatter and even use meta information; I've conceded this point already. There is an element in the idea of @qwertie that sounds interesting, though: to have only a fixed set of familiar labels, with everything else using prefix encoding, and thus a defined text format somewhat orthogonal to the evolving binary format, which seems a property similar to the one the s-exp format has. This would allow the binary format operator table to add new 'overloaded' definitions and have them presented even in a legacy text formatter, rather than demanding a new text formatter for each binary format revision. For example, some of the future SIMD operators could be mapped to the set of familiar labels with no change to the text formatter, and the text representation would be static between runtime and tool versions. It seems an interesting idea that could benefit from binary format support.
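One way to picture the "fixed set of familiar labels" idea is a formatter whose infix symbols never change, driven by an operator table that is supplied with each binary format revision. The sketch below is only an illustration under that assumption: `INFIX_SYMBOLS`, `format_expr`, both tables, and the SIMD-style opcode name are made up for the example and do not describe an actual WebAssembly design.

```python
# Hypothetical sketch: a formatter with a fixed label-to-symbol mapping.
# Only the operator table changes between binary format revisions.

# Fixed inside the formatter and never updated.
INFIX_SYMBOLS = {"add": "+", "sub": "-", "mul": "*", "eq": "=="}


def format_expr(node, operator_table):
    """Render a node as text.

    A node is either a string (already rendered leaf) or a tuple
    (opcode, [children]). `operator_table` maps an opcode name to one of
    the familiar labels, or to None when it has no familiar label.
    """
    if isinstance(node, str):
        return node
    opcode, children = node
    args = [format_expr(child, operator_table) for child in children]
    label = operator_table.get(opcode)
    symbol = INFIX_SYMBOLS.get(label)
    if symbol is not None and len(args) == 2:
        return f"({args[0]} {symbol} {args[1]})"
    # Anything without a familiar label falls back to prefix syntax.
    return f"{opcode}({', '.join(args)})"


# Table for an imagined first revision of the binary format.
TABLE_V1 = {"i32.add": "add", "i32.popcnt": None}
# A later revision adds an illustrative SIMD opcode mapped to the same label.
TABLE_V2 = dict(TABLE_V1)
TABLE_V2["i32x4.add"] = "add"

scalar = format_expr(("i32.add", ["x", "y"]), TABLE_V1)
vector_new = format_expr(("i32x4.add", ["a", "b"]), TABLE_V2)
vector_old = format_expr(("i32x4.add", ["a", "b"]), TABLE_V1)
unary = format_expr(("i32.popcnt", ["x"]), TABLE_V1)
```

With the newer table the unchanged formatter prints the new opcode as `+`; with the older table it still produces readable prefix output instead of failing.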

@titzer

titzer commented Mar 24, 2016

We definitely don't want to have overloading in the binary format. Overloading leads to complex rules about resolution, implicit conversions, etc., that end up introducing complexity and compatibility issues. Saving opcode space is not an important enough goal to warrant this change. Overloading will likely cause problems for future evolution of the format. The key analogy here is that WebAssembly is closer to a virtual instruction set than to a programming language, and as such, it offers operations that don't adapt to their input types, but instead specify their input types.
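For contrast, a minimal sketch of what "operations that specify their input types" means for a validator: each opcode carries a fixed signature and the operands are checked against it, with nothing to resolve. The `SIGNATURES` table and `validate` function are hypothetical and cover only a handful of opcodes.

```python
# Hypothetical sketch: validating typed opcodes against fixed signatures.

# opcode -> (parameter types, result type)
SIGNATURES = {
    "i32.const": ((), "i32"),
    "i64.const": ((), "i64"),
    "i32.add": (("i32", "i32"), "i32"),
    "i64.add": (("i64", "i64"), "i64"),
    "i64.eq": (("i64", "i64"), "i32"),
}


def validate(stream):
    """Check a post-order list of opcode names; return the final type stack."""
    stack = []
    for opcode in stream:
        params, result = SIGNATURES[opcode]
        if len(stack) < len(params):
            raise TypeError(f"{opcode}: not enough operands")
        first_operand = len(stack) - len(params)
        actual = tuple(stack[first_operand:])
        if actual != params:
            raise TypeError(f"{opcode}: expected {params}, got {actual}")
        del stack[first_operand:]
        stack.append(result)
    return stack


ok = validate(["i32.const", "i32.const", "i32.add"])
```

Here the opcode dictates the operand types, so a stream such as `i32.const, i64.const, i32.add` is rejected rather than reinterpreted.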

@titzer titzer closed this Mar 24, 2016
@ghost
Author

ghost commented Mar 24, 2016

@titzer I agree that it would have added some complexity, and that may be a good reason not to do this, but then again it might have been a small matter of some shared code dealing with the operator tables.

For the record it seemed a positive change to me and would have allowed new operators to be added that mapped to familiar text format symbols without all the text formatters being updated to handle this mapping. Addressing some of your points:

  1. Re: 'complex rules about resolution'. As I understand it, it is a simple matter of looking at the types of the arguments to choose a specialized operator, and this seems trivial to me.
  2. Re: 'implicit conversions'. There are none implied.
  3. This is not about 'saving opcode space', but that might be one minor benefit.
  4. Re: 'cause problems for future evolution of the format'. I see no evidence for this, but am open to examples. Quite the opposite: without this, it will not be possible to map new operators that have 'familiar' functions to familiar text format symbols without all the text formatting tools also being updated, which seems to me to make it harder to evolve the formats. What might have been an addition to the binary format now requires updating text formatting tools and explaining to developers the difference between the operator name they see in older text formatters and the syntax they see in newer text formatters.
  5. This is not a proposal to have operators adapt to their input types, but rather a static encoding matter. It is not dynamic typing. CIL uses this same strategy.

This decision does not block the text format. It can be worked around by meta information giving a mapping to the familiar operators, but this will be specific to a particular binary file, which could frustrate communication between developers. Alternatively, it could be worked around by text formatting tools keeping up-to-date tables of these mappings, but then different tools might make different choices. The MVP requires a text format and it is expected to be familiar, so the CG may need to choose one of these alternative approaches.

The wasm CG loses some control over the text format by not defining the mapping to familiar operators. If there were a mapping defined at the binary level, then I expect most tools would have followed it.

@ghost
Author

ghost commented May 22, 2016

With the text format being explored in https://bugzilla.mozilla.org/show_bug.cgi?id=1274618 and https://github.com/sunfishcode/design/blob/text-syntax-strawman-proposal/TextFormat.md it seems clear that it will help the reader and human writer for the text format to use the same symbol for an operation on a range of types, for example i32.add, i64.add, etc. being presented as +. I think it helps the reader because the type is so redundant. Eliminating some of this redundancy could well also help compression.

The text format being explored seems to use a literal integer for both i32.const and i64.const, which suggests it depends on top-down type information, but this is not clear. It might be adequate to use only bottom-up type information to specialize operation types, and this would allow selection of the operation type in a single pass without backtracking.

Also, with results being discarded, there would be cases in which there is no top-down type information. For example, how would (block (i64.const 1) (nop)) be expressed in this text format? With bottom-up typing this could be { 1L; nop() }.

If the binary encoding has typed operations and the text format has operators that specialize on the argument types, then I think this would make the text format confusing and frustrating for the reader and human writer. It might make it impossible to express some invalid binary code in the text format, for example (i32.add ... (i64.const 1)), or require a fallback to a prefix syntax for presentation, e.g. i32.add(..., 1L), which would be frustrating.
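The mismatch described above can be sketched as a bottom-up renderer for typed expressions: literals carry a type suffix, + is used only when the operand types agree with the typed opcode, and anything else falls back to prefix syntax. The `render` function, the tuple node shape, and the `L` suffix convention follow the examples in this comment but are otherwise assumptions; block result typing is ignored in this sketch.

```python
# Hypothetical sketch: render typed s-expressions with bottom-up typing.


def render(node):
    """Return (text, type) for a node shaped like (opcode, *children).

    Constants are (opcode, value); the type is None for nop and block.
    """
    opcode = node[0]
    if opcode == "nop":
        return "nop()", None
    if opcode == "block":
        parts = [render(child)[0] for child in node[1:]]
        return "{ " + "; ".join(parts) + " }", None
    ty, _, name = opcode.partition(".")
    if name == "const":
        # The literal suffix carries the type, so no top-down context is needed.
        suffix = "L" if ty == "i64" else ""
        return f"{node[1]}{suffix}", ty
    rendered = [render(child) for child in node[1:]]
    texts = [text for text, _ in rendered]
    operands_match = all(child_ty == ty for _, child_ty in rendered)
    if name == "add" and len(rendered) == 2 and operands_match:
        return f"({texts[0]} + {texts[1]})", ty
    # Operand types disagree with the typed opcode (or the opcode has no
    # infix symbol), so only the prefix form can represent it faithfully.
    return f"{opcode}({', '.join(texts)})", ty


block_text = render(("block", ("i64.const", 1), ("nop",)))[0]
valid_text = render(("i64.add", ("i64.const", 2), ("i64.const", 1)))[0]
invalid_text = render(("i32.add", ("i32.const", 2), ("i64.const", 1)))[0]
```

The invalid case is still printable, but only in the prefix form, which is the fallback this comment calls frustrating.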
