Organize the operators into a smaller more familiar set. #620
There was a reason why some operators were type-polymorphic (e.g. Text-format infix operators like |
IMO the text format discussion isn't relevant to the opcodes that WebAssembly contains: a text format can represent operators without spelling out their types. If we ignore the text format, then I'm not sure this change has solid reasoning behind it. You mention overloading. I'd like you to provide a strong rationale for why you think it's not a problem. How does it affect implementation complexity and potential compression gains? I think this change requires data before we move forward.
@jfbastien We seem to all agree that a text format could represent multiple operators by a single label, for example I don't think 'overloading' is a problem because it is a static decision, and the argument types of an operator are needed anyway for single-pass validation, so it seems a small matter for the runtime decoder to choose a specialized operator in a single pass. Runtimes may do the same for the existing parametric operators anyway, or would do so later when lowering these. There might be some difference between pre-order and post-order encodings. I have suggested combining some operators that have the same result type but different argument types, and this might suit a post-order encoding more than a pre-order encoding because the operator type can be determined from the arguments alone. e.g. It might be helpful to use Personally I am happy with s-exp, but I expect most web developers would find infix syntax more familiar; in either case, being able to use familiar labels improves the information density. The operators can be translated, so it's not a show stopper, and I just wanted to test people's opinions. Regarding the compression: reducing the set of opcodes removes redundancy, and this should help the compression ratio. I don't have any results yet, but I can explore this. A suitably clever compression predictor would be doing the same thing: looking at the history of post-order argument operators, using this to predict the limited set of operators that can validly follow, and assigning them shorter codes. Removing this redundancy from the encoding should help generic compression algorithms.
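The single-pass resolution described in the comment above can be sketched roughly as follows. This is only an illustration under stated assumptions: the generic labels (`add`, `eq`), the tuple-based stream layout, and the `SPECIALIZED` table are hypothetical and are not taken from any WebAssembly binary format draft. It shows a decoder keeping a stack of operand types while walking a post-order stream and picking the typed operator from the argument types alone.

```python
# Hypothetical table: (generic label, operand types) -> (typed operator, result type).
# The generic labels and this layout are assumptions made for illustration only.
SPECIALIZED = {
    ("add", ("i32", "i32")): ("i32.add", "i32"),
    ("add", ("i64", "i64")): ("i64.add", "i64"),
    ("add", ("f32", "f32")): ("f32.add", "f32"),
    ("add", ("f64", "f64")): ("f64.add", "f64"),
    ("eq", ("i32", "i32")): ("i32.eq", "i32"),
    ("eq", ("i64", "i64")): ("i64.eq", "i32"),
}


def resolve_post_order(ops):
    """Resolve overloaded binary operators in one pass over a post-order stream.

    Each item is either ("const", type), which pushes a value of that type,
    or (label,), which is an overloaded binary operator.
    """
    type_stack = []  # types of the values produced so far
    resolved = []    # typed operator names, in stream order
    for op in ops:
        if op[0] == "const":
            type_stack.append(op[1])
            resolved.append(f"{op[1]}.const")
            continue
        if len(type_stack) < 2:
            raise ValueError(f"operator {op[0]!r} needs two operands")
        rhs = type_stack.pop()
        lhs = type_stack.pop()
        key = (op[0], (lhs, rhs))
        if key not in SPECIALIZED:
            raise ValueError(f"no {op[0]!r} operator for ({lhs}, {rhs})")
        typed_name, result_type = SPECIALIZED[key]
        type_stack.append(result_type)
        resolved.append(typed_name)
    return resolved


stream = [("const", "i64"), ("const", "i64"), ("add",)]
print(resolve_post_order(stream))
```

In a pre-order stream the operator would arrive before its operands, so the same lookup could not be made until the arguments had been decoded, which is the difference between the two encodings mentioned above.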
A priority that I think is worth communicating is that we're not optimizing for convenience when it comes to translating from binary to text, or when it comes to non-browser tooling (i.e. outside the VM). I think we'll always lean towards solutions which make the binary format better and the VM simpler, even if other aspects such as the text format suffer. This of course isn't absolute, but it factors heavily into how we figure out where added complexity is acceptable. Yes, we want developers to feel things are familiar, but I don't think how we encode opcodes factors into this, since it's pretty transparent to the text format (you seem to agree?).
@jfbastien Personally I'm prepared to go to great lengths in the text formatter and even use meta information. I've conceded this point already. There is an element in the idea of @qwertie, though, that sounds interesting: to have only a fixed set of familiar labels, with everything else using prefix encoding, and thus a defined text format somewhat orthogonal to the evolving binary format, which seems a property similar to what the s-exp format has. This would allow the binary format operator table to add new 'overloaded' definitions and for them to be presented even in a legacy text formatter, rather than demanding a new text formatter for each binary format revision. E.g. some of the future SIMD operators could be mapped to the set of familiar labels with no change to the text formatter, and the text representation would be static between runtime and tool versions. It seems an interesting idea that could benefit from binary format support.
We definitely don't want to have overloading in the binary format. Overloading leads to complex rules about resolution, implicit conversions, etc., that end up introducing complexity and compatibility issues. Saving opcode space is not an important enough goal to warrant this change. Overloading will likely cause problems for future evolution of the format. The key analogy here is that WebAssembly is closer to a virtual instruction set than to a programming language, and as such, it offers operations that don't adapt to their input types, but instead specify their input types.
@titzer I agree that it would have added some complexity, and that may be a good reason not to do this, but then again it might have been a small matter of some shared code dealing with the operator tables. For the record it seemed a positive change to me and would have allowed new operators to be added that mapped to familiar text format symbols without all the text formatters being updated to handle this mapping. Addressing some of your points:
This decision does not block the text format. It can be worked around by meta information giving a mapping to the familiar operators, but this will be specific to a particular binary file, which could frustrate communication between developers. Or it could be worked around by text formatting tools keeping up-to-date tables of these mappings, but again they might make different choices. The MVP requires a text format and it is expected to be familiar, so the CG may need to choose one of these alternative approaches. The wasm CG loses some control over the text format by not defining the mapping to familiar operators. If there were a mapping defined at the binary level, then I expect most tools would have followed it.
With the text format being explored in https://bugzilla.mozilla.org/show_bug.cgi?id=1274618 and https://github.com/sunfishcode/design/blob/text-syntax-strawman-proposal/TextFormat.md it seems clear that it will help the reader and human writer for the text format to use the same symbol for an operation on a range of types. For example The text format being explored seems to use a literal integer for both Also with results being discarded there would be cases in which there is no top-down type information. For example how would If the binary encoding has typed operations and the text format has operators that specialize on the argument types, then I think this would make the text format confusing and frustrating for the reader and human writer. It might make it impossible to express some invalid binary code in the text format, for example
Organizing the operators into a smaller set with familiar functionality might help the use case of a familiar text format #617. For example, the text format could then use the familiar infix `+` operator with its familiar precedence etc. It seems that the current set of operators could be reversibly transformed into a smaller set, so this might not be necessary, and a text formatter could just apply this transform, but then it becomes a little more remote from the binary format. Also, reducing the number of opcodes might help compression, as a generic compressor would not understand the redundancy this eliminates. I have seen some objections to 'overloading', but I don't see a problem. CIL seems to overload some operations. If there is some rationale, then let's discuss it.
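As a rough illustration of what a reversible transform to a smaller label set could look like, here is a minimal sketch. It assumes typed operator names of the form `type.label` (for example `i32.add`) and a hypothetical pair of helpers, `fold` and `unfold`; neither the helper names nor the short list of operators come from this issue, and the operand type still has to be carried or inferred somewhere for the round trip to work.

```python
# Illustrative subset of typed operator names; not a complete or official list.
TYPED_OPCODES = [
    "i32.add", "i64.add", "f32.add", "f64.add",
    "i32.mul", "i64.mul", "f32.mul", "f64.mul",
]


def fold(typed_name):
    """Split a typed operator into (familiar label, operand type)."""
    operand_type, label = typed_name.split(".", 1)
    return label, operand_type


def unfold(label, operand_type):
    """Rebuild the typed operator from a familiar label and an operand type."""
    return f"{operand_type}.{label}"


def familiar_labels(typed_names):
    """Return the smaller set of labels a text formatter would need to know."""
    return sorted({fold(name)[0] for name in typed_names})


# Every typed operator survives the round trip, so the transform is reversible
# as long as the operand type is available (stored or inferred from arguments).
assert all(unfold(*fold(name)) == name for name in TYPED_OPCODES)
print(familiar_labels(TYPED_OPCODES))
```

In this sketch eight typed operators collapse to two labels; whether the type is then stored alongside the label or recovered from the arguments is exactly the design question debated in the thread.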