Strawman counterproposal #3
It's still worth pursuing, but it's unfamiliar and somewhat complex. Taking it out for now will allow us to focus the discussion on the overall approach of this text format, which is the more important discussion right now. We can revisit push/pop in the future if this overall approach works.
Another way to fix this is with precedence mechanisms. If we set up the grammar such that this parses with
That's a clever idea. I'll have to think about this more.
Conceptually, the default is not part of the table. It's the fallback for when the index is out of bounds on the table. It's a subtle distinction, but it matches the abstract performance model (jump table guarded by a branch).
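A minimal sketch of that performance model, written in Python purely for illustration: the function name `br_table` and its argument order are assumptions made for this example, not the proposed syntax. The point it shows is that the default target sits outside the table and is only reached through the guard branch.

```python
def br_table(targets, default, index):
    """Pick a branch target the way a bounds-checked jump table would.

    `targets` is the table proper; `default` is a separate argument
    because it is the fallback, not a table entry.
    """
    # The guard branch: only in-range indices reach the table.
    # (Wasm indices are unsigned; the lower bound is checked here only
    # because Python would otherwise treat negatives as wrap-around.)
    if 0 <= index < len(targets):
        return targets[index]
    # Out of bounds: fall back to the default target.
    return default


labels = ["a", "b", "c"]
print(br_table(labels, "default", 1))  # in range: taken from the table
print(br_table(labels, "default", 7))  # out of range: the fallback
```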
I'm confused about what this means. Is
This is being discussed in #7
This creates ambiguities with other parts of the grammar. My subjective experience reading LLVM IR is that an explicit Also, in optimized code where trivial calls have been inlined, highlighting the calls that remain is interesting.
The
This was an error in the grammar example. I've fixed it now.
WebAssembly has arbitrary-byte-sequence identifiers, so the text format will need to support this. My experiment here doesn't currently cover this, so we'll need something. I'm reluctant to use
I don't understand what you're proposing here.
The return type syntax has parens because it's anticipating a future with multiple return values from functions, which generalizes function return types from a single type to a list of types.
The trailing I also like how
Conforming to LES' complex constraints is not a priority for this text format. I readily admit that this is a subjective choice.
I like this idea. Pretty printers will likely want to emit parens anyway because the ambiguity is confusing for humans as well, so we might as well just require them.
This has 124 commits, I assume it needs to be rebased? Hard to tell what is the relevant part.
@kripken: WTF! I swear I only ever committed TextFormat.md, I can't imagine how it decided 13 files changed! I guess I'm not good at git. Sorry.
I'm not sure what you mean precisely, but if the bottom line is that
Okay, I suppose it should be visually distinguished from the other cases in some way, e.g.
Sorry about that! Hmm, okay, then I'll suggest
I don't think it does, as locals and labels cannot be directly called. Anything in particular you're thinking of?
I proposed
A narrow use case, but okay. Would you be willing to accept a different punctuation mark for calls? Since LES has prefix operators but not keywords,
Hmm, that's a challenge. Certainly, any reasonable text format should support UTF-8, so that identifiers like "الدين" won't come out as
I chose The tradeoff between
Again, it's the no-keywords property of LES - due to which
Uhhh.... I know. 😕
You've read #697 I hope? It sounds like the "out of scope!" argument that I anticipated. Do you really feel that some minor syntactic changes aren't worth it for potentially large benefits outside wasm itself?
@kripken Oh I see what's wrong, somehow
Conforming to LES is not currently a priority for me in this experiment. To your remaining concerns:
I checked whether someone had solved this already, and found this:
In the text format, this might involve the following additional rules:
The third rule seems optional; if surrogate characters are permitted in the text format, it should probably produce those characters in UTF-8 form, which has the following implications:
Overlong UTF-8 characters would be treated the same as invalid surrogates.
I was actually planning to encode bytes as 0xDB80..0xDBFF because I realized this would be better than the range originally proposed (0xDC80..0xDCFF) because 0xDB80..0xDBFF is used extremely rarely. I was in the process of making this change and noticed a couple of bugs and less-than-ideal comments, so I fixed those. But then I noticed that the encoding idea wasn't originally mine: sunfishcode/design#3 (comment) So, if anyone else were using the same idea, they'd probably choose to use the original range. Therefore, I should use the same range despite it not being optimal. Bug fix: UString.TryDecodeAt() should return an unpaired surrogate unchanged but sometimes returned -1. Bug fix: EscapeCStyle should escape everything in the surrogate range in EscapeC.UnicodeNonCharacters mode. Bug fixes weren't specifically tested
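For comparison, Python's built-in `surrogateescape` error handler (PEP 383) uses the same 0xDC80..0xDCFF range to smuggle undecodable bytes through a string. The sketch below shows only that existing Python behaviour; it is not the encoding code referred to in the commit message above, and the sample bytes are made up for the example.

```python
raw = b"abc\xff\x80"  # not valid UTF-8: 0xFF and 0x80 are stray bytes

# Each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
text = raw.decode("utf-8", errors="surrogateescape")
print([hex(ord(ch)) for ch in text])

# Encoding with the same handler restores the original bytes exactly.
assert text.encode("utf-8", errors="surrogateescape") == raw
```

Because lone surrogates never appear in well-formed UTF-8, the mapping round-trips without colliding with ordinary text, which is the property a text format for arbitrary-byte identifiers would need.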
I am no longer working on a Wasm text format proposal.
See #697 in WebAssembly/design for an introduction. Note: base branch is wrong; I don't think I can fix it here, so see #8 for the diff.
This PR is meant largely as an illustration; I know it's not likely to be merged. It includes three categories of changes to the initial strawman; the first two are prerequisites for the third, which is the real point of the PR.
1. Changes to eliminate ambiguity

- `i32.rotl $0, 8` is how you rotate by 8. This is not ambiguous for the proposed parser since all opcodes are keywords, but it is visually ambiguous to readers, who could easily see `call $foo(i32.rotl $0, 8, 9)` as a function that takes three arguments rather than two. So I propose `i32.rotl($0, 8)`.
- Instead of `f32.store [5,+0], -0x0p0`, use `f32.store [5,+0] = -0x0p0`.
- Instead of `br_table [$a, $b, $c], $default, $index`, use `br_table [a, b, c, default] : $index` (this particular syntax is LES-compatible, see below).
- Instead of `br $exit, $i`, use `br exit => $i`, where `=> $i` is optional.
- Instead of `br_if`, use `br exit (if condition) => $i`.
- Literals need a way to distinguish `i64` from `i32` and `f64` from `f32`. Based on C/C++/C# I propose no required suffix for `f64` or `i32`, but to use `L` for `i64` and `f` for `f32` (`123L`, `12.0p+0f`). Another possibility would have been suffixes `i64` and `f32` as in Rust. Note: type inference is possible, e.g. `$x + 5` could infer that `5` is `f32` when `$x` is `f32`, but what about `3p+0 + 3p+0`? Since wasm is low-level, the text should be allowed to specify the types, even if they can be inferred (and I'd prefer not to require the entire opcode name).

2. Simplifying changes
- Instead of `call $foo(...)`, use simply `$foo(...)`.
- Instead of `call_import $foo(...)`, use `$$foo(...)`.
- `call_indirect $sig [1] $min(0, 2)` - wait, what is the `[1]` for? I don't see it in the s-exprs I'm looking at. So, I propose `$min::$sig(0, 2)`, where `::` is a high-precedence operator. Having removed the `call_indirect` keyword, I think reversing the order will be more readable in case `$min` is a complex expression.

3. Changes to allow LES
I propose that the text format be compatible with LES - as the PR text explains, not LES as it exists today, but as it will be when the MVP is launched. This gives the CG some freedom to make some changes to LES and not others. Specifically, any elements that make sense only in WebAssembly (e.g. keywords for wasm opcodes) would not be permitted, but changes such as tweaks to operator precedence, handling of semicolons, the grammar of LES "superexpressions", or the name used for "infinity", are fine.
The proposed goals of the text format are "match existing conventions on the Web (for example, curly braces, as in JavaScript and CSS)", and since LES is a fairly JS-like language, I assert that any reasonable text format is straightforwardly modified to be LES-compatible. Here are the required changes:
- `$foo-loop` subtracts `loop` from `$foo`, so you need to write `$@foo-bar` to tell the parser that the hyphen is part of the identifier. However, this could be changed if people are strongly in favor of allowing dashes in identifiers. LES also allows any UTF-8 string as an identifier, even `` @`\n\0` `` (a newline character and a null character) or ``` @`` ``` (the empty string). Similarly, `i32.reinterpret/f32` is changed to `i32.reinterpret'f32` to make clear it is not a division (`'` is legal in identifiers).
- Opcodes with a single operand, such as `i32.popcnt $x`, do not need parentheses, but LES currently requires additional syntax: either `i32.popcnt($x)` or `` `i32.popcnt` $x `` (I prefer the first option).
- `function $@fac-opt($a:i64) : i64`: the parens around `i64` are (and must be) optional in LES.
- `<s` and `>s` are not valid operator names, because in any language other than wasm, `r>s` should be parsed as `r > s`. LES operators must therefore consist of punctuation, and I selected `>` for signed and `|>` for unsigned. Unfortunately this turns out to be a little clumsy since we need `>|=` instead of `|>=` (this is explained in the PR), but LES does offer backquotes for making non-punctuation operators, so ``$0 `>s` $1`` or ``$0 `>=u` $1`` could be used instead.
- `var $x:i64` is changed to `$x:i64` because although the former syntax is legal in LES, it is redundant. Since LES is highly regular, using essentially the same syntax everywhere, if `$x:i64` is legal syntax in the formal argument list, it must also be legal in the function body.
- Labels written as `foo:` are not very practical since `:` is also a binary operator. Consider `foo: $x = 0`, which would be parsed as `(foo : $x) = 0`. So instead I've moved the colon to the start; `foo: $x = 0` would either have to be written on two lines, or on a single line as `:foo; $x = 0`.
- `function $foo () : i32`: the space before `(` must be removed; this is related to how LES manages to be a keyword-free language. However, if there's a lot of hate for this, I can eliminate the whitespace sensitivity (though something somewhere will have to be sacrificed).
- The `,` in `i32.store8 [$base, +4]:align=2, $value` is bad as mentioned above, while `:` and `=` are not ideal punctuation. There are a variety of alternatives but I picked `i32.store8 [$base, +4, align 2] = $value`.
- `opcode()` to ensure they do not clash with labels, as the latter cannot have arguments). Note there is no need for `$` on local variables either, but I didn't remove them since we obviously can't replace variables like `$0` with just `0`, so there was no immediate benefit. FYI, `@0` is how you write an identifier that begins with a digit in LES, in contrast to `$0` which is the prefix operator `$` applied to the literal `0`.

Note: I left the text as "The `$` sigil on function and variable names cleanly ensures that they never collide with wasm keywords, present or future." This is correct except for the word "keywords", since LES doesn't have keywords. Still, collisions may be possible at least when it comes to function calls, since for example `loop({...})` means the same thing as `loop {...}` in LES, so if `loop` were a function and the `$` were not required, there would be a collision.

There are other, niggly issues that deserve mention, but this is getting long so I'll stop here and let you read the rationales if you haven't yet.
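To make the point about `>s` and dashed identifiers concrete, here is a rough Python sketch of a conventional lexer in which identifiers are alphanumeric and operators are runs of punctuation. The token classes and the `tokenize` name are assumptions chosen for this illustration; this is not the actual LES lexer, and characters outside these classes (such as parentheses) are simply skipped.

```python
import re

# Hypothetical token classes: identifiers are alphanumeric, operators are
# runs of punctuation. This is a simplification, not the LES lexer.
TOKEN = re.compile(r"""
    [A-Za-z_][A-Za-z0-9_']*   # identifier (apostrophe allowed inside)
  | \d+                       # integer literal
  | [-+*/<>=|&^$:!~%]+        # operator: punctuation only
""", re.VERBOSE)


def tokenize(text):
    """Return the tokens found in `text`, ignoring everything else."""
    return TOKEN.findall(text)


print(tokenize("r>s"))        # the letter after '>' is its own identifier
print(tokenize("$foo-loop"))  # the dash is an operator, not part of a name
```

Under these assumed rules `r>s` can only come out as three tokens, which is why a suffix letter cannot be glued onto a comparison operator without some extra syntax such as backquotes.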
TODOs
I have opinions about all the TODOs but I've left them out of the PR and also added my own TODO regarding whether or not semicolons should be required. I will just say that regarding the precedence of `&|^`, LES is spec'd to punt on that issue, by printing an error if you write an expression like `x & y == z`.
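A rough sketch of what "punting" could look like in practice, in Python: rather than choosing a precedence for `&`, `|` and `^` relative to comparisons, a checker rejects any expression that mixes the two families at the same parenthesis level. The function name, the token-list input and the comma/semicolon handling are assumptions made for this example; it is a coarse approximation, not how LES implements the rule.

```python
BITWISE = {"&", "|", "^"}
COMPARISON = {"==", "!=", "<", ">", "<=", ">="}


def mixes_bitwise_and_comparison(tokens):
    """Return True if one parenthesis level uses both operator families."""
    # One set of operator families seen so far per open parenthesis level.
    stack = [set()]
    for tok in tokens:
        if tok == "(":
            stack.append(set())
        elif tok == ")":
            if len(stack) > 1:
                stack.pop()
        elif tok in {",", ";"}:
            # A separator starts a fresh expression at this level.
            stack[-1].clear()
        elif tok in BITWISE:
            stack[-1].add("bitwise")
        elif tok in COMPARISON:
            stack[-1].add("comparison")
        if len(stack[-1]) == 2:
            return True
    return False


print(mixes_bitwise_and_comparison(["x", "&", "y", "==", "z"]))            # ambiguous
print(mixes_bitwise_and_comparison(["(", "x", "&", "y", ")", "==", "z"]))  # explicit
```

Requiring the parentheses costs two characters and removes the need for readers to remember whether the C ordering or a more intuitive ordering applies.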