Support embedded newline characters in names? #142

sunfishcode · 2015-10-15T23:54:25Z

In #141 I created a test which attempted to test all the ASCII control characters in exported symbol names. All of them worked except 0x0a, the ASCII newline character. The spec interpreter gave this error when I tried it:

test/names.wast:50.11-50.14: unclosed text literal

What is the intended behavior here? I don't presently have an opinion; I could see arguments for restricting the character set in some ways, but I could also see arguments that it should be entirely unrestricted.

rossberg · 2015-10-16T09:36:11Z

There's no particular reason for the current behaviour, other than the same regexp character in the lexer being used to define comment syntax. :)

I don't have an overly strong opinion either, but for hygiene and for the sake of following standard practice, I'd lean towards disallowing any ASCII control characters in literals (this still allows UTF8).
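For illustration only, a minimal sketch of such a check, assuming "ASCII control characters" means bytes 0x00-0x1F plus 0x7F. The function name `has_ascii_control` is hypothetical and is not taken from the spec interpreter's lexer. Because every byte of a multi-byte UTF-8 sequence is 0x80 or above, a byte-level check like this leaves UTF-8 text untouched:

```python
def has_ascii_control(literal: bytes) -> bool:
    """Return True if the raw literal contains an ASCII control byte.

    Assumption: control bytes are 0x00-0x1F and 0x7F (DEL).
    Bytes >= 0x80 (UTF-8 lead and continuation bytes) are never flagged.
    """
    return any(b < 0x20 or b == 0x7F for b in literal)


if __name__ == "__main__":
    # A literal with an embedded newline (0x0a) would be rejected.
    print(has_ascii_control(b"foo\nbar"))
    # Non-ASCII UTF-8 text would still be accepted.
    print(has_ascii_control("na\u00efve".encode("utf-8")))
```

Under this rule a name containing 0x0a would have to be written with an escape in the text format rather than as a raw byte.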

jfbastien · 2015-10-16T15:58:03Z

Unicode has more control characters than ASCII :-)
Fun times can be had if we allow bidi in names!

IIRC @zygoloid was telling me that Unicode has a very well defined set of character literals that all languages should be using. As the editor of C++ he was appalled that C++ chose to ignore this and go its own special and nonsensical route.

sunfishcode · 2015-10-16T16:29:55Z

I guess you refer to this?

One issue is that WebAssembly does actually want a wider set than the set any high-level language will be using for identifiers, because WebAssembly aims to support compilers that need to be able to mangle names into something unrepresentable in source languages.

The document above also says: "Generally if the programming language has case-sensitive identifiers, then Normalization Form C is appropriate". However, Unicode Normalization Form C is non-trivial. It would be unfortunate if every WebAssembly tool has to know how to validate and normalize identifiers just to correctly do symbol lookups.

Another concern is homograph confusion. Since WebAssembly identifiers aren't user-facing anyway, would it make sense to restrict the character set and have frontends mangle as needed? They'll probably always have to do some mangling in any case.

Another is whether ES modules impose any constraints on this domain.

Thoughts?

jcbeyler · 2015-10-21T16:19:47Z

Since we have to do mangling anyway, I don't think it matters really what we allow since the front-end can just mangle it entirely and put whatever it likes as the internal representation. That is what I've already done in my parser to remove certain characters that LLVM did not like for example.

rossberg · 2015-10-22T10:43:59Z

With #143 merged, are people okay with closing this issue?

jcbeyler · 2015-10-22T18:25:29Z

I would vote yes

sunfishcode · 2015-10-23T18:20:42Z

Having filed this, I think we can close this. WebAssembly doesn't want to require engines to be in the business of interpreting character sets, so the simplest thing is for it to just support arbitrary uninterpreted byte strings. The mappings to JS and other languages can define the correspondence to Unicode as appropriate.
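A rough sketch of what "uninterpreted byte strings" could mean for symbol lookup, under the assumption that an export table is simply a map keyed by raw bytes. The `exports` table and `lookup` helper are hypothetical and exist only for this example. Two spellings that Unicode Normalization Form C would treat as the same identifier stay distinct, and a name with an embedded newline is just another key:

```python
import unicodedata

# Two spellings of the same visible text: precomposed vs. combining accent.
composed = "caf\u00e9"
decomposed = "cafe\u0301"

# Hypothetical export table keyed by raw, uninterpreted bytes.
exports = {
    composed.encode("utf-8"): 0,
    b"line1\nline2": 1,  # an embedded newline is just another byte
}


def lookup(name: bytes):
    """Byte-for-byte lookup; no validation or normalization is applied."""
    return exports.get(name)


if __name__ == "__main__":
    print(lookup(composed.encode("utf-8")))    # found
    print(lookup(decomposed.encode("utf-8")))  # not found: different bytes
    print(lookup(b"line1\nline2"))             # found
    # A language binding that wants Unicode semantics can normalize first.
    print(unicodedata.normalize("NFC", decomposed) == composed)
```

The engine never needs to know about normalization; any layer that maps names to JS or another language can apply it before calling the lookup.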

This updates the explainer text according to the new spec we agreed in the 09-15-2020 CG meeting and discussions afterwards. The following are modifications and clarifications we made after the 09-15-2020 CG meeting, and the relevant issue posts, if any:
https://github.com/WebAssembly/meetings/blob/master/main/2020/CG-09-15.md

- `catch_br` was renamed to `delegate` (WebAssembly#133)
- `rethrow` gains an immediate argument (WebAssembly#126)
- Removed dependences on the reference types proposal and the multivalue proposal. The multivalue proposal was previously listed as dependent because
  1. `try` is basically a `block`, so it can have multivalue input/output
  2. `br_on_exn` can extract multiple values from a `block`.
  We don't have `br_on_exn` anymore, and I'm not sure 1 is a strong enough reason to make it a dependence.
- Mention `rethrow` cannot rethrow exceptions caught by `unwind` (WebAssembly#142 and WebAssembly#137)
- Mention some runtimes, especially web VMs, can attach stack traces to the exception object, implying stack traces are not required for all VMs
- Update label/validation rules for `delegate` and `rethrow` (WebAssembly#146)
- Finalize opcodes for `delegate` (0x18) and `catch_all` (0x19) (WebAssembly#145 and WebAssembly#147)

I believe this resolves many previous issue threads, so I'll close them. Please reopen them if you think there are things left for discussions in those issues.

Resolves WebAssembly#113, resolves WebAssembly#126, resolves WebAssembly#127, resolves WebAssembly#128, resolves WebAssembly#130, resolves WebAssembly#142, resolves WebAssembly#145, resolves WebAssembly#146, resolves WebAssembly#147.

rossberg mentioned this issue Oct 16, 2015

Disallow control characters in literals #143

Merged

sunfishcode closed this as completed Oct 23, 2015

Connicpu pushed a commit to Connicpu/wasm-spec that referenced this issue Jun 7, 2020

Change validation error -> malformed in Overview (WebAssembly#143)

060678f

Fixes WebAssembly#142. A mismatched `DataCount` is malformed, not a validation error.
