Support embedded newline characters in names? #142
There's no particular reason for the current behaviour, other than the same regexp I don't have an overly strong opinion either, but for hygiene and for the sake of following standard practice, I'd lean towards disallowing any ASCII control characters in literals (this still allows UTF-8).
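A minimal sketch of the rule suggested above, assuming names are handled as raw bytes. The function name `has_ascii_control` is hypothetical and not part of any WebAssembly tool. Because every byte of a multi-byte UTF-8 sequence is 0x80 or higher, rejecting only 0x00-0x1F and 0x7F leaves non-ASCII UTF-8 text untouched.

```python
def has_ascii_control(name: bytes) -> bool:
    """Return True if the name contains an ASCII control byte.

    ASCII control characters are 0x00-0x1F plus 0x7F (DEL).
    Hypothetical helper, written only to illustrate the proposal.
    """
    return any(b < 0x20 or b == 0x7F for b in name)


# A name with an embedded newline would be rejected under this rule.
newline_name = b"foo\nbar"

# A name with non-ASCII UTF-8 text would still be accepted.
utf8_name = "naïve_π".encode("utf-8")
```

Under this sketch, `has_ascii_control(newline_name)` is true and `has_ascii_control(utf8_name)` is false.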
Unicode has more control characters than ASCII :-) IIRC @zygoloid was telling me that Unicode has a very well defined set of character literals that all languages should be using. As the editor of C++ he was appalled that C++ chose to ignore this and go its own special and nonsensical route.
I guess you refer to this? One issue is that WebAssembly does actually want a wider set than the set any high-level language will be using for identifiers, because WebAssembly aims to support compilers that need to be able to mangle names into something unrepresentable in source languages. The document above also says: "Generally if the programming language has case-sensitive identifiers, then Normalization Form C is appropriate". However, Unicode Normalization Form C is non-trivial. It would be unfortunate if every WebAssembly tool has to know how to validate and normalize identifiers just to correctly do symbol lookups. Another concern is homograph confusion. Since WebAssembly identifiers aren't user-facing anyway, would it make sense to restrict the character set and have frontends mangle as needed? They'll probably always have to do some mangling in any case. Another is whether ES modules impose any constraints on this domain. Thoughts?
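To make the normalization concern concrete, here is a small sketch using Python's standard `unicodedata` module. The two example names and the helper functions are illustrative assumptions, not taken from any WebAssembly tool. Both strings render the same way, but they only compare equal once both are normalized to NFC, which is the extra work every tool would need to do for symbol lookup.

```python
import unicodedata

# Two spellings that render identically:
composed = "caf\u00e9"      # 'é' as the single code point U+00E9
decomposed = "cafe\u0301"   # 'e' followed by combining acute accent U+0301


def same_bytes(a: str, b: str) -> bool:
    """Compare the UTF-8 encodings byte for byte, with no normalization."""
    return a.encode("utf-8") == b.encode("utf-8")


def same_after_nfc(a: str, b: str) -> bool:
    """Compare after normalizing both strings to Normalization Form C."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

Here `same_bytes(composed, decomposed)` is false while `same_after_nfc(composed, decomposed)` is true, so a tool that compares raw bytes and a tool that normalizes first would disagree on whether these are the same symbol.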
Since we have to do mangling anyway, I don't think it really matters what we allow, since the front-end can just mangle the name entirely and put whatever it likes as the internal representation. That is what I've already done in my parser, for example, to remove certain characters that LLVM did not like.
With #143 merged, are people okay with closing this issue?
I would vote yes
Having filed this, I think we can close this. WebAssembly doesn't want to require engines to be in the business of interpreting character sets, so the simplest thing is for it to just support arbitrary uninterpreted byte strings. The mappings to JS and other languages can define the correspondence to Unicode as appropriate.
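As a rough illustration of "arbitrary uninterpreted byte strings", the sketch below models an export table as a mapping keyed by raw bytes. The table contents and the `lookup` helper are hypothetical and do not reflect how any particular engine stores exports. The point is only that lookup by exact byte equality needs no knowledge of character sets, so embedded newlines and even byte sequences that are not valid UTF-8 work without special handling.

```python
from typing import Optional

# Hypothetical export table: name bytes -> function index.
exports = {
    b"plain": 0,
    b"line1\nline2": 1,   # embedded newline (0x0a)
    b"\xff\xfe": 2,       # not valid UTF-8, still a usable key
}


def lookup(name: bytes) -> Optional[int]:
    """Find an export by exact byte equality; no decoding or normalization."""
    return exports.get(name)
```

A host binding, such as a JS mapping, could then decide separately how these bytes correspond to Unicode strings.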
Fixes WebAssembly#142. A mismatched `DataCount` is malformed, not a validation error.
This updates the explainer text according to the new spec we agreed on in the 09-15-2020 CG meeting and discussions afterwards. The following are modifications and clarifications we made after the 09-15-2020 CG meeting, and the relevant issue posts, if any:

https://github.com/WebAssembly/meetings/blob/master/main/2020/CG-09-15.md

- `catch_br` was renamed to `delegate` (WebAssembly#133)
- `rethrow` gains an immediate argument (WebAssembly#126)
- Removed dependencies on the reference types proposal and the multivalue proposal. The multivalue proposal was previously listed as a dependency because
  1. `try` is basically a `block`, so it can have multivalue input/output
  2. `br_on_exn` can extract multiple values from a `block`.

  We don't have `br_on_exn` anymore, and I'm not sure 1 is a strong enough reason to make it a dependency.
- Mention `rethrow` cannot rethrow exceptions caught by `unwind` (WebAssembly#142 and WebAssembly#137)
- Mention some runtimes, especially web VMs, can attach stack traces to the exception object, implying stack traces are not required for all VMs
- Update label/validation rules for `delegate` and `rethrow` (WebAssembly#146)
- Finalize opcodes for `delegate` (0x18) and `catch_all` (0x19) (WebAssembly#145 and WebAssembly#147)

I believe this resolves many previous issue threads, so I'll close them. Please reopen them if you think there are things left to discuss in those issues.

Resolves WebAssembly#113, resolves WebAssembly#126, resolves WebAssembly#127, resolves WebAssembly#128, resolves WebAssembly#130, resolves WebAssembly#142, resolves WebAssembly#145, resolves WebAssembly#146, resolves WebAssembly#147.
In #141 I created a test which attempted to test all the ASCII control characters in exported symbol names. All of them worked except 0x0a, the ASCII newline character. The spec interpreter gave this error when I tried it:
What is the intended behavior here? I don't presently have an opinion here; I could see arguments for restricting the character set in some ways, but I could also see arguments that it should be entirely unrestricted.
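For reference, the kind of test described above can be sketched as follows. This is not the test from #141; the helper names and the `\hh` hex-escape rendering are assumptions made for illustration. It enumerates one candidate export name per ASCII control character (0x00-0x1F and 0x7F), including the 0x0a case that failed.

```python
# All ASCII control bytes: 0x00-0x1F plus 0x7F (DEL).
CONTROL_BYTES = list(range(0x00, 0x20)) + [0x7F]


def control_char_names(prefix: bytes = b"f") -> list:
    """Build one candidate export name per ASCII control byte."""
    return [prefix + bytes([b]) for b in CONTROL_BYTES]


def escape(name: bytes) -> str:
    """Render a name with non-printable bytes as \\hh hex escapes.

    Assumed escape style for illustration only; quotes and
    backslashes are escaped the same way to keep the output unambiguous.
    """
    out = []
    for b in name:
        ch = chr(b)
        if 0x20 <= b < 0x7F and ch not in '"\\':
            out.append(ch)
        else:
            out.append("\\%02x" % b)
    return "".join(out)


names = control_char_names()
```

Each entry in `names` can be passed through `escape` to produce a string literal for a test file, so the newline case is written as an escape rather than a literal line break.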