-
Notifications
You must be signed in to change notification settings - Fork 695
clarify how export/import names convert to JS strings (#569) #573
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
||
A WebAssembly module imports and exports functions. WebAssembly names functions | ||
using arbitrary-length byte sequences. The null character is permitted inside | ||
WebAssembly function names. The most natural Web representation of a mapping of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"the null character and invalid Unicode code points are permitted ..."
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with what this is trying to say, but it's a bit confusing since WebAssembly's notion of names are not Unicode - they're just bytes.
How about: "any 8-bit values are permitted in a WebAssembly name, including the null byte and byte sequences that don't correspond to any Unicode code point regardless of encoding"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed your phrasing is better.
lgtm besides comments. |
Alternative suggestion: mangle any non ucs16 sequences so that all valid wasm names work on the web. Perhaps break to an escaped format if it fails the already documented conversion. |
To give this serious consideration, I think you’d have to propose a specific mangling, and then provide reasoning about whether that mangling round-trips. That’s gets quite weird. It’s not obvious how to safely escape things while providing 1-1 mapping between JS UCS16 sequences and wasm byte sequences. The biggest risk if we go this route is that we get some aspect of this wrong with respect to either Unicode itself or some browsers implementation of it. Also, it will take more work for us, the spec writers, to write down and agree to an escaping format that satisfies all of the constraints that we might be interested in. On the other hand, the likelihood that you’d choose to import or export something that has a name whose byte sequence doesn’t transcode is extremely low. It’s probably not something anyone would want to do. Most clients will either use a UTF8 encoding of some Unicode string (in which case they’re fine) or they’ll use ASCII (also fine). Therefore, making that unlikely-and-undesirable thing be an error saves the spec, and all implementors of the spec, from having to worry about escaping. -Filip
|
Any escaped sequence would presumably also be a valid export/import name, giving us a different problem where escaped name is the same as a confusingly-named export. I don't think it's really worth it :) |
Good points, thank you. |
You could make the mangling bijective, but it would require mangling otherwise valid UCS-16 strings (e.g. by expanding each byte into two hexits). But I don't think it's worth distinguishing a web and non-web environment for this. Why not just say import/export names must be valid UTF-8 strings? |
This is an interesting suggestion. I think that it’s easier on clients and implementors if this is a web-only constraint. Non-web implementers may have no other need for any kind of Unicode stuff. I think that this would be the only mention of Unicode in WebAssembly. A non-Web client wishing to implement accurate verification (i.e. if fails exactly when the spec it says it must fail) would have to do Unicode logic only in this one place, and only to support a verification rule that only helps the web. On the other hand, the web embedding scenario will necessarily perform verifications that non-web clients have no need for. We’ll probably have to also add rules for what kind of object the imports object can be, though these rules may be implicit (for example if we say that the imports object is queried at module instantiation time then we’re mandating that all things named by the imports section are accessible via [[Get]] or somesuch). We’ll definitely have additional Web-specific export rules. For example I just realized that we probably want to prohibit exporting a function named “proto” because that that would probably make the world burn. Again, it would be weird if this was a rule that non-web users would have to follow. Since we will anyway have to have Web-specific verification, and since the non-Web case is probably happy with names being sequences of bytes, I think that the approach I’ve proposed is least bad. -Filip
|
Agreed, that's what we'd discussed to justify our approach. Add it to Rationale.md? |
Agreed on keeping the utf8 (and any other) requirement a Web-only thing. (As for |
Oh that’s true! Opinions on whether proto should be allowed as a function name when embedding in the web? (I vote “no”. It would cause too much agony, and I can’t imagine a well-behaved client wanting to do it.) -Filip
|
It's hard to argue "yes" with any conviction, but unless we can find a case where it breaks or bothers some particular engine or tool (I think SM used to implement |
In JSC, the following evaluates to true:
It also evaluates to true in Firefox. I didn’t try others. That’s pretty weird! It means that a function exported as proto will not be accessible if the exports object is created in a manner that is logically equivalent to what a JS developer would reasonably expect (i.e. literal construction). For this reason, I think that a special case that causes verification failure for proto is consistent with having verification failure for byte sequences that cannot be converted to JS strings: the idea is that if you would not have been able to easily reference the function name from JS, then you probably messed up. -Filip
|
Ah interesting. So object literals must not do define-property but rather something set-property-esque. We currently use define-property to build our wasm export object (set-property could trigger setters!) so
works, but I guess that's not your point. It's weird, no doubt, but I'm wondering if people will thank us for rejecting this one particular string because there isn't an equivalent object literal. I guess that's something asm.js couldn't polyfill ;) Likely people will never notice, so if you're really worried about this causing problems, I'm fine rejecting it; we could always allow it later if it caused people trouble. |
var result = decodeURIComponent(escape(string)); | ||
|
||
// Check for errors. This will throw if 'result' contains bad characters. | ||
encodeURIComponent(result); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm trying to figure out what byte sequences will not throw during decodeURIComponent(escape(string))
but will throw in encodeURIComponent(result)
. I tried both invalid utf8 byte sequences and also bad high/low surrogate pairs but decodeURIComponent
catches them all. This seems to be implied by the table at the bottom of 18.2.6.1.2 and following notes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah hah, and poking around more, it would appear that it's simplify invalid utf8 to store a code point in the surrogate range [0xd800, 0xdc00] (I had been thinking that utf8 wouldn't care about surrogates, but I guess this is a concession to utf16 baked into utf8?), so your definition here is precisely what we want (if we want to reuse our existing Utf8ToUtf16 conversion routines which I very much do. lgtm!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should add: it has the right behavior, but I'd still like to know if the encodeURIComponent
is necessary or if decodeURIComponent
catches it all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
decodeURIComponent should throw. The reason why I didn’t just rely on that is that some of the public documentation documents encodeURIComponent as throwing but doesn’t say that decodeURIComponent should throw. But having read the spec (both ES5.1 and ES6), I no longer think that the encodeURIComponent call is necessary.
I think that we can remove the encodeURIComponent call. Objections?
-Filip
On Mar 2, 2016, at 7:26 AM, Luke Wagner [email protected] wrote:
In Web.md #573 (comment):
+strings. A WebAssembly module may fail validation on the Web if it imports or
+exports functions whose names do not transcode cleanly to UTF-16 according to
+the following conversion algorithm, assuming that the WebAssembly name is in a
+Uint8Array
calledarray
:
+
+```
+function convertToJSString(array)
+{
- // Perform the actual conversion.
- var string = "";
- for (var i = 0; i < array.length; ++i)
- string += String.fromCharCode(array[i]);
- var result = decodeURIComponent(escape(string));
- // Check for errors. This will throw if 'result' contains bad characters.
- encodeURIComponent(result);
I should add: it has the right behavior, but I'd still like to know if the encodeURIComponent is necessary or if decodeURIComponent catches it all.—
Reply to this email directly or view it on GitHub https://github.com/WebAssembly/design/pull/573/files#r54737759.
On 1 March 2016 at 23:47, pizlonator [email protected] wrote:
You just fell into one of JavaScript's many corner case traps. This only
will yield false. Also note that this is a special case in the semantics of object literals |
Good catch! I agree with your analysis. -Filip
|
Rubber stamped by Saam Barati. I wrote some code like this while working on WebAssembly/design#573. I thought I'd add it as a benchmark since it stresses things that we may not have good bench coverage for. * js/regress/script-tests/string-transcoding.js: Added. (decodeUTF8): (encodeUTF8): (arraysEqual): (arrayToString): (setHeader): (print): (tryArray): (doSteps): * js/regress/string-transcoding-expected.txt: Added. * js/regress/string-transcoding.html: Added. git-svn-id: http://svn.webkit.org/repository/webkit/trunk@197465 268f45cc-cd09-0410-ab3c-d52691b4dbfc
Added a link to http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html. Simplified the decoding algorithm thanks to Luke's feedback.
lgtm |
Rubber stamped by Saam Barati. I wrote some code like this while working on WebAssembly/design#573. I thought I'd add it as a benchmark since it stresses things that we may not have good bench coverage for. * js/regress/script-tests/string-transcoding.js: Added. (decodeUTF8): (encodeUTF8): (arraysEqual): (arrayToString): (setHeader): (print): (tryArray): (doSteps): * js/regress/string-transcoding-expected.txt: Added. * js/regress/string-transcoding.html: Added. git-svn-id: http://svn.webkit.org/repository/webkit/releases/WebKitGTK/webkit-2.12@197747 268f45cc-cd09-0410-ab3c-d52691b4dbfc
Rubber stamped by Saam Barati. I wrote some code like this while working on WebAssembly/design#573. I thought I'd add it as a benchmark since it stresses things that we may not have good bench coverage for. * js/regress/script-tests/string-transcoding.js: Added. (decodeUTF8): (encodeUTF8): (arraysEqual): (arrayToString): (setHeader): (print): (tryArray): (doSteps): * js/regress/string-transcoding-expected.txt: Added. * js/regress/string-transcoding.html: Added. git-svn-id: http://svn.webkit.org/repository/webkit/releases/WebKitGTK/webkit-2.12@197747 268f45cc-cd09-0410-ab3c-d52691b4dbfc
Oops, looks like this never merged. Any objections to merging now? |
lgtm as well, let's merge! |
* Prettify section names * Restructure encoding of function signatures * Revert "[Binary 11] Update the version number to 0xB." * Leave index space for growing the number of base types * Comments addressed * clarify how export/import names convert to JS strings (#569) (#573) * When embedded in the web, clarify how export/import names convert to JS strings (#569) * Fixes suggested by @jf * Address more feedback Added a link to http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html. Simplified the decoding algorithm thanks to Luke's feedback. * Access to proprietary APIs apart from HTML5 (#656) * comments
* Prettify section names * Restructure encoding of function signatures * Revert "[Binary 11] Update the version number to 0xB." * Leave index space for growing the number of base types * Comments addressed * clarify how export/import names convert to JS strings (#569) (#573) * When embedded in the web, clarify how export/import names convert to JS strings (#569) * Fixes suggested by @jf * Address more feedback Added a link to http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html. Simplified the decoding algorithm thanks to Luke's feedback. * Access to proprietary APIs apart from HTML5 (#656) * comments
* Merge pull request #648 from WebAssembly/current_memory Add current_memory operator * Reorder section size field (#639) * Prettify section names (#638) * Extensible encoding of function signatures (#640) * Prettify section names * Restructure encoding of function signatures * Revert "[Binary 11] Update the version number to 0xB." * Leave index space for growing the number of base types * Comments addressed * clarify how export/import names convert to JS strings (#569) (#573) * When embedded in the web, clarify how export/import names convert to JS strings (#569) * Fixes suggested by @jf * Address more feedback Added a link to http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html. Simplified the decoding algorithm thanks to Luke's feedback. * Access to proprietary APIs apart from HTML5 (#656) * comments * Merge pull request #641 from WebAssembly/postorder_opcodes Postorder opcodes * fix some text that seems to be in the wrong order (#670) * Clarify that br_table has a branch argument (#664) * Add explicit argument counts (#672) * Add explicit arities * Rename * Replace uint8 with varint7 in form field (#662) This needs to be variable-length.
Rubber stamped by Saam Barati. I wrote some code like this while working on WebAssembly/design#573. I thought I'd add it as a benchmark since it stresses things that we may not have good bench coverage for. * js/regress/script-tests/string-transcoding.js: Added. (decodeUTF8): (encodeUTF8): (arraysEqual): (arrayToString): (setHeader): (print): (tryArray): (doSteps): * js/regress/string-transcoding-expected.txt: Added. * js/regress/string-transcoding.html: Added. Canonical link: https://commits.webkit.org/173015@main git-svn-id: https://svn.webkit.org/repository/webkit/trunk@197465 268f45cc-cd09-0410-ab3c-d52691b4dbfc
Rubber stamped by Saam Barati. I wrote some code like this while working on WebAssembly/design#573. I thought I'd add it as a benchmark since it stresses things that we may not have good bench coverage for. * js/regress/script-tests/string-transcoding.js: Added. (decodeUTF8): (encodeUTF8): (arraysEqual): (arrayToString): (setHeader): (print): (tryArray): (doSteps): * js/regress/string-transcoding-expected.txt: Added. * js/regress/string-transcoding.html: Added.
No description provided.