clarify how export/import names convert to JS strings (#569) #573

pizlonator · 2016-03-01T20:07:26Z

No description provided.

jfbastien · 2016-03-01T20:51:58Z

Web.md

+
+A WebAssembly module imports and exports functions. WebAssembly names functions
+using arbitrary-length byte sequences. The null character is permitted inside
+WebAssembly function names. The most natural Web representation of a mapping of


"the null character and invalid Unicode code points are permitted ..."

I agree with what this is trying to say, but it's a bit confusing since WebAssembly's notion of names is not Unicode - they're just bytes.

How about: "any 8-bit values are permitted in a WebAssembly name, including the null byte and byte sequences that don't correspond to any Unicode code point regardless of encoding"

Agreed your phrasing is better.

jfbastien · 2016-03-01T20:53:57Z

lgtm besides comments.

ghost · 2016-03-01T21:21:13Z

Alternative suggestion: mangle any non ucs16 sequences so that all valid wasm names work on the web. Perhaps break to an escaped format if it fails the already documented conversion.

pizlonator · 2016-03-01T21:27:34Z

To give this serious consideration, I think you’d have to propose a specific mangling, and then provide reasoning about whether that mangling round-trips. That gets quite weird. It’s not obvious how to safely escape things while providing a 1-1 mapping between JS UCS16 sequences and wasm byte sequences. The biggest risk if we go this route is that we get some aspect of this wrong with respect to either Unicode itself or some browser’s implementation of it. Also, it will take more work for us, the spec writers, to write down and agree to an escaping format that satisfies all of the constraints that we might be interested in.

On the other hand, the likelihood that you’d choose to import or export something that has a name whose byte sequence doesn’t transcode is extremely low. It’s probably not something anyone would want to do. Most clients will either use a UTF8 encoding of some Unicode string (in which case they’re fine) or they’ll use ASCII (also fine). Therefore, making that unlikely-and-undesirable thing be an error saves the spec, and all implementors of the spec, from having to worry about escaping.

-Filip
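
As a rough illustration of the claim that ASCII and UTF-8 names convert without trouble while arbitrary byte sequences may not, the sketch below uses the standard `TextDecoder` API in fatal mode. The helper name `transcodesCleanly` and the sample names are hypothetical, and this is not the algorithm proposed in the PR; it only shows which kinds of input would be rejected.

```javascript
// Hypothetical helper: returns true when `bytes` is well-formed UTF-8.
// TextDecoder with { fatal: true } throws a TypeError on malformed input.
function transcodesCleanly(bytes) {
  try {
    new TextDecoder("utf-8", { fatal: true, ignoreBOM: true }).decode(bytes);
    return true;
  } catch (e) {
    return false;
  }
}

const asciiName = new Uint8Array([0x6d, 0x61, 0x69, 0x6e]);      // "main"
const utf8Name = new Uint8Array([0x63, 0x61, 0x66, 0xc3, 0xa9]); // "café"
const withNull = new Uint8Array([0x61, 0x00, 0x62]);             // "a\0b"
const badName = new Uint8Array([0xff, 0xfe]);                    // not UTF-8

console.log(transcodesCleanly(asciiName)); // true
console.log(transcodesCleanly(utf8Name));  // true
console.log(transcodesCleanly(withNull));  // true
console.log(transcodesCleanly(badName));   // false
```

Note that the null byte is accepted: it is a valid UTF-8 encoding of U+0000, so only byte sequences that are not UTF-8 at all would become validation errors under this approach.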


jfbastien · 2016-03-01T21:44:29Z

Any escaped sequence would presumably also be a valid export/import name, giving us a different problem where escaped name is the same as a confusingly-named export. I don't think it's really worth it :)
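
To make the collision concrete, here is a sketch of a deliberately naive escaping scheme. The `\xNN` format and the function `naiveMangle` are made up for illustration and were not proposed in this thread; the scheme also escapes every non-ASCII byte, which is simpler than escaping only on conversion failure. A name consisting of the single byte 0xFF and a name consisting of the four ASCII characters `\xff` end up as the same JS string.

```javascript
// Hypothetical, deliberately naive mangling: ASCII bytes pass through
// unchanged and every other byte becomes the four characters \xNN.
function naiveMangle(bytes) {
  let out = "";
  for (const b of bytes) {
    out += b < 0x80
      ? String.fromCharCode(b)
      : "\\x" + b.toString(16).padStart(2, "0");
  }
  return out;
}

const invalidByteName = new Uint8Array([0xff]);
const literalName = new Uint8Array([0x5c, 0x78, 0x66, 0x66]); // ASCII \xff

// Two different wasm names map to one JS property name.
console.log(naiveMangle(invalidByteName) === naiveMangle(literalName)); // true
```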

ghost · 2016-03-01T21:45:51Z

Good points, thank you.

AndrewScheidecker · 2016-03-01T21:54:56Z

You could make the mangling bijective, but it would require mangling otherwise valid UCS-16 strings (e.g. by expanding each byte into two hexits).

But I don't think it's worth distinguishing a web and non-web environment for this. Why not just say import/export names must be valid UTF-8 strings?
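
The hex expansion mentioned above might look like the following sketch. The names `hexMangle` and `hexUnmangle` are hypothetical. The mapping round-trips for every byte sequence, but at the cost that even a plain ASCII name such as `main` would be exposed to JS as `6d61696e`.

```javascript
// Hypothetical bijective mangling: every byte becomes two hex digits.
function hexMangle(bytes) {
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

// Inverse mapping; rejects strings that hexMangle could not have produced.
function hexUnmangle(str) {
  if (!/^(?:[0-9a-f]{2})*$/.test(str)) {
    throw new Error("not a mangled name: " + str);
  }
  const bytes = new Uint8Array(str.length / 2);
  for (let i = 0; i < bytes.length; i++) {
    bytes[i] = parseInt(str.slice(2 * i, 2 * i + 2), 16);
  }
  return bytes;
}

const mangled = hexMangle(new Uint8Array([0x6d, 0x61, 0x69, 0x6e, 0xff]));
console.log(mangled);                          // 6d61696eff
console.log(Array.from(hexUnmangle(mangled))); // [ 109, 97, 105, 110, 255 ]
```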

pizlonator · 2016-03-01T22:02:16Z

This is an interesting suggestion. I think that it’s easier on clients and implementors if this is a web-only constraint.

Non-web implementers may have no other need for any kind of Unicode stuff. I think that this would be the only mention of Unicode in WebAssembly. A non-Web client wishing to implement accurate verification (i.e. it fails exactly when the spec says it must fail) would have to do Unicode logic only in this one place, and only to support a verification rule that only helps the web.

On the other hand, the web embedding scenario will necessarily perform verifications that non-web clients have no need for. We’ll probably have to also add rules for what kind of object the imports object can be, though these rules may be implicit (for example if we say that the imports object is queried at module instantiation time then we’re mandating that all things named by the imports section are accessible via [[Get]] or somesuch). We’ll definitely have additional Web-specific export rules. For example I just realized that we probably want to prohibit exporting a function named “__proto__” because that would probably make the world burn. Again, it would be weird if this was a rule that non-web users would have to follow.

Since we will anyway have to have Web-specific verification, and since the non-Web case is probably happy with names being sequences of bytes, I think that the approach I’ve proposed is least bad.

-Filip


jfbastien · 2016-03-01T22:04:09Z

Non-web implementers may have no other need for any kind of Unicode stuff. I think that this would be the only mention of Unicode in WebAssembly. A non-Web client wishing to implement accurate verification (i.e. it fails exactly when the spec says it must fail) would have to do Unicode logic only in this one place, and only to support a verification rule that only helps the web.

Agreed, that's what we'd discussed to justify our approach. Add it to Rationale.md?

lukewagner · 2016-03-01T22:11:11Z

Agreed on keeping the utf8 (and any other) requirement a Web-only thing.

(As for __proto__, it seems to be possible to Object.defineProperty(o, '__proto__', ...) (and shadow the __proto__ accessor property defined on Object.prototype). So I'm not aware of any special need to exclude this particular case, devilish though it may be)
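
A small sketch of the `Object.defineProperty` behavior described above. The object name `exportsObject` and the exported function are placeholders; this shows ordinary JS objects, not how any engine actually builds its wasm exports object.

```javascript
// Build an exports-like object with an own data property named "__proto__".
const exportsObject = {};
Object.defineProperty(exportsObject, "__proto__", {
  value: function () { return 42; },
  enumerable: true,
  writable: false,
  configurable: false,
});

// The own property shadows the accessor inherited from Object.prototype.
console.log(Object.prototype.hasOwnProperty.call(exportsObject, "__proto__")); // true
console.log(exportsObject.__proto__());                                        // 42
// The actual prototype of the object is unchanged.
console.log(Object.getPrototypeOf(exportsObject) === Object.prototype);        // true
```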

pizlonator · 2016-03-01T22:14:57Z

Oh that’s true!

Opinions on whether __proto__ should be allowed as a function name when embedding in the web?

(I vote “no”. It would cause too much agony, and I can’t imagine a well-behaved client wanting to do it.)

-Filip


lukewagner · 2016-03-01T22:36:32Z

It's hard to argue "yes" with any conviction, but unless we can find a case where it breaks or bothers some particular engine or tool (I think SM used to implement __proto__ in a particularly magic way that might've had problems, but no longer), I'd vote "yes" purely on the basis of avoiding adding special cases.

pizlonator · 2016-03-01T22:47:03Z

In JSC, the following evaluates to true:

({__proto__: 42}).__proto__ == Object.prototype

It also evaluates to true in Firefox. I didn’t try others.

That’s pretty weird! It means that a function exported as __proto__ will not be accessible if the exports object is created in a manner that is logically equivalent to what a JS developer would reasonably expect (i.e. literal construction).

For this reason, I think that a special case that causes verification failure for __proto__ is consistent with having verification failure for byte sequences that cannot be converted to JS strings: the idea is that if you would not have been able to easily reference the function name from JS, then you probably messed up.

-Filip


lukewagner · 2016-03-02T00:45:01Z

Ah interesting. So object literals must not do define-property but rather something set-property-esque. We currently use define-property to build our wasm export object (set-property could trigger setters!) so

js> var code = wasmTextToBinary('(module (func (result i32) (i32.const 42)) (export "__proto__" 0))');
js> var o = wasmEval(code);
js> o.__proto__();
42

works, but I guess that's not your point. It's weird, no doubt, but I'm wondering if people will thank us for rejecting this one particular string because there isn't an equivalent object literal. I guess that's something asm.js couldn't polyfill ;) Likely people will never notice, so if you're really worried about this causing problems, I'm fine rejecting it; we could always allow it later if it caused people trouble.

lukewagner · 2016-03-02T06:14:34Z

Web.md

+  var result = decodeURIComponent(escape(string));
+
+  // Check for errors. This will throw if 'result' contains bad characters.
+  encodeURIComponent(result);


I'm trying to figure out what byte sequences will not throw during decodeURIComponent(escape(string)) but will throw in encodeURIComponent(result). I tried both invalid utf8 byte sequences and also bad high/low surrogate pairs but decodeURIComponent catches them all. This seems to be implied by the table at the bottom of 18.2.6.1.2 and following notes.

Ah hah, and poking around more, it would appear that it's simply invalid utf8 to store a code point in the surrogate range [0xd800, 0xdfff] (I had been thinking that utf8 wouldn't care about surrogates, but I guess this is a concession to utf16 baked into utf8?), so your definition here is precisely what we want (if we want to reuse our existing Utf8ToUtf16 conversion routines, which I very much do). lgtm!

I should add: it has the right behavior, but I'd still like to know if the encodeURIComponent is necessary or if decodeURIComponent catches it all.

decodeURIComponent should throw. The reason why I didn’t just rely on that is that some of the public documentation documents encodeURIComponent as throwing but doesn’t say that decodeURIComponent should throw. But having read the spec (both ES5.1 and ES6), I no longer think that the encodeURIComponent call is necessary.

I think that we can remove the encodeURIComponent call. Objections?

-Filip

On Mar 2, 2016, at 7:26 AM, Luke Wagner [email protected] wrote:

In Web.md #573 (comment):

+strings. A WebAssembly module may fail validation on the Web if it imports or
+exports functions whose names do not transcode cleanly to UTF-16 according to
+the following conversion algorithm, assuming that the WebAssembly name is in a
+Uint8Array called array:
+
+```
+function convertToJSString(array)
+{
+  // Perform the actual conversion.
+  var string = "";
+  for (var i = 0; i < array.length; ++i)
+    string += String.fromCharCode(array[i]);
+  var result = decodeURIComponent(escape(string));
+
+  // Check for errors. This will throw if 'result' contains bad characters.
+  encodeURIComponent(result);
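
For reference, a sketch of what the conversion could look like with the `encodeURIComponent` call dropped, as discussed above. This is one reading of the simplification, not necessarily the text that was merged; `tryConvert` is a hypothetical helper used only for the demonstration.

```javascript
// Sketch of the simplified conversion: treat each byte as a Latin-1
// character, percent-escape it, then let decodeURIComponent do the UTF-8
// decoding. decodeURIComponent throws a URIError on malformed UTF-8.
function convertToJSString(array) {
  let string = "";
  for (let i = 0; i < array.length; ++i) {
    string += String.fromCharCode(array[i]);
  }
  return decodeURIComponent(escape(string));
}

// Hypothetical helper: returns the converted string or the error name.
function tryConvert(bytes) {
  try {
    return convertToJSString(new Uint8Array(bytes));
  } catch (e) {
    return e.name;
  }
}

console.log(tryConvert([0x63, 0x61, 0x66, 0xc3, 0xa9])); // café
console.log(tryConvert([0xff]));                         // URIError
console.log(tryConvert([0xed, 0xa0, 0x80]));             // URIError (U+D800 encoded as UTF-8)
```

The last case is the surrogate situation raised in the review: a three-byte sequence that would decode to a lone surrogate is rejected by `decodeURIComponent` itself, so no second check is needed to catch it.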

rossberg · 2016-03-02T14:13:14Z

On 1 March 2016 at 23:47, pizlonator [email protected] wrote:

In JSC, the following evaluates to true:

({__proto__: 42}).__proto__ == Object.prototype

It also evaluates to true in Firefox. I didn’t try others.

You just fell into one of JavaScript's many corner case traps. This only
yields true because you used 42, which is not an object, nor is it
converted to one. Instead, it gets replaced by Object.prototype (see
https://tc39.github.io/ecma262/#sec-__proto__-property-names-in-object-initializers,
but don't ask why, this is JavaScript). In contrast,

({__proto__: {}}).__proto__ == Object.prototype

will yield false.

Also note that this is a special case in the semantics of object literals
only, which is not something that Wasm needs to be concerned with. Other
than that, and the presence of Object.prototype.__proto__ (all of which
is Annex B crap anyway), there is no special treatment of the name
__proto__ in ES6. In particular, Object.defineProperty, the core
primitive for adding properties, has no awareness of it.
Hence, I see no reason for any exceptions regarding __proto__. In fact, I
would advise against it.
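
The object-literal corner case can be checked directly. The sketch below uses made-up variable names and contrasts the literal form with a computed key, which is not covered by the special case and defines an ordinary own property.

```javascript
// In an object literal, a non-computed `__proto__: value` entry sets the
// prototype when value is an object or null, and is otherwise ignored.
const fromNumber = { __proto__: 42 };
const fromObject = { __proto__: {} };
// A computed key is not special: it defines an ordinary own property.
const computed = { ["__proto__"]: 42 };

console.log(fromNumber.__proto__ === Object.prototype); // true
console.log(fromObject.__proto__ === Object.prototype); // false
console.log(Object.keys(fromNumber).length);            // 0
console.log(Object.keys(computed));                     // [ '__proto__' ]
console.log(computed.__proto__);                        // 42
```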

pizlonator · 2016-03-02T22:06:39Z

Good catch!

I agree with your analysis.

-Filip


Rubber stamped by Saam Barati.

I wrote some code like this while working on WebAssembly/design#573. I thought I'd add it as a benchmark since it stresses things that we may not have good bench coverage for.

* js/regress/script-tests/string-transcoding.js: Added.
(decodeUTF8):
(encodeUTF8):
(arraysEqual):
(arrayToString):
(setHeader):
(print):
(tryArray):
(doSteps):
* js/regress/string-transcoding-expected.txt: Added.
* js/regress/string-transcoding.html: Added.

git-svn-id: http://svn.webkit.org/repository/webkit/trunk@197465 268f45cc-cd09-0410-ab3c-d52691b4dbfc

Added a link to http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html. Simplified the decoding algorithm thanks to Luke's feedback.

lukewagner · 2016-03-04T15:59:22Z

lgtm

lukewagner · 2016-04-15T19:50:20Z

Oops, looks like this never merged. Any objections to merging now?

jfbastien · 2016-04-15T20:17:34Z

lgtm as well, let's merge!

@jf

* Prettify section names * Restructure encoding of function signatures * Revert "[Binary 11] Update the version number to 0xB." * Leave index space for growing the number of base types * Comments addressed * clarify how export/import names convert to JS strings (#569) (#573) * When embedded in the web, clarify how export/import names convert to JS strings (#569) * Fixes suggested by @jf * Address more feedback Added a link to http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html. Simplified the decoding algorithm thanks to Luke's feedback. * Access to proprietary APIs apart from HTML5 (#656) * comments

@jf

* Merge pull request #648 from WebAssembly/current_memory Add current_memory operator * Reorder section size field (#639) * Prettify section names (#638) * Extensible encoding of function signatures (#640) * Prettify section names * Restructure encoding of function signatures * Revert "[Binary 11] Update the version number to 0xB." * Leave index space for growing the number of base types * Comments addressed * clarify how export/import names convert to JS strings (#569) (#573) * When embedded in the web, clarify how export/import names convert to JS strings (#569) * Fixes suggested by @jf * Address more feedback Added a link to http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html. Simplified the decoding algorithm thanks to Luke's feedback. * Access to proprietary APIs apart from HTML5 (#656) * comments * Merge pull request #641 from WebAssembly/postorder_opcodes Postorder opcodes * fix some text that seems to be in the wrong order (#670) * Clarify that br_table has a branch argument (#664) * Add explicit argument counts (#672) * Add explicit arities * Rename * Replace uint8 with varint7 in form field (#662) This needs to be variable-length.

When embedded in the web, clarify how export/import names convert to JS strings (#569)

621df25

jfbastien reviewed Mar 1, 2016

Fixes suggested by @jf

34149d8

lukewagner reviewed Mar 2, 2016

Address more feedback

2ffcb67

Added a link to http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html. Simplified the decoding algorithm thanks to Luke's feedback.

jfbastien merged commit 04c63fb into master Apr 15, 2016

jfbastien deleted the pizlonator-function-names-1 branch April 15, 2016 20:17
