Skip to content

clarify how export/import names convert to JS strings (#569) #573

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Apr 15, 2016

Conversation

pizlonator
Copy link
Contributor

No description provided.


A WebAssembly module imports and exports functions. WebAssembly names functions
using arbitrary-length byte sequences. The null character is permitted inside
WebAssembly function names. The most natural Web representation of a mapping of
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"the null character and invalid Unicode code points are permitted ..."

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with what this is trying to say, but it's a bit confusing since WebAssembly's notion of names are not Unicode - they're just bytes.

How about: "any 8-bit values are permitted in a WebAssembly name, including the null byte and byte sequences that don't correspond to any Unicode code point regardless of encoding"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed your phrasing is better.

@jfbastien
Copy link
Member

lgtm besides comments.

@ghost
Copy link

ghost commented Mar 1, 2016

Alternative suggestion: mangle any non ucs16 sequences so that all valid wasm names work on the web. Perhaps break to an escaped format if it fails the already documented conversion.

@pizlonator
Copy link
Contributor Author

To give this serious consideration, I think you’d have to propose a specific mangling, and then provide reasoning about whether that mangling round-trips. That’s gets quite weird. It’s not obvious how to safely escape things while providing 1-1 mapping between JS UCS16 sequences and wasm byte sequences. The biggest risk if we go this route is that we get some aspect of this wrong with respect to either Unicode itself or some browsers implementation of it. Also, it will take more work for us, the spec writers, to write down and agree to an escaping format that satisfies all of the constraints that we might be interested in.

On the other hand, the likelihood that you’d choose to import or export something that has a name whose byte sequence doesn’t transcode is extremely low. It’s probably not something anyone would want to do. Most clients will either use a UTF8 encoding of some Unicode string (in which case they’re fine) or they’ll use ASCII (also fine). Therefore, making that unlikely-and-undesirable thing be an error saves the spec, and all implementors of the spec, from having to worry about escaping.

-Filip

On Mar 1, 2016, at 1:21 PM, JSStats [email protected] wrote:

Alternative suggestion: mangle any non ucs16 sequences so that all valid wasm names work on the web. Perhaps break to an escaped format if it fails the already documented conversion.


Reply to this email directly or view it on GitHub #573 (comment).

@jfbastien
Copy link
Member

Any escaped sequence would presumably also be a valid export/import name, giving us a different problem where escaped name is the same as a confusingly-named export. I don't think it's really worth it :)

@ghost
Copy link

ghost commented Mar 1, 2016

Good points, thank you.

@AndrewScheidecker
Copy link

You could make the mangling bijective, but it would require mangling otherwise valid UCS-16 strings (e.g. by expanding each byte into two hexits).

But I don't think it's worth distinguishing a web and non-web environment for this. Why not just say import/export names must be valid UTF-8 strings?

@pizlonator
Copy link
Contributor Author

On Mar 1, 2016, at 1:55 PM, Andrew Scheidecker [email protected] wrote:

You could make the mangling bijective, but it would require mangling otherwise valid UCS-16 strings (e.g. by expanding each byte into two hexits).

But I don't think it's worth distinguishing a web and non-web environment for this. Why not just say import/export names must be valid UTF-8 strings?

This is an interesting suggestion. I think that it’s easier on clients and implementors if this is a web-only constraint.

Non-web implementers may have no other need for any kind of Unicode stuff. I think that this would be the only mention of Unicode in WebAssembly. A non-Web client wishing to implement accurate verification (i.e. if fails exactly when the spec it says it must fail) would have to do Unicode logic only in this one place, and only to support a verification rule that only helps the web.

On the other hand, the web embedding scenario will necessarily perform verifications that non-web clients have no need for. We’ll probably have to also add rules for what kind of object the imports object can be, though these rules may be implicit (for example if we say that the imports object is queried at module instantiation time then we’re mandating that all things named by the imports section are accessible via [[Get]] or somesuch). We’ll definitely have additional Web-specific export rules. For example I just realized that we probably want to prohibit exporting a function named “proto” because that that would probably make the world burn. Again, it would be weird if this was a rule that non-web users would have to follow.

Since we will anyway have to have Web-specific verification, and since the non-Web case is probably happy with names being sequences of bytes, I think that the approach I’ve proposed is least bad.

-Filip


Reply to this email directly or view it on GitHub #573 (comment).

@jfbastien
Copy link
Member

Non-web implementers may have no other need for any kind of Unicode stuff. I think that this would be the only mention of Unicode in WebAssembly. A non-Web client wishing to implement accurate verification (i.e. if fails exactly when the spec it says it must fail) would have to do Unicode logic only in this one place, and only to support a verification rule that only helps the web.

Agreed, that's what we'd discussed to justify our approach. Add it to Rationale.md?

@lukewagner
Copy link
Member

Agreed on keeping the utf8 (and any other) requirement a Web-only thing.

(As for __proto__, it seems to be possible to Object.defineProperty(o, '__proto__', ...) (and shadow the __proto__ accessor property defined on Object.prototype). So I'm not aware of any special need to exclude this particular case, devilish though it may be)

@pizlonator
Copy link
Contributor Author

Oh that’s true!

Opinions on whether proto should be allowed as a function name when embedding in the web?

(I vote “no”. It would cause too much agony, and I can’t imagine a well-behaved client wanting to do it.)

-Filip

On Mar 1, 2016, at 2:11 PM, Luke Wagner [email protected] wrote:

Agreed on keeping the utf8 (and any other) requirement a Web-only thing.

(As for proto, it seems to be possible to Object.defineProperty(o, 'proto', ...) (and shadow the proto accessor property defined on Object.prototype). So I'm not aware of any special need to exclude this particular case, devilish though it may be)


Reply to this email directly or view it on GitHub #573 (comment).

@lukewagner
Copy link
Member

It's hard to argue "yes" with any conviction, but unless we can find a case where it breaks or bothers some particular engine or tool (I think SM used to implement __proto__ in a particularly magic way that might've had problems, but no longer), I'd vote "yes" purely on the basis of avoiding adding special cases.

@pizlonator
Copy link
Contributor Author

In JSC, the following evaluates to true:

({__proto__: 42}).__proto__ == Object.prototype

It also evaluates to true in Firefox. I didn’t try others.

That’s pretty weird! It means that a function exported as proto will not be accessible if the exports object is created in a manner that is logically equivalent to what a JS developer would reasonably expect (i.e. literal construction).

For this reason, I think that a special case that causes verification failure for proto is consistent with having verification failure for byte sequences that cannot be converted to JS strings: the idea is that if you would not have been able to easily reference the function name from JS, then you probably messed up.

-Filip

On Mar 1, 2016, at 2:36 PM, Luke Wagner [email protected] wrote:

It's hard to argue "yes" with any conviction, but unless we can find a case where it breaks or bothers some particular engine or tool (I think SM used to implement proto in a particularly magic way that might've had problems, but no longer), I'd vote "yes" purely on the basis of avoiding adding special cases.


Reply to this email directly or view it on GitHub #573 (comment).

@lukewagner
Copy link
Member

Ah interesting. So object literals must not do define-property but rather something set-property-esque. We currently use define-property to build our wasm export object (set-property could trigger setters!) so

js> var code = wasmTextToBinary('(module (func (result i32) (i32.const 42)) (export "__proto__" 0))'));
js> var o = wasmEval(code);
js> o.__proto__();
42

works, but I guess that's not your point. It's weird, no doubt, but I'm wondering if people will thank us for rejecting this one particular string because there isn't an equivalent object literal. I guess that's something asm.js couldn't polyfill ;) Likely people will never notice, so if you're really worried about this causing problems, I'm fine rejecting it; we could always allow it later if it caused people trouble.

var result = decodeURIComponent(escape(string));

// Check for errors. This will throw if 'result' contains bad characters.
encodeURIComponent(result);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to figure out what byte sequences will not throw during decodeURIComponent(escape(string)) but will throw in encodeURIComponent(result). I tried both invalid utf8 byte sequences and also bad high/low surrogate pairs but decodeURIComponent catches them all. This seems to be implied by the table at the bottom of 18.2.6.1.2 and following notes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah hah, and poking around more, it would appear that it's simplify invalid utf8 to store a code point in the surrogate range [0xd800, 0xdc00] (I had been thinking that utf8 wouldn't care about surrogates, but I guess this is a concession to utf16 baked into utf8?), so your definition here is precisely what we want (if we want to reuse our existing Utf8ToUtf16 conversion routines which I very much do. lgtm!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I should add: it has the right behavior, but I'd still like to know if the encodeURIComponent is necessary or if decodeURIComponent catches it all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

decodeURIComponent should throw. The reason why I didn’t just rely on that is that some of the public documentation documents encodeURIComponent as throwing but doesn’t say that decodeURIComponent should throw. But having read the spec (both ES5.1 and ES6), I no longer think that the encodeURIComponent call is necessary.

I think that we can remove the encodeURIComponent call. Objections?

-Filip

On Mar 2, 2016, at 7:26 AM, Luke Wagner [email protected] wrote:

In Web.md #573 (comment):

+strings. A WebAssembly module may fail validation on the Web if it imports or
+exports functions whose names do not transcode cleanly to UTF-16 according to
+the following conversion algorithm, assuming that the WebAssembly name is in a
+Uint8Array called array:
+
+```
+function convertToJSString(array)
+{

  • // Perform the actual conversion.
  • var string = "";
  • for (var i = 0; i < array.length; ++i)
  • string += String.fromCharCode(array[i]);
  • var result = decodeURIComponent(escape(string));
  • // Check for errors. This will throw if 'result' contains bad characters.
  • encodeURIComponent(result);
    I should add: it has the right behavior, but I'd still like to know if the encodeURIComponent is necessary or if decodeURIComponent catches it all.


Reply to this email directly or view it on GitHub https://github.com/WebAssembly/design/pull/573/files#r54737759.

@rossberg
Copy link
Member

rossberg commented Mar 2, 2016

On 1 March 2016 at 23:47, pizlonator [email protected] wrote:

In JSC, the following evaluates to true:

({proto: 42}).proto == Object.prototype

It also evaluates to true in Firefox. I didn’t try others.

You just fell into one of JavaScript's many corner case traps. This only
yields true because you used 42, which is not an object, nor is it
converted to one. Instead, it gets replaced by Object.prototype (see
https://tc39.github.io/ecma262/#sec-__proto__-property-names-in-object-initializers,
but don't ask why, this is JavaScript). In contrast,

({__proto__: {}}).__proto__ == Object.prototype

will yield false.

Also note that this is a special case in the semantics of object literals
only, which is not something that Wasm needs to be concerned with. Other
than that, and the presence of Object.prototype.__proto__ (all of which
is Annex B crap anyway), there is no special treatment of the name
__proto__ in ES6. In particular, Object.defineProperty, the core
primitive for adding properties, has no awareness of it.
Hence, I see no reason for any exceptions regarding __proto__. In fact, I
would advise against it.

@pizlonator
Copy link
Contributor Author

Good catch!

I agree with your analysis.

-Filip

On Mar 2, 2016, at 6:13 AM, rossberg-chromium [email protected] wrote:

On 1 March 2016 at 23:47, pizlonator [email protected] wrote:

In JSC, the following evaluates to true:

({proto: 42}).proto == Object.prototype

It also evaluates to true in Firefox. I didn’t try others.

You just fell into one of JavaScript's many corner case traps. This only
yields true because you used 42, which is not an object, nor is it
converted to one. Instead, it gets replaced by Object.prototype (see
https://tc39.github.io/ecma262/#sec-__proto__-property-names-in-object-initializers,
but don't ask why, this is JavaScript). In contrast,

({__proto__: {}}).__proto__ == Object.prototype

will yield false.

Also note that this is a special case in the semantics of object literals
only, which is not something that Wasm needs to be concerned with. Other
than that, and the presence of Object.prototype.__proto__ (all of which
is Annex B crap anyway), there is no special treatment of the name
__proto__ in ES6. In particular, Object.defineProperty, the core
operation for adding properties, has no special awareness of it.

Hence, I see no reason for any exceptions regarding __proto__. In fact, I
would advise against it.

Reply to this email directly or view it on GitHub #573 (comment).

kisg pushed a commit to paul99/webkit-mips that referenced this pull request Mar 3, 2016
Rubber stamped by Saam Barati.

I wrote some code like this while working on
WebAssembly/design#573. I thought I'd add it as a benchmark since
it stresses things that we may not have good bench coverage for.

* js/regress/script-tests/string-transcoding.js: Added.
(decodeUTF8):
(encodeUTF8):
(arraysEqual):
(arrayToString):
(setHeader):
(print):
(tryArray):
(doSteps):
* js/regress/string-transcoding-expected.txt: Added.
* js/regress/string-transcoding.html: Added.



git-svn-id: http://svn.webkit.org/repository/webkit/trunk@197465 268f45cc-cd09-0410-ab3c-d52691b4dbfc
Added a link to http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html.  Simplified the decoding algorithm thanks to Luke's feedback.
@lukewagner
Copy link
Member

lgtm

bertogg pushed a commit to Igalia/webkit that referenced this pull request Mar 22, 2016
Rubber stamped by Saam Barati.

I wrote some code like this while working on
WebAssembly/design#573. I thought I'd add it as a benchmark since
it stresses things that we may not have good bench coverage for.

* js/regress/script-tests/string-transcoding.js: Added.
(decodeUTF8):
(encodeUTF8):
(arraysEqual):
(arrayToString):
(setHeader):
(print):
(tryArray):
(doSteps):
* js/regress/string-transcoding-expected.txt: Added.
* js/regress/string-transcoding.html: Added.

git-svn-id: http://svn.webkit.org/repository/webkit/releases/WebKitGTK/webkit-2.12@197747 268f45cc-cd09-0410-ab3c-d52691b4dbfc
annulen pushed a commit to qtwebkit/qtwebkit that referenced this pull request Mar 23, 2016
Rubber stamped by Saam Barati.

I wrote some code like this while working on
WebAssembly/design#573. I thought I'd add it as a benchmark since
it stresses things that we may not have good bench coverage for.

* js/regress/script-tests/string-transcoding.js: Added.
(decodeUTF8):
(encodeUTF8):
(arraysEqual):
(arrayToString):
(setHeader):
(print):
(tryArray):
(doSteps):
* js/regress/string-transcoding-expected.txt: Added.
* js/regress/string-transcoding.html: Added.

git-svn-id: http://svn.webkit.org/repository/webkit/releases/WebKitGTK/webkit-2.12@197747 268f45cc-cd09-0410-ab3c-d52691b4dbfc
@lukewagner
Copy link
Member

Oops, looks like this never merged. Any objections to merging now?

@jfbastien
Copy link
Member

lgtm as well, let's merge!

@jfbastien jfbastien merged commit 04c63fb into master Apr 15, 2016
@jfbastien jfbastien deleted the pizlonator-function-names-1 branch April 15, 2016 20:17
rossberg added a commit that referenced this pull request Apr 19, 2016
* Prettify section names

* Restructure encoding of function signatures

* Revert "[Binary 11] Update the version number to 0xB."

* Leave index space for growing the number of base types

* Comments addressed

* clarify how export/import names convert to JS strings (#569) (#573)

* When embedded in the web, clarify how export/import names convert to JS strings (#569)

* Fixes suggested by @jf

* Address more feedback

Added a link to http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html.  Simplified the decoding algorithm thanks to Luke's feedback.

* Access to proprietary APIs apart from HTML5 (#656)

* comments
lukewagner pushed a commit that referenced this pull request Apr 28, 2016
* Prettify section names

* Restructure encoding of function signatures

* Revert "[Binary 11] Update the version number to 0xB."

* Leave index space for growing the number of base types

* Comments addressed

* clarify how export/import names convert to JS strings (#569) (#573)

* When embedded in the web, clarify how export/import names convert to JS strings (#569)

* Fixes suggested by @jf

* Address more feedback

Added a link to http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html.  Simplified the decoding algorithm thanks to Luke's feedback.

* Access to proprietary APIs apart from HTML5 (#656)

* comments
lukewagner added a commit that referenced this pull request Apr 29, 2016
* Merge pull request #648 from WebAssembly/current_memory

Add current_memory operator

* Reorder section size field (#639)

* Prettify section names (#638)

* Extensible encoding of function signatures (#640)

* Prettify section names

* Restructure encoding of function signatures

* Revert "[Binary 11] Update the version number to 0xB."

* Leave index space for growing the number of base types

* Comments addressed

* clarify how export/import names convert to JS strings (#569) (#573)

* When embedded in the web, clarify how export/import names convert to JS strings (#569)

* Fixes suggested by @jf

* Address more feedback

Added a link to http://monsur.hossa.in/2012/07/20/utf-8-in-javascript.html.  Simplified the decoding algorithm thanks to Luke's feedback.

* Access to proprietary APIs apart from HTML5 (#656)

* comments

* Merge pull request #641 from WebAssembly/postorder_opcodes

Postorder opcodes

* fix some text that seems to be in the wrong order (#670)

* Clarify that br_table has a branch argument (#664)

* Add explicit argument counts (#672)

* Add explicit arities

* Rename

* Replace uint8 with varint7 in form field (#662)

This needs to be variable-length.
ryanhaddad pushed a commit to WebKit/WebKit that referenced this pull request Dec 22, 2020
Rubber stamped by Saam Barati.

I wrote some code like this while working on
WebAssembly/design#573. I thought I'd add it as a benchmark since
it stresses things that we may not have good bench coverage for.

* js/regress/script-tests/string-transcoding.js: Added.
(decodeUTF8):
(encodeUTF8):
(arraysEqual):
(arrayToString):
(setHeader):
(print):
(tryArray):
(doSteps):
* js/regress/string-transcoding-expected.txt: Added.
* js/regress/string-transcoding.html: Added.



Canonical link: https://commits.webkit.org/173015@main
git-svn-id: https://svn.webkit.org/repository/webkit/trunk@197465 268f45cc-cd09-0410-ab3c-d52691b4dbfc
JonWBedard pushed a commit to WebKit/WebKit that referenced this pull request Sep 30, 2022
Rubber stamped by Saam Barati.

I wrote some code like this while working on
WebAssembly/design#573. I thought I'd add it as a benchmark since
it stresses things that we may not have good bench coverage for.

* js/regress/script-tests/string-transcoding.js: Added.
(decodeUTF8):
(encodeUTF8):
(arraysEqual):
(arrayToString):
(setHeader):
(print):
(tryArray):
(doSteps):
* js/regress/string-transcoding-expected.txt: Added.
* js/regress/string-transcoding.html: Added.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants