Commit 4838b7a

Update to the requirement that names be UTF-8.
See WebAssembly/design#1016.
1 parent 3aa3270 commit 4838b7a

1 file changed: +29 -22 lines changed

WebAssembly.md

Lines changed: 29 additions & 22 deletions
@@ -191,7 +191,8 @@ Except when specified otherwise, all values are encoded in
 ### Additional Encoding Types
 
 0. [Array](#array)
-0. [String](#string)
+0. [Byte Sequence](#byte-sequence)
+0. [Identifier](#identifier)
 
 #### Array
 
@@ -200,12 +201,20 @@ followed by a sequence of that many elements of that type.
 
 > Array elements needn't all be the same size in some representations.
 
-#### String
+#### Byte Sequence
 
-A *string* is an [array] of bytes.
+A *byte sequence* is an [array] of bytes.
 
-> Strings in this context may contain arbitrary bytes and aren't required to be
-valid UTF-8 or any other format, and aren't required to be NUL-terminated.
+> Byte sequences may contain arbitrary bytes and aren't required to be
+[valid UTF-8] or any other format.
+
+#### Identifier
+
+An *identifier* is a [byte sequence] which is [valid UTF-8].
+
+> Identifiers may contain NUL characters, aren't required to be NUL-terminated,
+aren't required to be normalized, and aren't required to be marked with a BOM
+(though they aren't prohibited from containing a BOM).
 
 ### Value Types
 
@@ -358,8 +367,8 @@ initializers.
 
 Modules contain a version [varuint32].
 
-Modules also contain a sequence of sections. Each section has a [string] *name*
-and associated data.
+Modules also contain a sequence of sections. Each section has an [identifier]
+*name* and associated data.
 
 **Validation:**
 - The version index is required to be equal to `0xc`.
@@ -435,8 +444,8 @@ An *import* consists of:
 
 | Field Name | Type | Description |
 | --------------- | -------------------- | ---------------------------------------- |
-| `module_name` | [string] | the name of the module to import from |
-| `export_name` | [string] | the name of the export in that module |
+| `module_name` | [identifier] | the name of the module to import from |
+| `export_name` | [identifier] | the name of the export in that module |
 | `kind` | [external kind] | the kind of import |
 
 If `kind` is `Function`, the following fields are appended.
@@ -566,7 +575,7 @@ An *export* consists of:
 
 | Field Name | Type | Description |
 | --------------- | ------------------ | --------------------------------------- |
-| `name` | [string] | field name |
+| `name` | [identifier] | field name |
 | `kind` | [external kind] | the kind of export |
 | `index` | [varuint32] | an index into an [index space] |
 
@@ -677,7 +686,7 @@ A *data initializer* consists of:
 | --------------- | -------------------------------- | --------------------------------------------------- |
 | `index` | [varuint32] | a [linear memory index](#linear-memory-index-space) |
 | `offset` | [instantiation-time initializer] | the index of the byte in memory to start at |
-| `data` | [string] | data to initialize the contents of linear memory |
+| `data` | [byte sequence] | data to initialize the contents of linear memory |
 
 It describes data to be loaded into the linear memory identified by the index in
 the [linear-memory index space] during
@@ -702,8 +711,8 @@ the [linear-memory index space] during
 The Names Section consists of an [array] of function name descriptors, which
 each describe names for the function with the corresponding index in the
 [function index space] and which consist of:
-- the function name, a [string].
-- the names of the locals in the function, an [array] of [strings].
+- the function name, an [identifier].
+- the names of the locals in the function, an [array] of [identifiers].
 
 The Names Section doesn't change execution semantics and malformed constructs,
 such as out-of-bounds indices, in this section cause the section to be ignored,
@@ -718,11 +727,6 @@ human-readable format in a browser or other development environment, the names
 in this section are to be used as the names of functions and locals in the
 [text format].
 
-TODO: Should the names in this section be required to be valid UTF-8 strings?
-This section isn't used during normal validation or execution, so it's off the
-"hot path" and is only used during debugging, to present strings to humans, so
-it might make sense.
-
 ### Module Index Spaces
 
 Module Index Spaces are abstract mappings from indices, starting from zero, to
@@ -2664,8 +2668,9 @@ being the value of the linear-memory space's initial size field is created,
 added to the instance, and initialized to all zeros. For a linear-memory import,
 storage for the array is already allocated.
 
-The contents of the [Data Section] are loaded into the byte array. Each [string]
-is loaded into linear memory starting at its associated start offset value.
+The contents of the [Data Section] are loaded into the byte array. Each
+[byte sequence] is loaded into linear memory starting at its associated start
+offset value.
 
 **Trap:** Dynamic Resource Exhaustion, if dynamic resources are insufficient to
 support creation of the array.
@@ -2863,11 +2868,14 @@ TODO: Figure out what to say about the text format.
 [boolean]: #booleans
 [byte]: #bytes
 [bytes]: #bytes
+[byte sequence]: #byte-sequence
 [call-stack resources]: #call-stack-resources
 [effective address]: #effective-address
 [external kind]: #external-kinds)
 [false]: #booleans
 [Floor and Ceiling Functions]: https://en.wikipedia.org/wiki/Floor_and_ceiling_functions
+[identifier]: #identifier
+[identifiers]: #identifier
 [index space]: #module-index-spaces
 [instantiation-time initializer]: #instantiation-time-initializers
 [KiB]: https://en.wikipedia.org/wiki/Kibibyte
@@ -2888,8 +2896,6 @@ TODO: Figure out what to say about the text format.
 [shifted]: https://en.wikipedia.org/wiki/Logical_shift
 [sign-extended]: https://en.wikipedia.org/wiki/Sign_extension
 [signature kind]: #signature-kinds
-[string]: #string
-[strings]: #string
 [table]: #tables
 [table element type]: #table-element-type
 [text format]: #text-format
@@ -2904,6 +2910,7 @@ TODO: Figure out what to say about the text format.
 [two's complement sum]: https://en.wikipedia.org/wiki/Two%27s_complement#Addition
 [value type]: #value-types
 [uint32]: #primitive-type-encodings
+[valid UTF-8]: https://encoding.spec.whatwg.org/#utf-8-decode-without-bom-or-fail
 [varuint1]: #primitive-type-encodings
 [varuint7]: #primitive-type-encodings
 [varuint32]: #primitive-type-encodings
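
The Identifier definition added in this commit (a byte sequence that is valid UTF-8, with NUL characters and a BOM permitted) can be illustrated with a minimal Python sketch. The helper name `is_identifier` is hypothetical and not part of the spec, and the sketch assumes that Python's strict `utf-8` codec accepts and rejects the same inputs as the WHATWG "UTF-8 decode without BOM or fail" algorithm linked from `[valid UTF-8]`.

```python
def is_identifier(data: bytes) -> bool:
    """Return True if `data` would qualify as an identifier.

    Hypothetical helper: an identifier is taken to be any byte sequence
    that decodes as UTF-8 without errors. No normalization, BOM, or
    NUL-termination checks are applied, matching the note in the diff.
    """
    try:
        # Strict decoding fails on malformed sequences, overlong
        # encodings, and encoded surrogates. The plain "utf-8" codec
        # does not strip a leading BOM; it decodes it as U+FEFF.
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    for sample in (b"main", b"a\x00b", b"\xef\xbb\xbfname", b"\xff\xfe"):
        print(sample, is_identifier(sample))
```

Under this sketch, fields typed `[identifier]` (section names, import and export names, Names Section entries) would be checked this way, while the `data` field of a data initializer, typed `[byte sequence]`, would be accepted without any such check.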
