Commit c5ee34e

TimothyGu authored and addaleax committed
encoding: rudimentary TextDecoder support w/o ICU
Also split up the tests.

Backport-PR-URL: #14786
Backport-Reviewed-By: Anna Henningsen <[email protected]>
PR-URL: #14489
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Refael Ackermann <[email protected]>
1 parent a781bb4 commit c5ee34e

8 files changed, +428 -268 lines changed

doc/api/errors.md

Lines changed: 7 additions & 0 deletions
@@ -712,6 +712,12 @@ only used in the [WHATWG URL API][] for strict compliance with the specification
 native Node.js APIs, `func(undefined)` and `func()` are treated identically, and
 the [`ERR_INVALID_ARG_TYPE`][] error code may be used instead.
 
+<a id="ERR_NO_ICU"></a>
+### ERR_NO_ICU
+
+Used when an attempt is made to use features that require [ICU][], while
+Node.js is not compiled with ICU support.
+
 <a id="ERR_SOCKET_ALREADY_BOUND"></a>
 ### ERR_SOCKET_ALREADY_BOUND
 Used when an attempt is made to bind a socket that has already been bound.
@@ -795,6 +801,7 @@ are most likely an indication of a bug within Node.js itself.
 [`new URLSearchParams(iterable)`]: url.html#url_constructor_new_urlsearchparams_iterable
 [`process.on('uncaughtException')`]: process.html#process_event_uncaughtexception
 [`process.send()`]: process.html#process_process_send_message_sendhandle_options_callback
+[ICU]: intl.html#intl_internationalization_support
 [Node.js Error Codes]: #nodejs-error-codes
 [V8's stack trace API]: https://github.com/v8/v8/wiki/Stack-Trace-API
 [WHATWG URL API]: url.html#url_the_whatwg_url_api
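
The `ERR_NO_ICU` entry added in this file describes an error raised when an ICU-only feature is requested on a build without ICU. Below is a minimal sketch of how calling code might guard against it. It assumes the thrown error exposes a `code` property equal to `'ERR_NO_ICU'` and that `process.versions.icu` is defined only on ICU-enabled builds. `tryCreateFatalDecoder` is a hypothetical helper name, not part of Node.js.

```javascript
'use strict';
const { TextDecoder } = require('util');

// Hypothetical helper (not a Node.js API): returns a fatal-mode decoder when
// the build supports it, or null when the constructor reports ERR_NO_ICU.
function tryCreateFatalDecoder(encoding = 'utf-8') {
  try {
    return new TextDecoder(encoding, { fatal: true });
  } catch (err) {
    // Assumption: the error carries code === 'ERR_NO_ICU' on non-ICU builds.
    if (err && err.code === 'ERR_NO_ICU') return null;
    throw err; // unrelated failure; let the caller see it
  }
}

// Assumption: process.versions.icu is a string only when ICU is compiled in.
const hasIcu = typeof process.versions.icu === 'string';
const decoder = tryCreateFatalDecoder();
console.log(`ICU compiled in: ${hasIcu}; fatal decoder available: ${decoder !== null}`);
```

Returning `null` rather than rethrowing lets the caller fall back to a non-fatal decoder, which this commit makes available even without ICU.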

doc/api/intl.md

Lines changed: 1 addition & 1 deletion
@@ -52,7 +52,7 @@ option:
 | [WHATWG URL Parser][] | partial (no IDN support) | full | full | full
 | [`require('buffer').transcode()`][] | none (function does not exist) | full | full | full
 | [REPL][] | partial (inaccurate line editing) | full | full | full
-| [`require('util').TextDecoder`][] | none (class does not exist) | partial/full (depends on OS) | partial (Unicode-only) | full
+| [`require('util').TextDecoder`][] | partial (basic encodings support) | partial/full (depends on OS) | partial (Unicode-only) | full
 
 *Note*: The "(not locale-aware)" designation denotes that the function carries
 out its operation just like the non-`Locale` version of the function, if one
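
Because the changed table row says `TextDecoder` support now ranges from "partial" to "full" depending on the build, code that needs a particular encoding may want to probe for it at runtime. The following is a rough sketch that assumes the constructor throws for any label the current build does not support. `probeEncodings` is a hypothetical helper name, and the labels passed to it are only examples.

```javascript
'use strict';
const { TextDecoder } = require('util');

// Hypothetical helper (not a Node.js API): maps each label to true when this
// build accepts it and false when the constructor throws.
function probeEncodings(labels) {
  const result = {};
  for (const label of labels) {
    try {
      new TextDecoder(label);
      result[label] = true;
    } catch (err) {
      result[label] = false;
    }
  }
  return result;
}

// 'utf-8' and 'utf-16le' are documented as available on every build;
// the other two depend on how Node.js was built.
const support = probeEncodings(['utf-8', 'utf-16le', 'utf-16be', 'shift_jis']);
console.log(support);
```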

doc/api/util.md

Lines changed: 39 additions & 22 deletions
@@ -544,7 +544,7 @@ added: v8.0.0
 A Symbol that can be used to declare custom promisified variants of functions,
 see [Custom promisified functions][].
 
-### Class: util.TextDecoder
+## Class: util.TextDecoder
 <!-- YAML
 added: v8.3.0
 -->
@@ -563,23 +563,33 @@ while (buffer = getNextChunkSomehow()) {
 string += decoder.decode(); // end-of-stream
 ```
 
-#### WHATWG Supported Encodings
+### WHATWG Supported Encodings
 
 Per the [WHATWG Encoding Standard][], the encodings supported by the
 `TextDecoder` API are outlined in the tables below. For each encoding,
-one or more aliases may be used. Support for some encodings is enabled
-only when Node.js is using the full ICU data (see [Internationalization][]).
-`util.TextDecoder` is `undefined` when ICU is not enabled during build.
+one or more aliases may be used.
 
-##### Encodings Supported By Default
+Different Node.js build configurations support different sets of encodings.
+While a very basic set of encodings is supported even on Node.js builds without
+ICU enabled, support for some encodings is provided only when Node.js is built
+with ICU and using the full ICU data (see [Internationalization][]).
+
+#### Encodings Supported Without ICU
 
 | Encoding | Aliases |
 | ----------- | --------------------------------- |
-| `'utf8'` | `'unicode-1-1-utf-8'`, `'utf-8'` |
-| `'utf-16be'`| |
+| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
 | `'utf-16le'`| `'utf-16'` |
 
-##### Encodings Requiring Full-ICU
+#### Encodings Supported by Default (With ICU)
+
+| Encoding | Aliases |
+| ----------- | --------------------------------- |
+| `'utf-8'` | `'unicode-1-1-utf-8'`, `'utf8'` |
+| `'utf-16le'`| `'utf-16'` |
+| `'utf-16be'`| |
+
+#### Encodings Requiring Full ICU Data
 
 | Encoding | Aliases |
 | ----------------- | -------------------------------- |
@@ -621,13 +631,14 @@ only when Node.js is using the full ICU data (see [Internationalization][]).
 *Note*: The `'iso-8859-16'` encoding listed in the [WHATWG Encoding Standard][]
 is not supported.
 
-#### new TextDecoder([encoding[, options]])
+### new TextDecoder([encoding[, options]])
 
 * `encoding` {string} Identifies the `encoding` that this `TextDecoder` instance
 supports. Defaults to `'utf-8'`.
 * `options` {Object}
 * `fatal` {boolean} `true` if decoding failures are fatal. Defaults to
-`false`.
+`false`. This option is only supported when ICU is enabled (see
+[Internationalization][]).
 * `ignoreBOM` {boolean} When `true`, the `TextDecoder` will include the byte
 order mark in the decoded result. When `false`, the byte order mark will
 be removed from the output. This option is only used when `encoding` is
@@ -636,7 +647,7 @@ is not supported.
 Creates an new `TextDecoder` instance. The `encoding` may specify one of the
 supported encodings or an alias.
 
-#### textDecoder.decode([input[, options]])
+### textDecoder.decode([input[, options]])
 
 * `input` {ArrayBuffer|DataView|TypedArray} An `ArrayBuffer`, `DataView` or
 Typed Array instance containing the encoded data.
@@ -652,49 +663,55 @@ internally and emitted after the next call to `textDecoder.decode()`.
 If `textDecoder.fatal` is `true`, decoding errors that occur will result in a
 `TypeError` being thrown.
 
-#### textDecoder.encoding
+### textDecoder.encoding
 
-* Value: {string}
+* {string}
 
 The encoding supported by the `TextDecoder` instance.
 
-#### textDecoder.fatal
+### textDecoder.fatal
 
-* Value: {boolean}
+* {boolean}
 
 The value will be `true` if decoding errors result in a `TypeError` being
 thrown.
 
-#### textDecoder.ignoreBOM
+### textDecoder.ignoreBOM
 
-* Value: {boolean}
+* {boolean}
 
 The value will be `true` if the decoding result will include the byte order
 mark.
 
-### Class: util.TextEncoder
+## Class: util.TextEncoder
 <!-- YAML
 added: v8.3.0
 -->
 
 > Stability: 1 - Experimental
 
 An implementation of the [WHATWG Encoding Standard][] `TextEncoder` API. All
-instances of `TextEncoder` only support `UTF-8` encoding.
+instances of `TextEncoder` only support UTF-8 encoding.
 
 ```js
 const encoder = new TextEncoder();
 const uint8array = encoder.encode('this is some data');
 ```
 
-#### textEncoder.encode([input])
+### textEncoder.encode([input])
 
 * `input` {string} The text to encode. Defaults to an empty string.
 * Returns: {Uint8Array}
 
-UTF-8 Encodes the `input` string and returns a `Uint8Array` containing the
+UTF-8 encodes the `input` string and returns a `Uint8Array` containing the
 encoded bytes.
 
+### textDecoder.encoding
+
+* {string}
+
+The encoding supported by the `TextEncoder` instance. Always set to `'utf-8'`.
+
 ## Deprecated APIs
 
 The following APIs have been deprecated and should no longer be used. Existing
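
The documentation changes above cover alias resolution, streaming decode, and the `ignoreBOM` option. The sketch below exercises those three behaviors using only the encodings listed as available without ICU, so it should not depend on the build configuration. The variable names are illustrative, and the byte values are ordinary UTF-8 rather than anything taken from this commit's tests.

```javascript
'use strict';
const { TextDecoder, TextEncoder } = require('util');

// Aliases resolve to the canonical encoding name reported by `.encoding`.
const aliasNames = ['utf8', 'unicode-1-1-utf-8', 'utf-16']
  .map((label) => new TextDecoder(label).encoding);
console.log(aliasNames);

// Streaming: the euro sign is three bytes in UTF-8, so splitting it across
// two chunks forces the decoder to buffer the incomplete sequence.
const euro = new TextEncoder().encode('\u20AC');
const decoder = new TextDecoder(); // defaults to 'utf-8'
let text = '';
text += decoder.decode(euro.subarray(0, 2), { stream: true }); // incomplete
text += decoder.decode(euro.subarray(2), { stream: true }); // completes it
text += decoder.decode(); // end-of-stream flush
console.log(text);

// ignoreBOM: false (the default) strips a leading byte order mark,
// true keeps it in the decoded string.
const withBom = new Uint8Array([0xEF, 0xBB, 0xBF, 0x68, 0x69]); // BOM + "hi"
const stripped = new TextDecoder('utf-8').decode(withBom);
const kept = new TextDecoder('utf-8', { ignoreBOM: true }).decode(withBom);
console.log(stripped.length, kept.length);
```

Passing `{ stream: true }` on every chunk and finishing with an argument-less `decode()` mirrors the end-of-stream pattern shown in the existing `TextDecoder` example.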
