
Commit 21babb0

Chris Cho authored
DOCSP-20056: UTF-8 validation options (#280)
1 parent 6f142fd commit 21babb0

4 files changed, +123 -1 lines changed


source/fundamentals.txt

Lines changed: 1 addition & 0 deletions
@@ -20,5 +20,6 @@ Fundamentals
    /fundamentals/gridfs
    /fundamentals/time-series
    /fundamentals/typescript
+   /fundamentals/utf8-validation

 .. include:: /includes/fundamentals-sections.rst
Lines changed: 120 additions & 0 deletions
@@ -0,0 +1,120 @@
.. _nodejs-utf-8-validation:

================
UTF-8 Validation
================

.. default-domain:: mongodb

.. contents:: On this page
   :local:
   :backlinks: none
   :depth: 2
   :class: singlecol

Overview
--------

In this guide, you can learn how to enable or disable the {+driver-short+}'s
**UTF-8** validation feature. UTF-8 is a character encoding specification
that ensures compatibility and consistent presentation across most operating
systems, applications, and language character sets.

If you *enable* validation, the driver throws an error when it attempts to
convert data that contains invalid UTF-8 characters. Validation adds
processing overhead because the driver must check the data.

If you *disable* validation, your application avoids the validation
overhead, but cannot guarantee consistent presentation of invalid UTF-8 data.

The driver enables UTF-8 validation by default. It checks documents for any
characters that are not encoded in a valid UTF-8 format when it transfers data
between your application and MongoDB.

.. note::

   The current version of the {+driver-short+} automatically substitutes
   invalid UTF-8 characters with valid replacement characters before
   validation when you send data to MongoDB. Therefore, validation
   only throws an error when the setting is enabled and the driver
   receives invalid UTF-8 document data from MongoDB.
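
The behavior described in the note resembles how standard UTF-8 encoders and decoders treat malformed input. The sketch below uses the ``TextEncoder`` and ``TextDecoder`` classes built into Node.js rather than the driver's own serializer, so read it as an analogy under that assumption, not as the driver's implementation. Encoding a string that contains a lone surrogate yields the replacement character U+FFFD, while strict decoding of malformed bytes throws.

```javascript
// Illustration with Node.js built-ins; this is not the driver's internal code.

// A lone surrogate is not a valid Unicode scalar value, so a JavaScript
// string that contains one cannot be encoded to UTF-8 as-is.
const encoded = new TextEncoder().encode('\uD800');

// The encoder substitutes U+FFFD (bytes EF BF BD) instead of failing.
const substituted = Buffer.from(encoded).toString('hex');

// Bytes read from elsewhere can still be malformed: 0xC3 starts a
// two-byte sequence, but 0x28 is not a valid continuation byte.
const invalidBytes = Uint8Array.from([0xc3, 0x28]);

// A lenient decoder replaces the malformed byte with U+FFFD.
const lenient = new TextDecoder('utf-8').decode(invalidBytes);

// A strict decoder throws on the same input.
let strictError = null;
try {
  new TextDecoder('utf-8', { fatal: true }).decode(invalidBytes);
} catch (error) {
  strictError = error;
}

console.log(substituted, JSON.stringify(lenient), strictError instanceof TypeError);
```

The strict decoder is loosely comparable to reading with validation enabled, and the lenient decoder to reading with it disabled.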

Read the sections below to learn how to set UTF-8 validation using the
{+driver-short+}.

.. _nodejs-specify-utf-8-validation:

Specify the UTF-8 Validation Setting
------------------------------------

You can specify whether the driver should perform UTF-8 validation by
defining the ``enableUtf8Validation`` setting in the options parameter
when you create a client, reference a database or collection, or call a
CRUD operation. If you omit the setting, the driver enables UTF-8 validation.

The following code examples show how to disable UTF-8 validation on the
client, database, collection, or CRUD operation:

.. code-block:: javascript

   // disable UTF-8 validation on the client
   new MongoClient('<connection uri>', { enableUtf8Validation: false });

   // disable UTF-8 validation on the database
   client.db('<database name>', { enableUtf8Validation: false });

   // disable UTF-8 validation on the collection
   db.collection('<collection name>', { enableUtf8Validation: false });

   // disable UTF-8 validation on a specific operation call
   await collection.findOne({ title: 'Cam Jansen' }, { enableUtf8Validation: false });

If your application reads invalid UTF-8 from MongoDB while the
``enableUtf8Validation`` option is enabled, the driver throws a ``BSONError``
that contains the following message:

.. code-block:: none

   Invalid UTF-8 string in BSON document
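
One way an application might react to this error is sketched below. The helper ``findOneWithFallback`` and the stub collection are hypothetical names introduced for illustration and are not part of the driver. The sketch assumes only what this guide states: the error message contains the text shown above, and ``findOne`` accepts ``enableUtf8Validation`` in its options.

```javascript
// Hypothetical helper: retry a read without UTF-8 validation when the
// driver reports invalid UTF-8. Not part of the driver API.
const INVALID_UTF8_MESSAGE = 'Invalid UTF-8 string in BSON document';

async function findOneWithFallback(collection, filter) {
  try {
    // First attempt uses the inherited setting (validation on by default).
    return { doc: await collection.findOne(filter), validated: true };
  } catch (error) {
    // Rethrow anything that is not the UTF-8 validation failure.
    if (!(error instanceof Error) || !error.message.includes(INVALID_UTF8_MESSAGE)) {
      throw error;
    }
    // Second attempt disables validation for this one operation only.
    const doc = await collection.findOne(filter, { enableUtf8Validation: false });
    return { doc, validated: false };
  }
}

// Stub standing in for a real collection so the sketch runs without a server.
const stubCollection = {
  async findOne(filter, options = {}) {
    if (options.enableUtf8Validation !== false) {
      throw new Error(INVALID_UTF8_MESSAGE);
    }
    return { title: filter.title };
  },
};

const fallbackResult = findOneWithFallback(stubCollection, { title: 'Cam Jansen' });
fallbackResult.then((result) => console.log(result));
```

Whether to retry with validation disabled is a design choice. The ``validated`` flag in this sketch lets the caller decide how to treat a document that was read without the check.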

.. _nodejs-utf-8-validation-scope:

Set the Validation Scope
~~~~~~~~~~~~~~~~~~~~~~~~

The ``enableUtf8Validation`` setting automatically applies to the object
instance on which you set it and to any objects created by calls on that
instance.

For example, if you include the option on the call to instantiate a database
object, any collection instance you construct from that object inherits
the setting. Any operations you call on that collection instance also
inherit the setting.

.. code-block:: javascript

   const database = client.db('books', { enableUtf8Validation: false });

   // The collection inherits the disabled UTF-8 validation setting from the database
   const collection = database.collection('mystery');

   // CRUD operation runs with UTF-8 validation disabled
   await collection.findOne({ title: 'Encyclopedia Brown' });

You can override the setting at any level of scope by including it when
constructing the object instance or when calling an operation.

For example, if you disable validation on the collection object, you can
override the setting in individual CRUD operation calls on that
collection.

.. code-block:: javascript

   const collection = database.collection('mystery', { enableUtf8Validation: false });

   // CRUD operation runs with UTF-8 validation enabled
   await collection.findOne({ title: 'Trixie Belden' }, { enableUtf8Validation: true });

   // CRUD operation runs with UTF-8 validation disabled
   await collection.findOne({ title: 'Enola Holmes' });
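
The inheritance and override rules in this section amount to "the most specific setting wins". The following sketch models that rule with a hypothetical ``resolveUtf8Validation`` function. It illustrates the documented behavior under that reading and does not reproduce the driver's actual option-resolution code.

```javascript
// Hypothetical model of the scope rules; not the driver's implementation.
// Each argument is the options object passed at that level (or undefined).
function resolveUtf8Validation(clientOpts, dbOpts, collectionOpts, operationOpts) {
  // Check from the most specific scope to the least specific one.
  const levels = [operationOpts, collectionOpts, dbOpts, clientOpts];
  for (const opts of levels) {
    if (opts && typeof opts.enableUtf8Validation === 'boolean') {
      return opts.enableUtf8Validation;
    }
  }
  // If no level sets the option, validation is enabled by default.
  return true;
}

// Disabled on the database, inherited by the collection and the operation.
const inherited = resolveUtf8Validation({}, { enableUtf8Validation: false }, {}, {});

// Disabled on the collection, re-enabled for one operation.
const overridden = resolveUtf8Validation(
  {},
  {},
  { enableUtf8Validation: false },
  { enableUtf8Validation: true }
);

// Not set anywhere: the default applies.
const byDefault = resolveUtf8Validation();

console.log(inherited, overridden, byDefault); // false true true
```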

source/includes/fundamentals-sections.rst

Lines changed: 1 addition & 0 deletions
@@ -15,4 +15,5 @@ Fundamentals section:
 - :doc:`Store and Retrieve Large Files in MongoDB </fundamentals/gridfs>`
 - :doc:`Create and Query Time Series Collection </fundamentals/time-series>`
 - :doc:`Specify Type Parameters with TypeScript </fundamentals/typescript>`
+- :doc:`Specify UTF-8 Validation Settings </fundamentals/utf8-validation>`

source/whats-new.txt

Lines changed: 1 addition & 1 deletion
@@ -26,7 +26,7 @@ What's New in 4.3
 New features of the 4.3 Node.js driver release include:

 - SOCKS5 support
-- Option to disable UTF-8 validation
+- Option to :ref:`disable UTF-8 validation <nodejs-utf-8-validation>`
 - Type inference for nested documents

 .. _version-4.2:
