|
| 1 | +.. _nodejs-utf-8-validation: |
| 2 | + |
| 3 | +================ |
| 4 | +UTF-8 Validation |
| 5 | +================ |
| 6 | + |
| 7 | +.. default-domain:: mongodb |
| 8 | + |
| 9 | +.. contents:: On this page |
| 10 | + :local: |
| 11 | + :backlinks: none |
| 12 | + :depth: 2 |
| 13 | + :class: singlecol |
| 14 | + |
| 15 | +Overview |
| 16 | +-------- |
| 17 | + |
| 18 | +In this guide, you can learn how to enable or disable the {+driver-short+}'s |
| 19 | +**UTF-8** validation feature. UTF-8 is a character encoding specification |
| 20 | +that ensures compatibility and consistent presentation across most operating |
| 21 | +systems, applications, and language character sets. |
| 22 | + |
| 23 | +If you *enable* validation, the driver throws an error when it attempts to |
| 24 | +convert data that contains invalid UTF-8 characters. The validation adds |
| 25 | +processing overhead since it needs to check the data. |
| 26 | + |
| 27 | +If you *disable* validation, your application avoids the validation processing |
| 28 | +overhead, but cannot guarantee consistent presentation of invalid UTF-8 data. |
| 29 | + |
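The trade-off described above can be illustrated without a database connection. The following is a minimal sketch that uses the ``TextDecoder`` global built into Node.js as an analogy for the two modes. It is not the driver's implementation, and the helper names ``decodeStrict`` and ``decodeLenient`` are hypothetical.

```javascript
// "caf" followed by 0xE9, a byte that is Latin-1 "é" but not valid UTF-8 on its own.
const invalidBytes = new Uint8Array([0x63, 0x61, 0x66, 0xe9]);

// Strict decoding (analogous to validation enabled): invalid input throws.
function decodeStrict(bytes) {
  return new TextDecoder('utf-8', { fatal: true }).decode(bytes);
}

// Lenient decoding (analogous to validation disabled): each invalid
// sequence is replaced with U+FFFD, the Unicode replacement character.
function decodeLenient(bytes) {
  return new TextDecoder('utf-8').decode(bytes);
}

let strictError = null;
try {
  decodeStrict(invalidBytes);
} catch (error) {
  strictError = error;
}

console.log(strictError instanceof TypeError); // true
console.log(decodeLenient(invalidBytes) === 'caf\uFFFD'); // true
```

The lenient path never fails, but the original byte is lost, which is why presentation of invalid data cannot be guaranteed.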
| 30 | +The driver enables UTF-8 validation by default. It checks documents for any |
| 31 | +characters that are not encoded in a valid UTF-8 format when it transfers data |
| 32 | +between your application and MongoDB. |
| 33 | + |
| 34 | +.. note:: |
| 35 | + |
| 36 | + The current version of the {+driver-short+} automatically substitutes |
| 37 | + invalid UTF-8 characters with alternate valid UTF-8 ones prior to |
| 38 | + validation when you send data to MongoDB. Therefore, the validation |
| 39 | + only throws an error when the setting is enabled and the driver |
| 40 | + receives invalid UTF-8 document data from MongoDB. |
| 41 | + |
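To see why outgoing data rarely fails validation, consider how JavaScript strings are encoded. The sketch below uses the built-in ``TextEncoder`` rather than the driver, so treat it as an illustration of the substitution behavior under that assumption, not as the driver's exact code path.

```javascript
// A lone high surrogate is a legal JavaScript string value,
// but it has no valid UTF-8 encoding.
const illFormed = 'bad\uD800';

// Encoding replaces the lone surrogate with U+FFFD (bytes EF BF BD).
const encoded = new TextEncoder().encode(illFormed);

// The resulting bytes are valid UTF-8, so strict decoding succeeds.
const roundTripped = new TextDecoder('utf-8', { fatal: true }).decode(encoded);

console.log(Array.from(encoded, (byte) => byte.toString(16)).join(' ')); // 62 61 64 ef bf bd
console.log(roundTripped === 'bad\uFFFD'); // true
```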
| 42 | +Read the sections below to learn how to set UTF-8 validation using the |
| 43 | +{+driver-short+}. |
| 44 | + |
| 45 | +.. _nodejs-specify-utf-8-validation: |
| 46 | + |
| 47 | +Specify the UTF-8 Validation Setting |
| 48 | +------------------------------------ |
| 49 | + |
| 50 | +You can specify whether the driver should perform UTF-8 validation by |
| 51 | +defining the ``enableUtf8Validation`` setting in the options parameter |
| 52 | +when you create a client, reference a database or collection, or call a |
| 53 | +CRUD operation. If you omit the setting, the driver enables UTF-8 validation. |
| 54 | + |
| 55 | +The following code examples show how to disable UTF-8 validation on the |
| 56 | +client, database, collection, or CRUD operation: |
| 57 | + |
| 58 | +.. code-block:: javascript |
| 59 | + |
| 60 | + // disable UTF-8 validation on the client |
| 61 | + new MongoClient('<connection uri>', { enableUtf8Validation: false }); |
| 62 | + |
| 63 | + // disable UTF-8 validation on the database |
| 64 | + client.db('<database name>', { enableUtf8Validation: false }); |
| 65 | + |
| 66 | + // disable UTF-8 validation on the collection |
| 67 | + db.collection('<collection name>', { enableUtf8Validation: false }); |
| 68 | + |
| 69 | + // disable UTF-8 validation on a specific operation call |
| 70 | + await collection.findOne({ title: 'Cam Jansen' }, { enableUtf8Validation: false }); |
| 71 | + |
| 72 | +If the driver reads invalid UTF-8 from MongoDB while the |
| 73 | +``enableUtf8Validation`` option is enabled, it throws a ``BSONError`` that |
| 74 | +contains the following message: |
| 75 | + |
| 76 | +.. code-block:: none |
| 77 | + |
| 78 | + Invalid UTF-8 string in BSON document |
| 79 | + |
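One possible way to handle this error is to retry the read with validation disabled. The following is a minimal sketch, assuming that matching on the message text shown above is acceptable for your application. The helper name ``findOneWithFallback`` is hypothetical and not part of the driver.

```javascript
// Message text taken from the error described above.
const INVALID_UTF8_MESSAGE = 'Invalid UTF-8 string in BSON document';

// Hypothetical helper: tries a validated read first, then falls back to an
// unvalidated read only when the failure is the invalid UTF-8 error.
async function findOneWithFallback(collection, filter) {
  try {
    return await collection.findOne(filter);
  } catch (error) {
    const isUtf8Error =
      error instanceof Error && error.message.includes(INVALID_UTF8_MESSAGE);
    if (!isUtf8Error) {
      throw error; // unrelated failures are not swallowed
    }
    console.warn('Invalid UTF-8 in result; retrying without validation');
    return collection.findOne(filter, { enableUtf8Validation: false });
  }
}

// Usage, given an existing collection object:
// const doc = await findOneWithFallback(collection, { title: 'Cam Jansen' });
```

Retrying costs a second round trip, so this pattern suits data that is only occasionally malformed. If invalid strings are common, disabling validation on the collection may be simpler.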
| 80 | +.. _nodejs-utf-8-validation-scope: |
| 81 | + |
| 82 | +Set the Validation Scope |
| 83 | +~~~~~~~~~~~~~~~~~~~~~~~~ |
| 84 | + |
| 85 | +The ``enableUtf8Validation`` setting applies to the object instance on which |
| 86 | +you set it and to any objects created by calls on that instance. |
| 87 | + |
| 88 | + |
| 89 | +For example, if you include the option on the call to instantiate a database |
| 90 | +object, any collection instance you construct from that object inherits |
| 91 | +the setting. Any operations you call on that collection instance also |
| 92 | +inherit the setting. |
| 93 | + |
| 94 | +.. code-block:: javascript |
| 95 | + |
| 96 | + const database = client.db('books', { enableUtf8Validation: false }); |
| 97 | + |
| 98 | + // The collection inherits the UTF-8 validation disabled setting from the database |
| 99 | + const collection = database.collection('mystery'); |
| 100 | + |
| 101 | + // CRUD operation runs with UTF-8 validation disabled |
| 102 | + await collection.findOne({ title: 'Encyclopedia Brown' }); |
| 103 | + |
| 104 | +You can override the setting at any level of scope by including it when |
| 105 | +constructing the object instance or when calling an operation. |
| 106 | + |
| 107 | +For example, if you disable validation on the collection object, you can |
| 108 | +override the setting in individual CRUD operation calls on that |
| 109 | +collection. |
| 110 | + |
| 111 | +.. code-block:: javascript |
| 112 | + |
| 113 | + const collection = database.collection('mystery', { enableUtf8Validation: false }); |
| 114 | + |
| 115 | + // CRUD operation runs with UTF-8 validation enabled |
| 116 | + await collection.findOne({ title: 'Trixie Belden' }, { enableUtf8Validation: true }); |
| 117 | + |
| 118 | + // CRUD operation runs with UTF-8 validation disabled |
| 119 | + await collection.findOne({ title: 'Enola Holmes' }); |
| 120 | + |
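The inheritance and override rules in this section amount to "the most specific setting wins". The sketch below models that lookup order with a hypothetical function, ``resolveUtf8Validation``. It is a mental model under that assumption and does not reflect the driver's internal code.

```javascript
// Hypothetical model: each argument is the options object passed at that
// level, or undefined if no options were given there.
function resolveUtf8Validation({ client, db, collection, operation } = {}) {
  // Check the most specific scope first.
  const levels = [operation, collection, db, client];
  for (const options of levels) {
    if (options && typeof options.enableUtf8Validation === 'boolean') {
      return options.enableUtf8Validation;
    }
  }
  return true; // validation is enabled when the setting is omitted everywhere
}

// Disabled on the database, inherited by the collection and operation.
console.log(resolveUtf8Validation({ db: { enableUtf8Validation: false } })); // false

// Disabled on the collection, overridden on a single operation.
console.log(
  resolveUtf8Validation({
    collection: { enableUtf8Validation: false },
    operation: { enableUtf8Validation: true },
  })
); // true
```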