perf: Use charCodeAt instead of TextEncoder for improved UTF string comparison performance.

### Operating System

MacOS Ventura 13.2.1

### Environment (if applicable)

Node.js 18.19.1, Chrome 131.0.6778.266

### Firebase SDK Version

11.3.0

### Firebase SDK Product(s)

Firestore

### Project Tooling

Browser react app, Node.js Express server, Electron app with React frontend.

### Detailed Problem Description

We've recently migrated a number of systems from firebase-admin to the firebase client SDKs (internal browser-based dashboards, cloud functions, electron background process, etc) and noticed a major reduction in performance for batch Firestore write operations.

This is the flame graph for a 500 document batch write operation that takes ~868ms:
<img width="1624" alt="Image" src="https://github.com/user-attachments/assets/80a309df-15b6-4422-82e8-46567eb9b647" />

Unfortunately, we have some operations that need to run hundreds of thousands of writes, and repeat this 500 document batch write many times over. All of this time adds up significantly!

Zooming in a little, we saw that the culprit appears to be a TextEncoder.encode() call from a function called `compareUtf8Strings` that 1) takes some time to encode the characters, and 2) is quickly thrown out, causing a significant amount of garbage collection.
<img width="1624" alt="Image" src="https://github.com/user-attachments/assets/e4b29f28-d2d8-4b3f-9e86-b76e4bad50af" />

The current implementation uses TextEncoder to compare strings as UTF-8 integers in byte order:
```js

/** Compare strings in UTF-8 encoded byte order */
export function compareUtf8Strings(left: string, right: string): number {
  // Convert the string to UTF-8 encoded bytes
  const encodedLeft = newTextEncoder().encode(left);
  const encodedRight = newTextEncoder().encode(right);

  for (let i = 0; i < Math.min(encodedLeft.length, encodedRight.length); i++) {
    const comparison = primitiveComparator(encodedLeft[i], encodedRight[i]);
    if (comparison !== 0) {
      return comparison;
    }
  }

  return primitiveComparator(encodedLeft.length, encodedRight.length);
}
```

A very small change to use charCodeAt instead means we avoid creating thousands of new TextEncoder objects:

```js
/** Compare strings in UTF-16 encoded byte order */
export function compareUtf16Strings(left: string, right: string): number {
  for (let i = 0; i < Math.min(left.length, right.length); i++) {
    const comparison = primitiveComparator(left.charCodeAt(i), right.charCodeAt(i));
    if (comparison !== 0) {
      return comparison;
    }
  }
  return primitiveComparator(left.length, right.length);
}
```

And reduce the runtime of this operation from 868ms to 66ms!

<img width="1624" alt="Image" src="https://github.com/user-attachments/assets/81a106f7-d196-4810-b9e3-55e22ef42d08" />

To get these perf improvements, this change introduces one small, but hopefully inconsequential, change to the way this comparator behaves: `TextEncoder` converts the string to a `UInt8Array`, and compares by looking at the individual UTF-8 encoded bytes, but `charCodeAt` fetches the character as UTF-16, represented by a standard JS 64 bit float.

On first blush, I don't see how this will introduce any practical changes to the return value of this function – please correct me if I'm wrong! I'm not sure if the library was comparing strings as UTF-8 for any particular reason (Mirroring underlying database behavior? Downstream perf implications?), but if not, this may be a very easy win for client side SDK performance.

I've opened up a PR with this change at https://github.com/firebase/firebase-js-sdk/pull/8779. I'm boarding a flight right now, so will make sure tests all pass still once I'm back the ground, but wanted to open up this issue to kick off discussion.

### Steps and code to reproduce issue

See above.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

perf: Use charCodeAt instead of TextEncoder for improved UTF string comparison performance. #8778

Operating System

Environment (if applicable)

Firebase SDK Version

Firebase SDK Product(s)

Project Tooling

Detailed Problem Description

Steps and code to reproduce issue

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

perf: Use charCodeAt instead of TextEncoder for improved UTF string comparison performance. #8778

Description

Operating System

Environment (if applicable)

Firebase SDK Version

Firebase SDK Product(s)

Project Tooling

Detailed Problem Description

Steps and code to reproduce issue

Activity

google-oss-bot commented on Feb 9, 2025

dlarocque commented on Feb 10, 2025

milaGGL commented on Feb 10, 2025

amiller-gh commented on Feb 11, 2025

dlarocque commented on Feb 11, 2025

amiller-gh commented on Feb 11, 2025

dlarocque commented on Feb 12, 2025

milaGGL commented on Feb 14, 2025

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Participants

Issue actions