Description
While working on a roachtest (#103228), I saw a RESTORE
fail because a couple of nodes went OOM. The backup was taken using the following command:
```sql
BACKUP INTO 'gs://cockroachdb-backup-testing/9_22.2.9-to-current_cluster_all-planned-and-executed-on-random-node_X4iV?AUTH=implicit' AS OF SYSTEM TIME '1684255935229011892.0000000000' WITH detached, encryption_passphrase = 'kvxN1Tmlwg0OesQw86rg8xjhsQdKBdHFZ7'
```
A few things worth noting about this backup (which may or may not be relevant):
- it was taken while the cluster was in a mixed-version state.
- an incremental backup was taken shortly (30s) after the full backup finished (for full logs, see [1]).
- both jobs were paused and resumed a couple of times while they ran.
- the backup does not include revision_history.
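The layers in the backup chain (and the window each one covers) can be inspected with SHOW BACKUP. A minimal sketch, assuming the `backup_type`, `start_time`, and `end_time` columns present in 22.2's SHOW BACKUP output:

```sql
-- Sketch: list the full and incremental layers in the chain and their
-- coverage windows (column names assumed from 22.2's SHOW BACKUP output).
SELECT DISTINCT backup_type, start_time, end_time
FROM [SHOW BACKUP LATEST IN 'gs://cockroach-tmp/backup_issue_22_2_oom/9_22.2.9-to-current_cluster_all-planned-and-executed-on-random-node_X4iV?AUTH=implicit'
      WITH encryption_passphrase = 'kvxN1Tmlwg0OesQw86rg8xjhsQdKBdHFZ7']
ORDER BY end_time;
```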
At a certain point in the test, we attempted to restore this backup on a 4-node cluster running v22.2.9. This failed because two of the nodes went OOM a few minutes after the RESTORE statement was issued.
This backup does not contain a lot of data. The biggest table has ~3GiB of data in it:
```shell
$ ./cockroach sql --insecure -e "SELECT database_name, parent_schema_name, object_name, size_bytes FROM [SHOW BACKUP LATEST IN 'gs://cockroach-tmp/backup_issue_22_2_oom/9_22.2.9-to-current_cluster_all-planned-and-executed-on-random-node_X4iV?AUTH=implicit' WITH check_files, encryption_passphrase = 'kvxN1Tmlwg0OesQw86rg8xjhsQdKBdHFZ7'] ORDER BY size_bytes DESC LIMIT 5"
                             database_name                             | parent_schema_name | object_name | size_bytes
-----------------------------------------------------------------------+--------------------+-------------+-------------
  tpcc                                                                 | public             | stock       | 3217512461
  tpcc                                                                 | public             | order_line  | 1858665525
  tpcc                                                                 | public             | customer    | 1848283823
  bank                                                                 | public             | bank        | 1310901452
  restore_1_22_2_9_to_current_database_bank_before_upgrade_in_22_2_9_1 | public             | bank        | 1310899634
```
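For context, the five objects listed above sum to well under the 14GiB of memory on each node; a quick shell sketch of the arithmetic:

```shell
# Sum the five size_bytes values shown above and convert bytes to GiB,
# just to substantiate that the backup is small relative to node memory.
printf '%s\n' 3217512461 1858665525 1848283823 1310901452 1310899634 \
  | awk '{sum += $1} END {printf "%.1f GiB\n", sum / (1024^3)}'
# -> 8.9 GiB
```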
More importantly, very similar backups in other tests can be successfully restored in 22.2, so I think something went wrong with this particular backup.
Reproduction
The issue can be reproduced very easily by attempting to restore this backup on a 22.2 cluster (I have since moved the backup to a bucket with a longer TTL [2]). It happens even on a completely empty cluster with no workloads running.
The commands below will create a node with 14GiB of memory, just like the nodes in the failed test.
```shell
$ roachprod create -n 1 $CLUSTER
$ roachprod stage $CLUSTER release v22.2.9
$ roachprod start $CLUSTER
$ roachprod ssh $CLUSTER
...
ubuntu@CLUSTER $ time ./cockroach sql --insecure -e "RESTORE FROM LATEST IN 'gs://cockroach-tmp/backup_issue_22_2_oom/9_22.2.9-to-current_cluster_all-planned-and-executed-on-random-node_X4iV?AUTH=implicit' WITH encryption_passphrase = 'kvxN1Tmlwg0OesQw86rg8xjhsQdKBdHFZ7';"
ERROR: connection lost.
ERROR: -e: unexpected EOF
Failed running "sql"

real    2m46.609s
user    0m0.623s
sys     0m0.280s
```
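To confirm that the connection drop really is the kernel OOM killer (rather than a crash) and to watch the memory growth, something like the following can be run on the node during the RESTORE. This is a diagnostic sketch, not part of the original repro; `dmesg` message formats vary by kernel and may require sudo:

```shell
# Convert an RSS value in KiB (as reported by ps) to GiB.
kib_to_gib() {
  awk -v kib="$1" 'BEGIN {printf "%.2f GiB\n", kib / 1048576}'
}

# 1. After the connection drops, look for oom-killer activity in the
#    kernel log (may need sudo; message format varies by kernel).
dmesg | grep -iE 'out of memory|oom-?kill' || true

# 2. While the RESTORE runs, sample the cockroach process's resident
#    set size every 5 seconds to catch the growth before the kill.
while pgrep -x cockroach > /dev/null; do
  kib_to_gib "$(ps -o rss= -C cockroach | head -n1)"
  sleep 5
done
```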
Finally, note that this does not happen on master or on v23.1.1.
[1] roachtest artifacts
[2] 9_22.2.9-to-current_cluster_all-planned-and-executed-on-random-node_X4iV
Jira issue: CRDB-28023