document vector indexes #19595

Merged
merged 9 commits into from
May 12, 2025
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
- {% include {{ page.version.version }}/sql/vector-batch-inserts.md %}
- Creating a vector index through a backfill disables mutations ([`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}), [`UPDATE`]({% link {{ page.version.version }}/update.md %}), [`DELETE`]({% link {{ page.version.version }}/delete.md %})) on the table. [#144443](https://github.com/cockroachdb/cockroach/issues/144443)
- `IMPORT INTO` is not supported on tables with vector indexes. You can import the vectors first and create the index after import is complete. [#145227](https://github.com/cockroachdb/cockroach/issues/145227)
- Only L2 distance (`<->`) searches are accelerated. [#144016](https://github.com/cockroachdb/cockroach/issues/144016)
Contributor:
What do you mean by "accelerated"? Should "accelerated" be "supported"?

Contributor Author:
Accelerated means that the index makes these types of searches faster. This is the correct term because VECTOR supports more operators than L2, but this is the only one that a vector index currently benefits.

- Index acceleration with filters is only supported if the filters match prefix columns. [#146145](https://github.com/cockroachdb/cockroach/issues/146145)
- Index recommendations are not provided for vector indexes. [#146146](https://github.com/cockroachdb/cockroach/issues/146146)
- Vector index queries may return incorrect results when the underlying table uses multiple column families. [#146046](https://github.com/cockroachdb/cockroach/issues/146046)
- Queries against a vector index may ignore filter conditions (e.g., a `WHERE` clause) when multiple vector indexes exist on the same `VECTOR` column, and one has a prefix column. [#146257](https://github.com/cockroachdb/cockroach/issues/146257)
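
To make the "accelerated" distinction discussed above concrete, here is a minimal sketch. It assumes vector indexes are enabled on the cluster (`feature.vector_index.enabled`); the table `items_demo`, its 3-dimensional column, and the literal vectors are hypothetical and chosen only for illustration.

{% include_cached copy-clipboard.html %}
~~~ sql
-- Hypothetical table: a 3-dimensional VECTOR column with a vector index.
CREATE TABLE items_demo (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    embedding VECTOR(3),
    VECTOR INDEX (embedding)
);

-- L2 distance (<->): the only operator that a vector index currently accelerates.
SELECT id FROM items_demo ORDER BY embedding <-> '[1.0, 2.0, 3.0]' LIMIT 5;

-- Cosine distance (<=>) is still a valid VECTOR operator, but per the
-- limitation above this query is not expected to benefit from the vector index.
SELECT id FROM items_demo ORDER BY embedding <=> '[1.0, 2.0, 3.0]' LIMIT 5;
~~~

Both queries should return results; the difference is whether the index can speed up the search, which you can check by running `EXPLAIN` on each statement.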
1 change: 1 addition & 0 deletions src/current/_includes/v25.2/misc/session-vars.md
@@ -90,6 +90,7 @@
| <a id="transaction-timeout"></a> `transaction_timeout` | Aborts an explicit [transaction]({% link {{ page.version.version }}/transactions.md %}) when it runs longer than the configured duration. Stored in milliseconds; can be expressed in milliseconds or as an [`INTERVAL`]({% link {{ page.version.version }}/interval.md %}). | `0` | Yes | Yes |
| <a id="troubleshooting_mode_enabled"></a> `troubleshooting_mode_enabled` | When enabled, avoid performing additional work on queries, such as collecting and emitting telemetry data. This session variable is particularly useful when the cluster is experiencing issues, unavailability, or failure. | `off` | Yes | Yes |
| <a id="use_declarative_schema_changer"></a> `use_declarative_schema_changer` | Whether to use the declarative schema changer for supported statements. | `on` | Yes | Yes |
| <a id="vector-search-beam-size"></a> `vector_search_beam_size` | The size of the vector search beam, which determines how many vector partitions are considered during query execution. For details, refer to [Tune vector indexes]({% link {{ page.version.version }}/vector-indexes.md %}#tune-vector-indexes). | `32` | Yes | Yes |
| <a id="vectorize"></a> `vectorize` | The vectorized execution engine mode. Options include `on` and `off`. For more details, see [Configure vectorized execution for CockroachDB]({% link {{ page.version.version }}/vectorized-execution.md %}#configure-vectorized-execution). | `on` | Yes | Yes |
| <a id="virtual_cluster_name"></a> `virtual_cluster_name` | The name of the virtual cluster that the SQL client is connected to. | Session-dependent | No | Yes |
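
As a sketch of how the new `vector_search_beam_size` variable might be used in a session: the value `64` below is an arbitrary illustration, not a recommendation. Because the beam size determines how many vector partitions are considered, a larger value presumably trades query speed for search accuracy.

{% include_cached copy-clipboard.html %}
~~~ sql
-- Inspect the current value (the default listed in the table is 32).
SHOW vector_search_beam_size;

-- Consider more vector partitions per query in this session only.
-- 64 is an arbitrary example value.
SET vector_search_beam_size = 64;

-- Return to the default.
RESET vector_search_beam_size;
~~~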

6 changes: 6 additions & 0 deletions src/current/_includes/v25.2/sidebar-data/schema-design.json
@@ -93,6 +93,12 @@
"urls": [
"/${VERSION}/spatial-indexes.html"
]
},
{
"title": "Vector Indexes",
"urls": [
"/${VERSION}/vector-indexes.html"
]
}
]
},
1 change: 1 addition & 0 deletions src/current/_includes/v25.2/sql/vector-batch-inserts.md
@@ -0,0 +1 @@
Large batch inserts of [`VECTOR`]({% link {{ page.version.version }}/vector.md %}) values can degrade performance. Avoid batching when inserting vectors. For an example, refer to [Create and query a vector index]({% link {{ page.version.version }}/vector-indexes.md %}#create-and-query-a-vector-index).
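
One reading of this guidance is to issue one row per `INSERT` statement rather than a single large multi-row statement. The following is a minimal sketch under that assumption; the table `items_demo` and its values are hypothetical.

{% include_cached copy-clipboard.html %}
~~~ sql
-- Hypothetical table with a 3-dimensional VECTOR column.
CREATE TABLE items_demo (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    embedding VECTOR(3)
);

-- Instead of one multi-row statement carrying many vectors, such as:
--   INSERT INTO items_demo (embedding) VALUES ('[...]'), ('[...]'), ...;
-- insert each vector in its own statement.
INSERT INTO items_demo (embedding) VALUES ('[0.1, 0.2, 0.3]');
INSERT INTO items_demo (embedding) VALUES ('[0.4, 0.5, 0.6]');
INSERT INTO items_demo (embedding) VALUES ('[0.7, 0.8, 0.9]');
~~~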
14 changes: 13 additions & 1 deletion src/current/v25.2/create-index.md
@@ -44,10 +44,11 @@ Parameter | Description
----------|------------
`UNIQUE` | Apply the [`UNIQUE` constraint]({% link {{ page.version.version }}/unique.md %}) to the indexed columns.<br><br>This causes the system to check for existing duplicate values on index creation. It also applies the `UNIQUE` constraint at the table level, so the system checks for duplicate values when inserting or updating data.
`INVERTED` | Create a [GIN index]({% link {{ page.version.version }}/inverted-indexes.md %}) on the schemaless data in the specified [`JSONB`]({% link {{ page.version.version }}/jsonb.md %}) column.<br><br> You can also use the PostgreSQL-compatible syntax `USING GIN`. For more details, see [GIN Indexes]({% link {{ page.version.version }}/inverted-indexes.md %}#creation).
`VECTOR` | Create a [vector index]({% link {{ page.version.version }}/vector-indexes.md %}) on the specified [`VECTOR`]({% link {{ page.version.version }}/vector.md %}) column.<br><br>For more details, refer to [Vector Indexes]({% link {{ page.version.version }}/vector-indexes.md %}).
`IF NOT EXISTS` | Create a new index only if an index of the same name does not already exist; if one does exist, do not return an error.
`opt_index_name`<br>`index_name` | The name of the index to create, which must be unique to its table and follow these [identifier rules]({% link {{ page.version.version }}/keywords-and-identifiers.md %}#identifiers).<br><br>If you do not specify a name, CockroachDB uses the format `<table>_<columns>_key/idx`. `key` indicates the index applies the `UNIQUE` constraint; `idx` indicates it does not. Example: `accounts_balance_idx`
`table_name` | The name of the table you want to create the index on.
`USING name` | An optional clause for compatibility with third-party tools. Accepted values for `name` are `btree`, `gin`, and `gist`, with `btree` for a standard secondary index, `gin` as the PostgreSQL-compatible syntax for a [GIN index](#create-gin-indexes), and `gist` for a [spatial index]({% link {{ page.version.version }}/spatial-indexes.md %}).
`USING name` | An optional clause for compatibility with third-party tools. Accepted values for `name` are `btree`, `gin`, and `gist`, with `btree` for a standard secondary index, `gin` as the PostgreSQL-compatible syntax for a [GIN index](#create-gin-indexes), `gist` for a [spatial index]({% link {{ page.version.version }}/spatial-indexes.md %}), and `cspann` for a [vector index]({% link {{ page.version.version }}/vector-indexes.md %}). `hnsw` is aliased to `cspann` for compatibility with [`pgvector`](https://github.com/pgvector/pgvector) syntax.
`name` | The name of the column you want to index. For [multi-region tables]({% link {{ page.version.version }}/multiregion-overview.md %}#table-localities), you can use the `crdb_region` column within the index in the event the original index may contain non-unique entries across multiple, unique regions.
`ASC` or `DESC`| Sort the column in ascending (`ASC`) or descending (`DESC`) order in the index. How columns are sorted affects query results, particularly when using `LIMIT`.<br><br>__Default:__ `ASC`
`STORING ...`| Store (but do not sort) each column whose name you include.<br><br>For information on when to use `STORING`, see [Store Columns](#store-columns). Note that columns that are part of a table's [`PRIMARY KEY`]({% link {{ page.version.version }}/primary-key.md %}) cannot be specified as `STORING` columns in secondary indexes on the table.<br><br>`COVERING` and `INCLUDE` are aliases for `STORING` and work identically.
@@ -175,6 +176,17 @@ CREATE INDEX geom_idx_2
Most users should not change the default spatial index settings. There is a risk that you will get worse performance by changing the default settings. For more information, see [Spatial indexes]({% link {{ page.version.version }}/spatial-indexes.md %}).
{{site.data.alerts.end}}

### Create vector indexes

{% include_cached new-in.html version="v25.2" %} You can create [vector indexes]({% link {{ page.version.version }}/vector-indexes.md %}) on [`VECTOR`]({% link {{ page.version.version }}/vector.md %}) columns.

To create a vector index on a `VECTOR` column named `embedding`:

{% include_cached copy-clipboard.html %}
~~~ sql
CREATE VECTOR INDEX ON items (embedding);
~~~
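
The parameters table also lists `cspann` as an accepted `USING` value, and the known limitations mention prefix columns. The following sketch shows how those two forms might look. It assumes vector indexes are enabled; the table `items_by_customer`, its `customer_id` column, and the literal values are hypothetical.

{% include_cached copy-clipboard.html %}
~~~ sql
-- Hypothetical table with a column that can serve as an index prefix.
CREATE TABLE items_by_customer (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    customer_id INT,
    embedding VECTOR(3)
);

-- USING form: cspann is the accepted value for vector indexes.
CREATE INDEX ON items_by_customer USING cspann (embedding);

-- Vector index with a prefix column; the VECTOR column comes last.
CREATE VECTOR INDEX ON items_by_customer (customer_id, embedding);

-- A filter on the prefix column combined with an L2 distance search.
SELECT id
  FROM items_by_customer
 WHERE customer_id = 42
 ORDER BY embedding <-> '[1.0, 2.0, 3.0]'
 LIMIT 5;
~~~

Note that creating two vector indexes on the same `VECTOR` column, one with a prefix and one without, is the situation described by the known limitation in [#146257](https://github.com/cockroachdb/cockroach/issues/146257); in practice, you would likely keep only one of them.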

### Store columns

Storing a column improves the performance of queries that retrieve (but do not filter) its values.
73 changes: 53 additions & 20 deletions src/current/v25.2/create-table.md
@@ -259,7 +259,7 @@ For performance recommendations on primary keys, see the [Schema Design: Create

### Create a table with secondary and GIN indexes

In this example, we create secondary and GIN indexes during table creation. Secondary indexes allow efficient access to data with keys other than the primary key. [GIN indexes]({% link {{ page.version.version }}/inverted-indexes.md %}) allow efficient access to the schemaless data in a [`JSONB`]({% link {{ page.version.version }}/jsonb.md %}) column.
In this example, we create secondary and GIN indexes during table creation. [Secondary indexes]({% link {{ page.version.version }}/schema-design-indexes.md %}) allow efficient access to data with keys other than the primary key. [GIN indexes]({% link {{ page.version.version }}/inverted-indexes.md %}) allow efficient access to the schemaless data in a [`JSONB`]({% link {{ page.version.version }}/jsonb.md %}) column.

{% include_cached copy-clipboard.html %}
~~~ sql
@@ -285,29 +285,61 @@ In this example, we create secondary and GIN indexes during table creation. Seco
~~~

~~~
table_name | index_name | non_unique | seq_in_index | column_name | direction | storing | implicit
-------------+----------------+------------+--------------+-------------+-----------+---------+-----------
vehicles | index_status | true | 1 | status | ASC | false | false
vehicles | index_status | true | 2 | city | ASC | false | true
vehicles | index_status | true | 3 | id | ASC | false | true
vehicles | ix_vehicle_ext | true | 1 | ext | ASC | false | false
vehicles | ix_vehicle_ext | true | 2 | city | ASC | false | true
vehicles | ix_vehicle_ext | true | 3 | id | ASC | false | true
vehicles | vehicles_pkey | false | 1 | city | ASC | false | false
vehicles | vehicles_pkey | false | 2 | id | ASC | false | false
vehicles | vehicles_pkey | false | 3 | type | N/A | true | false
vehicles | vehicles_pkey | false | 4 | owner_id | N/A | true | false
vehicles | vehicles_pkey | false | 5 | creation_time | N/A | true | false
vehicles | vehicles_pkey | false | 6 | status | N/A | true | false
vehicles | vehicles_pkey | false | 7 | current_location | N/A | true | false
vehicles | vehicles_pkey | false | 8 | ext | N/A | true | false
table_name | index_name | non_unique | seq_in_index | column_name | definition | direction | storing | implicit | visible | visibility
-------------+----------------+------------+--------------+------------------+------------------+-----------+---------+----------+---------+-------------
vehicles | index_status | t | 1 | status | status | ASC | f | f | t | 1
vehicles | index_status | t | 2 | city | city | ASC | f | t | t | 1
vehicles | index_status | t | 3 | id | id | ASC | f | t | t | 1
vehicles | ix_vehicle_ext | t | 1 | ext | ext | ASC | f | f | t | 1
vehicles | ix_vehicle_ext | t | 2 | city | city | ASC | f | t | t | 1
vehicles | ix_vehicle_ext | t | 3 | id | id | ASC | f | t | t | 1
vehicles | primary | f | 1 | city | city | ASC | f | f | t | 1
vehicles | primary | f | 2 | id | id | ASC | f | f | t | 1
vehicles | primary | f | 3 | type | type | N/A | t | f | t | 1
vehicles | primary | f | 4 | owner_id | owner_id | N/A | t | f | t | 1
vehicles | primary | f | 5 | creation_time | creation_time | N/A | t | f | t | 1
vehicles | primary | f | 6 | status | status | N/A | t | f | t | 1
vehicles | primary | f | 7 | current_location | current_location | N/A | t | f | t | 1
vehicles | primary | f | 8 | ext | ext | N/A | t | f | t | 1
(14 rows)
~~~

We also have other resources on indexes:
### Create a table with a vector index

- Create indexes for existing tables using [`CREATE INDEX`]({% link {{ page.version.version }}/create-index.md %}).
- [Learn more about indexes]({% link {{ page.version.version }}/indexes.md %}).
Enable vector indexes:

{% include_cached copy-clipboard.html %}
~~~ sql
SET CLUSTER SETTING feature.vector_index.enabled = true;
~~~

The following statement creates a table with a [`VECTOR`]({% link {{ page.version.version }}/vector.md %}) column, along with a [vector index]({% link {{ page.version.version }}/vector-indexes.md %}) that makes vector search efficient.

{% include_cached copy-clipboard.html %}
~~~ sql
CREATE TABLE items (
id uuid DEFAULT gen_random_uuid(),
embedding VECTOR (1536),
VECTOR INDEX (embedding)
);
~~~

{% include_cached copy-clipboard.html %}
~~~ sql
SHOW INDEX FROM items;
~~~

{% include_cached copy-clipboard.html %}
~~~
table_name | index_name | non_unique | seq_in_index | column_name | definition | direction | storing | implicit | visible | visibility
-------------+---------------------+------------+--------------+-------------+------------+-----------+---------+----------+---------+-------------
items | items_embedding_idx | t | 1 | embedding | embedding | ASC | f | f | t | 1
items | items_embedding_idx | t | 2 | rowid | rowid | ASC | f | t | t | 1
items | items_pkey | f | 1 | rowid | rowid | ASC | f | f | t | 1
items | items_pkey | f | 2 | id | id | N/A | t | f | t | 1
items | items_pkey | f | 3 | embedding | embedding | N/A | t | f | t | 1
(5 rows)
~~~

### Create a table with auto-generated unique row IDs

@@ -973,6 +1005,7 @@ To set `exclude_data_from_backup` on an existing table, see the [Exclude a table

## See also

- [`CREATE INDEX`]({% link {{ page.version.version }}/create-index.md %})
- [`INSERT`]({% link {{ page.version.version }}/insert.md %})
- [`ALTER TABLE`]({% link {{ page.version.version }}/alter-table.md %})
- [`DELETE`]({% link {{ page.version.version }}/delete.md %})
1 change: 1 addition & 0 deletions src/current/v25.2/indexes.md
@@ -158,6 +158,7 @@ For an example that uses unique indexes but applies to all indexes on `REGIONAL
- [GIN Indexes]({% link {{ page.version.version }}/inverted-indexes.md %})
- [Partial Indexes]({% link {{ page.version.version }}/partial-indexes.md %})
- [Spatial Indexes]({% link {{ page.version.version }}/spatial-indexes.md %})
- [Vector Indexes]({% link {{ page.version.version }}/vector-indexes.md %})
- [Hash-sharded Indexes]({% link {{ page.version.version }}/hash-sharded-indexes.md %})
- [Expression Indexes]({% link {{ page.version.version }}/expression-indexes.md %})
- [Select from a specific index]({% link {{ page.version.version }}/select-clause.md %}#select-from-a-specific-index)
4 changes: 4 additions & 0 deletions src/current/v25.2/known-limitations.md
@@ -12,6 +12,10 @@ docs_area: releases

This section describes newly identified limitations in CockroachDB {{ page.version.version }}.

### Vector indexes

{% include {{ page.version.version }}/known-limitations/vector-limitations.md %}

### JSONPath

{% include {{ page.version.version }}/known-limitations/jsonpath-limitations.md %}