Commit ab91500

document vector indexes (#19595)
* document vector indexes

---------

Co-authored-by: Florence Morris <[email protected]>
1 parent 19e613b commit ab91500

10 files changed, with 443 additions and 24 deletions
Lines changed: 8 additions & 0 deletions
@@ -0,0 +1,8 @@
1+
- {% include {{ page.version.version }}/sql/vector-batch-inserts.md %}
2+
- Creating a vector index through a backfill disables mutations ([`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}), [`UPDATE`]({% link {{ page.version.version }}/update.md %}), [`DELETE`]({% link {{ page.version.version }}/delete.md %})) on the table. [#144443](https://github.com/cockroachdb/cockroach/issues/144443)
3+
- `IMPORT INTO` is not supported on tables with vector indexes. You can import the vectors first and create the index after import is complete. [#145227](https://github.com/cockroachdb/cockroach/issues/145227)
4+
- Only L2 distance (`<->`) searches are accelerated. [#144016](https://github.com/cockroachdb/cockroach/issues/144016)
5+
- Index acceleration with filters is only supported if the filters match prefix columns. [#146145](https://github.com/cockroachdb/cockroach/issues/146145)
6+
- Index recommendations are not provided for vector indexes. [#146146](https://github.com/cockroachdb/cockroach/issues/146146)
7+
- Vector index queries may return incorrect results when the underlying table uses multiple column families. [#146046](https://github.com/cockroachdb/cockroach/issues/146046)
8+
- Queries against a vector index may ignore filter conditions (e.g., a `WHERE` clause) when multiple vector indexes exist on the same `VECTOR` column, and one has a prefix column. [#146257](https://github.com/cockroachdb/cockroach/issues/146257)
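As a rough illustration of the search shape that the limitations above describe as accelerated (an L2 distance search with `<->`, filtered on a prefix column), the following sketch may help. The `docs` table, its columns, the three-dimensional vectors, and the literal values are all hypothetical and chosen only for brevity. The sketch also assumes vector indexes have been enabled on the cluster.

~~~ sql
-- Hypothetical table for illustration only; names and dimensions are assumptions.
CREATE TABLE docs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    category STRING,
    embedding VECTOR(3)
);

-- Vector index with `category` as a prefix column.
CREATE VECTOR INDEX ON docs (category, embedding);

-- L2 distance (`<->`) search whose filter matches the prefix column.
SELECT id
FROM docs
WHERE category = 'news'
ORDER BY embedding <-> '[0.1, 0.2, 0.3]'
LIMIT 5;
~~~

A filter on a column that is not a prefix column of the index would, per the limitation above, not be expected to use index acceleration.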

src/current/_includes/v25.2/misc/session-vars.md

Lines changed: 1 addition & 0 deletions
@@ -90,6 +90,7 @@
9090
| <a id="transaction-timeout"></a> `transaction_timeout` | Aborts an explicit [transaction]({% link {{ page.version.version }}/transactions.md %}) when it runs longer than the configured duration. Stored in milliseconds; can be expressed in milliseconds or as an [`INTERVAL`]({% link {{ page.version.version }}/interval.md %}). | `0` | Yes | Yes |
9191
| <a id="troubleshooting_mode_enabled"></a> `troubleshooting_mode_enabled` | When enabled, avoid performing additional work on queries, such as collecting and emitting telemetry data. This session variable is particularly useful when the cluster is experiencing issues, unavailability, or failure. | `off` | Yes | Yes |
9292
| <a id="use_declarative_schema_changer"></a> `use_declarative_schema_changer` | Whether to use the declarative schema changer for supported statements. | `on` | Yes | Yes |
93+
| <a id="vector-search-beam-size"></a> `vector_search_beam_size` | The size of the vector search beam, which determines how many vector partitions are considered during query execution. For details, refer to [Tune vector indexes]({% link {{ page.version.version }}/vector-indexes.md %}#tune-vector-indexes). | `32` | Yes | Yes |
9394
| <a id="vectorize"></a> `vectorize` | The vectorized execution engine mode. Options include `on` and `off`. For more details, see [Configure vectorized execution for CockroachDB]({% link {{ page.version.version }}/vectorized-execution.md %}#configure-vectorized-execution). | `on` | Yes | Yes |
9495
| <a id="virtual_cluster_name"></a> `virtual_cluster_name` | The name of the virtual cluster that the SQL client is connected to. | Session-dependent | No | Yes |
9596
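As a minimal sketch of how `vector_search_beam_size` might be adjusted for a single session, the statements below use the standard session variable commands. The value `64` is an arbitrary illustration, not a recommendation.

~~~ sql
-- Consider more vector partitions per query in this session (illustrative value).
SET vector_search_beam_size = 64;

-- Confirm the current value.
SHOW vector_search_beam_size;

-- Return to the default.
RESET vector_search_beam_size;
~~~

Because the beam size determines how many partitions are considered, a larger value will likely increase the work done per query.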

src/current/_includes/v25.2/sidebar-data/schema-design.json

Lines changed: 6 additions & 0 deletions
@@ -93,6 +93,12 @@
9393
"urls": [
9494
"/${VERSION}/spatial-indexes.html"
9595
]
96+
},
97+
{
98+
"title": "Vector Indexes",
99+
"urls": [
100+
"/${VERSION}/vector-indexes.html"
101+
]
96102
}
97103
]
98104
},
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
1+
Large batch inserts of [`VECTOR`]({% link {{ page.version.version }}/vector.md %}) values can degrade performance. Avoid batching when inserting vectors. For an example, refer to [Create and query a vector index]({% link {{ page.version.version }}/vector-indexes.md %}#create-and-query-a-vector-index).
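One way to read this guidance is to insert one vector per statement instead of a single large multi-row `INSERT`. The sketch below assumes a hypothetical `docs` table with a three-dimensional `VECTOR` column. It is not the example from the linked page.

~~~ sql
-- Hypothetical table for illustration only.
CREATE TABLE docs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    embedding VECTOR(3)
);

-- One vector per statement, rather than one statement with many rows.
INSERT INTO docs (embedding) VALUES ('[0.1, 0.2, 0.3]');
INSERT INTO docs (embedding) VALUES ('[0.4, 0.5, 0.6]');
INSERT INTO docs (embedding) VALUES ('[0.7, 0.8, 0.9]');
~~~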

src/current/v25.2/create-index.md

Lines changed: 13 additions & 1 deletion
@@ -44,10 +44,11 @@ Parameter | Description
4444
----------|------------
4545
`UNIQUE` | Apply the [`UNIQUE` constraint]({% link {{ page.version.version }}/unique.md %}) to the indexed columns.<br><br>This causes the system to check for existing duplicate values on index creation. It also applies the `UNIQUE` constraint at the table level, so the system checks for duplicate values when inserting or updating data.
4646
`INVERTED` | Create a [GIN index]({% link {{ page.version.version }}/inverted-indexes.md %}) on the schemaless data in the specified [`JSONB`]({% link {{ page.version.version }}/jsonb.md %}) column.<br><br> You can also use the PostgreSQL-compatible syntax `USING GIN`. For more details, see [GIN Indexes]({% link {{ page.version.version }}/inverted-indexes.md %}#creation).
47+
`VECTOR` | Create a [vector index]({% link {{ page.version.version }}/vector-indexes.md %}) on the specified [`VECTOR`]({% link {{ page.version.version }}/vector.md %}) column.<br><br>For more details, refer to [Vector Indexes]({% link {{ page.version.version }}/vector-indexes.md %}).
4748
`IF NOT EXISTS` | Create a new index only if an index of the same name does not already exist; if one does exist, do not return an error.
4849
`opt_index_name`<br>`index_name` | The name of the index to create, which must be unique to its table and follow these [identifier rules]({% link {{ page.version.version }}/keywords-and-identifiers.md %}#identifiers).<br><br>If you do not specify a name, CockroachDB uses the format `<table>_<columns>_key/idx`. `key` indicates the index applies the `UNIQUE` constraint; `idx` indicates it does not. Example: `accounts_balance_idx`
4950
`table_name` | The name of the table you want to create the index on.
50-
`USING name` | An optional clause for compatibility with third-party tools. Accepted values for `name` are `btree`, `gin`, and `gist`, with `btree` for a standard secondary index, `gin` as the PostgreSQL-compatible syntax for a [GIN index](#create-gin-indexes), and `gist` for a [spatial index]({% link {{ page.version.version }}/spatial-indexes.md %}).
51+
`USING name` | An optional clause for compatibility with third-party tools. Accepted values for `name` are `btree`, `gin`, and `gist`, with `btree` for a standard secondary index, `gin` as the PostgreSQL-compatible syntax for a [GIN index](#create-gin-indexes), `gist` for a [spatial index]({% link {{ page.version.version }}/spatial-indexes.md %}), and `cspann` for a [vector index]({% link {{ page.version.version }}/vector-indexes.md %}). `hnsw` is aliased to `cspann` for compatibility with [`pgvector`](https://github.com/pgvector/pgvector) syntax.
5152
`name` | The name of the column you want to index. For [multi-region tables]({% link {{ page.version.version }}/multiregion-overview.md %}#table-localities), you can use the `crdb_region` column within the index in the event the original index may contain non-unique entries across multiple, unique regions.
5253
`ASC` or `DESC`| Sort the column in ascending (`ASC`) or descending (`DESC`) order in the index. How columns are sorted affects query results, particularly when using `LIMIT`.<br><br>__Default:__ `ASC`
5354
`STORING ...`| Store (but do not sort) each column whose name you include.<br><br>For information on when to use `STORING`, see [Store Columns](#store-columns). Note that columns that are part of a table's [`PRIMARY KEY`]({% link {{ page.version.version }}/primary-key.md %}) cannot be specified as `STORING` columns in secondary indexes on the table.<br><br>`COVERING` and `INCLUDE` are aliases for `STORING` and work identically.
@@ -175,6 +176,17 @@ CREATE INDEX geom_idx_2
175176
Most users should not change the default spatial index settings. There is a risk that you will get worse performance by changing the default settings. For more information, see [Spatial indexes]({% link {{ page.version.version }}/spatial-indexes.md %}).
176177
{{site.data.alerts.end}}
177178

179+
### Create vector indexes
180+
181+
{% include_cached new-in.html version="v25.2" %} You can create [vector indexes]({% link {{ page.version.version }}/vector-indexes.md %}) on [`VECTOR`]({% link {{ page.version.version }}/vector.md %}) columns.
182+
183+
To create a vector index on a `VECTOR` column named `embedding`:
184+
185+
{% include_cached copy-clipboard.html %}
186+
~~~ sql
187+
CREATE VECTOR INDEX ON items (embedding);
188+
~~~
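The `USING name` parameter described earlier suggests two further spellings. The exact statement forms below are an assumption inferred from that parameter description, not statements taken from this page. They reuse the `items` table and `embedding` column from the example above and are alternatives, so run only one of them.

~~~ sql
-- Assumed equivalent form using the `cspann` index type named in the parameter table.
CREATE INDEX ON items USING cspann (embedding);

-- Assumed pgvector-style form; `hnsw` is described as an alias for `cspann`.
CREATE INDEX ON items USING hnsw (embedding);
~~~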
189+
178190
### Store columns
179191

180192
Storing a column improves the performance of queries that retrieve (but do not filter) its values.

src/current/v25.2/create-table.md

Lines changed: 53 additions & 20 deletions
@@ -259,7 +259,7 @@ For performance recommendations on primary keys, see the [Schema Design: Create
259259

260260
### Create a table with secondary and GIN indexes
261261

262-
In this example, we create secondary and GIN indexes during table creation. Secondary indexes allow efficient access to data with keys other than the primary key. [GIN indexes]({% link {{ page.version.version }}/inverted-indexes.md %}) allow efficient access to the schemaless data in a [`JSONB`]({% link {{ page.version.version }}/jsonb.md %}) column.
262+
In this example, we create secondary and GIN indexes during table creation. [Secondary indexes]({% link {{ page.version.version }}/schema-design-indexes.md %}) allow efficient access to data with keys other than the primary key. [GIN indexes]({% link {{ page.version.version }}/inverted-indexes.md %}) allow efficient access to the schemaless data in a [`JSONB`]({% link {{ page.version.version }}/jsonb.md %}) column.
263263

264264
{% include_cached copy-clipboard.html %}
265265
~~~ sql
@@ -285,29 +285,61 @@ In this example, we create secondary and GIN indexes during table creation. Seco
285285
~~~
286286

287287
~~~
288-
table_name | index_name | non_unique | seq_in_index | column_name | direction | storing | implicit
289-
-------------+----------------+------------+--------------+-------------+-----------+---------+-----------
290-
vehicles | index_status | true | 1 | status | ASC | false | false
291-
vehicles | index_status | true | 2 | city | ASC | false | true
292-
vehicles | index_status | true | 3 | id | ASC | false | true
293-
vehicles | ix_vehicle_ext | true | 1 | ext | ASC | false | false
294-
vehicles | ix_vehicle_ext | true | 2 | city | ASC | false | true
295-
vehicles | ix_vehicle_ext | true | 3 | id | ASC | false | true
296-
vehicles | vehicles_pkey | false | 1 | city | ASC | false | false
297-
vehicles | vehicles_pkey | false | 2 | id | ASC | false | false
298-
vehicles | vehicles_pkey | false | 3 | type | N/A | true | false
299-
vehicles | vehicles_pkey | false | 4 | owner_id | N/A | true | false
300-
vehicles | vehicles_pkey | false | 5 | creation_time | N/A | true | false
301-
vehicles | vehicles_pkey | false | 6 | status | N/A | true | false
302-
vehicles | vehicles_pkey | false | 7 | current_location | N/A | true | false
303-
vehicles | vehicles_pkey | false | 8 | ext | N/A | true | false
288+
table_name | index_name | non_unique | seq_in_index | column_name | definition | direction | storing | implicit | visible | visibility
289+
-------------+----------------+------------+--------------+------------------+------------------+-----------+---------+----------+---------+-------------
290+
vehicles | index_status | t | 1 | status | status | ASC | f | f | t | 1
291+
vehicles | index_status | t | 2 | city | city | ASC | f | t | t | 1
292+
vehicles | index_status | t | 3 | id | id | ASC | f | t | t | 1
293+
vehicles | ix_vehicle_ext | t | 1 | ext | ext | ASC | f | f | t | 1
294+
vehicles | ix_vehicle_ext | t | 2 | city | city | ASC | f | t | t | 1
295+
vehicles | ix_vehicle_ext | t | 3 | id | id | ASC | f | t | t | 1
296+
vehicles | primary | f | 1 | city | city | ASC | f | f | t | 1
297+
vehicles | primary | f | 2 | id | id | ASC | f | f | t | 1
298+
vehicles | primary | f | 3 | type | type | N/A | t | f | t | 1
299+
vehicles | primary | f | 4 | owner_id | owner_id | N/A | t | f | t | 1
300+
vehicles | primary | f | 5 | creation_time | creation_time | N/A | t | f | t | 1
301+
vehicles | primary | f | 6 | status | status | N/A | t | f | t | 1
302+
vehicles | primary | f | 7 | current_location | current_location | N/A | t | f | t | 1
303+
vehicles | primary | f | 8 | ext | ext | N/A | t | f | t | 1
304304
(14 rows)
305305
~~~
306306

307-
We also have other resources on indexes:
307+
### Create a table with a vector index
308308

309-
- Create indexes for existing tables using [`CREATE INDEX`]({% link {{ page.version.version }}/create-index.md %}).
310-
- [Learn more about indexes]({% link {{ page.version.version }}/indexes.md %}).
309+
Enable vector indexes:
310+
311+
{% include_cached copy-clipboard.html %}
312+
~~~ sql
313+
SET CLUSTER SETTING feature.vector_index.enabled = true;
314+
~~~
315+
316+
The following statement creates a table with a [`VECTOR`]({% link {{ page.version.version }}/vector.md %}) column, along with a [vector index]({% link {{ page.version.version }}/vector-indexes.md %}) that makes vector search efficient.
317+
318+
{% include_cached copy-clipboard.html %}
319+
~~~ sql
320+
CREATE TABLE items (
321+
id uuid DEFAULT gen_random_uuid(),
322+
embedding VECTOR (1536),
323+
VECTOR INDEX (embedding)
324+
);
325+
~~~
326+
327+
{% include_cached copy-clipboard.html %}
328+
~~~ sql
329+
SHOW INDEX FROM items;
330+
~~~
331+
332+
{% include_cached copy-clipboard.html %}
333+
~~~
334+
table_name | index_name | non_unique | seq_in_index | column_name | definition | direction | storing | implicit | visible | visibility
335+
-------------+----------------------+------------+--------------+-------------+------------+-----------+---------+----------+---------+-------------
336+
items | items_embedding_idx | t | 1 | embedding | embedding | ASC | f | f | t | 1
337+
items | items_embedding_idx | t | 2 | rowid | rowid | ASC | f | t | t | 1
338+
items | items_pkey | f | 1 | rowid | rowid | ASC | f | f | t | 1
339+
items | items_pkey | f | 2 | id | id | N/A | t | f | t | 1
340+
items | items_pkey | f | 3 | embedding | embedding | N/A | t | f | t | 1
341+
(5 rows)
342+
~~~
311343

312344
### Create a table with auto-generated unique row IDs
313345

@@ -973,6 +1005,7 @@ To set `exclude_data_from_backup` on an existing table, see the [Exclude a table
9731005

9741006
## See also
9751007

1008+
- [`CREATE INDEX`]({% link {{ page.version.version }}/create-index.md %})
9761009
- [`INSERT`]({% link {{ page.version.version }}/insert.md %})
9771010
- [`ALTER TABLE`]({% link {{ page.version.version }}/alter-table.md %})
9781011
- [`DELETE`]({% link {{ page.version.version }}/delete.md %})

src/current/v25.2/indexes.md

Lines changed: 1 addition & 0 deletions
@@ -158,6 +158,7 @@ For an example that uses unique indexes but applies to all indexes on `REGIONAL
158158
- [GIN Indexes]({% link {{ page.version.version }}/inverted-indexes.md %})
159159
- [Partial Indexes]({% link {{ page.version.version }}/partial-indexes.md %})
160160
- [Spatial Indexes]({% link {{ page.version.version }}/spatial-indexes.md %})
161+
- [Vector Indexes]({% link {{ page.version.version }}/vector-indexes.md %})
161162
- [Hash-sharded Indexes]({% link {{ page.version.version }}/hash-sharded-indexes.md %})
162163
- [Expression Indexes]({% link {{ page.version.version }}/expression-indexes.md %})
163164
- [Select from a specific index]({% link {{ page.version.version }}/select-clause.md %}#select-from-a-specific-index)

src/current/v25.2/known-limitations.md

Lines changed: 4 additions & 0 deletions
@@ -12,6 +12,10 @@ docs_area: releases
1212

1313
This section describes newly identified limitations in CockroachDB {{ page.version.version }}.
1414

15+
### Vector indexes
16+
17+
{% include {{ page.version.version }}/known-limitations/vector-limitations.md %}
18+
1519
### JSONPath
1620

1721
{% include {{ page.version.version }}/known-limitations/jsonpath-limitations.md %}
