document vector indexes #19595

Open: @taroface wants to merge 6 commits into `main`.

@taroface (Contributor) commented May 9, 2025

@taroface taroface requested review from dikshant and andy-kimball May 9, 2025 05:16

netlify bot commented May 9, 2025

Deploy Preview for cockroachdb-api-docs canceled.

🔨 Latest commit 72c63d6
🔍 Latest deploy log https://app.netlify.com/sites/cockroachdb-api-docs/deploys/681e4e70798edc000846b1b0

netlify bot commented May 9, 2025

Deploy Preview for cockroachdb-interactivetutorials-docs canceled.

🔨 Latest commit 72c63d6
🔍 Latest deploy log https://app.netlify.com/sites/cockroachdb-interactivetutorials-docs/deploys/681e4e70c6f2cb00085006b7

github-actions bot commented May 9, 2025

@taroface taroface requested a review from DrewKimball May 9, 2025 05:16

netlify bot commented May 9, 2025

Netlify Preview

🔨 Latest commit 72c63d6
🔍 Latest deploy log https://app.netlify.com/sites/cockroachdb-docs/deploys/681e4e702c43a30008aaaae5
😎 Deploy Preview https://deploy-preview-19595--cockroachdb-docs.netlify.app

@dikshant dikshant left a comment

Some minor comments.

@taroface (Contributor Author) commented May 9, 2025

@dikshant @andy-kimball TFTRs! I've incorporated your comments.

@taroface taroface requested a review from dikshant May 9, 2025 17:01
@taroface taroface requested a review from DrewKimball May 9, 2025 18:40

@DrewKimball DrewKimball left a comment

LGTM

Nice work!

@taroface taroface requested a review from florence-crl May 9, 2025 19:02

@dikshant dikshant left a comment

LGTM nice work! 🚀

@florence-crl (Contributor) left a comment

lgtm pending questions/suggestions
very informative

- {% include {{ page.version.version }}/sql/vector-batch-inserts.md %}
- Creating a vector index through a backfill disables mutations ([`INSERT`]({% link {{ page.version.version }}/insert.md %}), [`UPSERT`]({% link {{ page.version.version }}/upsert.md %}), [`UPDATE`]({% link {{ page.version.version }}/update.md %}), [`DELETE`]({% link {{ page.version.version }}/delete.md %})) on the table. [#144443](https://github.com/cockroachdb/cockroach/issues/144443)
- `IMPORT INTO` is not supported on tables with vector indexes. You can import the vectors first and create the index after import is complete. [#145227](https://github.com/cockroachdb/cockroach/issues/145227)
- Only L2 distance (`<->`) searches are accelerated. [#144016](https://github.com/cockroachdb/cockroach/issues/144016)
Contributor

What do you mean by "accelerated"? Should "accelerated" be "supported"?

@@ -88,7 +90,10 @@ SELECT category, vector FROM items WHERE category = 'electronics' ORDER BY vecto
electronics | [0.9,0.1,0]
~~~

You can use a [vector index]({% link {{ page.version.version }}/vector-indexes.md %}) to make searches on large numbers of high-dimensional [`VECTOR`]({% link {{ page.version.version }}/vector.md %}) rows more efficient.
Contributor

remove link to same page

Suggested change
You can use a [vector index]({% link {{ page.version.version }}/vector-indexes.md %}) to make searches on large numbers of high-dimensional [`VECTOR`]({% link {{ page.version.version }}/vector.md %}) rows more efficient.
You can use a [vector index]({% link {{ page.version.version }}/vector-indexes.md %}) to make searches on large numbers of high-dimensional `VECTOR` rows more efficient.

(14 rows)
~~~

We also have other resources on indexes:
[Learn more about indexes]({% link {{ page.version.version }}/indexes.md %}).
Contributor

This sentence does not seem necessary here and could be removed. Maybe "Secondary indexes" in line 262 should have a link to https://www.cockroachlabs.com/docs/v25.2/schema-design-indexes.

Comment on lines +315 to +319
CREATE TABLE items (
  id uuid DEFAULT gen_random_uuid(),
  embedding VECTOR (1536),
  VECTOR INDEX (embedding)
);
Contributor

If the cluster setting is not enabled, this statement returns an error.

Suggested change
CREATE TABLE items (
  id uuid DEFAULT gen_random_uuid(),
  embedding VECTOR (1536),
  VECTOR INDEX (embedding)
);
SET CLUSTER SETTING feature.vector_index.enabled = true;
CREATE TABLE items (
  id uuid DEFAULT gen_random_uuid(),
  embedding VECTOR (1536),
  VECTOR INDEX (embedding)
);


### Specify an opclass

You can optionally specify an opclass. If not specified, the default is `vector_l2_ops`:
Contributor

For readers like me who do not know what an opclass is:

Suggested change
You can optionally specify an opclass. If not specified, the default is `vector_l2_ops`:
You can optionally specify an opclass (short for operator class) that defines how a `VECTOR` data type is handled by the index. If not specified, the default is `vector_l2_ops`:


Vector indexes on `VECTOR` columns support the following comparison operator:

- **L2 distance**: [`<->`]({% link {{ page.version.version }}/functions-and-operators.md %}#operators)
Contributor

I think this might be a more informative link:

Suggested change
- **L2 distance**: [`<->`]({% link {{ page.version.version }}/functions-and-operators.md %}#operators)
- **L2 distance**: [`<->`]({% link {{ page.version.version }}/vector.md %}#syntax)
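For reference, L2 distance is ordinary Euclidean distance: the square root of the sum of squared component differences. The following minimal Python sketch shows what `<->` measures and what an exact, unindexed nearest-neighbor search ordered by that distance returns. It illustrates the math only and is not CockroachDB's implementation; `l2_distance` and `nearest` are hypothetical helper names.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two vectors of equal dimension.

    Hypothetical helper illustrating the math behind `<->`; not CockroachDB code.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest(query, rows, k):
    """Exact k-nearest-neighbor search by brute force.

    Compares the query against every row and keeps the k rows with the
    smallest L2 distance. No index is used, so every row is examined.
    """
    return sorted(rows, key=lambda row: l2_distance(query, row))[:k]


rows = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.5, 0.5, 0.0]]
print(l2_distance([1.0, 0.0, 0.0], [0.9, 0.1, 0.0]))
print(nearest([1.0, 0.0, 0.0], rows, 2))
```

A vector index exists to avoid the full scan that `nearest` performs over every row, at the cost of returning approximate rather than exact results.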


Partition size and beam size interact to control both the precision of nearest neighbor search and the cost of maintaining the index. You can improve the accuracy of vector searches by increasing either the search beam size or partition size:

- A larger search beam improves accuracy by exploring more partitions, which increases the number of candidate vectors evaluated.
Contributor

For parallelism:

Suggested change
- A larger search beam improves accuracy by exploring more partitions, which increases the number of candidate vectors evaluated.
- A larger search beam size improves accuracy by exploring more partitions, which increases the number of candidate vectors evaluated.
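The beam-size trade-off discussed above can be sketched in Python as a simplified, single-level partitioned search. This is an IVF-style assumption made for illustration; it does not show CockroachDB's actual index structure. Vectors are grouped into partitions around centroids, the query explores only the `beam_size` partitions whose centroids are closest, and the candidates found there are ranked by exact L2 distance. `Partition`, `beam_search`, and the sample data are hypothetical.

```python
import math
from dataclasses import dataclass


def l2(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


@dataclass
class Partition:
    """Hypothetical flat partition: a centroid plus the vectors assigned to it."""
    centroid: list[float]
    vectors: list[list[float]]


def beam_search(query, partitions, beam_size, k):
    """Return (top-k vectors, number of candidate vectors evaluated).

    Only the beam_size partitions with the closest centroids are explored,
    so a larger beam visits more partitions and evaluates more candidates.
    """
    explored = sorted(partitions, key=lambda p: l2(query, p.centroid))[:beam_size]
    candidates = [v for p in explored for v in p.vectors]
    top = sorted(candidates, key=lambda v: l2(query, v))[:k]
    return top, len(candidates)


# Each vector is closer to its own partition's centroid than to the other one,
# and each centroid is the mean of its partition's vectors.
partitions = [
    Partition([0.0, 0.0], [[-0.5, 0.0], [0.5, 0.0]]),
    Partition([2.0, 0.0], [[1.1, 0.0], [2.9, 0.0]]),
]
query = [0.95, 0.0]

for beam in (1, 2):
    top, evaluated = beam_search(query, partitions, beam, k=1)
    print(beam, evaluated, top)
```

With this sample data, `beam_size=1` explores only the partition whose centroid is closest to the query, evaluates 2 candidates, and returns `[0.5, 0.0]`. `beam_size=2` evaluates all 4 candidates and finds the true nearest neighbor, `[1.1, 0.0]`, which sits in the other partition. Under this simplified model, larger partitions would likewise raise the number of candidates evaluated per explored partition.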

5 participants