|
161 | 161 | },
|
162 | 162 | {
|
163 | 163 | "cell_type": "code",
|
164 |
| - "execution_count": 1, |
| 164 | + "execution_count": 3, |
165 | 165 | "id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
|
166 | 166 | "metadata": {
|
167 | 167 | "id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
|
|
194 | 194 | },
|
195 | 195 | {
|
196 | 196 | "cell_type": "code",
|
197 |
| - "execution_count": 2, |
| 197 | + "execution_count": 4, |
198 | 198 | "id": "aac9563e",
|
199 | 199 | "metadata": {
|
200 | 200 | "id": "aac9563e",
|
|
208 | 208 | },
|
209 | 209 | {
|
210 | 210 | "cell_type": "code",
|
211 |
| - "execution_count": 3, |
| 211 | + "execution_count": 5, |
212 | 212 | "id": "a3c3999a",
|
213 | 213 | "metadata": {
|
214 | 214 | "id": "a3c3999a",
|
|
229 | 229 | },
|
230 | 230 | {
|
231 | 231 | "cell_type": "code",
|
232 |
| - "execution_count": 4, |
| 232 | + "execution_count": 6, |
233 | 233 | "id": "12eb86d8",
|
234 | 234 | "metadata": {
|
235 | 235 | "id": "12eb86d8",
|
|
271 | 271 | },
|
272 | 272 | {
|
273 | 273 | "cell_type": "code",
|
274 |
| - "execution_count": 5, |
| 274 | + "execution_count": 7, |
275 | 275 | "id": "5d076412",
|
276 | 276 | "metadata": {},
|
277 | 277 | "outputs": [
|
|
313 | 313 | },
|
314 | 314 | {
|
315 | 315 | "cell_type": "code",
|
316 |
| - "execution_count": 6, |
| 316 | + "execution_count": 8, |
317 | 317 | "id": "b2a4bd1b",
|
318 | 318 | "metadata": {},
|
319 | 319 | "outputs": [
|
|
345 | 345 | },
|
346 | 346 | {
|
347 | 347 | "cell_type": "code",
|
348 |
| - "execution_count": 8, |
| 348 | + "execution_count": 9, |
349 | 349 | "id": "f3d294ff",
|
350 | 350 | "metadata": {},
|
351 | 351 | "outputs": [
|
|
375 | 375 | },
|
376 | 376 | {
|
377 | 377 | "cell_type": "code",
|
378 |
| - "execution_count": 59, |
| 378 | + "execution_count": 10, |
379 | 379 | "id": "55b63a61",
|
380 | 380 | "metadata": {},
|
381 | 381 | "outputs": [
|
|
405 | 405 | },
|
406 | 406 | {
|
407 | 407 | "cell_type": "code",
|
408 |
| - "execution_count": 60, |
| 408 | + "execution_count": 11, |
409 | 409 | "id": "9b831b3d",
|
410 | 410 | "metadata": {},
|
411 | 411 | "outputs": [
|
|
435 | 435 | },
|
436 | 436 | {
|
437 | 437 | "cell_type": "code",
|
438 |
| - "execution_count": null, |
| 438 | + "execution_count": 12, |
439 | 439 | "id": "fb1482e7",
|
440 | 440 | "metadata": {},
|
441 | 441 | "outputs": [],
|
|
504 | 504 | "metadata": {},
|
505 | 505 | "source": [
|
506 | 506 | "# Retrieval Strategies\n",
|
507 |
| - "Elasticsearch has big advantages over other vector only databases from its ability to support a wide range of retrieval strategies. In this notebook we will configure `ElasticsearchStore` to support some of the most common retrieval strategies. \n", |
| 507 | + "Elasticsearch has big advantages over other vector only databases from its ability to support a wide range of retrieval strategies. In this notebook we will configure `ElasticsearchStore` to support some of the most common retrieval strategies. \n", |
508 | 508 | "\n",
|
509 |
| - "By default, `ElasticsearchStore` uses the `ApproxRetrievalStrategy`.\n", |
| 509 | + "By default, `ElasticsearchStore` uses the `DenseVectorStrategy` (was called `ApproxRetrievalStrategy` prior to version 0.2.0).\n", |
510 | 510 | "\n",
|
511 |
| - "## ApproxRetrievalStrategy\n", |
512 |
| - "This will return the top `k` most similar vectors to the query vector. The `k` parameter is set when the `ElasticsearchStore` is initialized. The default value is `10`." |
| 511 | + "## DenseVectorStrategy\n", |
| 512 | + "This will return the top `k` most similar vectors to the query vector. The `k` parameter is set when the `ElasticsearchStore` is initialized. The default value is `10`." |
513 | 513 | ]
|
514 | 514 | },
|
515 | 515 | {
|
516 | 516 | "cell_type": "code",
|
517 |
| - "execution_count": null, |
| 517 | + "execution_count": 13, |
518 | 518 | "id": "999b5ef5",
|
519 | 519 | "metadata": {},
|
520 | 520 | "outputs": [],
|
521 | 521 | "source": [
|
| 522 | + "from langchain_elasticsearch import DenseVectorStrategy\n", |
| 523 | + "\n", |
522 | 524 | "db = ElasticsearchStore.from_documents(\n",
|
523 | 525 | " docs,\n",
|
524 | 526 | " embeddings,\n",
|
525 | 527 | " es_url=\"http://localhost:9200\",\n",
|
526 | 528 | " index_name=\"test\",\n",
|
527 |
| - " strategy=ElasticsearchStore.ApproxRetrievalStrategy(),\n", |
| 529 | + " strategy=DenseVectorStrategy(),\n", |
528 | 530 | ")\n",
|
529 | 531 | "\n",
|
530 | 532 | "docs = db.similarity_search(\n",
|
|
537 | 539 | "id": "9b651be5",
|
538 | 540 | "metadata": {},
|
539 | 541 | "source": [
|
540 |
| - "### Example: Approx with hybrid\n", |
| 542 | + "### Example: Hybrid retrieval with dense vector and keyword search\n", |
541 | 543 | "This example will show how to configure `ElasticsearchStore` to perform a hybrid retrieval, using a combination of approximate semantic search and keyword based search. \n",
|
542 | 544 | "\n",
|
543 | 545 | "We use RRF to balance the two scores from different retrieval methods.\n",
|
544 | 546 | "\n",
|
545 |
| - "To enable hybrid retrieval, we need to set `hybrid=True` in `ElasticsearchStore` `ApproxRetrievalStrategy` constructor.\n", |
| 547 | + "To enable hybrid retrieval, we need to set `hybrid=True` in the `DenseVectorStrategy` constructor.\n", |
546 | 548 | "\n",
|
547 | 549 | "```python\n",
|
548 | 550 | "\n",
|
|
551 | 553 | " embeddings, \n",
|
552 | 554 | " es_url=\"http://localhost:9200\", \n",
|
553 | 555 | " index_name=\"test\",\n",
|
554 |
| - " strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n", |
555 |
| - " hybrid=True,\n", |
556 |
| - " )\n", |
| 556 | + " strategy=DenseVectorStrategy(hybrid=True)\n", |
557 | 557 | ")\n",
|
558 | 558 | "```\n",
|
559 | 559 | "\n",
|
|
582 | 582 | "}\n",
|
583 | 583 | "```\n",
|
584 | 584 | "\n",
|
585 |
| - "### Example: Approx with Embedding Model in Elasticsearch\n", |
586 |
| - "This example will show how to configure `ElasticsearchStore` to use the embedding model deployed in Elasticsearch for approximate retrieval. \n", |
| 585 | + "### Example: Dense vector search with Embedding Model in Elasticsearch\n", |
| 586 | + "This example will show how to configure `ElasticsearchStore` to use the embedding model deployed in Elasticsearch for dense vector retrieval.\n", |
587 | 587 | "\n",
|
588 |
| - "To use this, specify the model_id in `ElasticsearchStore` `ApproxRetrievalStrategy` constructor via the `query_model_id` argument.\n", |
| 588 | + "To use this, specify the model_id in `DenseVectorStrategy` constructor via the `query_model_id` argument.\n", |
589 | 589 | "\n",
|
590 | 590 | "**NOTE** This requires the model to be deployed and running in Elasticsearch ml node. See [notebook example](https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/integrations/hugging-face/loading-model-from-hugging-face.ipynb) on how to deploy the model with eland.\n"
|
591 | 591 | ]
|
592 | 592 | },
|
593 | 593 | {
|
594 | 594 | "cell_type": "code",
|
595 |
| - "execution_count": null, |
| 595 | + "execution_count": 14, |
596 | 596 | "id": "0a0c85e7",
|
597 | 597 | "metadata": {},
|
598 | 598 | "outputs": [],
|
599 | 599 | "source": [
|
600 |
| - "APPROX_SELF_DEPLOYED_INDEX_NAME = \"test-approx-self-deployed\"\n", |
| 600 | + "DENSE_SELF_DEPLOYED_INDEX_NAME = \"test-dense-self-deployed\"\n", |
601 | 601 | "\n",
|
602 | 602 | "# Note: This does not have an embedding function specified\n",
|
603 | 603 | "# Instead, we will use the embedding model deployed in Elasticsearch\n",
|
604 | 604 | "db = ElasticsearchStore(\n",
|
605 | 605 | " es_cloud_id=\"<your cloud id>\",\n",
|
606 | 606 | " es_user=\"elastic\",\n",
|
607 | 607 | " es_password=\"<your password>\",\n",
|
608 |
| - " index_name=APPROX_SELF_DEPLOYED_INDEX_NAME,\n", |
| 608 | + " index_name=DENSE_SELF_DEPLOYED_INDEX_NAME,\n", |
609 | 609 | " query_field=\"text_field\",\n",
|
610 | 610 | " vector_query_field=\"vector_query_field.predicted_value\",\n",
|
611 |
| - " strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n", |
612 |
| - " query_model_id=\"sentence-transformers__all-minilm-l6-v2\"\n", |
613 |
| - " ),\n", |
| 611 | + " strategy=DenseVectorStrategy(model_id=\"sentence-transformers__all-minilm-l6-v2\"),\n", |
614 | 612 | ")\n",
|
615 | 613 | "\n",
|
616 | 614 | "# Setup a Ingest Pipeline to perform the embedding\n",
|
|
631 | 629 | "# creating a new index with the pipeline,\n",
|
632 | 630 | "# not relying on langchain to create the index\n",
|
633 | 631 | "db.client.indices.create(\n",
|
634 |
| - " index=APPROX_SELF_DEPLOYED_INDEX_NAME,\n", |
| 632 | + " index=DENSE_SELF_DEPLOYED_INDEX_NAME,\n", |
635 | 633 | " mappings={\n",
|
636 | 634 | " \"properties\": {\n",
|
637 | 635 | " \"text_field\": {\"type\": \"text\"},\n",
|
|
655 | 653 | " es_cloud_id=\"<cloud id>\",\n",
|
656 | 654 | " es_user=\"elastic\",\n",
|
657 | 655 | " es_password=\"<cloud password>\",\n",
|
658 |
| - " index_name=APPROX_SELF_DEPLOYED_INDEX_NAME,\n", |
| 656 | + " index_name=DENSE_SELF_DEPLOYED_INDEX_NAME,\n", |
659 | 657 | " query_field=\"text_field\",\n",
|
660 | 658 | " vector_query_field=\"vector_query_field.predicted_value\",\n",
|
661 |
| - " strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n", |
662 |
| - " query_model_id=\"sentence-transformers__all-minilm-l6-v2\"\n", |
663 |
| - " ),\n", |
| 659 | + " strategy=DenseVectorStrategy(model_id=\"sentence-transformers__all-minilm-l6-v2\"),\n", |
664 | 660 | ")\n",
|
665 | 661 | "\n",
|
666 | 662 | "# Perform search\n",
|
|
672 | 668 | "id": "53959de6",
|
673 | 669 | "metadata": {},
|
674 | 670 | "source": [
|
675 |
| - "## SparseVectorRetrievalStrategy (ELSER)\n", |
| 671 | + "## SparseVectorStrategy (ELSER)\n", |
676 | 672 | "This strategy uses Elasticsearch's sparse vector retrieval to retrieve the top-k results. We only support our own \"ELSER\" embedding model for now.\n",
|
677 | 673 | "\n",
|
678 | 674 | "**NOTE** This requires the ELSER model to be deployed and running in Elasticsearch ml node. \n",
|
679 | 675 | "\n",
|
680 |
| - "To use this, specify `SparseVectorRetrievalStrategy` in `ElasticsearchStore` constructor." |
| 676 | + "To use this, specify `SparseVectorStrategy` (was called `SparseVectorRetrievalStrategy` prior to version 0.2.0) in the `ElasticsearchStore` constructor. You will need to provide a model ID." |
681 | 677 | ]
|
682 | 678 | },
|
683 | 679 | {
|
|
695 | 691 | }
|
696 | 692 | ],
|
697 | 693 | "source": [
|
| 694 | + "from langchain_elasticsearch import SparseVectorStrategy\n", |
| 695 | + "\n", |
698 | 696 | "# Note that this example doesn't have an embedding function. This is because we infer the tokens at index time and at query time within Elasticsearch.\n",
|
699 | 697 | "# This requires the ELSER model to be loaded and running in Elasticsearch.\n",
|
700 | 698 | "db = ElasticsearchStore.from_documents(\n",
|
701 | 699 | " docs,\n",
|
702 |
| - " es_cloud_id=\"My_deployment:dXMtY2VudHJhbDEuZ2NwLmNsb3VkLmVzLmlvOjQ0MyQ2OGJhMjhmNDc1M2Y0MWVjYTk2NzI2ZWNkMmE5YzRkNyQ3NWI4ODRjNWQ2OTU0MTYzODFjOTkxNmQ1YzYxMGI1Mw==\",\n", |
| 700 | + " es_cloud_id=\"<cloud id>\",\n", |
703 | 701 | " es_user=\"elastic\",\n",
|
704 |
| - " es_password=\"GgUPiWKwEzgHIYdHdgPk1Lwi\",\n", |
| 702 | + " es_password=\"<cloud password>\",\n", |
705 | 703 | " index_name=\"test-elser\",\n",
|
706 |
| - " strategy=ElasticsearchStore.SparseVectorRetrievalStrategy(),\n", |
| 704 | + " strategy=SparseVectorStrategy(model_id=\".elser_model_2\"),\n", |
707 | 705 | ")\n",
|
708 | 706 | "\n",
|
709 | 707 | "db.client.indices.refresh(index=\"test-elser\")\n",
|
|
719 | 717 | "id": "edf3a093",
|
720 | 718 | "metadata": {},
|
721 | 719 | "source": [
|
722 |
| - "## ExactRetrievalStrategy\n", |
723 |
| - "This strategy uses Elasticsearch's exact retrieval (also known as brute force) to retrieve the top-k results.\n", |
| 720 | + "## DenseVectorScriptScoreStrategy\n", |
| 721 | + "This strategy uses Elasticsearch's script score query to perform exact vector retrieval (also known as brute force) to retrieve the top-k results. (This strategy was called `ExactRetrievalStrategy` prior to version 0.2.0.)\n", |
724 | 722 | "\n",
|
725 |
| - "To use this, specify `ExactRetrievalStrategy` in `ElasticsearchStore` constructor.\n", |
| 723 | + "To use this, specify `DenseVectorScriptScoreStrategy` in `ElasticsearchStore` constructor.\n", |
726 | 724 | "\n",
|
727 | 725 | "```python\n",
|
| 726 | + "from langchain_elasticsearch import SparseVectorStrategy\n", |
728 | 727 | "\n",
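| 728 | + "# Script-score retrieval is exact (brute force), so it scores every vector\n", |
| 729 | + "# in the index; expect slower queries than approximate kNN on large indices\n", |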
|
729 | 728 | "db = ElasticsearchStore.from_documents(\n",
|
730 | 729 | " docs, \n",
|
731 | 730 | " embeddings, \n",
|
732 | 731 | " es_url=\"http://localhost:9200\", \n",
|
733 | 732 | " index_name=\"test\",\n",
|
734 |
| - " strategy=ElasticsearchStore.ExactRetrievalStrategy()\n", |
| 733 | + " strategy=DenseVectorScriptScoreStrategy(),\n", |
| 734 | + ")\n", |
| 735 | + "```" |
| 736 | + ] |
| 737 | + }, |
| 738 | + { |
| 739 | + "cell_type": "markdown", |
| 740 | + "id": "11b51c47", |
| 741 | + "metadata": {}, |
| 742 | + "source": [ |
| 743 | + "## BM25Strategy\n", |
| 744 | + "Finally, you can use full-text keyword search.\n", |
| 745 | + "\n", |
| 746 | + "To use this, specify `BM25Strategy` in `ElasticsearchStore` constructor.\n", |
| 747 | + "\n", |
| 748 | + "```python\n", |
| 749 | + "from langchain_elasticsearch import BM25Strategy\n", |
| 750 | + "\n", |
| 751 | + "db = ElasticsearchStore.from_documents(\n", |
| 752 | + " docs, \n", |
| 753 | + " es_url=\"http://localhost:9200\", \n", |
| 754 | + " index_name=\"test\",\n", |
| 755 | + " strategy=BM25Strategy(),\n", |
735 | 756 | ")\n",
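| 757 | + "\n", |
| 758 | + "# BM25 is purely lexical, so no embedding model is involved; queries go\n", |
| 759 | + "# through the same API (a sketch, reusing the store created above):\n", |
| 760 | + "docs = db.similarity_search(\"What did the president say about Ketanji Brown Jackson\")\n", |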
|
736 | 757 | "```"
|
737 | 758 | ]
|
|
924 | 945 | "\n",
|
925 | 946 | "## What's new?\n",
|
926 | 947 | "\n",
|
927 |
| - "The new implementation is now one class called `ElasticsearchStore` which can be used for approx, exact, and ELSER search retrieval, via strategies.\n", |
| 948 | + "The new implementation is now one class called `ElasticsearchStore` which can be used for approximate dense vector, exact dense vector, sparse vector (ELSER), BM25 retrieval and hybrid retrieval, via strategies.\n", |
928 | 949 | "\n",
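| 950 | + "All of the strategies are importable from `langchain_elasticsearch` (a sketch; names as of version 0.2.0):\n", |
| 951 | + "\n", |
| 952 | + "```python\n", |
| 953 | + "from langchain_elasticsearch import (\n", |
| 954 | + "    BM25Strategy,\n", |
| 955 | + "    DenseVectorScriptScoreStrategy,\n", |
| 956 | + "    DenseVectorStrategy,\n", |
| 957 | + "    SparseVectorStrategy,\n", |
| 958 | + ")\n", |
| 959 | + "```\n", |
| 960 | + "\n", |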
|
929 |
| - "## Im using ElasticKNNSearch\n", |
| 950 | + "## I am using ElasticKNNSearch\n", |
930 | 951 | "\n",
|
931 | 952 | "Old implementation:\n",
|
932 | 953 | "\n",
|
|
946 | 967 | "\n",
|
947 | 968 | "```python\n",
|
948 | 969 | "\n",
|
949 |
| - "from langchain_elasticsearch import ElasticsearchStore\n", |
| 970 | + "from langchain_elasticsearch import ElasticsearchStore, DenseVectorStrategy\n", |
950 | 971 | "\n",
|
951 | 972 | "db = ElasticsearchStore(\n",
|
952 | 973 | " es_url=\"http://localhost:9200\",\n",
|
953 | 974 | " index_name=\"test_index\",\n",
|
954 | 975 | " embedding=embedding,\n",
|
955 | 976 | " # if you use the model_id\n",
|
956 |
| - " # strategy=ElasticsearchStore.ApproxRetrievalStrategy( query_model_id=\"test_model\" )\n", |
| 977 | + " # strategy=DenseVectorStrategy(model_id=\"test_model\")\n", |
957 | 978 | " # if you use hybrid search\n",
|
958 |
| - " # strategy=ElasticsearchStore.ApproxRetrievalStrategy( hybrid=True )\n", |
| 979 | + " # strategy=DenseVectorStrategy(hybrid=True)\n", |
959 | 980 | ")\n",
|
960 | 981 | "\n",
|
961 | 982 | "```\n",
|
962 | 983 | "\n",
|
963 |
| - "## Im using ElasticVectorSearch\n", |
| 984 | + "## I am using ElasticVectorSearch\n", |
964 | 985 | "\n",
|
965 | 986 | "Old implementation:\n",
|
966 | 987 | "\n",
|
|
980 | 1001 | "\n",
|
981 | 1002 | "```python\n",
|
982 | 1003 | "\n",
|
983 |
| - "from langchain_elasticsearch import ElasticsearchStore\n", |
| 1004 | + "from langchain_elasticsearch import ElasticsearchStore, DenseVectorScriptScoreStrategy\n", |
984 | 1005 | "\n",
|
985 | 1006 | "db = ElasticsearchStore(\n",
|
986 | 1007 | " es_url=\"http://localhost:9200\",\n",
|
987 | 1008 | " index_name=\"test_index\",\n",
|
988 | 1009 | " embedding=embedding,\n",
|
989 |
| - " strategy=ElasticsearchStore.ExactRetrievalStrategy()\n", |
| 1010 | + " strategy=DenseVectorScriptScoreStrategy()\n", |
990 | 1011 | ")\n",
|
991 | 1012 | "\n",
|
992 | 1013 | "```"
|
|