Commit ac5cf32

maxjakob authored and Erick Friis committed
docs: update Elasticsearch strategy names (#21530)
Update documentation with the [new names for retrieval strategies](langchain-ai/langchain-elastic#22)

---------

Co-authored-by: Erick Friis <[email protected]>
1 parent 78b58d7 commit ac5cf32
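
The renames in this commit can be summarized as a small lookup table. The sketch below is illustrative only: `RENAMED_STRATEGIES` and `new_strategy_name` are hypothetical names and are not part of `langchain_elasticsearch`. The old and new class names are the ones that appear in the diff.

```python
# Hypothetical helper (not part of langchain_elasticsearch): maps the
# strategy names used before version 0.2.0 to the names used in this commit.
RENAMED_STRATEGIES = {
    "ApproxRetrievalStrategy": "DenseVectorStrategy",
    "SparseVectorRetrievalStrategy": "SparseVectorStrategy",
    "ExactRetrievalStrategy": "DenseVectorScriptScoreStrategy",
}


def new_strategy_name(old_name: str) -> str:
    """Return the renamed class; names without a rename pass through unchanged."""
    # Old names were written as attributes of the store class,
    # e.g. "ElasticsearchStore.ApproxRetrievalStrategy", so keep the last part.
    short_name = old_name.rsplit(".", 1)[-1]
    return RENAMED_STRATEGIES.get(short_name, short_name)


print(new_strategy_name("ElasticsearchStore.ApproxRetrievalStrategy"))
```

The diff also changes the keyword argument `query_model_id` to `model_id` in the `DenseVectorStrategy` calls, so a name-only substitution would likely not be enough when migrating existing code.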

1 file changed: +73 -52 lines changed

docs/docs/integrations/vectorstores/elasticsearch.ipynb

Lines changed: 73 additions & 52 deletions
@@ -161,7 +161,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 1,
+   "execution_count": 3,
    "id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
    "metadata": {
     "id": "67ab8afa-f7c6-4fbf-b596-cb512da949da",
@@ -194,7 +194,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 2,
+   "execution_count": 4,
    "id": "aac9563e",
    "metadata": {
     "id": "aac9563e",
@@ -208,7 +208,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 3,
+   "execution_count": 5,
    "id": "a3c3999a",
    "metadata": {
     "id": "a3c3999a",
@@ -229,7 +229,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 4,
+   "execution_count": 6,
    "id": "12eb86d8",
    "metadata": {
     "id": "12eb86d8",
@@ -271,7 +271,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 5,
+   "execution_count": 7,
    "id": "5d076412",
    "metadata": {},
    "outputs": [
@@ -313,7 +313,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 8,
    "id": "b2a4bd1b",
    "metadata": {},
    "outputs": [
@@ -345,7 +345,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 8,
+   "execution_count": 9,
    "id": "f3d294ff",
    "metadata": {},
    "outputs": [
@@ -375,7 +375,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 59,
+   "execution_count": 10,
    "id": "55b63a61",
    "metadata": {},
    "outputs": [
@@ -405,7 +405,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 60,
+   "execution_count": 11,
    "id": "9b831b3d",
    "metadata": {},
    "outputs": [
@@ -435,7 +435,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 12,
    "id": "fb1482e7",
    "metadata": {},
    "outputs": [],
@@ -504,27 +504,29 @@
    "metadata": {},
    "source": [
     "# Retrieval Strategies\n",
-    "Elasticsearch has big advantages over other vector only databases from its ability to support a wide range of retrieval strategies. In this notebook we will configure `ElasticsearchStore` to support some of the most common retrieval strategies. \n",
+    "Elasticsearch has big advantages over other vector only databases from its ability to support a wide range of retrieval strategies. In this notebook we will configure `ElasticsearchStore` to support some of the most common retrieval strategies. \n",
     "\n",
-    "By default, `ElasticsearchStore` uses the `ApproxRetrievalStrategy`.\n",
+    "By default, `ElasticsearchStore` uses the `DenseVectorStrategy` (was called `ApproxRetrievalStrategy` prior to version 0.2.0).\n",
     "\n",
-    "## ApproxRetrievalStrategy\n",
-    "This will return the top `k` most similar vectors to the query vector. The `k` parameter is set when the `ElasticsearchStore` is initialized. The default value is `10`."
+    "## DenseVectorStrategy\n",
+    "This will return the top `k` most similar vectors to the query vector. The `k` parameter is set when the `ElasticsearchStore` is initialized. The default value is `10`."
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 13,
    "id": "999b5ef5",
    "metadata": {},
    "outputs": [],
    "source": [
+    "from langchain_elasticsearch import DenseVectorStrategy\n",
+    "\n",
     "db = ElasticsearchStore.from_documents(\n",
     "    docs,\n",
     "    embeddings,\n",
     "    es_url=\"http://localhost:9200\",\n",
     "    index_name=\"test\",\n",
-    "    strategy=ElasticsearchStore.ApproxRetrievalStrategy(),\n",
+    "    strategy=DenseVectorStrategy(),\n",
     ")\n",
     "\n",
     "docs = db.similarity_search(\n",
@@ -537,12 +539,12 @@
    "id": "9b651be5",
    "metadata": {},
    "source": [
-    "### Example: Approx with hybrid\n",
+    "### Example: Hybrid retrieval with dense vector and keyword search\n",
     "This example will show how to configure `ElasticsearchStore` to perform a hybrid retrieval, using a combination of approximate semantic search and keyword based search. \n",
     "\n",
     "We use RRF to balance the two scores from different retrieval methods.\n",
     "\n",
-    "To enable hybrid retrieval, we need to set `hybrid=True` in `ElasticsearchStore` `ApproxRetrievalStrategy` constructor.\n",
+    "To enable hybrid retrieval, we need to set `hybrid=True` in the `DenseVectorStrategy` constructor.\n",
     "\n",
     "```python\n",
     "\n",
@@ -551,9 +553,7 @@
     "    embeddings, \n",
     "    es_url=\"http://localhost:9200\", \n",
     "    index_name=\"test\",\n",
-    "    strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n",
-    "        hybrid=True,\n",
-    "    )\n",
+    "    strategy=DenseVectorStrategy(hybrid=True)\n",
     ")\n",
     "```\n",
     "\n",
@@ -582,35 +582,33 @@
     "}\n",
     "```\n",
     "\n",
-    "### Example: Approx with Embedding Model in Elasticsearch\n",
-    "This example will show how to configure `ElasticsearchStore` to use the embedding model deployed in Elasticsearch for approximate retrieval. \n",
+    "### Example: Dense vector search with Embedding Model in Elasticsearch\n",
+    "This example will show how to configure `ElasticsearchStore` to use the embedding model deployed in Elasticsearch for dense vector retrieval.\n",
     "\n",
-    "To use this, specify the model_id in `ElasticsearchStore` `ApproxRetrievalStrategy` constructor via the `query_model_id` argument.\n",
+    "To use this, specify the model ID in the `DenseVectorStrategy` constructor via the `model_id` argument.\n",
     "\n",
     "**NOTE** This requires the model to be deployed and running in Elasticsearch ml node. See [notebook example](https://github.com/elastic/elasticsearch-labs/blob/main/notebooks/integrations/hugging-face/loading-model-from-hugging-face.ipynb) on how to deploy the model with eland.\n"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": null,
+   "execution_count": 14,
    "id": "0a0c85e7",
    "metadata": {},
    "outputs": [],
    "source": [
-    "APPROX_SELF_DEPLOYED_INDEX_NAME = \"test-approx-self-deployed\"\n",
+    "DENSE_SELF_DEPLOYED_INDEX_NAME = \"test-dense-self-deployed\"\n",
     "\n",
     "# Note: This does not have an embedding function specified\n",
     "# Instead, we will use the embedding model deployed in Elasticsearch\n",
     "db = ElasticsearchStore(\n",
     "    es_cloud_id=\"<your cloud id>\",\n",
     "    es_user=\"elastic\",\n",
     "    es_password=\"<your password>\",\n",
-    "    index_name=APPROX_SELF_DEPLOYED_INDEX_NAME,\n",
+    "    index_name=DENSE_SELF_DEPLOYED_INDEX_NAME,\n",
     "    query_field=\"text_field\",\n",
     "    vector_query_field=\"vector_query_field.predicted_value\",\n",
-    "    strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n",
-    "        query_model_id=\"sentence-transformers__all-minilm-l6-v2\"\n",
-    "    ),\n",
+    "    strategy=DenseVectorStrategy(model_id=\"sentence-transformers__all-minilm-l6-v2\"),\n",
     ")\n",
     "\n",
     "# Setup a Ingest Pipeline to perform the embedding\n",
@@ -631,7 +629,7 @@
     "# creating a new index with the pipeline,\n",
     "# not relying on langchain to create the index\n",
     "db.client.indices.create(\n",
-    "    index=APPROX_SELF_DEPLOYED_INDEX_NAME,\n",
+    "    index=DENSE_SELF_DEPLOYED_INDEX_NAME,\n",
     "    mappings={\n",
     "        \"properties\": {\n",
     "            \"text_field\": {\"type\": \"text\"},\n",
@@ -655,12 +653,10 @@
     "    es_cloud_id=\"<cloud id>\",\n",
     "    es_user=\"elastic\",\n",
     "    es_password=\"<cloud password>\",\n",
-    "    index_name=APPROX_SELF_DEPLOYED_INDEX_NAME,\n",
+    "    index_name=DENSE_SELF_DEPLOYED_INDEX_NAME,\n",
     "    query_field=\"text_field\",\n",
     "    vector_query_field=\"vector_query_field.predicted_value\",\n",
-    "    strategy=ElasticsearchStore.ApproxRetrievalStrategy(\n",
-    "        query_model_id=\"sentence-transformers__all-minilm-l6-v2\"\n",
-    "    ),\n",
+    "    strategy=DenseVectorStrategy(model_id=\"sentence-transformers__all-minilm-l6-v2\"),\n",
     ")\n",
     "\n",
     "# Perform search\n",
@@ -672,12 +668,12 @@
    "id": "53959de6",
    "metadata": {},
    "source": [
-    "## SparseVectorRetrievalStrategy (ELSER)\n",
+    "## SparseVectorStrategy (ELSER)\n",
     "This strategy uses Elasticsearch's sparse vector retrieval to retrieve the top-k results. We only support our own \"ELSER\" embedding model for now.\n",
     "\n",
     "**NOTE** This requires the ELSER model to be deployed and running in Elasticsearch ml node. \n",
     "\n",
-    "To use this, specify `SparseVectorRetrievalStrategy` in `ElasticsearchStore` constructor."
+    "To use this, specify `SparseVectorStrategy` (was called `SparseVectorRetrievalStrategy` prior to version 0.2.0) in the `ElasticsearchStore` constructor. You will need to provide a model ID."
    ]
   },
   {
@@ -695,15 +691,17 @@
     }
    ],
    "source": [
+    "from langchain_elasticsearch import SparseVectorStrategy\n",
+    "\n",
     "# Note that this example doesn't have an embedding function. This is because we infer the tokens at index time and at query time within Elasticsearch.\n",
     "# This requires the ELSER model to be loaded and running in Elasticsearch.\n",
     "db = ElasticsearchStore.from_documents(\n",
     "    docs,\n",
-    "    es_cloud_id=\"My_deployment:dXMtY2VudHJhbDEuZ2NwLmNsb3VkLmVzLmlvOjQ0MyQ2OGJhMjhmNDc1M2Y0MWVjYTk2NzI2ZWNkMmE5YzRkNyQ3NWI4ODRjNWQ2OTU0MTYzODFjOTkxNmQ1YzYxMGI1Mw==\",\n",
+    "    es_cloud_id=\"<cloud id>\",\n",
     "    es_user=\"elastic\",\n",
-    "    es_password=\"GgUPiWKwEzgHIYdHdgPk1Lwi\",\n",
+    "    es_password=\"<cloud password>\",\n",
     "    index_name=\"test-elser\",\n",
-    "    strategy=ElasticsearchStore.SparseVectorRetrievalStrategy(),\n",
+    "    strategy=SparseVectorStrategy(model_id=\".elser_model_2\"),\n",
     ")\n",
     "\n",
     "db.client.indices.refresh(index=\"test-elser\")\n",
@@ -719,19 +717,42 @@
    "id": "edf3a093",
    "metadata": {},
    "source": [
-    "## ExactRetrievalStrategy\n",
-    "This strategy uses Elasticsearch's exact retrieval (also known as brute force) to retrieve the top-k results.\n",
+    "## DenseVectorScriptScoreStrategy\n",
+    "This strategy uses Elasticsearch's script score query to perform exact vector retrieval (also known as brute force) to retrieve the top-k results. (This strategy was called `ExactRetrievalStrategy` prior to version 0.2.0.)\n",
     "\n",
-    "To use this, specify `ExactRetrievalStrategy` in `ElasticsearchStore` constructor.\n",
+    "To use this, specify `DenseVectorScriptScoreStrategy` in `ElasticsearchStore` constructor.\n",
     "\n",
     "```python\n",
+    "from langchain_elasticsearch import DenseVectorScriptScoreStrategy\n",
     "\n",
     "db = ElasticsearchStore.from_documents(\n",
     "    docs, \n",
     "    embeddings, \n",
     "    es_url=\"http://localhost:9200\", \n",
     "    index_name=\"test\",\n",
-    "    strategy=ElasticsearchStore.ExactRetrievalStrategy()\n",
+    "    strategy=DenseVectorScriptScoreStrategy(),\n",
+    ")\n",
+    "```"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "11b51c47",
+   "metadata": {},
+   "source": [
+    "## BM25Strategy\n",
+    "Finally, you can use full-text keyword search.\n",
+    "\n",
+    "To use this, specify `BM25Strategy` in `ElasticsearchStore` constructor.\n",
+    "\n",
+    "```python\n",
+    "from langchain_elasticsearch import BM25Strategy\n",
+    "\n",
+    "db = ElasticsearchStore.from_documents(\n",
+    "    docs, \n",
+    "    es_url=\"http://localhost:9200\", \n",
+    "    index_name=\"test\",\n",
+    "    strategy=BM25Strategy(),\n",
     ")\n",
     "```"
    ]
@@ -924,9 +945,9 @@
     "\n",
     "## What's new?\n",
     "\n",
-    "The new implementation is now one class called `ElasticsearchStore` which can be used for approx, exact, and ELSER search retrieval, via strategies.\n",
+    "The new implementation is now one class called `ElasticsearchStore` which can be used for approximate dense vector, exact dense vector, sparse vector (ELSER), BM25 retrieval and hybrid retrieval, via strategies.\n",
     "\n",
-    "## Im using ElasticKNNSearch\n",
+    "## I am using ElasticKNNSearch\n",
     "\n",
     "Old implementation:\n",
     "\n",
@@ -946,21 +967,21 @@
     "\n",
     "```python\n",
     "\n",
-    "from langchain_elasticsearch import ElasticsearchStore\n",
+    "from langchain_elasticsearch import ElasticsearchStore, DenseVectorStrategy\n",
     "\n",
     "db = ElasticsearchStore(\n",
     "    es_url=\"http://localhost:9200\",\n",
     "    index_name=\"test_index\",\n",
     "    embedding=embedding,\n",
     "    # if you use the model_id\n",
-    "    # strategy=ElasticsearchStore.ApproxRetrievalStrategy( query_model_id=\"test_model\" )\n",
+    "    # strategy=DenseVectorStrategy(model_id=\"test_model\")\n",
     "    # if you use hybrid search\n",
-    "    # strategy=ElasticsearchStore.ApproxRetrievalStrategy( hybrid=True )\n",
+    "    # strategy=DenseVectorStrategy(hybrid=True)\n",
     ")\n",
     "\n",
     "```\n",
     "\n",
-    "## Im using ElasticVectorSearch\n",
+    "## I am using ElasticVectorSearch\n",
     "\n",
     "Old implementation:\n",
     "\n",
@@ -980,13 +1001,13 @@
     "\n",
     "```python\n",
     "\n",
-    "from langchain_elasticsearch import ElasticsearchStore\n",
+    "from langchain_elasticsearch import ElasticsearchStore, DenseVectorScriptScoreStrategy\n",
     "\n",
     "db = ElasticsearchStore(\n",
     "    es_url=\"http://localhost:9200\",\n",
     "    index_name=\"test_index\",\n",
     "    embedding=embedding,\n",
-    "    strategy=ElasticsearchStore.ExactRetrievalStrategy()\n",
+    "    strategy=DenseVectorScriptScoreStrategy()\n",
     ")\n",
     "\n",
     "```"
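
The hybrid retrieval text in this diff says RRF (reciprocal rank fusion) balances the scores of the two retrieval methods. As a rough illustration of the idea, RRF gives each document the sum of `1 / (rank_constant + rank)` over every result list it appears in. The pure-Python sketch below is not the Elasticsearch implementation: `rrf_fuse` and the sample document IDs are hypothetical, and the constant 60 is an assumed, commonly used default.

```python
from collections import defaultdict


def rrf_fuse(rankings, rank_constant=60):
    """Fuse several ranked lists of document IDs with reciprocal rank fusion."""
    scores = defaultdict(float)
    for ranking in rankings:
        # Ranks are 1-based: the top hit of each list contributes the most.
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (rank_constant + rank)
    # Highest fused score first; ties are broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from a dense vector query and a keyword query.
dense_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]

fused = rrf_fuse([dense_hits, keyword_hits])
print(fused)
```

Because only ranks are used, the two retrievers' raw scores never need to be on the same scale, which is the usual reason for choosing RRF over a weighted sum of scores.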
