
Commit 5d4857d

Changes requested to the RAG pipeline guide (#7214)
* Changes requested to the RAG pipeline guide
* Additional edit
* Additional copy edits, update docs version for the example dataset
* Typo fix
1 parent 98b7992 commit 5d4857d

File tree

1 file changed: +24 -15 lines changed
  • docs/guides/kubernetes/ai-chatbot-and-rag-pipeline-for-inference-on-lke


docs/guides/kubernetes/ai-chatbot-and-rag-pipeline-for-inference-on-lke/index.md

Lines changed: 24 additions & 15 deletions
@@ -1,11 +1,11 @@
 ---
 slug: ai-chatbot-and-rag-pipeline-for-inference-on-lke
-title: "Deploy an AI Chatbot and RAG Pipeline for Inferencing on LKE"
+title: "Deploy a Chatbot and RAG Pipeline for AI Inferencing on LKE"
 description: "Utilize the Retrieval-Augmented Generation technique to supplement an LLM with your own custom data."
 authors: ["Linode"]
 contributors: ["Linode"]
 published: 2025-02-11
-modified: 2025-02-13
+modified: 2025-03-11
 keywords: ['kubernetes','lke','ai','inferencing','rag','chatbot','architecture']
 tags: ["kubernetes","lke"]
 license: '[CC BY-ND 4.0](https://creativecommons.org/licenses/by-nd/4.0)'
@@ -37,15 +37,15 @@ Follow this tutorial to deploy a RAG pipeline on Akamai’s LKE service using ou
 - **Kubeflow Pipeline:** Used to deploy pipelines, reusable machine learning workflows built using the Kubeflow Pipelines SDK. In this tutorial, a pipeline is used to run LlamaIndex to process the dataset and store embeddings.
 - **Meta’s Llama 3 LLM:** The [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model is used as the LLM. You should review and agree to the licensing agreement before deploying.
 - **Milvus:** Milvus is an open-source vector database and is used for generative AI workloads. This tutorial uses Milvus to store embeddings generated by LlamaIndex and make them available to queries sent to the Llama 3 LLM.
-- **Open WebUI:** This is an self-hosted AI chatbot application that’s compatible with LLMs like Llama 3 and includes a built-in inference engine for RAG solutions. Users interact with this interface to query the LLM. This can be configured to send queries straight to Llama 3 or to first load data from Milvus and send that context along with the query.
+- **Open WebUI:** This is a self-hosted AI chatbot application that’s compatible with LLMs like Llama 3 and includes a built-in inference engine for RAG solutions. Users interact with this interface to query the LLM. This can be configured to send queries straight to Llama 3 or to first load data from Milvus and send that context along with the query.
 
 ## Prerequisites
 
 This tutorial requires you to have access to a few different services and local software tools. You should also have a custom dataset available to use for the pipeline.
 
 - A [Cloud Manager](https://cloud.linode.com/) account is required to use many of Akamai’s cloud computing services, including LKE.
 - A [Hugging Face](https://huggingface.co/) account is used for deploying the Llama 3 LLM to KServe.
-- You should have both [kubectl](https://kubernetes.io/docs/reference/kubectl/) and [Helm](https://helm.sh/) installed on your local machine. These apps are used for managing your LKE cluster and installing applications to your cluster.
+- You should have [kubectl](https://kubernetes.io/docs/reference/kubectl/), [Kustomize](https://kustomize.io/), and [Helm](https://helm.sh/) installed on your local machine. These apps are used for managing your LKE cluster and installing applications to your cluster.
 - A **custom dataset** is needed, preferably in Markdown format, though you can use other types of data if you modify the LlamaIndex configuration provided in this tutorial. This dataset should contain all of the information you want used by the Llama 3 LLM. This tutorial uses a Markdown dataset containing all of the Linode Docs.
 
 {{< note type="warning" title="Production workloads" >}}
@@ -61,7 +61,7 @@ It’s not part of the scope of this document to cover the setup required to sec
 
 The first step is to provision the infrastructure needed for this tutorial and configure it with kubectl, so that you can manage it locally and install software through helm. As part of this process, we’ll also need to install the NVIDIA GPU operator at this step so that the NVIDIA cards within the GPU worker nodes can be used on Kubernetes.
 
-1. **Provision an LKE cluster.** We recommend using at least 3 **RTX4000 Ada x1 Medium** GPU plans (plan ID: `g2-gpu-rtx4000a1-m`), though you can adjust this as needed. For reference, Kubeflow recommends 32 GB of RAM and 16 CPU cores for just their own application. This tutorial has been tested using Kubernetes v1.31, though other versions should also work. To learn more about provisioning a cluster, see the [Create a cluster](https://techdocs.akamai.com/cloud-computing/docs/create-a-cluster) guide.
+1. **Provision an LKE cluster.** We recommend using at least 2 **RTX4000 Ada x1 Medium** GPU plans (plan ID: `g2-gpu-rtx4000a1-m`), though you can adjust this as needed. For reference, Kubeflow recommends 32 GB of RAM and 16 CPU cores for just their own application. This tutorial has been tested using Kubernetes v1.31, though other versions should also work. To learn more about provisioning a cluster, see the [Create a cluster](https://techdocs.akamai.com/cloud-computing/docs/create-a-cluster) guide.
 
     {{< note noTitle=true >}}
     GPU plans are available in a limited number of data centers. Review the [GPU product documentation](https://techdocs.akamai.com/cloud-computing/docs/gpu-compute-instances#availability) to learn more about availability.
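
For reference, this provisioning step can also be scripted. A minimal sketch using the Linode CLI, assuming `linode-cli` is installed and configured; the label, region, and node count here are placeholder values to adjust:

```command
linode-cli lke cluster-create \
  --label rag-pipeline-tutorial \
  --region us-ord \
  --k8s_version 1.31 \
  --node_pools.type g2-gpu-rtx4000a1-m \
  --node_pools.count 2
```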
@@ -97,10 +97,10 @@ Next, let’s deploy Kubeflow on the LKE cluster. These instructions deploy all
    openssl rand -base64 18
    ```
 
-1. Create a hash of this password, replacing PASSWORD with the password generated in the previous step. This outputs a string starting with `$2y$12$`, which is password hash.
+1. Create a hash of this password, replacing PASSWORD with the password generated in the previous step. This outputs the password hash, which starts with `$2y$12$`.
 
    ```command
-   htpasswd -bnBC 12 "" <PASSWORD> | tr -d ':\n'
+   htpasswd -bnBC 12 "" PASSWORD | tr -d ':\n'
    ```
 
 1. Edit the `common/dex/base/dex-passwords.yaml` file, replacing the value for `DEX_USER_PASSWORD` with the password hash generated in the previous step.
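
The generate-and-hash steps in this hunk can also be combined into one pass; a sketch, assuming `openssl` and `htpasswd` (from `apache2-utils`) are available locally:

```command
PASSWORD=$(openssl rand -base64 18)
echo "${PASSWORD}"   # keep this; it is the plaintext login password
htpasswd -bnBC 12 "" "${PASSWORD}" | tr -d ':\n'   # prints the $2y$12$... hash
```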
@@ -111,7 +111,7 @@ Next, let’s deploy Kubeflow on the LKE cluster. These instructions deploy all
    while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 20; done
    ```
 
-1. This may take some time to finish. Once it’s complete, verify that all pods are in the ready state.
+1. This may take some time to finish. Once it’s complete, verify that all pods are in the running state.
 
    ```command
    kubectl get pods -A
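
As an alternative to polling `kubectl get pods -A` manually, a blocking check can be used; a sketch, where the 15-minute timeout is an assumption to adjust for your cluster:

```command
kubectl wait --for=condition=Ready pods --all --all-namespaces --timeout=900s
```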
@@ -152,6 +152,7 @@ After Kubeflow has been installed, we can now deploy the Llama 3 LLM to KServe.
   name: huggingface-llama3
 spec:
   predictor:
+    minReplicas: 1
     model:
       modelFormat:
         name: huggingface
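
After the manifest above is applied, a quick status check; `huggingface-llama3` is the name from this hunk, and the pod label selector is an assumption based on KServe's usual labeling:

```command
kubectl get inferenceservice huggingface-llama3
kubectl get pods -l serving.kserve.io/inferenceservice=huggingface-llama3
```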
@@ -202,6 +203,11 @@ Milvus, the vector database designed for AI inference workloads, will be used as
       nvidia.com/gpu: "1"
     limits:
       nvidia.com/gpu: "1"
+  persistentVolumeClaim:
+    size: 5Gi
+minio:
+  persistence:
+    size: 50Gi
 ```
 
 1. Add Milvus to Helm.
@@ -214,7 +220,7 @@ Milvus, the vector database designed for AI inference workloads, will be used as
 1. Install Milvus using Helm.
 
    ```command
-   helm install my-release milvus/milvus --set cluster.enabled=false --set etcd.replicaCount=1 --set minio.mode=standalone --set pulsar.enabled=false -f milvus-custom-values.yaml
+   helm install my-release milvus/milvus --set cluster.enabled=false --set etcd.replicaCount=1 --set minio.mode=standalone --set pulsarv3.enabled=false -f milvus-custom-values.yaml
    ```
 
 ## Set up Kubeflow Pipeline to ingest data
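
A sketch of post-install verification for this hunk, assuming the `my-release` name from the command above and the default namespace:

```command
helm status my-release
kubectl get pods
kubectl get pvc
```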
@@ -335,19 +341,19 @@ This tutorial employs a Python script to create the YAML file used within Kubefl
 
 ![Screenshot of the "New Experiment" page within Kubeflow](kubeflow-new-experiment.jpg)
 
-1. Next, navigate to Pipelines > Pipelines and click the **Upload Pipeline** link. Select **Upload a file** and use the **Choose file** dialog box to select the pipeline YAML file that was created in a previous step.
+1. Next, navigate to Pipelines > Pipelines and click the **Upload Pipeline** link. Select **Upload a file** and use the **Choose file** dialog box to select the pipeline YAML file that was created in a previous step. Click the **Create** button to create the pipeline.
 
 ![Screenshot of the "New Pipeline" page within Kubeflow](kubeflow-new-pipeline.jpg)
 
-1. Navigate to the Pipelines > Runs page and click **Create Run**. Within the Run details section, select the pipeline and experiment that you just created. Choose *One-off* as the **Run Type** and provide the collection name and URL of the dataset (the zip file with the documents you wish to process) in the **Run parameters** section. For this tutorial, we are using `linode_docs` as the name and `https://github.com/linode/docs/archive/refs/tags/v1.360.0.zip` and the dataset URL.
+1. Navigate to the Pipelines > Runs page and click **Create Run**. Within the Run details section, select the pipeline and experiment that you just created. Choose *One-off* as the **Run Type** and provide the collection name and URL of the dataset (the zip file with the documents you wish to process) in the **Run parameters** section. For this tutorial, we are using `linode_docs` as the name and `https://github.com/linode/docs/archive/refs/tags/v1.366.0.zip` as the dataset URL.
 
 ![Screenshot of the "Start a new run" page within Kubeflow](kubeflow-new-run.jpg)
 
-1. Click **Start** to run the pipeline. This process takes some time. For reference, it took ~10 minutes for the run to complete successfully on the linode.com/docs dataset.
+1. Click **Start** to run the pipeline. This process takes some time. For reference, it takes about 10 minutes for the run to complete on the linode.com/docs dataset.
 
 ## Deploy the chatbot
 
-To finish up this tutorial, we will install the Open-WebUI chatbot and configure it to connect the data generated in the Kubernetes Pipeline with the LLM deployed in KServe. Once this is up and running, you can open up a browser interface to the chatbot and ask it questions. Chatbot UI will use the Milvus database to load context related to the search and send it, along with your query, to the Llama 3 instance within KServe. The LLM will send back a response to the chatbot and your browser will display an answer that is informed by your own custom data.
+To finish up this tutorial, install the Open-WebUI chatbot and configure it to connect the data generated in the Kubernetes Pipeline with the LLM deployed in KServe. Once this is up and running, you can open up a browser interface to the chatbot and ask it questions. Chatbot UI uses the Milvus database to load context related to the search and sends it, along with your query, to the Llama 3 instance within KServe. The LLM then sends back a response to the chatbot and your browser displays an answer that is informed by your own custom data.
 
 ### Create the RAG pipeline files
 
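
A local sanity check of the run parameters above, using the dataset URL from this hunk; assumes `curl` and `unzip` are available:

```command
curl -LO https://github.com/linode/docs/archive/refs/tags/v1.366.0.zip
unzip -l v1.366.0.zip | head
```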
@@ -360,6 +366,7 @@ Despite the naming, these RAG pipeline files are not related to the Kubeflow pip
    ```file {title="pipeline-requirements.txt"}
    requests
    pymilvus
+   opencv-python-headless
    llama-index
    llama-index-vector-stores-milvus
    llama-index-embeddings-huggingface
@@ -368,7 +375,7 @@ Despite the naming, these RAG pipeline files are not related to the Kubeflow pip
 
 1. Create a file called `rag_pipeline.py` with the following contents. The filenames of both the `pipeline-requirements.txt` and `rag_pipeline.py` files should not be changed as they are referenced within the Open WebUI Pipeline configuration file.
 
-   ```file {title="rag-pipeline.py"}
+   ```file {title="rag_pipeline.py"}
    """
    title: RAG Pipeline
    version: 1.0
@@ -594,4 +601,6 @@ Now that the chatbot has been configured, the final step is to access the chatbo
 
 - The **RAG Pipeline** model that you defined in a previous section does use data from your custom dataset. Ask it a question relevant to your data and the chatbot should respond with an answer that is informed by the custom dataset you configured.
 
-![Screenshot of a RAG Pipeline query in Open WebUI](open-webui-rag-pipeline.jpg)
+![Screenshot of a RAG Pipeline query in Open WebUI](open-webui-rag-pipeline.jpg)
+
+The response time depends on a variety of factors. Using similar cluster resources and the same dataset as this guide, an estimated response time is between 6 and 70 seconds.
