2 changes: 2 additions & 0 deletions ci/vale/dictionary.txt
@@ -1022,6 +1022,8 @@ hybla
Hydrolix
hykes
hypercorn
Hyperdisk
Hyperdisks
hyperefficient
HyperLogLog
hyperparameter
4 changes: 2 additions & 2 deletions docs/assets/1238-dovecot_10-auth.conf.txt
@@ -118,10 +118,10 @@ auth_mechanisms = plain login
#!include auth-deny.conf.ext
#!include auth-master.conf.ext

!include auth-system.conf.ext
#!include auth-system.conf.ext
!include auth-sql.conf.ext
#!include auth-ldap.conf.ext
#!include auth-passwdfile.conf.ext
#!include auth-checkpassword.conf.ext
#!include auth-vpopmail.conf.ext
#!include auth-static.conf.ext
#!include auth-static.conf.ext
@@ -9,7 +9,7 @@ keywords: ['build cloud-native container registry with quay','red hat quay','cen
license: '[CC BY-ND 4.0](https://creativecommons.org/licenses/by-nd/4.0)'
---

Docker doesn’t provide long term storage or image distribution capabilities, so developers need something more. [Docker Registry](https://docs.docker.com/registry/) performs these tasks, and using it guarantees the same application runtime environment through virtualization. However, building an image can involve a significant time investment, which is where [Quay](https://www.redhat.com/en/resources/quay-datasheet) (pronounced *kway*) comes in. A registry like Quay can both build and store containers. You can then deploy these containers in a shorter time and with less effort than using Docker Registry. This guide explains how Quay can be an essential part of the development process and details how to deploy a Quay registry.
Docker doesn’t provide long term storage or image distribution capabilities, so developers need something more. [Docker Registry](https://docs.docker.com/registry/) performs these tasks, and using it guarantees the same application runtime environment through virtualization. However, building an image can involve a significant time investment, which is where [Quay](https://www.redhat.com/en/resources/quay-datasheet) comes in. A registry like Quay can both build and store containers. You can then deploy these containers in a shorter time and with less effort than using Docker Registry. This guide explains how Quay can be an essential part of the development process and details how to deploy a Quay registry.

## What is Red Hat Quay?

2 changes: 1 addition & 1 deletion docs/guides/databases/general/database-solutions/index.md
@@ -128,4 +128,4 @@ There are many installation and configuration guides available on our docs site
- [Apache Cassandra guides](/docs/guides/databases/cassandra/)
- [Redis guides](/docs/guides/databases/redis/)
- [PostgreSQL guides](/docs/guides/databases/postgresql/)
- [CouchDB guides](/docs/guides/databases/couchdb/)
- [CouchDB guides](/docs/guides/databases/couchdb/)
@@ -575,7 +575,7 @@ disable_plaintext_auth = yes
...
auth_mechanisms = plain login
...
!include auth-system.conf.ext
#!include auth-system.conf.ext
...
!include auth-sql.conf.ext
...
@@ -870,4 +870,4 @@ spamassassin unix - n n - - pipe

1. Restart the Postfix email server to get your new anti-spam settings in place:

sudo systemctl restart postfix
sudo systemctl restart postfix
@@ -1,11 +1,11 @@
---
slug: ai-chatbot-and-rag-pipeline-for-inference-on-lke
title: "Deploy an AI Chatbot and RAG Pipeline for Inferencing on LKE"
title: "Deploy a Chatbot and RAG Pipeline for AI Inferencing on LKE"
description: "Utilize the Retrieval-Augmented Generation technique to supplement an LLM with your own custom data."
authors: ["Linode"]
contributors: ["Linode"]
published: 2025-02-11
modified: 2025-02-13
modified: 2025-03-11
keywords: ['kubernetes','lke','ai','inferencing','rag','chatbot','architecture']
tags: ["kubernetes","lke"]
license: '[CC BY-ND 4.0](https://creativecommons.org/licenses/by-nd/4.0)'
@@ -37,15 +37,15 @@ Follow this tutorial to deploy a RAG pipeline on Akamai’s LKE service using ou
- **Kubeflow Pipeline:** Used to deploy pipelines, reusable machine learning workflows built using the Kubeflow Pipelines SDK. In this tutorial, a pipeline is used to run LlamaIndex to process the dataset and store embeddings.
- **Meta’s Llama 3 LLM:** The [meta-llama/Meta-Llama-3-8B](https://huggingface.co/meta-llama/Meta-Llama-3-8B) model is used as the LLM. You should review and agree to the licensing agreement before deploying.
- **Milvus:** Milvus is an open-source vector database and is used for generative AI workloads. This tutorial uses Milvus to store embeddings generated by LlamaIndex and make them available to queries sent to the Llama 3 LLM.
- **Open WebUI:** This is an self-hosted AI chatbot application that’s compatible with LLMs like Llama 3 and includes a built-in inference engine for RAG solutions. Users interact with this interface to query the LLM. This can be configured to send queries straight to Llama 3 or to first load data from Milvus and send that context along with the query.
- **Open WebUI:** This is a self-hosted AI chatbot application that’s compatible with LLMs like Llama 3 and includes a built-in inference engine for RAG solutions. Users interact with this interface to query the LLM. This can be configured to send queries straight to Llama 3 or to first load data from Milvus and send that context along with the query.
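
The **Kubeflow Pipeline** component above is built with the Kubeflow Pipelines SDK and compiled into the YAML file that is later uploaded through the Kubeflow UI. As a rough illustration only (the component body, base image, and file name below are placeholders rather than this tutorial's actual ingestion code), a pipeline is defined and compiled like this:

```python
from kfp import dsl, compiler


@dsl.component(base_image="python:3.11", packages_to_install=["llama-index", "pymilvus"])
def ingest_documents(dataset_url: str, collection_name: str):
    # Placeholder step: download the dataset, build embeddings with LlamaIndex,
    # and write them to the named Milvus collection.
    ...


@dsl.pipeline(name="data-ingestion")
def ingestion_pipeline(dataset_url: str, collection_name: str):
    ingest_documents(dataset_url=dataset_url, collection_name=collection_name)


# Compile to a YAML file that can be uploaded through the Kubeflow Pipelines UI.
compiler.Compiler().compile(ingestion_pipeline, "ingestion_pipeline.yaml")
```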

## Prerequisites

This tutorial requires you to have access to a few different services and local software tools. You should also have a custom dataset available to use for the pipeline.

- A [Cloud Manager](https://cloud.linode.com/) account is required to use many of Akamai’s cloud computing services, including LKE.
- A [Hugging Face](https://huggingface.co/) account is used for deploying the Llama 3 LLM to KServe.
- You should have both [kubectl](https://kubernetes.io/docs/reference/kubectl/) and [Helm](https://helm.sh/) installed on your local machine. These apps are used for managing your LKE cluster and installing applications to your cluster.
- You should have [kubectl](https://kubernetes.io/docs/reference/kubectl/), [Kustomize](https://kustomize.io/), and [Helm](https://helm.sh/) installed on your local machine. These apps are used for managing your LKE cluster and installing applications to your cluster.
- A **custom dataset** is needed, preferably in Markdown format, though you can use other types of data if you modify the LlamaIndex configuration provided in this tutorial. This dataset should contain all of the information you want used by the Llama 3 LLM. This tutorial uses a Markdown dataset containing all of the Linode Docs.

{{< note type="warning" title="Production workloads" >}}
@@ -61,7 +61,7 @@ It’s not part of the scope of this document to cover the setup required to sec

The first step is to provision the infrastructure needed for this tutorial and configure it with kubectl, so that you can manage it locally and install software through helm. As part of this process, we’ll also need to install the NVIDIA GPU operator at this step so that the NVIDIA cards within the GPU worker nodes can be used on Kubernetes.

1. **Provision an LKE cluster.** We recommend using at least 3 **RTX4000 Ada x1 Medium** GPU plans (plan ID: `g2-gpu-rtx4000a1-m`), though you can adjust this as needed. For reference, Kubeflow recommends 32 GB of RAM and 16 CPU cores for just their own application. This tutorial has been tested using Kubernetes v1.31, though other versions should also work. To learn more about provisioning a cluster, see the [Create a cluster](https://techdocs.akamai.com/cloud-computing/docs/create-a-cluster) guide.
1. **Provision an LKE cluster.** We recommend using at least 2 **RTX4000 Ada x1 Medium** GPU plans (plan ID: `g2-gpu-rtx4000a1-m`), though you can adjust this as needed. For reference, Kubeflow recommends 32 GB of RAM and 16 CPU cores for just their own application. This tutorial has been tested using Kubernetes v1.31, though other versions should also work. To learn more about provisioning a cluster, see the [Create a cluster](https://techdocs.akamai.com/cloud-computing/docs/create-a-cluster) guide.

{{< note noTitle=true >}}
GPU plans are available in a limited number of data centers. Review the [GPU product documentation](https://techdocs.akamai.com/cloud-computing/docs/gpu-compute-instances#availability) to learn more about availability.
@@ -97,10 +97,10 @@ Next, let’s deploy Kubeflow on the LKE cluster. These instructions deploy all
openssl rand -base64 18
```

1. Create a hash of this password, replacing PASSWORD with the password generated in the previous step. This outputs a string starting with `$2y$12$`, which is password hash.
1. Create a hash of this password, replacing PASSWORD with the password generated in the previous step. This outputs the password hash, which starts with `$2y$12$`.

```command
htpasswd -bnBC 12 "" <PASSWORD> | tr -d ':\n'
htpasswd -bnBC 12 "" PASSWORD | tr -d ':\n'
```

1. Edit the `common/dex/base/dex-passwords.yaml` file, replacing the value for `DEX_USER_PASSWORD` with the password hash generated in the previous step.
@@ -111,7 +111,7 @@ Next, let’s deploy Kubeflow on the LKE cluster. These instructions deploy all
while ! kustomize build example | kubectl apply -f -; do echo "Retrying to apply resources"; sleep 20; done
```

1. This may take some time to finish. Once it’s complete, verify that all pods are in the ready state.
1. This may take some time to finish. Once it’s complete, verify that all pods are in the running state.

```command
kubectl get pods -A
@@ -152,6 +152,7 @@ After Kubeflow has been installed, we can now deploy the Llama 3 LLM to KServe.
name: huggingface-llama3
spec:
predictor:
minReplicas: 1
model:
modelFormat:
name: huggingface
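
Once the InferenceService reports ready, you can send it a quick test request. The sketch below assumes the KServe Hugging Face runtime's OpenAI-compatible chat endpoint, a service reachable at `localhost:8080` (for example, via `kubectl port-forward`), and a served model name of `llama3`; all three are assumptions you may need to adjust for your deployment.

```python
import requests

# Assumed values: adjust the endpoint and model name to match your
# InferenceService and network setup (for example, a kubectl port-forward).
url = "http://localhost:8080/openai/v1/chat/completions"
payload = {
    "model": "llama3",
    "messages": [{"role": "user", "content": "Reply with one short sentence."}],
    "max_tokens": 64,
}

response = requests.post(url, json=payload, timeout=120)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```

If the request succeeds, the model server is up and ready to be wired to the chatbot later in this tutorial.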
@@ -202,6 +203,11 @@ Milvus, the vector database designed for AI inference workloads, will be used as
nvidia.com/gpu: "1"
limits:
nvidia.com/gpu: "1"
persistentVolumeClaim:
size: 5Gi
minio:
persistence:
size: 50Gi
```

1. Add Milvus to Helm.
@@ -214,7 +220,7 @@ Milvus, the vector database designed for AI inference workloads, will be used as
1. Install Milvus using Helm.

```command
helm install my-release milvus/milvus --set cluster.enabled=false --set etcd.replicaCount=1 --set minio.mode=standalone --set pulsar.enabled=false -f milvus-custom-values.yaml
helm install my-release milvus/milvus --set cluster.enabled=false --set etcd.replicaCount=1 --set minio.mode=standalone --set pulsarv3.enabled=false -f milvus-custom-values.yaml
```
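
After the Helm install completes, you can optionally confirm that Milvus is reachable before moving on. This is a minimal sketch that assumes you have port-forwarded the Milvus service locally (for example, `kubectl port-forward svc/my-release-milvus 19530:19530`, where `my-release-milvus` is the assumed service name for this release):

```python
from pymilvus import connections, utility

# Connect to the port-forwarded Milvus service; adjust host/port as needed.
connections.connect(alias="default", host="127.0.0.1", port="19530")

# Print the server version and any existing collections to confirm connectivity.
print(utility.get_server_version())
print(utility.list_collections())
```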

## Set up Kubeflow Pipeline to ingest data
@@ -335,19 +341,19 @@ This tutorial employs a Python script to create the YAML file used within Kubefl

![Screenshot of the "New Experiment" page within Kubeflow](kubeflow-new-experiment.jpg)

1. Next, navigate to Pipelines > Pipelines and click the **Upload Pipeline** link. Select **Upload a file** and use the **Choose file** dialog box to select the pipeline YAML file that was created in a previous step.
1. Next, navigate to Pipelines > Pipelines and click the **Upload Pipeline** link. Select **Upload a file** and use the **Choose file** dialog box to select the pipeline YAML file that was created in a previous step. Click the **Create** button to create the pipeline.

![Screenshot of the "New Pipeline" page within Kubeflow](kubeflow-new-pipeline.jpg)

1. Navigate to the Pipelines > Runs page and click **Create Run**. Within the Run details section, select the pipeline and experiment that you just created. Choose *One-off* as the **Run Type** and provide the collection name and URL of the dataset (the zip file with the documents you wish to process) in the **Run parameters** section. For this tutorial, we are using `linode_docs` as the name and `https://github.com/linode/docs/archive/refs/tags/v1.360.0.zip` and the dataset URL.
1. Navigate to the Pipelines > Runs page and click **Create Run**. Within the Run details section, select the pipeline and experiment that you just created. Choose *One-off* as the **Run Type** and provide the collection name and URL of the dataset (the zip file with the documents you wish to process) in the **Run parameters** section. For this tutorial, we are using `linode_docs` as the name and `https://github.com/linode/docs/archive/refs/tags/v1.366.0.zip` as the dataset URL.

![Screenshot of the "Start a new run" page within Kubeflow](kubeflow-new-run.jpg)

1. Click **Start** to run the pipeline. This process takes some time. For reference, it took ~10 minutes for the run to complete successfully on the linode.com/docs dataset.
1. Click **Start** to run the pipeline. This process takes some time. For reference, it takes about 10 minutes for the run to complete on the linode.com/docs dataset.
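
Once the run completes, you can optionally verify that the embeddings landed in Milvus. This sketch assumes Milvus is still port-forwarded locally and uses the `linode_docs` collection name from the run parameters above:

```python
from pymilvus import Collection, connections

connections.connect(host="127.0.0.1", port="19530")

# Load the collection written by the pipeline run and report how many
# embedding vectors it contains.
collection = Collection("linode_docs")
print(f"Entities in linode_docs: {collection.num_entities}")
```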

## Deploy the chatbot

To finish up this tutorial, we will install the Open-WebUI chatbot and configure it to connect the data generated in the Kubernetes Pipeline with the LLM deployed in KServe. Once this is up and running, you can open up a browser interface to the chatbot and ask it questions. Chatbot UI will use the Milvus database to load context related to the search and send it, along with your query, to the Llama 3 instance within KServe. The LLM will send back a response to the chatbot and your browser will display an answer that is informed by your own custom data.
To finish up this tutorial, install the Open-WebUI chatbot and configure it to connect the data generated in the Kubernetes Pipeline with the LLM deployed in KServe. Once this is up and running, you can open up a browser interface to the chatbot and ask it questions. Chatbot UI uses the Milvus database to load context related to the search and sends it, along with your query, to the Llama 3 instance within KServe. The LLM then sends back a response to the chatbot and your browser displays an answer that is informed by your own custom data.
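
To make that flow concrete, the following is a simplified sketch of the retrieve-then-generate pattern, built with the same libraries the pipeline files below install (LlamaIndex with a Milvus vector store). The Milvus URI, embedding model, vector dimension, LLM endpoint, and model name are all assumptions; the chatbot's actual logic lives in the `rag_pipeline.py` file created in the next section.

```python
import requests
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.milvus import MilvusVectorStore

# Assumed values: adjust the Milvus URI, embedding model, vector dimension,
# and LLM endpoint to match your deployment.
vector_store = MilvusVectorStore(
    uri="http://127.0.0.1:19530", collection_name="linode_docs", dim=384
)
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
index = VectorStoreIndex.from_vector_store(vector_store, embed_model=embed_model)

query = "How do I resize a Compute Instance?"

# 1. Retrieve the most relevant chunks of the custom dataset from Milvus.
nodes = index.as_retriever(similarity_top_k=3).retrieve(query)
context = "\n\n".join(node.get_content() for node in nodes)

# 2. Send the retrieved context plus the original question to Llama 3.
payload = {
    "model": "llama3",
    "messages": [
        {"role": "system", "content": f"Answer using this context:\n{context}"},
        {"role": "user", "content": query},
    ],
}
response = requests.post(
    "http://localhost:8080/openai/v1/chat/completions", json=payload, timeout=120
)
print(response.json()["choices"][0]["message"]["content"])
```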

### Create the RAG pipeline files

@@ -360,6 +366,7 @@ Despite the naming, these RAG pipeline files are not related to the Kubeflow pip
```file {title="pipeline-requirements.txt"}
requests
pymilvus
opencv-python-headless
llama-index
llama-index-vector-stores-milvus
llama-index-embeddings-huggingface
@@ -368,7 +375,7 @@ Despite the naming, these RAG pipeline files are not related to the Kubeflow pip

1. Create a file called `rag_pipeline.py` with the following contents. The filenames of both the `pipeline-requirements.txt` and `rag_pipeline.py` files should not be changed as they are referenced within the Open WebUI Pipeline configuration file.

```file {title="rag-pipeline.py"}
```file {title="rag_pipeline.py"}
"""
title: RAG Pipeline
version: 1.0
@@ -594,4 +601,6 @@ Now that the chatbot has been configured, the final step is to access the chatbo

- The **RAG Pipeline** model that you defined in a previous section does use data from your custom dataset. Ask it a question relevant to your data and the chatbot should respond with an answer that is informed by the custom dataset you configured.

![Screenshot of a RAG Pipeline query in Open WebUI](open-webui-rag-pipeline.jpg)
![Screenshot of a RAG Pipeline query in Open WebUI](open-webui-rag-pipeline.jpg)

The response time depends on a variety of factors. Using similar cluster resources and the same dataset as this guide, an estimated response time is between 6 and 70 seconds.
@@ -31,7 +31,7 @@ This guide shows how to set up a Harbor registry on a dedicated compute instance

The Harbor installation in this guide assumes that you have [a domain name registered through a domain registrar](/docs/products/networking/dns-manager/get-started/#register-the-domain), and that you can edit the DNS records for this domain. This is so that SSL connections can be configured for the Harbor server. If you do not have a domain name, register one now.

The infrastructure for this guide is created on the Akamai Connected Cloud platform. If you do not already have one, [create an account](/docs/products/platform/get-started/) for the platform.
The infrastructure for this guide is created on the Akamai Cloud platform. If you do not already have one, [create an account](/docs/products/platform/get-started/) for the platform.

The following is a summary of the infrastructure created in this guide. Instructions for creating these services are included later in the guide:
