Support denvr endpoints with Litellm. #2085
base: main
@@ -73,6 +73,17 @@ CPU example with Open Telemetry feature:
docker compose -f compose.yaml -f compose.telemetry.yaml up -d
```

To deploy ChatQnA services with remote endpoints, set the required environment variables listed below and run the `compose_remote.yaml` file.

**Note**: Set the `REMOTE_ENDPOINT` variable to `https://api.inference.denvrdata.com` when the remote endpoint to access is `https://api.inference.denvrdata.com/v1/chat/completions`, i.e., omit the `/v1/chat/completions` suffix.

```bash
export REMOTE_ENDPOINT=<endpoint-url>
export LLM_MODEL_ID=<model-id>
export OPENAI_API_KEY=<API-KEY>

docker compose -f compose_remote.yaml up -d
```

> **Review comment:** good to change it from OPENAI_API_KEY to API_KEY since it is not for openai

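Before starting the services, it can help to confirm that the remote endpoint is reachable with the credentials above. The check below is only an illustrative sketch: it assumes the endpoint exposes an OpenAI-compatible `/v1/chat/completions` route and that `REMOTE_ENDPOINT`, `LLM_MODEL_ID`, and `OPENAI_API_KEY` are already exported.

```bash
# Illustrative sanity check (not part of the deployment itself): send a tiny
# chat completion request to the remote endpoint and inspect the response.
curl -s "${REMOTE_ENDPOINT}/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${OPENAI_API_KEY}" \
  -d "{\"model\": \"${LLM_MODEL_ID}\", \"messages\": [{\"role\": \"user\", \"content\": \"Hello\"}], \"max_tokens\": 16}"
```
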
**Note**: Developers should build the Docker image from source when:

- Developing off the git main branch (as the container's ports in the repo may be different from the published docker image).

@@ -147,6 +158,7 @@ In the context of deploying a ChatQnA pipeline on an Intel® Xeon® platform, we
| File | Description |
| ------------------------------------------------------------ | --------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| [compose.yaml](./compose.yaml) | Default compose file using vllm as serving framework and redis as vector database |
| [compose_remote.yaml](./compose_remote.yaml) | Default compose file using remote inference endpoints and redis as vector database |
| [compose_milvus.yaml](./compose_milvus.yaml) | Uses Milvus as the vector database. All other configurations remain the same as the default |
| [compose_pinecone.yaml](./compose_pinecone.yaml) | Uses Pinecone as the vector database. All other configurations remain the same as the default. For more details, refer to [README_pinecone.md](./README_pinecone.md). |
| [compose_qdrant.yaml](./compose_qdrant.yaml) | Uses Qdrant as the vector database. All other configurations remain the same as the default. For more details, refer to [README_qdrant.md](./README_qdrant.md). |

@@ -91,11 +91,27 @@ Different Docker Compose files are available to select the LLM serving backend.
- **Description:** Uses Hugging Face Text Generation Inference (TGI) optimized for Intel CPUs as the LLM serving engine.
- **Services Deployed:** `codegen-tgi-server`, `codegen-llm-server`, `codegen-tei-embedding-server`, `codegen-retriever-server`, `redis-vector-db`, `codegen-dataprep-server`, `codegen-backend-server`, `codegen-gradio-ui-server`.
- **To Run:**

```bash
# Ensure environment variables (HOST_IP, HF_TOKEN) are set
docker compose -f compose_tgi.yaml up -d
```

#### Deployment with remote endpoints (`compose_remote.yaml`)

- **Compose File:** `compose_remote.yaml`
- **Description:** Uses remote endpoints to access the served LLMs. This is the default configuration except for the LLM serving engine.
- **Services Deployed:** `codegen-tei-embedding-server`, `codegen-retriever-server`, `redis-vector-db`, `codegen-dataprep-server`, `codegen-backend-server`, `codegen-gradio-ui-server`.
- **To Run:**
```bash
export OPENAI_API_KEY=<api-key>
export REMOTE_ENDPOINT=<remote-endpoint> # do not include /v1
export LLM_MODEL_ID=<model-id>
docker compose -f compose_remote.yaml up -d
```

> **Review comment:** OPENAI name here is confusing. good to remove OPENAI term here.

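To choose a valid `LLM_MODEL_ID`, it can be useful to ask the remote endpoint which models it serves. The snippet below is a hedged sketch that assumes the endpoint follows the OpenAI-compatible `/v1/models` route; the model IDs returned in the response's `data` array are the values that can be exported as `LLM_MODEL_ID`.

```bash
# Illustrative only: list the model IDs served by the remote endpoint so that
# LLM_MODEL_ID can be set to one of them.
curl -s "${REMOTE_ENDPOINT}/v1/models" \
  -H "Authorization: Bearer ${OPENAI_API_KEY}"
```
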
### Configuration Parameters
#### Environment Variables

@@ -52,6 +52,18 @@ cd intel/cpu/xeon/
docker compose up -d
```

To deploy DocSum services with remote endpoints, set the required environment variables listed below and run the `compose_remote.yaml` file.

**Note**: Set the `LLM_ENDPOINT` variable to `https://api.inference.denvrdata.com` when the remote endpoint to access is `https://api.inference.denvrdata.com/v1/chat/completions`, i.e., omit the `/v1/chat/completions` suffix.

```bash
export LLM_ENDPOINT=<endpoint-url>
export LLM_MODEL_ID=<model-id>
export OPENAI_API_KEY=<API-KEY>

docker compose -f compose_remote.yaml up -d
```

> **Review comment:** Better to mention how users can set LLM_MODEL_ID correctly by getting the supported model list.

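Once the containers are up, a quick end-to-end check can confirm that the remote LLM is wired in correctly. The request below is only a sketch: it assumes the DocSum backend is published on port 8888 (the `BACKEND_SERVICE_PORT` default in `compose_remote.yaml`) and accepts a JSON payload with `type` and `messages` fields on a `/v1/docsum` route; adjust host, port, and payload to your deployment.

```bash
# Hypothetical smoke test: ask the DocSum megaservice to summarize a short text.
curl -s "http://localhost:8888/v1/docsum" \
  -H "Content-Type: application/json" \
  -d '{"type": "text", "messages": "Replace this placeholder with the text you want summarized."}'
```
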
**Note**: Developers should build the Docker image from source when:

- Developing off the git main branch (as the container's ports in the repo may be different from the published docker image).

@@ -113,10 +125,11 @@ All the DocSum containers will be stopped and then removed on completion of the

In the context of deploying a DocSum pipeline on an Intel® Xeon® platform, we can pick and choose different large language model serving frameworks. The table below outlines the various configurations that are available as part of the application.

| File | Description |
| -------------------------------------- | ----------------------------------------------------------------------------------------- |
| [compose.yaml](./compose.yaml) | Default compose file using vllm as serving framework |
| [compose_tgi.yaml](./compose_tgi.yaml) | The LLM serving framework is TGI. All other configurations remain the same as the default |

| File | Description |
| -------------------------------------------- | -------------------------------------------------------------------------------------- |
| [compose.yaml](./compose.yaml) | Default compose file using vllm as serving framework |
| [compose_tgi.yaml](./compose_tgi.yaml) | The LLM serving framework is TGI. All other configurations remain the same as default |
| [compose_remote.yaml](./compose_remote.yaml) | Uses remote inference endpoints for LLMs. All other configurations are same as default |

## DocSum Detailed Usage

@@ -0,0 +1,73 @@
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0

services:

> **Review comment:** I suggest to provide a corresponding test script like …

  llm-docsum-vllm:
    image: ${REGISTRY:-opea}/llm-docsum:${TAG:-latest}
    container_name: docsum-xeon-llm-server
    ports:
      - ${LLM_PORT:-9000}:9000
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
      LLM_ENDPOINT: ${LLM_ENDPOINT}
      LLM_MODEL_ID: ${LLM_MODEL_ID}
      OPENAI_API_KEY: ${OPENAI_API_KEY}
      HUGGINGFACEHUB_API_TOKEN: ${HF_TOKEN}
      HF_TOKEN: ${HF_TOKEN}
      MAX_INPUT_TOKENS: ${MAX_INPUT_TOKENS}
      MAX_TOTAL_TOKENS: ${MAX_TOTAL_TOKENS}
      DocSum_COMPONENT_NAME: ${DocSum_COMPONENT_NAME}
      LOGFLAG: ${LOGFLAG:-False}
    restart: unless-stopped

> **Review comment:** should this be: LLM_ENDPOINT: ${REMOTE_ENDPOINT}

  whisper:
    image: ${REGISTRY:-opea}/whisper:${TAG:-latest}
    container_name: docsum-xeon-whisper-server
    ports:
      - "7066:7066"
    ipc: host
    environment:
      no_proxy: ${no_proxy}
      http_proxy: ${http_proxy}
      https_proxy: ${https_proxy}
    restart: unless-stopped

  docsum-xeon-backend-server:
    image: ${REGISTRY:-opea}/docsum:${TAG:-latest}
    container_name: docsum-xeon-backend-server
    depends_on:
      - llm-docsum-vllm
    ports:
      - "${BACKEND_SERVICE_PORT:-8888}:8888"
    environment:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - MEGA_SERVICE_HOST_IP=${MEGA_SERVICE_HOST_IP}
      - LLM_SERVICE_HOST_IP=${LLM_SERVICE_HOST_IP}
      - ASR_SERVICE_HOST_IP=${ASR_SERVICE_HOST_IP}
    ipc: host
    restart: always

  docsum-gradio-ui:
    image: ${REGISTRY:-opea}/docsum-gradio-ui:${TAG:-latest}
    container_name: docsum-xeon-ui-server
    depends_on:
      - docsum-xeon-backend-server
    ports:
      - "${FRONTEND_SERVICE_PORT:-5173}:5173"
    environment:
      - no_proxy=${no_proxy}
      - https_proxy=${https_proxy}
      - http_proxy=${http_proxy}
      - BACKEND_SERVICE_ENDPOINT=${BACKEND_SERVICE_ENDPOINT}
      - DOC_BASE_URL=${BACKEND_SERVICE_ENDPOINT}
    ipc: host
    restart: always

networks:
  default:
    driver: bridge

> **Review comment:** good to put a notice why we need to set LLM_MODEL_ID again and users need to pick one supported by remote endpoint.
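
Every value in the `environment:` blocks above is interpolated from the host shell, so a missing export silently becomes an empty string inside the container. The snippet below is an illustrative pre-flight sketch (not a script shipped with this repository): it checks a few of the variables consumed by `compose_remote.yaml` before bringing the stack up; extend the list as needed.

```bash
#!/usr/bin/env bash
# Illustrative pre-flight check: fail fast if a variable referenced by
# compose_remote.yaml is missing from the host environment.
set -euo pipefail
for var in LLM_ENDPOINT LLM_MODEL_ID OPENAI_API_KEY MAX_INPUT_TOKENS MAX_TOTAL_TOKENS; do
  if [ -z "${!var:-}" ]; then
    echo "ERROR: ${var} is not set" >&2
    exit 1
  fi
done
docker compose -f compose_remote.yaml up -d
```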