
Commit be3e1f1

Enable Remote Endpoints for LLM using Intel Enterprise Inference
Signed-off-by: Tsai, Louie <[email protected]>
1 parent 477a6ea commit be3e1f1

File tree

4 files changed: +30 -9 lines changed


WorkflowExecAgent/README.md

Lines changed: 24 additions & 5 deletions
@@ -72,9 +72,9 @@ And finally here are the results from the microservice logs:
 
 ### Start Agent Microservice
 
-Workflow Executor will have a single docker image.
+Workflow Executor will have a single docker image.
 
-(Optional) Build the agent docker image with the most latest changes.
+(Optional) Build the agent docker image with the most latest changes.
 By default, Workflow Executor uses public [opea/vllm](https://hub.docker.com/r/opea/agent) docker image if no local built image exists.
 
 ```sh
@@ -85,6 +85,24 @@ cd GenAIExamples//WorkflowExecAgent/docker_image_build/
 docker compose -f build.yaml build --no-cache
 ```
 
+<details>
+<summary> Using Remote LLM Endpoints </summary>
+When models are deployed on a remote server, a base URL and an API key are required to access them. To set up a remote server and acquire the base URL and API key, refer to <a href="https://www.intel.com/content/www/us/en/products/docs/accelerator-engines/enterprise-ai.html"> Intel® AI for Enterprise Inference </a> offerings.
+
+Set the following environment variables.
+
+- `llm_endpoint_url` is the HTTPS endpoint of the remote server with the model of choice (i.e. https://api.inference.denvrdata.com). **Note:** If not using LiteLLM, the second part of the model card needs to be appended to the URL i.e. `/Llama-3.3-70B-Instruct` from `meta-llama/Llama-3.3-70B-Instruct`.
+- `llm_endpoint_api_key` is the access token or key to access the model(s) on the server.
+- `LLM_MODEL_ID` is the model card which may need to be overwritten depending on what it is set to `set_env.sh`.
+
+```bash
+export llm_endpoint_url=<https-endpoint-of-remote-server>
+export llm_endpoint_api_key=<your-api-key>
+export LLM_MODEL_ID=<model-card>
+```
+
+</details>
+
 Configure `GenAIExamples/WorkflowExecAgent/docker_compose/.env` file with the following. Replace the variables according to your usecase.
 
 ```sh
@@ -93,8 +111,9 @@ export SERVING_TOKEN=${SERVING_TOKEN}
 export HF_TOKEN=${HF_TOKEN}
 export llm_engine=vllm
 export llm_endpoint_url=${llm_endpoint_url}
+export api_key=${llm_endpoint_api_key:-""}
 export ip_address=$(hostname -I | awk '{print $1}')
-export model="mistralai/Mistral-7B-Instruct-v0.3"
+export model=${LLM_MODEL_ID:-"mistralai/Mistral-7B-Instruct-v0.3"}
 export recursion_limit=${recursion_limit}
 export temperature=0
 export max_new_tokens=1000
@@ -103,9 +122,9 @@ export TOOLSET_PATH=$WORKDIR/GenAIExamples/WorkflowExecAgent/tools/
 export http_proxy=${http_proxy}
 export https_proxy=${https_proxy}
 ```
-> Note: SDK_BASE_URL and SERVING_TOKEN can be obtained from Intel Data Insight Automation platform.
-> For llm_endpoint_url, both local vllm service or an remote vllm endpoint work for the example.
 
+> Note: SDK_BASE_URL and SERVING_TOKEN can be obtained from Intel Data Insight Automation platform.
+> For llm_endpoint_url, both local vllm service or an remote vllm endpoint work for the example.
 
 Launch service by running the docker compose command.
 
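Editor's note on the new remote-endpoint block: assuming the remote server exposes an OpenAI-compatible API (as vLLM and LiteLLM-fronted deployments typically do), a minimal smoke test of the endpoint and key before launching the agent might look like the sketch below. The `/v1/chat/completions` route is an assumption; the variable names come from the README additions above.

```bash
# Sketch only: verify the remote endpoint and API key before wiring them into
# the agent. Assumes an OpenAI-compatible /v1/chat/completions route.
curl -sS "${llm_endpoint_url}/v1/chat/completions" \
  -H "Authorization: Bearer ${llm_endpoint_api_key}" \
  -H "Content-Type: application/json" \
  -d "{
        \"model\": \"${LLM_MODEL_ID}\",
        \"messages\": [{\"role\": \"user\", \"content\": \"ping\"}],
        \"max_tokens\": 8
      }"
```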

WorkflowExecAgent/docker_compose/intel/cpu/xeon/compose_vllm.yaml

Lines changed: 1 addition & 0 deletions
@@ -17,6 +17,7 @@ services:
 recursion_limit: ${recursion_limit}
 llm_engine: ${llm_engine}
 llm_endpoint_url: ${llm_endpoint_url}
+api_key: ${API_KEY}
 model: ${model}
 temperature: ${temperature}
 max_new_tokens: ${max_new_tokens}
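Editor's note: the compose change reads the key from `${API_KEY}`, so a value has to be present in the shell or the `.env` file under exactly that name (environment variable names are case-sensitive). A quick way to confirm the interpolation before starting the stack, as a sketch:

```bash
# Sketch only: render the compose file with variables substituted and check
# that the api_key value actually reaches the agent service environment.
cd GenAIExamples/WorkflowExecAgent/docker_compose/intel/cpu/xeon
docker compose -f compose_vllm.yaml config | grep -i "api_key"
```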

WorkflowExecAgent/tests/2_start_vllm_service.sh

Lines changed: 4 additions & 4 deletions
@@ -37,7 +37,7 @@ function build_vllm_docker_image() {
 
 function start_vllm_service() {
 echo "start vllm service"
-docker run -d -p ${vllm_port}:${vllm_port} --rm --network=host --name test-comps-vllm-service -v ~/.cache/huggingface:/root/.cache/huggingface -v ${WORKPATH}/tests/tool_chat_template_mistral_custom.jinja:/root/tool_chat_template_mistral_custom.jinja -e HF_TOKEN=$HF_TOKEN -e http_proxy=$http_proxy -e https_proxy=$https_proxy -it vllm-cpu-env --model ${model} --port ${vllm_port} --chat-template /root/tool_chat_template_mistral_custom.jinja --enable-auto-tool-choice --tool-call-parser mistral
+docker run -d -p ${vllm_port}:${vllm_port} --rm --network=host --name test-comps-vllm-service -v ~/.cache/huggingface:/root/.cache/huggingface -v ${WORKPATH}/tests/tool_chat_template_mistral_custom.jinja:/root/tool_chat_template_mistral_custom.jinja -e HF_TOKEN=$HF_TOKEN -e http_proxy=$http_proxy -e https_proxy=$https_proxy -it public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.9.1 --model ${model} --port ${vllm_port} --chat-template /root/tool_chat_template_mistral_custom.jinja --enable-auto-tool-choice --tool-call-parser mistral
 echo ${LOG_PATH}/vllm-service.log
 sleep 5s
 echo "Waiting vllm ready"
@@ -59,9 +59,9 @@ function start_vllm_service() {
 }
 
 function main() {
-echo "==================== Build vllm docker image ===================="
-build_vllm_docker_image
-echo "==================== Build vllm docker image completed ===================="
+# echo "==================== Build vllm docker image ===================="
+# build_vllm_docker_image
+# echo "==================== Build vllm docker image completed ===================="
 
 echo "==================== Start vllm docker service ===================="
 start_vllm_service
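Editor's note: with the local image build commented out, the test depends on the prebuilt CPU image named in the `docker run` line. A hedged convenience sketch, pre-pulling that image and polling the OpenAI-compatible server until it answers (it assumes `vllm_port` is exported by the surrounding test scripts):

```bash
# Sketch only: pull the image ahead of time so start_vllm_service() does not
# pay the download cost, then wait for the server to come up.
docker pull public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.9.1

# vLLM's OpenAI-compatible server lists served models at /v1/models once ready.
until curl -sf "http://localhost:${vllm_port}/v1/models" > /dev/null; do
  sleep 5
done
```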

WorkflowExecAgent/tests/3_launch_and_validate_agent.sh

Lines changed: 1 addition & 0 deletions
@@ -16,6 +16,7 @@ export HF_TOKEN=${HF_TOKEN}
 export llm_engine=vllm
 export ip_address=$(hostname -I | awk '{print $1}')
 export llm_endpoint_url=http://${ip_address}:${vllm_port}
+export api_key=""
 export model=mistralai/Mistral-7B-Instruct-v0.3
 export recursion_limit=25
 export temperature=0
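Editor's note: `api_key` is left empty here because the test targets the local vLLM container, which needs no authentication. A hypothetical remote-endpoint variant of the same environment, reusing the variable names introduced in the README, could look like:

```bash
# Sketch only: hypothetical remote-endpoint variant of the test environment.
export llm_endpoint_url=${llm_endpoint_url}   # e.g. https://api.inference.denvrdata.com
export api_key=${llm_endpoint_api_key}
export model=${LLM_MODEL_ID:-"mistralai/Mistral-7B-Instruct-v0.3"}
```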
