Fix workexec agent docker build issues and enable LLM Remote Endpoint #2103

Merged 2 commits on Aug 15, 2025

18 changes: 18 additions & 0 deletions WorkflowExecAgent/docker_compose/intel/cpu/xeon/README.md
@@ -60,6 +60,24 @@ export temperature=0
export max_new_tokens=1000
```

<details>
<summary> Using Remote LLM Endpoints </summary>
When models are deployed on a remote server, a base URL and an API key are required to access them. To set up a remote server and acquire the base URL and API key, refer to <a href="https://www.intel.com/content/www/us/en/developer/topic-technology/artificial-intelligence/enterprise-inference.html"> Intel® AI for Enterprise Inference </a> offerings.

Set the following environment variables.

- `llm_endpoint_url` is the HTTPS endpoint of the remote server hosting the model of choice (e.g. https://api.inference.denvrdata.com). **Note:** If not using LiteLLM, the model name portion of the model card must be appended to the URL, e.g. `/Llama-3.3-70B-Instruct` from `meta-llama/Llama-3.3-70B-Instruct`.
- `llm_endpoint_api_key` is the access token or key to access the model(s) on the server.
- `LLM_MODEL_ID` is the model card, which may need to be overridden depending on what it is set to in `set_env.sh`.

```bash
export llm_endpoint_url=<https-endpoint-of-remote-server>
export llm_endpoint_api_key=<your-api-key>
export LLM_MODEL_ID=<model-card>
```

</details>
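
As a quick sanity check (illustrative only, not part of this guide, and assuming the remote server exposes an OpenAI-compatible API), the variables above can be exercised directly with `curl` before starting the agent:

```bash
# Illustrative check; assumes an OpenAI-compatible /v1/chat/completions route on the remote server.
curl -s "${llm_endpoint_url}/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer ${llm_endpoint_api_key}" \
  -d '{
        "model": "'"${LLM_MODEL_ID}"'",
        "messages": [{"role": "user", "content": "Hello"}],
        "max_tokens": 16
      }'
```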

### Deploy the Services Using Docker Compose

For an out-of-the-box experience, this guide uses an example workflow serving API service. There are 3 services needed for the setup: the agent microservice, an LLM inference service, and the workflow serving API.
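
For orientation only (the exact commands are in the remainder of this README, collapsed in this diff), bringing the stack up typically reduces to starting Docker Compose from this directory; treat the following as a rough sketch whose paths are assumptions:

```bash
# Rough sketch only; directory layout and compose file name are assumptions.
cd GenAIExamples/WorkflowExecAgent/docker_compose/intel/cpu/xeon
docker compose up -d
docker compose ps   # expect the agent, LLM inference, and workflow API containers
```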
@@ -17,6 +17,7 @@ services:
recursion_limit: ${recursion_limit}
llm_engine: ${llm_engine}
llm_endpoint_url: ${llm_endpoint_url}
api_key: ${llm_endpoint_api_key}
model: ${model}
temperature: ${temperature}
max_new_tokens: ${max_new_tokens}
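For context (not part of this change), one way to confirm that the new `api_key` mapping is picked up is to render the resolved compose configuration and inspect the running agent container; the container name below is a placeholder, not a real name from this repo:

```bash
# Illustrative checks; <agent-container-name> is a placeholder.
docker compose config | grep -B2 -A1 'api_key'      # show the resolved environment mapping
docker exec <agent-container-name> printenv api_key  # confirm the key reached the container
```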
8 changes: 4 additions & 4 deletions WorkflowExecAgent/tests/2_start_vllm_service.sh
@@ -38,7 +38,7 @@ function build_vllm_docker_image() {
function start_vllm_service() {
echo "start vllm service"
export VLLM_SKIP_WARMUP=true
docker run -d -p ${vllm_port}:${vllm_port} --rm --network=host --name test-comps-vllm-service -v ~/.cache/huggingface:/root/.cache/huggingface -v ${WORKPATH}/tests/tool_chat_template_mistral_custom.jinja:/root/tool_chat_template_mistral_custom.jinja -e HF_TOKEN=$HF_TOKEN -e http_proxy=$http_proxy -e https_proxy=$https_proxy -it vllm-cpu-env --model ${model} --port ${vllm_port} --chat-template /root/tool_chat_template_mistral_custom.jinja --enable-auto-tool-choice --tool-call-parser mistral
docker run -d -p ${vllm_port}:${vllm_port} --rm --network=host --name test-comps-vllm-service -v ~/.cache/huggingface:/root/.cache/huggingface -v ${WORKPATH}/tests/tool_chat_template_mistral_custom.jinja:/root/tool_chat_template_mistral_custom.jinja -e HF_TOKEN=$HF_TOKEN -e http_proxy=$http_proxy -e https_proxy=$https_proxy -it public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.10.0 --model ${model} --port ${vllm_port} --chat-template /root/tool_chat_template_mistral_custom.jinja --enable-auto-tool-choice --tool-call-parser mistral
echo ${LOG_PATH}/vllm-service.log
sleep 10s
echo "Waiting vllm ready"
@@ -64,9 +64,9 @@ function start_vllm_service() {
}

function main() {
echo "==================== Build vllm docker image ===================="
build_vllm_docker_image
echo "==================== Build vllm docker image completed ===================="
# echo "==================== Build vllm docker image ===================="
# build_vllm_docker_image
# echo "==================== Build vllm docker image completed ===================="

echo "==================== Start vllm docker service ===================="
start_vllm_service
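Since the test now pulls the prebuilt vLLM CPU image instead of building it locally, a quick way to verify the swap (a sketch, not part of the script) is to pull the image and poll the OpenAI-compatible models endpoint on the same `${vllm_port}`:

```bash
# Sketch: pull the prebuilt image and wait for the OpenAI-compatible API to respond.
docker pull public.ecr.aws/q9t5s3a7/vllm-cpu-release-repo:v0.10.0
until curl -sf "http://localhost:${vllm_port}/v1/models" > /dev/null; do
  echo "Waiting for vLLM to become ready..."
  sleep 5
done
curl -s "http://localhost:${vllm_port}/v1/models"   # should list ${model}
```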
1 change: 1 addition & 0 deletions WorkflowExecAgent/tests/3_launch_and_validate_agent.sh
@@ -16,6 +16,7 @@ export HF_TOKEN=${HF_TOKEN}
export llm_engine=vllm
export ip_address=$(hostname -I | awk '{print $1}')
export llm_endpoint_url=http://${ip_address}:${vllm_port}
export api_key=""
export model=mistralai/Mistral-7B-Instruct-v0.3
export recursion_limit=25
export temperature=0
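The empty `api_key` reflects that the local vLLM service does not require authentication; a minimal request against it (illustrative, not part of the test) would simply omit the Authorization header used in the remote-endpoint case:

```bash
# Minimal sketch: the local vLLM endpoint accepts unauthenticated requests.
curl -s "${llm_endpoint_url}/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "'"${model}"'",
        "messages": [{"role": "user", "content": "ping"}],
        "max_tokens": 8
      }'
```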