`WorkflowExecAgent/docker_compose/intel/cpu/xeon/README.md` (18 additions, 0 deletions)
```
export temperature=0
export max_new_tokens=1000
```

<details>
<summary> Using Remote LLM Endpoints </summary>

When models are deployed on a remote server, a base URL and an API key are required to access them. To set up a remote server and acquire the base URL and API key, refer to <a href="https://www.intel.com/content/www/us/en/developer/topic-technology/artificial-intelligence/enterprise-inference.html"> Intel® AI for Enterprise Inference </a> offerings.

Set the following environment variables.

- `llm_endpoint_url` is the HTTPS endpoint of the remote server hosting the model of choice (e.g. https://api.inference.denvrdata.com). **Note:** If not using LiteLLM, the model name from the model card must be appended to the URL, e.g. `/Llama-3.3-70B-Instruct` from `meta-llama/Llama-3.3-70B-Instruct`.
- `llm_endpoint_api_key` is the access token or key used to access the model(s) on the server.
- `LLM_MODEL_ID` is the model card, which may need to be overwritten depending on the value set in `set_env.sh`.
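
For illustration, here is a minimal sketch of how these variables could be exported; the API key and model card values below are placeholders modeled on the examples above and should be replaced with your own.

```bash
# Remote LLM endpoint settings (placeholder values — substitute your own).
# With LiteLLM, the base URL alone is typically sufficient:
export llm_endpoint_url="https://api.inference.denvrdata.com"
# Without LiteLLM, append the model name to the URL instead, e.g.:
# export llm_endpoint_url="https://api.inference.denvrdata.com/Llama-3.3-70B-Instruct"

# Access token or key for the remote server.
export llm_endpoint_api_key="<your-access-token>"

# Model card; override it if set_env.sh sets a different one.
export LLM_MODEL_ID="meta-llama/Llama-3.3-70B-Instruct"
```

Assuming the server exposes an OpenAI-compatible API, a quick sanity check is `curl -H "Authorization: Bearer $llm_endpoint_api_key" $llm_endpoint_url/v1/models`.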
For an out-of-the-box experience, this guide uses an example workflow serving API service. Three services are needed for the setup: the agent microservice, an LLM inference service, and the workflow serving API.
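
As a rough, hypothetical sketch of what bringing up and checking those three services might look like, the commands below assume a `compose.yaml` in this directory; the actual compose file name and service definitions used by this example may differ.

```bash
# Hypothetical example — verify the real compose file name before running.
cd WorkflowExecAgent/docker_compose/intel/cpu/xeon

# Start the agent microservice, the LLM inference service, and the example workflow serving API.
docker compose -f compose.yaml up -d

# Confirm all three containers are running before exercising the agent.
docker compose -f compose.yaml ps
```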