
Commit fdafbd6

chore: remove --env from llama stack run
# What does this PR do?

## Test Plan
1 parent a8da6ba commit fdafbd6

18 files changed (+104 additions, -166 deletions)

docs/docs/building_applications/tools.mdx

Lines changed: 3 additions & 6 deletions
@@ -219,13 +219,10 @@ group_tools = client.tools.list_tools(toolgroup_id="search_tools")
 <TabItem value="setup" label="Setup & Configuration">
 
 1. Start by registering a Tavily API key at [Tavily](https://tavily.com/).
-2. [Optional] Provide the API key directly to the Llama Stack server
+2. [Optional] Set the API key in your environment before starting the Llama Stack server
 ```bash
 export TAVILY_SEARCH_API_KEY="your key"
 ```
-```bash
---env TAVILY_SEARCH_API_KEY=${TAVILY_SEARCH_API_KEY}
-```
 
 </TabItem>
 <TabItem value="implementation" label="Implementation">
@@ -273,9 +270,9 @@ for log in EventLogger().log(response):
 <TabItem value="setup" label="Setup & Configuration">
 
 1. Start by registering for a WolframAlpha API key at [WolframAlpha Developer Portal](https://developer.wolframalpha.com/access).
-2. Provide the API key either when starting the Llama Stack server:
+2. Provide the API key either by setting it in your environment before starting the Llama Stack server:
 ```bash
---env WOLFRAM_ALPHA_API_KEY=${WOLFRAM_ALPHA_API_KEY}
+export WOLFRAM_ALPHA_API_KEY="your key"
 ```
 or from the client side:
 ```python
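In practice the updated setup boils down to exporting the key before launching the server. A minimal sketch, assuming the `starter` distribution (substitute your own distribution or run config):

```bash
# Export the key so the server reads it from the environment;
# `--env` is no longer accepted by `llama stack run`.
export TAVILY_SEARCH_API_KEY="your key"
llama stack run starter --port 8321
```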

docs/docs/contributing/new_api_provider.mdx

Lines changed: 1 addition & 1 deletion
@@ -76,7 +76,7 @@ Integration tests are located in [tests/integration](https://github.com/meta-lla
 Consult [tests/integration/README.md](https://github.com/meta-llama/llama-stack/blob/main/tests/integration/README.md) for more details on how to run the tests.
 
 Note that each provider's `sample_run_config()` method (in the configuration class for that provider)
-typically references some environment variables for specifying API keys and the like. You can set these in the environment or pass these via the `--env` flag to the test command.
+typically references some environment variables for specifying API keys and the like. You can set these in the environment before running the test command.
 
 
 ### 2. Unit Testing
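A hedged sketch of that workflow, assuming pytest as the test runner and an illustrative variable name; consult tests/integration/README.md for the exact invocation and any required options:

```bash
# Illustrative only: export whatever variables your provider's
# sample_run_config() references, then run the integration tests.
export FIREWORKS_API_KEY="your key"   # example variable name
pytest -sv tests/integration/inference
```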

docs/docs/distributions/building_distro.mdx

Lines changed: 7 additions & 8 deletions
@@ -289,10 +289,10 @@ After this step is successful, you should be able to find the built container im
 docker run -d \
 -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
 -v ~/.llama:/root/.llama \
+-e INFERENCE_MODEL=$INFERENCE_MODEL \
+-e OLLAMA_URL=http://host.docker.internal:11434 \
 localhost/distribution-ollama:dev \
---port $LLAMA_STACK_PORT \
---env INFERENCE_MODEL=$INFERENCE_MODEL \
---env OLLAMA_URL=http://host.docker.internal:11434
+--port $LLAMA_STACK_PORT
 ```
 
 Here are the docker flags and their uses:
@@ -305,11 +305,11 @@ Here are the docker flags and their uses:
 
 * `localhost/distribution-ollama:dev`: The name and tag of the container image to run
 
-* `--port $LLAMA_STACK_PORT`: Port number for the server to listen on
+* `-e INFERENCE_MODEL=$INFERENCE_MODEL`: Sets the INFERENCE_MODEL environment variable in the container
 
-* `--env INFERENCE_MODEL=$INFERENCE_MODEL`: Sets the model to use for inference
+* `-e OLLAMA_URL=http://host.docker.internal:11434`: Sets the OLLAMA_URL environment variable in the container
 
-* `--env OLLAMA_URL=http://host.docker.internal:11434`: Configures the URL for the Ollama service
+* `--port $LLAMA_STACK_PORT`: Port number for the server to listen on
 
 </TabItem>
 </Tabs>
@@ -320,7 +320,7 @@ Now, let's start the Llama Stack Distribution Server. You will need the YAML con
 
 ```
 llama stack run -h
-usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE]
+usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME]
 [--image-type {venv}] [--enable-ui]
 [config | template]
 
@@ -334,7 +334,6 @@ options:
 --port PORT Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. (default: 8321)
 --image-name IMAGE_NAME
 Name of the image to run. Defaults to the current environment (default: None)
---env KEY=VALUE Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: None)
 --image-type {venv}
 Image Type used during the build. This should be venv. (default: None)
 --enable-ui Start the UI server (default: False)
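With the flag removed from the usage text, variables reach the server only through the process environment. A minimal sketch of the new pattern (the `./run.yaml` path is a placeholder for your run config):

```bash
# Prefix the command with whatever variables your run config references;
# there is no --env flag to pass them anymore.
INFERENCE_MODEL=$INFERENCE_MODEL \
OLLAMA_URL=http://localhost:11434 \
llama stack run ./run.yaml --port $LLAMA_STACK_PORT
```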

docs/docs/distributions/configuration.mdx

Lines changed: 3 additions & 6 deletions
@@ -101,7 +101,7 @@ A few things to note:
 - The id is a string you can choose freely.
 - You can instantiate any number of provider instances of the same type.
 - The configuration dictionary is provider-specific.
-- Notice that configuration can reference environment variables (with default values), which are expanded at runtime. When you run a stack server (via docker or via `llama stack run`), you can specify `--env OLLAMA_URL=http://my-server:11434` to override the default value.
+- Notice that configuration can reference environment variables (with default values), which are expanded at runtime. When you run a stack server, you can set environment variables in your shell before running `llama stack run` to override the default values.
 
 ### Environment Variable Substitution
 
@@ -173,13 +173,10 @@ optional_token: ${env.OPTIONAL_TOKEN:+}
 
 #### Runtime Override
 
-You can override environment variables at runtime when starting the server:
+You can override environment variables at runtime by setting them in your shell before starting the server:
 
 ```bash
-# Override specific environment variables
-llama stack run --config run.yaml --env API_KEY=sk-123 --env BASE_URL=https://custom-api.com
-
-# Or set them in your shell
+# Set environment variables in your shell
 export API_KEY=sk-123
 export BASE_URL=https://custom-api.com
 llama stack run --config run.yaml
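Because this is plain shell behavior, the override can also be scoped to a single invocation by prefixing the command, using the same placeholder values as the doc above:

```bash
# One-shot form: the variables apply only to this command.
API_KEY=sk-123 BASE_URL=https://custom-api.com llama stack run --config run.yaml
```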

docs/docs/distributions/remote_hosted_distro/watsonx.md

Lines changed: 4 additions & 4 deletions
@@ -69,10 +69,10 @@ docker run \
 -it \
 -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
 -v ./run.yaml:/root/my-run.yaml \
+-e WATSONX_API_KEY=$WATSONX_API_KEY \
+-e WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID \
+-e WATSONX_BASE_URL=$WATSONX_BASE_URL \
 llamastack/distribution-watsonx \
 --config /root/my-run.yaml \
---port $LLAMA_STACK_PORT \
---env WATSONX_API_KEY=$WATSONX_API_KEY \
---env WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID \
---env WATSONX_BASE_URL=$WATSONX_BASE_URL
+--port $LLAMA_STACK_PORT
 ```
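Note that the `$WATSONX_*` references above are expanded by the host shell, so the variables must already be set there before `docker run` is invoked; the `-e` flags then copy them into the container. A sketch with placeholder values:

```bash
# Set these in the host shell first; all values below are placeholders.
export WATSONX_API_KEY="your key"
export WATSONX_PROJECT_ID="your project id"
export WATSONX_BASE_URL="https://your-watsonx-endpoint"
```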

docs/docs/distributions/self_hosted_distro/dell.md

Lines changed: 22 additions & 22 deletions
@@ -129,11 +129,11 @@ docker run -it \
 # NOTE: mount the llama-stack / llama-model directories if testing local changes else not needed
 -v $HOME/git/llama-stack:/app/llama-stack-source -v $HOME/git/llama-models:/app/llama-models-source \
 # localhost/distribution-dell:dev if building / testing locally
-llamastack/distribution-dell\
---port $LLAMA_STACK_PORT \
---env INFERENCE_MODEL=$INFERENCE_MODEL \
---env DEH_URL=$DEH_URL \
---env CHROMA_URL=$CHROMA_URL
+-e INFERENCE_MODEL=$INFERENCE_MODEL \
+-e DEH_URL=$DEH_URL \
+-e CHROMA_URL=$CHROMA_URL \
+llamastack/distribution-dell \
+--port $LLAMA_STACK_PORT
 
 ```
 
@@ -154,14 +154,14 @@ docker run \
 -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
 -v $HOME/.llama:/root/.llama \
 -v ./llama_stack/distributions/tgi/run-with-safety.yaml:/root/my-run.yaml \
+-e INFERENCE_MODEL=$INFERENCE_MODEL \
+-e DEH_URL=$DEH_URL \
+-e SAFETY_MODEL=$SAFETY_MODEL \
+-e DEH_SAFETY_URL=$DEH_SAFETY_URL \
+-e CHROMA_URL=$CHROMA_URL \
 llamastack/distribution-dell \
 --config /root/my-run.yaml \
---port $LLAMA_STACK_PORT \
---env INFERENCE_MODEL=$INFERENCE_MODEL \
---env DEH_URL=$DEH_URL \
---env SAFETY_MODEL=$SAFETY_MODEL \
---env DEH_SAFETY_URL=$DEH_SAFETY_URL \
---env CHROMA_URL=$CHROMA_URL
+--port $LLAMA_STACK_PORT
 ```
 
 ### Via venv
@@ -170,21 +170,21 @@ Make sure you have done `pip install llama-stack` and have the Llama Stack CLI a
 
 ```bash
 llama stack build --distro dell --image-type venv
-llama stack run dell
---port $LLAMA_STACK_PORT \
---env INFERENCE_MODEL=$INFERENCE_MODEL \
---env DEH_URL=$DEH_URL \
---env CHROMA_URL=$CHROMA_URL
+INFERENCE_MODEL=$INFERENCE_MODEL \
+DEH_URL=$DEH_URL \
+CHROMA_URL=$CHROMA_URL \
+llama stack run dell \
+--port $LLAMA_STACK_PORT
 ```
 
 If you are using Llama Stack Safety / Shield APIs, use:
 
 ```bash
+INFERENCE_MODEL=$INFERENCE_MODEL \
+DEH_URL=$DEH_URL \
+SAFETY_MODEL=$SAFETY_MODEL \
+DEH_SAFETY_URL=$DEH_SAFETY_URL \
+CHROMA_URL=$CHROMA_URL \
 llama stack run ./run-with-safety.yaml \
---port $LLAMA_STACK_PORT \
---env INFERENCE_MODEL=$INFERENCE_MODEL \
---env DEH_URL=$DEH_URL \
---env SAFETY_MODEL=$SAFETY_MODEL \
---env DEH_SAFETY_URL=$DEH_SAFETY_URL \
---env CHROMA_URL=$CHROMA_URL
+--port $LLAMA_STACK_PORT
 ```
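For reference, the prefixed assignments in the venv commands above apply only to that single invocation; exporting them keeps them for the whole shell session. A sketch with placeholder values:

```bash
# Alternative to the one-shot prefix: export once, then run repeatedly.
# Model name and URLs below are placeholders.
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
export DEH_URL=http://localhost:8181
export CHROMA_URL=http://localhost:8000
llama stack run dell --port $LLAMA_STACK_PORT
```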

docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md

Lines changed: 10 additions & 10 deletions
@@ -84,9 +84,9 @@ docker run \
 --gpu all \
 -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
 -v ~/.llama:/root/.llama \
+-e INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
 llamastack/distribution-meta-reference-gpu \
---port $LLAMA_STACK_PORT \
---env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
+--port $LLAMA_STACK_PORT
 ```
 
 If you are using Llama Stack Safety / Shield APIs, use:
@@ -98,10 +98,10 @@ docker run \
 --gpu all \
 -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
 -v ~/.llama:/root/.llama \
+-e INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
+-e SAFETY_MODEL=meta-llama/Llama-Guard-3-1B \
 llamastack/distribution-meta-reference-gpu \
---port $LLAMA_STACK_PORT \
---env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
---env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
+--port $LLAMA_STACK_PORT
 ```
 
 ### Via venv
@@ -110,16 +110,16 @@ Make sure you have done `uv pip install llama-stack` and have the Llama Stack CL
 
 ```bash
 llama stack build --distro meta-reference-gpu --image-type venv
+INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
 llama stack run distributions/meta-reference-gpu/run.yaml \
---port 8321 \
---env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
+--port 8321
 ```
 
 If you are using Llama Stack Safety / Shield APIs, use:
 
 ```bash
+INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
+SAFETY_MODEL=meta-llama/Llama-Guard-3-1B \
 llama stack run distributions/meta-reference-gpu/run-with-safety.yaml \
---port 8321 \
---env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
---env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
+--port 8321
 ```

docs/docs/distributions/self_hosted_distro/nvidia.md

Lines changed: 5 additions & 5 deletions
@@ -129,10 +129,10 @@ docker run \
 --pull always \
 -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
 -v ./run.yaml:/root/my-run.yaml \
+-e NVIDIA_API_KEY=$NVIDIA_API_KEY \
 llamastack/distribution-nvidia \
 --config /root/my-run.yaml \
---port $LLAMA_STACK_PORT \
---env NVIDIA_API_KEY=$NVIDIA_API_KEY
+--port $LLAMA_STACK_PORT
 ```
 
 ### Via venv
@@ -142,10 +142,10 @@ If you've set up your local development environment, you can also build the imag
 ```bash
 INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
 llama stack build --distro nvidia --image-type venv
+NVIDIA_API_KEY=$NVIDIA_API_KEY \
+INFERENCE_MODEL=$INFERENCE_MODEL \
 llama stack run ./run.yaml \
---port 8321 \
---env NVIDIA_API_KEY=$NVIDIA_API_KEY \
---env INFERENCE_MODEL=$INFERENCE_MODEL
+--port 8321
 ```
 
 ## Example Notebooks

docs/docs/getting_started/detailed_tutorial.mdx

Lines changed: 4 additions & 4 deletions
@@ -86,9 +86,9 @@ docker run -it \
 --pull always \
 -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
 -v ~/.llama:/root/.llama \
+-e OLLAMA_URL=http://host.docker.internal:11434 \
 llamastack/distribution-starter \
---port $LLAMA_STACK_PORT \
---env OLLAMA_URL=http://host.docker.internal:11434
+--port $LLAMA_STACK_PORT
 ```
 Note to start the container with Podman, you can do the same but replace `docker` at the start of the command with
 `podman`. If you are using `podman` older than `4.7.0`, please also replace `host.docker.internal` in the `OLLAMA_URL`
@@ -106,9 +106,9 @@ docker run -it \
 -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
 -v ~/.llama:/root/.llama \
 --network=host \
+-e OLLAMA_URL=http://localhost:11434 \
 llamastack/distribution-starter \
---port $LLAMA_STACK_PORT \
---env OLLAMA_URL=http://localhost:11434
+--port $LLAMA_STACK_PORT
 ```
 :::
 You will see output like below:

docs/getting_started_llama4.ipynb

Lines changed: 1 addition & 1 deletion
@@ -238,7 +238,7 @@
 "def run_llama_stack_server_background():\n",
 " log_file = open(\"llama_stack_server.log\", \"w\")\n",
 " process = subprocess.Popen(\n",
-" f\"uv run --with llama-stack llama stack run meta-reference-gpu --image-type venv --env INFERENCE_MODEL={model_id}\",\n",
+" f\"INFERENCE_MODEL={model_id} uv run --with llama-stack llama stack run meta-reference-gpu --image-type venv\",\n",
 " shell=True,\n",
 " stdout=log_file,\n",
 " stderr=log_file,\n",
