2 changes: 1 addition & 1 deletion .github/workflows/integration-auth-tests.yml
@@ -86,7 +86,7 @@ jobs:

# avoid line breaks in the server log, especially because we grep it below.
export COLUMNS=1984
nohup uv run llama stack run $run_dir/run.yaml --image-type venv > server.log 2>&1 &
nohup uv run llama stack run $run_dir/run.yaml > server.log 2>&1 &

- name: Wait for Llama Stack server to be ready
run: |
2 changes: 1 addition & 1 deletion .github/workflows/test-external-provider-module.yml
@@ -59,7 +59,7 @@ jobs:
# Use the virtual environment created by the build step (name comes from build config)
source ramalama-stack-test/bin/activate
uv pip list
nohup llama stack run tests/external/ramalama-stack/run.yaml --image-type ${{ matrix.image-type }} > server.log 2>&1 &
nohup llama stack run tests/external/ramalama-stack/run.yaml > server.log 2>&1 &

- name: Wait for Llama Stack server to be ready
run: |
2 changes: 1 addition & 1 deletion .github/workflows/test-external.yml
@@ -59,7 +59,7 @@ jobs:
# Use the virtual environment created by the build step (name comes from build config)
source ci-test/bin/activate
uv pip list
nohup llama stack run tests/external/run-byoa.yaml --image-type ${{ matrix.image-type }} > server.log 2>&1 &
nohup llama stack run tests/external/run-byoa.yaml > server.log 2>&1 &

- name: Wait for Llama Stack server to be ready
run: |
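Across all three workflows the launch step is the same: `--image-type` is dropped and the server is started from the already-activated virtual environment. A standalone sketch of the pattern follows; the readiness loop is an assumption modeled on the "Wait for Llama Stack server to be ready" steps, and the port and `/v1/health` route are defaults rather than values shown in this diff.

```bash
# start the server in the background from the active venv; keep COLUMNS wide so grep works on server.log
export COLUMNS=1984
nohup llama stack run run.yaml > server.log 2>&1 &

# wait until the server answers (assumption: default port 8321 and a /v1/health route)
for i in $(seq 1 30); do
  if curl -sSf http://localhost:8321/v1/health > /dev/null; then
    echo "server is ready"
    break
  fi
  sleep 2
done
```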
2 changes: 1 addition & 1 deletion docs/docs/advanced_apis/post_training.mdx
@@ -52,7 +52,7 @@ You can access the HuggingFace trainer via the `starter` distribution:

```bash
llama stack build --distro starter --image-type venv
llama stack run --image-type venv ~/.llama/distributions/starter/starter-run.yaml
llama stack run ~/.llama/distributions/starter/starter-run.yaml
```

### Usage Example
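One way to run these two commands without managing a separate environment is via `uv`, mirroring the notebook cells changed later in this PR. This is a sketch for convenience, not part of the changed file.

```bash
# build the starter distribution, then run it from the same uv-managed environment
uv run --with llama-stack llama stack build --distro starter --image-type venv
uv run --with llama-stack llama stack run ~/.llama/distributions/starter/starter-run.yaml
```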
9 changes: 3 additions & 6 deletions docs/docs/building_applications/tools.mdx
@@ -219,13 +219,10 @@ group_tools = client.tools.list_tools(toolgroup_id="search_tools")
<TabItem value="setup" label="Setup & Configuration">

1. Start by registering a Tavily API key at [Tavily](https://tavily.com/).
2. [Optional] Provide the API key directly to the Llama Stack server
2. [Optional] Set the API key in your environment before starting the Llama Stack server
```bash
export TAVILY_SEARCH_API_KEY="your key"
```
```bash
--env TAVILY_SEARCH_API_KEY=${TAVILY_SEARCH_API_KEY}
```

</TabItem>
<TabItem value="implementation" label="Implementation">
@@ -273,9 +270,9 @@ for log in EventLogger().log(response):
<TabItem value="setup" label="Setup & Configuration">

1. Start by registering for a WolframAlpha API key at [WolframAlpha Developer Portal](https://developer.wolframalpha.com/access).
2. Provide the API key either when starting the Llama Stack server:
2. Provide the API key either by setting it in your environment before starting the Llama Stack server:
```bash
--env WOLFRAM_ALPHA_API_KEY=${WOLFRAM_ALPHA_API_KEY}
export WOLFRAM_ALPHA_API_KEY="your key"
```
or from the client side:
```python
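Both tool groups now follow the same convention: the key is read from the server's environment rather than passed with `--env`. A combined sketch is below; the `starter` run config path is an illustrative assumption, so substitute whichever config you normally start.

```bash
# export the keys for the tool providers you plan to use (placeholder values)
export TAVILY_SEARCH_API_KEY="your key"
export WOLFRAM_ALPHA_API_KEY="your key"

# then start the server as usual; it picks the keys up from the environment
llama stack run ~/.llama/distributions/starter/starter-run.yaml
```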
2 changes: 1 addition & 1 deletion docs/docs/contributing/new_api_provider.mdx
@@ -76,7 +76,7 @@ Integration tests are located in [tests/integration](https://github.com/meta-lla
Consult [tests/integration/README.md](https://github.com/meta-llama/llama-stack/blob/main/tests/integration/README.md) for more details on how to run the tests.

Note that each provider's `sample_run_config()` method (in the configuration class for that provider)
typically references some environment variables for specifying API keys and the like. You can set these in the environment or pass these via the `--env` flag to the test command.
typically references some environment variables for specifying API keys and the like. You can set these in the environment before running the test command.


### 2. Unit Testing
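In practice this means exporting whatever variables the provider's `sample_run_config()` references before invoking pytest. A sketch, where the variable name and test selection are placeholders rather than values from this diff:

```bash
# hypothetical provider key referenced by its sample_run_config()
export TOGETHER_API_KEY="your key"

# run the integration tests as described in tests/integration/README.md
uv run pytest -sv tests/integration/inference
```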
26 changes: 11 additions & 15 deletions docs/docs/distributions/building_distro.mdx
@@ -289,10 +289,10 @@ After this step is successful, you should be able to find the built container im
docker run -d \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
-e INFERENCE_MODEL=$INFERENCE_MODEL \
-e OLLAMA_URL=http://host.docker.internal:11434 \
localhost/distribution-ollama:dev \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env OLLAMA_URL=http://host.docker.internal:11434
--port $LLAMA_STACK_PORT
```

Here are the docker flags and their uses:
@@ -305,11 +305,11 @@ Here are the docker flags and their uses:

* `localhost/distribution-ollama:dev`: The name and tag of the container image to run

* `--port $LLAMA_STACK_PORT`: Port number for the server to listen on
* `-e INFERENCE_MODEL=$INFERENCE_MODEL`: Sets the INFERENCE_MODEL environment variable in the container

* `--env INFERENCE_MODEL=$INFERENCE_MODEL`: Sets the model to use for inference
* `-e OLLAMA_URL=http://host.docker.internal:11434`: Sets the OLLAMA_URL environment variable in the container

* `--env OLLAMA_URL=http://host.docker.internal:11434`: Configures the URL for the Ollama service
* `--port $LLAMA_STACK_PORT`: Port number for the server to listen on

</TabItem>
</Tabs>
@@ -320,23 +320,22 @@ Now, let's start the Llama Stack Distribution Server. You will need the YAML con

```
llama stack run -h
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME] [--env KEY=VALUE]
usage: llama stack run [-h] [--port PORT] [--image-name IMAGE_NAME]
[--image-type {venv}] [--enable-ui]
[config | template]
[config | distro]

Start the server for a Llama Stack Distribution. You should have already built (or downloaded) and configured the distribution.

positional arguments:
config | template Path to config file to use for the run or name of known template (`llama stack list` for a list). (default: None)
config | distro Path to config file to use for the run or name of known distro (`llama stack list` for a list). (default: None)

options:
-h, --help show this help message and exit
--port PORT Port to run the server on. It can also be passed via the env var LLAMA_STACK_PORT. (default: 8321)
--image-name IMAGE_NAME
Name of the image to run. Defaults to the current environment (default: None)
--env KEY=VALUE Environment variables to pass to the server in KEY=VALUE format. Can be specified multiple times. (default: None)
[DEPRECATED] This flag is no longer supported. Please activate your virtual environment before running. (default: None)
--image-type {venv}
Image Type used during the build. This should be venv. (default: None)
[DEPRECATED] This flag is no longer supported. Please activate your virtual environment before running. (default: None)
--enable-ui Start the UI server (default: False)
```

@@ -348,9 +347,6 @@ llama stack run tgi

# Start using config file
llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml

# Start using a venv
llama stack run --image-type venv ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
```

```
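Taken together, the help output above means both environment selection and variable passing move out of the CLI: enter the environment that has `llama-stack` installed, export any variables the run config expects, and then run. A sketch, where the venv path and `OLLAMA_URL` value are illustrative assumptions rather than values from the changed file:

```bash
# enter the environment where llama-stack is installed (path is an example)
source .venv/bin/activate

# variables referenced by the run config are now plain environment variables
export OLLAMA_URL=http://localhost:11434

llama stack run ~/.llama/distributions/llamastack-my-local-stack/my-local-stack-run.yaml
```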
9 changes: 3 additions & 6 deletions docs/docs/distributions/configuration.mdx
@@ -101,7 +101,7 @@ A few things to note:
- The id is a string you can choose freely.
- You can instantiate any number of provider instances of the same type.
- The configuration dictionary is provider-specific.
- Notice that configuration can reference environment variables (with default values), which are expanded at runtime. When you run a stack server (via docker or via `llama stack run`), you can specify `--env OLLAMA_URL=http://my-server:11434` to override the default value.
- Notice that configuration can reference environment variables (with default values), which are expanded at runtime. When you run a stack server, you can set environment variables in your shell before running `llama stack run` to override the default values.

### Environment Variable Substitution

@@ -173,13 +173,10 @@ optional_token: ${env.OPTIONAL_TOKEN:+}

#### Runtime Override

You can override environment variables at runtime when starting the server:
You can override environment variables at runtime by setting them in your shell before starting the server:

```bash
# Override specific environment variables
llama stack run --config run.yaml --env API_KEY=sk-123 --env BASE_URL=https://custom-api.com

# Or set them in your shell
# Set environment variables in your shell
export API_KEY=sk-123
export BASE_URL=https://custom-api.com
llama stack run --config run.yaml
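A worked sketch of the override behavior, assuming the run config uses the `${env.VAR:=default}` form described earlier in this file; the URL values are illustrative only.

```bash
# assuming run.yaml contains something like:  url: ${env.OLLAMA_URL:=http://localhost:11434}
export OLLAMA_URL=http://my-server:11434   # overrides the default before expansion
llama stack run --config run.yaml

# with the variable unset, the default written in run.yaml is used instead
unset OLLAMA_URL
llama stack run --config run.yaml
```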
8 changes: 4 additions & 4 deletions docs/docs/distributions/remote_hosted_distro/watsonx.md
@@ -69,10 +69,10 @@ docker run \
-it \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ./run.yaml:/root/my-run.yaml \
-e WATSONX_API_KEY=$WATSONX_API_KEY \
-e WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID \
-e WATSONX_BASE_URL=$WATSONX_BASE_URL \
llamastack/distribution-watsonx \
--config /root/my-run.yaml \
--port $LLAMA_STACK_PORT \
--env WATSONX_API_KEY=$WATSONX_API_KEY \
--env WATSONX_PROJECT_ID=$WATSONX_PROJECT_ID \
--env WATSONX_BASE_URL=$WATSONX_BASE_URL
--port $LLAMA_STACK_PORT
```
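The `-e` flags now forward shell variables into the container, so they must be set in the invoking shell first. A sketch with placeholder values; the base URL is an assumption about a typical watsonx endpoint, not something stated in this diff.

```bash
# set these before running the docker command above (placeholder values)
export LLAMA_STACK_PORT=8321
export WATSONX_API_KEY="your-api-key"
export WATSONX_PROJECT_ID="your-project-id"
export WATSONX_BASE_URL="https://us-south.ml.cloud.ibm.com"
```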
44 changes: 22 additions & 22 deletions docs/docs/distributions/self_hosted_distro/dell.md
@@ -129,11 +129,11 @@ docker run -it \
# NOTE: mount the llama-stack / llama-model directories if testing local changes else not needed
-v $HOME/git/llama-stack:/app/llama-stack-source -v $HOME/git/llama-models:/app/llama-models-source \
# localhost/distribution-dell:dev if building / testing locally
llamastack/distribution-dell\
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env DEH_URL=$DEH_URL \
--env CHROMA_URL=$CHROMA_URL
-e INFERENCE_MODEL=$INFERENCE_MODEL \
-e DEH_URL=$DEH_URL \
-e CHROMA_URL=$CHROMA_URL \
llamastack/distribution-dell \
--port $LLAMA_STACK_PORT

```

@@ -154,14 +154,14 @@ docker run \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v $HOME/.llama:/root/.llama \
-v ./llama_stack/distributions/tgi/run-with-safety.yaml:/root/my-run.yaml \
-e INFERENCE_MODEL=$INFERENCE_MODEL \
-e DEH_URL=$DEH_URL \
-e SAFETY_MODEL=$SAFETY_MODEL \
-e DEH_SAFETY_URL=$DEH_SAFETY_URL \
-e CHROMA_URL=$CHROMA_URL \
llamastack/distribution-dell \
--config /root/my-run.yaml \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env DEH_URL=$DEH_URL \
--env SAFETY_MODEL=$SAFETY_MODEL \
--env DEH_SAFETY_URL=$DEH_SAFETY_URL \
--env CHROMA_URL=$CHROMA_URL
--port $LLAMA_STACK_PORT
```

### Via venv
@@ -170,21 +170,21 @@ Make sure you have done `pip install llama-stack` and have the Llama Stack CLI a

```bash
llama stack build --distro dell --image-type venv
llama stack run dell
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env DEH_URL=$DEH_URL \
--env CHROMA_URL=$CHROMA_URL
INFERENCE_MODEL=$INFERENCE_MODEL \
DEH_URL=$DEH_URL \
CHROMA_URL=$CHROMA_URL \
llama stack run dell \
--port $LLAMA_STACK_PORT
```

If you are using Llama Stack Safety / Shield APIs, use:

```bash
INFERENCE_MODEL=$INFERENCE_MODEL \
DEH_URL=$DEH_URL \
SAFETY_MODEL=$SAFETY_MODEL \
DEH_SAFETY_URL=$DEH_SAFETY_URL \
CHROMA_URL=$CHROMA_URL \
llama stack run ./run-with-safety.yaml \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=$INFERENCE_MODEL \
--env DEH_URL=$DEH_URL \
--env SAFETY_MODEL=$SAFETY_MODEL \
--env DEH_SAFETY_URL=$DEH_SAFETY_URL \
--env CHROMA_URL=$CHROMA_URL
--port $LLAMA_STACK_PORT
```
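The venv examples above use the `VAR=value command` prefix form, which scopes the variables to that single invocation; exporting them instead keeps them for the whole shell session, which is convenient when restarting the server repeatedly. A sketch of the export variant, with placeholder endpoint values:

```bash
# session-wide alternative to the one-shot VAR=value prefix used above
export INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct   # placeholder model id
export DEH_URL=http://localhost:8181                      # placeholder Dell Enterprise Hub endpoint
export CHROMA_URL=http://localhost:8000                   # placeholder Chroma endpoint
llama stack run dell --port "$LLAMA_STACK_PORT"
```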
20 changes: 10 additions & 10 deletions docs/docs/distributions/self_hosted_distro/meta-reference-gpu.md
@@ -84,9 +84,9 @@ docker run \
--gpu all \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
-e INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
llamastack/distribution-meta-reference-gpu \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
--port $LLAMA_STACK_PORT
```

If you are using Llama Stack Safety / Shield APIs, use:
@@ -98,10 +98,10 @@ docker run \
--gpu all \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
-e INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
-e SAFETY_MODEL=meta-llama/Llama-Guard-3-1B \
llamastack/distribution-meta-reference-gpu \
--port $LLAMA_STACK_PORT \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
--env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
--port $LLAMA_STACK_PORT
```

### Via venv
@@ -110,16 +110,16 @@ Make sure you have done `uv pip install llama-stack` and have the Llama Stack CL

```bash
llama stack build --distro meta-reference-gpu --image-type venv
INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
llama stack run distributions/meta-reference-gpu/run.yaml \
--port 8321 \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct
--port 8321
```

If you are using Llama Stack Safety / Shield APIs, use:

```bash
INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
SAFETY_MODEL=meta-llama/Llama-Guard-3-1B \
llama stack run distributions/meta-reference-gpu/run-with-safety.yaml \
--port 8321 \
--env INFERENCE_MODEL=meta-llama/Llama-3.2-3B-Instruct \
--env SAFETY_MODEL=meta-llama/Llama-Guard-3-1B
--port 8321
```
10 changes: 5 additions & 5 deletions docs/docs/distributions/self_hosted_distro/nvidia.md
@@ -129,10 +129,10 @@ docker run \
--pull always \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ./run.yaml:/root/my-run.yaml \
-e NVIDIA_API_KEY=$NVIDIA_API_KEY \
llamastack/distribution-nvidia \
--config /root/my-run.yaml \
--port $LLAMA_STACK_PORT \
--env NVIDIA_API_KEY=$NVIDIA_API_KEY
--port $LLAMA_STACK_PORT
```

### Via venv
@@ -142,10 +142,10 @@ If you've set up your local development environment, you can also build the imag
```bash
INFERENCE_MODEL=meta-llama/Llama-3.1-8B-Instruct
llama stack build --distro nvidia --image-type venv
NVIDIA_API_KEY=$NVIDIA_API_KEY \
INFERENCE_MODEL=$INFERENCE_MODEL \
llama stack run ./run.yaml \
--port 8321 \
--env NVIDIA_API_KEY=$NVIDIA_API_KEY \
--env INFERENCE_MODEL=$INFERENCE_MODEL
--port 8321
```

## Example Notebooks
8 changes: 4 additions & 4 deletions docs/docs/getting_started/detailed_tutorial.mdx
Original file line number Diff line number Diff line change
Expand Up @@ -86,9 +86,9 @@ docker run -it \
--pull always \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
-e OLLAMA_URL=http://host.docker.internal:11434 \
llamastack/distribution-starter \
--port $LLAMA_STACK_PORT \
--env OLLAMA_URL=http://host.docker.internal:11434
--port $LLAMA_STACK_PORT
```
Note to start the container with Podman, you can do the same but replace `docker` at the start of the command with
`podman`. If you are using `podman` older than `4.7.0`, please also replace `host.docker.internal` in the `OLLAMA_URL`
@@ -106,9 +106,9 @@ docker run -it \
-p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
-v ~/.llama:/root/.llama \
--network=host \
-e OLLAMA_URL=http://localhost:11434 \
llamastack/distribution-starter \
--port $LLAMA_STACK_PORT \
--env OLLAMA_URL=http://localhost:11434
--port $LLAMA_STACK_PORT
```
:::
You will see output like below:
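For the Podman note above, the full command looks essentially the same as the docker variant; a sketch, where the `host.containers.internal` hostname for older Podman versions is an assumption not spelled out in the visible part of this diff:

```bash
podman run -it \
  --pull always \
  -p $LLAMA_STACK_PORT:$LLAMA_STACK_PORT \
  -v ~/.llama:/root/.llama \
  -e OLLAMA_URL=http://host.containers.internal:11434 \
  llamastack/distribution-starter \
  --port $LLAMA_STACK_PORT
```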
4 changes: 2 additions & 2 deletions docs/getting_started.ipynb
@@ -123,12 +123,12 @@
" del os.environ[\"UV_SYSTEM_PYTHON\"]\n",
"\n",
"# this command installs all the dependencies needed for the llama stack server with the together inference provider\n",
"!uv run --with llama-stack llama stack build --distro together --image-type venv\n",
"!uv run --with llama-stack llama stack build --distro together\n",
"\n",
"def run_llama_stack_server_background():\n",
" log_file = open(\"llama_stack_server.log\", \"w\")\n",
" process = subprocess.Popen(\n",
" \"uv run --with llama-stack llama stack run together --image-type venv\",\n",
" \"uv run --with llama-stack llama stack run together\",\n",
" shell=True,\n",
" stdout=log_file,\n",
" stderr=log_file,\n",
4 changes: 2 additions & 2 deletions docs/getting_started_llama4.ipynb
@@ -233,12 +233,12 @@
" del os.environ[\"UV_SYSTEM_PYTHON\"]\n",
"\n",
"# this command installs all the dependencies needed for the llama stack server\n",
"!uv run --with llama-stack llama stack build --distro meta-reference-gpu --image-type venv\n",
"!uv run --with llama-stack llama stack build --distro meta-reference-gpu\n",
"\n",
"def run_llama_stack_server_background():\n",
" log_file = open(\"llama_stack_server.log\", \"w\")\n",
" process = subprocess.Popen(\n",
" f\"uv run --with llama-stack llama stack run meta-reference-gpu --image-type venv --env INFERENCE_MODEL={model_id}\",\n",
" f\"INFERENCE_MODEL={model_id} uv run --with llama-stack llama stack run meta-reference-gpu\",\n",
" shell=True,\n",
" stdout=log_file,\n",
" stderr=log_file,\n",
4 changes: 2 additions & 2 deletions docs/getting_started_llama_api.ipynb
@@ -223,12 +223,12 @@
" del os.environ[\"UV_SYSTEM_PYTHON\"]\n",
"\n",
"# this command installs all the dependencies needed for the llama stack server\n",
"!uv run --with llama-stack llama stack build --distro llama_api --image-type venv\n",
"!uv run --with llama-stack llama stack build --distro llama_api\n",
"\n",
"def run_llama_stack_server_background():\n",
" log_file = open(\"llama_stack_server.log\", \"w\")\n",
" process = subprocess.Popen(\n",
" \"uv run --with llama-stack llama stack run llama_api --image-type venv\",\n",
" \"uv run --with llama-stack llama stack run llama_api\",\n",
" shell=True,\n",
" stdout=log_file,\n",
" stderr=log_file,\n",
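All three notebooks now launch the server with the environment variable set as a prefix on the shell command rather than via `--env`. Outside a notebook, the equivalent shell invocation is roughly the following sketch; the model id is a placeholder assumption, so substitute whatever the notebook's `model_id` resolves to.

```bash
# mirrors run_llama_stack_server_background(): background the server and capture its output
INFERENCE_MODEL="meta-llama/Llama-4-Scout-17B-16E-Instruct" \
  uv run --with llama-stack llama stack run meta-reference-gpu \
  > llama_stack_server.log 2>&1 &
```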