Conversation

justinwlin (Contributor) commented:

What does this PR do?

Sorry to @mattf: I thought I could close the other PR and reopen it, but I no longer have the option to reopen it. I just didn't want it to keep notifying maintainers whenever I pushed test commits.

Continuation of: #3641

This PR fixes the RunPod adapter: #3517

What I fixed from before:

  1. Migrated the adapter fully to the OpenAI-compatible APIs.
  2. Fixed up the class, since OpenAIMixin had a couple of changes around the Pydantic base-model rework.
  3. Tested that models are discovered dynamically and that the resulting identifier works for requests:

```bash
curl -X GET \
  -H "Content-Type: application/json" \
  "http://localhost:8321/v1/models"
```

Test Plan

# RunPod Provider Quick Start

## Prerequisites
- Python 3.10+
- Git
- RunPod API token

## Setup for Development

```bash
# 1. Clone and enter the repository
git clone https://github.com/llamastack/llama-stack.git
cd llama-stack

# 2. Create and activate virtual environment
python3 -m venv .venv
source .venv/bin/activate

# 3. Remove any existing llama-stack installation
pip uninstall llama-stack llama-stack-client -y

# 4. Install llama-stack in development mode
pip install -e .

# 5. Build using local development code
# (found this through the Discord)
LLAMA_STACK_DIR=. llama stack build

# When prompted during build:
# - Name: runpod-dev
# - Image type: venv
# - Inference provider: remote::runpod
# - Safety provider: "llama-guard" 
# - Other providers: take the first (default) options
```

## Configure the Stack

The RunPod adapter automatically discovers models from your endpoint via the `/v1/models` API. No manual model configuration is required; just set your environment variables.
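For intuition, the discovery step amounts to an authenticated GET against the endpoint's OpenAI-style `/models` route. Here is a minimal illustrative sketch (not the adapter's actual code), assuming the `requests` package and the `RUNPOD_URL`/`RUNPOD_API_TOKEN` variables set in the next step:

```python
import os

import requests

# Illustrative only: what dynamic model discovery boils down to.
base_url = os.environ["RUNPOD_URL"]  # e.g. https://api.runpod.ai/v2/<endpoint>/openai/v1
token = os.environ["RUNPOD_API_TOKEN"]

resp = requests.get(
    f"{base_url}/models",
    headers={"Authorization": f"Bearer {token}"},
    timeout=30,
)
resp.raise_for_status()
for model in resp.json().get("data", []):
    print(model["id"])  # standard OpenAI models-list shape assumed
```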

## Run the Server

Important: use the build-created virtual environment, not the development `.venv`.

```bash
# Exit the development venv if you're in it
deactivate

# Activate the build-created venv (NOT .venv) from the llama-stack repo folder
cd llama-stack
source llamastack-runpod-dev/bin/activate
```

### For the Qwen3-32B-AWQ Public Endpoint (Recommended)

```bash
# Set environment variables
export RUNPOD_URL="https://api.runpod.ai/v2/qwen3-32b-awq/openai/v1"
export RUNPOD_API_TOKEN="your_runpod_api_key"

# Start the server
llama stack run ~/.llama/distributions/llamastack-runpod-dev/llamastack-runpod-dev-run.yaml
```
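The server can take a few seconds to come up. If you want to wait for readiness before testing, here is a small optional sketch (assumes the `requests` package and the default port 8321 used throughout this test plan):

```python
import time

import requests

# Poll the llama-stack server until /v1/models answers.
url = "http://localhost:8321/v1/models"
for _ in range(30):
    try:
        if requests.get(url, timeout=2).status_code == 200:
            print("server is up")
            break
    except requests.ConnectionError:
        pass
    time.sleep(1)
else:
    raise SystemExit("server did not come up in time")
```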

## Quick Test

### 1. List Available Models (Dynamic Discovery)

First, check which models are available on your RunPod endpoint:

```bash
curl -X GET \
  -H "Content-Type: application/json" \
  "http://localhost:8321/v1/models"
```

Example Response:

```json
{
  "data": [
    {
      "identifier": "qwen3-32b-awq",
      "provider_resource_id": "Qwen/Qwen3-32B-AWQ",
      "provider_id": "runpod",
      "type": "model",
      "metadata": {},
      "model_type": "llm"
    }
  ]
}
```

Note: Use the `identifier` value from the response above in your requests below.
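To grab the identifier programmatically instead, a small sketch against the response shape shown above (assumes the `requests` package):

```python
import requests

# Pull model identifiers out of the llama-stack /v1/models response.
resp = requests.get("http://localhost:8321/v1/models", timeout=30)
resp.raise_for_status()
identifiers = [m["identifier"] for m in resp.json()["data"]]
print(identifiers)  # e.g. ['qwen3-32b-awq']
```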

### 2. Chat Completion (Non-streaming)

Replace `qwen3-32b-awq` with your model identifier from step 1:

```bash
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-32b-awq",
    "messages": [{"role": "user", "content": "Hello, count to 3"}],
    "stream": false
  }'
```
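The same request from Python, assuming the `openai` client package is installed and that the route is OpenAI-compatible, as the curl above suggests; the `api_key` value is a placeholder (the curl above sends no auth header):

```python
from openai import OpenAI

# Point the standard OpenAI client at the local llama-stack server.
client = OpenAI(base_url="http://localhost:8321/v1", api_key="not-needed")

resp = client.chat.completions.create(
    model="qwen3-32b-awq",  # your identifier from step 1
    messages=[{"role": "user", "content": "Hello, count to 3"}],
    stream=False,
)
print(resp.choices[0].message.content)
```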

### 3. Chat Completion (Streaming)

```bash
curl -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3-32b-awq",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "stream": true
  }'
```

Clean streaming output:

```bash
curl -N -X POST http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwen3-32b-awq", "messages": [{"role": "user", "content": "Count to 5"}], "stream": true}' \
  2>/dev/null | while read -r line; do
    echo "$line" | grep "^data: " | sed 's/^data: //' | jq -r '.choices[0].delta.content // empty' 2>/dev/null
  done
```

Expected Output:

```
1
2
3
4
5
```
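The streaming variant in Python does the same job as the shell pipeline above, but the client handles the SSE `data:` frames for you (same assumptions as the non-streaming sketch):

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8321/v1", api_key="not-needed")

stream = client.chat.completions.create(
    model="qwen3-32b-awq",  # your identifier from step 1
    messages=[{"role": "user", "content": "Count to 5"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```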

We can just use the default; RunPod's embedding endpoint for vLLM is nothing special and simply passes through to vLLM.
mattf (Collaborator) left a comment:
looking good to me

mattf changed the title from "Runpod adapter fix" to "feat: enable Runpod inference adapter" on Oct 7, 2025
leseb merged commit 509ac4a into llamastack:main on Oct 7, 2025
21 of 22 checks passed