feat: enable Runpod inference adapter #3707
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
What does this PR do?
Sorry to @mattf I thought I could close the other PR and reopen it.. But I didn't have the option to reopen it now. I just didn't want it to keep notifying maintainers if I would make other commits for testing.
Continuation of: #3641
PR fixes Runpod Adapter
#3517
What I fixed from before:
Continuation of: #3641
Test Plan
Configure the Stack
The RunPod adapter automatically discovers models from your endpoint via the
/v1/models
API.No manual model configuration is required - just set your environment variables.
Run the Server
Important: Use the Build-Created Virtual Environment
For Qwen3-32B-AWQ Public Endpoint (Recommended)
Quick Test
1. List Available Models (Dynamic Discovery)
First, check which models are available on your RunPod endpoint:
Example Response:
Note: Use the
identifier
value from the response above in your requests below.2. Chat Completion (Non-streaming)
Replace
qwen3-32b-awq
with your model identifier from step 1:3. Chat Completion (Streaming)
Clean streaming output:
Expected Output: