
Support Denvr endpoints with LiteLLM #2085


Open

srinarayan-srikanthan wants to merge 17 commits into main

Conversation

srinarayan-srikanthan (Collaborator)

Description

Support remote inference with a Denvr endpoint for ChatQnA, with README updates.

Issues

#2084

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

N/A

Tests

Describe the tests that you ran to verify your changes.

Ubuntu added 3 commits June 20, 2025 02:52
Copilot AI review requested due to automatic review settings June 20, 2025 03:43

github-actions bot commented Jun 20, 2025

Dependency Review

✅ No vulnerabilities or license issues found.

Scanned Files

None


pre-commit-ci bot and others added 3 commits June 20, 2025 03:44
Copilot AI (Contributor) left a comment

Pull Request Overview

Adds support for deploying ChatQnA with remote Denvr inference endpoints and updates the streaming response parser for multi-chunk JSON outputs.

  • Introduce a new compose_remote.yaml workflow and environment variable instructions in the Xeon CPU Docker README.
  • Update align_generator in chatqna.py to split and process multiple JSON chunks per line.
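
For illustration, here is a minimal sketch of that multi-chunk handling, based on the snippets quoted in the review comments below; the surrounding structure of align_generator in chatqna.py may differ, and the "data:" prefix and "[DONE]" sentinel handling are assumptions about the OpenAI-style stream rather than code from this PR:

```python
import json


def align_generator(responses):
    """Yield text content from a stream whose lines may carry several SSE chunks."""
    for line in responses:
        # A single streamed line can hold multiple "data: {...}" chunks separated by blank lines.
        chunks = [chunk.strip() for chunk in line.split("\n\n") if chunk.strip()]
        for chunk in chunks:
            payload = chunk[len("data:"):].strip() if chunk.startswith("data:") else chunk
            if payload == "[DONE]":
                return
            json_data = json.loads(payload)
            choice = json_data["choices"][0]
            if choice.get("finish_reason"):
                # End of stream signalled by the endpoint; see the finish_reason comment below.
                return
            elif "content" in choice["delta"]:
                yield choice["delta"]["content"]
```

The review comment below about the removed finish_reason check refers to the early-return branch sketched here.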

Reviewed Changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| ---- | ----------- |
| ChatQnA/docker_compose/intel/cpu/xeon/README.md | Added remote endpoint deployment steps and updated compose table |
| ChatQnA/chatqna.py | Refactored align_generator to handle multi-chunk streaming JSON |

Comments suppressed due to low confidence (3)

ChatQnA/docker_compose/intel/cpu/xeon/README.md:78

  • [nitpick] Clarify whether REMOTE_ENDPOINT should include the /v1/chat/completions path or just the base URL to avoid confusion.
**Note**: Set REMOTE_ENDPOINT variable value to "https://api.inference.denvrdata.com" when the remote endpoint to access is "https://api.inference.denvrdata.com/v1/chat/completions"
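
As an illustration of this convention (a sketch only: it assumes an OpenAI-compatible API, a bearer-token header, and API_KEY/LLM_MODEL_ID already exported; none of these details are taken from this PR), the full chat completions URL is the base URL plus the /v1/chat/completions path:

```bash
# REMOTE_ENDPOINT holds only the base URL; the /v1/chat/completions path is appended.
export REMOTE_ENDPOINT="https://api.inference.denvrdata.com"
curl "${REMOTE_ENDPOINT}/v1/chat/completions" \
  -H "Authorization: Bearer ${API_KEY}" \
  -H "Content-Type: application/json" \
  -d '{"model": "'"${LLM_MODEL_ID}"'", "messages": [{"role": "user", "content": "Hello"}]}'
```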

ChatQnA/chatqna.py:178

  • [nitpick] The outer variable line is reused for the inner loop below, which can reduce readability; consider renaming the loop variable to chunk or similar.
        chunks = [chunk.strip() for chunk in line.split("\n\n") if chunk.strip()]

ChatQnA/chatqna.py:191

  • The previous finish_reason check was removed, which may cause tokens to be emitted after the stream should end; consider re-adding or documenting this behavior change.
                elif "content" in json_data["choices"][0]["delta"]:

Ubuntu and others added 5 commits June 23, 2025 17:47
pre-commit-ci bot and others added 5 commits June 26, 2025 19:36
```yaml
no_proxy: ${no_proxy}
http_proxy: ${http_proxy}
https_proxy: ${https_proxy}
LLM_ENDPOINT: ${LLM_ENDPOINT}
```
Collaborator:

Should this be:

LLM_ENDPOINT: ${REMOTE_ENDPOINT}

louie-tsai (Collaborator) left a comment

Minor comments. Overall looks good.


```bash
export REMOTE_ENDPOINT=<endpoint-url>
export LLM_MODEL_ID=<model-id>
```
Collaborator:

Good to put a notice explaining why LLM_MODEL_ID needs to be set again, and that users need to pick a model supported by the remote endpoint.
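
For instance, such a notice could point users at the endpoint's model list. The sketch below assumes the remote endpoint exposes the OpenAI-compatible /v1/models route and that API_KEY is already exported; neither detail comes from this PR:

```bash
# List the models served by the remote endpoint, then pick one for LLM_MODEL_ID.
curl -s "${REMOTE_ENDPOINT}/v1/models" \
  -H "Authorization: Bearer ${API_KEY}"
```

The ids in the response are typically the values LLM_MODEL_ID expects.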

```bash
export REMOTE_ENDPOINT=<endpoint-url>
export LLM_MODEL_ID=<model-id>
export OPENAI_API_KEY=<API-KEY>
```
Collaborator:

Good to change it from OPENAI_API_KEY to API_KEY, since it is not for OpenAI.

- **To Run:**

```bash
export OPENAI_API_KEY=<api-key>
```
Collaborator:

The OPENAI name here is confusing; good to remove the OPENAI term.


```bash
export LLM_ENDPOINT=<endpoint-url>
export LLM_MODEL_ID=<model-id>
```
Collaborator:

Better to mention how users can set LLM_MODEL_ID correctly by getting the supported model list.
