FinanceAgent - enable on Xeon, remote endpoint, and refactor tests #2032


Open · wants to merge 14 commits into main from finance-agent-remote-endpoint-new

Conversation

alexsin368 (Collaborator)

Description

  • Enable support for running OpenAI models on Xeon
  • Enable remote endpoints for the agents' LLMs only (DocSum and Dataprep will still run their LLM locally)
  • Reorganize the set_env.sh environment variables (see the sketch below)
  • Refactor tests: since the Xeon test steps are similar to Gaudi's, the tests are broken out to make it easier to add new test steps or new tests for other hardware
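Roughly, the reorganized variables split the two model roles like this. This is a sketch only: the variable names below are illustrative placeholders, and the authoritative definitions live in set_env.sh in this PR.

```bash
# Sketch only: variable names are illustrative; the real definitions are in set_env.sh.

# Agents call an OpenAI-compatible remote endpoint instead of a local vLLM instance.
export OPENAI_API_KEY=${OPENAI_API_KEY}
export REMOTE_ENDPOINT=${REMOTE_ENDPOINT}            # e.g. an OpenAI or hosted vLLM URL
export AGENT_LLM_MODEL_ID="gpt-4o-mini-2024-07-18"

# DocSum and Dataprep keep running an LLM locally (vLLM on Xeon).
export DOCSUM_LLM_MODEL_ID="meta-llama/Llama-3.1-8B-Instruct"
```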

Issues

#1973

Type of change

List the type of change like below. Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds new functionality)
  • Breaking change (fix or feature that would break existing design and interface)
  • Others (enhancement, documentation, validation, etc.)

Dependencies

None

Tests

Added new test script.
Verified FinanceAgent is running on the UI.

github-actions bot commented Jun 4, 2025

Dependency Review

✅ No vulnerabilities or license issues found.

Scanned Files

None

- tei-embedding-serving
- redis-vector-db
- redis-kv-store
- dataprep-redis-server-finance
- finqa-agent-endpoint
- research-agent-endpoint
- docsum-vllm-gaudi
- docsum-vllm-xeon
Collaborator

Could you double-check this? DocSum is using the same LLM models as the agents and is running on vLLM on Gaudi. This is an HPU compose, not a Xeon-based one.

Collaborator Author

Yes, originally DocSum and Dataprep used the same LLM model as the agents. What I've done is separate them into two models: one for the agents, and another for DocSum/Dataprep.

For DocSum, it can run with vLLM on Xeon, and I've changed its LLM model to meta-llama/Llama-3.1-8B-Instruct. This 8B-parameter model will run fine on Xeon.
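As a quick sanity check, the Xeon vLLM endpoint can be queried directly once the container is up. The port below is an assumption, not taken from the compose file; substitute whatever compose_openai.yaml actually maps.

```bash
# Sanity check against the OpenAI-compatible API served by docsum-vllm-xeon.
# The host port (8008) is an assumption; adjust to match compose_openai.yaml.
curl -s http://localhost:8008/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Summarize: revenue grew 12% year over year."}],
        "max_tokens": 64
      }'
```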

Collaborator Author

I haven't decided yet whether or not I want the remote endpoint/OpenAI model to be the LLM for the DocSum and Dataprep microservices. For simplicity, I'd leave them out for now.

"xeon")
echo "==================== Start all services for Xeon ===================="
docker compose -f $WORKPATH/docker_compose/intel/cpu/xeon/compose_openai.yaml up -d
;;
Collaborator

It might require some buffer time for all services to initialize and become ready.

Collaborator Author

Added a check for the docsum-vllm-xeon service to be up and running. This should be the bottleneck, and I've given it about 33 minutes, which is more than enough time to load an 8B-parameter model.
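A sketch of what that wait loop does is below. The container name and the grep string (vLLM's usual uvicorn startup message) are assumptions about the final script, not quotes from it.

```bash
# Poll the docsum-vllm-xeon container logs until vLLM reports readiness,
# for up to 200 * 10s ≈ 33 minutes.
n=0
until [ "$n" -ge 200 ]; do
  if docker logs docsum-vllm-xeon 2>&1 | grep -q "Application startup complete."; then
    echo "docsum-vllm-xeon is ready"
    break
  fi
  n=$((n + 1))
  sleep 10
done
```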

@alexsin368 (Collaborator Author) left a comment

Addressed comments, but still need to fix the test for Xeon.

@louie-tsai (Collaborator) left a comment

The PR looks confusing to me. Even if we enable a remote endpoint, vLLM is still running on Gaudi for the Llama 3 70B model. For the Xeon recipe, do we move any microservice from Gaudi to Xeon compared with the Gaudi recipe? If there are no architecture changes, why do we claim new Xeon support when nothing actually moves from Gaudi to Xeon?

| Hardware | Deployment Mode | Guide Link |
| :------- | :-------------- | :--------- |
| Intel® Gaudi® AI Accelerator | Single Node (Docker) | [Gaudi Docker Compose Guide](./docker_compose/intel/hpu/gaudi/README.md) |
| Intel® Xeon® Scalable processors | Single Node (Docker) | [Xeon Docker Compose Guide](./docker_compose/intel/cpu/xeon/README.md) |
Collaborator

I assume those hardware platforms need to run Llama 70B with good accuracy. Do we plan to run Llama 70B on Xeon?

Collaborator Author

See my set_env.sh. Xeon will run Llama 8B.

Collaborator

Llama 8B failed the AgentQnA accuracy checks. Does Llama 8B give correct answers for FinanceAgent?

Collaborator Author

Actually, the agent microservices will run with gpt-4o-mini-2024-07-18, the same as AgentQnA. Llama-3.1-8B-Instruct will be used for dataprep and docsum. We're using the same models as other GenAIExamples.

@@ -0,0 +1,202 @@
# Deploy Finance Agent on Intel® Xeon® Scalable processors with Docker Compose
Collaborator

Could you remind me which microservices run on which architectures with this PR? For the remote endpoint, we are still running on Gaudi, not Xeon.

Collaborator Author

Agent microservice: remote endpoint (could be Xeon or Gaudi)
DocSum: vLLM on Xeon
Dataprep: Xeon

Collaborator

If we claim Xeon support for FinanceAgent, make sure you get correct responses from the different models on Xeon. That is not the case for AgentQnA, where we need the 70B model. For the xeon folder, we probably need to focus on Xeon and have at least one microservice running on Xeon instead of Gaudi, compared with the gaudi folder.

Collaborator

Tagging @louie-tsai to track the PR.

@alexsin368 alexsin368 closed this Jun 16, 2025
@alexsin368 alexsin368 force-pushed the finance-agent-remote-endpoint-new branch from dfe01b0 to 6ebae99 Compare June 16, 2025 23:14
@alexsin368 alexsin368 reopened this Jun 16, 2025