FinanceAgent - enable on Xeon, remote endpoint, and refactor tests #2032
base: main
Conversation
Dependency Review: ✅ No vulnerabilities or license issues found. Scanned files: none.
- tei-embedding-serving
- redis-vector-db
- redis-kv-store
- dataprep-redis-server-finance
- finqa-agent-endpoint
- research-agent-endpoint
- docsum-vllm-gaudi
- docsum-vllm-xeon
Could you double-check this? DocSum uses the same LLM models as the agents and runs on vLLM on Gaudi. This is an HPU compose file, not a Xeon-based one.
Yes, originally DocSum and Dataprep used the same LLM models as the agents. What I've done is separate them into two models: one for the agents and another for DocSum/Dataprep.
DocSum can run with vLLM on Xeon, and I've changed its LLM model to meta-llama/Llama-3.1-8B-Instruct. This 8B-parameter model runs fine on Xeon.
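For illustration, the split could look like the minimal sketch below; the variable names are assumptions, not necessarily the ones this PR's set_env.sh uses:

```bash
# Hypothetical variable names for illustration; see the PR's set_env.sh for the real ones.
export AGENT_LLM_MODEL_ID="<70B agent model>"                  # agents, served by vLLM on Gaudi or a remote endpoint
export DOCSUM_LLM_MODEL_ID="meta-llama/Llama-3.1-8B-Instruct"  # DocSum/Dataprep, small enough for Xeon
```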
I haven't decided yet whether or not I want the remote endpoint/OpenAI model to be the LLM for the DocSum and Dataprep microservices. For simplicity, I'd leave them out for now.
"xeon") | ||
echo "==================== Start all services for Xeon ====================" | ||
docker compose -f $WORKPATH/docker_compose/intel/cpu/xeon/compose_openai.yaml up -d | ||
;; |
It might require some buffer time for all services to initialize and become ready.
Added a check for the docsum-vllm-xeon service to be up and running. This should be the bottleneck, and I've given it about 33 minutes, which is more than enough time to load an 8B-parameter model.
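A readiness gate along those lines could look like the sketch below; the container name matches the compose file above, but the log pattern and timing loop are assumptions, not the exact test code:

```bash
# Poll the docsum-vllm-xeon container until the model server reports ready,
# giving up after ~33 minutes (200 attempts x 10 s).
n=0
until [[ $n -ge 200 ]]; do
    if docker logs docsum-vllm-xeon 2>&1 | grep -q "Application startup complete"; then
        echo "docsum-vllm-xeon is ready"
        break
    fi
    sleep 10
    n=$((n + 1))
done
```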
Addressed comments, but still need to fix the test for Xeon.
The PR looks confusing to me. Even with the remote endpoint enabled, vLLM still runs on Gaudi for the Llama 3 70B model. For the Xeon recipe, do we move any microservice from Gaudi to Xeon compared with the Gaudi recipe? If there are no architecture changes, why do we claim new Xeon support, when nothing is moved from running on Gaudi to running on Xeon?
| Hardware                         | Deployment Mode      | Guide Link                                                               |
| :------------------------------- | :------------------- | :----------------------------------------------------------------------- |
| Intel® Gaudi® AI Accelerator     | Single Node (Docker) | [Gaudi Docker Compose Guide](./docker_compose/intel/hpu/gaudi/README.md) |
| Intel® Xeon® Scalable processors | Single Node (Docker) | [Xeon Docker Compose Guide](./docker_compose/intel/cpu/xeon/README.md)   |
I assume those hardware platforms need to run Llama 70B with good accuracy. Do we plan to run Llama 70B on Xeon?
See my set_env.sh. Xeon will run Llama 8B.
Llama 8B failed the AgentQnA accuracy checks. Does Llama 8B give correct answers for FinanceAgent?
Actually, the agent microservice will run with gpt-4o-mini-2024-07-18, the same as AgentQnA. Llama-3.1-8B-Instruct will be used for Dataprep and DocSum. These are the same models used in the other GenAIExamples.
@@ -0,0 +1,202 @@
# Deploy Finance Agent on Intel® Xeon® Scalable processors with Docker Compose |
Could you remind me which microservices run on which architecture after this PR? For the remote endpoint, we still run on Gaudi, not Xeon.
- Agent microservice: remote endpoint (could be Xeon or Gaudi)
- DocSum: vLLM on Xeon
- Dataprep: Xeon
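For context, wiring the agent to a remote OpenAI-compatible endpoint typically needs only a URL, an API key, and a model name. The variable names below are illustrative assumptions rather than the PR's exact configuration:

```bash
# Illustrative only; consult set_env.sh and compose_openai.yaml for the actual variable names.
export OPENAI_API_KEY="<your-api-key>"
export REMOTE_ENDPOINT="<https://your-openai-compatible-endpoint>"  # Xeon- or Gaudi-hosted, or a hosted API
export AGENT_LLM_MODEL_ID="gpt-4o-mini-2024-07-18"
docker compose -f $WORKPATH/docker_compose/intel/cpu/xeon/compose_openai.yaml up -d
```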
If we claim Xeon support for FinanceAgent, make sure you get correct responses from the different models on Xeon. That was not the case for AgentQnA, where we need the 70B model. For the xeon folder, we should probably focus on Xeon and, compared with the Gaudi folder, have at least one microservice running on Xeon instead of Gaudi.
Tagging @louie-tsai to track the PR.
Description
Issues
#1973
Type of change
Dependencies
None
Tests
Added a new test script.
Verified that FinanceAgent runs in the UI.