Add Llama.cpp as inference backend for Answer section #114
Conversation
Commits:
- Use interface to exchange backends
- Add llama.cpp backend
- Move ipex-llm to class
- Re-enable RAG for both backends (#32)
- llama.cpp and ipex run alternate (#32)
- Adjust llama.cpp branch to upstream changes; improve UX
- Add separate llama.cpp service
- Update parameters for Arc usage
- Update .gitignore
- Use own llama.cpp backend service for llama.cpp support (#32)
- Move llama.cpp out of ipex service (#32)
- Make llama.cpp accessible (#32)
def __calculate_md5(self, file_path: str) -> str:
    import hashlib

    hasher = hashlib.md5()
Check warning (Code scanning / Bandit): Use of insecure MD2, MD4, MD5, or SHA1 hash function.
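If the MD5 here only fingerprints downloaded model files rather than protecting anything security-sensitive, a common way to address this finding is to mark the call as non-cryptographic. A minimal sketch, assuming Python 3.9+; the chunked file read is an assumption, since the full method body is not shown:

import hashlib

def calculate_md5(file_path: str) -> str:
    # usedforsecurity=False documents the non-cryptographic intent and
    # silences Bandit's B324 check (requires Python 3.9+).
    hasher = hashlib.md5(usedforsecurity=False)
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

Switching to hashlib.sha256() would also clear the warning, at the cost of invalidating any stored MD5 checksums.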
parser = argparse.ArgumentParser(description="AI Playground Web service")
parser.add_argument("--port", type=int, default=59997, help="Service listen port")
args = parser.parse_args()
app.run(host="127.0.0.1", port=args.port, debug=True, use_reloader=False)
Check failure (Code scanning / Bandit): A Flask app appears to be run with debug=True, which exposes the Werkzeug debugger and allows the execution of arbitrary code.
import gc
import json
import os
import re
Suggested change (remove the unused import):
- import re
parser = argparse.ArgumentParser(description="AI Playground Web service")
parser.add_argument("--port", type=int, default=59997, help="Service listen port")
args = parser.parse_args()
app.run(host="127.0.0.1", port=args.port, debug=True, use_reloader=False)
Suggested change:
- app.run(host="127.0.0.1", port=args.port, debug=True, use_reloader=False)
+ app.run(host="127.0.0.1", port=args.port, use_reloader=False)
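If debug mode is still wanted for local development, one option is to gate it behind an environment variable rather than hardcoding it. A minimal sketch extending the snippet above; the variable name AIPG_DEBUG is illustrative, not from this PR:

import os

# Debug stays off unless explicitly requested via the environment, so the
# Werkzeug debugger is never exposed by default.
debug_enabled = os.environ.get("AIPG_DEBUG") == "1"
app.run(host="127.0.0.1", port=args.port, debug=debug_enabled, use_reloader=False)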
}

device = "xpu"
env_type = "arc"
This is no longer needed.

Suggested change:
- env_type = "arc"
Description:
This adds a new Python backend that uses llama-cpp-python as the inference backend for the Answer section, allowing users to run text generation with single-file GGUF models.
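For context, the core of such a backend is compact. A minimal sketch of loading a single-file GGUF model and generating text with llama-cpp-python; the model path and parameters are placeholders, not taken from this PR:

from llama_cpp import Llama

# Load a single-file GGUF model; path and context size are illustrative.
llm = Llama(model_path="models/example.gguf", n_ctx=4096)

result = llm.create_completion("Explain GGUF in one sentence.", max_tokens=64)
print(result["choices"][0]["text"])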
Changes Made:
- Add llama.cpp backend
- Add installation management for llama.cpp
- Adjust build scripts
- Adjust add model dialog
Testing Done:
Tested locally on BMG.
Checklist:
- [x] I have tested the changes locally.
- [x] I have self-reviewed the code changes.