Add Llama.cpp as inference backend for Answer section #114
Conversation
Commits:
- Use interface to exchange backends
- Add llama.cpp backend
- Move ipex-llm to class
- Re-enable RAG for both backends (#32)
- llama.cpp and ipex run alternate (#32)
- Adjust llama.cpp branch to upstream changes; improve UX
- Add separate llama.cpp service
- Update parameters for Arc usage
- Update .gitignore
- Use own llama.cpp backend service for llama.cpp support (#32)
- Move llama.cpp out of ipex service (#32)
- Make llama.cpp accessible (#32)
def __calculate_md5(self, file_path: str) -> str:
    import hashlib

    hasher = hashlib.md5()
Check warning (Code scanning / Bandit): Use of insecure MD2, MD4, MD5, or SHA1 hash function.
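If the MD5 here only fingerprints downloaded model files rather than protecting anything security-sensitive, a common way to address this finding is to mark the call as non-cryptographic. A minimal sketch, assuming Python 3.9+; the chunked file read is an assumption, since the full method body is not shown:

import hashlib

def calculate_md5(file_path: str) -> str:
    # usedforsecurity=False documents the non-cryptographic intent and
    # silences Bandit's B324 check (requires Python 3.9+).
    hasher = hashlib.md5(usedforsecurity=False)
    with open(file_path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

Switching to hashlib.sha256() would also clear the warning, at the cost of invalidating any stored MD5 checksums.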
parser = argparse.ArgumentParser(description="AI Playground Web service")
parser.add_argument("--port", type=int, default=59997, help="Service listen port")
args = parser.parse_args()
app.run(host="127.0.0.1", port=args.port, debug=True, use_reloader=False)
Check failure (Code scanning / Bandit): A Flask app appears to be run with debug=True, which exposes the Werkzeug debugger and allows the execution of arbitrary code.
import gc
import json
import os
import re
Suggested change (remove the unused import):
- import re
parser = argparse.ArgumentParser(description="AI Playground Web service")
parser.add_argument("--port", type=int, default=59997, help="Service listen port")
args = parser.parse_args()
app.run(host="127.0.0.1", port=args.port, debug=True, use_reloader=False)
Suggested change:
- app.run(host="127.0.0.1", port=args.port, debug=True, use_reloader=False)
+ app.run(host="127.0.0.1", port=args.port, use_reloader=False)
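If debug mode is still wanted for local development, one option is to gate it behind an environment variable rather than hardcoding it. A minimal sketch extending the snippet above; the variable name AIPG_DEBUG is illustrative, not from this PR:

import os

# Debug stays off unless explicitly requested via the environment, so the
# Werkzeug debugger is never exposed by default.
debug_enabled = os.environ.get("AIPG_DEBUG") == "1"
app.run(host="127.0.0.1", port=args.port, debug=debug_enabled, use_reloader=False)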
}

device = "xpu"
env_type = "arc"
This is no longer needed.

Suggested change:
- env_type = "arc"
Description:
This adds a new Python backend that uses llama-cpp-python as the inference backend for the Answer section, allowing users to run text generation with single-file GGUF models.
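For context, the core of such a backend is compact. A minimal sketch of loading a single-file GGUF model and generating text with llama-cpp-python; the model path and parameters are placeholders, not taken from this PR:

from llama_cpp import Llama

# Load a single-file GGUF model; path and context size are illustrative.
llm = Llama(model_path="models/example.gguf", n_ctx=4096)

result = llm.create_completion("Explain GGUF in one sentence.", max_tokens=64)
print(result["choices"][0]["text"])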
Changes Made:
- Add llama.cpp backend
- Add installation management for llama.cpp
- Adjust build scripts
- Adjust add model dialog
Testing Done:
Tested locally on BMG.
Checklist:
- [x] I have tested the changes locally.
- [x] I have self-reviewed the code changes.