Conversation

mschuettlerTNG (Contributor)
**Description:**

This adds a new Python backend that uses llama-cpp-python for inference in the Answer section, allowing users to run text generation with single-file GGUF models.

**Changes Made:**

* add llama.cpp backend
* add installation management for llama.cpp
* adjust build scripts
* adjust add model dialog

**Testing Done:**

Tested locally on BMG.

**Checklist:**

- [x] I have tested the changes locally.
- [x] I have self-reviewed the code changes.
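As context for reviewers: GGUF models are single binary files whose header begins with a `GGUF` magic number, so a backend accepting user-supplied model paths can cheaply sanity-check a file before handing it to llama-cpp-python. A minimal sketch (the `is_gguf_file` helper is hypothetical and not part of this PR):

```python
import struct

GGUF_MAGIC = 0x46554747  # the bytes b"GGUF" read as a little-endian uint32


def is_gguf_file(path: str) -> bool:
    """Return True if the file starts with the GGUF magic number."""
    with open(path, "rb") as f:
        header = f.read(4)
    return len(header) == 4 and struct.unpack("<I", header)[0] == GGUF_MAGIC
```

This only validates the magic number, not the full header, but it is enough to reject obviously wrong files (e.g. a safetensors checkpoint) with a clear error message in the add-model dialog.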

DanielHirschTNG and others added 12 commits December 15, 2024 23:09
* Use interface to exchange backends
* Add llama cpp backend
* Move ipex llm to class

Re-enable RAG for both backends (#32)

llama.cpp and ipex run alternately (#32)

Adjust llama.cpp branch to upstream changes. Improve UX.

Adds separate Llama cpp service

Update parameters for arc usage

Update gitignore

Uses own llama.cpp backend service for llama.cpp support (#32)

Move llama.cpp out of ipex service (#32)

Makes llama.cpp accessible (#32)
```python
def __calculate_md5(self, file_path: str) -> str:
    import hashlib

    hasher = hashlib.md5()
```

**Check warning (Code scanning / Bandit):** Use of insecure MD2, MD4, MD5, or SHA1 hash function.
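Since the hash here is used for download-integrity checking rather than anything security-sensitive, a common way to satisfy Bandit is to pass `usedforsecurity=False` (available since Python 3.9) and to read the file in chunks so large GGUF models are not loaded into memory at once. A sketch of what the flagged method could look like (`calculate_md5` below is a standalone adaptation, not the PR's exact code):

```python
import hashlib


def calculate_md5(file_path: str, chunk_size: int = 1 << 20) -> str:
    """Chunked MD5 of a file, marked as a non-security use for Bandit/FIPS."""
    hasher = hashlib.md5(usedforsecurity=False)
    with open(file_path, "rb") as f:
        # Read in 1 MiB chunks so multi-GB model files stay memory-friendly.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()
```

If the hash only needs to detect corruption (rather than match an upstream MD5 checksum), switching to `hashlib.sha256` would silence the warning without the keyword argument.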
```python
parser = argparse.ArgumentParser(description="AI Playground Web service")
parser.add_argument("--port", type=int, default=59997, help="Service listen port")
args = parser.parse_args()
app.run(host="127.0.0.1", port=args.port, debug=True, use_reloader=False)
```

**Check failure (Code scanning / Bandit):** A Flask app appears to be run with `debug=True`, which exposes the Werkzeug debugger and allows the execution of arbitrary code.
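One way to resolve this while keeping debug mode available for local development is to make it an explicit opt-in flag that defaults to off. A sketch building on the argparse setup above (`parse_args` is a hypothetical refactoring, and the `app.run` call is left commented since it needs a running Flask app):

```python
import argparse


def parse_args(argv=None):
    """Parse service options; the debugger stays off unless explicitly requested."""
    parser = argparse.ArgumentParser(description="AI Playground Web service")
    parser.add_argument("--port", type=int, default=59997,
                        help="Service listen port")
    parser.add_argument("--debug", action="store_true",
                        help="Enable the Werkzeug debugger (local development only)")
    return parser.parse_args(argv)


args = parse_args([])
# With a Flask app in scope, the safe invocation would be:
# app.run(host="127.0.0.1", port=args.port, debug=args.debug, use_reloader=False)
```

Because `action="store_true"` defaults to `False`, a plain launch never exposes the debugger, which is what Bandit is checking for.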
```python
import gc
import json
import os
import re
```

Reviewer comment (Contributor), suggested change (remove the unused import):

```diff
-import re
```
```python
parser = argparse.ArgumentParser(description="AI Playground Web service")
parser.add_argument("--port", type=int, default=59997, help="Service listen port")
args = parser.parse_args()
app.run(host="127.0.0.1", port=args.port, debug=True, use_reloader=False)
```

Reviewer comment (Contributor), suggested change:

```diff
-app.run(host="127.0.0.1", port=args.port, debug=True, use_reloader=False)
+app.run(host="127.0.0.1", port=args.port, use_reloader=False)
```

```python
}

device = "xpu"
env_type = "arc"
```

Reviewer comment (Contributor): this is no longer needed.

Suggested change:

```diff
-env_type = "arc"
```
@Nuullll Nuullll merged commit a386b20 into intel:dev Dec 16, 2024
4 of 6 checks passed
@mschuettlerTNG mschuettlerTNG deleted the llama-cpp-squashed branch January 29, 2025 08:39
EffortlessSteven pushed a commit to zimm0140/AI-Playground that referenced this pull request Mar 27, 2025