101 changes: 83 additions & 18 deletions README.md
@@ -1,17 +1,26 @@
# <img src='https://raw.githack.com/FortAwesome/Font-Awesome/master/svgs/solid/robot.svg' card_color='#40DBB0' width='50' height='50' style='vertical-align:bottom'/> OVOS OpenAI Plugin

This plugin is designed to leverage the **OpenAI API** for various functionalities within the OpenVoiceOS ecosystem. It provides a set of OVOS plugins that interact with OpenAI's services. Crucially, it is also compatible with **self-hosted OpenAI-compatible alternatives**, such as the [OVOS Persona Server](https://github.com/OpenVoiceOS/ovos-persona-server), or any other project that implements the full suite of OpenAI API endpoints (Chat Completions, Embeddings, Files, and Vector Stores). This flexibility allows you to choose between cloud-based OpenAI services or a local, private setup.

Specifically, this plugin provides:

- `ovos-solver-openai-plugin` for general chat completions, primarily for use with [ovos-persona](https://github.com/OpenVoiceOS/ovos-persona) (and, in older OVOS releases, with [ovos-skill-fallback-chatgpt](https://github.com/OpenVoiceOS/ovos-skill-fallback-chatgpt))
- `ovos-solver-openai-rag-plugin` for Retrieval Augmented Generation, using a compatible backend (like `ovos-persona-server`) as a knowledge source
- `ovos-dialog-transformer-openai-plugin` to rewrite OVOS dialogs just before TTS executes in [ovos-audio](https://github.com/OpenVoiceOS/ovos-audio)
- `ovos-summarizer-openai-plugin` to summarize text; not used directly, but provided for consumption by other plugins/skills

-----

## Install

`pip install ovos-openai-plugin`

-----

## Persona Usage

To create your own persona using an OpenAI-compatible server, create a .json file at `~/.config/ovos_persona/llm.json`:

```json
{
    "name": "My Local LLM",
    ...
}
```

Then say "Chat with {name_from_json}" to enable it; more details can be found in the [ovos-persona](https://github.com/OpenVoiceOS/ovos-persona) documentation.

This plugin also provides a default "Remote LLama" demo persona; it points to a public server hosted by @goldyfruit.

-----

## RAG Solver Usage

The `ovos-solver-openai-rag-plugin` enables **Retrieval Augmented Generation (RAG)**. This means your OVOS assistant can answer questions by first searching for relevant information in a configured knowledge base (a "vector store" hosted by a compatible backend like `ovos-persona-server`), and then using an LLM to generate a coherent answer based on that retrieved context.

This is particularly useful for:

* Answering questions about specific documentation, personal notes, or proprietary data.
* Reducing LLM hallucinations by grounding responses in factual, provided information.

### How it Works

1. **Search**: When a user asks a question, the RAG solver first sends the query to the configured backend's vector store search endpoint.
2. **Retrieve**: The backend returns relevant text chunks (documents or passages) from your indexed data.
3. **Augment**: These retrieved chunks are then injected into the LLM's prompt, along with the user's original query and conversation history.
4. **Generate**: The LLM processes this augmented prompt and generates an answer, prioritizing the provided context.
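
Below is a minimal sketch of this flow against an OpenAI-compatible backend. It is an illustration only, not the plugin's actual internals: the endpoint paths and response fields follow the OpenAI API shape, and the `search_vector_store`/`rag_answer` helpers, backend URL, and `llama3.1:8b` model are assumptions for the example.

```python
import requests

BACKEND = "http://localhost:8337/v1"  # assumed OpenAI-compatible backend
HEADERS = {"Authorization": "Bearer sk-xxxx"}  # can be a dummy key for local setups


def search_vector_store(query: str, vector_store_id: str, max_results: int = 5) -> list:
    """Steps 1-2: search the backend's vector store and return matching text chunks."""
    resp = requests.post(
        f"{BACKEND}/vector_stores/{vector_store_id}/search",
        headers=HEADERS,
        json={"query": query, "max_num_results": max_results},
    )
    resp.raise_for_status()
    # each search hit carries one or more text parts
    return [part["text"] for hit in resp.json()["data"] for part in hit["content"]]


def rag_answer(question: str, vector_store_id: str) -> str:
    # Step 3: inject the retrieved chunks into the LLM prompt
    context = "\n\n".join(search_vector_store(question, vector_store_id))
    messages = [
        {"role": "system",
         "content": f"Use the following context to answer the user's question.\n\nContext:\n{context}"},
        {"role": "user", "content": question},
    ]
    # Step 4: generate an answer grounded in the retrieved context
    resp = requests.post(
        f"{BACKEND}/chat/completions",
        headers=HEADERS,
        json={"model": "llama3.1:8b", "messages": messages},
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```

The real solver wires these steps into the OVOS persona framework for you; the sketch only shows the data flow.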

### Configuration

To use the RAG solver, you need to configure it in your `~/.config/ovos_persona/llm.json` file. You will need:

1. A **compatible OpenAI API backend** running (e.g., [ovos-persona-server](https://github.com/OpenVoiceOS/ovos-persona-server)) with a populated vector store.
2. The `vector_store_id` of your created vector store on that backend.
3. The `llm_model` and `llm_api_key` for the LLM that your chosen backend will use for chat completions.

Here's an example `llm.json` configuration for a RAG persona:

```json
{
    "name": "My RAG Assistant",
    "solvers": [
        "ovos-solver-openai-rag-plugin"
    ],
    "ovos-solver-openai-rag-plugin": {
        "persona_server_url": "http://localhost:8337/v1",
        "vector_store_id": "vs_your_vector_store_id_here",
        "max_num_results": 5,
        "max_context_tokens": 2000,
        "system_prompt_template": "You are a helpful assistant. Use the following context to answer the user's question. If the answer is not in the context, state that you don't know.\n\nContext:\n{context}\n\nQuestion:\n{question}",
        "llm_model": "llama3.1:8b",
        "llm_api_key": "sk-xxxx",
        "llm_temperature": 0.7,
        "llm_top_p": 1.0,
        "llm_max_tokens": 500,
        "enable_memory": true,
        "memory_size": 3
    }
}
```

Key fields:

- `persona_server_url`: URL of your OpenAI-compatible backend.
- `vector_store_id`: the ID of your vector store on that backend; replace the placeholder with your own.
- `max_num_results`: maximum number of text chunks to retrieve.
- `max_context_tokens`: maximum number of tokens of retrieved context passed to the LLM.
- `llm_model`: the LLM model name used by the backend.
- `llm_api_key`: API key for the LLM on the backend (can be a dummy value for local setups).
- `enable_memory` / `memory_size`: enable conversation history for RAG and set how many Q&A pairs to remember.
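
As with any persona, say "Chat with My RAG Assistant" to start talking to it.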

-----

## Dialog Transformer

You can rewrite text dynamically based on specific personas, such as simplifying explanations or mimicking a specific tone.

#### Example Usage:
- **`rewrite_prompt`:** `"rewrite the text as if you were explaining it to a 5-year-old"`
- **Input:** `"Quantum mechanics is a branch of physics that describes the behavior of particles at the smallest scales."`
- **Output:** `"Quantum mechanics is like a special kind of science that helps us understand really tiny things."`

Examples of `rewrite_prompt` Values:
- `"rewrite the text as if it was an angry old man speaking"`
- `"Add more 'dude'ness to it"`
- `"Explain it like you're teaching a child"`

To enable this plugin, add the following to your `mycroft.conf`:

```json
"dialog_transformers": {
    "ovos-dialog-transformer-openai-plugin": {
        "rewrite_prompt": "rewrite the text as if you were explaining it to a 5-year-old"
    }
}
```

> 💡 The user utterance is appended after `rewrite_prompt` to form the actual LLM query
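
For example, with the `rewrite_prompt` above, the final query is assembled roughly like this (a simplified sketch, not the plugin's exact formatting):

```python
rewrite_prompt = "rewrite the text as if you were explaining it to a 5-year-old"
dialog = ("Quantum mechanics is a branch of physics that describes "
          "the behavior of particles at the smallest scales.")

# sketch: the dialog to rewrite is appended to the persona's rewrite prompt
llm_query = f"{rewrite_prompt}: {dialog}"
```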

-----

## Direct Usage

```python
from ovos_solver_openai_persona import OpenAIPersonaSolver

# "key" can point at OpenAI or any OpenAI-compatible server
bot = OpenAIPersonaSolver({"key": "sk-XXX",
                           "persona": "helpful, creative, clever, and very friendly"})
print(bot.spoken_answer("Quem encontrou o caminho maritimo para o Brazil", lang="pt-PT"))
```
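
The `lang` parameter tells the solver which language the query is in; the plugin's translator and language-detector support can then bridge between the user's language and the model's internal language.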

-----

## Remote Persona / Proxies

You can run any persona behind an **OpenAI-compatible server** (such as [ovos-persona-server](https://github.com/OpenVoiceOS/ovos-persona-server)).

This allows you to offload the workload to a standalone server, either for performance reasons or to keep API keys in a single safe place. Then, you just configure this plugin to point to your self-hosted server as if it were the official OpenAI API.
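
For example, a persona definition pointing at a self-hosted server might look like this (a sketch; the `api_url`, `key`, and `model` values are placeholder assumptions to adapt to your setup):

```json
{
    "name": "My Proxied Persona",
    "solvers": [
        "ovos-solver-openai-plugin"
    ],
    "ovos-solver-openai-plugin": {
        "api_url": "http://localhost:8337/v1",
        "key": "sk-dummy",
        "model": "my-persona"
    }
}
```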
9 changes: 8 additions & 1 deletion ovos_solver_openai_persona/__init__.py
@@ -38,4 +38,11 @@ def __init__(self, *args, **kwargs):

# Quantum mechanics is a branch of physics that deals with the behavior of particles on a very small scale, such as atoms and subatomic particles. It explores the idea that particles can exist in multiple states at once and that their behavior is not predictable in the traditional sense.
print(bot.spoken_answer("what is the definition of computer", lang="en-US"))
# Okay, let's break down the definition of a computer. Here's a comprehensive explanation, covering different aspects:
#
# At its core, a computer is an electronic device that can:
#
# Receive Input: Take in data and instructions.
# Process Data: Perform calculations and manipulate data based on those instructions.
# Store Data: Save data and instructions for later use.
# Produce Output: Present the results of processing in a
55 changes: 28 additions & 27 deletions ovos_solver_openai_persona/engines.py
@@ -101,10 +101,10 @@ def __init__(self, config=None,
enable_cache: bool = False,
internal_lang: Optional[str] = None):
"""
Initialize an OpenAIChatCompletionsSolver instance with API configuration, conversation memory settings, and system prompt.

Raises:
ValueError: If the API key is missing from the configuration.
"""
super().__init__(config=config, translator=translator,
detector=detector, priority=priority,
@@ -126,21 +126,21 @@ def __init__(self, config=None,
self.system_prompt = config.get("system_prompt") or config.get("initial_prompt")
if not self.system_prompt:
self.system_prompt = "You are a helpful assistant."
LOG.error(f"system prompt not set in config! defaulting to '{self.system_prompt}'")
LOG.debug(f"system prompt not set in config! defaulting to '{self.system_prompt}'")

# OpenAI API integration
def _do_api_request(self, messages):
"""
Send a chat completion request to the OpenAI API using the provided conversation history and return the assistant's reply.

Parameters:
messages (list): Conversation history as a list of message dictionaries.

Returns:
str: The assistant's reply content.

Raises:
RequestException: If the OpenAI API response contains an error.
"""
s = requests.Session()
headers = {
@@ -243,35 +243,35 @@ def get_chat_history(self, system_prompt=None):

def get_messages(self, utt, system_prompt=None) -> MessageList:
"""
Constructs a list of chat messages for the API, including the system prompt, recent conversation history, and the current user utterance.

Parameters:
utt: The current user input to be added as the latest message.
system_prompt: Optional system prompt to use as the initial message.

Returns:
A list of message dictionaries representing the chat context.
"""
messages = self.get_chat_history(system_prompt)
messages.append({"role": "user", "content": utt})
return messages

# abstract Solver methods
## chat completions api - message list as input
def continue_chat(self, messages: MessageList,
lang: Optional[str],
units: Optional[str] = None) -> Optional[str]:
"""
Generate a chat response based on the provided message history and update conversation memory if enabled.

If the first message is not a system prompt, prepends the system prompt. Returns a cleaned response string, or None if the response is empty or contains only punctuation or underscores. Updates internal memory with the latest user message and answer when memory is enabled.

Parameters:
messages (MessageList): List of chat messages, each with 'role' and 'content' keys.
lang (Optional[str]): Language code for the response.
units (Optional[str]): Unit system for numerical values.

Returns:
Optional[str]: The generated response string, or None if no valid response is produced.
"""
if messages[0]["role"] != "system":
messages = [{"role": "system", "content": self.system_prompt }] + messages
Expand Down Expand Up @@ -317,19 +317,20 @@ def stream_chat_utterances(self, messages: MessageList,
yield post_process_sentence(answer)
answer = ""

## completions api - single text as input
def stream_utterances(self, query: str,
lang: Optional[str] = None,
units: Optional[str] = None) -> Iterable[str]:
"""
Yields partial responses for a query as they are generated by the chat completions API.

Parameters:
query (str): The user query to send to the chat model.
lang (Optional[str]): Language code for the response, if applicable.
units (Optional[str]): Units relevant to the query, if applicable.

Returns:
Iterable[str]: An iterator yielding segments of the model's response as they become available.
"""
messages = self.get_messages(query)
yield from self.stream_chat_utterances(messages, lang, units)