feat: RAG solver #37
base: dev

Conversation
Walkthrough

This update introduces a new Retrieval Augmented Generation (RAG) solver by adding the `ovos_solver_openai_persona/rag.py` module.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant RAGSolver
    participant VectorStore
    participant PersonaServer
    User->>RAGSolver: Ask question
    RAGSolver->>VectorStore: Search for relevant context
    VectorStore-->>RAGSolver: Return context chunks
    RAGSolver->>PersonaServer: Send chat completions request (with context + history)
    PersonaServer-->>RAGSolver: Return generated response
    RAGSolver->>User: Return answer
```
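The interaction above boils down to a retrieve-then-generate loop. A minimal sketch follows; `search_fn` and `chat_fn` are hypothetical stand-ins for the vector-store search and the Persona Server chat-completions call, not functions from the plugin itself:

```python
def rag_answer(question, search_fn, chat_fn, history=None, top_k=3):
    """Retrieve context chunks, then ask the LLM with the context prepended.

    search_fn(question, top_k) -> list[str]  # vector-store search (assumed)
    chat_fn(messages) -> answer              # chat-completions call (assumed)
    """
    chunks = search_fn(question, top_k)
    context = "\n\n".join(chunks)
    messages = [{"role": "system",
                 "content": f"Answer using only this context:\n{context}"}]
    messages += list(history or [])  # prior conversation turns, if any
    messages.append({"role": "user", "content": question})
    return chat_fn(messages)
```

In the actual plugin the two callables correspond to HTTP requests against the vector store's search endpoint and the chat completions endpoint.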
Actionable comments posted: 4
🧹 Nitpick comments (8)
README.md (1)
Lines 3-11: Fix markdown formatting for consistency.

The documentation content is excellent, but there are several markdown formatting issues:

- Lines 7-10: Unordered lists should not be indented
- Lines 50-51: Use dashes (`-`) instead of asterisks (`*`) for consistency
- Line 99: Should be h3 (`###`) instead of h4 (`####`) to follow heading hierarchy
- Line 99: Remove trailing colon from heading

Apply these formatting fixes:

```diff
-Specifically, this plugin provides:
-
-  - `ovos-solver-openai-plugin` for general chat completions, primarily for usage with [ovos-persona](https://github.com/OpenVoiceOS/ovos-persona) (and in older ovos releases with [ovos-skill-fallback-chatgpt](https://www.google.com/search?q=))
-  - `ovos-solver-openai-rag-plugin` for Retrieval Augmented Generation using a compatible backend (like `ovos-persona-server`) as a knowledge source.
-  - `ovos-dialog-transformer-openai-plugin` to rewrite OVOS dialogs just before TTS executes in [ovos-audio](https://github.com/OpenVoiceOS/ovos-audio)
-  - `ovos-summarizer-openai-plugin` to summarize text, not used directly but provided for consumption by other plugins/skills
+Specifically, this plugin provides:
+
+- `ovos-solver-openai-plugin` for general chat completions, primarily for usage with [ovos-persona](https://github.com/OpenVoiceOS/ovos-persona) (and in older ovos releases with [ovos-skill-fallback-chatgpt](https://www.google.com/search?q=))
+- `ovos-solver-openai-rag-plugin` for Retrieval Augmented Generation using a compatible backend (like `ovos-persona-server`) as a knowledge source.
+- `ovos-dialog-transformer-openai-plugin` to rewrite OVOS dialogs just before TTS executes in [ovos-audio](https://github.com/OpenVoiceOS/ovos-audio)
+- `ovos-summarizer-openai-plugin` to summarize text, not used directly but provided for consumption by other plugins/skills
```

```diff
-This is particularly useful for:
-
-  * Answering questions about specific documentation, personal notes, or proprietary data.
-  * Reducing LLM hallucinations by grounding responses in factual, provided information.
+This is particularly useful for:
+
+- Answering questions about specific documentation, personal notes, or proprietary data.
+- Reducing LLM hallucinations by grounding responses in factual, provided information.
```

```diff
-#### Example Usage:
-
-  - **`rewrite_prompt`:** `"rewrite the text as if you were explaining it to a 5-year-old"`
+### Example Usage
+
+- **`rewrite_prompt`:** `"rewrite the text as if you were explaining it to a 5-year-old"`
```

Also applies to: 50-51, 99-109
ovos_solver_openai_persona/rag.py (7)
Lines 67-69: Improve error messages for better debugging.

The error messages could include the actual config key names to help users identify the issue more easily.

```diff
-            raise ValueError("api_url must be set in config for PersonaServerRAGSolver")
+            raise ValueError("'api_url' must be set in config for OpenAIRAGSolver")
         if not self.vector_store_id:
-            raise ValueError("vector_store_id must be set in config for PersonaServerRAGSolver")
+            raise ValueError("'vector_store_id' must be set in config for OpenAIRAGSolver")
```

Line 85: Add space after comment marker.

```diff
-        self.key = self.config.get("key") # This is the key for the Persona Server's chat endpoint
+        self.key = self.config.get("key")  # This is the key for the Persona Server's chat endpoint
```

Line 159: Consider using a more accurate token estimation method.

The current implementation uses word count as a token estimate, which can be inaccurate. Consider using a proper tokenizer or at least document this limitation.

Would you like me to implement a more accurate token estimation using tiktoken or a similar library?

Line 177: Fix comment formatting.

```diff
-        for q, a in self.qa_pairs[-1 * self.max_utts:]: # Use RAG solver's memory
+        for q, a in self.qa_pairs[-1 * self.max_utts:]:  # Use RAG solver's memory
```

Line 241: Fix comment formatting.

```diff
-            "stream": False # Non-streaming call
+            "stream": False  # Non-streaming call
```

```diff
-            "stream": True # Streaming call
+            "stream": True  # Streaming call
```

Also applies to: 303-303

Line 266: Fix return type annotation comment.

The method yields strings, not raw data lines as the comment suggests.

```diff
-              units: Optional[str] = None) -> Iterable[str]: # Yields raw data: lines
+              units: Optional[str] = None) -> Iterable[str]:  # Yields response text
```

Lines 395-399: Security: Avoid hardcoding test credentials.

The example code contains what appears to be placeholder API keys and URLs. Ensure these are clearly marked as examples.

```diff
-    PERSONA_SERVER_URL = "http://0.0.0.0:8337/v1"
-    VECTOR_STORE_ID = "vs_YgqHwhmyJ48kkI6jU6G3ApSq"  # <<< REPLACE THIS with your vector store ID
-    LLM_API_KEY = "sk-xxxx"  # Can be any non-empty string for local setups like Ollama
+    PERSONA_SERVER_URL = "http://localhost:8337/v1"  # Example URL - replace with your server
+    VECTOR_STORE_ID = "vs_EXAMPLE_ID"  # <<< REPLACE THIS with your actual vector store ID
+    LLM_API_KEY = "sk-EXAMPLE"  # <<< REPLACE THIS with your actual API key
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)

- README.md (4 hunks)
- ovos_solver_openai_persona/__init__.py (1 hunks)
- ovos_solver_openai_persona/engines.py (3 hunks)
- ovos_solver_openai_persona/rag.py (1 hunks)
- setup.py (2 hunks)
🧰 Additional context used
🪛 markdownlint-cli2 (0.17.2)
README.md
7-7: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
8-8: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
9-9: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
10-10: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
50-50: Unordered list style
Expected: dash; Actual: asterisk
(MD004, ul-style)
50-50: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
51-51: Unordered list style
Expected: dash; Actual: asterisk
(MD004, ul-style)
51-51: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
99-99: Heading levels should only increment by one level at a time
Expected: h3; Actual: h4
(MD001, heading-increment)
99-99: Trailing punctuation in heading
Punctuation: ':'
(MD026, no-trailing-punctuation)
101-101: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
102-102: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
103-103: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
107-107: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
108-108: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
109-109: Unordered list indentation
Expected: 0; Actual: 2
(MD007, ul-indent)
🪛 Ruff (0.12.2)
ovos_solver_openai_persona/rag.py
136-136: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)

139-139: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)

259-259: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)

262-262: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
🔇 Additional comments (6)
ovos_solver_openai_persona/__init__.py (1)
Lines 41-48: LGTM!

The updated comment now correctly reflects the output for the "what is the definition of computer" query, improving documentation consistency.

ovos_solver_openai_persona/engines.py (2)

Line 129: Good logging level adjustment.

Changing from error to debug is appropriate since a sensible default value is provided. This prevents unnecessary error logs in valid configurations.

Line 259: Helpful API method categorization.

The added comments clearly distinguish between chat completions API methods (message list input) and completions API methods (single text input), improving code organization.

Also applies to: 320-320

setup.py (2)

Line 68: Verify the entry point key change is intentional.

The entry point key has been changed from `'neon.plugin.solver'` to `'opm.solver.chat'`. This appears to be a migration to OVOS Plugin Manager naming conventions. Please confirm this is intentional and won't break existing integrations.

Line 51: LGTM! RAG solver entry point properly configured.

The new RAG solver entry point is correctly added and both solvers are now registered under the chat solver category.

Also applies to: 68-68

README.md (1)

Lines 44-92: Excellent RAG solver documentation!

The new RAG solver section provides clear explanations of:

- What RAG is and its benefits
- How the system works (search, retrieve, augment, generate)
- Detailed configuration with helpful comments
- Complete example configuration

This will greatly help users understand and implement the RAG functionality.
```python
            return retrieved_chunks
        except requests.exceptions.RequestException as e:
            LOG.error(f"Error during RAG search request to {search_url}: {e}")
            raise RequestException(f"Failed to retrieve context from vector store: {e}")
```
🛠️ Refactor suggestion

Use explicit exception chaining for better error traceability.

When re-raising exceptions, use `from` to preserve the original exception context.

```diff
-            raise RequestException(f"Failed to retrieve context from vector store: {e}")
+            raise RequestException(f"Failed to retrieve context from vector store: {e}") from e
-            raise RequestException(f"Invalid JSON response from vector store: {e}")
+            raise RequestException(f"Invalid JSON response from vector store: {e}") from e
-            raise RequestException(f"Failed to get LLM response: {e}")
+            raise RequestException(f"Failed to get LLM response: {e}") from e
-            raise RequestException(f"Invalid JSON response from LLM: {e}")
+            raise RequestException(f"Invalid JSON response from LLM: {e}") from e
```

Also applies to: 139-139, 259-259, 262-262
🧰 Tools

🪛 Ruff (0.12.2)

136-136: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
🤖 Prompt for AI Agents

In ovos_solver_openai_persona/rag.py at lines 136, 139, 259, and 262, the code raises new exceptions without using explicit exception chaining, which loses the original error context. Modify each raise statement to use the syntax `raise ... from e` to preserve the original exception context and improve error traceability.
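The effect of the chaining can be shown in a few lines. This is a self-contained sketch: `RequestException` here is a local stand-in for the wrapper exception used in rag.py, and the `ConnectionError` simulates the low-level network failure.

```python
class RequestException(Exception):
    """Local stand-in for the wrapper exception used in rag.py."""

def fetch_context():
    try:
        # simulate a low-level network failure
        raise ConnectionError("connection refused")
    except ConnectionError as e:
        # "from e" stores the original error on __cause__, so the traceback
        # prints "The above exception was the direct cause of ..." instead of
        # "During handling of the above exception, another exception occurred"
        raise RequestException(f"Failed to retrieve context from vector store: {e}") from e
```

Without `from e`, Python still shows both tracebacks but labels the relationship ambiguously; with it, callers can also inspect `err.__cause__` programmatically.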
```python
        }
        LOG.debug(f"Sending streaming LLM request to {chat_completions_url} with payload: {payload}")

        full_answer = ""  # To reconstruct the full answer for memory
```
🛠️ Refactor suggestion

Consider memory consistency in streaming mode.

In streaming mode, if an error occurs mid-stream, the partial answer is still saved to memory. This could lead to incomplete responses being stored.

Consider only saving to memory after successful completion:

```python
# Move memory update after the try-except block completes successfully
if self.memory and full_answer and not error_occurred:
    self.qa_pairs.append((user_query, full_answer))
```

Also applies to: 334-334
🤖 Prompt for AI Agents
In ovos_solver_openai_persona/rag.py at lines 307 and 334, the code currently
saves partial answers to memory even if an error occurs during streaming, which
risks storing incomplete responses. Modify the logic to update memory only after
the entire streaming process completes successfully by moving the memory update
code outside and after the try-except block, and add a condition to check that
no error occurred before appending the QA pair.
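The suggested pattern can be sketched as a small generator. The names are illustrative only; in the real method the chunks come from an HTTP stream rather than a plain iterable.

```python
def stream_answer(chunks, qa_pairs, user_query, memory=True):
    """Yield streamed chunks, saving the full answer to memory only
    when the stream completes without raising."""
    full_answer = ""
    error_occurred = False
    try:
        for chunk in chunks:
            full_answer += chunk
            yield chunk
    except Exception:
        error_occurred = True
        raise
    finally:
        # memory update happens once, after the whole stream is consumed
        if memory and full_answer and not error_occurred:
            qa_pairs.append((user_query, full_answer))
```

A mid-stream failure propagates to the caller, and the partial answer is discarded instead of being stored as if it were complete.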
```python
        Returns:
            The generated response as a string, or None if no valid response is produced.
        """
        user_query = messages[-1]["content"]  # Get the current user query
```
Add bounds checking for message list access.

Accessing `messages[-1]` without checking if the list is empty could raise an IndexError.

Add validation at the beginning of methods:

```python
if not messages or messages[-1].get("role") != "user":
    raise ValueError("Messages must contain at least one user message as the last element")
```

Also applies to: 227-227, 279-279, 289-289
🤖 Prompt for AI Agents
In ovos_solver_openai_persona/rag.py at lines 217, 227, 279, and 289, the code
accesses messages[-1] without checking if the messages list is empty, which can
cause an IndexError. Add a validation at the start of each method or code block
that accesses messages[-1] to check if messages is not empty and that the last
message has the role "user". If this condition is not met, raise a ValueError
with a clear message indicating that messages must contain at least one user
message as the last element.
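The guard can live in a small helper so every method that reads the last message shares the same check. The function name is illustrative, not from the plugin:

```python
def last_user_message(messages):
    """Return the content of the final message, failing fast with a clear
    ValueError instead of letting messages[-1] raise an IndexError."""
    if not messages or messages[-1].get("role") != "user":
        raise ValueError("Messages must contain at least one user message as the last element")
    return messages[-1]["content"]
```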
```python
                 enable_cache: bool = False,
                 internal_lang: Optional[str] = None):
        """
        Initializes the PersonaServerRAGSolver.
```
Fix docstring class name reference.

The docstring refers to "PersonaServerRAGSolver" but the class is named "OpenAIRAGSolver".

```diff
-        Initializes the PersonaServerRAGSolver.
+        Initializes the OpenAIRAGSolver.
```

📝 Committable suggestion

‼️ IMPORTANT: Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
🤖 Prompt for AI Agents
In ovos_solver_openai_persona/rag.py at line 30, the class docstring incorrectly
refers to "PersonaServerRAGSolver" instead of the actual class name
"OpenAIRAGSolver". Update the docstring to replace "PersonaServerRAGSolver" with
"OpenAIRAGSolver" to correctly reflect the class name.
Docstrings generation was requested by @JarbasAl in #37 (comment).

The following files were modified:

- `ovos_solver_openai_persona/engines.py`
- `ovos_solver_openai_persona/rag.py`
- `setup.py`

Note: Generated docstrings for this pull request at #38
Actionable comments posted: 1
♻️ Duplicate comments (5)
ovos_solver_openai_persona/rag.py (5)
Line 30: Fix docstring class name reference.

The docstring refers to "PersonaServerRAGSolver" but the class is named "OpenAIRAGSolver".

```diff
-        Initializes the PersonaServerRAGSolver.
+        Initializes the OpenAIRAGSolver.
```

Line 136: Use explicit exception chaining for better error traceability.

When re-raising exceptions, use `from` to preserve the original exception context.

```diff
-            raise RequestException(f"Failed to retrieve context from vector store: {e}")
+            raise RequestException(f"Failed to retrieve context from vector store: {e}") from e
-            raise RequestException(f"Invalid JSON response from vector store: {e}")
+            raise RequestException(f"Invalid JSON response from vector store: {e}") from e
```

Also applies to: 139-139

Line 222: Add bounds checking for message list access.

Accessing `messages[-1]` without checking if the list is empty could raise an IndexError. Add validation at the beginning of the method:

```python
if not messages or messages[-1].get("role") != "user":
    raise ValueError("Messages must contain at least one user message as the last element")
```

Line 264: Use explicit exception chaining for better error traceability.

When re-raising exceptions, use `from` to preserve the original exception context.

```diff
-            raise RequestException(f"Failed to get LLM response: {e}")
+            raise RequestException(f"Failed to get LLM response: {e}") from e
-            raise RequestException(f"Invalid JSON response from LLM: {e}")
+            raise RequestException(f"Invalid JSON response from LLM: {e}") from e
```

Also applies to: 267-267

Line 285: Add bounds checking for message list access.

Accessing `messages[-1]` without checking if the list is empty could raise an IndexError. Add validation at the beginning of the method:

```python
if not messages or messages[-1].get("role") != "user":
    raise ValueError("Messages must contain at least one user message as the last element")
```
🧹 Nitpick comments (2)
ovos_solver_openai_persona/rag.py (2)
Line 161: Consider using a proper tokenizer for accurate token counting.

The current implementation uses word count as a token estimate, which can be inaccurate. For better precision, consider using a tokenizer library like `tiktoken` for OpenAI models or model-specific tokenizers.

Example implementation:

```python
# At the top of the file
import tiktoken

# In __init__ method
self.tokenizer = tiktoken.encoding_for_model(self.llm_model) if self.llm_model else None

# In _build_llm_messages
chunk_tokens = len(self.tokenizer.encode(chunk)) if self.tokenizer else len(chunk.split())
```

Line 399: Fix class name references in comments.

The comments incorrectly reference "PersonaServerRAGSolver" instead of "OpenAIRAGSolver".

```diff
-    # --- Live Test Example for PersonaServerRAGSolver ---
+    # --- Live Test Example for OpenAIRAGSolver ---
-    print("--- Initializing PersonaServerRAGSolver ---")
+    print("--- Initializing OpenAIRAGSolver ---")
```

Also applies to: 432-432
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)

- ovos_solver_openai_persona/engines.py (4 hunks)
- ovos_solver_openai_persona/rag.py (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

- ovos_solver_openai_persona/engines.py
🧰 Additional context used
🪛 Ruff (0.12.2)
ovos_solver_openai_persona/rag.py
136-136: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)

139-139: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)

264-264: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)

267-267: Within an `except` clause, raise exceptions with `raise ... from err` or `raise ... from None` to distinguish them from errors in exception handling (B904)
```python
        self.max_context_tokens = self.config.get("max_context_tokens", 2000)

        if not self.api_url:
            raise ValueError("api_url must be set in config for PersonaServerRAGSolver")
```
Fix class name in error messages.

The error messages incorrectly reference "PersonaServerRAGSolver" instead of "OpenAIRAGSolver".

```diff
-            raise ValueError("api_url must be set in config for PersonaServerRAGSolver")
+            raise ValueError("api_url must be set in config for OpenAIRAGSolver")
-            raise ValueError("vector_store_id must be set in config for PersonaServerRAGSolver")
+            raise ValueError("vector_store_id must be set in config for OpenAIRAGSolver")
```

Also applies to: 69-69
🤖 Prompt for AI Agents
In ovos_solver_openai_persona/rag.py at lines 67 and 69, the error messages
incorrectly mention "PersonaServerRAGSolver" instead of the correct class name
"OpenAIRAGSolver". Update the error message strings to replace
"PersonaServerRAGSolver" with "OpenAIRAGSolver" to accurately reflect the class
raising the error.
companion to OpenVoiceOS/ovos-persona-server#11
Summary by CodeRabbit
New Features
Documentation
Chores