[RFC] Intel-ArangoDB joint RFC for txt2query microservice abstraction with vendor integrations #387
Open
rbrugaro wants to merge 1 commit into opea-project:main from rbrugaro:RFC-txt2query
101 changes: 101 additions & 0 deletions
community/rfcs/25-06-17-GenAIComps-001-Introduce-txt2query-abstraction.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,101 @@ | ||
# Introduce "txt2query" microservice with vendor integrations to replace the individual txt2cypher and txt2sql microservices | ||
|
||
Txt2query is a microservice abstraction over the different query languages. It follows the same design philosophy as the current dataprep, retriever, and OPEAStore microservices. | ||
|
||
## Author(s) | ||
|
||
Rita Brugarolas Brufau | Intel Corporation | @rbrugaro | ||
|
||
Antony Mahanna | ArangoDB | @aMahanna | ||
|
||
## Status | ||
|
||
`Under Review` | ||
|
||
## Objective | ||
|
||
List what problem will this solve? What are the goals and non-goals of this RFC? | ||
|
||
Currently there are separate [txt2cypher](https://github.com/opea-project/GenAIComps/tree/main/comps/text2cypher) and [txt2sql](https://github.com/opea-project/GenAIComps/tree/main/comps/text2sql) microservices, and any new vendor needs to contribute a new microservice. | ||
This RFC proposes that all txt2query vendor integrations inherit from a base class definition, following the same OPEA design philosophy used in dataprep, retriever, and OPEAStore. | ||
|
||
|
||
## Motivation | ||
|
||
List why this problem is valuable to solve? Whether some related work exists? | ||
|
||
- Offer the same design philosophy to developers across OPEA microservices, enabling switching between database vendor integrations with no code changes other than the vendor selection setting. | ||
- Reduce code duplication and maintenance across microservices. | ||
|
||
## Design Proposal | ||
|
||
The folder structure follows the same layout as dataprep and retriever: | ||
|
||
``` | ||
# ❯ tree . | ||
# . | ||
# ├── Dockerfile | ||
# ├── README.md | ||
# ├── __init__.py | ||
# ├── integrations | ||
# │ ├── aql.py # <----------- | ||
# │ ├── cypher.py | ||
# │ └── sql.py | ||
# ├── opea_text2query_microservice.py | ||
# └── requirements.txt | ||
``` | ||
|
||
Define a `Text2QueryInput` base class with the required arguments: in this case, the `input_text` to convert into a query, and an `execute_query` flag indicating whether the resulting query should be executed against the database or only the query string returned. | ||
|
||
Vendor integrations (SQL, Cypher, AQL, …) may choose to add custom arguments such as a connection string, prompt template, or graph schema. | ||
|
||
``` | ||
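# TODO: Move this to GenAIComps/comps/cores/proto/api_protocol.py | ||
from typing import Any, Dict, Optional | ||
from pydantic import BaseModel | ||
# PostgresConnection and Neo4jConnection are assumed to be defined or imported elsewhere. | ||
class Text2QueryInput(BaseModel): | ||
    input_text: str  # natural-language text to convert into a query | ||
    execute_query: bool = True  # False: return only the generated query string | ||
class Text2QueryInputSQL(Text2QueryInput): | ||
    conn_str: Optional[PostgresConnection] = None | ||
class Text2QueryInputCypher(Text2QueryInput): | ||
    conn_str: Optional[Neo4jConnection] = None | ||
class Text2QueryInputAQL(Text2QueryInput): | ||
    custom_prompt_template: Optional[str] = None | ||
    custom_schema: Optional[Dict[str, Any]] = None | ||
    # TODO: .... | ||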
``` | ||
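To illustrate how the `execute_query` flag could shape a response, here is a minimal sketch using only the standard library. `Text2QueryRequest`, `handle_request`, `fake_generate`, and `fake_run` are hypothetical stand-ins introduced for illustration and are not part of GenAIComps; a real integration would presumably use the Pydantic models above and call an LLM and a database.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Text2QueryRequest:
    """Hypothetical stand-in for Text2QueryInput (the RFC uses a Pydantic model)."""

    input_text: str
    execute_query: bool = True


def handle_request(
    request: Text2QueryRequest,
    generate_query: Callable[[str], str],
    run_query: Callable[[str], List[Any]],
) -> Dict[str, Any]:
    """Generate a query from text and execute it only when requested."""
    query = generate_query(request.input_text)
    response: Dict[str, Any] = {"query": query}
    if request.execute_query:
        # Only touch the database when the caller asked for results.
        response["result"] = run_query(query)
    return response


# Toy callables so the sketch runs without an LLM or a database.
def fake_generate(text: str) -> str:
    return "SELECT COUNT(*) FROM users"


def fake_run(query: str) -> List[Any]:
    return [42]
```

With `execute_query=False` the response carries only the generated query string, which lets a caller review or validate the query before running it.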
|
||
Below is an example of the `aql.py` registration: | ||
``` | ||
from comps import CustomLogger, OpeaComponent, OpeaComponentRegistry, ServiceType | ||
|
||
@OpeaComponentRegistry.register("OPEA_TEXT2QUERY_AQL") | ||
class OpeaText2AQL(OpeaComponent): | ||
    pass | ||
``` | ||
Each implementation in the integrations folder should implement the `OpeaComponent` methods, such as `check_health` and `invoke`. | ||
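The registration-and-selection pattern described above could look roughly like the following standard-library sketch. It does not use the real `comps` package: `BaseText2Query`, `register`, `ToyText2AQL`, `load_component`, and the `TEXT2QUERY_COMPONENT_NAME` environment variable are all hypothetical names chosen for illustration, and the real `OpeaComponent` interface may differ (for example, in method signatures).

```python
import os
from abc import ABC, abstractmethod
from typing import Callable, Dict, Type


class BaseText2Query(ABC):
    """Hypothetical stand-in for OpeaComponent; not the real comps class."""

    @abstractmethod
    def check_health(self) -> bool:
        """Report whether the backing database is reachable."""

    @abstractmethod
    def invoke(self, input_text: str) -> str:
        """Convert natural-language text into a query string."""


# Maps a component name to the class that implements it.
_REGISTRY: Dict[str, Type[BaseText2Query]] = {}


def register(name: str) -> Callable[[Type[BaseText2Query]], Type[BaseText2Query]]:
    """Class decorator that records an integration under `name`."""

    def decorator(cls: Type[BaseText2Query]) -> Type[BaseText2Query]:
        _REGISTRY[name] = cls
        return cls

    return decorator


@register("OPEA_TEXT2QUERY_AQL")
class ToyText2AQL(BaseText2Query):
    def check_health(self) -> bool:
        return True  # a real integration would ping the database

    def invoke(self, input_text: str) -> str:
        return "FOR doc IN collection RETURN doc"  # placeholder query


def load_component(default: str = "OPEA_TEXT2QUERY_AQL") -> BaseText2Query:
    """Instantiate the integration named by a (hypothetical) env variable."""
    name = os.getenv("TEXT2QUERY_COMPONENT_NAME", default)
    if name not in _REGISTRY:
        raise ValueError(f"Unknown text2query component: {name}")
    return _REGISTRY[name]()
```

Under this sketch, switching vendors would only mean changing the environment variable; the code that calls `check_health` and `invoke` stays the same.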
|
||
## Alternatives Considered | ||
|
||
Continue with vendor-specific microservices: txt2sql, txt2cypher, txt2aql, … | ||
|
||
## Compatibility | ||
|
||
Pursuing this refactoring would require deprecating the existing `txt2sql` and `txt2cypher` microservices and refactoring the GenAI examples that use them. | ||
|
||
## Miscellaneous | ||
|
||
A few other items for consideration: | ||
|
||
1. The current txt2cypher microservice includes gaudi native and gaudi utils scripts within the microservice folder. I don't yet know why they are there. If we pursue this abstraction, they should be relocated to llm or a more appropriate microservice. Need to check with @jeanyu-habana. | ||
|
||
2. There is a SQL agent in comps/agents/src/integrations/strategy/sqlagent. We could see txt2query as one of the building blocks for such an agent solution, which is more robust and offers query validation and a retry mechanism for failing queries. | ||
|
||
3. Regarding engineering resourcing: ArangoDB is interested in contributing their vendor integration, but resources still need to be identified to introduce the txt2query class and the subsequent refactoring. | ||
|
||
List other information user and developer may care about, such as: | ||
|
||
- Performance Impact, such as speed, memory, accuracy. | ||
- Engineering Impact, such as binary size, startup time, build time, test times. | ||
- Security Impact, such as code vulnerability. | ||
- TODO List or staging plan. |
The design looks good! But the folder name `Txt2query` doesn't make sense. The name `txt2query` has nothing to do with the functions of `txt2sql` or `txt2cypher` and can be confusing.

What about `txt2ql`? `ql` would represent "query language". `natural-language-service` is also another option.

Happy to accept other suggestions

Other options: `txt2dbql` (text to database query language), `txt2DBquery`, `nl2ql` (natural language to query language).