-AgenticRAG Operators are a specialized suite of tools designed for agentic RAG (Retrieval-Augmented Generation) tasks, with a particular focus on generating question-and-answer (QA) samples from provided text to support RL-based agentic RAG training. These operators are primarily categorized into two groups: **Data Generation Operators (Generators)** and **Processing Operators (Processors)**.
+AgenticRAG Operators are a specialized suite of tools designed for agentic RAG (Retrieval-Augmented Generation) tasks, with a particular focus on generating question-and-answer (QA) samples from provided text to support RL-based agentic RAG training. These operators are primarily categorized into two groups: **Data Generation Operators (Generators)** and **Evaluating Operators (Evaluators)**.

 - 🚀 **Independent Innovation**: Core algorithms developed from scratch, filling existing algorithmic gaps or further improving performance, breaking through current performance bottlenecks.
 - ✨ **Open Source First**: First integration of this operator into mainstream community frameworks, facilitating use by more developers and achieving open-source sharing.
@@ -28,31 +28,19 @@ Data Generation Operators are responsible for producing RAG-related RL training
 </thead>
 <tbody>
 <tr>
-<td class="tg-0pky">AutoPromptGenerator🚀</td>
-<td class="tg-0pky">Prompt Synthesis</td>
-<td class="tg-0pky">Generates prompts for question and answer creation tailored to specific content by leveraging large language models.</td>
<td class="tg-0pky">Generates high-quality questions and verifiable answers based on the given text content.</td>
 <td class="tg-0pky">Refined and improved from <a href="https://github.com/OPPO-PersonalAI/TaskCraft" target="_blank">https://github.com/OPPO-PersonalAI/TaskCraft</a></td>
 </tr>
 <tr>
-<td class="tg-0pky">QAGenerator✨</td>
-<td class="tg-0pky">Question and Answer Generation</td>
-<td class="tg-0pky">Produces questions and answers for given text content using large language models and generated prompts.</td>
<td class="tg-0pky">Combines multiple QA pairs to generate new, more difficult QA pairs.</td>
 <td class="tg-0pky">Refined and improved from <a href="https://github.com/OPPO-PersonalAI/TaskCraft" target="_blank">https://github.com/OPPO-PersonalAI/TaskCraft</a></td>
<td class="tg-0pky">Expands individual QA pairs into new, more challenging QA pairs.</td>
 <td class="tg-0pky">Refined and improved from <a href="https://github.com/OPPO-PersonalAI/TaskCraft" target="_blank">https://github.com/OPPO-PersonalAI/TaskCraft</a></td>
@@ -75,52 +63,22 @@ Data evaluation operators are responsible for assessing reinforcement learning t
 </thead>
 <tbody>
 <tr>
-<td class="tg-0pky">QAScorer✨</td>
-<td class="tg-0pky">QA Scoring</td>
-<td class="tg-0pky">Evaluates the quality of questions, answer consistency, answer verifiability, and downstream utility for QA pairs and their related content.</td>
<td class="tg-0pky">Assesses the verifiability of answers with and without the presence of gold documents in QA tasks.</td>
 <td class="tg-0pky">-</td>
 </tr>
 </tbody>
 </table>

-
-## Processing Operators
-
-Processing Operators are mainly tasked with choosing suitable data.
-
-<table class="tg">
-<thead>
-<tr>
-<th class="tg-0pky">Name</th>
-<th class="tg-0pky">Application Type</th>
-<th class="tg-0pky">Description</th>
-<th class="tg-0pky">Official Repository or Paper</th>
-</tr>
-</thead>
-<tbody>
-<tr>
-<td class="tg-0pky">ContentChooser🚀</td>
-<td class="tg-0pky">Content chooser</td>
-<td class="tg-0pky">Selects a subset of content from a larger collection for further processing within the pipeline.</td>
-<td class="tg-0pky">-</td>
-</tr>
-</tbody>
-</table>
-
 ## Operator Interface Usage Instructions

 Specifically, for operators that specify storage paths or call models, we provide encapsulated **model interfaces** and **storage object interfaces**. You can predefine model API parameters for operators in the following way:

 ```python
 from dataflow.llmserving import APILLMServing_request

-api_llm_serving = APILLMServing_request(
+llm_serving = APILLMServing_request(
     api_url="your_api_url",
     model_name="model_name",
     max_workers=5
@@ -140,44 +98,15 @@ from dataflow.utils.storage import FileStorage
 )
 ```

-The `api_llm_serving` and `self.storage` used in the following text are the interface objects defined here. Complete usage examples can be found in `test/test_agentic_rag.py`.
+The `llm_serving` and `self.storage` used in the following text are the interface objects defined here. Complete usage examples can be found in `DataFlow/dataflow/statics/pipelines/api_pipelines/agentic_rag_pipeline.py`.

 For parameter passing, an operator's constructor mainly receives configuration, so an operator can be configured once and called multiple times, while its `X.run()` method receives the `key` arguments that name the input and output fields. See the operator descriptions below for details.
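The split between constructor configuration and `run()` IO keys can be sketched as follows. This is a minimal, self-contained illustration, not DataFlow code. `InMemoryStorage`, `UppercaseOperator`, and their `read`/`write` methods are hypothetical stand-ins, and the real `FileStorage` and operator classes may expose different methods.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class InMemoryStorage:
    """Hypothetical stand-in for a storage object (not the real FileStorage API)."""
    rows: List[Dict[str, str]] = field(default_factory=list)

    def read(self) -> List[Dict[str, str]]:
        # Return copies so an operator cannot mutate stored rows in place.
        return [dict(row) for row in self.rows]

    def write(self, rows: List[Dict[str, str]]) -> None:
        self.rows = rows


class UppercaseOperator:
    """Hypothetical operator: configuration in __init__, IO keys in run()."""

    def __init__(self, suffix: str = "!"):
        # Configuration: set once, reused by every run() call.
        self.suffix = suffix

    def run(self, storage: InMemoryStorage,
            input_key: str = "text", output_key: str = "upper_text") -> None:
        # IO: the keys name which field to read and which field to write.
        rows = storage.read()
        for row in rows:
            row[output_key] = row[input_key].upper() + self.suffix
        storage.write(rows)


op = UppercaseOperator(suffix="?")
store = InMemoryStorage(rows=[{"text": "hello", "title": "doc"}])
op.run(store)  # default keys: "text" -> "upper_text"
op.run(store, input_key="title", output_key="upper_title")  # same operator, new keys
print(store.rows)
# [{'text': 'hello', 'title': 'doc', 'upper_text': 'HELLO?', 'upper_title': 'DOC?'}]
```

Because the operator instance holds only configuration, the second call reuses it unchanged and only swaps the field names passed to `run()`.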

 ## Detailed Operator Descriptions

 ### Data Generation Operators

-#### 1. AutoPromptGenerator
-
-**Function Description:** This operator is specifically designed to generate specialized prompts for creating question-and-answer pairs based on given text content.
-
-**Input Parameters:**
-
-- `__init__()`
-  - `llm_serving`: Large language model interface object to use (default: predefined value above)
-- `run()`
-  - `storage`: Storage interface object (default: predefined value above)
-  - `input_key`: Input text content field name (default: "text")
-  - `output_key`: Output generated prompt field name (default: "generated_prompt")
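The input/output contract listed above (read the `text` field, write one `generated_prompt` per row) can be sketched as follows. This is a sketch under stated assumptions. The hypothetical `generate(prompts) -> list[str]` callable stands in for `llm_serving`, whose actual call method is not documented here. `META_PROMPT` is a made-up template, not the operator's real prompt.

```python
from typing import Callable, Dict, List

# Hypothetical meta-prompt; the operator's real template is not documented here.
META_PROMPT = (
    "Read the text below and write a prompt that asks a model to create one "
    "question-and-answer pair grounded in it.\n\nText:\n{text}"
)


def auto_prompt_generate(
    rows: List[Dict[str, str]],
    generate: Callable[[List[str]], List[str]],
    input_key: str = "text",
    output_key: str = "generated_prompt",
) -> List[Dict[str, str]]:
    """Sketch: build one meta-prompt per row, send them as one batch,
    and store each returned prompt under output_key."""
    meta_prompts = [META_PROMPT.format(text=row[input_key]) for row in rows]
    outputs = generate(meta_prompts)
    if len(outputs) != len(rows):
        raise ValueError("generate() must return one output per prompt")
    return [{**row, output_key: out} for row, out in zip(rows, outputs)]


def fake_generate(prompts: List[str]) -> List[str]:
    # Offline stub standing in for a real LLM backend.
    return [f"PROMPT #{i}" for i in range(len(prompts))]


rows = [{"text": "Paris is the capital of France."},
        {"text": "Water boils at 100 °C at sea level."}]
result = auto_prompt_generate(rows, fake_generate)
print([r["generated_prompt"] for r in result])  # ['PROMPT #0', 'PROMPT #1']
```

Passing the LLM call in as a parameter keeps the sketch runnable offline. A real backend would replace `fake_generate`.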
**Function Description:** This operator generates multiple evaluation scores for the produced question-and-answer pairs.
-
-**Input Parameters:**
-
-- `__init__()`
-  - `llm_serving`: Large language model interface object to use (default: predefined value above)
-- `run()`
-  - `storage`: Storage interface object (default: predefined value above)
-  - `input_question_key`: Input text content field name containing the generated questions (default: "generated_question")
-  - `input_answer_key`: Input text content field name containing the generated answers (default: "generated_answer")
-  - `output_question_quality_key`: Output field name for question quality grades (default: "question_quality_grades")
-  - `output_question_quality_feedback_key`: Output field name for detailed feedback on question quality (default: "question_quality_feedbacks")
-  - `output_answer_alignment_key`: Output field name for answer alignment grades (default: "answer_alignment_grades")
-  - `output_answer_alignment_feedback_key`: Output field name for detailed feedback on answer alignment (default: "answer_alignment_feedbacks")
-  - `output_answer_verifiability_key`: Output field name for answer verifiability grades (default: "answer_verifiability_grades")
-  - `output_answer_verifiability_feedback_key`: Output field name for detailed feedback on answer verifiability (default: "answer_verifiability_feedbacks")
-  - `output_downstream_value_key`: Output field name for downstream value grades (default: "downstream_value_grades")
-  - `output_downstream_value_feedback_key`: Output field name for detailed feedback on downstream value (default: "downstream_value_feedbacks")
-
-**Key Features:**
-
-- Generates multiple useful scores for further filtering
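The scorer writes one grade field per criterion, so a downstream filter can keep only the rows that clear a threshold on every criterion. This sketch uses the default output keys listed above. The numeric grade scale, the threshold of 4, and the `filter_by_grades` helper are assumptions for illustration, not documented behavior.

```python
from typing import Dict, Iterable, List

# Default grade fields written by the scorer (feedback fields omitted).
GRADE_KEYS = (
    "question_quality_grades",
    "answer_alignment_grades",
    "answer_verifiability_grades",
    "downstream_value_grades",
)


def filter_by_grades(rows: List[Dict], min_grade: float = 4.0,
                     keys: Iterable[str] = GRADE_KEYS) -> List[Dict]:
    """Hypothetical helper: keep rows whose grade is >= min_grade on every key.
    The 4.0 default and a numeric grade scale are assumptions."""
    keys = tuple(keys)  # materialize so a generator is not exhausted after one row
    return [row for row in rows if all(float(row[k]) >= min_grade for k in keys)]


rows = [
    {"generated_question": "Q1", "question_quality_grades": 5, "answer_alignment_grades": 4,
     "answer_verifiability_grades": 5, "downstream_value_grades": 4},
    {"generated_question": "Q2", "question_quality_grades": 5, "answer_alignment_grades": 2,
     "answer_verifiability_grades": 5, "downstream_value_grades": 5},
]
kept = filter_by_grades(rows)
print([row["generated_question"] for row in kept])  # ['Q1']
```

Requiring every criterion to pass means a single weak grade (Q2's answer alignment) is enough to drop a pair. A weighted average would be a looser alternative.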