# Design Proposal - Embedding Ingestion Pipeline And RAG-Based Chat

**TODOs**

* Vector store authentication options.
* Document versioning and data update policies.
* Unify prompt management in InstructLab. See (`chat_template` [configuration][chat_template] and

**Version**: 0.1
**Options to Rebuild Excalidraw Diagrams**
* Using this [shareable link][shareable-excalidraw]
* Importing the scene from the exported [DSL](./images/rag-ingestion-and-chat.excalidraw)
## 1. Introduction
This document proposes enhancements to the `ilab` CLI to support workflows utilizing Retrieval-Augmented Generation
(RAG) artifacts within `InstructLab`. The proposed changes introduce new commands and options for the embedding ingestion
and RAG-based chat pipelines:
* A new `ilab data` sub-command to process customer documentation.
  * Either from knowledge taxonomy or from actual user documents.
* A new `ilab data` sub-command to generate and ingest embeddings from pre-processed documents into a configured vector store.
* An option to enhance the chat pipeline by using the stored embeddings to augment the context of conversations, improving relevance and accuracy.
### 1.1 User Experience Overview
The commands are tailored to support diverse user experiences, all enabling the use of RAG functionality to enrich chat sessions.
### 1.2 Model Training Path
This flow is designed for users who aim to train their own models and leverage the source documents that support knowledge submissions to enhance the chat context:

**Note**: Documents are processed using the `instructlab-sdg` package and are defined using the docling v1 schema.
### 1.3 Taxonomy Path (no Training)
This flow is for users who have defined taxonomy knowledge but prefer not to train their own models. Instead, they aim to generate RAG artifacts from source documents to enhance the chat context:

* In particular, only the latest folder whose name starts with `documents-` will be explored.
* It must include a subfolder `docling-artifacts` with the actual JSON files.
* In case the */path/to/processed/folder* parameter is provided, it is used to look up the processed documents to ingest.
**Notes**:
* To ensure consistency and avoid issues with document versioning or outdated embeddings, the ingested collection will be cleared before execution.
  This ensures it contains only the embeddings generated from the most recent run.
### Ingestion-Why We Need It
To populate embedding vector stores with pre-processed information that can be used at chat inference time.

#### Ingestion-Supported Databases

The command may support various vector database types. A default configuration will align with the selected
InstructLab technology stack.

#### Ingestion-Usage

The generated embeddings can later be retrieved from a vector database and converted to text, enriching the
context for RAG-based chat pipelines.
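
For illustration, here is a minimal sketch of the split-embed-write sequence behind such an ingestion step, assuming the Haystack 2.x API referenced in this proposal; the in-memory store and the embedding model name are placeholders, not `ilab` defaults:

```python
# Illustrative only: split, embed, and write pre-processed text with Haystack 2.x.
# The in-memory store and the model name are placeholders, not ilab defaults.
from haystack import Document
from haystack.components.embedders import SentenceTransformersDocumentEmbedder
from haystack.components.preprocessors import DocumentSplitter
from haystack.document_stores.in_memory import InMemoryDocumentStore

# 1. Wrap pre-processed (e.g., docling-converted) text as Haystack documents.
docs = [Document(content="Text extracted from a processed user document...")]

# 2. Split into chunks; the interim defaults are shown explicitly (see section 2.9).
splitter = DocumentSplitter(split_by="word", split_length=200, split_overlap=0)
chunks = splitter.run(documents=docs)["documents"]

# 3. Compute one embedding vector per chunk.
embedder = SentenceTransformersDocumentEmbedder(
    model="sentence-transformers/all-MiniLM-L6-v2"  # placeholder model
)
embedder.warm_up()
embedded = embedder.run(documents=chunks)["documents"]

# 4. Persist the embedded chunks in the configured vector store.
store = InMemoryDocumentStore()
store.write_documents(embedded)
```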
### 2.5 Embedding Ingestion Pipeline Options
```bash
% ilab data ingest --help
Usage: ilab data ingest [OPTIONS]
Options:
  ...
```

| Name of the embedding model. | **TBD** | `--retriever-embedder-model-name` | `ILAB_EMBEDDER_MODEL_NAME` |
### 2.6 RAG Chat Pipeline Command
The proposal is to add a `chat.rag.enable` configuration (or the equivalent `--rag` flag) to the `model chat` command, like:
```bash
ilab model chat --rag
```
#### Command Purpose
This command enhances the existing `ilab model chat` functionality by integrating contextual information retrieved from user-provided documents,
enriching the conversational experience with relevant insights.

#### Revised Chat Pipeline
* Start with the user's input, `user_query`.
* Use the given `user_query` to retrieve relevant contextual information from the embedding database (semantic search).
* Append the retrieved context to the original LLM request.
* Send the context-augmented request to the LLM and return the response to the user (a sketch of this loop follows).
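
A minimal, illustrative sketch of that loop; the `retriever` and `llm_client` objects and their methods are hypothetical placeholders for whatever the chosen stack provides, not the actual `ilab` implementation:

```python
# Illustrative sketch of the revised chat pipeline; `retriever` and
# `llm_client` are hypothetical placeholders, not the actual ilab API.

# Abridged from the default template in the next subsection; the exact
# placeholder names are assumptions.
PROMPT_TEMPLATE = """Given the following information, answer the question.
Context:
{context}
Question: {user_query}
Answer:"""


def rag_chat_turn(user_query: str, retriever, llm_client, top_k: int = 10) -> str:
    # 1. Semantic search: retrieve the documents closest to the user query.
    documents = retriever.retrieve(user_query, top_k=top_k)
    # 2. Append the retrieved context to the original LLM request.
    context = "\n".join(doc.content for doc in documents)
    prompt = PROMPT_TEMPLATE.format(context=context, user_query=user_query)
    # 3. Send the context-augmented request and return the response.
    return llm_client.generate(prompt)
```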
#### Prompt Template
A default, non-configurable template is used, with parameters to specify the user query and the context, like:
```text
Given the following information, answer the question.
Context:
...
Answer:
```

Future extensions should align prompt management with the existing InstructLab design.
236
277
237
278
### 2.7 RAG Chat Commands
The `/r` command may be added to the `ilab model chat` command to dynamically toggle the execution of the RAG pipeline.
The current status could be displayed with an additional marker on the chat status bar, as in (top right corner):
```console
>>> /h [RAG][S][default]
╭───────────────────────────────────────────────────────────── system ──────────────────────────────────────────────────────────────╮
...
```
### 2.8 RAG Chat Options
As we stated in [2.1 Working Assumptions](#21-working-assumption), we will introduce new configuration options for the specific `chat` command,
but we'll use flags and environment variables for the options that come from the embedding ingestion pipeline command.
| | Name of the embedding model. | **TBD** | `--retriever-embedder-model-name` | `ILAB_EMBEDDER_MODEL_NAME` |
Equivalent YAML document for the newly proposed options:
```yaml
chat:
  enable: false
  # ...
```
### 2.9 References
* [Haystack-DocumentSplitter](https://github.com/deepset-ai/haystack/blob/f0c3692cf2a86c69de8738d53af925500e8a5126/haystack/components/preprocessors/document_splitter.py#L55) is temporarily adopted with default settings until a splitter based on the [docling chunkers][chunkers] is integrated
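
For reference, a sketch of what "default settings" means for this interim splitter, per the Haystack 2.x signature linked above (subject to change once a docling-based chunker is integrated):

```python
# Interim chunking only: Haystack's DocumentSplitter with its default settings,
# i.e. split_by="word", split_length=200, split_overlap=0.
from haystack import Document
from haystack.components.preprocessors import DocumentSplitter

splitter = DocumentSplitter()
chunks = splitter.run(documents=[Document(content="...")])["documents"]
```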

> **ℹ️ Note:** This stack is still under review. The proposed list represents potential candidates based on the current state of discussions.
The following technologies form the foundation of the proposed solution:
* [Docling](https://github.com/DS4SD/docling): Document processing tool. For more details, refer to William’s blog, [Docling: The missing document processing companion for generative AI](https://www.redhat.com/en/blog/docling-missing-document-processing-companion-generative-ai).
## 3. Design Considerations
* As decided in [PR #165](https://github.com/instructlab/dev-docs/pull/165), functions related to RAG ingestion
  and retrieval are located in the dedicated folder `src/instructlab/rag`.
* The solution must minimize changes to existing modules by importing the required functions from the
  `instructlab.rag` package.
* The solution must adopt a pluggable design to facilitate seamless integration of additional components:
  * **Vector stores**: Support all selected implementations (e.g., Milvus).
  * **Embedding models**: Handle embedding models using the appropriate embedder implementation for the
    chosen framework (e.g., Haystack).
* Consider using factory functions to abstract implementations and enhance code flexibility (see the sketch after this list).
* Optional dependencies for 3rd party integrations should be defined in `pyproject.toml` and documented for
  clarity. Users can install optional components with commands like:
  `pip install instructlab[milvus]`

  3rd party dependencies may also be grouped in files such as `requirements/milvus.txt`.
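
As an illustration of the factory-function idea above, a minimal registry-based sketch (all names are hypothetical, not the final `instructlab.rag` API):

```python
# Hypothetical factory sketch; names are illustrative, not the final API.
from typing import Any, Callable, Dict


class InMemoryStore:
    """Stand-in for a real vector-store adapter (e.g., a Milvus-backed one)."""

    def __init__(self, **options: Any) -> None:
        self.options = options


# Optional 3rd-party adapters can register themselves here when their
# extra dependency (e.g., instructlab[milvus]) is installed.
_STORE_FACTORIES: Dict[str, Callable[..., Any]] = {
    "in_memory": InMemoryStore,
}


def create_document_store(kind: str, **options: Any) -> Any:
    """Return a vector-store adapter for the configured backend."""
    try:
        factory = _STORE_FACTORIES[kind]
    except KeyError:
        raise ValueError(f"unsupported vector store type: {kind!r}") from None
    return factory(**options)


store = create_document_store("in_memory", collection="ilab-docs")
```

This keeps callers decoupled from concrete implementations: adding a new backend means registering one more factory, with no changes at the call sites.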
## 4. Future Enhancements
### 4.1 Model Evaluation
**TODO** A separate ADR will be defined.
### 4.2 Advanced RAG Retrieval Steps
* [Ranking retriever's result][ranking]:
```bash
ilab model chat --rag --ranking --ranking-top-k=5 --ranking-model=cross-encoder/ms-marco-MiniLM-L-12-v2
```
* [Query expansion][expansion]:
```bash
ilab model chat --rag --query-expansion --query-expansion-prompt="$QUERY_EXPANSION_PROMPT" --query-expansion-num-of-queries=5
```

* Using a retrieval strategy:
```bash
ilab model chat --rag --retrieval-strategy query-expansion --retrieval-strategy-options="prompt=$QUERY_EXPANSION_PROMPT;num_of_queries=5"
```
* ...
### 4.3 Containerized Indexing Service
Generate a containerized RAG artifact to expose a `/query` endpoint that can serve as an alternative source:
```bash
ilab data ingest --build-image --image-name=docker.io/user/my_rag_artifacts:1.0
```
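
Once such an artifact exists, clients could query it over HTTP. A hypothetical example follows; the request/response contract and port mapping are assumptions, as the endpoint shape is still TBD:

```python
# Hypothetical client for the generated artifact's /query endpoint.
# The payload shape and the port mapping are illustrative assumptions.
import json
import urllib.request

payload = json.dumps({"query": "What is InstructLab?"}).encode("utf-8")
request = urllib.request.Request(
    "http://localhost:8080/query",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read()))
```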