Replies: 2 comments
-
🤖 Hi, thanks for reaching out. It seems like you're experiencing an issue where the RAG model is returning answers that are not grounded in the content of your PDFs. This can happen if the model is not configured to consider only the content of your PDFs when generating answers. To address this, you can use the following code to make the system answer only based on the content of your document; if it can't find the corresponding answer in the document, it replies with a fixed message instead. (The import is added here for completeness; the exact path depends on your llama_index version.)

```python
# Import path varies across llama_index releases; this matches the older
# API in which `index.query` accepts these keyword arguments.
from llama_index.optimization.optimizer import SentenceEmbeddingOptimizer

response = index.query(
    "how do i?",
    # Keep only the most relevant sentences within each retrieved chunk.
    optimizer=SentenceEmbeddingOptimizer(percentile_cutoff=0.5),
    service_context=service_context,
    response_mode="compact",
    # Discard retrieved chunks whose similarity score is below 0.8.
    similarity_cutoff=0.8,
)

# If no chunk passes the cutoff, the response stringifies to "None":
# print a fixed message instead of a hallucinated answer.
if str(response) == "None":
    print("Hmm, I don't know enough")
else:
    print(response)
```

This solution was suggested in a similar issue. Additionally, you can use several techniques to configure the RAG to only provide answers that are grounded in one or several PDFs in your collection and to avoid general answers that are not supported by them: decoupling the chunks used for retrieval from the chunks used for synthesis, structured retrieval for larger document sets, dynamically retrieving chunks depending on your task, and optimizing context embeddings. You can find more details about these techniques in the LlamaIndex documentation. Finally, you can also fine-tune the model to better suit your needs.

I hope this helps! Let me know if you have any other questions.
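Newer versions of LlamaIndex route queries through a query engine rather than `index.query`. Below is a minimal sketch of the same cutoff idea with that API, assuming a vector `index` is already built; the import path for `SimilarityPostprocessor` varies by release, so adjust it to yours:

```python
from llama_index.core.postprocessor import SimilarityPostprocessor

# Build a query engine that drops retrieved chunks scoring below 0.8
# before the LLM ever sees them.
query_engine = index.as_query_engine(
    node_postprocessors=[SimilarityPostprocessor(similarity_cutoff=0.8)],
    response_mode="compact",
)

response = query_engine.query("How to prepare pizza?")

# If every chunk was filtered out, nothing grounds the answer, so fall
# back to a fixed message.
if not response.source_nodes:
    print("Sorry, the documents don't contain relevant information.")
else:
    print(response)
```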
This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research. Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant. If you want to continue the conversation, start your reply with @dosu-bot.
-
I tried to set percentile_cutoff and similarity_cutoff. I am still getting irrelevant answers.
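One way to debug this is to print the retrieved chunks and their similarity scores, to see whether the irrelevant chunks really score above the cutoff. A sketch, assuming the query-engine setup from the reply above (`source_nodes` is part of the standard Response object):

```python
response = query_engine.query("How to prepare pizza?")

# Inspect each retrieved chunk and its similarity score to see what
# slips past the cutoff.
for node_with_score in response.source_nodes:
    print(f"score={node_with_score.score:.3f}")
    print(node_with_score.node.get_content()[:200])
```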
-
Question
Hi,
I have 9 PDF files and want the RAG to answer only questions covered by one or several PDFs in the collection.
Surprisingly, I am getting an answer to the question "How to prepare pizza?" even though my collection of PDFs has nothing to do with pizza, and it's not clear why the response's score is above 0.7. How can I rely on it?
What is the way to block such answers that are not grounded in one or several PDFs, and instead return a general reply that the PDFs don't contain relevant information?
Thanks