This project implements a Retrieval-Augmented Generation (RAG) system that combines information retrieval with generative AI to answer queries based on a given dataset.
- PDF Document Processing: Extracts text from multiple PDF files.
- Text Splitting: Divides extracted text into manageable passages.
- Embeddings Generation: Creates vector representations of text passages.
- Efficient Retrieval: Uses FAISS for fast similarity search.
- Generative AI Integration: Leverages Google's Generative AI for answer generation.
- Interactive Query Interface: Allows users to input queries and receive AI-generated answers.
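Passage splitting can be done with the `re` module alone. The sketch below is a minimal illustration, not the project's own splitter. It assumes sentences end in `.`, `!` or `?` followed by whitespace, and it packs whole sentences into passages of roughly `max_words` words. The function name `split_into_passages` and the default limit are hypothetical.

```python
import re


def split_into_passages(text, max_words=100):
    """Pack whole sentences into passages of at most `max_words` words.

    A single sentence longer than `max_words` is kept as its own passage
    rather than being cut mid-sentence.
    """
    # Collapse runs of whitespace (PDF extraction often leaves stray newlines).
    text = re.sub(r"\s+", " ", text).strip()
    if not text:
        return []
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text)
    passages, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current passage if this sentence would push it over the limit.
        if current and count + n > max_words:
            passages.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        passages.append(" ".join(current))
    return passages
```

Keeping sentences whole trades exact passage length for coherent passages. This choice tends to give the embedding model complete thoughts to encode.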
- Python 3.x
- Libraries:
  - `re` (Python standard library): For text preprocessing and splitting.
  - `faiss`: Facebook AI Similarity Search, for efficient similarity search.
  - `PyPDF2`: For reading and extracting text from PDF files.
  - `numpy`: For numerical operations.
  - `google.generativeai`: Google's Generative AI API, for text generation.
  - `sentence_transformers`: For generating text embeddings.
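Several of these import names differ from the pip package names used during installation. `faiss-cpu` provides `faiss`, `google-generativeai` provides `google.generativeai`, and `sentence-transformers` provides `sentence_transformers`. The hypothetical helper below checks which of the listed modules can be imported. It uses only the standard library.

```python
import importlib.util

# Import names (not pip package names) of the libraries listed above.
REQUIRED_MODULES = (
    "re",
    "faiss",                  # provided by faiss-cpu
    "PyPDF2",
    "numpy",
    "google.generativeai",    # provided by google-generativeai
    "sentence_transformers",  # provided by sentence-transformers
)


def check_dependencies(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be imported."""
    status = {}
    for name in modules:
        try:
            status[name] = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for dotted names when the parent package itself is missing.
            status[name] = False
    return status
```

For example, `[m for m, ok in check_dependencies().items() if not ok]` lists the modules that still need installing.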
- RAG Class: The core component that integrates all functionalities.
- Data Loading: Processes PDF files and extracts text.
- Text Processing: Splits text into passages for efficient retrieval.
- Embedding Generation: Uses SentenceTransformer to create vector representations.
- Indexing: Utilizes FAISS for creating a searchable index of embeddings.
- Retrieval: Finds relevant passages based on query similarity.
- Answer Generation: Uses Google's Generative AI to produce answers based on retrieved context.
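In the project, the retrieval step is delegated to FAISS. A typical pattern builds a flat index with `faiss.IndexFlatL2(dim)`, fills it with `index.add(embeddings)`, and queries it with `index.search(query_embeddings, k)`, where both arguments are float32 NumPy arrays. The dependency-free sketch below shows what such an exhaustive flat search computes. It ranks passages by cosine similarity, which equals the inner product on L2-normalized vectors (what `faiss.IndexFlatIP` computes). The metric choice and the names `cosine_similarity` and `top_k_passages` are assumptions, not taken from the project.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_passages(query_vec, passage_vecs, passages, k=3):
    """Return (passage, score) pairs for the k passages most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), i) for i, vec in enumerate(passage_vecs)]
    # Highest similarity first; ties keep the original passage order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(passages[i], score) for score, i in scored[:k]]
```

This brute-force scan compares the query against every passage. FAISS flat indexes do the same exhaustive work in optimized native code, and FAISS also offers approximate indexes for larger collections.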
The system is initialized with the names of the models to use and the path to the PDF data. Users can then enter queries. For each query, the system retrieves the most relevant passages and generates an answer from them.
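The query flow can be sketched as a function that retrieves passages, packs them into a prompt, and hands the prompt to a generator. Both steps are passed in as callables, so the flow can be exercised without FAISS or an API key. The prompt wording and the names `build_prompt` and `answer_query` are hypothetical.

```python
def build_prompt(query, passages):
    """Assemble a context-grounded prompt from retrieved passages (wording is illustrative)."""
    context = "\n\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages))
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )


def answer_query(query, retrieve, generate, k=3):
    """Retrieve k passages for the query, then ask the generator to answer.

    retrieve: callable (query, k) -> list of passage strings
    generate: callable (prompt) -> answer string
    """
    passages = retrieve(query, k)
    prompt = build_prompt(query, passages)
    return generate(prompt)
```

With the `google-generativeai` package, `generate` could wrap a model object, for example `lambda p: genai.GenerativeModel(model_name).generate_content(p).text`. Here `model_name` is whichever model the script is configured with.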
- Install the required libraries: `pip install faiss-cpu PyPDF2 numpy google-generativeai sentence-transformers`
- Set up a Google Generative AI API key.
- Place your PDF documents in the data path specified in the script.
- Run the script and start querying!
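One common way to supply the API key is through an environment variable read at startup. The sketch below assumes a variable named `GOOGLE_API_KEY`, which is an assumption because the README does not name one. It passes the key to `google.generativeai.configure`. The helpers `load_api_key` and `configure_genai` are hypothetical.

```python
import os


def load_api_key(var_name="GOOGLE_API_KEY"):
    """Read the API key from an environment variable (the variable name is an assumption)."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(
            f"Environment variable {var_name} is not set; "
            "export your Google Generative AI API key before running the script."
        )
    return key


def configure_genai(var_name="GOOGLE_API_KEY"):
    """Configure the google.generativeai client with the key from the environment."""
    # Imported lazily so load_api_key() works even without the library installed.
    import google.generativeai as genai

    genai.configure(api_key=load_api_key(var_name))
```

Failing fast with a clear message at startup is easier to diagnose than an authentication error on the first query.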
Note: Ensure you have the necessary permissions and API keys to use Google's Generative AI service.