A powerful application that allows you to chat with your PDF documents using advanced AI technologies. Upload any PDF document and ask questions about its content in natural language.
- PDF Processing: Upload and process PDF documents of any size
- Natural Language Queries: Ask questions about your document in plain English
- Accurate Answers: Get precise answers with relevant context from the document
- Source Transparency: View the exact sections of the document used to generate answers
- Efficient Processing: PDFs are processed only once after upload, not with each query
This application leverages state-of-the-art technologies:
- Streamlit: For the interactive web interface
- LangChain: For orchestrating the RAG workflow
- LangGraph: For creating a structured retrieval and generation pipeline
- HuggingFace Embeddings: Using the "thenlper/gte-large" model for semantic understanding
- Groq LLM: Using Llama 4 models for high-quality answer generation
- PyPDF Loader: For extracting text from PDF documents
- Vector Storage: For efficient semantic search capabilities
The application uses a Retrieval Augmented Generation (RAG) approach:
-
Document Processing:
- When you upload a PDF, the document is processed and converted to vector embeddings
- The embeddings capture the semantic meaning of the document's content
- The processed document is stored in-memory for quick retrieval
-
Query Processing:
- When you ask a question, the system finds the most relevant sections from your document
- These sections are used as context for the AI to generate accurate answers
- The system ranks and sorts results by page number for better context coherence
-
Answer Generation:
- A Llama 4 model from Groq evaluates your question against the retrieved context
- The model generates a precise answer based solely on the document's content
- You can view both the answer and the source context used to generate it
- Python 3.9 or higher
- Groq API key
-
Clone the repository
git clone <repository-url> cd chat-with-document
-
Install the required packages
pip install -r requirements.txt
-
Set up your Groq API key
- Create a
.streamlit/secrets.toml
file with your API key:GROQ_API_KEY = "your-api-key-here"
- Create a
streamlit run app.py
- Upload a PDF: Use the file uploader to select your PDF document
- Ask a Question: Type your question in the text input field
- View the Answer: The system will display the answer based on the document content
- Explore Context: Expand the "Show source context" section to see which parts of the document were used
- The application works best with text-based PDFs; scanned documents may not process correctly
- Very large PDFs may take longer to process initially
- The quality of answers depends on the clarity and structure of the source document
- Support for additional document formats (DOCX, TXT, etc.)
- Multi-document querying
- Persistent storage for processed documents
- Chat history to maintain conversation context
- LangChain for the RAG implementation
- Groq for the powerful LLM API
- HuggingFace for the embedding models
- Streamlit for the web interface framework