A production-grade Retrieval-Augmented Generation system for querying personal study notes and documents. Built to support exam preparation by enabling natural language search across multiple document sources.
Most students accumulate notes across PDFs, text files, and web resources. Finding specific information across all of them during exam prep is slow and frustrating. This system lets you ask questions in plain English and get grounded, cited answers from your own material.
```
Documents (PDF / TXT / URL)
        ↓
Load & Parse
        ↓
Recursive Chunking (chunk_size=500, overlap=50)
        ↓
Embedding (all-MiniLM-L6-v2, 384 dimensions)
        ↓
ChromaDB Vector Store (persistent)
        ↓
User Query -> Embed -> MMR Similarity Search (k=4)
        ↓
Retrieved Chunks + Prompt V2
        ↓
Llama 3.1 70B via Groq
        ↓
Grounded Answer with Source Citations
```
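To make the chunking step concrete, here is a minimal, dependency-free sketch of sliding-window chunking with overlap. It is a simplification: the actual system uses LangChain's `RecursiveCharacterTextSplitter`, which additionally tries to split on paragraph and sentence boundaries rather than at fixed offsets.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Simplified fixed-window chunker with overlap.

    The real pipeline uses LangChain's RecursiveCharacterTextSplitter,
    which also respects paragraph/sentence boundaries; this sketch only
    illustrates the size/overlap mechanics.
    """
    chunks = []
    step = chunk_size - overlap  # each window starts 450 chars after the last
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks
```

The 50-character overlap means the tail of each chunk is repeated at the head of the next, so a sentence straddling a chunk boundary still appears intact in at least one chunk.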
| Component | Technology |
|---|---|
| Embedding Model | all-MiniLM-L6-v2 (sentence-transformers, local) |
| Vector Database | ChromaDB (persistent, local) |
| LLM | Llama 3.1 70B via Groq API |
| RAG Framework | LangChain |
| Backend | FastAPI |
| Frontend | Streamlit |
| Containerization | Docker |
Systematic hyperparameter search across 12 configurations (k in {2, 4, 8}, chunk_size in {200, 300, 400, 500}) using a custom LLM-as-judge evaluation framework with four metrics:
- Faithfulness: is every claim in the answer grounded in retrieved context?
- Answer Relevancy: does the answer address the question asked?
- Context Recall: did retrieval find the chunks needed to answer?
- Completeness: did the system actually answer or deflect?
| Metric | Score |
|---|---|
| Faithfulness | 0.9222 |
| Answer Relevancy | 0.9778 |
| Context Recall | 0.7818 |
| Completeness | 0.7273 |
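The scores above are means over the test set. A minimal sketch of the aggregation step in the evaluation harness (function and field names are illustrative, not the project's actual API):

```python
from statistics import mean

# The four judge metrics reported in the results table.
METRICS = ("faithfulness", "answer_relevancy", "context_recall", "completeness")

def aggregate_scores(per_question: list[dict[str, float]]) -> dict[str, float]:
    """Average each judge metric across all test questions.

    Each element of per_question holds one question's judge scores;
    the reported numbers are these means over the 11-question set.
    """
    return {m: round(mean(q[m] for q in per_question), 4) for m in METRICS}
```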
| Technique | Completeness | Verdict |
|---|---|---|
| Baseline | 0.6818 | Starting point |
| + Query Rewriting | 0.5545 | Dropped |
| + MMR Retrieval | 0.7273 | Kept |
| + Prompt V2 | 0.7727 | Best overall |
| + Hybrid Search (0.5/0.5) | 0.7273 | Dropped |
| + Hybrid Search (0.7/0.3) | 0.7273 | No improvement |
Key findings:
- Larger chunks (500) consistently outperformed smaller chunks across all k values, preserving semantic context better for these document types.
- k=4 provided sufficient retrieval coverage without introducing noise.
- Query rewriting hurt completeness on factual Q&A: raw questions are already precise search queries in this domain.
- MMR improved completeness by reducing redundant chunk retrieval.
- Hybrid search added noise for this dataset and was rejected.
- LLM-as-judge evaluation requires a completeness metric to avoid inflated scores from "I don't know" responses being rated as faithful.
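The MMR step can be sketched in a few lines of plain Python. This is the standard greedy maximal-marginal-relevance selection (LangChain implements it internally via `search_type="mmr"`); the function names here are illustrative:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

def mmr_select(query_vec, doc_vecs, k=4, lambda_mult=0.5):
    """Greedy MMR: trade query relevance against similarity to chunks
    already selected, which suppresses redundant retrieval."""
    selected: list[int] = []
    candidates = list(range(len(doc_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(query_vec, doc_vecs[i])
            redundancy = max(
                (cosine(doc_vecs[i], doc_vecs[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected
```

With a duplicate of the top chunk in the candidate pool, plain top-k would return the duplicate; MMR skips it in favor of a less similar chunk, which is why it improved completeness here.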
```
rag-project/
├── src/
│   ├── document_loader.py    # PDF, TXT, URL loading + chunking
│   ├── embeddings.py         # HuggingFace embedding model
│   ├── vector_store.py       # ChromaDB operations (add, search, clear)
│   └── rag_chain.py          # LangChain RAG chains + prompt variants
├── api/
│   └── app.py                # FastAPI backend
├── app/
│   └── streamlit_app.py      # Streamlit frontend
├── evaluation/
│   ├── evaluate.py           # LLM-as-judge evaluation harness
│   ├── test_set.json         # 11 manually curated test questions
│   └── results_*.json        # All experiment results
├── data/                     # User documents (gitignored)
├── .chroma/                  # ChromaDB persistent store (gitignored)
├── .env                      # API keys (gitignored)
├── Dockerfile
├── docker-compose.yml
└── requirements.txt
```
Prerequisites: Python 3.10+, Groq API key (free at console.groq.com)
```shell
git clone https://github.com/yourusername/rag-project
cd rag-project
python -m venv venv
venv\Scripts\activate       # Windows
source venv/bin/activate    # Mac/Linux
pip install -r requirements.txt
echo "GROQ_API_KEY=your_key_here" > .env
streamlit run app/streamlit_app.py
```

Docker:

```shell
docker-compose up --build
```

- Upload PDF or TXT files, or paste a URL in the sidebar
- Wait for documents to be processed and indexed
- Ask questions in plain English
- Answers include source citations showing which document was used
- Use "Clear Session" to start fresh with new documents
| Type | Notes |
|---|---|
| PDF | Text-based PDFs only; scanned PDFs require OCR preprocessing |
| TXT | Plain text files, UTF-8 encoded |
| URL | Web pages parsed and cleaned automatically |
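The routing between these three source types can be sketched as a simple dispatch on the source string. This is illustrative only; in the project, `src/document_loader.py` wires each branch to the corresponding LangChain document loader:

```python
from pathlib import Path

def pick_loader(source: str) -> str:
    """Classify a source as 'url', 'pdf', or 'txt'.

    Hypothetical helper mirroring the dispatch that document_loader.py
    would perform before handing off to a concrete loader.
    """
    if source.startswith(("http://", "https://")):
        return "url"
    suffix = Path(source).suffix.lower()
    if suffix == ".pdf":
        return "pdf"
    if suffix == ".txt":
        return "txt"
    raise ValueError(f"Unsupported source type: {source}")
```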
- Scanned and handwritten PDFs are not supported without OCR preprocessing
- Context recall drops for very specific factual queries due to embedding model limitations
- Groq free tier has rate limits; sustained heavy usage may hit them
- Clearing session removes all indexed documents from ChromaDB
- Proper RAG evaluation requires more than eyeballing answers. LLM-as-judge needs a completeness metric to avoid rewarding "I don't know" responses
- Chunk size affects answer quality more than retrieval depth for this document type
- Query rewriting helps for conversational queries but hurts precise factual Q&A
- PDF parsing quality directly impacts embedding quality and downstream retrieval
- MMR retrieval consistently improves completeness by reducing redundant chunk selection
- Agentic extension: ReAct agent that decides between local retrieval and web search
- Web search integration via Tavily API
- OCR support for scanned PDFs and handwritten notes
- Conversation memory for follow-up questions
- Multi-user support with separate ChromaDB collections
[Energy Consumption Forecasting](github link): LSTM time series forecasting with FastAPI and Docker