🧠 DocuRAG

Production-Grade RAG Document Intelligence

DocuRAG lets you upload any document (PDF, Word, TXT, Markdown, HTML) and have a precise, grounded conversation with it. Every answer is backed by citations from your source material — no hallucinations, no general knowledge, strictly your documents.

🔗 Live Demo: docurag-grxzmrohpnhxnd2znmrkqq.streamlit.app

Architecture

User Query
    │
    ▼
ConversationMemory (Summary Buffer)   ←── per-session, isolated history
    │
    ▼
RetrieverFactory
    ├── Stage 1: MultiQueryRetriever  (LLM rewrites query into 3 variants)
    ├── Stage 2: EnsembleRetriever   (Semantic Pinecone + BM25 hybrid search)
    ├── Stage 3: FlashrankReranker   (Cross-encoder selects top-N chunks)
    └── [Cached per user — invalidated on new uploads]
    │
    ▼
ChatPromptTemplate                    ←── system + history + context + question
    │
    ▼
LLM (Groq llama-3.3-70b-versatile)
    │
    ▼
Answer + Source Citations             ←── document name, page number, excerpt

Key Components

File	Responsibility
`src/config.py`	Centralised settings — reads from `.env` locally, Streamlit secrets in production
`src/dataloader.py`	Multi-format document parsing + semantic chunking with metadata
`src/vector_store.py`	Pinecone (production) + ChromaDB (local) with multi-tenant isolation
`src/retriever.py`	4-stage retrieval pipeline: rewrite → hybrid search → rerank (cached per user)
`src/conversation_history.py`	Summary buffer memory — keeps last k messages, summarises older ones
`src/rag_engine.py`	Orchestrates all components into a single clean interface
`app.py`	Streamlit chat UI with document management and source citation panel
`main.py`	FastAPI REST backend
`evaluate.py`	RAGAS evaluation framework — faithfulness, relevancy, precision, recall

Quickstart

1. Clone and install

git clone https://github.com/vic-comm/DocuRAG.git
cd DocuRAG
python -m venv venv
source venv/bin/activate       # Windows: venv\Scripts\activate
pip install -r requirements.txt

2. Configure environment

cp .env.example .env

Open .env and fill in your keys:

# Required
GROQ_API_KEY=your_groq_api_key        # Free at console.groq.com
GOOGLE_API_KEY=your_google_api_key    # For Gemini embeddings

# For production (Pinecone)
PINECONE_API_KEY=your_pinecone_key    # Free tier at pinecone.io
PINECONE_INDEX_NAME=documind
USE_PINECONE=true

3. Run the Streamlit UI

streamlit run app.py

Visit http://localhost:8501, upload a document, and start chatting.

4. Run the API (optional)

python main.py
# Interactive API docs at http://localhost:8000/docs

Tech Stack

Layer	Technology	Purpose
LLM	Groq `llama-3.3-70b-versatile`	Answer generation + query rewriting
Embeddings	Google `gemini-embedding-001` (768-dim)	Semantic vector representation
Vector Store	Pinecone (prod) / ChromaDB (local)	Persistent similarity search
Keyword Search	BM25	Exact match retrieval
Reranking	FlashRank cross-encoder	Precision scoring of retrieved chunks
Orchestration	LangChain	Pipeline chaining and memory
Backend	FastAPI	REST API
Frontend	Streamlit	Chat UI
Evaluation	RAGAS	Retrieval + answer quality metrics

Retrieval Pipeline — Why 4 Stages?

Stage	What it does	Why it matters
Query rewriting	LLM generates 3 query variants	Catches documents using different terminology than the user
Hybrid search	Semantic (Pinecone) + keyword (BM25) via Reciprocal Rank Fusion	BM25 catches exact matches semantic search misses; semantic catches meaning keyword search misses
Cross-encoder reranking	FlashRank scores each chunk against the query jointly	Dramatically improves precision — picks the 5 best from the top 20
Summary buffer memory	Keeps last k messages in full, summarises older ones	Handles long multi-turn conversations without exceeding context window

Evaluation Framework

DocuRAG includes a RAGAS-powered evaluation module that measures retrieval and answer quality scientifically.

# Run evaluation with built-in questions
python evaluate.py --user your_user_id

# Run with your own testset
python evaluate.py --user your_user_id --testset tests/eval_questions.yaml

# Save report to CSV
python evaluate.py --user your_user_id --output reports/eval.csv

Metrics

Metric	What it measures	Target
Faithfulness	Is the answer grounded in retrieved context?	> 0.85
Answer Relevancy	Does the answer address the question?	> 0.80
Context Precision	Are retrieved chunks actually relevant?	> 0.75
Context Recall	Were all relevant chunks retrieved? (needs ground truth)	> 0.70

API Endpoints

Method	Endpoint	Description
`GET`	`/health`	Health check
`POST`	`/ingest`	Upload and index a document (async)
`POST`	`/query`	Ask a question, receive answer + citations
`GET`	`/sources`	List uploaded documents for current user
`DELETE`	`/sources/{name}`	Remove a document from the knowledge base
`POST`	`/session/clear`	Clear conversation history for a session

Example: Query via curl

curl -X POST http://localhost:8000/query \
  -H "Content-Type: application/json" \
  -d '{"question": "What are the main findings?", "session_id": "my-session"}'

Response:

{
  "answer": "The main findings include...",
  "sources": [
    {
      "source": "report.pdf",
      "page": 4,
      "excerpt": "...relevant excerpt from the document..."
    }
  ],
  "session_id": "my-session"
}

Running Tests

pytest tests/ -v

# With coverage report
pytest tests/ --cov=src --cov-report=term-missing

Deployment

Streamlit Cloud (live demo)

The app is deployed at docurag-grxzmrohpnhxnd2znmrkqq.streamlit.app.

To deploy your own instance:

Fork this repo
Go to share.streamlit.io → New app → select your fork
Set app.py as the main file
Add your secrets in Advanced Settings:

GROQ_API_KEY = "..."
GOOGLE_API_KEY = "..."
PINECONE_API_KEY = "..."
PINECONE_INDEX_NAME = "documind"
USE_PINECONE = "true"

What's Next — OmniSupport

DocuRAG is the foundation. The next project, OmniSupport, extends this into a fully production-hardened system:

Async document ingestion via Celery + Redis task queue
PostgreSQL full-text search for persistent, scalable BM25
Fine-tuning pipeline for domain-specific embedding adaptation
Streaming responses via WebSocket
Prometheus + Grafana monitoring with latency percentile tracking
Real JWT authentication (Auth0 / Supabase)
Citation verification — programmatic hallucination detection

Name		Name	Last commit message	Last commit date
Latest commit History 12 Commits
.devcontainer		.devcontainer
src		src
tests		tests
.DS_Store		.DS_Store
.gitignore		.gitignore
.python-version		.python-version
Evaluate.py		Evaluate.py
Makefile		Makefile
Procfile		Procfile
README.md		README.md
app.py		app.py
cli.py		cli.py
demo.py		demo.py
main.py		main.py
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
uv.lock		uv.lock

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

🧠 DocuRAG

Architecture

Key Components

Quickstart

1. Clone and install

2. Configure environment

3. Run the Streamlit UI

4. Run the API (optional)

Tech Stack

Retrieval Pipeline — Why 4 Stages?

Evaluation Framework

Metrics

API Endpoints

Example: Query via curl

Running Tests

Deployment

Streamlit Cloud (live demo)

What's Next — OmniSupport

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

🧠 DocuRAG

Architecture

Key Components

Quickstart

1. Clone and install

2. Configure environment

3. Run the Streamlit UI

4. Run the API (optional)

Tech Stack

Retrieval Pipeline — Why 4 Stages?

Evaluation Framework

Metrics

API Endpoints

Example: Query via curl

Running Tests

Deployment

Streamlit Cloud (live demo)

What's Next — OmniSupport

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages