A modern, lean web application implementing Retrieval-Augmented Generation (RAG) with knowledge graph-based explanations. This system allows users to upload documents, ask questions, and receive grounded answers with visual knowledge graphs and entity relationships.
- Document Upload: Support for PDF, TXT, and Markdown files
- Smart Retrieval: Vector-based semantic search using embeddings
- Entity Extraction: Automatic entity recognition using NER
- Knowledge Graphs: Visual representation of entity relationships
- AI-Powered Answers: LLM integration for generating grounded responses
- Explainability: Complete traceability of answers to source documents
- Modern UI: Responsive React-based interface with interactive graph visualization
Frontend (React + Vite)
↓
API Layer (FastAPI)
↓
Backend Pipeline:
- Document Preprocessing (chunking, embedding)
- Retrieval (FAISS vector search)
- Entity Extraction (spaCy NER)
- Graph Construction (NetworkX)
- Answer Generation (OpenAI/HF LLM)
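To make the retrieval stage concrete, here is a minimal sketch of the embed-and-search step using SentenceTransformers and FAISS, the same libraries listed below. The variable names are illustrative, not the actual module API; the real logic lives in backend/app/modules/retrieval.py:

```python
# Minimal embed-and-retrieve sketch (illustrative only, not the real module API).
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # default embedding model (see Configuration)

chunks = ["GPT-4 was developed by OpenAI.", "FAISS enables fast vector search."]
embeddings = model.encode(chunks, normalize_embeddings=True)

index = faiss.IndexFlatIP(embeddings.shape[1])  # inner product = cosine on normalized vectors
index.add(np.asarray(embeddings, dtype="float32"))

query = model.encode(["Who developed GPT-4?"], normalize_embeddings=True)
scores, ids = index.search(np.asarray(query, dtype="float32"), 1)  # top-1 match
print(chunks[ids[0][0]], float(scores[0][0]))
```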
- Framework: FastAPI 0.110+
- Language: Python 3.12+
- Key Libraries:
- SentenceTransformers (embeddings)
- FAISS (vector search)
- spaCy (NER)
- NetworkX (knowledge graphs)
- OpenAI SDK (LLM)
- PyMuPDF (PDF parsing)
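Similarly, the entity-extraction and graph-construction stages can be sketched with spaCy and NetworkX. This is a simplified illustration using a naive sentence co-occurrence heuristic, not the actual entity_extraction.py/graph_builder.py logic:

```python
# Sketch: extract entities with spaCy and link them in a NetworkX graph.
import spacy
import networkx as nx

nlp = spacy.load("en_core_web_sm")  # model installed during backend setup
doc = nlp("GPT-4 was developed by OpenAI in San Francisco.")

graph = nx.DiGraph()
for ent in doc.ents:
    graph.add_node(ent.text, type=ent.label_)

# Naive relation heuristic: connect entities that share a sentence.
for sent in doc.sents:
    ents = list(sent.ents)
    for a, b in zip(ents, ents[1:]):
        graph.add_edge(a.text, b.text, relation="co-occurs")

print(list(graph.nodes(data=True)))
print(list(graph.edges(data=True)))
```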
- Framework: React 18+
- Build Tool: Vite
- Styling: Tailwind CSS
- Graph Viz: Cytoscape.js
- State: Zustand
- Containerization: Docker & Docker Compose
- Python Env: Poetry/Pipenv
- Docker & Docker Compose (recommended)
- Python 3.12+ (for local development)
- Node.js 20+ (for frontend development)
- OpenAI API key (optional, for LLM integration)
# Clone/navigate to project
cd Dataforge
# Copy environment template
cp .env.example .env
# Add your OpenAI API key
# OPENAI_API_KEY=sk-your-key-here
# Start both services
docker-compose up
# Access the app
# Frontend: http://localhost:3000
# API: http://localhost:8000
# API Docs: http://localhost:8000/docs

Backend setup (local development):

cd backend
# Create virtual environment
python -m venv venv
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Download spaCy model
python -m spacy download en_core_web_sm
# Run backend
uvicorn app.main:app --reload --port 8000

Frontend setup:

cd frontend
# Install dependencies
npm install
# Start dev server
npm run dev
# Runs on http://localhost:5173

POST /upload
Upload and index documents.
Request:
curl -X POST -F "files=@document.pdf" http://localhost:8000/upload

Response:
{
"status": "success",
"message": "Successfully processed 5 chunks from 1 files",
"index_id": "550e8400-e29b-41d4-a716-446655440000",
"chunks_count": 5
}
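The same upload from Python might look like this (a sketch using requests; the multipart field name follows the curl example above and may differ in the actual API):

```python
# Upload a document and keep the returned index_id for querying.
import requests

with open("document.pdf", "rb") as f:
    resp = requests.post("http://localhost:8000/upload", files={"files": f})
resp.raise_for_status()
index_id = resp.json()["index_id"]
print(index_id)
```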
POST /query
Submit a query and get answers with explanations.

Request:
curl -X POST -H "Content-Type: application/json" \
-d '{"query": "Who developed GPT-4?", "index_id": "550e8400..."}' \
  http://localhost:8000/query

Response:
{
"answer": "GPT-4 was developed by OpenAI.",
"entities": [
{"name": "GPT-4", "type": "PRODUCT", "source_chunk_id": 0},
{"name": "OpenAI", "type": "ORG", "source_chunk_id": 0}
],
"relationships": [
{"from_entity": "OpenAI", "to_entity": "GPT-4", "relation": "developed"}
],
"graph_data": {
"nodes": [...],
"edges": [...]
},
"snippets": ["GPT-4 was developed by OpenAI..."],
"status": "success"
}
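And the corresponding query from Python, walking the fields shown in the response above (a sketch; reuses the index_id captured at upload time):

```python
# Query the index and print the answer plus its entity relationships.
import requests

resp = requests.post(
    "http://localhost:8000/query",
    json={"query": "Who developed GPT-4?", "index_id": index_id},
)
data = resp.json()
print(data["answer"])
for rel in data["relationships"]:
    print(f'{rel["from_entity"]} -[{rel["relation"]}]-> {rel["to_entity"]}')
```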
GET /status
Health check.

curl http://localhost:8000/status

POST /clear
Clear a session.
curl -X POST "http://localhost:8000/clear?index_id=550e8400..."

Project structure:

Dataforge/
├── backend/
│   ├── app/
│   │   ├── main.py                    # FastAPI application
│   │   ├── models/
│   │   │   └── schemas.py             # Pydantic models
│   │   └── modules/
│   │       ├── preprocessing.py       # Document processing
│   │       ├── retrieval.py           # Vector search
│   │       ├── entity_extraction.py   # NER
│   │       ├── graph_builder.py       # Knowledge graphs
│   │       └── answer_generator.py    # LLM integration
│   ├── requirements.txt
│   └── .gitignore
├── frontend/
│   ├── src/
│   │   ├── components/                # React components
│   │   ├── store/                     # Zustand store
│   │   ├── services/                  # API client
│   │   ├── App.jsx
│   │   └── main.jsx
│   ├── index.html
│   ├── package.json
│   ├── vite.config.js
│   ├── tailwind.config.js
│   └── .gitignore
├── Dockerfile.backend
├── Dockerfile.frontend
├── docker-compose.yml
├── .env.example
├── .gitignore
└── README.md
Create .env file:
OPENAI_API_KEY=sk-your-api-key

In backend/app/main.py:
- Embedding model: all-MiniLM-L6-v2
- Retrieval top-k: 5 (configurable per query)
- Chunk size: 300 words
- Chunk overlap: 50 words
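The chunk settings above imply a simple sliding-window splitter. A minimal sketch (illustrative, not the actual preprocessing.py code):

```python
# Sliding-window chunker matching the defaults above:
# 300-word chunks that overlap by 50 words.
def chunk_words(text: str, size: int = 300, overlap: int = 50) -> list[str]:
    words = text.split()
    if not words:
        return []
    step = size - overlap
    return [
        " ".join(words[i:i + size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]
```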
- Upload PDF documents
- Receive index_id from response
- Submit query: "What are the main topics?"
- Receive answer with:
  - Generated response
  - Extracted entities (people, organizations, locations)
  - Knowledge graph showing relationships
  - Source snippets
Click on entities in the knowledge graph to see:
- Source chunks where entity was found
- Related entities
- Relationships and how they were inferred
- No Persistent Storage: All data processed in-memory per session
- CORS Protection: Configured for localhost (customize for production)
- Input Validation: Pydantic models validate all inputs
- Session Isolation: Each upload creates an isolated session
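The in-memory, per-session model noted above is simple enough to sketch (hypothetical names; the real app keeps equivalent state inside the FastAPI process):

```python
# Illustrative per-session registry keyed by index_id (hypothetical names).
import uuid

SESSIONS: dict[str, dict] = {}

def create_session(chunks: list[str]) -> str:
    index_id = str(uuid.uuid4())
    SESSIONS[index_id] = {"chunks": chunks}  # plus embeddings, graph, etc.
    return index_id

def clear_session(index_id: str) -> None:
    SESSIONS.pop(index_id, None)  # roughly what POST /clear does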
- Embedding: ~100ms per 300-word chunk
- Retrieval: ~50ms for FAISS search
- Entity Extraction: ~200ms per chunk
- Answer Generation: ~2-5s (API dependent)
- Total Query Latency: ~3-10s
- Memory: In-memory storage limits corpus size (~1GB RAM for 100k chunks)
- API Costs: OpenAI API usage charges per request
- Graph Complexity: Large graphs may slow visualization
- Languages: Currently optimized for English
For production:
- Add Database: PostgreSQL for persistent storage
- Queue System: Celery for async processing
- Caching: Redis for embedding cache
- Load Balancing: Nginx for multiple backend instances
- User Auth: JWT for session management
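The Redis embedding cache suggested above could be as simple as keying vectors by a hash of the chunk text (a sketch, assuming embed_fn returns a plain list of floats):

```python
# Sketch: cache embeddings in Redis keyed by a hash of the chunk text.
import hashlib
import json
import redis

r = redis.Redis()

def cached_embedding(chunk: str, embed_fn) -> list[float]:
    key = "emb:" + hashlib.sha256(chunk.encode()).hexdigest()
    hit = r.get(key)
    if hit is not None:
        return json.loads(hit)       # cache hit: skip the model call
    vec = embed_fn(chunk)            # cache miss: embed and store
    r.set(key, json.dumps(vec))
    return vec
```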
Backend tests:

cd backend
pytest

Frontend tests:

cd frontend
npm test

Backend logs available via:
- Console output
- API endpoint (implement /logs if needed)
- Create feature branch
- Follow code style (Black for Python, Prettier for JS)
- Add tests
- Submit pull request
MIT License - See LICENSE file
- Check Python version: python --version (need 3.12+)
- Verify spaCy model: python -m spacy download en_core_web_sm
- Check that port 8000 is available
- Ensure backend is running: http://localhost:8000/status
- Check CORS settings in backend/app/main.py (see the sketch below)
- Verify API URL in frontend: frontend/src/services/api.js
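A localhost CORS setup in FastAPI typically looks like the following (a sketch; the actual origins in main.py may differ):

```python
# Sketch of the CORS middleware wiring in backend/app/main.py.
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:3000", "http://localhost:5173"],  # dev origins
    allow_methods=["*"],
    allow_headers=["*"],
)
```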
- Reduce chunk size in backend/app/modules/preprocessing.py
- Clear old sessions via the /clear endpoint
- Limit uploaded file size
For issues, feature requests, or questions:
- Check documentation
- Review API docs at http://localhost:8000/docs
- Inspect browser console for frontend errors
- Check backend logs
- Multi-language support
- Advanced graph algorithms
- User authentication
- Result caching
- Graph export (SVG/PNG)
- Advanced filtering
- Bulk operations
- Real-time collaborative sessions
Version: 1.0.0
Last Updated: January 2026
Status: Production Ready