
Enterprise Knowledge Assistant

A production-ready Enterprise Knowledge Assistant using advanced RAG (Retrieval-Augmented Generation) architecture. The system ingests internal company documents (PDFs, emails, Confluence pages, Google Docs) and provides accurate, cited answers to employee questions.

🏗️ Architecture

Core Workflow

  1. Query Construction: Natural language → optimized database queries
  2. Query Translation: HyDE, multi-query, decomposition techniques
  3. Routing: Determine optimal retrieval path (vector/relational/graph)
  4. Indexing: Semantic chunking, multi-representation indexing
  5. Retrieval: Vector search + re-ranking + active retrieval
  6. Generation: LLM synthesis with Self-RAG capabilities
  7. Feedback Loop: Quality assessment and iterative improvement
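
The workflow above can be sketched as a minimal pipeline skeleton. All function names and the in-memory corpus here are illustrative stand-ins, not the repository's actual modules:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str

def translate_query(question: str) -> list[str]:
    # Step 2 (illustrative): the real system would use HyDE, multi-query,
    # or decomposition via an LLM; here we just add one naive variation.
    return [question, f"In the context of company documents, {question}"]

def retrieve(queries: list[str]) -> list[Chunk]:
    # Steps 3-5 (illustrative): route, search the vector index, re-rank.
    # Stubbed with a static corpus and keyword overlap instead of Qdrant.
    corpus = [Chunk("Expense reports are due monthly.", "policy.pdf")]
    hits = [c for c in corpus for q in queries
            if any(w in c.text.lower() for w in q.lower().split())]
    return hits[:5]

def generate(question: str, chunks: list[Chunk]) -> str:
    # Step 6 (illustrative): synthesize an answer with source citations.
    citations = ", ".join(sorted({c.source for c in chunks}))
    return f"Answer based on: {citations}" if chunks else "No relevant documents found."

def answer(question: str) -> str:
    return generate(question, retrieve(translate_query(question)))
```

The feedback loop (step 7) would wrap `answer` with a quality check and re-retrieval, which is omitted here.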

🛠️ Tech Stack

Backend

  • Framework: FastAPI with Uvicorn ASGI server
  • Runtime: Python 3.12 with UV package manager
  • Data Validation: Pydantic v2 models
  • Database ORM: SQLAlchemy 2.0 (async)
  • Task Queue: Celery with Redis broker
  • Vector DB: Qdrant with HNSW indexing
  • Relational DB: PostgreSQL 15+
  • Cache: Redis

Frontend

  • Framework: Next.js 14 with App Router
  • Styling: Tailwind CSS
  • State Management: Zustand
  • Data Fetching: React Query (TanStack Query)

AI/ML

  • Primary LLM: OpenAI GPT-3.5-Turbo
  • Fallback LLM: GPT-4 for complex queries
  • Embedding Model: SentenceTransformers all-MiniLM-L6-v2 (384-dim)
  • RAG Framework: LangChain for orchestration
  • Observability: LangSmith for tracing/monitoring
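
Vector search over all-MiniLM-L6-v2 embeddings reduces to cosine similarity between 384-dimensional vectors. A stdlib-only toy illustration (real embeddings come from the model; these hand-made 3-dim vectors just show the math):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(a, b) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dim stand-ins for 384-dim sentence embeddings.
query = [1.0, 0.0, 1.0]
doc_a = [0.9, 0.1, 0.8]   # semantically close to the query
doc_b = [0.0, 1.0, 0.0]   # unrelated
assert cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b)
```

Qdrant's HNSW index performs this comparison approximately over millions of vectors instead of exhaustively.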

🚀 Quick Start

Prerequisites

  • Docker and Docker Compose
  • Python 3.12+
  • Node.js 20+
  • UV package manager (pip install uv)
  • OpenAI API key

1. Clone the Repository

git clone <repository-url>
cd Enterprise-RAG-System

2. Environment Configuration

Copy .env.example to .env and configure:

cp .env.example .env

Edit .env with your settings:

OPENAI_API_KEY=sk-your-key-here
LANGSMITH_API_KEY=ls-your-key-here  # Optional
POSTGRES_URL=postgresql+asyncpg://raguser:ragpass@localhost:5432/ragdb
QDRANT_URL=http://localhost:6333
REDIS_URL=redis://localhost:6379

3. Start Services with Docker Compose

docker-compose up -d

This will start:

  • PostgreSQL (port 5432)
  • Qdrant (ports 6333, 6334)
  • Redis (port 6379)
  • Backend API (port 8000)
  • Celery worker

4. Set Up the Backend (if running locally)

cd backend
uv pip install -e .

5. Set Up the Frontend (if running locally)

cd frontend
npm install
npm run dev

Frontend will be available at http://localhost:3000

6. Access the Application

  • Frontend: http://localhost:3000
  • Backend API: http://localhost:8000 (interactive OpenAPI docs at http://localhost:8000/docs)

📁 Project Structure

Enterprise-RAG-System/
├── backend/
│   ├── src/
│   │   ├── api/              # FastAPI routes and models
│   │   ├── core/             # Configuration
│   │   ├── services/         # Business logic
│   │   │   ├── document/     # Document processing
│   │   │   ├── embeddings/   # Embedding generation
│   │   │   ├── vector/       # Qdrant operations
│   │   │   ├── retrieval/    # Retrieval logic
│   │   │   ├── generation/   # LLM integration
│   │   │   └── query/        # Query optimization
│   │   ├── database/         # SQLAlchemy models
│   │   └── utils/            # Utilities
│   ├── pyproject.toml
│   └── Dockerfile
├── frontend/
│   ├── app/                  # Next.js app router
│   ├── components/           # React components
│   ├── lib/                  # Utilities and API client
│   └── package.json
├── docker-compose.yml
├── .env.example
└── README.md

🔧 Advanced RAG Features

1. Query Processing

  • HyDE (Hypothetical Document Embeddings): Generate hypothetical answers to improve retrieval
  • Multi-Query Generation: Create 3-5 query variations for better recall
  • Query Decomposition: Break complex questions into sub-queries
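
Multi-query generation is essentially a paraphrasing prompt plus parsing. A sketch with an injectable `llm` callable so the pattern is visible without an API key (the prompt wording and canned response are illustrative):

```python
MULTI_QUERY_PROMPT = (
    "Rewrite the following question as {n} different search queries, "
    "one per line, preserving the original intent:\n\n{question}"
)

def generate_query_variations(question: str, llm, n: int = 4) -> list[str]:
    """Return the original question plus up to n LLM-generated variations."""
    raw = llm(MULTI_QUERY_PROMPT.format(n=n, question=question))
    variations = [line.strip("-• ").strip() for line in raw.splitlines() if line.strip()]
    # Deduplicate case-insensitively, keeping the original question first.
    seen, out = set(), []
    for q in [question, *variations]:
        if q.lower() not in seen:
            seen.add(q.lower())
            out.append(q)
    return out[: n + 1]

def fake_llm(prompt: str) -> str:
    # Canned response standing in for a real OpenAI call.
    return ("How do I submit expenses?\n"
            "Expense report submission process\n"
            "Where to file expense claims")

queries = generate_query_variations("How do I submit an expense report?", fake_llm)
```

Each variation is then retrieved independently and the results are merged downstream.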

2. Retrieval Enhancement

  • RAG-Fusion: Combine results from multiple query variations
  • Cross-Encoder Re-ranking: Use cross-encoder/ms-marco-MiniLM-L-6-v2 for result refinement
  • Hierarchical Retrieval: Summary → detail retrieval pattern
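
RAG-Fusion typically merges the per-query result lists with Reciprocal Rank Fusion (RRF). A self-contained sketch; `k=60` is the constant commonly used in the RRF literature:

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs: score(d) = sum over lists of 1/(k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Result lists from three query variations of the same question:
fused = reciprocal_rank_fusion([
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a"],
    ["doc_b", "doc_d"],
])
# doc_b appears in all three lists, so it rises to the top of the fused ranking.
```

The fused list would then be passed to the cross-encoder re-ranker for final ordering.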

3. Generation Optimization

  • Citation Management: Automatic source attribution
  • Confidence Scoring: Estimate answer reliability
  • Streaming Responses: Real-time answer generation
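
Citation management can be as simple as numbering unique sources in retrieval order and appending markers to the answer. A minimal sketch (the chunk shape is illustrative, not the system's actual models):

```python
def attach_citations(answer: str, chunks: list[dict]) -> tuple[str, list[str]]:
    """Number unique sources in retrieval order and append [n] markers to the answer."""
    sources: list[str] = []
    for chunk in chunks:
        if chunk["source"] not in sources:
            sources.append(chunk["source"])
    markers = "".join(f"[{i}]" for i in range(1, len(sources) + 1))
    bibliography = [f"[{i}] {src}" for i, src in enumerate(sources, start=1)]
    return f"{answer} {markers}", bibliography

text, refs = attach_citations(
    "Expense reports are due on the 5th of each month.",
    [{"source": "finance-policy.pdf"},
     {"source": "hr-handbook.pdf"},
     {"source": "finance-policy.pdf"}],  # duplicate source is cited once
)
```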

📡 API Endpoints

Chat

  • POST /api/chat - Send a chat message
  • POST /api/chat/stream - Stream chat response

Documents

  • GET /api/documents - List all documents
  • POST /api/documents - Upload a document
  • GET /api/documents/{id} - Get document details
  • DELETE /api/documents/{id} - Delete a document

Health

  • GET /api/health - Health check
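
A chat request could be built from Python roughly as follows. The request body fields are an assumption; the authoritative schema is in the FastAPI-generated docs at /docs:

```python
import json
import urllib.request

# Hypothetical request body for POST /api/chat -- field names are illustrative.
payload = {"message": "What is our parental leave policy?", "session_id": "demo"}

def build_chat_request(base_url: str = "http://localhost:8000") -> urllib.request.Request:
    # Build (but do not send) the request; send it with urllib.request.urlopen(req)
    # once the docker-compose stack is running.
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request()
```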

🧪 Development

Running Tests

cd backend
pytest

Code Formatting

cd backend
black src/
ruff check src/

Database Migrations

cd backend
alembic revision --autogenerate -m "description"
alembic upgrade head

📊 Monitoring

  • LangSmith: LLM tracing and monitoring (if configured)
  • Prometheus: System metrics (to be configured)
  • Grafana: Dashboards (to be configured)

🔒 Security

  • Environment variables for sensitive data
  • JWT authentication (to be implemented)
  • CORS configuration
  • Rate limiting (to be implemented)

📈 Performance Targets

  • Latency: < 3 seconds for end-to-end response
  • Accuracy: High answer correctness (RAGAS evaluation)
  • Uptime: 99.9% availability target

🚧 Roadmap

Phase 1: MVP ✅

  • Basic document upload and chunking
  • Simple embedding with SentenceTransformers
  • Qdrant setup and basic vector search
  • FastAPI endpoints for chat and documents
  • Next.js basic chat interface

Phase 2: Core RAG (In Progress)

  • Advanced chunking strategies
  • Query optimization (HyDE implementation)
  • Re-ranking with cross-encoders
  • Improved prompt engineering
  • Basic citation management

Phase 3: Advanced Features

  • Multi-query and RAG-Fusion
  • Self-RAG capabilities
  • Semantic routing
  • Active retrieval mechanisms
  • Comprehensive monitoring

Phase 4: Production Ready

  • Scalability improvements
  • Security hardening
  • Performance optimization
  • Comprehensive testing
  • Deployment automation

📝 License

See LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request.

📧 Support

For issues and questions, please open a GitHub issue.