
Enterprise Knowledge Assistant

A production-ready Enterprise Knowledge Assistant using advanced RAG (Retrieval-Augmented Generation) architecture. The system ingests internal company documents (PDFs, emails, Confluence pages, Google Docs) and provides accurate, cited answers to employee questions.

🏗️ Architecture

Core Workflow

  1. Query Construction: Natural language → optimized database queries
  2. Query Translation: HyDE, multi-query, decomposition techniques
  3. Routing: Determine optimal retrieval path (vector/relational/graph)
  4. Indexing: Semantic chunking, multi-representation indexing
  5. Retrieval: Vector search + re-ranking + active retrieval
  6. Generation: LLM synthesis with Self-RAG capabilities
  7. Feedback Loop: Quality assessment and iterative improvement
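
The workflow above can be sketched as a minimal pipeline skeleton. All function names and the in-memory corpus here are illustrative stand-ins, not the repository's actual modules:

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str

def translate_query(question: str) -> list[str]:
    # Step 2 (illustrative): the real system would use HyDE, multi-query,
    # or decomposition via an LLM; here we just add one naive variation.
    return [question, f"In the context of company documents, {question}"]

def retrieve(queries: list[str]) -> list[Chunk]:
    # Steps 3-5 (illustrative): route, search the vector index, re-rank.
    # Stubbed with a static corpus and keyword overlap instead of Qdrant.
    corpus = [Chunk("Expense reports are due monthly.", "policy.pdf")]
    hits = [c for c in corpus for q in queries
            if any(w in c.text.lower() for w in q.lower().split())]
    return hits[:5]

def generate(question: str, chunks: list[Chunk]) -> str:
    # Step 6 (illustrative): synthesize an answer with source citations.
    citations = ", ".join(sorted({c.source for c in chunks}))
    return f"Answer based on: {citations}" if chunks else "No relevant documents found."

def answer(question: str) -> str:
    return generate(question, retrieve(translate_query(question)))
```

The feedback loop (step 7) would wrap `answer` with a quality check and re-retrieval, which is omitted here.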

🛠️ Tech Stack

Backend

  • Framework: FastAPI with Uvicorn ASGI server
  • Runtime: Python 3.12 with UV package manager
  • Data Validation: Pydantic v2 models
  • Database ORM: SQLAlchemy 2.0 (async)
  • Task Queue: Celery with Redis broker
  • Vector DB: Qdrant with HNSW indexing
  • Relational DB: PostgreSQL 15+
  • Cache: Redis

Frontend

  • Framework: Next.js 14 with App Router
  • Styling: Tailwind CSS
  • State Management: Zustand
  • Data Fetching: React Query (TanStack Query)

AI/ML

  • Primary LLM: OpenAI GPT-3.5-Turbo
  • Fallback LLM: GPT-4 for complex queries
  • Embedding Model: SentenceTransformers all-MiniLM-L6-v2 (384-dim)
  • RAG Framework: LangChain for orchestration
  • Observability: LangSmith for tracing/monitoring
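
Vector search over all-MiniLM-L6-v2 embeddings reduces to cosine similarity between 384-dimensional vectors. A stdlib-only toy illustration (real embeddings come from the model; these hand-made 3-dim vectors just show the math):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # cos(a, b) = (a . b) / (|a| * |b|)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dim stand-ins for 384-dim sentence embeddings.
query = [1.0, 0.0, 1.0]
doc_a = [0.9, 0.1, 0.8]   # semantically close to the query
doc_b = [0.0, 1.0, 0.0]   # unrelated
assert cosine_similarity(query, doc_a) > cosine_similarity(query, doc_b)
```

Qdrant's HNSW index performs this comparison approximately over millions of vectors instead of exhaustively.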

🚀 Quick Start

Prerequisites

  • Docker and Docker Compose
  • Python 3.12+
  • Node.js 20+
  • UV package manager (pip install uv)
  • OpenAI API key

1. Clone the Repository

git clone <repository-url>
cd Enterprise-RAG-System

2. Environment Configuration

Copy .env.example to .env and configure:

cp .env.example .env

Edit .env with your settings:

OPENAI_API_KEY=sk-your-key-here
LANGSMITH_API_KEY=ls-your-key-here  # Optional
POSTGRES_URL=postgresql+asyncpg://raguser:ragpass@localhost:5432/ragdb
QDRANT_URL=http://localhost:6333
REDIS_URL=redis://localhost:6379

3. Start Services with Docker Compose

docker-compose up -d

This will start:

  • PostgreSQL (port 5432)
  • Qdrant (ports 6333, 6334)
  • Redis (port 6379)
  • Backend API (port 8000)
  • Celery worker

4. Set Up the Backend (if running locally)

cd backend
uv pip install -e .

5. Set Up the Frontend (if running locally)

cd frontend
npm install
npm run dev

Frontend will be available at http://localhost:3000

6. Access the Application

  • Frontend: http://localhost:3000
  • Backend API: http://localhost:8000 (interactive OpenAPI docs at http://localhost:8000/docs)

📁 Project Structure

Enterprise-RAG-System/
├── backend/
│   ├── src/
│   │   ├── api/              # FastAPI routes and models
│   │   ├── core/             # Configuration
│   │   ├── services/         # Business logic
│   │   │   ├── document/     # Document processing
│   │   │   ├── embeddings/   # Embedding generation
│   │   │   ├── vector/       # Qdrant operations
│   │   │   ├── retrieval/    # Retrieval logic
│   │   │   ├── generation/   # LLM integration
│   │   │   └── query/        # Query optimization
│   │   ├── database/         # SQLAlchemy models
│   │   └── utils/            # Utilities
│   ├── pyproject.toml
│   └── Dockerfile
├── frontend/
│   ├── app/                  # Next.js app router
│   ├── components/           # React components
│   ├── lib/                  # Utilities and API client
│   └── package.json
├── docker-compose.yml
├── .env.example
└── README.md

🔧 Advanced RAG Features

1. Query Processing

  • HyDE (Hypothetical Document Embeddings): Generate hypothetical answers to improve retrieval
  • Multi-Query Generation: Create 3-5 query variations for better recall
  • Query Decomposition: Break complex questions into sub-queries
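
Multi-query generation is essentially a paraphrasing prompt plus parsing. A sketch with an injectable `llm` callable so the pattern is visible without an API key (the prompt wording and canned response are illustrative):

```python
MULTI_QUERY_PROMPT = (
    "Rewrite the following question as {n} different search queries, "
    "one per line, preserving the original intent:\n\n{question}"
)

def generate_query_variations(question: str, llm, n: int = 4) -> list[str]:
    """Return the original question plus up to n LLM-generated variations."""
    raw = llm(MULTI_QUERY_PROMPT.format(n=n, question=question))
    variations = [line.strip("-• ").strip() for line in raw.splitlines() if line.strip()]
    # Deduplicate case-insensitively, keeping the original question first.
    seen, out = set(), []
    for q in [question, *variations]:
        if q.lower() not in seen:
            seen.add(q.lower())
            out.append(q)
    return out[: n + 1]

def fake_llm(prompt: str) -> str:
    # Canned response standing in for a real OpenAI call.
    return ("How do I submit expenses?\n"
            "Expense report submission process\n"
            "Where to file expense claims")

queries = generate_query_variations("How do I submit an expense report?", fake_llm)
```

Each variation is then retrieved independently and the results are merged downstream.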

2. Retrieval Enhancement

  • RAG-Fusion: Combine results from multiple query variations
  • Cross-Encoder Re-ranking: Use cross-encoder/ms-marco-MiniLM-L-6-v2 for result refinement
  • Hierarchical Retrieval: Summary → detail retrieval pattern
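
RAG-Fusion typically merges the per-query result lists with Reciprocal Rank Fusion (RRF). A self-contained sketch; `k=60` is the constant commonly used in the RRF literature:

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs: score(d) = sum over lists of 1/(k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Result lists from three query variations of the same question:
fused = reciprocal_rank_fusion([
    ["doc_a", "doc_b", "doc_c"],
    ["doc_b", "doc_a"],
    ["doc_b", "doc_d"],
])
# doc_b appears in all three lists, so it rises to the top of the fused ranking.
```

The fused list would then be passed to the cross-encoder re-ranker for final ordering.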

3. Generation Optimization

  • Citation Management: Automatic source attribution
  • Confidence Scoring: Estimate answer reliability
  • Streaming Responses: Real-time answer generation
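
Citation management can be as simple as numbering unique sources in retrieval order and appending markers to the answer. A minimal sketch (the chunk shape is illustrative, not the system's actual models):

```python
def attach_citations(answer: str, chunks: list[dict]) -> tuple[str, list[str]]:
    """Number unique sources in retrieval order and append [n] markers to the answer."""
    sources: list[str] = []
    for chunk in chunks:
        if chunk["source"] not in sources:
            sources.append(chunk["source"])
    markers = "".join(f"[{i}]" for i in range(1, len(sources) + 1))
    bibliography = [f"[{i}] {src}" for i, src in enumerate(sources, start=1)]
    return f"{answer} {markers}", bibliography

text, refs = attach_citations(
    "Expense reports are due on the 5th of each month.",
    [{"source": "finance-policy.pdf"},
     {"source": "hr-handbook.pdf"},
     {"source": "finance-policy.pdf"}],  # duplicate source is cited once
)
```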

📡 API Endpoints

Chat

  • POST /api/chat - Send a chat message
  • POST /api/chat/stream - Stream chat response

Documents

  • GET /api/documents - List all documents
  • POST /api/documents - Upload a document
  • GET /api/documents/{id} - Get document details
  • DELETE /api/documents/{id} - Delete a document

Health

  • GET /api/health - Health check
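
A chat request could be built from Python roughly as follows. The request body fields are an assumption; the authoritative schema is in the FastAPI-generated docs at /docs:

```python
import json
import urllib.request

# Hypothetical request body for POST /api/chat -- field names are illustrative.
payload = {"message": "What is our parental leave policy?", "session_id": "demo"}

def build_chat_request(base_url: str = "http://localhost:8000") -> urllib.request.Request:
    # Build (but do not send) the request; send it with urllib.request.urlopen(req)
    # once the docker-compose stack is running.
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_chat_request()
```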

🧪 Development

Running Tests

cd backend
pytest

Code Formatting

cd backend
black src/
ruff check src/

Database Migrations

cd backend
alembic revision --autogenerate -m "description"
alembic upgrade head

📊 Monitoring

  • LangSmith: LLM tracing and monitoring (if configured)
  • Prometheus: System metrics (to be configured)
  • Grafana: Dashboards (to be configured)

🔒 Security

  • Environment variables for sensitive data
  • JWT authentication (to be implemented)
  • CORS configuration
  • Rate limiting (to be implemented)

📈 Performance Targets

  • Latency: < 3 seconds for end-to-end response
  • Accuracy: High answer correctness (RAGAS evaluation)
  • Uptime: 99.9% availability target

🚧 Roadmap

Phase 1: MVP ✅

  • Basic document upload and chunking
  • Simple embedding with SentenceTransformers
  • Qdrant setup and basic vector search
  • FastAPI endpoints for chat and documents
  • Next.js basic chat interface

Phase 2: Core RAG (In Progress)

  • Advanced chunking strategies
  • Query optimization (HyDE implementation)
  • Re-ranking with cross-encoders
  • Improved prompt engineering
  • Basic citation management

Phase 3: Advanced Features

  • Multi-query and RAG-Fusion
  • Self-RAG capabilities
  • Semantic routing
  • Active retrieval mechanisms
  • Comprehensive monitoring

Phase 4: Production Ready

  • Scalability improvements
  • Security hardening
  • Performance optimization
  • Comprehensive testing
  • Deployment automation

📝 License

See LICENSE file for details.

🤝 Contributing

Contributions are welcome! Please open an issue or submit a pull request.

📧 Support

For issues and questions, please open a GitHub issue.