DocuSeek AI

DocuSeek AI: Unlock Insights from Your Private Documents

Securely chat with your private documents using an intelligent RAG system. DocuSeek AI leverages powerful LLMs and vector search to provide accurate, context-aware answers based only on your uploaded content, ensuring privacy and relevance. Process various formats, visualize data relationships, and gain deeper understanding effortlessly.

Project Overview

DocuSeek AI provides a complete RAG (Retrieval-Augmented Generation) system for processing, indexing, and querying private documents. It leverages vector embeddings and integrates with LLMs to deliver contextual responses based on your document corpus.
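The retrieve-then-generate loop behind RAG can be illustrated in a few lines. This is a minimal sketch, not DocuSeek's implementation: the project stores embeddings in PostgreSQL/pgvector and calls an LLM, while the sketch swaps in a toy word-count "embedding" and an in-memory list so it runs with the standard library alone. The names embed, retrieve and build_prompt are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List, Tuple


def embed(text: str) -> Counter:
    """Toy 'embedding' (hypothetical): lowercase word counts instead of a real model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Score every chunk against the query and keep the k most similar."""
    q = embed(query)
    scored = [(cosine(q, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def build_prompt(query: str, chunks: List[str], k: int = 2) -> str:
    """Ground the LLM prompt in the retrieved chunks only."""
    context = "\n---\n".join(chunk for _, chunk in retrieve(query, chunks, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

The prompt string would then be sent to the LLM; restricting the context to retrieved chunks is what keeps answers tied to your uploaded content.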

Quick Start

Prerequisites

- Python 3.8+
- Docker & Docker Compose (recommended)
- docker buildx plugin enabled (usually included with recent Docker Desktop versions)
- OpenAI API key
- (Optional) Mistral API key (for OCR)

See Dependencies for full details.

Using Docker (Recommended)

This setup uses a simple script to build the image and start all services.

Configure Environment: Create a file named .env.docker in the project root. Copy the contents from the example in Environment Configuration and populate it with your API keys and desired settings (e.g., database password).
Run the Startup Script: Make the script executable (if you haven't already) and run it:
```
chmod +x start_docker.sh
./start_docker.sh
```
This script will automatically:
- Build the application image using docker buildx bake.
- Start the application (Chat UI by default), PostgreSQL, and pgAdmin using docker compose.
Access Services (Chat UI runs by default):
- Chat Interface: http://localhost:7861
- pgAdmin: http://localhost:8080 (Login with details from .env.docker)
- Note: The API server (port 8000) is not started by default. See instructions below or the Docker Command Reference to start it.
Process Your Documents: Once the services are running, place your documents in the documents/ directory and run the processing command:
```
docker compose exec app python main.py process --dir documents
```
(Note: The first time you run this after resetting the database, it will automatically create the necessary database tables.)
Use the Chat or Start API Server:
- Interact with your documents via the Chat Interface at http://localhost:7861.
- Optional: Start the API Server. If you need the API endpoints, run this command in a separate terminal:
```
docker compose exec app python main.py api
```
  Then access the API at http://localhost:8000 (Docs: http://localhost:8000/docs).
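Because the API is a FastAPI app, its endpoints can also be discovered programmatically: FastAPI serves the OpenAPI schema at /openapi.json by default, alongside the /docs page. The sketch below assumes that default has not been disabled; the helper names are hypothetical and no specific DocuSeek endpoint is assumed.

```python
import json
from typing import Dict, List
from urllib.request import urlopen

HTTP_METHODS = {"get", "post", "put", "patch", "delete", "head", "options", "trace"}


def list_endpoints(openapi: Dict) -> List[str]:
    """Return sorted 'METHOD /path' strings from an OpenAPI document."""
    endpoints = []
    for path, operations in sorted(openapi.get("paths", {}).items()):
        for method in sorted(operations):
            # Path items may also hold non-operation keys such as "parameters".
            if method in HTTP_METHODS:
                endpoints.append(f"{method.upper()} {path}")
    return endpoints


def fetch_openapi(base_url: str = "http://localhost:8000", timeout: float = 5.0) -> Dict:
    """Download the schema FastAPI serves at /openapi.json by default."""
    with urlopen(f"{base_url.rstrip('/')}/openapi.json", timeout=timeout) as response:
        return json.load(response)
```

With the API server running, print(list_endpoints(fetch_openapi())) lists every route the server exposes.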

Troubleshooting: If docker compose up (run by the script) warns about missing API keys despite them being in .env.docker, check if those variables are set in your shell (echo $VAR_NAME). Shell variables take precedence; unset VAR_NAME before running the script if needed.
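To find every key that a shell variable might be shadowing, you can compare the env file against the current environment. This is a minimal sketch: the parser is hypothetical and only understands simple KEY=VALUE lines (optionally prefixed with export or quoted), not the full dotenv syntax Docker Compose accepts.

```python
import os
from pathlib import Path
from typing import Dict, List


def read_env_file(path: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        # Drop surrounding quotes; multiline values are not supported.
        values[key] = value.strip().strip('"').strip("'")
    return values


def shadowed_keys(path: str = ".env.docker") -> List[str]:
    """Keys defined in the env file that are also set (even to '') in this shell."""
    return sorted(key for key in read_env_file(path) if key in os.environ)
```

Run it from the same shell you use for ./start_docker.sh; any key it reports is a candidate for unset.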

See the full Docker Command Reference for manual commands and other operations.

Manual Setup

(Note: Docker is strongly recommended for managing dependencies such as PostgreSQL.)

Install Dependencies: pip install -r requirements.txt
Set Up Database: Manually set up PostgreSQL with the pgvector extension.
Configure Environment: Create a .env file (see Environment Configuration) and make sure the database connection details match your manual setup.
Create Database Tables: Create the schema yourself, e.g. with Alembic (if integrated) or a custom script that calls SQLAlchemy's Base.metadata.create_all().
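Once pgvector is installed, similarity search is plain SQL. The sketch below builds a nearest-neighbour query using pgvector's cosine-distance operator <=> and its '[x,y,z]' text format for vectors. The table and column names (document_chunks, content, embedding) are hypothetical placeholders, not DocuSeek's actual schema; the %s placeholders follow the psycopg parameter style.

```python
from typing import Sequence, Tuple

# The pgvector extension must be enabled once per database.
ENABLE_PGVECTOR = "CREATE EXTENSION IF NOT EXISTS vector;"


def to_pgvector_literal(embedding: Sequence[float]) -> str:
    """Format floats as pgvector's text input, e.g. '[0.5,1.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def nearest_chunks_query(
    embedding: Sequence[float], k: int = 5, table: str = "document_chunks"
) -> Tuple[str, tuple]:
    """Return (sql, params) for the k chunks closest by cosine distance.

    `table` is interpolated directly, so pass only trusted identifiers.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    vector = to_pgvector_literal(embedding)
    sql = (
        f"SELECT content, embedding <=> %s::vector AS distance "
        f"FROM {table} "
        f"ORDER BY embedding <=> %s::vector "
        f"LIMIT %s"
    )
    return sql, (vector, vector, k)
```

With a psycopg cursor, cursor.execute(*nearest_chunks_query(query_embedding)) would run the search; ordering by the raw <=> expression is the form pgvector's indexes are designed to serve.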

Available Commands:

```
# Process documents (replace 'documents' with your directory)
python main.py process --dir documents

# Start chat interface (Default: http://127.0.0.1:7860)
python main.py chat

# Run vector similarity search (interactive)
python main.py search

# Start the API server (Default: http://0.0.0.0:8000)
python main.py api
```

For more command options, run python main.py --help.

Key Features

- Document Processing (PDF, DOCX, TXT, etc.)
- Vector Storage & Similarity Search (PostgreSQL/pgvector)
- Interactive Chat Interface (Gradio)
- FastAPI Backend & API Endpoints
- Optional Enhanced Conversion (Docling) & OCR (Mistral)
- Caching (File-based/Redis)
- Duplicate Document Detection (Checksums)
- Asynchronous Processing Pipeline
- Vector Visualization
- Conceptual Diagram Generation from Chat

For a full list, see Features.
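Checksum-based duplicate detection can be sketched as follows. The README does not state which hash DocuSeek uses or where checksums are stored, so this minimal sketch assumes SHA-256 and an in-memory set; file_checksum and new_documents are hypothetical names. Files with identical bytes are skipped even when their names differ.

```python
import hashlib
from pathlib import Path
from typing import Iterable, List, Set


def file_checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in 1 MiB blocks so large files are not loaded whole."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def new_documents(paths: Iterable[Path], seen: Set[str]) -> List[Path]:
    """Return only files whose content has not been seen; records new checksums."""
    fresh = []
    for path in sorted(paths):
        checksum = file_checksum(path)
        if checksum in seen:
            continue  # Same bytes already indexed, possibly under another name.
        seen.add(checksum)
        fresh.append(path)
    return fresh
```

In a real pipeline the seen set would be persisted (for example in the database) so that re-running the process command skips documents indexed earlier.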

Documentation

Architecture & Flow Diagrams:
Core Documentation:
Integrations:
- Docling Integration (If available)

alakob/ai_private_document_retriever

Folders and files

Latest commit

History

Repository files navigation

DocuSeek AI

Project Overview

Quick Start

Prerequisites

Using Docker (Recommended)

Manual Setup

Key Features

Documentation

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages