DocuSeek AI: Unlock Insights from Your Private Documents
Securely chat with your private documents using an intelligent RAG system. DocuSeek AI leverages powerful LLMs and vector search to provide accurate, context-aware answers based only on your uploaded content, ensuring privacy and relevance. Process various formats, visualize data relationships, and gain deeper understanding effortlessly.
DocuSeek AI provides a complete RAG (Retrieval-Augmented Generation) system for processing, indexing, and querying private documents. It leverages vector embeddings and integrates with LLMs to deliver contextual responses based on your document corpus.
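To make the retrieval step concrete, here is a minimal sketch of ranking text chunks by cosine similarity to a query embedding. It is an illustration only, not DocuSeek AI's implementation: the toy three-dimensional vectors, the chunk texts, and the top_k helper are assumptions. In the real system, embeddings would come from an embedding model and the search would run in PostgreSQL/pgvector.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float],
    chunks: Sequence[Tuple[str, Sequence[float]]],
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Return the k chunks most similar to the query, best match first."""
    scored = [(text, cosine_similarity(query, vector)) for text, vector in chunks]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Toy data (hypothetical): real embeddings have hundreds or thousands of dimensions.
chunks = [
    ("Invoices are due within 30 days.", [0.9, 0.1, 0.0]),
    ("The office is closed on public holidays.", [0.0, 0.2, 0.9]),
    ("Late payments incur a 2% fee.", [0.8, 0.3, 0.1]),
]
query_embedding = [1.0, 0.2, 0.0]

for text, score in top_k(query_embedding, chunks, k=2):
    print(f"{score:.3f}  {text}")
```

In a RAG pipeline, the top-ranked chunks are then passed to the LLM as context, which is what keeps answers grounded in the uploaded documents.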
- Python 3.8+
- Docker & Docker Compose (Recommended)
  - docker buildx plugin enabled (usually included with recent Docker Desktop versions)
- OpenAI API Key
- (Optional) Mistral API Key (for OCR)
See Dependencies for full details.
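The prerequisites above can be checked with a short script before starting. This is a sketch under assumptions: the check_prerequisites helper is hypothetical and not part of the project, and it only verifies the Python version and that a docker executable is on PATH. It does not validate API keys or the buildx plugin.

```python
import shutil
import sys
from typing import Dict, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 8)) -> Dict[str, bool]:
    """Report which prerequisites appear to be satisfied on this machine."""
    return {
        # Compare only (major, minor) against the required minimum.
        "python": sys.version_info[:2] >= min_python,
        # shutil.which returns None when the executable is not on PATH.
        "docker": shutil.which("docker") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```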
This setup uses a simple script to build the image and start all services.
- Configure Environment: Create a file named .env.docker in the project root. Copy the contents from the example in Environment Configuration and populate it with your API keys and desired settings (e.g., database password).
- Run the Startup Script: Make the script executable (if you haven't already) and run it:

      chmod +x start_docker.sh
      ./start_docker.sh

  This script will automatically:
  - Build the application image using docker buildx bake.
  - Start the application (Chat UI by default), PostgreSQL, and pgAdmin using docker compose.
- Access Services (Chat UI runs by default):
  - Chat Interface: http://localhost:7861
  - pgAdmin: http://localhost:8080 (Login with details from .env.docker)
  - Note: The API server (port 8000) is not started by default. See instructions below or the Docker Command Reference to start it.
- Process Your Documents: Once the services are running, place your documents in the documents/ directory and run the processing command:

      docker compose exec app python main.py process --dir documents

  (Note: The first time you run this after resetting the database, it will automatically create the necessary database tables.)
- Use the Chat or Start API Server:
  - Interact with your documents via the Chat Interface at http://localhost:7861.
  - Optional: Start the API Server. If you need the API endpoints, run this command in a separate terminal:

        docker compose exec app python main.py api

    Then access the API at http://localhost:8000 (Docs: http://localhost:8000/docs).
Troubleshooting: If docker compose up (run by the script) warns about missing API keys despite them being in .env.docker, check if those variables are set in your shell (echo $VAR_NAME). Shell variables take precedence; unset VAR_NAME before running the script if needed.
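The precedence rule described above can be checked with a small script that lists variables defined both in .env.docker and in your shell with different values. This is a sketch under assumptions: the parser handles only plain KEY=VALUE lines (no export prefixes, interpolation, or multi-line values), and both helper functions are hypothetical and not part of the project.

```python
import os
from typing import Dict, List, Mapping


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so 'secret' and secret compare equal.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def shadowed_keys(
    file_values: Mapping[str, str], shell_env: Mapping[str, str]
) -> List[str]:
    """Return keys whose shell value differs from the value in the env file."""
    return sorted(
        key
        for key, value in file_values.items()
        if key in shell_env and shell_env[key] != value
    )


if __name__ == "__main__":
    path = ".env.docker"
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            file_values = parse_env_text(handle.read())
        for key in shadowed_keys(file_values, os.environ):
            print(f"{key} is set in the shell and overrides {path}; consider: unset {key}")
```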
See the full Docker Command Reference for manual commands and other operations.
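To confirm which services are up after running the startup script, a quick TCP check against the ports listed above can help. This is a sketch: the is_port_open helper and the SERVICES table are hypothetical, and an open port only shows that something is listening, not that the service is healthy.

```python
import socket
from typing import Dict, Tuple

# Ports taken from the quickstart above; the labels are display names only.
SERVICES: Dict[str, Tuple[str, int]] = {
    "Chat Interface": ("localhost", 7861),
    "pgAdmin": ("localhost", 8080),
    "API server (optional)": ("localhost", 8000),
}


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        state = "reachable" if is_port_open(host, port) else "not reachable"
        print(f"{name:<22} {host}:{port}  {state}")
```

Since the API server is not started by default, port 8000 is expected to show as not reachable until you start it.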
(Note: Docker is strongly recommended for managing dependencies like PostgreSQL)
- Install Dependencies:

      pip install -r requirements.txt

- Setup Database: Manually set up PostgreSQL with the pgvector extension.
- Configure Environment: Create a .env file. See Environment Configuration. Ensure database connection details match your manual setup.
- Create Database Tables: Manually run database schema creation logic (e.g., using Alembic if integrated, or a custom script calling Base.metadata.create_all).
- Available Commands:

      # Process documents (replace 'documents' with your directory)
      python main.py process --dir documents

      # Start chat interface (Default: http://127.0.0.1:7860)
      python main.py chat

      # Run vector similarity search (interactive)
      python main.py search

      # Start the API server (Default: http://0.0.0.0:8000)
      python main.py api

  For more command options, run python main.py --help.
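For the manual setup, the connection details in .env must match your PostgreSQL instance. The sketch below shows one way to assemble a SQLAlchemy-style connection URL from individual settings. The setting names (DB_USER, DB_PASSWORD, DB_HOST, DB_PORT, DB_NAME) are hypothetical placeholders, the real names are listed in Environment Configuration, and the postgresql+psycopg2 scheme assumes the psycopg2 driver is in use.

```python
from typing import Mapping
from urllib.parse import quote_plus


def build_database_url(settings: Mapping[str, str]) -> str:
    """Build a SQLAlchemy-style PostgreSQL URL from individual settings."""
    user = quote_plus(settings["DB_USER"])
    # Percent-encode the password so characters like '@' or '/' stay valid in a URL.
    password = quote_plus(settings["DB_PASSWORD"])
    host = settings.get("DB_HOST", "localhost")
    port = settings.get("DB_PORT", "5432")  # PostgreSQL's default port
    name = settings["DB_NAME"]
    return f"postgresql+psycopg2://{user}:{password}@{host}:{port}/{name}"


# Hypothetical values for illustration only.
example = {"DB_USER": "docuseek", "DB_PASSWORD": "p@ss/word", "DB_NAME": "docuseek"}
print(build_database_url(example))
```

A URL like this could be passed to SQLAlchemy's create_engine. A custom script would then typically enable pgvector with CREATE EXTENSION IF NOT EXISTS vector and call Base.metadata.create_all on that engine, though the project's actual schema setup may differ.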
- Document Processing (PDF, DOCX, TXT, etc.)
- Vector Storage & Similarity Search (PostgreSQL/pgvector)
- Interactive Chat Interface (Gradio)
- FastAPI Backend & API Endpoints
- Optional Enhanced Conversion (Docling) & OCR (Mistral)
- Caching (File-based/Redis)
- Duplicate Document Detection (Checksums)
- Asynchronous Processing Pipeline
- Vector Visualization
- Conceptual Diagram Generation from Chat
For a full list, see Features.
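As an illustration of checksum-based duplicate detection from the list above, here is a minimal sketch that flags byte-identical documents by comparing SHA-256 digests. The choice of SHA-256 and the find_duplicates helper are assumptions; the project may use a different hash or store checksums in the database instead.

```python
import hashlib
from typing import Dict, Iterable, List, Tuple


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def find_duplicates(documents: Iterable[Tuple[str, bytes]]) -> List[Tuple[str, str]]:
    """Return (duplicate_name, original_name) pairs for byte-identical documents."""
    seen: Dict[str, str] = {}  # digest -> name of the first document with that content
    duplicates: List[Tuple[str, str]] = []
    for name, data in documents:
        digest = sha256_of_bytes(data)
        if digest in seen:
            duplicates.append((name, seen[digest]))
        else:
            seen[digest] = name
    return duplicates


# Hypothetical in-memory documents; real input would be file contents read from disk.
docs = [
    ("report.txt", b"quarterly results"),
    ("notes.txt", b"meeting notes"),
    ("report_copy.txt", b"quarterly results"),
]
print(find_duplicates(docs))  # [('report_copy.txt', 'report.txt')]
```

A checksum only catches exact copies: a re-saved or reformatted file with the same text produces a different digest.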
- Architecture & Flow Diagrams:
- Core Documentation:
- Integrations:
  - Docling Integration (If available)
