
Chunkr | Open Source Document Intelligence API

Production-ready API service for document layout analysis, OCR, and semantic chunking.
Convert PDFs, PPTs, Word docs & images into RAG/LLM-ready chunks.

Layout Analysis | OCR + Bounding Boxes | Structured HTML and Markdown | VLM Processing Controls

Try it out! · Report Bug · Contact · Discord

(Super) Quick Start

  1. Go to chunkr.ai
  2. Make an account and copy your API key
  3. Install our Python SDK:
    pip install chunkr-ai
  4. Use the SDK to process your documents:
    from chunkr_ai import Chunkr
    
    # Initialize with your API key from chunkr.ai
    chunkr = Chunkr(api_key="your_api_key")
    
    # Upload a document (URL or local file path)
    url = "https://chunkr-web.s3.us-east-1.amazonaws.com/landing_page/input/science.pdf"
    task = chunkr.upload(url)
    
    # Export results in various formats
    task.html(output_file="output.html")
    task.markdown(output_file="output.md")
    task.content(output_file="output.txt")
    task.json(output_file="output.json")
    
    # Clean up
    chunkr.close()
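
The upload call accepts a local file path as well as a URL, so the same flow works for documents on disk. A minimal variation of the snippet above (the file path is illustrative):

    from chunkr_ai import Chunkr

    chunkr = Chunkr(api_key="your_api_key")

    # upload() takes a local path just like a URL (path shown is illustrative)
    task = chunkr.upload("./docs/quarterly_report.pdf")

    # Export only the formats you need
    task.markdown(output_file="quarterly_report.md")
    task.json(output_file="quarterly_report.json")

    # Clean up
    chunkr.close()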

Documentation

Visit our docs for more information and examples.

LLM Configuration

You can use any OpenAI API-compatible endpoint by setting the following variables in your .env file:

LLM__URL:
LLM__MODEL:
LLM__KEY:
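
For example, to point Chunkr at OpenAI itself, the variables could be filled in roughly as follows (the endpoint and model name are illustrative, and the exact key=value syntax should follow your .env.example; any OpenAI API-compatible service works):

LLM__URL=https://api.openai.com/v1
LLM__MODEL=gpt-4o
LLM__KEY=your_api_key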

Self-Hosted Deployment Options

Quick Start with Docker Compose

  1. Prerequisites: Docker and Docker Compose.

  2. Clone the repo:

git clone https://github.com/lumina-ai-inc/chunkr
cd chunkr

  3. Set up environment variables:

# Copy the example environment file
cp .env.example .env

# Configure your environment variables
# Required: LLM__KEY as your OpenAI API key (or any OpenAI API-compatible key)

  4. Start the services:

With GPU:

docker compose up -d

Note: Requires an NVIDIA CUDA GPU

  5. Access the services:
    • Web UI: http://localhost:5173
    • API: http://localhost:8000

  6. Stop the services when done:

docker compose down
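
Once step 4 has the containers running, a quick way to confirm the API is reachable is to hit its published port and watch the service logs (a generic smoke test; see the docs for the actual API routes):

# Check that the API is answering on its default port
curl -i http://localhost:8000

# Follow the service logs if something looks off
docker compose logs -f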

Deployment with Kubernetes

For production environments, we provide a Helm chart and detailed deployment instructions:

  1. See our detailed guide at kube/README.md
  2. The chart includes configurations for high availability and scaling
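
As a rough sketch of what a Helm-based install can look like (the chart path, release name, and values file below are illustrative; follow kube/README.md for the actual chart location and required values):

# Install or upgrade Chunkr into its own namespace (paths and values are illustrative)
helm upgrade --install chunkr ./kube/charts/chunkr \
  --namespace chunkr --create-namespace \
  -f my-values.yaml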

For enterprise support and deployment assistance, contact us.

Licensing

The core of this project is dual-licensed:

  1. GNU Affero General Public License v3.0 (AGPL-3.0)
  2. Commercial License

To use Chunkr without complying with the AGPL-3.0 license terms, you can contact us or visit our website.

Connect With Us