A Docker-powered RAG system that understands the difference between code and prose. Ingest your codebase and documentation, then query them with full privacy and zero configuration.
Most RAG systems treat all data the same – they chunk your Python files the same way they chunk your PDFs. This is a mistake.
LocalRAG uses context-aware ingestion:
- Code collections use AST-based chunking that respects function boundaries
- Document collections use semantic chunking optimized for prose
- Separate collections prevent context pollution (your API docs don't interfere with your codebase queries)
Example:
```text
# Ask about your docs
"What was our Q3 strategy?" → queries the 'company_docs' collection

# Ask about your code
"Show me the authentication middleware" → queries the 'backend_code' collection
```

This separation is what makes answers actually useful.
Prerequisites:
- Docker & Docker Compose
- Ollama running locally
Setup:
```bash
# 1. Pull the embedding model
ollama pull nomic-embed-text

# 2. Clone and start
git clone https://github.com/2dogsandanerd/Knowledge-Base-Self-Hosting-Kit.git
cd Knowledge-Base-Self-Hosting-Kit
docker compose up -d
```

That's it. Open http://localhost:8080, then:
- Go to the Upload tab
- Upload any PDF or Markdown file
- Go to the Quicksearch tab
- Select your collection and ask a question
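Optionally, you can sanity-check the stack from Python (a sketch assuming the default ports used in this README):

```python
# Quick sanity check after `docker compose up -d`.
import requests

# Ollama should be reachable and list the pulled embedding model
tags = requests.get("http://localhost:11434/api/tags").json()
print([m["name"] for m in tags.get("models", [])])  # expect nomic-embed-text to appear

# The gateway should serve the frontend
print(requests.get("http://localhost:8080").status_code)  # expect 200
```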
Let's ingest this repository's backend code and query it like a wiki.
Step 1: Copy code into the data folder
```bash
# The ./data/docs folder is mounted as / in the container
cp -r backend/src data/docs/localrag_code
```

Step 2: Ingest via UI
- Navigate to the Folder Ingestion tab
- Path: `/localrag_code`
- Collection: `localrag_code`
- Profile: Codebase (uses code-optimized chunking)
- Click Start Ingestion
Step 3: Query your code
- Go to Quicksearch
- Select the `localrag_code` collection
- Ask: "How does the folder ingestion work?" or "Show me the RAGClient class"
You'll get answers with direct code snippets. This is invaluable for:
- Onboarding new developers
- Understanding unfamiliar codebases
- Debugging complex systems
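The same questions can also be asked programmatically. A minimal sketch using the `/query` endpoint from the REST API example further below, with the collection name from this walkthrough:

```python
import requests

# Ask the code collection a question through the REST API
# (same /query endpoint shown in the full API example below).
result = requests.post(
    "http://localhost:8080/api/v1/rag/query",
    json={"query": "Show me the RAGClient class", "collection": "localrag_code", "k": 3},
).json()

print(result.get("answer"))
```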
Architecture:

```text
┌────────────────────────────────────────────────────┐
│           Your Browser (localhost:8080)            │
└──────────────────────────┬─────────────────────────┘
                           │
┌──────────────────────────┼─────────────────────────┐
│                   Gateway (Nginx)                  │
│  - Serves static frontend                          │
│  - Proxies /api/* to backend                       │
└──────────────────────────┬─────────────────────────┘
                           │
┌──────────────────────────┼─────────────────────────┐
│           Backend (FastAPI + LlamaIndex)           │
│  - REST API for ingestion & queries                │
│  - Async task management                           │
│  - Orchestrates ChromaDB & Ollama                  │
└─────────────────┬───────────────────┬──────────────┘
                  │                   │
┌─────────────────┼──────┐   ┌────────┼────────────────┐
│        ChromaDB        │   │         Ollama          │
│ - Vector storage       │   │ - Embeddings            │
│ - Persistent on disk   │   │ - Answer generation     │
└────────────────────────┘   └─────────────────────────┘
```
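As the diagram shows, clients talk only to the gateway on port 8080; Nginx proxies `/api/*` to the FastAPI backend, which is the only component that talks to ChromaDB and Ollama. A small sketch of that entry point (the collections endpoint is taken from the REST API example below):

```python
import requests

# Everything goes through the gateway; /api/* is proxied to the backend.
resp = requests.post(
    "http://localhost:8080/api/v1/rag/collections",
    json={"collection_name": "demo"},
)
print(resp.status_code)
```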
Tech Stack:
- Backend: FastAPI, LlamaIndex 0.12.9
- Vector DB: ChromaDB 0.5.23
- LLM/Embeddings: Ollama (configurable)
- Document Parser: Docling 2.13.0 (advanced OCR, table extraction)
- Frontend: Vanilla HTML/JS (no build step)
Linux Users: If Ollama runs on your host, you may need to set OLLAMA_HOST=http://host.docker.internal:11434 in .env or use --network host.
- ✅ **100% Local & Private** – Your data never leaves your machine
- ✅ **Zero Config** – `docker compose up` and you're running
- ✅ **Batch Ingestion** – Process multiple files (sequential processing in Community Edition)
- ✅ **Code & Doc Profiles** – Different chunking strategies for code vs. prose
- ✅ **Smart Ingestion** – Auto-detects file types, avoids duplicates
- ✅ **`.ragignore` Support** – Works like `.gitignore` to exclude files/folders
- ✅ **Full REST API** – Programmatic access for automation
```python
import requests
import time

BASE_URL = "http://localhost:8080/api/v1/rag"

# 1. Create a collection
print("Creating collection...")
requests.post(f"{BASE_URL}/collections", json={"collection_name": "api_docs"})

# 2. Upload a document
print("Uploading README.md...")
with open("README.md", "rb") as f:
    response = requests.post(
        f"{BASE_URL}/documents/upload",
        files={"files": ("README.md", f, "text/markdown")},
        data={"collection_name": "api_docs"},
    ).json()

task_id = response.get("task_id")
print(f"Task ID: {task_id}")

# 3. Poll for completion
while True:
    status = requests.get(f"{BASE_URL}/ingestion/ingest-status/{task_id}").json()
    print(f"Status: {status['status']}, Progress: {status['progress']}%")
    if status["status"] in ["completed", "failed"]:
        break
    time.sleep(2)

# 4. Query
print("\nQuerying...")
result = requests.post(
    f"{BASE_URL}/query",
    json={"query": "What is the killer feature?", "collection": "api_docs", "k": 3},
).json()

print("\nAnswer:")
print(result.get("answer"))
print("\nSources:")
for source in result.get("metadata", []):
    print(f"- {source.get('filename')}")
```

Create a .env file to customize:
```bash
# Change the public port
PORT=8090

# Swap LLM/embedding models
LLM_PROVIDER=ollama
LLM_MODEL=llama3:8b
EMBEDDING_MODEL=nomic-embed-text

# Use OpenAI/Anthropic instead
# LLM_PROVIDER=openai
# OPENAI_API_KEY=sk-...
```

See .env.example for all options.
Hot-Reloading:
The backend uses Uvicorn's auto-reload. Edit files in backend/src and changes apply instantly.
Rebuild after dependency changes:
```bash
docker compose up -d --build backend
```

Project Structure:
```text
localrag/
├── backend/
│   ├── src/
│   │   ├── api/      # FastAPI routes
│   │   ├── core/     # RAG logic (RAGClient, services)
│   │   ├── models/   # Pydantic models
│   │   └── main.py   # Entry point
│   ├── Dockerfile
│   └── requirements.txt
├── frontend/          # Static HTML/JS
├── nginx/             # Reverse proxy config
├── data/              # Mounted volume for ingestion
└── docker-compose.yml
```
You can query across multiple collections simultaneously:
```python
result = requests.post(
    f"{BASE_URL}/query",
    json={
        "query": "How do we handle authentication?",
        "collections": ["backend_code", "api_docs"],  # Note: plural
        "k": 5,
    },
).json()
```

This is useful when answers might span code and documentation.
| Feature | LocalRAG | Typical RAG |
|---|---|---|
| Code-aware chunking | ✅ AST-based | ❌ Fixed-size |
| Context separation | ✅ Per-collection profiles | ❌ One-size-fits-all |
| Self-hosted | ✅ 100% local | |
| Zero config | ✅ Docker Compose | ❌ Complex setup |
| Async ingestion | ✅ Background tasks | |
| Production-ready | ✅ FastAPI + ChromaDB | |
- Support for more LLM providers (Anthropic, Cohere)
- Advanced reranking (Cohere Rerank, Cross-Encoder)
- Multi-modal support (images, diagrams)
- Graph-based retrieval for code dependencies
- Evaluation metrics dashboard (RAGAS integration)
MIT License. See LICENSE for details.
- FastAPI – Modern Python web framework
- LlamaIndex – RAG orchestration
- ChromaDB – Vector database
- Ollama – Local LLM runtime
- Docling – Advanced document parsing
Contributions are welcome! Please:
- Fork the repo
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
- Issues: GitHub Issues
- Discussions: GitHub Discussions
⭐ If you find this useful, please star the repo!


