feat: add Docker support with ghcr.io publish workflow #66

Open
Eli-Golin wants to merge 2 commits into cocoindex-io:main from Eli-Golin:feat/docker-support

Conversation

@Eli-Golin

Add Docker support

What this PR adds

  • docker/Dockerfile — multi-stage build that produces a self-contained image
  • .github/workflows/docker-publish.yml — publishes to ghcr.io/cocoindex-io/cocoindex-code on every release tag
  • README section — Docker usage alongside the existing uvx / claude mcp add instructions

Motivation

Some teams can't or don't want to install Python, uv, or manage system
dependencies on developer machines. Docker gives them:

  • Reproducibility — identical environment across macOS, Linux, Windows (WSL2)
  • Isolation — no Python version conflicts with other tools
  • Zero host deps — only Docker required

Design decisions

Multi-stage build

Three stages keep the final image lean and cache-friendly:

  1. builder — installs cocoindex-code and sentence-transformers via uv
  2. model_cache — pre-bakes all-MiniLM-L6-v2 into the image so cold
    starts don't trigger a ~90 MB download
  3. runtime — copies packages + model from previous stages into a fresh
    python:3.12-slim base
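The three stages above can be sketched roughly as follows. This is an illustrative outline, not the PR's actual docker/Dockerfile: the venv path, the uv install method, and the Hugging Face cache location are all assumptions.

```dockerfile
# Stage 1: install cocoindex-code and sentence-transformers via uv (sketch)
FROM python:3.12-slim AS builder
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv
RUN uv venv /opt/venv && \
    uv pip install --python /opt/venv/bin/python cocoindex-code sentence-transformers

# Stage 2: pre-bake the default embedding model so cold starts skip the ~90 MB download
FROM builder AS model_cache
RUN /opt/venv/bin/python -c "from sentence_transformers import SentenceTransformer; \
    SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"

# Stage 3: copy packages + model cache into a fresh slim base
FROM python:3.12-slim AS runtime
COPY --from=builder /opt/venv /opt/venv
COPY --from=model_cache /root/.cache/huggingface /root/.cache/huggingface
ENV PATH="/opt/venv/bin:${PATH}" \
    COCOINDEX_CODE_ROOT_PATH=/workspace
ENTRYPOINT ["cocoindex-code", "serve"]
```

Keeping the model download in its own stage means editing the package list only invalidates the builder layer, not the cached model layer.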

python:3.12-slim, not Alpine

cocoindex ships pre-built Rust wheels linked against glibc. Alpine uses musl-libc
and would require building from source.

ENTRYPOINT ["cocoindex-code", "serve"]

serve is the MCP stdio subcommand. This keeps the container invocation clean
(docker run --rm -i ... image) with no extra arguments needed. Users who want
the one-shot indexer can override with --entrypoint cocoindex-code ... index.
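In practice the two invocation modes would look something like this (image tag and the exact `index` arguments are assumptions based on the description above):

```shell
# Default entrypoint: MCP stdio server, no extra arguments
docker run --rm -i -v "$PWD":/workspace \
  ghcr.io/cocoindex-io/cocoindex-code:latest

# One-shot indexing by overriding the entrypoint
docker run --rm -v "$PWD":/workspace \
  --entrypoint cocoindex-code \
  ghcr.io/cocoindex-io/cocoindex-code:latest index
```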

All config via environment variables

No project-specific defaults are baked into the image. COCOINDEX_CODE_ROOT_PATH
defaults to /workspace (the conventional mount point), everything else is
left to the user's docker run -e flags or .mcp.json config.
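A typical environment-driven invocation might look like the sketch below. Only `COCOINDEX_CODE_ROOT_PATH` and `COCOINDEX_CODE_EMBEDDING_MODEL` are named in this PR; any other variable would follow the same pattern.

```shell
# Explicitly setting the root path (redundant here, since /workspace is the default)
docker run --rm -i \
  -v "$PWD":/workspace \
  -e COCOINDEX_CODE_ROOT_PATH=/workspace \
  ghcr.io/cocoindex-io/cocoindex-code:latest
```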

Named volume for model cache

The default model is baked in, but users who override COCOINDEX_CODE_EMBEDDING_MODEL
with an external provider benefit from a named volume so the model is only
downloaded once across rebuilds:

docker volume create cocoindex-model-cache
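Once the volume exists, mounting it at the model cache path keeps downloads across container rebuilds. The mount path and the example model name below are assumptions, not values taken from the PR:

```shell
# Assumes the image caches models under /root/.cache/huggingface
docker run --rm -i \
  -v cocoindex-model-cache:/root/.cache/huggingface \
  -e COCOINDEX_CODE_EMBEDDING_MODEL=sentence-transformers/all-mpnet-base-v2 \
  ghcr.io/cocoindex-io/cocoindex-code:latest
```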

First-time setup note

After the workflow runs for the first time, the ghcr.io/cocoindex-io/cocoindex-code
package will be created automatically but will be private by default.
A maintainer needs to set it to public once:

GitHub → org page → Packages → cocoindex-code → Package settings → Change visibility → Public

After that, docker pull ghcr.io/cocoindex-io/cocoindex-code:latest works for
everyone without authentication.

Testing

Tested locally against a Scala/SBT codebase:

  1. docker build -t cocoindex-code:local -f docker/Dockerfile . — builds cleanly
  2. MCP stdio handshake (initialize → valid JSON-RPC result) ✅
  3. tools/list returns search tool ✅
  4. .cocoindex_code/ index directory created in mounted workspace ✅
  5. Wired into Claude Code via .mcp.json — semantic search returns results ✅
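Step 2 above can be reproduced by piping a JSON-RPC `initialize` request into the container on stdin. The request body is a generic MCP handshake sketch; field values are illustrative:

```shell
printf '%s\n' \
  '{"jsonrpc":"2.0","id":1,"method":"initialize","params":{"protocolVersion":"2024-11-05","capabilities":{},"clientInfo":{"name":"smoke-test","version":"0"}}}' \
  | docker run --rm -i cocoindex-code:local
# A healthy server replies with a JSON-RPC result object containing its serverInfo
```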

