perf(docker): single-stage with cache-friendly layer ordering by georgeh0 · Pull Request #139 · cocoindex-io/cocoindex-code

georgeh0 · 2026-04-15T00:30:07Z

Replaces the multi-stage / two-COPY layout introduced in #138 with a single-stage Dockerfile that actually achieves the user-pull-cost optimization. The previous attempt bloated the image to 10 GB without reducing per-release downloads — BuildKit's COPY --from emits the full copied tree as a layer rather than a diff vs. the destination.

Summary

- Single-stage runtime image with cache-friendly layer ordering. Heavy, stable installs (sentence-transformers, model bake, base setup) come first; the per-release cocoindex + cocoindex-code install is the last layer. Each `RUN uv pip install` produces its own distinct layer with a content-addressable digest.
- Stable layers persist across releases. The sentence-transformers install is keyed on the RUN command string (no source-tree dependency), so subsequent releases reuse the same digest and users who `docker pull` an upgrade keep that ~5 GB layer locally.
- The per-release layer is small: ~470 MB containing cocoindex + cocoindex-code and their non-ST transitive deps (LiteLLM stack, MCP, typer, pydantic, etc.). Future option: move litellm into the stable layer to shrink the per-release layer further.
- `RUN --mount=type=bind,source=.,target=/ccc-src,rw=true` replaces `COPY . /ccc-src`. The bind mount gives hatch-vcs a writable overlay for `_version.py` during the PEP 517 build without persisting the source tree as a layer in the final image.
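The layer ordering described in the summary can be sketched roughly as follows. This is a minimal, untested illustration and not the Dockerfile from this PR: the base image, the `git` install, installing uv via `pip`, and the `--system` flag are all assumptions made for the sketch, and the model bake step and the slim variant are omitted. Only the `RUN --mount=type=bind,...` options are taken from the description above.

```dockerfile
# syntax=docker/dockerfile:1
# Hypothetical sketch of the ordering only; image tag and package set are assumptions.
FROM python:3.12-slim

# Base setup (assumed): git so hatch-vcs can read the version from the repo, plus uv.
RUN apt-get update \
    && apt-get install -y --no-install-recommends git \
    && rm -rf /var/lib/apt/lists/* \
    && pip install --no-cache-dir uv

# Stable heavy layer: the instruction references no files from the build context,
# so its cache key depends only on this command string and the layers above it.
RUN uv pip install --system sentence-transformers

# Per-release layer, kept last: the source tree is bind-mounted read-write for the
# duration of this RUN. Writes to the mount (e.g. _version.py) are discarded
# afterwards, so only the installed packages end up in the layer.
RUN --mount=type=bind,source=.,target=/ccc-src,rw=true \
    uv pip install --system /ccc-src
```

Because the source-dependent step comes last, a new release should only invalidate that final layer, provided the builder has a warm cache for the earlier ones. Running `docker history <image>` on two consecutive releases is one way to check which layers actually changed.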

Numbers

| Layout | Total image | Per-release pull |
| --- | --- | --- |
| Before (single COPY) | ~5 GB | ~5 GB |
| #138 (two-COPY split) | 10.1 GB | ~5 GB (no improvement) |
| This PR | 5.77 GB (full) / 534 MB (slim) | ~470 MB (full) |

Test plan

- Local builds for both variants succeed end-to-end.
- `uv run pytest -m docker_e2e`: 6 passed, 2 Linux-only PUID tests skipped on macOS.
- The next `workflow_dispatch` run with `test_docker=true` will populate the GHA cache; the release after that should show short build times.

🤖 Generated with Claude Code

Reshape the Dockerfile so heavy deps live in a stable early layer (digest reproducible across releases, users cache it) and per-release cocoindex + cocoindex-code installs land in their own small layer at the end. Cuts the per-release `docker pull` from ~5 GB to ~470 MB.

Specifically:

- Drop the multi-stage builder/model_cache layout; do everything in one runtime image so each install RUN produces its own distinct layer. BuildKit COPY in a multi-stage build emits the full copied tree as a layer (not a diff); that is what made the previous two-COPY split bloat the image to ~10 GB without saving any pull cost.
- Order layers so per-release content (the source-tree-dependent install) is last; everything before it is reused across releases.
- Use `RUN --mount=type=bind,source=.,target=/ccc-src,rw=true` instead of `COPY . /ccc-src` so hatch-vcs can write `_version.py` during the PEP 517 build without persisting the source tree as a layer in the final image.

Image sizes: slim 534 MB (was 598 MB), full 5.77 GB (was 5.83 GB). Per-release layer: 468 MB (uv install on top of pre-installed ST).

Verified: docker E2E suite passes (6 passed, 2 Linux-only skipped on macOS).

georgeh0 temporarily deployed to docker-hub April 15, 2026 00:35 — with GitHub Actions Inactive

georgeh0 had a problem deploying to testpypi April 15, 2026 00:35 — with GitHub Actions Failure

georgeh0 merged commit 00ae2d2 into main Apr 15, 2026
9 of 10 checks passed

georgeh0 deleted the g/docker-layer-cache-v2 branch April 15, 2026 00:38

georgeh0 mentioned this pull request Apr 15, 2026

ci(docker): switch to registry-backed BuildKit cache (GHCR) #140

Merged

georgeh0 merged 1 commit into main from g/docker-layer-cache-v2
