Skip to content

Add cache push/pull#388

Draft
mbj wants to merge 10 commits intomainfrom
cache/push-pull
Draft

Add cache push/pull#388
mbj wants to merge 10 commits intomainfrom
cache/push-pull

Conversation

@mbj
Copy link
Copy Markdown
Owner

@mbj mbj commented Apr 14, 2026

No description provided.

mbj added 10 commits April 15, 2026 01:13
Enables Name to be deserialized from a plain string via
#[serde(try_from = "String")] and a TryFrom<String> impl that
delegates to FromStr. Lets downstream crates use Name directly
in serde-derived config types without a custom deserialize helper.
Replace the nom-combinator `Parse` impl and the separate hand-rolled
helpers with a single `const fn consume_path_component` that implements
the full grammar (alpha-numeric blocks + [_.]|__|[-]* separators) and
returns the number of bytes consumed.

Both call sites delegate to it:
- `PathComponent::from_static_or_panic` (new) — requires the full input
  to be consumed, panics otherwise. Usable at compile time for constants.
- `impl Parse for PathComponent` — uses the consumed length as the nom
  match boundary, leaving the remainder for the next combinator.

Storage changes from `String` to `Cow<'static, str>` so the const
branch can hold a borrowed `&'static str`.

Grammar is unchanged; the existing PathComponent doctest (with `_`,
`.`, `__`, multi-dash edge cases) still passes through the new wrapper.
Added const-context tests for the new constructor covering valid
inputs and the three rejection modes (empty, leading separator,
trailing orphan separator).
A Path is defined by the OCI grammar as one-or-more path components.
The previous `Path(Vec<PathComponent>)` storage made non-emptiness an
invariant that constructors had to enforce at runtime, which leaked
`Option` / `expect("non-empty")` into callers that want to build a Path
programmatically.

Change storage to `Path { first, rest }` so non-emptiness is encoded
in the type. The nom Parse impl now uses `pair(first, many0(separated
by /))` which naturally produces the shape. Public API:

- `Path::iter()` — iterate all components starting with first
- `Path::extended(component)` — append a component, returning a new
  non-empty Path
- `impl From<PathComponent> for Path` — build a single-component Path

`components()` is removed; callers use `iter()` instead. The Path
doctest is updated accordingly.
Previously `pull_image` panicked on any failure via `.unwrap()`. The
upcoming cache-walk-back flow needs to distinguish "the registry
does not have this tag" from "something else went wrong" so it can
fall through to an older cache stage without silently swallowing
auth or network errors.

Introduce `PullError::{NotFound, Other}` and a pure
`classify_pull_result` helper that maps (exit status, stderr bytes)
to the variant. NotFound is detected via the OCI distribution-spec
`MANIFEST_UNKNOWN` error code as rendered on stderr by docker,
podman, and skopeo — the substring `"manifest unknown"`. The
const comment documents why this is load-bearing string matching
and why every better-looking alternative (engine sockets, CLI
--json flags, manifest-inspect, direct registry HTTP) was rejected.

`pull_image_if_absent` now propagates the Result. The three
existing callers (two in pg-ephemeral/tests/base.rs, two in
ociman/tests/integration.rs, one in the ociman internal
backend::tests module) all `.unwrap()` — they want must-succeed
semantics.

Reference fields in PullError are boxed to stay under
clippy::result-large-err's 128-byte threshold.

Unit tests cover the classifier directly with canned stderr
strings for the four interesting cases: success, podman
not-found, docker not-found, auth failure, network error.
No network, no daemon, no registry required.
Allows configuring an OCI registry prefix (e.g. ghcr.io/myorg) that
is applied to every cache image reference the tool builds. Does not
affect the cache key hash — the hex suffix is content-derived only —
so two machines pointed at different registries still agree on what
content hashes to and can share a remote cache if they point at the
same one.

Plumbed through the usual defaults/instance/overrides precedence:
- Top-level Config default
- Per-instance InstanceDefinition override
- CLI `--cache-registry` flag override
- Threaded into Instance and Definition
- Delivered to CacheStatus::from_cache_key via LoadedSeeds::load

Cache references are built programmatically using the new Path API
(`Path::from(component).extended(...)`) and the `PG_EPHEMERAL_COMPONENT`
const (via `PathComponent::from_static_or_panic`). No string
concatenation + re-parse round-trip.

CacheKey changes from `[u8; 32]` to `sha2::digest::Output<Sha256>`
so the existing `impl From<Output<Sha256>> for Tag` in ociman can
produce the tag directly.

Integration test verifies both halves of the contract — that the
registry prefix is applied, and that the hex hash is identical with
and without the registry set.
Mirror of the pull_image change. Previously push_image panicked on
any failure via `.unwrap()`. The upcoming `cache push` subcommand
needs to surface failures per-stage so the user can see which cache
images failed to upload without the process just dying mid-chain.

Unlike PullError, there's no useful sub-discrimination on push —
every failure (auth, network, rate limit, missing local image)
collapses into "the upload didn't happen", so PushError is a single
struct with the reference + stderr-derived message. Reference is
boxed for the same result_large_err reason as PullError's variants.

The classifier is extracted as a pure `classify_push_result`
function and unit-tested with canned stderr, same pattern as pull.
No external callers of push_image exist yet, so no caller updates.
Consumes the fallible pull_image/push_image primitives and the
cache_registry config option to move cache images between the local
store and a remote OCI registry.

`cache pull` walks the seed chain from tip backwards. For each stage:
- Uncacheable → skip, keep walking.
- Hit (already local) → return, nothing to pull.
- Miss → try backend.pull_image. On success return; on PullError::NotFound
  walk to the next older stage; on PullError::Other abort.

This gives "newest cached stage the registry has" in a single network
round-trip in the happy case, and gracefully degrades through colder
cache states on incremental miss.

`cache push` iterates forward and pushes every stage whose local cache
status is Hit. Miss and Uncacheable stages are skipped. Aborts on the
first push failure (typical CI shape is `cache pull && cache populate
&& cache push`, and you want to see push failures loudly).

Both commands error with a clear message if `cache_registry` is not
configured — there's nothing to pull from or push to.

Also switches the binary's error output from the derived `Debug` form
(e.g. "Error: CacheSync(RegistryNotSet)") to the user-facing `Display`
form (e.g. "Error: cache_registry must be set …"). This affects every
error the CLI can surface, not just the new ones.

Integration tests cover the no-registry error path for both
subcommands end-to-end via run_pg_ephemeral_expect_failure. A real
registry-backed push/pull round trip is still out of scope — it
would require spinning up a registry:2 container in the test harness.

README and CHANGELOG updated.
Implements an end-to-end verification of the cache push/pull round
trip against a real OCI registry. Invoked as

    manager pg-ephemeral github-actions cache-registry test

from the CI workflow after `docker login ghcr.io`.

Flow: populate → push → reset --force → status (assert all miss) →
pull → status (assert tip hit). Exercises the walk-back path
implicitly — with two seed stages and a reset local store, the tip
must be pulled fresh from the registry for the assertion to hold.

The command uses a typed `Error` enum (thiserror) so failures
propagate cleanly through `?` instead of panicking, and so CI log
output distinguishes the failure mode (binary missing, scratch-dir
I/O, subprocess spawn, JSON shape, status mismatch) rather than
collapsing everything to a panic backtrace. Adds thiserror as a
manager dependency.

The registry path and instance name are derived from the
`GITHUB_REPOSITORY` and `GITHUB_RUN_ID` environment variables set
by GitHub Actions, with sane fallbacks so the command is at least
invocable locally for debugging. Each CI run gets a unique
instance name via GITHUB_RUN_ID so concurrent jobs don't collide.

A new top-level `GithubActions` subcommand under `pg-ephemeral`
namespaces this and any future GHA-only test subcommands, keeping
them out of the paths that need to work on a vanilla local
checkout.

This commit does NOT wire the step into ci.yml — that comes next,
once the manager-side logic is reviewable in isolation.
Adds a `Run cache-registry end-to-end test` step that invokes
`manager pg-ephemeral github-actions cache-registry test`, gated to
the x86_64-unknown-linux-musl matrix entry. This is a registry
integration check, not a platform compat check, so running it on
every matrix target would be waste.

The step is preceded by a `docker login ghcr.io` using the automatic
GITHUB_TOKEN so pg-ephemeral's backend can authenticate with ghcr.io
during push/pull without any repo-level secret setup.

Adds `packages: write` to the job's permissions so GITHUB_TOKEN has
the scope needed to push to ghcr.io/${{ github.repository }}/...

Runs on every push (no branch filter) — the whole point of the
test is that it catches registry-level breakage on the branch
where it was introduced, not only after merge to main.
@mbj mbj force-pushed the cache/push-pull branch from cd1bc2c to 8e07aee Compare April 15, 2026 01:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant