Skip to content

Guard entire attach against overwriting checkpoints from other machines#1014

Merged
Soph merged 9 commits into
mainfrom
soph/attach-checkpoint-safety
Apr 24, 2026
Merged

Guard entire attach against overwriting checkpoints from other machines#1014
Soph merged 9 commits into
mainfrom
soph/attach-checkpoint-safety

Conversation

@Soph
Copy link
Copy Markdown
Collaborator

@Soph Soph commented Apr 23, 2026

https://entire.io/gh/entireio/cli/trails/cf1eb592e195

Summary

A production report surfaced a scenario where entire attach could silently clobber a colleague's session data. If Alice creates checkpoint abc123 on her machine and pushes the code commit (with Entire-Checkpoint: abc123 trailer) without also pushing entire/checkpoints/v1, then Bob pulls the commit and runs entire attach, Bob's attach reuses ID abc123, writes a fresh session 0 locally, and on push overwrites Alice's session data.

This branch adds two layers of defence and one diagnostic tripwire.

Changes

entire attach refuses to overwrite

When HEAD already carries an Entire-Checkpoint: trailer, attach now verifies the checkpoint is present on the local entire/checkpoints/v1 branch before writing:

  1. Fast path: read locally first — no network if we already have it.
  2. Fallback: if missing, run the same metadata fetch chain entire resume uses (checkpoint_remote → treeless origin fetch → full origin fetch → remote-tracking).
  3. Refuse: if it's still missing after the refresh, error out with an actionable hint. The suggested git fetch command targets the configured checkpoint_remote when present, otherwise origin.

Tripwire in writeStandardCheckpointEntries

If a session-0 write is about to replace metadata from a different session ID (a tree-corruption / stale-summary shape, not a routine overwrite path), emit a loud logging.Warn with both session IDs so future regressions leave a trace.

Tests

  • TestAttach_AppendsAsAdditionalSessionWhenIDDiffers — positive path: two attaches on the same commit land as session 0 and session 1 under the same checkpoint.
  • TestAttach_RefusesWhenCheckpointMissingFromLocalBranch — negative path: commit bears a trailer for a nonexistent checkpoint; attach errors with the actionable hint and creates nothing.
  • TestWriteStandardCheckpointEntries_WarnsOnUnexpectedSessionZeroOverwrite — verifies the tripwire fires and captures both session IDs.

Behaviour notes

  • Offline repeat-attach on an already-local checkpoint: zero network calls.
  • Offline attach against a missing checkpoint: best-effort fetch logs a warning, then the refuse error prints.
  • Commits without an Entire-Checkpoint: trailer are unaffected — new attaches create fresh IDs as before.
  • The refuse guard protects both the v1 write and the v2 dual-write, since it runs before either.

Test plan

  • mise run check
  • Manual: create a checkpoint locally, delete entire/checkpoints/v1 locally, run entire attach against the commit → verify refuse + hint
  • Manual: run entire attach twice on the same session-free commit → verify second attach appends as session 1 without a network fetch

Note

Medium Risk
Medium risk because it changes entire attach behavior around checkpoint reuse and git metadata fetching, which can affect users’ ability to attach sessions and may surface new failure modes in repos with unusual ref/remote setups.

Overview
Prevents entire attach from accidentally overwriting an existing checkpoint ID created on another machine by verifying the referenced checkpoint exists locally before writing; if missing, it attempts the same metadata refresh chain as resume and then refuses with an actionable git fetch hint.

Adds a diagnostic tripwire in checkpoint writes that warns when a session-0 write would replace metadata for a different session ID (stale/corrupt tree shape), plus new tests covering append-as-new-session behavior, refusal on missing checkpoint metadata, and the warning emission.

Reviewed by Cursor Bugbot for commit a5be08c. Configure here.

Soph added 4 commits April 23, 2026 18:09
Entire-Checkpoint: e83cb707c70f
Entire-Checkpoint: 321e85e6b869
Entire-Checkpoint: e7316f61963c
Skip the metadata refresh when the checkpoint is already on the local
entire/checkpoints/v1 branch — no network round-trip on repeat attaches.

When the refuse path fires, suggest a fetch from the configured
checkpoint_remote URL if one is set, instead of unconditionally pointing
users at origin.

Entire-Checkpoint: 941a2b7d55bf
Copilot AI review requested due to automatic review settings April 23, 2026 17:42
@Soph Soph requested a review from a team as a code owner April 23, 2026 17:42
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Guards entire attach against accidentally overwriting checkpoint metadata when a commit references an Entire-Checkpoint: ID that isn’t present locally, and adds a logging “tripwire” to help detect unexpected session-0 overwrites.

Changes:

  • Add an entire attach pre-write guard that verifies the referenced checkpoint exists locally (with a refresh attempt) and refuses with an actionable git fetch hint if not.
  • Add a warning tripwire in writeStandardCheckpointEntries when session 0 would overwrite metadata from a different session ID.
  • Add tests covering: session appends on repeated attach, refusal on missing checkpoint, and tripwire warning emission.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
cmd/entire/cli/attach.go Adds checkpoint-availability guard + fetch-command hint; ensures logs are closed per run.
cmd/entire/cli/attach_test.go Adds regression tests for “append vs overwrite” and “refuse when missing” behavior.
cmd/entire/cli/checkpoint/committed.go Adds tripwire warning when session-0 overwrite looks suspicious.
cmd/entire/cli/checkpoint/committed_tripwire_test.go Adds a unit test asserting the tripwire warning is logged with both session IDs.

Comment thread cmd/entire/cli/attach.go Outdated
Comment thread cmd/entire/cli/attach.go Outdated
Comment thread cmd/entire/cli/checkpoint/committed_tripwire_test.go
Soph and others added 2 commits April 23, 2026 20:25
The fast path was reading the checkpoint via the go-git store, which
silently falls back to refs/remotes/origin/entire/checkpoints/v1 when
the local branch is missing. So a freshly-cloned repo where the
checkpoint exists only on the remote-tracking ref passed the guard,
then WriteCommitted created an orphan local branch with an empty tree,
and the push would have clobbered the remote — the exact case the
guard is meant to block.

Check for the local branch ref explicitly before trusting the read.
In v2-only mode, use the v2 store instead (refs/entire/main has no
remote-tracking analog, so the equivalent of this hole doesn't exist
there).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: eb718cd31155
The assertion reads the WARN line from the log file, which only gets
written when the configured level is WARN or lower. A dev (or CI job)
running with ENTIRE_LOG_LEVEL=error would see an empty log and a
spurious failure. Setenv the level at the top of the test so the
assertion is environment-independent.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 03c1d6bf9b6b
Comment thread cmd/entire/cli/checkpoint/committed.go Outdated
Comment thread cmd/entire/cli/attach.go Outdated
computermode
computermode previously approved these changes Apr 23, 2026
Copy link
Copy Markdown
Contributor

@computermode computermode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some questions for my understanding but looks good 🙌🏻

In v2-only mode (checkpoints_version: 2) the guard was telling users to
fetch entire/checkpoints/v1, which is the wrong ref — v2 lives under
refs/entire/checkpoints/v2/main. The refresh step also called
getMetadataTree (v1) instead of getV2MetadataTree, so the best-effort
refresh did nothing useful when v2 was the active format.

Branch both the refresh and the fetch-hint on the storage version. v1
stays as-is; v2-only points at refs/entire/checkpoints/v2/main with a
fully-qualified refspec (short names don't resolve outside refs/heads/).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 5ff33f043f7c
Soph and others added 2 commits April 23, 2026 21:52
If a v1 checkpoint write targets slot 0 and the tree already holds
session-0 metadata for a different sessionID, stop instead of logging a
warning and proceeding. That shape means either tree corruption or a
stale root summary; silently overwriting data we don't know about is
the wrong default.

findSessionIndex only picks slot 0 when existingSummary is nil or when
the summary claims slot 0 is ours, so this check never fires on the
normal append-a-session path. In condensation (post-commit hook) the
error is swallowed as before — but that path can't produce this shape
because condensation always writes under a fresh or summary-coherent ID.
The paths that can reach it (attach, migrate) are foreground, so the
refuse surfaces to the user.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 8518f599b0e2
v2 is about to become the primary storage format, and
writeMainCheckpointEntries has the exact same clobber shape as v1: if
slot 0 already holds metadata for a different sessionID and
findSessionIndex still picks 0, writeMainSessionToSubdirectory wipes
the subtree and writes ours on top. Mirror the v1 refuse.

Drop the "run 'entire doctor' to investigate" breadcrumb from both
messages — doctor only checks branch-level disconnection and v2 ref
health, not within-tree summary/session-ID coherence. Replace with the
git ls-tree invocation the user can actually run to capture the shape.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 664ec1791597
Copy link
Copy Markdown
Contributor

@computermode computermode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you!

@Soph Soph merged commit fcae9bd into main Apr 24, 2026
9 checks passed
@Soph Soph deleted the soph/attach-checkpoint-safety branch April 24, 2026 07:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

4 participants