V2 checkpoints migration performance improvements #1147

Merged: computermode merged 10 commits into main from migration-performance-improvements on May 8, 2026

@computermode computermode commented May 8, 2026

https://entire.io/gh/entireio/cli/trails/321

Summary

Tested on a repo with ~2500 checkpoints. Migration time dropped from ~5.5 minutes (migrate_checkpoints 337738ms) to ~3.5 minutes (migrate_checkpoints 216869ms), roughly a 36% reduction.

Validated by running a fresh migration of the repo locally and comparing the result with the version pushed to GitHub; the pushed data showed no regressions. See the comparison report here: https://gist.github.com/computermode/1c06c434317fb8fe7df18b4598913ab8

Script to compare repos: https://gist.github.com/computermode/599fea82d7ae0147716997de8f19576a

Biggest Redundancies

| Area | Before | After |
| --- | --- | --- |
| Raw transcript packing | Re-chunked and re-stored v1 transcript blobs into v2 | Reuses existing v1 blob hashes |
| Generation metadata | Rescanned raw transcripts for timestamps | Uses already-loaded checkpoint CreatedAt first |
| Compact offsets | Compacted full transcript, then compacted scoped suffix again | Caches raw-position -> compact-line offset |
| /main flush | Repeated tree surgery per checkpoint inside each batch | Builds subtrees, applies root changes once |
| V2 existence checks | Read v2 summary per checkpoint | Builds one v2 presence index |
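
For illustration, the last row of the table (one presence index instead of a summary read per checkpoint) can be sketched as follows. This is a minimal sketch, not the code in this PR: `presenceIndex`, `buildPresenceIndex`, and `pendingMigration` are hypothetical names, and checkpoint IDs are assumed to be plain strings.

```go
package main

import "fmt"

// presenceIndex is a hypothetical set of checkpoint IDs already present in v2.
type presenceIndex map[string]struct{}

// buildPresenceIndex walks the list of v2 checkpoint IDs once and records them.
func buildPresenceIndex(v2IDs []string) presenceIndex {
	idx := make(presenceIndex, len(v2IDs))
	for _, id := range v2IDs {
		idx[id] = struct{}{}
	}
	return idx
}

// pendingMigration returns the v1 IDs that are not yet in v2, preserving order.
// Each membership test is a map lookup instead of a per-checkpoint read.
func pendingMigration(v1IDs []string, idx presenceIndex) []string {
	var pending []string
	for _, id := range v1IDs {
		if _, ok := idx[id]; !ok {
			pending = append(pending, id)
		}
	}
	return pending
}

func main() {
	idx := buildPresenceIndex([]string{"a", "b"})
	fmt.Println(pendingMigration([]string{"a", "c", "b", "d"}, idx))
}
```

The index costs one pass over the v2 side up front; after that, skipping an already-migrated checkpoint no longer requires touching the store.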

Where It Showed Up In Trace

  • pack_full_generation_total: duplicate raw transcript packing
  • pack_full.generation_timestamps_total: raw transcript timestamp rescans
  • migrate_one.compact_transcript_total: duplicate compact passes
  • flush_main_total: repeated /main tree rewrites
  • per-checkpoint migration overhead: repeated v2 summary lookups

Key Point

The batch size was already 100; the slowdown was inside each batch from repeated blob work, transcript parsing, and Git tree rewrites.


Note

Medium Risk
Touches checkpoint migration and v2 git ref-writing logic; while covered by new invariants/tests, mistakes could lead to missing or mis-indexed checkpoint data during migration.

Overview
Speeds up v1→v2 checkpoint migration by batching v2 /main writes: migration now buffers per-session WriteCommittedOptions and flushes them via a new WriteCommittedMainBatch that updates /main with a single commit/CAS per batch.
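
A toy model of the batching idea, under stated assumptions: `writeOpts` and `mainRef` are hypothetical stand-ins (the real WriteCommittedOptions and Git tree handling are not shown in this PR page), and a "commit" is modeled as a counter. The sketch only illustrates that staging every entry and committing once yields the same tree as per-session writes with fewer ref updates.

```go
package main

import "fmt"

// writeOpts is a hypothetical stand-in for the buffered per-session options.
type writeOpts struct {
	CheckpointID string
	SessionID    string
}

// mainRef is a toy model of the /main ref: path -> content, plus a commit counter.
type mainRef struct {
	tree    map[string]string
	commits int
}

func newMainRef() *mainRef { return &mainRef{tree: map[string]string{}} }

func entryPath(o writeOpts) string { return o.CheckpointID + "/" + o.SessionID }

// writeOne models the per-session path: one commit (ref update) per call.
func (r *mainRef) writeOne(o writeOpts) {
	r.tree[entryPath(o)] = "metadata"
	r.commits++
}

// writeBatch models the batched path: stage every entry, then commit once.
func (r *mainRef) writeBatch(batch []writeOpts) {
	if len(batch) == 0 {
		return // nothing staged, so no commit
	}
	for _, o := range batch {
		r.tree[entryPath(o)] = "metadata"
	}
	r.commits++
}

// sameTree reports whether two toy trees hold identical entries.
func sameTree(a, b map[string]string) bool {
	if len(a) != len(b) {
		return false
	}
	for k, v := range a {
		if w, ok := b[k]; !ok || w != v {
			return false
		}
	}
	return true
}

func main() {
	batch := []writeOpts{{"cp1", "s1"}, {"cp1", "s2"}, {"cp2", "s1"}}
	seq, bat := newMainRef(), newMainRef()
	for _, o := range batch {
		seq.writeOne(o)
	}
	bat.writeBatch(batch)
	fmt.Println(sameTree(seq.tree, bat.tree), seq.commits, bat.commits)
}
```

The two properties checked here (equal trees, a single commit for the batch) mirror the batch-vs-sequential and single-commit tests mentioned in this overview.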

Avoids redundant work during /full/* packing by reusing existing v1 transcript blob hashes (surfaced via SessionContent.TranscriptBlobHashes) and by preferring checkpoint CreatedAt when generating generation.json, falling back to transcript timestamp scans only when needed.

Adds a compact-transcript offset cache and cumulative perf annotation (perf.Annotate) to reduce repeated offset computation and make phase totals visible, alongside new tests validating batch-vs-sequential tree equality, single-commit behavior, and raw-blob reuse.

Reviewed by Cursor Bugbot for commit 961c195.

computermode and others added 6 commits May 7, 2026 13:00
Replace the per-session WriteCommittedWithSessionIndex call with a new
WriteCommittedMainBatch path, accumulated alongside pendingFull and
flushed at every /full pack boundary plus once at the end. Cuts /main
ref-CAS overhead from one update per session to one per generation
batch.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 8d771ca24516
…-improvements

# Conflicts:
#	cmd/entire/cli/migrate.go
Adds perf.Annotate, which attaches a synthetic child span with a
pre-computed duration to the surrounding context. Lets the migrate
loop surface cumulative time spent in migrate_one_checkpoint vs.
flush_main vs. pack_full_generation without paying the per-iteration
span cost (4k+ iterations would blow past trace.go's 1MB limit).

Each batch flush + archive pack also gets its own span, so a doctor
trace reader can tell whether a slow run is uniformly slow or bursting
on certain batches.
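
A rough sketch of the pattern this commit message describes: accumulate durations in the loop and attach one synthetic child span at the end. The `span` type and `annotate` function are hypothetical stand-ins, not the actual perf package API (which works on a context); the no-parent no-op mirrors the behavior the span tests are described as covering.

```go
package main

import (
	"fmt"
	"time"
)

// span is a hypothetical, simplified trace node.
type span struct {
	Name     string
	Duration time.Duration
	Children []*span
}

// annotate attaches a synthetic child with a pre-computed duration.
// With no parent span it does nothing.
func annotate(parent *span, name string, d time.Duration) {
	if parent == nil {
		return
	}
	parent.Children = append(parent.Children, &span{Name: name, Duration: d})
}

func main() {
	root := &span{Name: "migrate_checkpoints"}

	// Sum per-iteration time in a plain variable instead of opening a span
	// per iteration. A fixed 1ms per iteration keeps the sketch deterministic;
	// real code would measure with time.Since.
	var migrateOneTotal time.Duration
	for i := 0; i < 4000; i++ {
		migrateOneTotal += time.Millisecond
	}

	// One synthetic span carries the cumulative total into the trace.
	annotate(root, "migrate_one_checkpoint_total", migrateOneTotal)

	fmt.Println(len(root.Children), root.Children[0].Duration)
}
```

The trace then grows by one node per phase rather than one per iteration, which is the point of the size concern raised above.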

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 428acebd2ac0
Adds compact.WithOffset, which produces the full compact bytes plus the
checkpoint-start line offset in one parse. For JSONL inputs (Claude /
Cursor — the migration's hot path) the second compact pass becomes a
count-only walk over the shared parsed entries, skipping the json.Marshal
of every output line. For non-line-oriented formats (OpenCode, Gemini,
Codex) and the merge-heavy line formats (Copilot, Droid) we fall back to
running Compact twice so the stored offset stays byte-identical to the
prior `lines(full) - lines(scoped)` calculation — the user explicitly
ruled out any drift in start lines as a regression risk.

A property test pins the equivalence across every format fixture in the
package.

Migration's per-checkpoint loop now goes through compact.WithOffset; the
tryCompactTranscript / computeCompactOffset helpers stay for the resume
path's UpdateCommitted flow.
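
The equivalence this commit relies on can be sketched as below. This is an illustrative model only: `entry`, `keep`, `compact`, and `withOffset` are hypothetical, and the sketch assumes each input entry is compacted independently of its neighbours. Formats that merge adjacent entries would not satisfy that assumption, which is consistent with the fallback to two Compact passes described above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// entry is a hypothetical parsed JSONL transcript line.
type entry struct {
	Type string `json:"type"`
	Text string `json:"text"`
}

// keep decides whether an entry produces a compact output line (assumed rule).
func keep(e entry) bool { return e.Type != "progress" }

// compact marshals every kept entry into one output line.
func compact(entries []entry) ([][]byte, error) {
	var lines [][]byte
	for _, e := range entries {
		if !keep(e) {
			continue
		}
		b, err := json.Marshal(e)
		if err != nil {
			return nil, err
		}
		lines = append(lines, b)
	}
	return lines, nil
}

// offsetTwoPass is the prior calculation: lines(full) - lines(scoped).
func offsetTwoPass(entries []entry, start int) (int, error) {
	full, err := compact(entries)
	if err != nil {
		return 0, err
	}
	scoped, err := compact(entries[start:])
	if err != nil {
		return 0, err
	}
	return len(full) - len(scoped), nil
}

// withOffset compacts once, then counts the scoped suffix without marshaling.
func withOffset(entries []entry, start int) ([][]byte, int, error) {
	full, err := compact(entries)
	if err != nil {
		return nil, 0, err
	}
	scopedLines := 0
	for _, e := range entries[start:] {
		if keep(e) {
			scopedLines++
		}
	}
	return full, len(full) - scopedLines, nil
}

func main() {
	es := []entry{{"user", "hi"}, {"progress", "..."}, {"assistant", "hello"}, {"user", "next"}}
	for start := 0; start <= len(es); start++ {
		a, _ := offsetTwoPass(es, start)
		_, b, _ := withOffset(es, start)
		fmt.Println(start, a, b, a == b)
	}
}
```

Checking every start position in a loop is a small-scale version of the property test mentioned above.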

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Entire-Checkpoint: 229bf9284c3e
This reverts commit 545c381.

Entire-Checkpoint: da90343bf9ae
Entire-Checkpoint: 671e0161dd29
Copilot AI review requested due to automatic review settings May 8, 2026 00:13

Copilot AI left a comment

Pull request overview

This PR targets faster v1→v2 checkpoint migration in the Entire CLI by eliminating repeated blob/transcript work and reducing repeated /main tree rewrites during migration.

Changes:

  • Adds perf.Annotate to surface cumulative (summed) timings in perf traces without per-iteration spans.
  • Speeds migration by batching /main writes (WriteCommittedMainBatch), reusing v1 transcript blob hashes when packing v2 /full/*, and caching compact-transcript offset calculations.
  • Adjusts generation metadata packing to prefer already-loaded checkpoint CreatedAt instead of rescanning raw transcripts, with corresponding test updates.
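
The CreatedAt-first behavior in the last bullet can be sketched as a simple fallback. This is a sketch under assumptions: `resolveTimestamp` and `demoScans` are hypothetical names, and "needed" is assumed here to mean that CreatedAt is the zero time; the PR page does not spell out the actual condition.

```go
package main

import (
	"fmt"
	"time"
)

// resolveTimestamp prefers the already-loaded CreatedAt and only invokes the
// expensive transcript scan when CreatedAt is unset (assumed fallback rule).
func resolveTimestamp(createdAt time.Time, scan func() (time.Time, error)) (time.Time, error) {
	if !createdAt.IsZero() {
		return createdAt, nil
	}
	return scan()
}

// demoScans resolves three checkpoints, one of them without CreatedAt,
// and reports how many transcript scans were actually performed.
func demoScans() int {
	scans := 0
	scan := func() (time.Time, error) {
		scans++ // stands in for parsing a raw transcript for timestamps
		return time.Date(2026, 5, 7, 13, 0, 0, 0, time.UTC), nil
	}
	known := time.Date(2026, 5, 8, 0, 13, 0, 0, time.UTC)
	for _, createdAt := range []time.Time{known, {}, known} {
		if _, err := resolveTimestamp(createdAt, scan); err != nil {
			panic(err)
		}
	}
	return scans
}

func main() {
	fmt.Println("scans:", demoScans())
}
```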

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| perf/span.go | Adds Annotate for synthetic child spans with precomputed duration. |
| perf/span_test.go | Adds tests for Annotate behavior (duration + no-parent no-op). |
| cmd/entire/cli/migrate.go | Refactors migration loop for batching, caching, and reduced redundant work; adds perf aggregation annotations. |
| cmd/entire/cli/migrate_test.go | Adds coverage for raw blob reuse and updates migration-related expectations. |
| cmd/entire/cli/checkpoint/v2_store_test.go | Adds correctness + perf invariants tests for WriteCommittedMainBatch. |
| cmd/entire/cli/checkpoint/v2_committed.go | Introduces WriteCommittedMainBatch and supporting subtree-building helpers. |
| cmd/entire/cli/checkpoint/committed.go | Captures v1 transcript blob hashes during session reads for reuse during migration. |
| cmd/entire/cli/checkpoint/checkpoint.go | Extends SessionContent with TranscriptBlobHashes to support blob reuse. |

Comment thread cmd/entire/cli/checkpoint/v2_committed.go
@computermode computermode changed the title Migration performance improvements V2 checkpoints migration performance improvements May 8, 2026
Entire-Checkpoint: 78a51d5b7eef
@computermode computermode marked this pull request as ready for review May 8, 2026 00:40
@computermode computermode requested a review from a team as a code owner May 8, 2026 00:40
Comment thread cmd/entire/cli/checkpoint/v2_committed.go
@computermode computermode merged commit 30e58ea into main May 8, 2026
9 checks passed
@computermode computermode deleted the migration-performance-improvements branch May 8, 2026 22:51