feat(cluster): add snapshot-then-merge architecture #29
Store per-run cluster snapshots in a new cluster_snapshots table and track active/previous run pointers in repo_cluster_state. The read path prefers the state pointer over raw "latest completed run" queries. Prune now keeps both active and previous runs instead of deleting all but the current one.

Follow-up to PR pwrdrvr#19 discussion with @huntharo on the cluster lineage tracking design.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
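The pointer-first read path and the keep-two prune rule described in this commit can be sketched roughly as follows. This is a minimal in-memory illustration under stated assumptions, not the code in service.ts: the types `RunRecord` and `RepoClusterState` and the functions `resolveActiveRun`, `flipState`, and `runsToPrune` are hypothetical names chosen for the example.

```typescript
// Hypothetical in-memory shapes; the real rows live in the database tables.
interface RunRecord {
  id: number;
  repoId: number;
  status: "completed" | "failed";
  finishedAt: number; // epoch millis, assumed for ordering
}

interface RepoClusterState {
  repoId: number;
  activeRunId: number | null;
  previousRunId: number | null;
}

// Prefer the state pointer; fall back to "latest completed run" when no pointer exists.
function resolveActiveRun(
  repoId: number,
  state: RepoClusterState | undefined,
  runs: RunRecord[],
): RunRecord | undefined {
  if (state && state.activeRunId !== null) {
    const pointed = runs.find((r) => r.id === state.activeRunId);
    if (pointed) return pointed;
  }
  return runs
    .filter((r) => r.repoId === repoId && r.status === "completed")
    .sort((a, b) => b.finishedAt - a.finishedAt)[0];
}

// Flip: the old active run becomes the previous run, the new run becomes active.
function flipState(
  state: RepoClusterState | undefined,
  repoId: number,
  newRunId: number,
): RepoClusterState {
  return { repoId, activeRunId: newRunId, previousRunId: state?.activeRunId ?? null };
}

// Prune: every run of the repo except the active and previous ones is a candidate.
function runsToPrune(state: RepoClusterState, runs: RunRecord[]): number[] {
  const keep = new Set<number>();
  if (state.activeRunId !== null) keep.add(state.activeRunId);
  if (state.previousRunId !== null) keep.add(state.previousRunId);
  return runs
    .filter((r) => r.repoId === state.repoId && !keep.has(r.id))
    .map((r) => r.id);
}
```

Keeping the fallback inside the resolver means repos that have no state row yet still resolve to their newest completed run, which matches the backward-compatibility intent stated in the PR description.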
This is a good foundation direction. One thing I would explicitly leave room for: today Because of that, I would avoid treating snapshot state as permanently “members only”. If you later want lineage and maintainer-facing decision outputs, you will probably want an extension point for derived annotations such as matched previous cluster id, merge/split classification, representative rationale, or other decision metadata, without another storage rewrite. Not a blocker, just a request to keep the current schema/read path extensible in that direction.
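One way to picture such an extension point is an optional annotations object next to the frozen member list, so that rows written before any annotations existed still parse unchanged. The sketch below is purely illustrative: the field names (`matchedPreviousClusterId`, `lineage`, `representativeRationale`), the row columns (`members_json`, `annotations_json`), and the `parseSnapshotRow` helper are assumptions, not the schema in this PR.

```typescript
// Hypothetical derived metadata; every field is optional so it can grow over time.
interface ClusterAnnotations {
  matchedPreviousClusterId?: number;
  lineage?: "merge" | "split" | "continued" | "new";
  representativeRationale?: string;
  [key: string]: unknown; // room for later decision metadata
}

interface ClusterSnapshot {
  runId: number;
  clusterId: number;
  memberThreadIds: number[];
  annotations?: ClusterAnnotations;
}

// Assumed row shape: members stored as a JSON array, annotations as nullable JSON.
interface SnapshotRow {
  run_id: number;
  cluster_id: number;
  members_json: string;
  annotations_json?: string | null;
}

function parseSnapshotRow(row: SnapshotRow): ClusterSnapshot {
  const members: unknown = JSON.parse(row.members_json);
  if (!Array.isArray(members)) {
    throw new Error("members_json must be a JSON array of thread IDs");
  }
  const snapshot: ClusterSnapshot = {
    runId: row.run_id,
    clusterId: row.cluster_id,
    memberThreadIds: members.map(Number),
  };
  // Rows without annotations (members only) stay valid.
  if (row.annotations_json) {
    snapshot.annotations = JSON.parse(row.annotations_json) as ClusterAnnotations;
  }
  return snapshot;
}
```

With this shape, readers that only care about membership can ignore `annotations` entirely, and lineage data can be added later without changing how member lists are stored.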
Agreed on keeping the extension path open. The Right now the merge output is consumed and discarded after the flip. When lineage tracking from #19 lands, the plan is to persist those merge outcomes alongside the snapshot - either as additional columns on |
Summary
Implements the snapshot-then-merge cluster storage model from the design doc. Each cluster rebuild now stores a frozen snapshot of cluster membership, and a repo_cluster_state pointer table tracks active/previous runs per repo.

Why this matters
Follow-up to the PR #19 discussion where @huntharo confirmed this direction. The current prune logic deletes all but the current run on every rebuild (service.ts:pruneOldClusterRuns), which is destructive and prevents run-to-run comparison once the lineage tracking from PR #19 lands. The design doc (docs/designs/cluster-storage-cleanup.md) already proposed repo_cluster_state and an append-only model - this PR implements it.

Changes
New tables (db/migrate.ts):
- repo_cluster_state - per-repo pointer to active and previous cluster runs
- cluster_snapshots - frozen cluster membership per run (JSON array of thread IDs)

New module (cluster/snapshot.ts):
- mergeClusterSnapshots() - Jaccard-based comparison of current vs previous snapshots
- updated, new, dissolved

Service updates (service.ts):
- clusterRepository() now calls persistClusterSnapshots() and flipClusterState() after persisting the run
- getLatestClusterRun() reads from repo_cluster_state first, falls back to raw query for backward compatibility
- listClusters() uses the updated read path
- pruneOldClusterRuns() keeps both active and previous runs instead of deleting everything

Tests (cluster/snapshot.test.ts):

Testing
- pnpm typecheck passes
- pnpm --filter @ghcrawl/api-core exec tsx --tsconfig tsconfig.test.json --test 'src/cluster/*.test.ts'

This contribution was developed with AI assistance (Claude Code).
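For readers unfamiliar with the Jaccard-based comparison mentioned for mergeClusterSnapshots(), here is a minimal sketch of how current and previous snapshots could be classified as updated, new, or dissolved. It is an illustration under assumptions, not the code in cluster/snapshot.ts: the 0.5 threshold, the greedy best-match strategy, and the names `jaccard` and `mergeSnapshotsSketch` are all hypothetical.

```typescript
type ThreadId = number;

interface Snapshot {
  clusterId: number;
  members: ThreadId[];
}

interface MergeResult {
  updated: { current: number; previous: number; similarity: number }[];
  new: number[];
  dissolved: number[];
}

// Jaccard similarity: |A ∩ B| / |A ∪ B|.
function jaccard(a: Set<ThreadId>, b: Set<ThreadId>): number {
  if (a.size === 0 && b.size === 0) return 1;
  let intersection = 0;
  for (const x of a) if (b.has(x)) intersection++;
  return intersection / (a.size + b.size - intersection);
}

// Greedy matching (an assumption): each current cluster claims its best
// unclaimed previous cluster if the similarity reaches the threshold.
function mergeSnapshotsSketch(
  current: Snapshot[],
  previous: Snapshot[],
  threshold = 0.5, // assumed value, not taken from the PR
): MergeResult {
  const prevSets = previous.map((p) => ({ id: p.clusterId, set: new Set(p.members) }));
  const claimed = new Set<number>();
  const result: MergeResult = { updated: [], new: [], dissolved: [] };

  for (const cur of current) {
    const curSet = new Set(cur.members);
    let best: { id: number; score: number } | null = null;
    for (const p of prevSets) {
      if (claimed.has(p.id)) continue;
      const score = jaccard(curSet, p.set);
      if (score >= threshold && (best === null || score > best.score)) {
        best = { id: p.id, score };
      }
    }
    if (best !== null) {
      claimed.add(best.id);
      result.updated.push({ current: cur.clusterId, previous: best.id, similarity: best.score });
    } else {
      result.new.push(cur.clusterId);
    }
  }

  // Previous clusters that nothing matched are treated as dissolved.
  for (const p of prevSets) {
    if (!claimed.has(p.id)) result.dissolved.push(p.id);
  }
  return result;
}
```

Greedy matching is the simplest option and depends on iteration order; an optimal assignment would be an alternative if ties between overlapping clusters turn out to matter.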