feat: entire prompts search - searchable prompt history from checkpoints #1211
Open
AasheeshLikePanner wants to merge 9 commits into
Implements 'entire prompts' command group:
- search: Keyword search with filters
- list: List recent prompts
- show: Display full prompt for checkpoint
- index: Manage index

Auto-rebuilds index on first search. Integrates with PostCommit hook for incremental updates.

Entire-Checkpoint: fdc9780864bb

Comprehensive doc covering:
- What was implemented (commands, files)
- Logic flow (index building, search, incremental updates)
- Algorithm details (tokenizer, scorer, locking)
- Data structures
- How to test
- Known limitations
- Future improvements
- Architecture diagram

- Fixed error wrapping (wrapcheck)
- Added NFC unicode normalization to Tokenize
- Added query guard for special characters
- Fixed file permissions (gosec)
- Added nil check handling

Remaining: 12 lint issues (mostly style)

Tests added:
- TestTokenize_stemming, stopwords, unicode, specialChars
- TestParseQuery_basic, phrase, specialChars, tooShort
- TestScore_exactPhrase, allTokens, termDensity
- TestSearch_returnsRanked, emptyQuery, filters
- BenchmarkSearch1K: 5.6ms for 1K entries (target <100ms)

All tests pass.

Implements offline-first, searchable prompt history from checkpoint data:
- Add entire prompts search/list/show/index commands
- Build NDJSON index from checkpoint metadata
- Tokenize with Porter stemmer and NFC normalization
- Weighted scoring: phrase(+10), all tokens(+5), any(+1), density(*2)
- File locking with retry and stale detection

Fixes:
- Replace bubble sort O(n²) with sort.Slice O(n log n)
- Add 3-retry lock with backoff to prevent data loss
- Add stale lock detection for crash recovery

Follow-ups (documented):
- ReviewPrompt not wired (agent_review kind)
- --verify flag is placeholder
- TokenCount/ParentCheckpointID/SubagentDepth not populated

Entire-Checkpoint: 7a862f395125
Summary:
Implements entire prompts, a local, offline-first command for searching the prompts behind your checkpoint history. This is the "search" feature from the roadmap: surfacing the why behind commits, not just the what.

What's in this PR
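Four commands:
- entire prompts search: keyword search with filters
- entire prompts list: list recent prompts
- entire prompts show: display the full prompt for a checkpoint
- entire prompts index: manage the index

Filters on search: --agent, --branch, --kind, --after, --files, --limit, --json

How it works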
On every commit, the PostCommit hook appends a new entry to .entire/prompts/index.ndjson, a gitignored, append-only newline-delimited JSON file that lives next to your repo. No external service, no database, works offline.

On search, the index loads into memory and each entry gets scored:
- exact phrase match: +10
- all query tokens present: +5
- any query token present: +1
- term density: *2
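To make the scoring concrete, here is a minimal sketch using the weights from the commit message (phrase +10, all tokens +5, any +1, density *2). The `Entry` and `Scored` types, the function names, and the way the weights are summed are assumptions for illustration; the PR's actual scorer may combine them differently.

```go
package main

import (
	"sort"
	"strings"
)

// Entry is a hypothetical, pared-down index record.
type Entry struct {
	CheckpointID string
	Text         string   // lowercased prompt text, used for phrase matching
	Tokens       []string // tokenized prompt text
}

// Scored pairs an entry with its relevance score.
type Scored struct {
	Entry Entry
	Score float64
}

// Score sums the four weights. Treating them as additive, and density as
// 2 * (matching tokens / total tokens), is an assumption.
func Score(e Entry, queryTokens []string, phrase string) float64 {
	score := 0.0
	if phrase != "" && strings.Contains(e.Text, phrase) {
		score += 10 // exact phrase
	}
	if len(queryTokens) == 0 || len(e.Tokens) == 0 {
		return score
	}
	counts := make(map[string]int, len(e.Tokens))
	for _, t := range e.Tokens {
		counts[t]++
	}
	matched, hits := 0, 0
	for _, q := range queryTokens {
		if c := counts[q]; c > 0 {
			matched++
			hits += c
		}
	}
	if matched == 0 {
		return score
	}
	score++ // at least one query token present
	if matched == len(queryTokens) {
		score += 5 // every query token present
	}
	score += 2 * float64(hits) / float64(len(e.Tokens)) // term density
	return score
}

// Rank drops non-matching entries and sorts the rest by descending score.
func Rank(entries []Entry, queryTokens []string, phrase string) []Scored {
	var out []Scored
	for _, e := range entries {
		if s := Score(e, queryTokens, phrase); s > 0 {
			out = append(out, Scored{Entry: e, Score: s})
		}
	}
	sort.SliceStable(out, func(i, j int) bool { return out[i].Score > out[j].Score })
	return out
}
```

The sketch assumes the query tokens are distinct; a duplicated query token would be counted twice in the density term.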
Tokenization runs NFC Unicode normalization → lowercase → word boundary split → stopword filter → Porter stemmer. So "caching" matches prompts containing "cache", "cached", "caches".

Queries are sanitised before tokenizing: regex metacharacters stripped, minimum 2-char guard, quoted phrases extracted for exact matching.
File locking on writes: O_CREATE|O_EXCL for atomic acquisition, 3 retries with 50ms backoff for concurrent PostCommit hooks, stale lock detection (>30s) to survive crashes.

Architecture decisions
Porter stemmer (github.com/kljensen/snowball): one new pure-Go dependency, zero CGO

Tests
19 tests across rank_test.go and store_test.go.

Benchmarks: BenchmarkSearch1K runs in 5.6ms for 1K entries (target <100ms).
Tested live against 4 real checkpoints, 94 prompts, 98KB index. Search, list, show all working end-to-end.
Known gaps
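Three things are stubbed or incomplete; none affect the core feature working:
- ReviewPrompt for agent_review kind not yet wired: entries with kind: agent_review fall back to empty prompt text. Will fix in follow-up.
- TokenCount, ParentCheckpointID, SubagentDepth fields exist in the schema but aren't populated from metadata yet.
- entire prompts index --verify flag exists but is a no-op placeholder.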
ReviewPromptforagent_reviewkind not yet wired — entries withkind: agent_reviewfall back to empty prompt text. Will fix in follow-up.TokenCount,ParentCheckpointID,SubagentDepthfields exist in the schema but aren't populated from metadata yet.entire prompts index --verifyflag exists but is a no-op placeholder.