feat: entire prompts search - searchable prompt history from checkpoints#1211

Open
AasheeshLikePanner wants to merge 9 commits into
entireio:mainfrom
AasheeshLikePanner:feature/searchable-prompts

@AasheeshLikePanner

Summary:

Implements entire prompts — a local, offline-first command for searching the prompts behind your checkpoint history. This is the "search" feature from the roadmap: surfacing the why behind commits, not just the what.

What's in this PR

Four commands:

entire prompts search "cache decision"     # full-text search with filters
entire prompts list                         # recent prompts, newest first
entire prompts show <checkpoint-id>        # full prompt text for a checkpoint
entire prompts index --rebuild / --status  # manage the local index

Filters on search: --agent, --branch, --kind, --after, --files, --limit, --json

How it works

On every commit, the PostCommit hook appends a new entry to .entire/prompts/index.ndjson — a gitignored, appendable newline-delimited JSON file that lives next to your repo. No external service, no database, works offline.

On search, the index loads into memory and each entry gets scored:

  • Exact phrase match → +10
  • All query tokens found → +5
  • Any token found → +1
  • Term density bonus → up to +2
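
The weights above could combine roughly as follows. This sketch assumes the bonuses are additive and that "term density" means the fraction of an entry's tokens that match a query token, scaled to a maximum of 2. Both are assumptions for illustration, and the `score` signature is hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// score ranks one index entry against a query.
// Assumed rules: +10 exact phrase, +5 all query tokens present,
// +1 any token present, plus up to +2 for term density.
func score(queryTokens []string, phrase string, docTokens []string, docText string) float64 {
	if len(queryTokens) == 0 || len(docTokens) == 0 {
		return 0
	}
	want := make(map[string]bool, len(queryTokens))
	for _, t := range queryTokens {
		want[t] = true
	}
	found := make(map[string]bool)
	hits := 0
	for _, t := range docTokens {
		if want[t] {
			found[t] = true
			hits++
		}
	}
	var s float64
	if phrase != "" && strings.Contains(strings.ToLower(docText), strings.ToLower(phrase)) {
		s += 10 // exact phrase match
	}
	if len(found) == 0 {
		return s
	}
	if len(found) == len(want) {
		s += 5 // every distinct query token appears
	}
	s++                                              // at least one token appears
	s += 2 * float64(hits) / float64(len(docTokens)) // density bonus, at most 2
	return s
}

func main() {
	q := []string{"cache", "decision"}
	fmt.Println(score(q, "cache decision", []string{"cache", "decision"}, "Cache decision"))
	fmt.Println(score(q, "cache decision", []string{"cache", "layer", "redis", "ttl"}, "cache layer redis ttl"))
}
```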

Tokenization runs NFC Unicode normalization → lowercase → word boundary split → stopword filter → Porter stemmer. So a query for "caching" matches prompts containing "cache", "cached", or "caches", because all of them reduce to the same stem.
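
The ordering of the pipeline can be sketched with the standard library alone. To keep the example free of third-party imports, the normalization and stemming steps are passed in as functions. `toyStem` is a crude suffix stripper used only as a stand-in, not the Porter algorithm, and the stopword list is a hypothetical sample.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// stopwords is a small illustrative sample, not the list used in the PR.
var stopwords = map[string]bool{
	"a": true, "an": true, "the": true, "and": true, "of": true, "to": true, "in": true,
}

// tokenize applies: normalize -> lowercase -> split on non-alphanumerics ->
// drop stopwords -> stem. normalize and stem are injected by the caller.
func tokenize(text string, normalize, stem func(string) string) []string {
	text = strings.ToLower(normalize(text))
	words := strings.FieldsFunc(text, func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	tokens := make([]string, 0, len(words))
	for _, w := range words {
		if stopwords[w] {
			continue
		}
		tokens = append(tokens, stem(w))
	}
	return tokens
}

// toyStem strips a few common English suffixes. It is a placeholder only
// and will mis-stem many words that a real Porter stemmer handles.
func toyStem(w string) string {
	for _, suf := range []string{"ing", "ed", "es", "e", "s"} {
		if len(w) > len(suf)+2 && strings.HasSuffix(w, suf) {
			return strings.TrimSuffix(w, suf)
		}
	}
	return w
}

func main() {
	identity := func(s string) string { return s } // stand-in for NFC normalization
	fmt.Println(tokenize("Caching the Cache, cached!", identity, toyStem))
}
```

In a real build the two injected functions would presumably wrap an NFC normalizer and the snowball stemmer named under "Architecture decisions".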

Queries are sanitised before tokenizing — regex metacharacters stripped, minimum 2-char guard, quoted phrases extracted for exact matching.

File locking on writes: O_CREATE|O_EXCL for atomic acquisition, 3 retries with 50ms backoff for concurrent PostCommit hooks, stale lock detection (>30s) to survive crashes.

Architecture decisions

  • NDJSON over SQLite — appendable without full rewrites, no CGO, human-readable, portable
  • Porter stemmer (github.com/kljensen/snowball) — one new pure-Go dependency, zero CGO
  • Local index, not cloud — prompts are personal context, should stay local and work offline
  • PostCommit hook integration — index updates happen transparently, no user action needed

Tests

19 tests across rank_test.go and store_test.go:

  • Tokenizer: stemming, stopwords, unicode normalization, special chars
  • Query parser: basic, phrase extraction, regex stripping, min length
  • Scorer: exact phrase, all tokens, term density
  • Search: ranking, empty query, filter application
  • Store: concurrent writes, single entry, empty slice, lock contention

Benchmarks:

  • Tokenize: ~0.1ms
  • Search 1K entries: 5.6ms (target <100ms)
  • Index load 1K entries: 2.8ms (target <50ms)

Tested live against 4 real checkpoints, 94 prompts, 98KB index. Search, list, show all working end-to-end.

Known gaps

Three things are stubbed or incomplete; none of them blocks the core feature:

  • ReviewPrompt for agent_review kind not yet wired — entries with kind: agent_review fall back to empty prompt text. Will fix in follow-up.
  • TokenCount, ParentCheckpointID, SubagentDepth fields exist in the schema but aren't populated from metadata yet.
  • entire prompts index --verify flag exists but is a no-op placeholder.

Implements 'entire prompts' command group:
- search: Keyword search with filters
- list: List recent prompts
- show: Display full prompt for checkpoint
- index: Manage index

Auto-rebuilds index on first search.
Integrates with PostCommit hook for incremental updates.

Entire-Checkpoint: fdc9780864bb

Comprehensive doc covering:
- What was implemented (commands, files)
- Logic flow (index building, search, incremental updates)
- Algorithm details (tokenizer, scorer, locking)
- Data structures
- How to test
- Known limitations
- Future improvements
- Architecture diagram

- Fixed error wrapping (wrapcheck)
- Added NFC unicode normalization to Tokenize
- Added query guard for special characters
- Fixed file permissions (gosec)
- Added nil check handling

Remaining: 12 lint issues (mostly style)

Tests added:
- TestTokenize_stemming, stopwords, unicode, specialChars
- TestParseQuery_basic, phrase, specialChars, tooShort
- TestScore_exactPhrase, allTokens, termDensity
- TestSearch_returnsRanked, emptyQuery, filters
- BenchmarkSearch1K: 5.6ms for 1K entries (target <100ms)

All tests pass.

Implements offline-first, searchable prompt history from checkpoint data:

- Add entire prompts search/list/show/index commands
- Build NDJSON index from checkpoint metadata
- Tokenize with Porter stemmer and NFC normalization
- Weighted scoring: phrase (+10), all tokens (+5), any token (+1), density (up to +2)
- File locking with retry and stale detection

Fixes:
- Replace bubble sort O(n²) with sort.Slice O(n log n)
- Add 3-retry lock with backoff to prevent data loss
- Add stale lock detection for crash recovery

Follow-ups (documented):
- ReviewPrompt not wired (agent_review kind)
- --verify flag is placeholder
- TokenCount/ParentCheckpointID/SubagentDepth not populated

@AasheeshLikePanner AasheeshLikePanner requested a review from a team as a code owner May 14, 2026 04:04

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.
