Proj/curation enhancement #601
Collapse the abstract queue's per-file LLM calls (2N today: one L0
.abstract.md + one L1 .overview.md per file) into batched calls of up
to BATCH_SIZE_CAP=5 files per cycle, with two parallel LLM calls per
cycle (1 batched L0 + 1 batched L1).
- generateFileAbstractsBatch(items, generator) in abstract-generator.ts:
  XML-tagged input/output, tolerant regex parser, missing paths get
  empty strings.
- AbstractGenerationQueue: buffer-until-cap-or-drain — enqueue only
  triggers scheduleNext when pending >= cap or drainRequested. drain()
  sets the flag to flush partial batches at curate-end.
- Per-batch failure → re-enqueue each item individually with its own
  attempts counter; existing 3x exponential backoff preserved.
- Per-file content cap (20 KB) matches the non-batched path.
Real-LLM verification on ByteRover paid-tier:
- 5-pack (5 files): 1 batch, 2 LLM calls (vs 10 non-batched, -80%)
- 6-distinct (6 files): 2 batches (5+1), 4 LLM calls (vs 12, -67%)
- 10-pack: per-batch failure + retry path exercised end-to-end
Move the LLM-bound parent-summary cascade off curate's hot path. Curate
now enqueues changed paths to a stale-summary queue persisted in
.brv/dream-state.json; the next dream cycle drains the queue, merges
with its own snapshot diff (A ∪ B), and runs propagateStaleness once
per dirty directory.
- DreamState schema gains staleSummaryPaths[]; new service methods
enqueueStaleSummaryPaths (RMW with path-level dedup, oldest enqueuedAt
preserved) and drainStaleSummaryPaths (atomic snapshot-and-clear).
- curate-executor swaps the inline propagateStaleness call for an
enqueueStaleSummaryPaths call. Manifest rebuild stays inline (pure
filesystem, no LLM).
- dream-executor's step 5 drains the queue, unions with the diff-set,
and runs propagateStaleness once per unique path. On propagation
failure the catch re-enqueues the drained snapshot so atomic drain
doesn't lose work.
- runStaleSummaryPropagation extracted as a protected seam; preserves
the ENG-2100 parentTaskId threading so summary regenerations still
share one billing session with the parent dream task.
Validation:
- validate.mjs: 15/15 BM25 invariants hold under stale _index.md
- Real-LLM E2E: 8 curates / 2 dreams on byterover paid-tier
- dedup, multi-curate accumulation, drain, re-enqueue on failure
- Full unit suite: 6981 passing
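The RMW dedup semantics above can be sketched as a pure merge step. The entry shape (`path`, `enqueuedAt`) follows the commit's description of staleSummaryPaths[]; the function name is hypothetical.

```typescript
interface StaleSummaryEntry {
  path: string;
  enqueuedAt: number;
}

function mergeStaleSummaryPaths(
  existing: StaleSummaryEntry[],
  incoming: string[],
  now: number,
): StaleSummaryEntry[] {
  const byPath = new Map(existing.map((e) => [e.path, e]));
  // Set() also dedups duplicates within a single incoming batch.
  for (const path of new Set(incoming)) {
    // Skip paths already queued so the oldest enqueuedAt survives.
    if (!byPath.has(path)) byPath.set(path, { path, enqueuedAt: now });
  }
  return [...byPath.values()];
}
```

The drain side would then be an atomic snapshot-and-clear over the same array, so a concurrent enqueue lands in a fresh queue rather than a half-drained one.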
- Bypass dream activity gate when the stale-summary queue has work
  (codex P1): low-activity projects with 1-2 curates would otherwise
  strand queued paths indefinitely. dream-trigger.ts now treats a
  populated queue as enough reason to drain.
- Dedup within-batch input in enqueueStaleSummaryPaths: a caller
  passing a non-unique array no longer produces duplicate queue
  entries.
- Split the curate-executor post-processing try block: snapshot diff,
  enqueue, and manifest rebuild now have independent fail-open scopes
  so a transient queue write error cannot skip the (no-LLM) manifest
  scan that keeps newly-curated leaves immediately discoverable.
- Restore the runOperations JSDoc that the cherry-pick orphaned in
  front of runStaleSummaryPropagation.
- Tighten StaleSummaryEntrySchema to z.string().min(1) so empty paths
  cannot persist into dream-state.json.
Tests:
- dream-trigger: bypass-when-queue-populated + still-block-when-empty
- dream-state-service: within-batch duplicate dedup
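The gate bypass reduces to a short predicate. The real dream-trigger.ts checks more gates than this; the state shape and the activity threshold here are assumptions for illustration.

```typescript
interface DreamGateState {
  curationCount: number;
  staleSummaryPaths: string[];
}

function shouldRunDream(state: DreamGateState, minCurations = 3): boolean {
  // A populated stale-summary queue is enough reason to drain, even on
  // low-activity projects that never reach the curation threshold.
  if (state.staleSummaryPaths.length > 0) return true;
  return state.curationCount >= minCurations;
}
```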
- Dedup duplicate contextPath inputs in generateFileAbstractsBatch
  (codex P1): when a curate touches the same file twice before drain,
  both queue items had the same path; the prompt emitted two `<file>`
  blocks the model could answer in either order, and the tag parser's
  Map collapsed them non-deterministically. Now collapses by path
  keeping the LATEST content (the FIFO queue's last-seen item) so
  abstracts reflect the on-disk state, not an intermediate one.
- CDATA-wrap file content in batched prompts (Claude review): docs
  that describe XML/HTML/JSX (perfectly normal for technical
  knowledge) can contain `</document>` or `</file>` markers that would
  terminate the envelope and conflate files. Wrap in CDATA with the
  standard `]]]]><![CDATA[>` in-section escape for any nested
  terminators.
- Capture this.generator in a local const at the top of processNext so
  type narrowing survives the await boundary, and document the
  batched-retry semantics inline (batch failure consumes one attempt
  per item, not one per call).
Tests:
- abstract-generator-batch: duplicate-path dedup keeps latest content
- abstract-generator-batch: CDATA wrap survives `</document>` markers
- abstract-generator-batch: nested `]]>` is escaped to keep CDATA valid
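The CDATA wrap with the standard in-section escape is small enough to sketch in full; the function name is assumed. Any literal `]]>` in the content is split so it cannot terminate the section early, which is what keeps `</file>` / `</document>` markers in the wrapped content inert.

```typescript
// Wrap content in a CDATA section, escaping any nested "]]>" terminator
// by closing the section after "]]" and reopening it before ">".
function wrapCdata(content: string): string {
  const escaped = content.split(']]>').join(']]]]><![CDATA[>');
  return `<![CDATA[${escaped}]]>`;
}
```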
- Reword the Step 1 snapshot-failure catch in dream-executor: the old
  comment claimed "atomic drain already removed entries", but drain
  doesn't run in this code path. Clarify that skipping drain here is
  safer than draining-and-failing because the atomic-drain RMW clears
  entries synchronously.
- Add a checkEligibility test for the stale-summary queue bypass so
  both tryStartDream and checkEligibility pin the invariant. They
  delegate to the same checkGates1to3 path; without this, a future
  refactor of the shared path could regress only one call site.
- Re-anchor parseBatchedTags on `<file path>` openers instead of
  `</file>` closers (real correctness): the model output is plain
  text, not CDATA-wrapped, so an overview that mentions `</file>`
  literally in prose (perfectly normal for docs about XML, JSX, or
  build systems) would prematurely terminate the outer match and
  orphan the inner tag. The new strategy collects all opener
  positions, then slices each opener's body up to the next opener (or
  end-of-string) — a literal `</file>` in prose has no special meaning
  to the parser.
- Improve the CDATA test to capture BOTH L0 and L1 prompts and assert
  the wrap independently on each builder. The previous test only
  validated the last-call's prompt, so a future refactor that forgot
  wrapCdata in one builder would slip past.
- Add a parser test for literal `</file>` inside model overview prose
  to pin the new robustness invariant.
- Move "buffers below cap", "processes up to BATCH_SIZE_CAP", and
  "splits oversized backlogs" tests out of the `status file` describe
  block into a sibling `batching behaviour` block — these verify
  scheduling, not status I/O, and the misgrouping made the suite
  harder to navigate.
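The opener-anchored slicing strategy can be sketched as follows; the exact tag grammar and function signature are assumptions, but the structure matches the description: collect opener positions, then slice each body up to the next opener so a literal `</file>` in prose is never treated as a delimiter.

```typescript
function parseBatchedTags(output: string): Map<string, string> {
  const openers = [...output.matchAll(/<file path="([^"]+)">/g)];
  const result = new Map<string, string>();
  for (let i = 0; i < openers.length; i++) {
    const start = (openers[i].index ?? 0) + openers[i][0].length;
    // Body runs to the NEXT opener (or end-of-string), never to a closer.
    const end = i + 1 < openers.length ? openers[i + 1].index ?? output.length : output.length;
    // Strip only a trailing closer; interior "</file>" text is kept verbatim.
    const body = output.slice(start, end).replace(/<\/file>\s*$/, '').trim();
    result.set(openers[i][1], body);
  }
  return result;
}
```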
feat: [ENG-2485] defer summary cascade to dream
feat: [ENG-2518] batch abstract generation across queued files
Ports the caching change from experiment/03-enable-caching to
proj/curation-enhancement. Two coupled behavior changes (must ship
together) that engage Anthropic prompt caching cleanly and make
OpenAI/Google auto-caches see a longer stable prefix:
- Attach providerOptions.anthropic.cacheControl: ephemeral to the
  system message and the last tool definition. The system message now
  enters as a SystemModelMessage in the messages array (not via AI
  SDK's top-level `system` parameter, which strips providerOptions).
- Relocate <dateTime> out of the system prompt into the first user
  message. DateTimeContributor was rebuilding the timestamp every
  iteration, and Anthropic prefix caching does token-level matching —
  one byte of churn poisoned every cached token past it (~1,800
  tokens) on every iteration. Moving dateTime to a single iteration-0
  user-message prefix keeps the system prefix byte-stable across
  iterations.
Verified across a 60-curate fixture-driven A/B and a 30-curate
progression run on proj/curation-enhancement (see
notes/token-usage-reduction/eng-2519-prefix-caching/REPORT.md):
- Anthropic claude-haiku-4-5 12-fixture: -21.5% normalized cost
- Anthropic claude-haiku-4-5 30-curate progression: -29.7%
- OpenAI gpt-4.1: -8.0% (auto-cache hits a longer stable prefix)
- Google gemini-3-flash-preview: ~flat (model-specific noise)
Tests:
- New unit tests for prependCachedSystemMessage edge cases (empty/
  undefined systemPrompt, non-mutation invariant) and toAiSdkTools'
  last-tool cache_control attachment (1, 3 tools).
- Updated internal-llm-service test for completeTask: assertion now
  matches the iter-0 user-message dateTime prefix.
Out of scope (deferred):
- Cache-token field extraction in response.usage (cacheReadTokens,
  cacheCreationTokens). Without telemetry to consume them they would
  be dead code.
- Telemetry / UsageLogger plumbing — separate ticket.
- Anthropic daemon-stability mitigations (TPM retry-loop hangs,
  daemon-readiness races on large fixtures) — operational, separate.
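A sketch of the system-message-as-message change, assuming a simplified message shape: the system prompt is prepended to the messages array itself (since the AI SDK's top-level `system` parameter strips providerOptions) with Anthropic's ephemeral cache breakpoint attached, and the input array is never mutated.

```typescript
interface ModelMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
  providerOptions?: Record<string, unknown>;
}

function prependCachedSystemMessage(
  messages: ModelMessage[],
  systemPrompt?: string,
): ModelMessage[] {
  if (!systemPrompt) return [...messages]; // nothing to prepend; still non-mutating
  return [
    {
      role: 'system',
      content: systemPrompt,
      // Cache breakpoint: everything up to and including this message
      // becomes a reusable, byte-stable prefix on Anthropic.
      providerOptions: { anthropic: { cacheControl: { type: 'ephemeral' } } },
    },
    ...messages,
  ];
}
```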
Builds: clean. Typecheck: clean. Lint: clean on touched files. Tests: 7004 passing, 16 pending, 0 failing.
…hazards
- Drop the unused `datetime` template variable from FileContributor's
renderTemplateVariables. There is no current YAML consumer; leaving it
in place was a landmine — any future `{{datetime}}` reference would
silently re-introduce per-iteration cache poisoning.
- Inject the dateTime block into the rolling-checkpoint continuation
prompt. clearHistory() wipes the iter-0 user message that originally
carried it, and the iter-0 guard upstream prevents re-injection, so
long curate/query runs that hit the checkpoint were losing time
context for every subsequent iteration.
- Extract `buildDateTimePrefix(now?)` as a small shared helper used by
both call sites; cover with unit tests.
- Expand the cache_control "last tool" test comment to clarify that the
contract is insertion-order, not alphabetical.
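The shared helper above might look like the following sketch; the `<dateTime>` tag name comes from the commits, while the exact timestamp format and the second call-site wrapper are assumptions.

```typescript
// Build the dateTime block that prefixes user content (never the system
// prompt, so the cached system prefix stays byte-stable).
function buildDateTimePrefix(now: Date = new Date()): string {
  return `<dateTime>${now.toISOString()}</dateTime>\n\n`;
}

// Used by both call sites: the iter-0 user message and the
// rolling-checkpoint continuation prompt.
function withDateTimePrefix(userContent: string, now?: Date): string {
  return `${buildDateTimePrefix(now)}${userContent}`;
}
```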
feat: [ENG-2519] enable prefix caching for providers
Run tools.curation.recon synchronously in CurateExecutor before
spawning the agent session, inject the result both as a sandbox
variable (__recon_result_<taskIdSafe>) and as inline fields in the
agent's first prompt. The agent's first iteration then proceeds
directly to extraction instead of paying a full LLM call to invoke
recon.

Recon is pure JS — no LLM, no decision — but was previously surfaced
as an agent-tool, costing a full ~13K-token iteration on every curate
just to invoke a deterministic helper.

Measurements (12-fixture A/B vs proj/curation-enhancement + ENG-2519
caching baseline, full results in notes/token-usage-reduction/
eng-2530-prepipeline-recon/REPORT.md):
- OpenAI gpt-4.1: -16.3% calls, -20.3% normalized cost
- Google gemini-3-flash-preview: -1.9% calls, -17.8% normalized cost
- Anthropic claude-haiku-4-5: -2.0% calls, -0.9% normalized cost (flat)
- ByteRover gemini-3-flash-preview 15-curate progression: -6.5%
  normalized cost / -3.4% $ cost

Quality risk is bounded: recon itself is unchanged; only the caller
moves. Per-fixture variance follows the team's exp 04 pattern (some
fixtures regress as the agent compensates for missing recon signal in
later iterations); aggregate nets positive on OpenAI and Google.
Anthropic claude-haiku-4-5 lands flat — different model-class behavior
from the team's claude-sonnet-4-5 measurement; not a regression.
Pin the central correctness invariant of the change: recon's result is injected as `__recon_result_<taskIdSafe>` on the task session, and the agent prompt instructs the loop to skip the now-deterministic call. Without this test a refactor could silently remove the synchronous `recon()` call or the prompt instruction, defeating the whole change.
- curate-executor: add comment near reconHelper(...) noting that the
  `metadata` argument is currently a no-op — `recon` recomputes counts
  from `effectiveContext` directly. Prevents future readers from
  assuming changes to `metadata` would alter `reconResult`. Upstream
  signature fix is out of scope for this PR.
- test: add stubs for FileContextTreeSnapshotService.getCurrentState
  and DreamStateService.incrementCurationCount so the unit test stays
  in-memory (mirrors the ENG-2485 test pattern); keeps to CLAUDE.md's
  "unit tests must run completely in memory" rule.
- test: assert meta.messageCount alongside charCount and lineCount
  (the prompt surfaces all three).
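The sandbox-variable injection can be sketched as below. The `__recon_result_<taskIdSafe>` naming pattern comes from the commits; the session shape and the sanitization rule for `taskIdSafe` are assumptions.

```typescript
interface TaskSession {
  vars: Record<string, unknown>;
}

// Store the pre-computed recon result on the session under a name the
// agent's first prompt can reference, skipping the tool-invocation iteration.
function injectReconResult(
  session: TaskSession,
  taskId: string,
  reconResult: unknown,
): string {
  // Sanitize the task id so it forms a valid identifier suffix.
  const taskIdSafe = taskId.replace(/[^A-Za-z0-9_]/g, '_');
  const name = `__recon_result_${taskIdSafe}`;
  session.vars[name] = reconResult;
  return name;
}
```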
feat: [ENG-2530] pre-pipeline recon to skip first agent iteration
Resolves three conflicts created by the overlap of ENG-2485 (defer
summary cascade to dream, on this branch) and ENG-2522 (detach
post-curate Phase 4, shipped on main, PR #582). The two changes touch
the same lines but address different costs — LLM cost vs user-visible
latency — and compose cleanly.

curate-executor.ts
- Adopt main's {finalize, response} architecture so the daemon fires
  task:completed before Phase 4 runs.
- Inside finalize, keep cascade-defer: snapshot diff →
  enqueueStaleSummaryPaths + buildManifest. Drop main's
  propagateSummariesUnderLock — the dream lock existed to coordinate
  inline _index.md writes, which cascade-defer eliminates from curate.
- Preserve all exp 03 (cache_control) and exp 04 (recon
  pre-pipelining) changes in Phases 1-3.
- Extract propagateAndRebuild private helper alongside main's
  incrementDreamCounter.

dream-executor.ts
- Keep this branch's runStaleSummaryPropagation extracted seam (main
  did not change dream-executor; nothing lost).

curate-executor.test.ts
- Rewrite runAgentBody/finalize split assertions to expect
  cascade-defer (enqueueStaleSummaryPaths + buildManifest) instead of
  propagateStaleness.
- Drop the dream-lock coordination describe block — the lock is no
  longer taken in curate's Phase 4 (folder-pack still uses it).
  folder-pack-executor still calls propagateSummariesUnderLock; the
  helper file stays.

Verified: typecheck, lint (0 errors on resolved files), 7284 tests
pass.
…nc-main-3-10-1 merge: sync main (3.10.1) into curation-enhancement
Chore/curation sync main 3 10 1
Claude finished @danhdoan's task in 4m 47s — View job

PR Review — Proj/curation enhancement

Summary — Score: 4 / 5

This is a well-executed batch of internal optimizations. The PR
description is excellent — each feature is precisely scoped, the A/B
evidence is thorough, and the risk/mitigation section is honest about
per-fixture variance. The code quality is consistently high.

What's strong: ENG-2485 (cascade-defer), ENG-2518 (batch abstract),
ENG-2519 (prefix caching), ENG-2530 (recon pre-pipeline), and test
coverage.

Issues to address: one moderate and three minor issues, plus one
observation.
Claude finished @danhdoan's task in 3m 59s — View job

PR Review — Proj/curation enhancement

Summary — Score: 4 / 5

This is a well-executed batch of performance optimizations with strong
test coverage and a clear PR description. The code quality is high and
the design choices are sound. The main outstanding items are three
minor issues carried over from the prior review round that were not
addressed, plus one new testability concern.

What's strong in this revision: ENG-2485 (cascade-defer), ENG-2518
(batch abstract), ENG-2519 (prefix caching), ENG-2530 (recon
pre-pipeline).

Issues to address: one moderate and five minor issues.
Four small reviewer-flagged issues with focused fixes:
- abstract-queue.ts: drop non-absolute contextPath at enqueue with a
  trace breadcrumb. Background batch writes derive .abstract.md /
  .overview.md via raw writeFile and errors are catch-suppressed, so a
  relative path would silently land under process.cwd().
- tool-manager.ts: document the insertion-order contract for
  CURATE_TOOL_NAMES and QUERY_TOOL_NAMES at the registration site.
  toAiSdkTools attaches Anthropic cacheControl: ephemeral to the LAST
  tool, so reordering this list silently shifts the cache breakpoint.
- dream-state-service.ts: skip the disk write in update() when the
  updater returns the same state reference. drainStaleSummaryPaths and
  enqueueStaleSummaryPaths already follow the no-op-by-identity
  convention; this makes the contract observable at the disk level so
  empty drains don't tmpfile + rename for nothing. New test pins it.
- curate-executor.ts: correct the catch-block comment under the
  cascade-defer enqueue. The previous wording implied diffStates would
  re-capture the lost paths automatically; in fact paths uniquely
  changed by the failing curate are dropped until the same files are
  modified again or dream's own snapshot diff touches them.
Claude finished @danhdoan's task in 3m 12s — View job

PR Review — Proj/curation enhancement

Summary — Score: 4.5 / 5

The latest commit (…) addresses the prior round of feedback.

What was fixed in this revision: …

Still open:
- Minor — return {eligible: true, reason: state.staleSummaryPaths.length > 0 ? 'deferred-summary-queue' : 'activity'}
- Minor — …

No new concerns.
Brings in main's PR #601 (Proj/curation enhancement, db6560e) which
restructured the post-curate flow:
- ENG-2485: Phase 4 propagation deferred to dream — curate now
  ENQUEUES stale-summary paths via
  DreamStateService.enqueueStaleSummaryPaths and rebuilds the manifest
  inline; the LLM-driven propagateStaleness walk no longer runs on the
  curate hot path.
- ENG-2530: pre-pipelined recon for the agent-loop path (a
  deterministic helper invocation that skips one full LLM iteration).
- ENG-2518: batched abstract generation across queued files.

Conflicts resolved (2 files):
- src/server/infra/executor/curate-executor.ts (2 hunks)
  * Imports: kept HEAD's typed-slot DAG runner imports
    (TopologicalCurationRunner, NodeContext, buildCurationDAG,
    loadExistingMemory, buildLiveServices). Dropped main's `recon as
    reconHelper` import — our DAG already has a recon-node that runs
    deterministically as the first slot, so PR #601's pre-pipelined
    reconHelper for the agent-loop path is redundant.
  * Body: kept HEAD's typed-slot DAG runner (PR #578) inside the
    runAgentBody/finalize split. Adopted main's `propagateAndRebuild`
    helper (auto-merged from main) for the finalize thunk —
    enqueueStaleSummaryPaths + buildManifest, no inline
    propagateStaleness.
- test/unit/infra/executor/curate-executor.test.ts
  * Response assertions updated to match the typed-slot DAG output
    (`/Curate completed via typed-slot DAG/`) instead of `'curated'`.
  * Phase-4 lifecycle assertions adopted from main (enqueueStub +
    buildManifestStub + propagateStalenessStub.called === false) —
    confirming the ENG-2485 invariant.
  * Dropped the obsolete "pre-pipelined recon (ENG-2530)" describe
    block — the typed-slot DAG runs recon as a node, not via
    sandbox-variable injection. DAG-recon coverage lives in
    test/unit/agent/curate-flow/dag-builder.test.ts.
  * Dropped the "dream-lock coordination in Phase 4" describe block —
    propagation moved to dream itself (ENG-2485), so the lock dance no
    longer happens on the curate path.
Verification post-merge:
- typecheck: 0 errors.
- 212/212 curate-related tests passing across:
  * test/unit/infra/executor/curate-executor.test.ts (all split +
    leak + scoping tests)
  * test/integration/curate/services-adapter-live-write.test.ts
    (Phase A+B prefix-cluster + batching)
  * test/unit/agent/curate-flow/*.test.ts
  * test/integration/curate/*.test.ts
  * test/unit/agent/tools/curate-tool*.test.ts
  * test/unit/infra/process/curate-log-handler.test.ts
- lint: 0 errors on changed files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Before this PR, recon was surfaced as an agent tool (reconHelper),
prefix-cacheable prefixes weren't being cached on Anthropic, and the
abstract queue made 2N LLM calls (one per file × two levels).

Type of change
Scope (select all touched areas)
Linked issues
Root cause (bug fixes only, otherwise write N/A)

N/A
Test plan
- test/unit/infra/executor/curate-executor.test.ts — cascade-defer
  invariant + runAgentBody/finalize split
- test/unit/infra/executor/dream-executor.test.ts — queue drain, A∪B
  merge, re-enqueue on failure
- test/unit/infra/dream/dream-state-service.test.ts — enqueue dedup,
  atomic drain
- test/unit/agent/abstract/abstract-generation-queue.test.ts —
  buffer-until-cap, drain semantics, per-batch retry
- test/unit/agent/abstract/abstract-generator.test.ts — batched parser
  tolerance
- test/unit/infra/llm/prepend-cached-system-message.test.ts —
  system-message-as-array path, non-mutation
- test/unit/agent/infra/llm/generators/ai-sdk-message-converter.test.ts
  — last-tool cache_control attachment
- no inline propagateStaleness (ENG-2485 invariant)
- drain(); per-batch failure re-enqueues items individually
- providerOptions.anthropic.cacheControl: ephemeral; dateTime no
  longer in system prefix
- tools.curation.recon

User-visible changes
None. All four are internal cost/latency optimizations transparent to the user. Curate completes faster on average; no CLI flags, prompts, or output formats changed.
Evidence
Token-cost measurements (12-fixture A/B + multi-curate progression runs, ByteRover paid-tier):
- propagateStaleness walk eliminated from curate hot path

Detailed reports:
- notes/token-usage-reduction/eng-2519-prefix-caching/REPORT.md
- notes/token-usage-reduction/eng-2530-prepipeline-recon/REPORT.md

Checklist
- All tests pass (npm test) — 7284 passing, 16 pending, 0 failing
- Lint clean (npm run lint) — 0 errors (warnings are pre-existing,
  unrelated)
- Typecheck clean (npm run typecheck)
- Build clean (npm run build)
- Up to date with main (synced via merge: sync main (3.10.1) into
  curation-enhancement #599 + retroactive merge-record)
- Propagation failure in dream: the catch block re-enqueues the
  drained snapshot; atomic drain prevents partial loss; the next dream
  cycle retries.
- Caching changes are contained in prepend-cached-system-message.ts.
  Tests assert dateTime appears in the iter-0 user message, not the
  system prompt.