Proj/curation enhancement #601
Collapse the abstract queue's per-file LLM calls (2N today: one L0
.abstract.md + one L1 .overview.md per file) into batched calls of up
to BATCH_SIZE_CAP=5 files per cycle, with two parallel LLM calls per
cycle (1 batched L0 + 1 batched L1).
- generateFileAbstractsBatch(items, generator) in abstract-generator.ts:
  XML-tagged input/output, tolerant regex parser, missing paths get
  empty strings.
- AbstractGenerationQueue: buffer-until-cap-or-drain — enqueue only
  triggers scheduleNext when pending >= cap or drainRequested. drain()
  sets the flag to flush partial batches at curate-end.
- Per-batch failure → re-enqueue each item individually with its own
  attempts counter; existing 3x exponential backoff preserved.
- Per-file content cap (20 KB) matches the non-batched path.
Real-LLM verification on ByteRover paid-tier:
- 5-pack (5 files): 1 batch, 2 LLM calls (vs 10 non-batched, -80%)
- 6-distinct (6 files): 2 batches (5+1), 4 LLM calls (vs 12, -67%)
- 10-pack: per-batch failure + retry path exercised end-to-end
Move the LLM-bound parent-summary cascade off curate's hot path. Curate
now enqueues changed paths to a stale-summary queue persisted in
.brv/dream-state.json; the next dream cycle drains the queue, merges
with its own snapshot diff (A ∪ B), and runs propagateStaleness once
per dirty directory.
- DreamState schema gains staleSummaryPaths[]; new service methods
enqueueStaleSummaryPaths (RMW with path-level dedup, oldest enqueuedAt
preserved) and drainStaleSummaryPaths (atomic snapshot-and-clear).
- curate-executor swaps the inline propagateStaleness call for an
enqueueStaleSummaryPaths call. Manifest rebuild stays inline (pure
filesystem, no LLM).
- dream-executor's step 5 drains the queue, unions with the diff-set,
and runs propagateStaleness once per unique path. On propagation
failure the catch re-enqueues the drained snapshot so atomic drain
doesn't lose work.
- runStaleSummaryPropagation extracted as a protected seam; preserves
the ENG-2100 parentTaskId threading so summary regenerations still
share one billing session with the parent dream task.
Validation:
- validate.mjs: 15/15 BM25 invariants hold under stale _index.md
- Real-LLM E2E: 8 curates / 2 dreams on byterover paid-tier
- dedup, multi-curate accumulation, drain, re-enqueue on failure
- Full unit suite: 6981 passing
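The RMW dedup semantics above can be sketched as a pure merge step. The entry shape (`path`, `enqueuedAt`) follows the commit's description of staleSummaryPaths[]; the function name is hypothetical.

```typescript
interface StaleSummaryEntry {
  path: string;
  enqueuedAt: number;
}

function mergeStaleSummaryPaths(
  existing: StaleSummaryEntry[],
  incoming: string[],
  now: number,
): StaleSummaryEntry[] {
  const byPath = new Map(existing.map((e) => [e.path, e]));
  // Set() also dedups duplicates within a single incoming batch.
  for (const path of new Set(incoming)) {
    // Skip paths already queued so the oldest enqueuedAt survives.
    if (!byPath.has(path)) byPath.set(path, { path, enqueuedAt: now });
  }
  return [...byPath.values()];
}
```

The drain side would then be an atomic snapshot-and-clear over the same array, so a concurrent enqueue lands in a fresh queue rather than a half-drained one.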
- Bypass dream activity gate when the stale-summary queue has work
  (codex P1): low-activity projects with 1-2 curates would otherwise
  strand queued paths indefinitely. dream-trigger.ts now treats a
  populated queue as enough reason to drain.
- Dedup within-batch input in enqueueStaleSummaryPaths: a caller
  passing a non-unique array no longer produces duplicate queue
  entries.
- Split the curate-executor post-processing try block: snapshot diff,
  enqueue, and manifest rebuild now have independent fail-open scopes
  so a transient queue write error cannot skip the (no-LLM) manifest
  scan that keeps newly-curated leaves immediately discoverable.
- Restore the runOperations JSDoc that the cherry-pick orphaned in
  front of runStaleSummaryPropagation.
- Tighten StaleSummaryEntrySchema to z.string().min(1) so empty paths
  cannot persist into dream-state.json.
Tests:
- dream-trigger: bypass-when-queue-populated + still-block-when-empty
- dream-state-service: within-batch duplicate dedup
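The gate bypass reduces to a short predicate. The real dream-trigger.ts checks more gates than this; the state shape and the activity threshold here are assumptions for illustration.

```typescript
interface DreamGateState {
  curationCount: number;
  staleSummaryPaths: string[];
}

function shouldRunDream(state: DreamGateState, minCurations = 3): boolean {
  // A populated stale-summary queue is enough reason to drain, even on
  // low-activity projects that never reach the curation threshold.
  if (state.staleSummaryPaths.length > 0) return true;
  return state.curationCount >= minCurations;
}
```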
- Dedup duplicate contextPath inputs in generateFileAbstractsBatch
  (codex P1): when a curate touches the same file twice before drain,
  both queue items had the same path; the prompt emitted two `<file>`
  blocks the model could answer in either order, and the tag parser's
  Map collapsed them non-deterministically. Now collapses by path
  keeping the LATEST content (the FIFO queue's last-seen item) so
  abstracts reflect the on-disk state, not an intermediate one.
- CDATA-wrap file content in batched prompts (Claude review): docs
  that describe XML/HTML/JSX (perfectly normal for technical
  knowledge) can contain `</document>` or `</file>` markers that would
  terminate the envelope and conflate files. Wrap in CDATA with the
  standard `]]]]><![CDATA[>` in-section escape for any nested
  terminators.
- Capture this.generator in a local const at the top of processNext so
  type narrowing survives the await boundary, and document the
  batched-retry semantics inline (batch failure consumes one attempt
  per item, not one per call).
Tests:
- abstract-generator-batch: duplicate-path dedup keeps latest content
- abstract-generator-batch: CDATA wrap survives `</document>` markers
- abstract-generator-batch: nested `]]>` is escaped to keep CDATA valid
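The CDATA wrap with the standard in-section escape is small enough to sketch in full; the function name is assumed. Any literal `]]>` in the content is split so it cannot terminate the section early, which is what keeps `</file>` / `</document>` markers in the wrapped content inert.

```typescript
// Wrap content in a CDATA section, escaping any nested "]]>" terminator
// by closing the section after "]]" and reopening it before ">".
function wrapCdata(content: string): string {
  const escaped = content.split(']]>').join(']]]]><![CDATA[>');
  return `<![CDATA[${escaped}]]>`;
}
```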
- Reword the Step 1 snapshot-failure catch in dream-executor: the old
  comment claimed "atomic drain already removed entries", but drain
  doesn't run in this code path. Clarify that skipping drain here is
  safer than draining-and-failing because the atomic-drain RMW clears
  entries synchronously.
- Add a checkEligibility test for the stale-summary queue bypass so
  both tryStartDream and checkEligibility pin the invariant. They
  delegate to the same checkGates1to3 path; without this, a future
  refactor of the shared path could regress only one call site.
- Re-anchor parseBatchedTags on `<file path>` openers instead of
  `</file>` closers (real correctness): the model output is plain
  text, not CDATA-wrapped, so an overview that mentions `</file>`
  literally in prose (perfectly normal for docs about XML, JSX, or
  build systems) would prematurely terminate the outer match and
  orphan the inner tag. The new strategy collects all opener
  positions, then slices each opener's body up to the next opener (or
  end-of-string) — a literal `</file>` in prose has no special meaning
  to the parser.
- Improve the CDATA test to capture BOTH L0 and L1 prompts and assert
  the wrap independently on each builder. The previous test only
  validated the last-call's prompt, so a future refactor that forgot
  wrapCdata in one builder would slip past.
- Add a parser test for literal `</file>` inside model overview prose
  to pin the new robustness invariant.
- Move "buffers below cap", "processes up to BATCH_SIZE_CAP", and
  "splits oversized backlogs" tests out of the `status file` describe
  block into a sibling `batching behaviour` block — these verify
  scheduling, not status I/O, and the misgrouping made the suite
  harder to navigate.
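The opener-anchored slicing strategy can be sketched as follows; the exact tag grammar and function signature are assumptions, but the structure matches the description: collect opener positions, then slice each body up to the next opener so a literal `</file>` in prose is never treated as a delimiter.

```typescript
function parseBatchedTags(output: string): Map<string, string> {
  const openers = [...output.matchAll(/<file path="([^"]+)">/g)];
  const result = new Map<string, string>();
  for (let i = 0; i < openers.length; i++) {
    const start = (openers[i].index ?? 0) + openers[i][0].length;
    // Body runs to the NEXT opener (or end-of-string), never to a closer.
    const end = i + 1 < openers.length ? openers[i + 1].index ?? output.length : output.length;
    // Strip only a trailing closer; interior "</file>" text is kept verbatim.
    const body = output.slice(start, end).replace(/<\/file>\s*$/, '').trim();
    result.set(openers[i][1], body);
  }
  return result;
}
```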
feat: [ENG-2485] defer summary cascade to dream
feat: [ENG-2518] batch abstract generation across queued files
Ports the caching change from experiment/03-enable-caching to
proj/curation-enhancement. Two coupled behavior changes (must ship
together) that engage Anthropic prompt caching cleanly and make
OpenAI/Google auto-caches see a longer stable prefix:
- Attach providerOptions.anthropic.cacheControl: ephemeral to the
  system message and the last tool definition. The system message now
  enters as a SystemModelMessage in the messages array (not via AI
  SDK's top-level `system` parameter, which strips providerOptions).
- Relocate <dateTime> out of the system prompt into the first user
  message. DateTimeContributor was rebuilding the timestamp every
  iteration, and Anthropic prefix caching does token-level matching —
  one byte of churn poisoned every cached token past it (~1,800
  tokens) on every iteration. Moving dateTime to a single iteration-0
  user-message prefix keeps the system prefix byte-stable across
  iterations.
Verified across a 60-curate fixture-driven A/B and a 30-curate
progression run on proj/curation-enhancement (see
notes/token-usage-reduction/eng-2519-prefix-caching/REPORT.md):
- Anthropic claude-haiku-4-5 12-fixture: -21.5% normalized cost
- Anthropic claude-haiku-4-5 30-curate progression: -29.7%
- OpenAI gpt-4.1: -8.0% (auto-cache hits a longer stable prefix)
- Google gemini-3-flash-preview: ~flat (model-specific noise)
Tests:
- New unit tests for prependCachedSystemMessage edge cases (empty/
  undefined systemPrompt, non-mutation invariant) and toAiSdkTools'
  last-tool cache_control attachment (1, 3 tools).
- Updated internal-llm-service test for completeTask: assertion now
  matches the iter-0 user-message dateTime prefix.
Out of scope (deferred):
- Cache-token field extraction in response.usage (cacheReadTokens,
  cacheCreationTokens). Without telemetry to consume them they would
  be dead code.
- Telemetry / UsageLogger plumbing — separate ticket.
- Anthropic daemon-stability mitigations (TPM retry-loop hangs,
  daemon-readiness races on large fixtures) — operational, separate.
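A sketch of the system-message-as-message change, assuming a simplified message shape: the system prompt is prepended to the messages array itself (since the AI SDK's top-level `system` parameter strips providerOptions) with Anthropic's ephemeral cache breakpoint attached, and the input array is never mutated.

```typescript
interface ModelMessage {
  role: 'system' | 'user' | 'assistant';
  content: string;
  providerOptions?: Record<string, unknown>;
}

function prependCachedSystemMessage(
  messages: ModelMessage[],
  systemPrompt?: string,
): ModelMessage[] {
  if (!systemPrompt) return [...messages]; // nothing to prepend; still non-mutating
  return [
    {
      role: 'system',
      content: systemPrompt,
      // Cache breakpoint: everything up to and including this message
      // becomes a reusable, byte-stable prefix on Anthropic.
      providerOptions: { anthropic: { cacheControl: { type: 'ephemeral' } } },
    },
    ...messages,
  ];
}
```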
Builds: clean. Typecheck: clean. Lint: clean on touched files. Tests: 7004 passing, 16 pending, 0 failing.
…hazards
- Drop the unused `datetime` template variable from FileContributor's
renderTemplateVariables. There is no current YAML consumer; leaving it
in place was a landmine — any future `{{datetime}}` reference would
silently re-introduce per-iteration cache poisoning.
- Inject the dateTime block into the rolling-checkpoint continuation
prompt. clearHistory() wipes the iter-0 user message that originally
carried it, and the iter-0 guard upstream prevents re-injection, so
long curate/query runs that hit the checkpoint were losing time
context for every subsequent iteration.
- Extract `buildDateTimePrefix(now?)` as a small shared helper used by
both call sites; cover with unit tests.
- Expand the cache_control "last tool" test comment to clarify that the
contract is insertion-order, not alphabetical.
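The shared helper above might look like the following sketch; the `<dateTime>` tag name comes from the commits, while the exact timestamp format and the second call-site wrapper are assumptions.

```typescript
// Build the dateTime block that prefixes user content (never the system
// prompt, so the cached system prefix stays byte-stable).
function buildDateTimePrefix(now: Date = new Date()): string {
  return `<dateTime>${now.toISOString()}</dateTime>\n\n`;
}

// Used by both call sites: the iter-0 user message and the
// rolling-checkpoint continuation prompt.
function withDateTimePrefix(userContent: string, now?: Date): string {
  return `${buildDateTimePrefix(now)}${userContent}`;
}
```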
feat: [ENG-2519] enable prefix caching for providers
Run tools.curation.recon synchronously in CurateExecutor before
spawning the agent session, inject the result both as a sandbox
variable (__recon_result_<taskIdSafe>) and as inline fields in the
agent's first prompt. The agent's first iteration then proceeds
directly to extraction instead of paying a full LLM call to invoke
recon.

Recon is pure JS — no LLM, no decision — but was previously surfaced
as an agent-tool, costing a full ~13K-token iteration on every curate
just to invoke a deterministic helper.

Measurements (12-fixture A/B vs proj/curation-enhancement + ENG-2519
caching baseline, full results in notes/token-usage-reduction/
eng-2530-prepipeline-recon/REPORT.md):
- OpenAI gpt-4.1: -16.3% calls, -20.3% normalized cost
- Google gemini-3-flash-preview: -1.9% calls, -17.8% normalized cost
- Anthropic claude-haiku-4-5: -2.0% calls, -0.9% normalized cost (flat)
- ByteRover gemini-3-flash-preview 15-curate progression: -6.5%
  normalized cost / -3.4% $ cost

Quality risk is bounded: recon itself is unchanged; only the caller
moves. Per-fixture variance follows the team's exp 04 pattern (some
fixtures regress as the agent compensates for missing recon signal in
later iterations); aggregate nets positive on OpenAI and Google.
Anthropic claude-haiku-4-5 lands flat — different model-class behavior
from the team's claude-sonnet-4-5 measurement; not a regression.
Pin the central correctness invariant of the change: recon's result is injected as `__recon_result_<taskIdSafe>` on the task session, and the agent prompt instructs the loop to skip the now-deterministic call. Without this test a refactor could silently remove the synchronous `recon()` call or the prompt instruction, defeating the whole change.
- curate-executor: add comment near reconHelper(...) noting that the
  `metadata` argument is currently a no-op — `recon` recomputes counts
  from `effectiveContext` directly. Prevents future readers from
  assuming changes to `metadata` would alter `reconResult`. Upstream
  signature fix is out of scope for this PR.
- test: add stubs for FileContextTreeSnapshotService.getCurrentState
  and DreamStateService.incrementCurationCount so the unit test stays
  in-memory (mirrors the ENG-2485 test pattern); keeps to CLAUDE.md's
  "unit tests must run completely in memory" rule.
- test: assert meta.messageCount alongside charCount and lineCount
  (the prompt surfaces all three).
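The sandbox-variable injection can be sketched as below. The `__recon_result_<taskIdSafe>` naming pattern comes from the commits; the session shape and the sanitization rule for `taskIdSafe` are assumptions.

```typescript
interface TaskSession {
  vars: Record<string, unknown>;
}

// Store the pre-computed recon result on the session under a name the
// agent's first prompt can reference, skipping the tool-invocation iteration.
function injectReconResult(
  session: TaskSession,
  taskId: string,
  reconResult: unknown,
): string {
  // Sanitize the task id so it forms a valid identifier suffix.
  const taskIdSafe = taskId.replace(/[^A-Za-z0-9_]/g, '_');
  const name = `__recon_result_${taskIdSafe}`;
  session.vars[name] = reconResult;
  return name;
}
```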
feat: [ENG-2530] pre-pipeline recon to skip first agent iteration
Resolves three conflicts created by the overlap of ENG-2485 (defer
summary cascade to dream, on this branch) and ENG-2522 (detach
post-curate Phase 4, shipped on main, PR #582). The two changes touch
the same lines but address different costs — LLM cost vs user-visible
latency — and compose cleanly.

curate-executor.ts
- Adopt main's {finalize, response} architecture so the daemon fires
  task:completed before Phase 4 runs.
- Inside finalize, keep cascade-defer: snapshot diff →
  enqueueStaleSummaryPaths + buildManifest. Drop main's
  propagateSummariesUnderLock — the dream lock existed to coordinate
  inline _index.md writes, which cascade-defer eliminates from curate.
- Preserve all exp 03 (cache_control) and exp 04 (recon
  pre-pipelining) changes in Phases 1-3.
- Extract propagateAndRebuild private helper alongside main's
  incrementDreamCounter.

dream-executor.ts
- Keep this branch's runStaleSummaryPropagation extracted seam (main
  did not change dream-executor; nothing lost).

curate-executor.test.ts
- Rewrite runAgentBody/finalize split assertions to expect
  cascade-defer (enqueueStaleSummaryPaths + buildManifest) instead of
  propagateStaleness.
- Drop the dream-lock coordination describe block — the lock is no
  longer taken in curate's Phase 4 (folder-pack still uses it).
  folder-pack-executor still calls propagateSummariesUnderLock; the
  helper file stays.

Verified: typecheck, lint (0 errors on resolved files), 7284 tests
pass.
…nc-main-3-10-1 merge: sync main (3.10.1) into curation-enhancement
Chore/curation sync main 3 10 1
Claude finished @danhdoan's task in 4m 47s — View job

PR Review — Proj/curation enhancement

Summary — Score: 4 / 5

This is a well-executed batch of internal optimizations. The PR
description is excellent — each feature is precisely scoped, the A/B
evidence is thorough, and the risk/mitigation section is honest about
per-fixture variance. The code quality is consistently high.

What's strong: ENG-2485 (cascade-defer), ENG-2518 (batch abstract),
ENG-2519 (prefix caching), ENG-2530 (recon pre-pipeline), and test
coverage.

Issues to address: one moderate and three minor issues, plus one
observation.
Claude finished @danhdoan's task in 3m 59s — View job

PR Review — Proj/curation enhancement

Summary — Score: 4 / 5

This is a well-executed batch of performance optimizations with strong
test coverage and a clear PR description. The code quality is high and
the design choices are sound. The main outstanding items are three
minor issues carried over from the prior review round that were not
addressed, plus one new testability concern.

What's strong in this revision: ENG-2485 (cascade-defer), ENG-2518
(batch abstract), ENG-2519 (prefix caching), ENG-2530 (recon
pre-pipeline).

Issues to address: one moderate and five minor issues.
Four small reviewer-flagged issues with focused fixes:
- abstract-queue.ts: drop non-absolute contextPath at enqueue with a
  trace breadcrumb. Background batch writes derive .abstract.md /
  .overview.md via raw writeFile and errors are catch-suppressed, so a
  relative path would silently land under process.cwd().
- tool-manager.ts: document the insertion-order contract for
  CURATE_TOOL_NAMES and QUERY_TOOL_NAMES at the registration site.
  toAiSdkTools attaches Anthropic cacheControl: ephemeral to the LAST
  tool, so reordering this list silently shifts the cache breakpoint.
- dream-state-service.ts: skip the disk write in update() when the
  updater returns the same state reference. drainStaleSummaryPaths and
  enqueueStaleSummaryPaths already follow the no-op-by-identity
  convention; this makes the contract observable at the disk level so
  empty drains don't tmpfile + rename for nothing. New test pins it.
- curate-executor.ts: correct the catch-block comment under the
  cascade-defer enqueue. The previous wording implied diffStates would
  re-capture the lost paths automatically; in fact paths uniquely
  changed by the failing curate are dropped until the same files are
  modified again or dream's own snapshot diff touches them.
Claude finished @danhdoan's task in 3m 12s — View job

PR Review — Proj/curation enhancement

Summary — Score: 4.5 / 5

The latest commit (…) addresses the prior round of feedback.

What was fixed in this revision: …

Still open:
- Minor — return {eligible: true, reason: state.staleSummaryPaths.length > 0 ? 'deferred-summary-queue' : 'activity'}
- Minor — …

No new concerns.
Brings in main's PR #601 (Proj/curation enhancement, db6560e) which
restructured the post-curate flow:
- ENG-2485: Phase 4 propagation deferred to dream — curate now
  ENQUEUES stale-summary paths via
  DreamStateService.enqueueStaleSummaryPaths and rebuilds the manifest
  inline; the LLM-driven propagateStaleness walk no longer runs on the
  curate hot path.
- ENG-2530: pre-pipelined recon for the agent-loop path (a
  deterministic helper invocation that skips one full LLM iteration).
- ENG-2518: batched abstract generation across queued files.

Conflicts resolved (2 files):
- src/server/infra/executor/curate-executor.ts (2 hunks)
  * Imports: kept HEAD's typed-slot DAG runner imports
    (TopologicalCurationRunner, NodeContext, buildCurationDAG,
    loadExistingMemory, buildLiveServices). Dropped main's `recon as
    reconHelper` import — our DAG already has a recon-node that runs
    deterministically as the first slot, so PR #601's pre-pipelined
    reconHelper for the agent-loop path is redundant.
  * Body: kept HEAD's typed-slot DAG runner (PR #578) inside the
    runAgentBody/finalize split. Adopted main's `propagateAndRebuild`
    helper (auto-merged from main) for the finalize thunk —
    enqueueStaleSummaryPaths + buildManifest, no inline
    propagateStaleness.
- test/unit/infra/executor/curate-executor.test.ts
  * Response assertions updated to match the typed-slot DAG output
    (`/Curate completed via typed-slot DAG/`) instead of `'curated'`.
  * Phase-4 lifecycle assertions adopted from main (enqueueStub +
    buildManifestStub + propagateStalenessStub.called === false) —
    confirming the ENG-2485 invariant.
  * Dropped the obsolete "pre-pipelined recon (ENG-2530)" describe
    block — the typed-slot DAG runs recon as a node, not via
    sandbox-variable injection. DAG-recon coverage lives in
    test/unit/agent/curate-flow/dag-builder.test.ts.
  * Dropped the "dream-lock coordination in Phase 4" describe block —
    propagation moved to dream itself (ENG-2485), so the lock dance no
    longer happens on the curate path.
Verification post-merge:
- typecheck: 0 errors.
- 212/212 curate-related tests passing across:
  * test/unit/infra/executor/curate-executor.test.ts (all split +
    leak + scoping tests)
  * test/integration/curate/services-adapter-live-write.test.ts
    (Phase A+B prefix-cluster + batching)
  * test/unit/agent/curate-flow/*.test.ts
  * test/integration/curate/*.test.ts
  * test/unit/agent/tools/curate-tool*.test.ts
  * test/unit/infra/process/curate-log-handler.test.ts
- lint: 0 errors on changed files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
Before this PR, recon was surfaced as an agent tool (reconHelper),
prefix-cacheable prefixes weren't being cached on Anthropic, and the
abstract queue made 2N LLM calls (one per file × two levels).

Type of change
Scope (select all touched areas)
Linked issues
Root cause (bug fixes only, otherwise write N/A)

N/A
Test plan
- test/unit/infra/executor/curate-executor.test.ts — cascade-defer
  invariant + runAgentBody/finalize split
- test/unit/infra/executor/dream-executor.test.ts — queue drain, A∪B
  merge, re-enqueue on failure
- test/unit/infra/dream/dream-state-service.test.ts — enqueue dedup,
  atomic drain
- test/unit/agent/abstract/abstract-generation-queue.test.ts —
  buffer-until-cap, drain semantics, per-batch retry
- test/unit/agent/abstract/abstract-generator.test.ts — batched parser
  tolerance
- test/unit/infra/llm/prepend-cached-system-message.test.ts —
  system-message-as-array path, non-mutation
- test/unit/agent/infra/llm/generators/ai-sdk-message-converter.test.ts
  — last-tool cache_control attachment
- no inline propagateStaleness (ENG-2485 invariant)
- drain(); per-batch failure re-enqueues items individually
- providerOptions.anthropic.cacheControl: ephemeral; dateTime no
  longer in system prefix
- tools.curation.recon

User-visible changes
None. All four are internal cost/latency optimizations transparent to the user. Curate completes faster on average; no CLI flags, prompts, or output formats changed.
Evidence
Token-cost measurements (12-fixture A/B + multi-curate progression runs, ByteRover paid-tier):
- propagateStaleness walk eliminated from curate hot path

Detailed reports:
- notes/token-usage-reduction/eng-2519-prefix-caching/REPORT.md
- notes/token-usage-reduction/eng-2530-prepipeline-recon/REPORT.md

Checklist
- All tests pass (npm test) — 7284 passing, 16 pending, 0 failing
- Lint clean (npm run lint) — 0 errors (warnings are pre-existing,
  unrelated)
- Typecheck clean (npm run typecheck)
- Build clean (npm run build)
- Up to date with main (synced via merge: sync main (3.10.1) into
  curation-enhancement #599 + retroactive merge-record)
- Propagation failure in dream: the catch block re-enqueues the
  drained snapshot; atomic drain prevents partial loss; the next dream
  cycle retries.
- Caching changes are contained in prepend-cached-system-message.ts.
  Tests assert dateTime appears in the iter-0 user message, not the
  system prompt.