Skip to content

Proj/curation enhancement#601

Merged
bao-byterover merged 23 commits intomainfrom
proj/curation-enhancement
May 4, 2026
Merged

Proj/curation enhancement#601
bao-byterover merged 23 commits intomainfrom
proj/curation-enhancement

Conversation

@danhdoan
Copy link
Copy Markdown
Collaborator

@danhdoan danhdoan commented May 4, 2026

Summary

  • Problem: Curate's hot path carried two avoidable LLM costs (synchronous parent-summary cascade after every curate, and a wasted first-iteration call to invoke the deterministic recon helper), prefix-cacheable prefixes weren't being cached on Anthropic, and the abstract-queue made 2N LLM calls (one per file × two levels).
  • Why it matters: Combined, these dominated user-visible curate latency and per-call token cost. The improvements compose multiplicatively.
  • What changed: Four features merged on this branch — ENG-2485 (defer summary cascade), ENG-2518 (batch abstract generation), ENG-2519 (enable prefix caching), ENG-2530 (pre-pipeline recon).
  • What did NOT change (scope boundary): Public CLI surface, dream operation set (consolidate/synthesize/prune unchanged), agent prompt structure beyond the recon hint, cache-token telemetry (deferred — separate ticket).

Type of change

  • New feature
  • Refactor (no behavior change)
  • Bug fix
  • Documentation
  • Test
  • Chore (build, dependencies, CI)

Scope (select all touched areas)

  • TUI / REPL
  • Agent / Tools
  • LLM Providers
  • Server / Daemon
  • Shared (constants, types, transport events)
  • CLI Commands (oclif)
  • Hub / Connectors
  • Cloud Sync
  • CI/CD / Infra

Linked issues

Root cause (bug fixes only, otherwise write N/A)

N/A

Test plan

  • Coverage added:
    • Unit test
    • Integration test
    • Manual verification only
  • Test file(s):
    • test/unit/infra/executor/curate-executor.test.ts — cascade-defer invariant + runAgentBody/finalize split
    • test/unit/infra/executor/dream-executor.test.ts — queue drain, A∪B merge, re-enqueue on failure
    • test/unit/infra/dream/dream-state-service.test.ts — enqueue dedup, atomic drain
    • test/unit/agent/abstract/abstract-generation-queue.test.ts — buffer-until-cap, drain semantics, per-batch retry
    • test/unit/agent/abstract/abstract-generator.test.ts — batched parser tolerance
    • test/unit/infra/llm/prepend-cached-system-message.test.ts — system-message-as-array path, non-mutation
    • test/unit/agent/infra/llm/generators/ai-sdk-message-converter.test.ts — last-tool cache_control attachment
  • Key scenario(s) covered:
    • Curate enqueues changed paths and does NOT call propagateStaleness inline (ENG-2485 invariant)
    • Dream drains queue, merges with snapshot diff, runs propagation once per dirty dir; re-enqueues on failure
    • Abstract queue buffers until BATCH_SIZE_CAP=5 or drain(); per-batch failure re-enqueues items individually
    • System message + last tool carry providerOptions.anthropic.cacheControl: ephemeral; dateTime no longer in system prefix
    • Recon result reaches the agent as a sandbox variable AND inline prompt fields; first iteration skips tools.curation.recon

User-visible changes

None. All four are internal cost/latency optimizations transparent to the user. Curate completes faster on average; no CLI flags, prompts, or output formats changed.

Evidence

  • Failing test/log before + passing after
  • Trace/log snippets

Token-cost measurements (12-fixture A/B + multi-curate progression runs, ByteRover paid-tier):

Task Provider Result
ENG-2519 (caching) Anthropic claude-haiku-4-5 −21.5% normalized cost (12-fixture), −29.7% (30-curate progression)
ENG-2519 (caching) OpenAI gpt-4.1 −8.0% (auto-cache catches longer stable prefix)
ENG-2519 (caching) Google gemini-3-flash-preview ~flat (model-specific)
ENG-2530 (recon) OpenAI gpt-4.1 −16.3% calls, −20.3% normalized cost
ENG-2530 (recon) Google gemini-3-flash-preview −1.9% calls, −17.8% normalized cost
ENG-2530 (recon) Anthropic claude-haiku-4-5 flat
ENG-2518 (batch abstract) All 5-pack: 2 LLM calls vs 10 non-batched (−80%); 6-distinct: 4 vs 12 (−67%)
ENG-2485 (cascade-defer) All LLM-bound propagateStaleness walk eliminated from curate hot path

Detailed reports:

  • notes/token-usage-reduction/eng-2519-prefix-caching/REPORT.md
  • notes/token-usage-reduction/eng-2530-prepipeline-recon/REPORT.md

Checklist

  • Tests added or updated and passing (npm test) — 7284 passing, 16 pending, 0 failing
  • Lint passes (npm run lint) — 0 errors (warnings are pre-existing, unrelated)
  • Type check passes (npm run typecheck)
  • Build succeeds (npm run build)
  • Commits follow Conventional Commits format
  • Documentation updated (if applicable) — N/A, internal optimization
  • No breaking changes (or clearly documented above)
  • Branch is up to date with main (synced via merge: sync main (3.10.1) into curation-enhancement #599 + retroactive merge-record)

Risks and mitigations

  • Risk: Cascade-defer queue durability — if dream fails after draining but before propagating, paths could be lost.
    • Mitigation: catch block re-enqueues the drained snapshot; atomic drain prevents partial loss; next dream cycle retries.
  • Risk: Anthropic prefix-cache requires byte-stable prefix; any future change that re-introduces dynamic content into the system prompt would silently halve cache benefit.
    • Mitigation: dateTime relocation is now load-bearing; documented in prepend-cached-system-message.ts. Tests assert dateTime appears in iter-0 user message, not system.
  • Risk: Pre-pipelined recon shows per-fixture variance — some fixtures regress as the agent compensates without recon-mode signal in later iterations.
    • Mitigation: Aggregate nets positive on OpenAI/Google; Anthropic flat (not regression). Recon helper itself unchanged; only the caller moves — easily reverted if fleet-level regression appears.
  • Risk: Batched abstract generation increases blast radius of a single LLM call (5 files per call vs 1).
    • Mitigation: Per-batch failure re-enqueues items individually with their own retry counters; existing 3× exponential backoff preserved; per-file 20 KB content cap matches non-batched path.

RyanNg1403 and others added 20 commits April 29, 2026 00:03
Collapse the abstract queue's per-file LLM calls (2N today: one L0
.abstract.md + one L1 .overview.md per file) into batched calls of up
to BATCH_SIZE_CAP=5 files per cycle, with two parallel LLM calls per
cycle (1 batched L0 + 1 batched L1).

- generateFileAbstractsBatch(items, generator) in abstract-generator.ts:
  XML-tagged input/output, tolerant regex parser, missing paths get
  empty strings.
- AbstractGenerationQueue: buffer-until-cap-or-drain — enqueue only
  triggers scheduleNext when pending >= cap or drainRequested. drain()
  sets the flag to flush partial batches at curate-end.
- Per-batch failure → re-enqueue each item individually with its own
  attempts counter; existing 3x exponential backoff preserved.
- Per-file content cap (20 KB) matches the non-batched path.

Real-LLM verification on ByteRover paid-tier:
  5-pack (5 files): 1 batch, 2 LLM calls (vs 10 non-batched, -80%)
  6-distinct (6 files): 2 batches (5+1), 4 LLM calls (vs 12, -67%)
  10-pack: per-batch failure + retry path exercised end-to-end
Move the LLM-bound parent-summary cascade off curate's hot path. Curate
now enqueues changed paths to a stale-summary queue persisted in
.brv/dream-state.json; the next dream cycle drains the queue, merges
with its own snapshot diff (A ∪ B), and runs propagateStaleness once
per dirty directory.

- DreamState schema gains staleSummaryPaths[]; new service methods
  enqueueStaleSummaryPaths (RMW with path-level dedup, oldest enqueuedAt
  preserved) and drainStaleSummaryPaths (atomic snapshot-and-clear).
- curate-executor swaps the inline propagateStaleness call for an
  enqueueStaleSummaryPaths call. Manifest rebuild stays inline (pure
  filesystem, no LLM).
- dream-executor's step 5 drains the queue, unions with the diff-set,
  and runs propagateStaleness once per unique path. On propagation
  failure the catch re-enqueues the drained snapshot so atomic drain
  doesn't lose work.
- runStaleSummaryPropagation extracted as a protected seam; preserves
  the ENG-2100 parentTaskId threading so summary regenerations still
  share one billing session with the parent dream task.

Validation:
  - validate.mjs: 15/15 BM25 invariants hold under stale _index.md
  - Real-LLM E2E: 8 curates / 2 dreams on byterover paid-tier
    - dedup, multi-curate accumulation, drain, re-enqueue on failure
  - Full unit suite: 6981 passing
- Bypass dream activity gate when the stale-summary queue has work
  (codex P1): low-activity projects with 1-2 curates would otherwise
  strand queued paths indefinitely. dream-trigger.ts now treats a
  populated queue as enough reason to drain.
- Dedup within-batch input in enqueueStaleSummaryPaths: a caller passing
  a non-unique array no longer produces duplicate queue entries.
- Split the curate-executor post-processing try block: snapshot diff,
  enqueue, and manifest rebuild now have independent fail-open scopes
  so a transient queue write error cannot skip the (no-LLM) manifest
  scan that keeps newly-curated leaves immediately discoverable.
- Restore the runOperations JSDoc that the cherry-pick orphaned in
  front of runStaleSummaryPropagation.
- Tighten StaleSummaryEntrySchema to z.string().min(1) so empty paths
  cannot persist into dream-state.json.

Tests:
  - dream-trigger: bypass-when-queue-populated + still-block-when-empty
  - dream-state-service: within-batch duplicate dedup
- Dedup duplicate contextPath inputs in generateFileAbstractsBatch
  (codex P1): when a curate touches the same file twice before drain,
  both queue items had the same path; the prompt emitted two `<file>`
  blocks the model could answer in either order, and the tag parser's
  Map collapsed them non-deterministically. Now collapses by path
  keeping the LATEST content (the FIFO queue's last-seen item) so
  abstracts reflect the on-disk state, not an intermediate one.
- CDATA-wrap file content in batched prompts (Claude review): docs that
  describe XML/HTML/JSX (perfectly normal for technical knowledge) can
  contain `</document>` or `</file>` markers that would terminate the
  envelope and conflate files. Wrap in CDATA with the standard
  `]]]]><![CDATA[>` in-section escape for any nested terminators.
- Capture this.generator in a local const at the top of processNext
  so type narrowing survives the await boundary, and document the
  batched-retry semantics inline (batch failure consumes one attempt
  per item, not one per call).

Tests:
  - abstract-generator-batch: duplicate-path dedup keeps latest content
  - abstract-generator-batch: CDATA wrap survives `</document>` markers
  - abstract-generator-batch: nested `]]>` is escaped to keep CDATA valid
- Reword the Step 1 snapshot-failure catch in dream-executor: the old
  comment claimed "atomic drain already removed entries", but drain
  doesn't run in this code path. Clarify that skipping drain here is
  safer than draining-and-failing because the atomic-drain RMW clears
  entries synchronously.
- Add a checkEligibility test for the stale-summary queue bypass so
  both tryStartDream and checkEligibility pin the invariant. They
  delegate to the same checkGates1to3 path; without this, a future
  refactor of the shared path could regress only one call site.
- Re-anchor parseBatchedTags on `<file path>` openers instead of
  `</file>` closers (real correctness): the model output is plain text,
  not CDATA-wrapped, so an overview that mentions `</file>` literally
  in prose (perfectly normal for docs about XML, JSX, or build systems)
  would prematurely terminate the outer match and orphan the inner
  tag. The new strategy collects all opener positions, then slices
  each opener's body up to the next opener (or end-of-string) — a
  literal `</file>` in prose has no special meaning to the parser.
- Improve the CDATA test to capture BOTH L0 and L1 prompts and assert
  the wrap independently on each builder. The previous test only
  validated the last-call's prompt, so a future refactor that forgot
  wrapCdata in one builder would slip past.
- Add a parser test for literal `</file>` inside model overview prose
  to pin the new robustness invariant.
- Move "buffers below cap", "processes up to BATCH_SIZE_CAP", and
  "splits oversized backlogs" tests out of the `status file` describe
  block into a sibling `batching behaviour` block — these verify
  scheduling, not status I/O, and the misgrouping made the suite
  harder to navigate.
feat: [ENG-2485] defer summary cascade to dream
feat: [ENG-2518] batch abstract generation across queued files
Ports the caching change from experiment/03-enable-caching to
proj/curation-enhancement. Two coupled behavior changes (must ship
together) that engage Anthropic prompt caching cleanly and make
OpenAI/Google auto-caches see a longer stable prefix:

- Attach providerOptions.anthropic.cacheControl: ephemeral to the
  system message and the last tool definition. The system message
  now enters as a SystemModelMessage in the messages array (not via
  AI SDK's top-level `system` parameter, which strips providerOptions).
- Relocate <dateTime> out of the system prompt into the first user
  message. DateTimeContributor was rebuilding the timestamp every
  iteration, and Anthropic prefix caching does token-level matching —
  one byte of churn ~1,800 tokens in poisoned every cache miss past
  it. Moving dateTime to a single iteration-0 user-message prefix
  keeps the system prefix byte-stable across iterations.

Verified across a 60-curate fixture-driven A/B and a 30-curate
progression run on proj/curation-enhancement (see
notes/token-usage-reduction/eng-2519-prefix-caching/REPORT.md):

- Anthropic claude-haiku-4-5 12-fixture: -21.5% normalized cost
- Anthropic claude-haiku-4-5 30-curate progression: -29.7%
- OpenAI gpt-4.1: -8.0% (auto-cache hits a longer stable prefix)
- Google gemini-3-flash-preview: ~flat (model-specific noise)

Tests:
- New unit tests for prependCachedSystemMessage edge cases (empty/
  undefined systemPrompt, non-mutation invariant) and toAiSdkTools'
  last-tool cache_control attachment (1, 3 tools).
- Updated internal-llm-service test for completeTask: assertion now
  matches the iter-0 user-message dateTime prefix.

Out of scope (deferred):
- Cache-token field extraction in response.usage (cacheReadTokens,
  cacheCreationTokens). Without telemetry to consume them they would
  be dead code.
- Telemetry / UsageLogger plumbing — separate ticket.
- Anthropic daemon-stability mitigations (TPM retry-loop hangs,
  daemon-readiness races on large fixtures) — operational, separate.

Builds: clean. Typecheck: clean. Lint: clean on touched files.
Tests: 7004 passing, 16 pending, 0 failing.
…hazards

- Drop the unused `datetime` template variable from FileContributor's
  renderTemplateVariables. There is no current YAML consumer; leaving it
  in place was a landmine — any future `{{datetime}}` reference would
  silently re-introduce per-iteration cache poisoning.
- Inject the dateTime block into the rolling-checkpoint continuation
  prompt. clearHistory() wipes the iter-0 user message that originally
  carried it, and the iter-0 guard upstream prevents re-injection, so
  long curate/query runs that hit the checkpoint were losing time
  context for every subsequent iteration.
- Extract `buildDateTimePrefix(now?)` as a small shared helper used by
  both call sites; cover with unit tests.
- Expand the cache_control "last tool" test comment to clarify that the
  contract is insertion-order, not alphabetical.
feat: [ENG-2519] enable prefix caching for providers
Run tools.curation.recon synchronously in CurateExecutor before
spawning the agent session, inject the result both as a sandbox
variable (__recon_result_<taskIdSafe>) and as inline fields in the
agent's first prompt. Agent's first iteration then proceeds directly
to extraction instead of paying a full LLM call to invoke recon.

Recon is pure JS — no LLM, no decision — but was previously surfaced
as an agent-tool, costing a full ~13K-token iteration on every curate
just to invoke a deterministic helper.

Measurements (12-fixture A/B vs proj/curation-enhancement + ENG-2519
caching baseline, full results in notes/token-usage-reduction/
eng-2530-prepipeline-recon/REPORT.md):

  OpenAI gpt-4.1                  -16.3% calls  -20.3% normalized cost
  Google gemini-3-flash-preview   -1.9%  calls  -17.8% normalized cost
  Anthropic claude-haiku-4-5      -2.0%  calls  -0.9%  normalized cost (flat)

ByteRover gemini-3-flash-preview 15-curate progression: -6.5% normalized
cost / -3.4% \$ cost.

Quality risk is bounded: recon itself is unchanged; only the caller
moves. Per-fixture variance follows the team's exp 04 pattern (some
fixtures regress as the agent compensates for missing recon signal in
later iterations); aggregate nets positive on OpenAI and Google.
Anthropic claude-haiku-4-5 lands flat — different model-class behavior
from the team's claude-sonnet-4-5 measurement; not a regression.
Pin the central correctness invariant of the change: recon's result is
injected as `__recon_result_<taskIdSafe>` on the task session, and the
agent prompt instructs the loop to skip the now-deterministic call.
Without this test a refactor could silently remove the synchronous
`recon()` call or the prompt instruction, defeating the whole change.
- curate-executor: add comment near reconHelper(...) noting that the
  `metadata` argument is currently a no-op — `recon` recomputes counts
  from `effectiveContext` directly. Prevents future readers from
  assuming changes to `metadata` would alter `reconResult`. Upstream
  signature fix is out of scope for this PR.
- test: add stubs for FileContextTreeSnapshotService.getCurrentState
  and DreamStateService.incrementCurationCount so the unit test stays
  in-memory (mirrors the ENG-2485 test pattern); keeps to CLAUDE.md's
  "unit tests must run completely in memory" rule.
- test: assert meta.messageCount alongside charCount and lineCount
  (the prompt surfaces all three).
feat: [ENG-2530] pre-pipeline recon to skip first agent iteration
Resolves three conflicts created by the overlap of ENG-2485 (defer summary
cascade to dream, on this branch) and ENG-2522 (detach post-curate Phase 4,
shipped on main, PR #582). The two changes touch the same lines but address
different costs — LLM cost vs user-visible latency — and compose cleanly.

curate-executor.ts
- Adopt main's {finalize, response} architecture so the daemon fires
  task:completed before Phase 4 runs.
- Inside finalize, keep cascade-defer: snapshot diff →
  enqueueStaleSummaryPaths + buildManifest. Drop main's
  propagateSummariesUnderLock — the dream lock existed to coordinate
  inline _index.md writes, which cascade-defer eliminates from curate.
- Preserve all exp 03 (cache_control) and exp 04 (recon pre-pipelining)
  changes in Phases 1-3.
- Extract propagateAndRebuild private helper alongside main's
  incrementDreamCounter.

dream-executor.ts
- Keep this branch's runStaleSummaryPropagation extracted seam (main did
  not change dream-executor; nothing lost).

curate-executor.test.ts
- Rewrite runAgentBody/finalize split assertions to expect cascade-defer
  (enqueueStaleSummaryPaths + buildManifest) instead of propagateStaleness.
- Drop the dream-lock coordination describe block — the lock is no longer
  taken in curate's Phase 4 (folder-pack still uses it).

folder-pack-executor still calls propagateSummariesUnderLock; the helper
file stays.

Verified: typecheck, lint (0 errors on resolved files), 7284 tests pass.
…nc-main-3-10-1

merge: sync main (3.10.1) into curation-enhancement
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

Claude finished @danhdoan's task in 4m 47s —— View job


PR Review — Proj/curation enhancement

  • Gather context and read changed files
  • Review core implementation files
  • Review test files
  • Post inline comments for specific issues
  • Post summary with overall score

Summary

Score: 4 / 5

This is a well-executed batch of internal optimizations. The PR description is excellent — each feature is precisely scoped, the A/B evidence is thorough, and the risk/mitigation section is honest about per-fixture variance. The code quality is consistently high.


What's strong

ENG-2485 (cascade-defer): The atomic drain design in DreamStateService is the right choice. Using a single RMW that captures-and-clears means any concurrent curate enqueue that arrives during propagation appends to an already-empty queue and survives independently — the previous "snapshot + clear-later" approach would have silently dropped same-path enqueues. The gate-2 bypass in DreamTrigger (staleSummaryPaths.length > 0 short-circuits the curationsSinceDream < minCurations check) correctly ensures low-activity projects with deferred work are still drained.

ENG-2518 (batch abstract): The opener-anchored slice parser in parseBatchedTags is a robust choice — by slicing from each <file path=...> opener to the next rather than relying on </file> closers, it handles LLM output that mentions </file> in prose (XML docs, JSX guides) without breaking the envelope. CDATA wrapping for file content is correctly implemented with the standard ]]> split escape.

ENG-2519 (prefix caching): Removing dateTime from the system prompt template and injecting it into the iter-0 user message is the right fix. The comment in file-contributor.ts documenting why datetime is intentionally absent from template variables is load-bearing and should stay. The prependCachedSystemMessage approach (system-role message in the messages array rather than the system: top-level param) is the correct way to attach Anthropic cache_control providerOptions via AI SDK.

ENG-2530 (recon pre-pipeline): The prompt construction is clear and the sandbox injection pattern is consistent with existing RLM variables. The comment about metadata being unused by reconHelper is good defensive documentation.

Tests: Coverage is thorough. The drainStaleSummaryPaths atomicity test, the A∪B merge invariant test via runStaleSummaryPropagation override, and the CDATA injection test in abstract-generator-batch.test.ts all pin exactly the right correctness properties.


Issues to address

Moderate — abstract-queue.ts line 215: contextPath is assumed to be absolute, but this is not enforced or documented. If a relative path reaches writeFile, it silently writes to process.cwd(). Given the queue is a background process whose failures are catch-suppressed, a misconfigured path would produce no visible error. An isAbsolute guard in enqueue() or a documented precondition would prevent silent mislocation. (See inline comment.)

Minor — ai-sdk-message-converter.ts line 84: The cacheControl attachment to the "last" tool relies on Object.entries() insertion order, which is stable in V8 but implicitly coupled to tool registration ordering. The companion test acknowledges this (and notes the sort-objects lint rule determines insertion order). A lint-rule change or future tool object reconstruction could silently shift the cache boundary. Documenting the ordering dependency in getToolNamesForCommand would make the contract visible. (See inline comment.)

Minor — dream-trigger.ts line 87: The gate-2 bypass path (staleSummaryPaths.length > 0) returns {eligible: true} with no signal about why the gate was bypassed. Future --dry-run or eligibility logging will be unable to distinguish activity-triggered from queue-triggered dreams. (See inline comment.)

Minor — dream-state-service.ts line 66: Missing test for the no-write-on-empty-drain invariant. The early-return guard (if (snapshot.length === 0) return state) is correct, but no test verifies that the write is suppressed. (See inline comment.)

Observation — curate-executor.ts parseCurationStatus fallback: The heuristic response.includes('failed') sets status: 'failed' on any response mentioning the word "failed" (including "I found no failures"). This predates the PR but is now more visible via the public lastStatus field. Worth tightening in a follow-up.


No concerns with

  • Security: toContextTreeRelative path traversal guard, XML attribute escaping in escapeXmlAttr, and CDATA handling are all correct.
  • The stateMutexes module-level registry is appropriately bounded and the resolve() key normalisation is correct.
  • The finalize thunk / runAgentBody split is clean — error path deletes the session and returns no finalize, success path deletes in finally so the session is always cleaned up.
  • The drainRequested flag pattern in AbstractGenerationQueue correctly handles below-cap flush at curate-end.
  • Dedup-by-last in generateFileAbstractsBatch is correct: the queue is FIFO so the last occurrence carries the most recent content.

Comment thread src/agent/infra/map/abstract-queue.ts
Comment thread src/agent/infra/llm/generators/ai-sdk-message-converter.ts
Comment thread src/server/infra/executor/curate-executor.ts
Comment thread src/server/infra/dream/dream-trigger.ts
Comment thread src/server/infra/executor/dream-executor.ts
Comment thread src/agent/infra/map/abstract-generator.ts
Comment thread src/server/infra/dream/dream-state-service.ts
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

Claude finished @danhdoan's task in 3m 59s —— View job


PR Review — Proj/curation enhancement

  • Gather context and read changed files
  • Review core implementation files
  • Review test files
  • Post inline comments for specific issues
  • Post summary with overall score

Summary

Score: 4 / 5

This is a well-executed batch of performance optimizations with strong test coverage and a clear PR description. The code quality is high and the design choices are sound. The main outstanding items are three minor issues carried over from the prior review round that were not addressed, plus one new testability concern in CurateExecutor.


What's strong in this revision

ENG-2485 (cascade-defer): The drainStaleSummaryPaths atomic RMW design is correct and the new test suite is thorough — atomicity, dedup, enqueuedAt preservation, concurrent-enqueue safety, and the same-path-after-drain race are all pinned. The gate-2 bypass in DreamTrigger is correct and now tested via both tryStartDream and checkEligibility paths.

ENG-2518 (batch abstract): The opener-anchored parseBatchedTags and CDATA wrapping are both correctly implemented. The test suite covers out-of-order output, missing paths, malformed output, dedup-by-last semantics, ]]> escape, literal </file> in prose, and the two-LLM-calls-for-N=5 efficiency claim.

ENG-2519 (prefix caching): buildDateTimePrefix exported for testability, prependCachedSystemMessage correctly prepends a system-role message with providerOptions.anthropic.cacheControl. The refreshDateTime removal and service-initializer.ts comment documenting why datetime is absent from the system prompt are both load-bearing.

ENG-2530 (recon pre-pipeline): The reconHelper call at executor level is clean. Injecting the result as both a sandbox variable and inline prompt fields is the right dual-surface approach. The test asserting the variable name, shape, and prompt content is precise.

dream-executor.ts runStaleSummaryPropagation seam: The protected override pattern lets tests assert the A ∪ B merge paths without needing a real file system. Good design that pays off in the three new dream-executor tests.


Issues to address

Moderate — abstract-queue.ts line 214 (unfixed from prior review): r.contextPath is still assumed absolute with no enforcement. A relative path silently writes under process.cwd() with no visible error since failures are catch-suppressed. (See inline comment.)

Minor — curate-executor.ts line 408 (new): DreamStateService is instantiated directly in propagateAndRebuild rather than injected. Tests work around this via prototype stubbing, but future integration tests can't swap the state backend. (See inline comment.)

Minor — curate-executor.ts line 401 (new): The catch-block comment says "The next curate's enqueue will still capture the same paths via diffStates" — this is only true if those exact files are modified again in a future curate. Paths uniquely changed by the failing curate and not re-dirtied afterward are silently dropped. (See inline comment.)

Minor — dream-trigger.ts line 87 (unfixed from prior review): The activity-gate bypass returns {eligible: true} with no reason field distinguishing queue-triggered from activity-triggered dreams. The new tests expose this gap at the test level. (See inline comment.)

Minor — ai-sdk-message-converter.ts line 84 (unfixed from prior review): The insertion-order coupling for cacheControl is acknowledged in the test comment but not at the registration site where a future maintainer would add a tool. (See inline comment.)

Minor — dream-state-service.ts line 66 (partially addressed): The empty-drain no-write invariant is tested behaviorally (drain returns [], subsequent enqueue survives) but no spy asserts that the underlying update/writeFile is suppressed on an empty queue. (See inline comment.)


No concerns with

  • Security: path traversal via toContextTreeRelative, XML attribute escaping via escapeXmlAttr, CDATA terminator escape via `]]]]><

Comment thread src/agent/infra/map/abstract-queue.ts
Comment thread src/server/infra/executor/curate-executor.ts Outdated
Comment thread src/server/infra/executor/curate-executor.ts
Comment thread src/server/infra/dream/dream-trigger.ts
Comment thread src/agent/infra/llm/generators/ai-sdk-message-converter.ts
Comment thread src/server/infra/dream/dream-state-service.ts
Comment thread src/server/infra/executor/dream-executor.ts
Comment thread src/agent/infra/map/abstract-generator.ts
Four small reviewer-flagged issues with focused fixes:

- abstract-queue.ts: drop non-absolute contextPath at enqueue with a
  trace breadcrumb. Background batch writes derive .abstract.md /
  .overview.md via raw writeFile and errors are catch-suppressed, so a
  relative path would silently land under process.cwd().
- tool-manager.ts: document the insertion-order contract for
  CURATE_TOOL_NAMES and QUERY_TOOL_NAMES at the registration site.
  toAiSdkTools attaches Anthropic cacheControl: ephemeral to the LAST
  tool, so reordering this list silently shifts the cache breakpoint.
- dream-state-service.ts: skip the disk write in update() when the
  updater returns the same state reference. drainStaleSummaryPaths and
  enqueueStaleSummaryPaths already follow the no-op-by-identity
  convention; this makes the contract observable at the disk level so
  empty drains don't tmpfile + rename for nothing. New test pins it.
- curate-executor.ts: correct the catch-block comment under the
  cascade-defer enqueue. The previous wording implied diffStates would
  re-capture the lost paths automatically; in fact paths uniquely
  changed by the failing curate are dropped until the same files are
  modified again or dream's own snapshot diff touches them.
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

Claude finished @danhdoan's task in 3m 12s —— View job


PR Review — Proj/curation enhancement

  • Gather context: examine git diff and new commits since last review
  • Check status of previously flagged issues
  • Review key implementation files
  • Post inline comments for specific issues
  • Post summary with overall score

Summary

Score: 4.5 / 5

The latest commit (8b6167b refactor: address review-agent feedback on PR #601) addressed four of the six previously flagged issues. The code quality is high and the fixes are precise. Two minor items remain open.


What was fixed in this revision

abstract-queue.tsisAbsolute guard (moderate, prior review): enqueue() now drops non-absolute contextPath entries with a queueLog trace breadcrumb before they can reach the catch-suppressed writeFile path. Correctly placed at the entry point. (See inline praise.)

tool-manager.ts — insertion-order documentation (minor, prior review): CURATE_TOOL_NAMES and QUERY_TOOL_NAMES now carry explicit warnings that their iteration order is the Anthropic cache breakpoint. Future maintainers adding a tool will see the constraint at the registration site. One minor follow-up: the comment cites filterToolsForCommand but the real method is getToolsForCommand — easy one-word fix. (See inline comment.)

dream-state-service.ts — no-write-on-empty-drain (minor, prior review): The update() identity guard (if (next !== state)) skips the atomic tmp→rename on no-op calls. The new spy test directly verifies write is not called on an empty drainStaleSummaryPaths(). This closes the invariant gap flagged in both prior rounds. (See inline praise.)

curate-executor.ts — misleading catch comment (minor, prior review): The comment under the cascade-defer enqueue catch block now accurately describes the recovery scope: paths are not automatically re-captured by diffStates and are only recovered if the same files are re-dirtied in a later curate or touched by dream's snapshot diff. Honest and actionable.


Still open

Minor — dream-trigger.ts line 100 (three rounds unaddressed): The gate-2 bypass returns {eligible: true} with no reason field distinguishing queue-triggered from activity-triggered dreams. The inline comment was not postable on this file since it wasn't changed in the current commit, but the gap persists. A one-liner resolves it:

return {eligible: true, reason: state.staleSummaryPaths.length > 0 ? 'deferred-summary-queue' : 'activity'}

Minor — curate-executor.ts line 405 (two rounds unaddressed): DreamStateService is still directly constructed at two sites (lines 273 and 405). Tests use prototype stubbing as a workaround. A dreamStateServiceFactory constructor parameter defaulting to (baseDir) => new DreamStateService({baseDir}) would let integration tests swap the state backend cleanly. (See inline comment.)


No new concerns

  • The batched pending.unshift on retry (priority re-queue) is consistent with the pre-batch behavior on main and intentional.
  • The scheduleNext() + resolveDrainersIfIdle() double-call after the finally block is harmless — resolveDrainersIfIdle is idempotent once the resolver array is spliced.
  • The drainRequested reset in the immediate-idle path (if (this.isIdle()) in drain()) correctly prevents the next curate's enqueue burst from firing single-item batches prematurely.
  • Security, path traversal, and XML escaping remain clean.

Comment thread src/server/infra/executor/curate-executor.ts
Comment thread src/agent/infra/tools/tool-manager.ts
Comment thread src/agent/infra/map/abstract-queue.ts
Comment thread src/server/infra/dream/dream-state-service.ts
@bao-byterover bao-byterover merged commit db6560e into main May 4, 2026
6 checks passed
ngduyanhece added a commit that referenced this pull request May 4, 2026
)

Brings in main's PR #601 (Proj/curation enhancement, db6560e) which
restructured the post-curate flow:

- ENG-2485: Phase 4 propagation deferred to dream — curate now ENQUEUES
  stale-summary paths via DreamStateService.enqueueStaleSummaryPaths and
  rebuilds the manifest inline; the LLM-driven propagateStaleness walk
  no longer runs on the curate hot path.
- ENG-2530: pre-pipelined recon for the agent-loop path (a deterministic
  helper invocation that skips one full LLM iteration).
- ENG-2518: batched abstract generation across queued files.

Conflicts resolved (2 files):

- src/server/infra/executor/curate-executor.ts (2 hunks)
  * Imports: kept HEAD's typed-slot DAG runner imports
    (TopologicalCurationRunner, NodeContext, buildCurationDAG,
    loadExistingMemory, buildLiveServices). Dropped main's
    `recon as reconHelper` import — our DAG already has a recon-node
    that runs deterministically as the first slot, so PR #601's
    pre-pipelined reconHelper for the agent-loop path is redundant.
  * Body: kept HEAD's typed-slot DAG runner (PR #578) inside the
    runAgentBody/finalize split. Adopted main's `propagateAndRebuild`
    helper (auto-merged from main) for the finalize thunk —
    enqueueStaleSummaryPaths + buildManifest, no inline propagateStaleness.

- test/unit/infra/executor/curate-executor.test.ts
  * Response assertions updated to match the typed-slot DAG output
    (`/Curate completed via typed-slot DAG/`) instead of `'curated'`.
  * Phase-4 lifecycle assertions adopted from main (enqueueStub +
    buildManifestStub + propagateStalenessStub.called === false) —
    confirming ENG-2485 invariant.
  * Dropped the obsolete "pre-pipelined recon (ENG-2530)" describe
    block — the typed-slot DAG runs recon as a node, not via sandbox-
    variable injection. DAG-recon coverage lives in
    test/unit/agent/curate-flow/dag-builder.test.ts.
  * Dropped the "dream-lock coordination in Phase 4" describe block —
    propagation moved to dream itself (ENG-2485), so the lock dance
    no longer happens on the curate path.

Verification post-merge
- typecheck: 0 errors.
- 212/212 curate-related tests passing across:
  * test/unit/infra/executor/curate-executor.test.ts (all split + leak + scoping tests)
  * test/integration/curate/services-adapter-live-write.test.ts (Phase A+B prefix-cluster + batching)
  * test/unit/agent/curate-flow/*.test.ts
  * test/integration/curate/*.test.ts
  * test/unit/agent/tools/curate-tool*.test.ts
  * test/unit/infra/process/curate-log-handler.test.ts
- lint: 0 errors on changed files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot mentioned this pull request May 4, 2026
29 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants