Commit d62cb4d

ngduyanhece and claude committed
Merge branch 'main' into proj/curate-dag-cutover (re-merge after PR #601)
Brings in main's PR #601 (Proj/curation enhancement, db6560e) which restructured the post-curate flow:

- ENG-2485: Phase 4 propagation deferred to dream: curate now ENQUEUES stale-summary paths via DreamStateService.enqueueStaleSummaryPaths and rebuilds the manifest inline; the LLM-driven propagateStaleness walk no longer runs on the curate hot path.
- ENG-2530: pre-pipelined recon for the agent-loop path (a deterministic helper invocation that skips one full LLM iteration).
- ENG-2518: batched abstract generation across queued files.

Conflicts resolved (2 files):

- src/server/infra/executor/curate-executor.ts (2 hunks)
  * Imports: kept HEAD's typed-slot DAG runner imports (TopologicalCurationRunner, NodeContext, buildCurationDAG, loadExistingMemory, buildLiveServices). Dropped main's `recon as reconHelper` import; our DAG already has a recon-node that runs deterministically as the first slot, so PR #601's pre-pipelined reconHelper for the agent-loop path is redundant.
  * Body: kept HEAD's typed-slot DAG runner (PR #578) inside the runAgentBody/finalize split. Adopted main's `propagateAndRebuild` helper (auto-merged from main) for the finalize thunk (enqueueStaleSummaryPaths + buildManifest, no inline propagateStaleness).
- test/unit/infra/executor/curate-executor.test.ts
  * Response assertions updated to match the typed-slot DAG output (`/Curate completed via typed-slot DAG/`) instead of `'curated'`.
  * Phase-4 lifecycle assertions adopted from main (enqueueStub + buildManifestStub + propagateStalenessStub.called === false), confirming ENG-2485 invariant.
  * Dropped the obsolete "pre-pipelined recon (ENG-2530)" describe block: the typed-slot DAG runs recon as a node, not via sandbox-variable injection. DAG-recon coverage lives in test/unit/agent/curate-flow/dag-builder.test.ts.
  * Dropped the "dream-lock coordination in Phase 4" describe block: propagation moved to dream itself (ENG-2485), so the lock dance no longer happens on the curate path.
Verification post-merge

- typecheck: 0 errors.
- 212/212 curate-related tests passing across:
  * test/unit/infra/executor/curate-executor.test.ts (all split + leak + scoping tests)
  * test/integration/curate/services-adapter-live-write.test.ts (Phase A+B prefix-cluster + batching)
  * test/unit/agent/curate-flow/*.test.ts
  * test/integration/curate/*.test.ts
  * test/unit/agent/tools/curate-tool*.test.ts
  * test/unit/infra/process/curate-log-handler.test.ts
- lint: 0 errors on changed files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2 parents 50a662e + db6560e commit d62cb4d

24 files changed

Lines changed: 1566 additions & 219 deletions

src/agent/infra/agent/service-initializer.ts

Lines changed: 8 additions & 2 deletions
@@ -193,12 +193,18 @@ export async function createCipherAgentServices(
     basePath: promptsBasePath,
     validateConfig: true,
   })
-  // Register default contributors
+  // Register default contributors.
+  //
+  // Note: dateTime is intentionally NOT in the system prompt. Anthropic
+  // prompt caching does token-level prefix matching, so a per-iteration
+  // refreshed timestamp here would invalidate the cache for everything
+  // past it. dateTime is instead injected into the first user message
+  // by AgentLLMService, where it lives after the cache breakpoints and
+  // does not poison the cached prefix.
   systemPromptManager.registerContributors([
     {enabled: true, filepath: 'system-prompt.yml', id: 'base', priority: 0, type: 'file'},
     {enabled: true, id: 'env', priority: 10, type: 'environment'},
     {enabled: true, id: 'memories', priority: 20, type: 'memory'},
-    {enabled: true, id: 'datetime', priority: 30, type: 'dateTime'},
   ])

   // Register context tree structure contributor for query/curate commands
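
The reasoning in the comment above can be illustrated with a small sketch. It is not code from this commit: `BASE_PROMPT`, `oldRequest`, `newRequest`, and `sharedPrefixLength` are hypothetical names, requests are flattened to strings, and a character-level comparison stands in for the token-level matching a real provider cache performs.

```typescript
// Hypothetical stand-in for the assembled base prompt (base + env + memories).
const BASE_PROMPT = 'You are a curation agent.\n<env>...</env>\n<memories>...</memories>'

function dateTimeBlock(now: Date): string {
  return `<dateTime>Current date and time: ${now.toISOString()}</dateTime>`
}

// Old layout: the timestamp sits in the system prompt and is refreshed every iteration.
function oldRequest(now: Date, history: string[]): string {
  return [`${BASE_PROMPT}\n${dateTimeBlock(now)}`, ...history].join('\n')
}

// New layout: static system prompt; the timestamp is frozen inside the first user message.
function newRequest(history: string[]): string {
  return [BASE_PROMPT, ...history].join('\n')
}

// Length of the shared leading run of two strings (a character-level proxy
// for how much of the earlier request a prefix cache could reuse).
function sharedPrefixLength(a: string, b: string): number {
  let i = 0
  while (i < a.length && i < b.length && a[i] === b[i]) i++
  return i
}

const iter0Time = new Date('2025-01-01T00:00:00.000Z')
const iter1Time = new Date('2025-01-01T00:00:05.000Z')

const oldIter0 = oldRequest(iter0Time, ['user: Curate src/'])
const oldIter1 = oldRequest(iter1Time, ['user: Curate src/', 'assistant: (tool call)'])

const firstUserMessage = `user: ${dateTimeBlock(iter0Time)}\n\nCurate src/`
const newIter0 = newRequest([firstUserMessage])
const newIter1 = newRequest([firstUserMessage, 'assistant: (tool call)'])

// Old layout: the requests diverge inside the timestamp, so nothing after it is reusable.
const oldShared = sharedPrefixLength(oldIter0, oldIter1)
// New layout: the whole first request is a prefix of the second.
const newShared = sharedPrefixLength(newIter0, newIter1)
```

In the old layout the two requests stop matching inside the `<dateTime>` block; in the new layout the entire iteration-0 request is a prefix of the iteration-1 request.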

src/agent/infra/llm/agent-llm-service.ts

Lines changed: 29 additions & 19 deletions
@@ -60,6 +60,18 @@ import {type ProcessedOutput, ToolOutputProcessor, type TruncationConfig} from '
 /** Target utilization ratio for message tokens (leaves headroom for response) */
 const TARGET_MESSAGE_TOKEN_UTILIZATION = 0.7

+/**
+ * Build a `<dateTime>...</dateTime>\n\n` prefix for a user-message body.
+ *
+ * Per-call timestamps must NOT enter the system prompt (they would poison
+ * the prefix cache). They are injected into the user message instead, at
+ * the boundaries where the model legitimately needs fresh time context:
+ * the iter-0 input, and after a rolling-checkpoint history clear.
+ */
+export function buildDateTimePrefix(now: Date = new Date()): string {
+  return `<dateTime>Current date and time: ${now.toISOString()}</dateTime>\n\n`
+}
+
 /**
  * Result of parallel tool execution (before adding to context).
  * Contains all information needed to add the result to context in order.
@@ -902,8 +914,11 @@ export class AgentLLMService implements ILLMService {
       this.cachedBasePrompt = basePrompt
       this.memoryDirtyFlag = false
     } else {
-      // Cache hit: reuse base prompt, only refresh the DateTime section
-      basePrompt = this.refreshDateTime(this.cachedBasePrompt!)
+      // Cache hit: reuse base prompt verbatim. The cached prompt has no
+      // dateTime section to refresh — dateTime is injected into the
+      // first user message instead so the system prefix stays byte-stable
+      // across iterations and prompt caching can engage cleanly.
+      basePrompt = this.cachedBasePrompt!
     }

     let systemPrompt = basePrompt
@@ -944,9 +959,13 @@ export class AgentLLMService implements ILLMService {

     // Add user message and compress context within mutex lock
     return this.mutex.withLock(async () => {
-      // Add user message to context only on the first iteration
+      // Add user message to context only on the first iteration. The
+      // dateTime block is prefixed here (not in the system prompt) so
+      // the cached system prefix stays byte-stable across iterations
+      // and Anthropic/OpenAI/Google prefix caches can engage cleanly.
       if (iterationCount === 0) {
-        await this.contextManager.addUserMessage(textInput, imageData, fileData)
+        const inputWithDateTime = `${buildDateTimePrefix()}${textInput}`
+        await this.contextManager.addUserMessage(inputWithDateTime, imageData, fileData)
       }

       // Rolling checkpoint: periodically save progress and clear history for RLM commands.
@@ -1540,8 +1559,12 @@ export class AgentLLMService implements ILLMService {
     // Clear conversation history
     await this.contextManager.clearHistory()

-    // Re-inject continuation prompt with variable reference
-    const continuationPrompt = [
+    // Re-inject continuation prompt with variable reference.
+    // Prepend the dateTime block: clearHistory wiped the iter-0 user
+    // message that originally carried it, and the iter-0 guard upstream
+    // prevents re-injection. Without this, every iteration after the
+    // first checkpoint loses time context for the rest of the run.
+    const continuationPrompt = buildDateTimePrefix() + [
       `Continue task. Iteration checkpoint at turn ${iterationCount}.`,
       `Previous progress stored in variable: ${checkpointVar}`,
       `Original task: ${textInput.slice(0, 200)}${textInput.length > 200 ? '...' : ''}`,
@@ -1555,19 +1578,6 @@ export class AgentLLMService implements ILLMService {
     })
   }

-  /**
-   * Replace the DateTime section in a cached system prompt with a fresh timestamp.
-   * DateTimeContributor wraps its output in <dateTime>...</dateTime> XML tags,
-   * enabling reliable regex replacement without rebuilding the entire prompt.
-   *
-   * @param cachedPrompt - Previously cached system prompt
-   * @returns Updated prompt with fresh DateTime
-   */
-  private refreshDateTime(cachedPrompt: string): string {
-    const freshDateTime = `<dateTime>Current date and time: ${new Date().toISOString()}</dateTime>`
-    return cachedPrompt.replace(/<dateTime>[\S\s]*?<\/dateTime>/, freshDateTime)
-  }
-
   /**
    * Check if a rolling checkpoint should trigger.
    * Triggers every N iterations for curate/query commands, or when token utilization is high.
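
The two injection points in this file (the iter-0 user message and the post-checkpoint continuation prompt) might be exercised as in the sketch below. `buildDateTimePrefix` mirrors the function added in the diff. `buildContinuationPrompt` is a hypothetical helper: the commit builds the prompt inline, and the hunk does not show how the array is joined or whether more lines follow, so the `'\n'` separator and the three-line body are assumptions.

```typescript
// Mirrors buildDateTimePrefix from the diff; the default argument lets callers pin the clock.
function buildDateTimePrefix(now: Date = new Date()): string {
  return `<dateTime>Current date and time: ${now.toISOString()}</dateTime>\n\n`
}

// Hypothetical helper (the commit builds this inline). The '\n' join is an assumption:
// the hunk ends before the array is closed.
function buildContinuationPrompt(
  iterationCount: number,
  checkpointVar: string,
  textInput: string,
  now: Date = new Date(),
): string {
  const lines = [
    `Continue task. Iteration checkpoint at turn ${iterationCount}.`,
    `Previous progress stored in variable: ${checkpointVar}`,
    // Only the first 200 characters of the original task are echoed back.
    `Original task: ${textInput.slice(0, 200)}${textInput.length > 200 ? '...' : ''}`,
  ]
  return buildDateTimePrefix(now) + lines.join('\n')
}

const pinnedClock = new Date('2025-01-01T12:00:00.000Z')

// Iteration 0: prefix the raw user input.
const firstUserInput = `${buildDateTimePrefix(pinnedClock)}Curate the auth module`

// After a rolling checkpoint: history was cleared, so the prefix is re-attached.
const continuation = buildContinuationPrompt(10, '__checkpoint_10', 'x'.repeat(250), pinnedClock)
```

Passing the clock explicitly keeps the output deterministic in tests, which is presumably why the real function takes `now` as a defaulted parameter.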

src/agent/infra/llm/generators/ai-sdk-content-generator.ts

Lines changed: 25 additions & 5 deletions
@@ -5,7 +5,7 @@
  * Replaces per-provider content generators with one unified implementation.
  */

-import type {LanguageModel} from 'ai'
+import type {LanguageModel, ModelMessage} from 'ai'

 import {generateText, streamText} from 'ai'

@@ -22,6 +22,28 @@ import {toAiSdkTools, toModelMessages} from './ai-sdk-message-converter.js'

 const DEFAULT_CHARS_PER_TOKEN = 4

+/**
+ * Prepend the system prompt as a system-role message carrying
+ * `providerOptions.anthropic.cacheControl: ephemeral`. AI SDK's top-level
+ * `system: string` parameter does not propagate providerOptions, so the
+ * only way to attach Anthropic cache_control to the system block is to
+ * pass it through the messages array. Non-Anthropic providers ignore the
+ * `anthropic` namespace.
+ */
+export function prependCachedSystemMessage(systemPrompt: string | undefined, messages: ModelMessage[]): ModelMessage[] {
+  if (!systemPrompt) {
+    return messages
+  }
+
+  const systemMessage: ModelMessage = {
+    content: systemPrompt,
+    providerOptions: {anthropic: {cacheControl: {type: 'ephemeral'}}},
+    role: 'system',
+  }
+
+  return [systemMessage, ...messages]
+}
+
 /**
  * Configuration for AiSdkContentGenerator.
  */
@@ -54,7 +76,7 @@ export class AiSdkContentGenerator implements IContentGenerator {
   }

   public async generateContent(request: GenerateContentRequest): Promise<GenerateContentResponse> {
-    const messages = toModelMessages(request.contents)
+    const messages = prependCachedSystemMessage(request.systemPrompt, toModelMessages(request.contents))
     const tools = toAiSdkTools(request.tools)

     const result = await generateText({
@@ -63,7 +85,6 @@ export class AiSdkContentGenerator implements IContentGenerator {
       messages,
       model: this.model,
       temperature: request.config.temperature,
-      ...(request.systemPrompt && {system: request.systemPrompt}),
       ...(tools && {tools}),
       ...(request.config.topK !== undefined && {topK: request.config.topK}),
       ...(request.config.topP !== undefined && {topP: request.config.topP}),
@@ -100,7 +121,7 @@ export class AiSdkContentGenerator implements IContentGenerator {
   }

   public async *generateContentStream(request: GenerateContentRequest): AsyncGenerator<GenerateContentChunk> {
-    const messages = toModelMessages(request.contents)
+    const messages = prependCachedSystemMessage(request.systemPrompt, toModelMessages(request.contents))
     const tools = toAiSdkTools(request.tools)

     const result = streamText({
@@ -109,7 +130,6 @@ export class AiSdkContentGenerator implements IContentGenerator {
       messages,
       model: this.model,
       temperature: request.config.temperature,
-      ...(request.systemPrompt && {system: request.systemPrompt}),
       ...(tools && {tools}),
       ...(request.config.topK !== undefined && {topK: request.config.topK}),
       ...(request.config.topP !== undefined && {topP: request.config.topP}),
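
A self-contained sketch of the behaviour `prependCachedSystemMessage` is meant to have, useful as a starting point for a unit test. To avoid depending on the `ai` package, `SketchMessage` is a simplified local stand-in for the SDK's `ModelMessage` (the real type is richer), and `prependCachedSystem` is a renamed copy of the function in the diff.

```typescript
// Simplified local stand-in for the AI SDK's ModelMessage (assumption: string content only).
type SketchMessage = {
  content: string
  providerOptions?: {anthropic?: {cacheControl?: {type: 'ephemeral'}}}
  role: 'assistant' | 'system' | 'user'
}

// Same logic as prependCachedSystemMessage in the diff, typed against the stand-in.
function prependCachedSystem(systemPrompt: string | undefined, messages: SketchMessage[]): SketchMessage[] {
  // Undefined or empty prompt: return the input array untouched (same reference).
  if (!systemPrompt) {
    return messages
  }

  const systemMessage: SketchMessage = {
    content: systemPrompt,
    providerOptions: {anthropic: {cacheControl: {type: 'ephemeral'}}},
    role: 'system',
  }

  // A new array is returned; the caller's array is not mutated.
  return [systemMessage, ...messages]
}

const conversation: SketchMessage[] = [{content: 'Curate src/', role: 'user'}]
const withSystem = prependCachedSystem('You are a curation agent.', conversation)
const withoutSystem = prependCachedSystem(undefined, conversation)
```

Two properties are worth pinning down in tests: the system message always lands at index 0 with the cache marker attached, and a missing system prompt leaves the message list unchanged rather than adding an empty system message.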

src/agent/infra/llm/generators/ai-sdk-message-converter.ts

Lines changed: 8 additions & 1 deletion
@@ -63,18 +63,25 @@ export function toModelMessages(messages: InternalMessage[]): ModelMessage[] {
 /**
  * Convert our ToolSet to AI SDK tool definitions.
  * Tools are declared without `execute` — our agentic loop handles execution.
+ *
+ * The last tool gets `providerOptions.anthropic.cacheControl: ephemeral`,
+ * which makes Anthropic cache the entire tool block (and the system prompt
+ * before it). Non-Anthropic providers ignore the `anthropic` namespace.
  */
 export function toAiSdkTools(tools?: InternalToolSet): Record<string, ReturnType<typeof aiSdkTool>> | undefined {
   if (!tools || Object.keys(tools).length === 0) {
     return undefined
   }

+  const entries = Object.entries(tools)
   const result: Record<string, ReturnType<typeof aiSdkTool>> = {}

-  for (const [name, def] of Object.entries(tools)) {
+  for (const [index, [name, def]] of entries.entries()) {
+    const isLast = index === entries.length - 1
     result[name] = aiSdkTool({
       description: def.description ?? '',
       inputSchema: jsonSchema(def.parameters as Record<string, unknown>),
+      ...(isLast && {providerOptions: {anthropic: {cacheControl: {type: 'ephemeral'}}}}),
     })
   }
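
The "mark only the last tool" pattern can be checked in isolation with the sketch below. `SketchTool` and `markLastToolCacheable` are hypothetical names; the real code wraps each definition with the AI SDK's `tool` and `jsonSchema` helpers, which are left out here so the block runs without dependencies.

```typescript
// Simplified stand-in for an AI SDK tool definition (assumption: description only).
type SketchTool = {
  description: string
  providerOptions?: {anthropic: {cacheControl: {type: 'ephemeral'}}}
}

// Hypothetical helper reproducing the loop in toAiSdkTools: every tool is copied,
// and only the final entry receives the Anthropic cache marker.
function markLastToolCacheable(tools: Record<string, {description?: string}>): Record<string, SketchTool> {
  const entries = Object.entries(tools)
  const result: Record<string, SketchTool> = {}

  for (const [index, [toolName, def]] of entries.entries()) {
    const isLast = index === entries.length - 1
    result[toolName] = {
      description: def.description ?? '',
      // Spreading `false` adds nothing, so earlier tools carry no providerOptions key.
      ...(isLast && {providerOptions: {anthropic: {cacheControl: {type: 'ephemeral' as const}}}}),
    }
  }

  return result
}

const marked = markLastToolCacheable({
  read_file: {description: 'Read a file'},
  write_file: {description: 'Write a file'},
})
```

"Last" here means last in `Object.entries` order, which is insertion order for ordinary string keys but puts integer-like keys first, so tool names such as `'1'` would be reordered. Separately, Anthropic's prompt-caching documentation describes the cached prefix as built in the order tools, then system, then messages; under that ordering the breakpoint on the last tool would cover the tool block, and the breakpoint on the system message would cover tools plus system, so the doc comment's "system prompt before it" may be worth re-checking.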
