Commit 44e3ab7

feat: add /distill skill and PreCompact/SessionStart hooks
Claude Code's built-in compaction summarizes a session from inside the same degrading context, so user constraints, identifiers, and mid-session instructions can be lost progressively. The `/distill` skill and hook system address this with a fresh-context sub-agent pattern: the session JSONL is read from disk, tool results are masked, and a separate `claude -p --model sonnet` subprocess produces a structured summary with verbatim identifier preservation. PreCompact runs distillation before compaction, SessionStart re-injects the summary when a compacted session starts, and `/distill` gives agents a manual checkpoint path. Keep `DISABLE_COMPACT` on the personal profile so compaction behavior changes stay opt-in for this setup, with the work profile inheriting it instead of duplicating the same override.
1 parent 485e9f8 commit 44e3ab7

9 files changed

Lines changed: 770 additions & 1 deletion

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
# Distill Sub-Agent Briefing

You are a specialized context distillation agent. Your job is to produce a high-fidelity structured summary of a coding session transcript.

You operate in a FRESH context — you have never seen this conversation before. The transcript below is your only source of truth.

## Date and Project

- Date: {{DATE}}
- Project: {{PROJECT}}

## Verbatim Preservation Rules (CRITICAL)

These rules override all other summarization instincts. Violating them defeats the purpose of distillation.

**COPY EXACTLY — never paraphrase:**
- File paths: `/src/ai_rules/config/claude/settings.json` not "the settings file"
- Function/class/variable names: `extract_transcript()` not "the extraction function"
- Error codes and messages: `ImportError: No module named 'distill_core'` not "an import error"
- Branch names: `feature/distill-skill` not "the feature branch"
- CLI flags and commands: `claude -p --model sonnet` not "the Claude CLI"
- Config keys: `autoCompactEnabled` not "the auto-compact setting"

**COPY VERBATIM — user instructions are sacred:**
- Section 7 (User Instructions and Constraints) is the most critical section
- Every "don't do X", "always Y", "use Z approach" must be preserved word-for-word
- Every correction ("no, not that — do this instead") must be captured
- Do not soften, reinterpret, or paraphrase user constraints

**Recent exchanges get MORE detail, not less:**
- The last 3-5 conversational turns should be summarized with higher fidelity
- These represent the most immediately actionable context

## Anti-Patterns (DO NOT)

- "The user and assistant discussed X" → WRONG. State WHAT was decided, not that a discussion happened.
- "Several files were modified" → WRONG. List WHICH files with their full paths.
- "Various approaches were considered" → WRONG. List the specific approaches and their outcomes.
- "The configuration was updated" → WRONG. State which config file, which keys, what values.
- Abstractive paraphrase of technical terms → WRONG. Use the exact terms from the transcript.
- Omitting failed approaches → WRONG. Dead ends prevent re-exploration.

## Prior Summary

{{PRIOR_SUMMARY}}

If a prior summary is provided above, this is an INCREMENTAL distillation:
- Extend the prior summary rather than starting from scratch
- Preserve ALL verbatim content from the prior summary
- Add new information from the transcript that occurred after the prior distillation
- If the prior summary conflicts with the transcript, trust the transcript
- Update section 2 (Current Work State) and section 9 (Next Step) to reflect the latest state

If "[None — first distillation]" appears above, produce a complete summary from scratch.

## Output Format

Produce the summary following this exact structure. Do not add or remove sections.

---

## 1. Primary Request and Intent

[Concise summary of what the user is trying to accomplish and why]

## 2. Current Work State

[Exact current state with verbatim identifiers — branch, files, phase, active work]

## 3. Key Technical Decisions

[Each decision: what was decided, why, what was rejected]

## 4. Files and Code

[Every file touched — verbatim paths, role, what was done]

## 5. Errors and Fixes

[Every error — verbatim message, cause, resolution]

## 6. Problem Solving Progress

[Approaches tried, outcomes, direction, dead ends]

## 7. User Instructions and Constraints

[EVERY constraint verbatim — this is the most critical section]

## 8. Pending Tasks

1. [Task with actionable detail]
2. [Task with actionable detail]

## 9. Next Step

[Single most important action with enough context to execute cold]

---

## Transcript

{{TRANSCRIPT}}
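The `{{DATE}}`, `{{PROJECT}}`, `{{PRIOR_SUMMARY}}`, and `{{TRANSCRIPT}}` placeholders in the briefing above are filled in before the prompt is piped to the sub-agent. The following is a minimal sketch of one way to do that in a single pass, so that placeholder-like text inside a prior summary or transcript is not expanded a second time. The helper name `fill_briefing` and all sample values are hypothetical and not part of the commit.

```python
import re

# Hypothetical helper for illustration; the name is not from the commit.
PLACEHOLDER = re.compile(r"\{\{(DATE|PROJECT|PRIOR_SUMMARY|TRANSCRIPT)\}\}")


def fill_briefing(template: str, values: dict[str, str]) -> str:
    """Replace each known placeholder once, scanning only the template text."""
    # re.sub never rescans text it has just inserted, so "{{TRANSCRIPT}}"
    # appearing inside a prior summary stays literal.
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template)


template = "Date: {{DATE}}\nPrior: {{PRIOR_SUMMARY}}\nLog: {{TRANSCRIPT}}"
filled = fill_briefing(
    template,
    {
        "DATE": "2025-01-01",  # sample value, assumed
        "PROJECT": "/tmp/demo",  # sample value, assumed
        "PRIOR_SUMMARY": "mentions {{TRANSCRIPT}} literally",
        "TRANSCRIPT": "[USER]\nhello",
    },
)
print(filled)
```

Chained `str.replace` calls would instead expand the `{{TRANSCRIPT}}` text inside the sample prior summary, which matters for sessions that discuss this template itself.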
Lines changed: 180 additions & 0 deletions
@@ -0,0 +1,180 @@
"""Shared distillation logic for PreCompact and SessionStart hooks.

Extracts conversation transcripts from JSONL, applies observation masking,
runs a fresh-context summarization subprocess, and persists artifacts.
"""

import glob
import json
import os
import re
import shutil
import subprocess

from datetime import datetime
from pathlib import Path


def get_project_slug(cwd: str) -> str:
    """Map a working directory to its slug by replacing "/" with "-"."""
    return cwd.replace("/", "-")


def get_jsonl_path(cwd: str) -> str | None:
    """Return the most recently modified session JSONL for cwd, if any."""
    home = os.path.expanduser("~")
    slug = get_project_slug(cwd)
    proj_dir = f"{home}/.claude/projects/{slug}"
    files = sorted(glob.glob(f"{proj_dir}/*.jsonl"), key=os.path.getmtime, reverse=True)
    return files[0] if files else None


def get_summary_path(cwd: str) -> Path:
    home = os.path.expanduser("~")
    slug = get_project_slug(cwd)
    return Path(home) / ".claude" / "distill-summaries" / f"{slug}.md"


def get_backup_path(cwd: str, date: str | None = None) -> Path:
    home = os.path.expanduser("~")
    slug = get_project_slug(cwd)
    if date is None:
        date = datetime.now().strftime("%Y-%m-%d")
    return Path(home) / ".claude" / "distill-backups" / f"{date}-{slug}.txt"


def read_prior_summary(cwd: str) -> str | None:
    path = get_summary_path(cwd)
    if path.exists():
        return path.read_text(encoding="utf-8")
    return None


def extract_transcript(jsonl_path: str, max_chars: int = 120_000) -> str:
    """Render the session JSONL as text, keeping at most the newest max_chars."""
    lines: list[str] = []

    # Explicit UTF-8 so decoding does not depend on the platform default.
    with open(jsonl_path, encoding="utf-8", errors="replace") as f:
        for raw_line in f:
            raw_line = raw_line.strip()
            if not raw_line:
                continue
            try:
                record = json.loads(raw_line)
            except json.JSONDecodeError:
                continue
            if not isinstance(record, dict):
                # Valid JSON that is not an object has no fields to read.
                continue

            msg_type = record.get("type", "")
            if msg_type == "summary":
                text = record.get("summary", "")
                if text:
                    lines.append(f"[PRIOR COMPACTION SUMMARY]\n{text}\n")
                continue

            message = record.get("message", {})
            if not isinstance(message, dict):
                continue

            role = message.get("role", "")
            if role not in ("user", "assistant"):
                continue

            content = message.get("content", "")
            if isinstance(content, str):
                lines.append(f"[{role.upper()}]\n{content}\n")
            elif isinstance(content, list):
                parts: list[str] = []
                for block in content:
                    if isinstance(block, str):
                        parts.append(block)
                    elif isinstance(block, dict):
                        parts.append(_process_content_block(block))
                if parts:
                    lines.append(
                        f"[{role.upper()}]\n" + "\n".join(p for p in parts if p) + "\n"
                    )

    transcript = "\n".join(lines)

    if len(transcript) > max_chars:
        # Keep the newest text and drop the partial first line left by the cut.
        transcript = transcript[-max_chars:]
        first_newline = transcript.find("\n")
        if first_newline > 0:
            transcript = transcript[first_newline + 1 :]
        transcript = "[...transcript truncated from oldest end...]\n\n" + transcript

    return transcript


def _process_content_block(block: dict[str, object]) -> str:
    """Render one content block; tool results are masked to a size placeholder."""
    btype = block.get("type", "")

    if btype == "text":
        return str(block.get("text", ""))

    if btype == "tool_use":
        name = block.get("name", "unknown")
        inp = block.get("input", {})
        inp_str = json.dumps(inp) if isinstance(inp, dict) else str(inp)
        if len(inp_str) > 500:
            inp_str = inp_str[:500] + "..."
        return f"[Tool Call: {name}({inp_str})]"

    if btype == "tool_result":
        tool_id = block.get("tool_use_id", "unknown")
        result_content = block.get("content", "")
        if isinstance(result_content, str):
            char_count = len(result_content)
        elif isinstance(result_content, list):
            char_count = sum(len(json.dumps(r)) for r in result_content)
        else:
            char_count = len(str(result_content))
        return f"[Tool Result: {tool_id} -- {char_count} chars, masked]"

    return str(block.get("text", ""))


def run_distill_subprocess(
    transcript: str,
    prior_summary: str | None,
    briefing_template: str,
    cwd: str | None = None,
    timeout: int = 120,
) -> str | None:
    """Fill the briefing template and pipe it to a fresh `claude -p` process."""
    prior = prior_summary if prior_summary else "[None — first distillation]"
    date = datetime.now().strftime("%Y-%m-%d")
    if cwd is None:
        cwd = os.getcwd()

    values = {
        "DATE": date,
        "PROJECT": cwd,
        "PRIOR_SUMMARY": prior,
        "TRANSCRIPT": transcript,
    }
    # Substitute in a single pass: with chained str.replace calls, a prior
    # summary containing the literal text "{{TRANSCRIPT}}" would have the
    # transcript expanded inside it as well.
    briefing = re.sub(
        r"\{\{(DATE|PROJECT|PRIOR_SUMMARY|TRANSCRIPT)\}\}",
        lambda m: values[m.group(1)],
        briefing_template,
    )

    try:
        result = subprocess.run(
            ["claude", "-p", "--model", "sonnet"],
            input=briefing,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        if result.returncode == 0 and result.stdout.strip():
            return result.stdout.strip()
        return None
    except (subprocess.TimeoutExpired, FileNotFoundError, OSError):
        return None


def save_artifacts(cwd: str, summary: str, transcript: str) -> tuple[Path, Path]:
    """Write the summary and transcript backup, keeping one previous summary."""
    summary_path = get_summary_path(cwd)
    backup_path = get_backup_path(cwd)

    summary_path.parent.mkdir(parents=True, exist_ok=True)
    backup_path.parent.mkdir(parents=True, exist_ok=True)

    prev_path = summary_path.with_name(f"{summary_path.stem}-prev{summary_path.suffix}")
    if summary_path.exists():
        shutil.move(str(summary_path), str(prev_path))

    summary_path.write_text(summary, encoding="utf-8")
    backup_path.write_text(transcript, encoding="utf-8")

    return summary_path, backup_path
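The truncation step in `extract_transcript()` keeps the newest part of the transcript and discards the oldest. The standalone sketch below isolates that behavior on a tiny input; `keep_tail` and `MARKER` are hypothetical names used only for illustration and mirror the logic above without being part of the commit.

```python
# Hypothetical names for illustration; they mirror the truncation logic above.
MARKER = "[...transcript truncated from oldest end...]\n\n"


def keep_tail(text: str, max_chars: int) -> str:
    """Keep roughly the newest max_chars characters, aligned to a line start."""
    if len(text) <= max_chars:
        return text
    tail = text[-max_chars:]
    first_newline = tail.find("\n")
    if first_newline > 0:
        # Drop the partial line left behind by the character-based cut.
        tail = tail[first_newline + 1 :]
    return MARKER + tail


result = keep_tail("aaa\nbbb\nccc", 5)
print(repr(result))  # '[...transcript truncated from oldest end...]\n\nccc'
```

Because the marker is prepended after the cut, the returned text can be slightly longer than `max_chars`.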
Lines changed: 41 additions & 0 deletions
@@ -0,0 +1,41 @@
#!/usr/bin/env python3
"""SessionStart compact hook: re-injects distill summary after CC compaction.

Safety net for the PreCompact hook. If the PreCompact stdout -> compaction
model channel fails (undocumented behavior), this hook ensures the distill
summary still reaches the post-compaction context as a system message.

Simple: read summary file, print to stdout. No subprocess, no heavy logic.
Always exits 0.
"""

import json
import os
import sys

sys.path.insert(0, os.path.dirname(os.path.realpath(__file__)))

try:
    import distill_core  # type: ignore[import-not-found]
except ImportError:
    sys.exit(0)


def main() -> None:
    try:
        hook_input = json.load(sys.stdin)
    except (json.JSONDecodeError, EOFError):
        return

    cwd = hook_input.get("cwd", os.getcwd())

    summary = distill_core.read_prior_summary(cwd)
    if summary:
        print(summary)


if __name__ == "__main__":
    try:
        main()
    except Exception:
        sys.exit(0)
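A hook script like this only runs once it is registered in a Claude Code settings file. The sketch below builds such a fragment as a Python dict and prints it as JSON. It assumes the `hooks` settings layout with a `compact` matcher for `SessionStart`; the script path is a placeholder, and the commit's actual settings changes are not shown in this excerpt.

```python
import json

# Placeholder path: the real script location is not shown in this excerpt.
HOOK_SCRIPT = "/path/to/session_start_hook.py"

settings_fragment = {
    "hooks": {
        "SessionStart": [
            {
                # Assumed matcher: run only for sessions that start after compaction.
                "matcher": "compact",
                "hooks": [
                    {"type": "command", "command": f"python3 {HOOK_SCRIPT}"}
                ],
            }
        ]
    }
}

print(json.dumps(settings_fragment, indent=2))
```

Restricting the matcher to `compact` would keep the summary from being injected into ordinary new sessions, which fits the hook's stated role as a safety net for compaction.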
