Add Continuity compaction strategy#306

Draft
nhicks00 wants to merge 33 commits into mpfaffenberger:main from nhicks00:continuity-compaction
@nhicks00
Collaborator

nhicks00 commented Apr 24, 2026

Summary

Makes compaction_strategy=continuity the default compaction mode and adds the Continuity strategy for preserving long-session working context through predictive triggers, deterministic observation masking, durable task-scoped memory, archive retrieval hints, fallback summarization, recent raw-tail protection, and target trimming.

The existing truncation and summarization strategies remain available for users who prefer the legacy behavior.

Structure

Continuity is implemented as a built-in plugin under code_puppy/plugins/continuity_compaction/.

The core changes are limited to generic plugin extension points and their invocations:

  • register_config_keys: lets plugins expose config keys in /set help.
  • register_compaction_strategies: lets plugins register strategy names such as continuity.
  • compact_message_history: lets a plugin handle message-history compaction and return rebuilt history plus dropped-message bookkeeping.
  • /compact now routes through the unified compaction entrypoint with force=True, so plugin strategies can handle manual compaction without command-specific core logic.
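
A minimal sketch of how the registration hooks above could fit together. The registry, `fire`, `known_strategies`, and `resolve_strategy` are hypothetical stand-ins written for illustration, not the actual contents of callbacks.py or config.py; only the hook names and the strategy names come from this description.

```python
from collections import defaultdict
from typing import Any, Callable, Optional

# Hypothetical stand-in for the callback registry in callbacks.py.
_HOOKS: dict[str, list[Callable[..., Any]]] = defaultdict(list)


def register_callback(hook: str, fn: Callable[..., Any]) -> None:
    """Attach a plugin function to a named hook."""
    _HOOKS[hook].append(fn)


def fire(hook: str, *args: Any, **kwargs: Any) -> list[Any]:
    """Call every function registered for a hook and collect the results."""
    return [fn(*args, **kwargs) for fn in _HOOKS[hook]]


BUILTIN_STRATEGIES = {"truncation", "summarization"}


def known_strategies() -> set[str]:
    """Built-in names plus whatever plugins contribute."""
    names = set(BUILTIN_STRATEGIES)
    for contributed in fire("register_compaction_strategies"):
        names.update(contributed)
    return names


def resolve_strategy(configured: Optional[str], default: str = "continuity") -> str:
    """Fall back to the default when the value is unset or unknown."""
    if configured in known_strategies():
        return configured
    return default


# Plugin side: contribute a strategy name and a config key for /set help.
register_callback("register_compaction_strategies", lambda: ["continuity"])
register_callback(
    "register_config_keys",
    lambda: ["continuity_compaction_target_ratio"],
)
```

With this shape, core code never names Continuity directly; it only asks the hooks which strategies and config keys exist.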

Why Continuity

| Practical Scenario | Truncation Risk | Summarization Risk | Continuity Solves It By |
| --- | --- | --- | --- |
| A long session starts with OAuth work, then later switches to dashboard work. | Early OAuth goals and constraints can be deleted completely. | Summaries can blur task boundaries and make old constraints look current. | Keeping original root, active task, task ledger, and task-scoped constraints separately. |
| The agent reads huge files and long test logs many times. | Old observations vanish, including useful failures. | Large outputs become prose that may omit exact tool/status/archive details. | Archiving bulky raw observations locally and replacing them with deterministic capsules. |
| A later bug depends on an old failed test or invalidated hypothesis. | The key failure may be outside the retained tail. | Stale hypotheses can survive as vague summary text. | Tracking validation status, accepted decisions, invalidated hypotheses, and archive hints. |
| The session runs through many compactions. | Repeated hard cuts erase session roots. | Repeated summaries can compound drift. | Refreshing one bounded durable memory snapshot while preserving recent raw context separately. |
| The next turn is likely to be large. | Compaction can happen too late. | Same threshold problem unless manually compacted. | Predicting next-turn growth and compacting before the projected turn crosses the soft trigger. |
| The user needs to tune behavior. | Mostly global threshold/protected-token controls. | Mostly summarization model/settings. | Exposing plugin-owned trigger, target, raw-tail, archive, retention, timeout, and semantic-model knobs. |

What Changed

  • Defaults get_compaction_strategy() to continuity when no strategy is configured or an invalid strategy is provided.
  • Adds code_puppy/plugins/continuity_compaction/ with:
    • percentage-scaled trigger settings per model context window
    • predictive compaction based on recent growth history
    • default post-compaction target of 35% of the full context window
    • final trim protection for the newest raw tail, using continuity_compaction_recent_raw_floor_ratio (20% by default)
    • deterministic archiving and masking of old bulky tool-return observations
    • durable memory snapshots for active task, task ledger, constraints, decisions, validation state, active files, next action, and archive hints
    • optional semantic memory update using continuity_compaction_semantic_model, defaulting to the active chat model when unset and falling back to summarization_model only for non-agent utility calls
    • archive metadata indexing, search, retrieval snippets, retention cleanup, and schema v1-to-v2 migration
  • Registers /continuity as a plugin command for memory status, task ledger, archive search/show, and diagnostics.
  • Adds live comparison tooling and docs for the practical compaction evaluation.
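
To make the percentage-scaled and predictive triggers concrete, here is a rough sketch. The ratio values are taken from the /set examples in this description; the decision order, the `TriggerRatios` class, and the "largest recent growth" predictor are assumptions for illustration and may differ from what the plugin actually does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TriggerRatios:
    # Values copied from the /set examples in this description.
    soft: float = 0.825
    predictive_min: float = 0.725
    emergency: float = 0.90


def projected_growth(recent_growth: list[int]) -> int:
    """Assumed predictor: the largest per-turn token growth seen recently."""
    return max(recent_growth, default=0)


def should_compact(
    current_tokens: int,
    context_window: int,
    recent_growth: list[int],
    ratios: TriggerRatios = TriggerRatios(),
) -> str:
    """Return the reason to compact, or "none"."""
    usage = current_tokens / context_window
    if usage >= ratios.emergency:
        return "emergency"
    if usage >= ratios.soft:
        return "soft"
    projected = (current_tokens + projected_growth(recent_growth)) / context_window
    # Predictive compaction only applies once usage is past the minimum ratio.
    if usage >= ratios.predictive_min and projected >= ratios.soft:
        return "predictive"
    return "none"
```

Because every threshold is a ratio of the context window, the same settings scale across models with different window sizes.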

Before / After Model

Before compaction, a long session can contain the current task, old completed task work, repeated file reads, large test output, tool-return logs, and the latest raw conversation tail all mixed together.

After Continuity compaction, the recent raw tail stays intact, old bulky tool returns are replaced in place with short deterministic capsules, the raw logs are archived locally, and one compact durable-memory snapshot is injected near the front of the rebuilt history. If masking still cannot hit the 35% target, only the oldest already-masked region is summarized while preserving a visible recent archive capsule when practical; if the transcript is still above target, older compacted history is trimmed while preserving the latest user request, current error context, and newest raw tail.
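
The in-place masking step can be sketched roughly as follows. `Message`, `make_capsule`, the capsule text format, the character threshold, and the message-count tail are all hypothetical simplifications (the plugin works against the real message history and token budgets); the point is only that the capsule is derived deterministically from the archived content.

```python
import hashlib
from dataclasses import dataclass


@dataclass
class Message:
    role: str  # "user", "assistant", or "tool"
    content: str
    tool_name: str = ""


def make_capsule(msg: Message, archive: dict[str, str]) -> Message:
    """Archive the raw tool output and return a short deterministic stand-in."""
    digest = hashlib.sha256(msg.content.encode("utf-8")).hexdigest()[:12]
    archive[digest] = msg.content
    capsule = (
        f"[archived tool={msg.tool_name} chars={len(msg.content)} "
        f"archive_id={digest}]"
    )
    return Message(role=msg.role, content=capsule, tool_name=msg.tool_name)


def mask_old_observations(
    history: list[Message],
    archive: dict[str, str],
    tail_size: int,
    bulky_chars: int = 2000,
) -> list[Message]:
    """Mask bulky tool returns outside the newest `tail_size` messages."""
    cutoff = max(len(history) - tail_size, 0)
    rebuilt = []
    for index, msg in enumerate(history):
        is_old = index < cutoff
        is_bulky = msg.role == "tool" and len(msg.content) > bulky_chars
        rebuilt.append(make_capsule(msg, archive) if is_old and is_bulky else msg)
    return rebuilt
```

Since the capsule depends only on the tool name and the content hash, masking the same history twice yields the same capsules, and the recent tail is never rewritten.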

User Impact

Continuity is now the default compaction strategy. Users can still set it explicitly with:

/set compaction_strategy=continuity

Legacy strategies remain available:

/set compaction_strategy=truncation
/set compaction_strategy=summarization

Useful related knobs include:

/set continuity_compaction_semantic_model=gpt-5.4
/set continuity_compaction_semantic_timeout_seconds=60
/set continuity_compaction_soft_trigger_ratio=0.825
/set continuity_compaction_predictive_trigger_min_ratio=0.725
/set continuity_compaction_target_ratio=0.35
/set continuity_compaction_recent_raw_floor_ratio=0.20
/set continuity_compaction_emergency_trigger_ratio=0.90
/set continuity_compaction_archive_retention_days=30
/set continuity_compaction_archive_retention_count=500

continuity_compaction_semantic_model controls the semantic memory LLM call. If unset, Continuity uses the active chat model from the current Code Puppy session. If no active model is available for a direct utility call, it falls back to summarization_model. Fallback summarization uses Code Puppy's existing summarization path, so it is still controlled by summarization_model.
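
The resolution order described above amounts to a three-step fallback. `resolve_semantic_model` is a hypothetical helper written for illustration, not a function from the plugin.

```python
from typing import Optional


def resolve_semantic_model(
    semantic_model: Optional[str],
    active_model: Optional[str],
    summarization_model: str,
) -> str:
    """Pick the model for the semantic memory call, in priority order."""
    if semantic_model:
        return semantic_model  # explicit continuity_compaction_semantic_model
    if active_model:
        return active_model  # active chat model from the current session
    return summarization_model  # last resort for direct utility calls
```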

Validation

  • Focused pytest suite for Continuity, config, compaction routing, and related command coverage
    • 329 passed
  • uv run ruff check ...
    • passed locally
  • uv run ruff format --check ...
    • passed locally
  • Broad uv run pytest --no-cov -q
    • reached 9802 passed, 87 skipped, 1 xpassed before I interrupted the run manually; the suite had stalled late in an unrelated file-operations area

Provider Compatibility Note

The ChatGPT OAuth/Codex stream reconstruction fix is intentionally isolated from Continuity. It lives only in chatgpt_codex_client.py plus its focused test because that provider can stream text deltas and then finish with response.completed.output=[], which appears to pydantic-ai as ModelResponse(parts=[]). Continuity simply uses the active model through the normal model factory; it does not import ChatGPT OAuth code or depend on this provider patch.

For forks that do not include ChatGPT OAuth, such as a Walmart fork without that provider, this patch can be omitted while still porting the Continuity plugin and the generic compaction callback hooks.

Why Any Core Code Changed

Continuity's implementation is plugin-owned, but Code Puppy did not previously expose a compaction-strategy plugin lifecycle. The small core diff adds generic extension plumbing rather than Continuity-specific behavior:

  • callbacks.py adds hooks for plugin config keys, plugin compaction strategy registration, and plugin-owned message-history compaction.
  • agents/_compaction.py invokes the compaction hook before falling back to built-in truncation/summarization.
  • config.py accepts plugin-registered strategy names and defaults to continuity.
  • /compact, /show, and /set use the generic strategy/config discovery path so plugin strategies are usable and visible from the CLI.

Without those generic hooks, the plugin could register files and commands, but it could not become a selectable compaction strategy or participate in automatic/manual compaction.
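
As a sketch of the dispatch order described for agents/_compaction.py: plugin handlers get the first chance, and the built-in path runs only if every handler declines. All names here (`compact`, `continuity_handler`, `builtin_truncation`, the string-based history) are hypothetical simplifications, not the real signatures.

```python
from typing import Callable, Optional

History = list[str]
# A result is (rebuilt_history, dropped_messages).
Result = tuple[History, History]
# A handler returns a Result, or None to decline.
Handler = Callable[[History, str, bool], Optional[Result]]

_handlers: list[Handler] = []


def builtin_truncation(history: History, keep: int = 2) -> Result:
    """Stand-in for the legacy path: keep only the newest messages."""
    split = max(len(history) - keep, 0)
    return history[split:], history[:split]


def compact(history: History, strategy: str, force: bool = False) -> Result:
    """Give plugins the first chance, then fall back to the built-in path."""
    for handler in _handlers:
        result = handler(history, strategy, force)
        if result is not None:
            return result
    return builtin_truncation(history)


def continuity_handler(history: History, strategy: str, force: bool) -> Optional[Result]:
    """Hypothetical plugin handler: only claims its own strategy name."""
    if strategy != "continuity":
        return None
    if not force and len(history) < 4:
        return history, []  # below the plugin's own threshold; leave as-is
    kept, dropped = builtin_truncation(history)
    return ["[memory snapshot]"] + kept, dropped


_handlers.append(continuity_handler)
```

Passing `force=True` through the same entrypoint is what lets /compact reuse this path without command-specific logic in core.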

Reviewer Notes

  • The legacy truncation and summarization strategies remain available through /set compaction_strategy=....
  • Local observation archives stay under Code Puppy's data directory and are bounded by retention settings.
  • A follow-up may be useful if maintainers want archived observation lookup to survive restored histories across fresh agent IDs.
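
For reviewers thinking about how the retention bounds interact, one plausible reading is sketched below. The defaults mirror the /set examples (30 days, 500 entries); treating the two limits as "delete if either is exceeded" is an assumption, and `select_expired` is a hypothetical helper, not the plugin's cleanup code.

```python
from datetime import datetime, timedelta, timezone


def select_expired(
    entries: dict[str, datetime],
    now: datetime,
    retention_days: int = 30,
    retention_count: int = 500,
) -> list[str]:
    """Return archive ids to delete: too old, or beyond the newest N."""
    cutoff = now - timedelta(days=retention_days)
    newest_first = sorted(entries, key=lambda k: entries[k], reverse=True)
    expired = []
    for rank, archive_id in enumerate(newest_first):
        if rank >= retention_count or entries[archive_id] < cutoff:
            expired.append(archive_id)
    return expired


# Example: one entry is past the 30-day window.
now = datetime(2026, 4, 30, tzinfo=timezone.utc)
entries = {
    "a": now - timedelta(days=1),
    "b": now - timedelta(days=40),
    "c": now - timedelta(days=2),
}
```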

@mpfaffenberger
Owner

I love the concept and I'd love to have this feature, but structurally this P/R is not mergeable. This needs to be created as 100% a plugin.

Your agent wrote a justification for why it shouldn't be a plugin, but that contradicts everything that's in our AGENTS.md.

For a feature like this, the only feasible way to create changes in core is to add lifecycle hooks.

I would love to have this feature, so if you can propose a set of lifecycle hooks to add in callbacks.py (and corresponding invocations), that will allow this P/R to fit into the guidelines, we can move forward. Otherwise we can close this P/R.

nhicks00 force-pushed the continuity-compaction branch from 0b64051 to e194e62 (April 30, 2026 02:10)
nhicks00 force-pushed the continuity-compaction branch from 314aab0 to 6c54f3f (April 30, 2026 03:32)