Add AdaptiveReasoning capability by DouweM · Pull Request #174 · pydantic/pydantic-ai-harness

DouweM · 2026-04-10T01:02:15Z

Summary

- Adds an AdaptiveReasoning capability that dynamically adjusts model thinking effort per step, using get_model_settings() to return a callable
- Built-in heuristic: high effort on the first step and after tool errors, low effort on simple follow-ups
- Supports a custom effort_fn(RunContext) -> Literal['low', 'medium', 'high'] for domain-specific logic
- Exported from the pydantic_harness package

Test plan

- 18 tests covering the _has_tool_errors helper, the default_effort_fn heuristic, and the AdaptiveReasoning capability with default and custom effort functions
- 100% code coverage
- pyright strict mode passes
- ruff lint and format pass

Closes #84

🤖 Generated with Claude Code

Implements a capability that dynamically adjusts model thinking effort based on task complexity signals via `get_model_settings()` returning a callable. Built-in heuristic uses high effort on first step and after tool errors, low effort on simple follow-ups. Supports custom effort functions for domain-specific logic.

Closes #84

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>

- Step 2 now returns medium (was incorrectly going straight to low)
- Step 3+ returns low as intended
- Removed unreachable default branch
- Added many-tool-calls signal: if the last ModelResponse had 3+ ToolCallParts, use medium effort regardless of step number

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
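The rules described across the two commits can be sketched as a single function. This is only an illustration: `StepSignals` and `sketch_effort` are hypothetical names standing in for `RunContext` and `default_effort_fn`, and the order in which the rules are checked is an assumption, since the commit messages list the rules but not their precedence.

```python
from dataclasses import dataclass
from typing import Literal

Effort = Literal['low', 'medium', 'high']


@dataclass
class StepSignals:
    """Hypothetical stand-in for the signals the heuristic reads from RunContext."""

    run_step: int
    has_tool_errors: bool = False
    last_response_tool_calls: int = 0


def sketch_effort(signals: StepSignals) -> Effort:
    """Pick an effort level; the rule order here is assumed, not taken from the PR."""
    if signals.run_step <= 1:
        return 'high'  # first step: understand the task
    if signals.has_tool_errors:
        return 'high'  # reason about what went wrong
    if signals.last_response_tool_calls >= 3:
        return 'medium'  # many tool calls in the last response
    if signals.run_step == 2:
        return 'medium'  # step 2 no longer drops straight to low
    return 'low'  # step 3+: simple follow-up
```

Under these assumptions, a clean third step with a single tool call resolves to `'low'`, while the same step with three tool calls resolves to `'medium'`.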

devin-ai-integration

Devin Review found 2 potential issues.

View 3 additional findings in Devin Review.

devin-ai-integration · 2026-04-10T01:05:26Z

+    """Built-in heuristic effort selector.
+
+    Rules (evaluated in order):
+    1. First step (``run_step == 1``): ``'high'`` -- understand the task.


🟡 Docstring says run_step == 1 but code uses run_step <= 1

The docstring for default_effort_fn documents rule 1 as "First step (run_step == 1)", but the actual implementation at line 68 uses ctx.run_step <= 1, which also matches run_step == 0. The test test_step_zero_high (tests/test_adaptive_reasoning.py:152-154) confirms that run_step=0 returns 'high', matching the code but not the docstring. PLAN.md:38 correctly documents this as run_step <= 1.

Suggested change

-    1. First step (``run_step == 1``): ``'high'`` -- understand the task.
+    1. First step (``run_step <= 1``): ``'high'`` -- understand the task.
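To make the mismatch concrete, the two predicates can be compared over a few step numbers. The function names below are hypothetical; only the two comparisons come from the review comment.

```python
def documented_rule(run_step: int) -> bool:
    """First-step check as the docstring currently words it."""
    return run_step == 1


def implemented_rule(run_step: int) -> bool:
    """First-step check as the review says the code implements it."""
    return run_step <= 1


# Collect the step numbers on which the docstring and the code disagree.
mismatches = [step for step in range(0, 4) if documented_rule(step) != implemented_rule(step)]
print(mismatches)
```

Over steps 0 through 3 the two rules disagree only at step 0, which is the case test_step_zero_high exercises.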

devin-ai-integration · 2026-04-10T01:05:28Z

+@dataclass
+class AdaptiveReasoning(AbstractCapability[Any]):
+    """Dynamically adjusts model thinking effort per step.
+
+    By default a built-in heuristic is used:
+
+    * **First step** -> ``'high'`` (understand the task)
+    * **After tool errors** -> ``'high'`` (reason about what went wrong)
+    * **Simple follow-ups** -> ``'low'`` (just incorporating tool results)
+
+    Supply a custom ``effort_fn`` to override these rules::
+
+        def my_effort(ctx: RunContext[MyDeps]) -> Literal['low', 'medium', 'high']:
+            if ctx.run_step > 5:
+                return 'high'  # wrap-up needs careful thought
+            return 'medium'
+
+        agent = Agent(..., capabilities=[AdaptiveReasoning(effort_fn=my_effort)])
+    """
+
+    effort_fn: Callable[[RunContext[Any]], Literal['low', 'medium', 'high']] = field(default=default_effort_fn)
+    """Callable that receives the current ``RunContext`` and returns an effort level."""
+
+    def get_model_settings(self) -> Callable[[RunContext[Any]], ModelSettings]:
+        """Return a dynamic model-settings callable that sets ``thinking`` per step."""
+
+        def _resolve(ctx: RunContext[Any]) -> ModelSettings:
+            effort = self.effort_fn(ctx)
+            return ModelSettings(thinking=_EFFORT_TO_THINKING[effort])
+
+        return _resolve


🚩 Capability only overrides get_model_settings, no other hooks

The AdaptiveReasoning class only implements get_model_settings() from the AbstractCapability interface. Without being able to inspect the AbstractCapability source (the pydantic-ai-slim package isn't installed in the review environment), I could not verify whether there are other required or recommended methods (e.g., get_system_prompt, get_tools). The tests do verify isinstance(cap, AbstractCapability) passes, and the tests presumably run in CI, so this is likely fine.
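The callable-returning pattern in the diff above can be exercised without pydantic-ai installed. The following is a minimal sketch under stated assumptions: `FakeContext` and `AdaptiveReasoningSketch` are hypothetical stand-ins for `RunContext` and the real capability, settings are modelled as a plain dict, and the effort string is passed through unchanged because the values of `_EFFORT_TO_THINKING` are not shown in the diff.

```python
from dataclasses import dataclass, field
from typing import Callable, Literal

Effort = Literal['low', 'medium', 'high']


@dataclass
class FakeContext:
    """Hypothetical stand-in for RunContext; only run_step is modelled."""

    run_step: int


def wrap_up_effort(ctx: FakeContext) -> Effort:
    """Mirrors the docstring example: think harder late in the run."""
    return 'high' if ctx.run_step > 5 else 'medium'


@dataclass
class AdaptiveReasoningSketch:
    """Stand-in capability exposing only get_model_settings()."""

    effort_fn: Callable[[FakeContext], Effort] = field(default=wrap_up_effort)

    def get_model_settings(self) -> Callable[[FakeContext], dict[str, str]]:
        def _resolve(ctx: FakeContext) -> dict[str, str]:
            # The real code looks the effort up in _EFFORT_TO_THINKING;
            # that mapping is not in the diff, so the effort is used as-is here.
            return {'thinking': self.effort_fn(ctx)}

        return _resolve


# The resolver is created once and then called with a fresh context on every step.
resolver = AdaptiveReasoningSketch().get_model_settings()
settings_by_step = [resolver(FakeContext(run_step=step)) for step in (1, 6)]
```

The point of returning a callable rather than a fixed settings object is that the effort is re-evaluated on each step, so the same capability instance can yield different settings as the run progresses.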

DouweM · 2026-04-10T15:06:39Z

Originally posted by @DouweM in #155 comment (PR was recreated)

Audit vs prior art: AdaptiveReasoning

Worth adding now:

- Medium effort for step 2 (currently jumps straight from high to low)
- Tool-count signal: 3+ tool calls in last response -> bump effort
- Long tool output signal: >2000 chars -> medium effort
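One way the tool-count and long-output signals could be layered on top of a base effort is sketched below. The function and constant names are hypothetical, and two readings are assumed: "bump effort" is taken to mean "raise to at least medium", and a signal never lowers an effort that is already higher.

```python
from typing import Literal

Effort = Literal['low', 'medium', 'high']

# Thresholds taken from the audit notes; the constant names are hypothetical.
TOOL_CALL_THRESHOLD = 3
LONG_OUTPUT_CHARS = 2000

_RANK: dict[str, int] = {'low': 0, 'medium': 1, 'high': 2}


def at_least(current: Effort, floor: Effort) -> Effort:
    """Return the higher of the two effort levels."""
    return current if _RANK[current] >= _RANK[floor] else floor


def apply_audit_signals(base: Effort, tool_calls: int, longest_output_chars: int) -> Effort:
    """Raise the base effort to at least 'medium' when either signal fires."""
    effort = base
    if tool_calls >= TOOL_CALL_THRESHOLD:
        effort = at_least(effort, 'medium')
    if longest_output_chars > LONG_OUTPUT_CHARS:
        effort = at_least(effort, 'medium')
    return effort
```

Treating the signals as a floor rather than an override keeps a `'high'` decision (for example after a tool error) intact even when the tool output is long.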

Follow-up opportunities:

Model-routed effort prediction, phase-based effort

DouweM and others added 2 commits April 2, 2026 05:30

DouweM requested review from Kludex, adtyavrdhn, dmontagu, dsfaccini and samuelcolvin as code owners April 10, 2026 01:02

devin-ai-integration Bot reviewed Apr 10, 2026

DouweM removed request for Kludex, adtyavrdhn, dmontagu, dsfaccini and samuelcolvin April 10, 2026 15:11

DouweM marked this pull request as draft April 10, 2026 15:12

DouweM wants to merge 2 commits into main from capability/adaptive-reasoning
