Add VerificationLoop capability by DouweM · Pull Request #169 · pydantic/pydantic-ai-harness

DouweM · 2026-04-10T01:01:59Z

Summary

Adds a VerificationLoop capability that runs configurable verification checks after agent completion and, if any check fails, retries with the failure feedback
Three new public types: VerificationLoop, Verifier, VerificationResult
Uses the wrap_run hook to orchestrate the verify-fix-retry loop, calling ctx.agent.run() for retries and passing the accumulated message history plus structured failure feedback
Configurable max_retries (default 3); emits a UserWarning if all retries are exhausted
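
The verify-fix-retry loop summarized above can be sketched without pydantic-ai at all. This is a minimal illustration under stated assumptions, not the PR's implementation: `CheckResult`, `Check`, `RunFn` and `verify_and_retry` are hypothetical names, `run` merely stands in for `ctx.agent.run()`, and message history is omitted for brevity.

```python
import asyncio
import warnings
from dataclasses import dataclass
from typing import Awaitable, Callable, List


@dataclass
class CheckResult:  # hypothetical stand-in for VerificationResult
    name: str
    passed: bool
    message: str = ''


# Hypothetical verifier shape: an async callable returning a CheckResult.
Check = Callable[[], Awaitable[CheckResult]]
# Hypothetical stand-in for ctx.agent.run(): takes a prompt, returns output text.
RunFn = Callable[[str], Awaitable[str]]


async def verify_and_retry(run: RunFn, prompt: str, checks: List[Check], max_retries: int = 3) -> str:
    output = await run(prompt)
    for attempt in range(max_retries + 1):
        results = [await check() for check in checks]
        failures = [r for r in results if not r.passed]
        if not failures:
            return output  # every check passed
        if attempt == max_retries:
            break  # retries exhausted; fall through to the warning
        # Feed the failures back to the agent and try again.
        feedback = 'Verification failed:\n' + '\n'.join(f'- {f.name}: {f.message}' for f in failures)
        output = await run(feedback)
    warnings.warn(f'Verification still failing after {max_retries} retries', UserWarning, stacklevel=2)
    return output
```

In this sketch the checks run at most `max_retries + 1` times and the agent is re-run at most `max_retries` times, which matches the loop-plus-final-check shape described in the summary.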

Test plan

15 tests covering all code paths (100% coverage)
Unit tests for _build_feedback and _run_verifiers helpers
Integration tests with TestModel: pass on first try, retry then pass, max retries exceeded, partial failures, no verifiers, feedback content verification, final-check-after-loop pass
ruff check and ruff format pass
pyright strict mode passes on both src/ and tests/
coverage report shows 100% across all files

Closes #79

🤖 Generated with Claude Code

Implements a capability that runs configurable verification checks (e.g. lint, test, build) after agent completion and automatically retries with failure feedback if any check fails, up to a configurable maximum number of retries.

Closes #79

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Add `parallel: bool = True` parameter to run verifiers concurrently via `asyncio.gather` (falls back to sequential for single verifier)
- Improve retry feedback prompt to explicitly say "ONLY fix the failing checks, do not make other changes"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
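
As a rough illustration of the `parallel` switch described in this commit, the sketch below runs verifiers through `asyncio.gather` when there is more than one and sequentially otherwise. The `Verifier` alias and `run_verifiers` function are hypothetical; the real `_run_verifiers` helper is not shown in this PR page and may differ.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple

# Hypothetical verifier shape: async callable returning (name, error message or None).
Verifier = Callable[[], Awaitable[Tuple[str, Optional[str]]]]


async def run_verifiers(verifiers: List[Verifier], parallel: bool = True) -> List[Tuple[str, str]]:
    if parallel and len(verifiers) > 1:
        # gather() preserves argument order, so failures are reported in a stable order.
        outcomes = await asyncio.gather(*(v() for v in verifiers))
    else:
        # A single verifier (or parallel=False) is simply awaited in turn.
        outcomes = [await v() for v in verifiers]
    # Keep only the failing checks as (name, message) pairs.
    return [(name, msg) for name, msg in outcomes if msg is not None]
```

Concurrent execution mainly pays off when the checks are I/O-bound (for example, each one shells out to a linter or test runner); checks that mutate shared files may still need `parallel=False`.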

devin-ai-integration

Devin Review found 2 potential issues.


devin-ai-integration · 2026-04-10T01:04:43Z

+            self._in_retry = True
+            try:
+                result = await agent.run(
+                    feedback,
+                    message_history=result.all_messages(),
+                )
+            finally:
+                self._in_retry = False


🔴 Instance-level _in_retry flag causes verification to be silently skipped during concurrent agent runs

The _in_retry boolean is mutable shared state on the VerificationLoop instance. When the agent retries, it sets self._in_retry = True (line 157) and then awaits agent.run(...), which yields control. If another concurrent agent.run() call enters wrap_run on the same agent (and thus the same capability instance) while the first run is retrying, it will see _in_retry = True at line 130 and skip all verification, returning the result unchecked. After the first run finishes its retry, _in_retry is reset to False (line 164), but by then the second run has already bypassed verification entirely. Since PydanticAI agents are designed to be reusable across concurrent calls, this is a realistic scenario. A per-call token (e.g., using contextvars or checking the RunContext identity) would avoid this.

Prompt for agents

The _in_retry flag at line 112 is instance-level mutable state that is shared across all concurrent wrap_run invocations on the same VerificationLoop instance. When one run sets _in_retry = True and awaits agent.run() (yielding control), another concurrent run entering wrap_run will see _in_retry as True and skip verification entirely. To fix this, use a per-call mechanism instead of a shared boolean. Options include:

1. Use a contextvars.ContextVar to track whether the current execution context is a retry, so each async task has its own value.
2. Pass a unique run identifier through the RunContext and track which run IDs are retries in a set.
3. Use an asyncio.Lock or counter (e.g., an integer tracking nested retry depth per task) instead of a plain boolean.

The fix needs to ensure that (a) retry runs on the same call chain still skip verification (to prevent infinite recursion), while (b) unrelated concurrent agent.run() calls on the same agent are not affected.


devin-ai-integration · 2026-04-10T01:04:44Z

+        for attempt in range(1, self.max_retries + 1):
+            failures = await self._run_verifiers()
+            if not failures:
+                return result
+
+            failure_summary = '; '.join(f'{name}: {msg}' for name, msg in failures)
+            feedback = self._build_feedback(failures, attempt)
+            logger.info(
+                'Verification failed (attempt %d/%d): %s',
+                attempt,
+                self.max_retries,
+                failure_summary,
+            )
+
+            if agent is None:  # pragma: no cover — defensive; agent is always set in practice
+                warnings.warn(
+                    'Verification failed but agent is not available on RunContext for retry. Returning last result.',
+                    stacklevel=2,
+                )
+                return result
+
+            # Mark that the next run is a retry so wrap_run passes through.
+            self._in_retry = True
+            try:
+                result = await agent.run(
+                    feedback,
+                    message_history=result.all_messages(),
+                )
+            finally:
+                self._in_retry = False
+
+        # Final verification after last retry.
+        failures = await self._run_verifiers()
+        if not failures:
+            return result
+
+        warnings.warn(
+            f'Verification still failing after {self.max_retries} retries: '
+            + '; '.join(f'{name}: {msg}' for name, msg in failures),
+            stacklevel=2,
+        )
+        return result


🚩 Verification still runs even with max_retries=0

With max_retries=0, the for loop at line 135 is range(1, 1) which is empty, so no retries happen. However, the final verification block at lines 167-176 still executes, running verifiers and potentially emitting a warning like 'Verification still failing after 0 retries'. This means max_retries=0 does not disable verification — it disables retries but still verifies once. This may be the intended behavior (verify but don't retry), but it's worth documenting since a user might expect max_retries=0 to skip verification entirely.
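
To make the arithmetic concrete, the following sketch mirrors only the control flow of the hunk above, assuming every check keeps failing, and counts how often the verifiers and the retry would run. `count_calls` is a hypothetical helper written for this illustration.

```python
def count_calls(max_retries: int) -> tuple:
    """Return (verifier runs, retries) when every check keeps failing."""
    verifier_runs = 0
    retries = 0
    for _attempt in range(1, max_retries + 1):
        verifier_runs += 1  # corresponds to: failures = await self._run_verifiers()
        retries += 1        # corresponds to: result = await agent.run(feedback, ...)
    verifier_runs += 1      # final verification after the loop
    return verifier_runs, retries
```

Under that assumption, `count_calls(0)` gives one verifier run and no retries, and `count_calls(3)` gives four verifier runs and three retries.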


DouweM · 2026-04-10T15:06:09Z

Originally posted by @DouweM in #160 comment (PR closed due to history rewrite)

Audit vs prior art: VerificationLoop

Worth adding now:

Parallel verifier execution via asyncio.gather
Fix-only instruction in retry prompt

Follow-up opportunities:

Selective re-run, trigger on file edit events

DouweM and others added 3 commits April 2, 2026 05:35

Fix trio compatibility: restrict async tests to asyncio backend

5a41785

DouweM requested review from Kludex, adtyavrdhn, dmontagu, dsfaccini and samuelcolvin as code owners April 10, 2026 01:02

devin-ai-integration Bot reviewed Apr 10, 2026


DouweM removed request for Kludex, adtyavrdhn, dmontagu, dsfaccini and samuelcolvin April 10, 2026 15:11

DouweM marked this pull request as draft April 10, 2026 15:12

DouweM wants to merge 3 commits into main from capability/verification-loop
