Add VerificationLoop capability #169

Draft
DouweM wants to merge 3 commits into main from capability/verification-loop

@DouweM DouweM commented Apr 10, 2026

Summary

  • Adds a VerificationLoop capability that runs configurable verification checks after the agent completes and, if any check fails, retries with the failure details as feedback
  • Three new public types: VerificationLoop, Verifier, VerificationResult
  • Uses the wrap_run hook to orchestrate the verify-fix-retry loop, calling ctx.agent.run() for retries with the accumulated message history plus structured failure feedback
  • Configurable max_retries (default 3); emits a UserWarning if all retries are exhausted
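
The verify-fix-retry loop described above can be sketched without any agent framework. This is a minimal illustration, not the PR's implementation: `CheckResult`, `Check`, `Runner`, and `run_with_verification` are hypothetical names, and a plain async callable stands in for ctx.agent.run().

```python
import asyncio
import warnings
from collections.abc import Awaitable, Callable
from dataclasses import dataclass


@dataclass
class CheckResult:
    """Hypothetical stand-in for the PR's VerificationResult."""

    name: str
    passed: bool
    message: str = ''


# Hypothetical signatures: a check inspects the latest output,
# a runner takes a prompt and returns new output.
Check = Callable[[str], Awaitable[CheckResult]]
Runner = Callable[[str], Awaitable[str]]


async def run_with_verification(
    run: Runner, prompt: str, checks: list[Check], max_retries: int = 3
) -> str:
    """Run once, verify, and re-run with failure feedback up to max_retries times."""
    output = await run(prompt)
    for attempt in range(max_retries + 1):
        results = [await check(output) for check in checks]
        failures = [r for r in results if not r.passed]
        if not failures:
            return output
        if attempt == max_retries:
            break  # retries exhausted; fall through to the warning
        feedback = 'Fix ONLY these failing checks: ' + '; '.join(
            f'{f.name}: {f.message}' for f in failures
        )
        output = await run(feedback)
    warnings.warn(f'Verification still failing after {max_retries} retries', stacklevel=2)
    return output
```

In this sketch the checks run at most max_retries + 1 times in total: once per retry plus a final pass after the last retry.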

Test plan

  • 15 tests covering all code paths (100% coverage)
  • Unit tests for _build_feedback and _run_verifiers helpers
  • Integration tests with TestModel: pass on first try, retry then pass, max retries exceeded, partial failures, no verifiers, feedback content verification, final-check-after-loop pass
  • ruff check and ruff format pass
  • pyright strict mode passes on both src/ and tests/
  • coverage report shows 100% across all files

Closes #79

🤖 Generated with Claude Code

DouweM and others added 3 commits April 2, 2026 05:35
Implements a capability that runs configurable verification checks
(e.g. lint, test, build) after agent completion and automatically
retries with failure feedback if any check fails, up to a configurable
maximum number of retries.

Closes #79

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add `parallel: bool = True` parameter to run verifiers concurrently
  via `asyncio.gather` (falls back to sequential for single verifier)
- Improve retry feedback prompt to explicitly say "ONLY fix the failing
  checks, do not make other changes"

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
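
The concurrent execution described in this commit might look roughly like the sketch below. `VerifierFn` and `run_verifiers` are hypothetical names chosen for illustration, and the assumption that a verifier returns None on success or an error message on failure is not taken from the PR.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Optional

# Assumed shape: a verifier returns None on success, or an error message on failure.
VerifierFn = Callable[[], Awaitable[Optional[str]]]


async def run_verifiers(
    verifiers: dict[str, VerifierFn], parallel: bool = True
) -> list[tuple[str, str]]:
    """Return (name, message) pairs for every failing verifier."""
    names = list(verifiers)
    if parallel and len(names) > 1:
        # asyncio.gather returns results in argument order, so they line up with names.
        outcomes = await asyncio.gather(*(verifiers[n]() for n in names))
    else:
        # A single verifier (or parallel=False) runs sequentially.
        outcomes = [await verifiers[n]() for n in names]
    return [(n, msg) for n, msg in zip(names, outcomes) if msg is not None]
```

Because gather preserves argument order, the failure list comes out in the same order whether the verifiers run concurrently or sequentially.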

@devin-ai-integration devin-ai-integration Bot left a comment

Devin Review found 2 potential issues.

View 2 additional findings in Devin Review.

Comment on lines +157 to +164
self._in_retry = True
try:
    result = await agent.run(
        feedback,
        message_history=result.all_messages(),
    )
finally:
    self._in_retry = False

🔴 Instance-level _in_retry flag causes verification to be silently skipped during concurrent agent runs

The _in_retry boolean is mutable shared state on the VerificationLoop instance. When a run retries, it sets self._in_retry = True (line 157) and then awaits agent.run(...), which yields control. If another concurrent agent.run() call enters wrap_run on the same agent (and thus the same capability instance) while the first run is retrying, it sees _in_retry = True at line 130 and skips all verification, returning the result unchecked. After the first run finishes its retry, _in_retry is reset to False (line 164), but by then the second run has silently bypassed verification entirely. Since PydanticAI agents are designed to be reusable across concurrent calls, this is a realistic scenario. A per-call token (e.g., using contextvars or checking the RunContext identity) would avoid this.

Prompt for agents
The _in_retry flag at line 112 is instance-level mutable state that is shared across all concurrent wrap_run invocations on the same VerificationLoop instance. When one run sets _in_retry = True and awaits agent.run() (yielding control), another concurrent run entering wrap_run will see _in_retry as True and skip verification entirely.

To fix this, use a per-call mechanism instead of a shared boolean. Options include:
1. Use a contextvars.ContextVar to track whether the current execution context is a retry, so each async task has its own value.
2. Pass a unique run identifier through the RunContext and track which run IDs are retries in a set.
3. Use an asyncio.Lock or counter (e.g., an integer tracking nested retry depth per task) instead of a plain boolean.

The fix needs to ensure that (a) retry runs on the same call chain still skip verification (to prevent infinite recursion), while (b) unrelated concurrent agent.run() calls on the same agent are not affected.

Comment on lines +135 to +176
for attempt in range(1, self.max_retries + 1):
    failures = await self._run_verifiers()
    if not failures:
        return result

    failure_summary = '; '.join(f'{name}: {msg}' for name, msg in failures)
    feedback = self._build_feedback(failures, attempt)
    logger.info(
        'Verification failed (attempt %d/%d): %s',
        attempt,
        self.max_retries,
        failure_summary,
    )

    if agent is None:  # pragma: no cover - defensive; agent is always set in practice
        warnings.warn(
            'Verification failed but agent is not available on RunContext for retry. Returning last result.',
            stacklevel=2,
        )
        return result

    # Mark that the next run is a retry so wrap_run passes through.
    self._in_retry = True
    try:
        result = await agent.run(
            feedback,
            message_history=result.all_messages(),
        )
    finally:
        self._in_retry = False

# Final verification after last retry.
failures = await self._run_verifiers()
if not failures:
    return result

warnings.warn(
    f'Verification still failing after {self.max_retries} retries: '
    + '; '.join(f'{name}: {msg}' for name, msg in failures),
    stacklevel=2,
)
return result

🚩 Verification still runs even with max_retries=0

With max_retries=0, the for loop at line 135 iterates over range(1, 1), which is empty, so no retries happen. However, the final verification block at lines 167-176 still executes, running the verifiers and potentially emitting a warning like 'Verification still failing after 0 retries'. So max_retries=0 does not disable verification: it disables retries but still verifies once. This may be the intended behavior (verify but don't retry), but it's worth documenting, since a user might expect max_retries=0 to skip verification entirely.
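
The control flow in question can be traced with a tiny sketch that mirrors only the structure of the quoted loop. `plan` is a hypothetical helper that lists the steps taken when every check keeps failing; it does not call any verifier or agent.

```python
def plan(max_retries: int) -> list[str]:
    """List the steps the quoted loop structure performs if every check fails."""
    steps: list[str] = []
    for attempt in range(1, max_retries + 1):
        steps.append(f'verify (attempt {attempt})')
        steps.append('retry')
    # This block sits outside the loop, so it runs even when the loop body never does.
    steps.append('final verify')
    return steps


print(plan(0))  # ['final verify']
print(plan(2))  # ['verify (attempt 1)', 'retry', 'verify (attempt 2)', 'retry', 'final verify']
```

Under this structure the verifiers run max_retries + 1 times in the worst case, and exactly once when max_retries is 0.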

DouweM commented Apr 10, 2026

Originally posted by @DouweM in #160 comment (PR closed due to history rewrite)

Audit vs prior art: VerificationLoop

Worth adding now:

  • Parallel verifier execution via asyncio.gather
  • Fix-only instruction in retry prompt

Follow-up opportunities:

  • Selective re-run, trigger on file edit events

@DouweM DouweM marked this pull request as draft April 10, 2026 15:12

Development

Successfully merging this pull request may close these issues.

Verification Loop capability (test-lint-build → fix → repeat)
