
feat(vmcp): add optional backend keepalive #4054

Draft
yrobla wants to merge 1 commit into main from issue-3870

Conversation

@yrobla (Contributor) commented Mar 9, 2026

Closes: #3870

Summary

Introduce per-backend keepalive probing with a circuit breaker to detect
and survive backend connection failures without terminating the vMCP session.

Fixes #3870

Type of change

  • Bug fix
  • New feature
  • Refactoring (no behavior change)
  • Dependency update
  • Documentation
  • Other (describe):

Test plan

  • Unit tests (`task test`)
  • E2E tests (`task test-e2e`)
  • Linting (`task lint-fix`)
  • Manual testing (describe below)

Changes

  • `pkg/vmcp/types.go`: adds the `KeepaliveMethod` type (`ping`, `tool`, `none`) and a `KeepaliveToolName` field to `BackendTarget`.
  • `pkg/vmcp/session/keepalive.go`: new `KeepaliveManager` that runs one goroutine per backend with a configurable interval and jitter, a circuit breaker (disables probing after N failures and re-probes after a quiet window), and OTel metrics (`vmcp_keepalive_attempt_total`, `vmcp_keepalive_success_total`, `vmcp_keepalive_failure_total`, `vmcp_keepalive_latency_seconds`, `vmcp_keepalive_auto_disabled_total`).
  • `pkg/vmcp/session/factory.go`: wires `KeepaliveManager` into session creation through the `WithKeepaliveConfig` and `WithMeterProvider` options. Keepalive starts automatically whenever the factory is used, and the factory itself is only constructed when `SessionManagementV2` is enabled.
  • `pkg/vmcp/session/default_session.go`: `Close()` stops the keepalive goroutines before closing backend connections.
  • `pkg/vmcp/session/internal/backend/session.go` and `mcp_session.go`: add `Ping(ctx)` to the `Session` interface, delegating to `client.Ping`.
  • `pkg/vmcp/session/keepalive_test.go`: 11 new unit tests covering probe dispatch, the circuit breaker lifecycle, and manager start/stop.

Does this introduce a user-facing change?

Yes. When `SessionManagementV2` is enabled, the vMCP server now sends periodic keepalive probes to each connected backend. This is transparent to clients but affects backend traffic patterns and exposes new OTel metrics (`vmcp_keepalive_*`). The feature is inactive when `SessionManagementV2` is disabled.

@github-actions github-actions bot added the size/L Large PR: 600-999 lines changed label Mar 9, 2026
@yrobla yrobla requested a review from Copilot March 9, 2026 16:27

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e38d4007e4



Copilot AI left a comment


Pull request overview

Adds a vMCP session-scoped keepalive subsystem intended to probe backend connections (ping or tool-based) with circuit-breaker behavior and OTel metrics, wiring it into MultiSession creation and teardown.

Changes:

  • Extends backend targeting/types with keepalive method/tool configuration.
  • Introduces a per-backend keepalive manager (goroutines, jittered interval, circuit breaker) plus OTel metrics.
  • Wires keepalive into session factory creation and ensures it is stopped during session close; adds Ping(ctx) to backend session interface and unit tests.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 7 comments.

| File | Description |
| --- | --- |
| pkg/vmcp/types.go | Adds KeepaliveMethod enum and per-backend keepalive fields on BackendTarget. |
| pkg/vmcp/session/keepalive.go | Implements keepalive probing, jitter scheduling, circuit breaker state, and OTel metrics; adds KeepaliveManager. |
| pkg/vmcp/session/keepalive_test.go | Adds unit tests for probe dispatch, circuit breaker behavior, and manager start/stop. |
| pkg/vmcp/session/internal/backend/session.go | Extends backend Session interface with Ping(ctx) for keepalive. |
| pkg/vmcp/session/internal/backend/mcp_session.go | Implements Ping(ctx) by delegating to the MCP client. |
| pkg/vmcp/session/factory.go | Adds factory options for keepalive config/metrics and starts keepalive manager during session creation. |
| pkg/vmcp/session/default_session.go | Stops keepalive manager during Close() before closing backend connections. |
| pkg/vmcp/session/default_session_test.go | Updates mocks to satisfy the new Ping(ctx) interface requirement. |


codecov bot commented Mar 9, 2026

Codecov Report

❌ Patch coverage is 78.49462% with 40 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.52%. Comparing base (a2824e0) to head (e54d978).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| pkg/vmcp/session/keepalive.go | 84.37% | 18 Missing and 7 partials ⚠️ |
| pkg/vmcp/session/factory.go | 39.13% | 13 Missing and 1 partial ⚠️ |
| pkg/vmcp/session/internal/backend/mcp_session.go | 0.00% | 1 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4054      +/-   ##
==========================================
- Coverage   68.56%   68.52%   -0.04%     
==========================================
  Files         446      447       +1     
  Lines       45574    45760     +186     
==========================================
+ Hits        31246    31359     +113     
- Misses      11914    11979      +65     
- Partials     2414     2422       +8     


@amirejaz (Contributor)

@yrobla LGTM. Once the Copilot comments are resolved, I am happy to approve.

@yrobla (Contributor, Author) commented Mar 10, 2026

Same here. Setting this as a draft for now; let's prioritize launching the code and testing it.

@yrobla yrobla marked this pull request as draft March 10, 2026 15:34
Introduce per-backend keepalive probing with a circuit breaker
to detect and survive backend connection failures without terminating
the vMCP session.

Closes: #3870

Labels

size/L Large PR: 600-999 lines changed


Development

Successfully merging this pull request may close these issues.

[vMCP] Implement optional backend session keepalive

4 participants