Conversation

jdecker76 (Contributor)

refs #289

Adds comprehensive MCP server polling and dynamic agent management

  • Fast Agent will no longer crash when it tries to load an agent whose dependent MCP server is not available
  • New polling system can poll MCP servers and activate/deactivate agents dynamically based on MCP server availability (polling is off by default)
  • Introduces a new FastAgent parameter, mcp_polling_interval, which defaults to None (i.e. polling is disabled by default)
  • Implement simple agent status management when polling is enabled:
    • Deactivate agents when their servers go offline during runtime
    • Reactivate agents when their servers come back online
  • Updated progress display:
    • Show the MCP Server Polling in the progress display
    • Show "Deactivated" status when polling is disabled
    • Show "Running" during active server health checks
    • Show "Ready" between cycles
  • Give users full control over the polling frequency vs. performance trade-off - they can choose whether to enable polling and pick an interval that works best for their situation

This provides comprehensive MCP server monitoring with automatic agent lifecycle
management while maintaining zero performance impact by default.
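
For illustration, here is a minimal sketch of how polling would be enabled from user code. It assumes the FastAgent constructor accepts the new mcp_polling_interval keyword described above; the import path and decorator usage follow fast-agent's usual pattern, and the agent/server names are placeholders.

```python
# Minimal sketch: enabling the new polling behaviour. Assumes the FastAgent
# constructor accepts the mcp_polling_interval keyword added by this PR.
import asyncio
from mcp_agent.core.fastagent import FastAgent

# Poll configured MCP servers every 30 seconds; the default (None) keeps
# polling disabled and preserves today's behaviour.
fast = FastAgent("polling-demo", mcp_polling_interval=30)

# "mcp_test" and "sse_server" are placeholder names; the server itself is
# defined in fastagent.config.yaml as usual.
@fast.agent(name="mcp_test", instruction="Use the SSE server's tools.", servers=["sse_server"])
async def main() -> None:
    async with fast.run() as agent:
        await agent("hello")

if __name__ == "__main__":
    asyncio.run(main())
```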

If this is considered for inclusion in Fast Agent, I will update the docs repo

don't register the same agent multiple times
Disable human input when not in interactive mode
and undo the input suppression in server mode
Added Visible INFO Logging for Agent Deactivation
Fixed Animated Dots for DEACTIVATED Status
Hide MCP Server Entries from Progress Display
Fixed Background Polling for Agent Reactivation
fix indentation issues
fix import - I think this is why the agent was not being reactivated once the MCP server came back online
dynamic handling is now working, and the progress display is nice
Adds comprehensive MCP server polling and dynamic agent management

- Change mcp_polling_interval default from 60s to None (no polling by default)
- Add proactive monitoring of ALL MCP servers during polling cycles
- Implement simple agent status management:
  * Deactivate agents when their servers go offline during runtime
  * Reactivate agents when their servers come back online
- Use direct server connectivity testing via temporary connections
- Add 3-retry logic with exponential backoff for tool calls
- Update progress display:
  * Show the MCP Server Polling in the progress display
  * Show "Deactivated" status when polling is disabled
  * Show "Running" during active server health checks
  * Show "Ready" between cycles with server status details
- Give users full control over polling frequency vs performance trade-offs - they can set an interval that works best for their situation

This provides comprehensive server monitoring with automatic agent lifecycle
management while maintaining zero performance impact by default.
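
As a rough illustration of the retry item above (not the PR's actual code), here is a standalone sketch of 3-attempt exponential backoff around an async tool call; the callable and the exception type are placeholders, not fast-agent internals.

```python
import asyncio


async def call_tool_with_retry(call_tool, *args, retries: int = 3, base_delay: float = 1.0):
    """Retry a failing async tool call, doubling the delay between attempts."""
    for attempt in range(retries):
        try:
            return await call_tool(*args)
        except ConnectionError:
            if attempt == retries - 1:
                raise  # last attempt failed: propagate the error
            # Exponential backoff: 1s, 2s, 4s with the defaults above
            await asyncio.sleep(base_delay * (2 ** attempt))
```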
Restore the cli handling that was accidentally deleted
Use mcp_polling_frequency as a switch to also enable/disable automatic disabling of agents at startup when the MCP servers they need are not available. This preserves the existing behavior, since mcp_polling_frequency defaults to None

The failing test should now pass
This should fix the broken tests
@jdecker76 (Contributor, Author) commented Jul 16, 2025

Here you can see that MCP Server Polling is in the Ready state, and the mcp_test agent is disabled (because the SSE MCP server is not available).
[screenshot]

When the mcp_polling_interval is reached, you can see that the MCP Server Polling status changes to Running.
[screenshot]

For this example, I started the SSE server. On the next polling interval, the MCP server was able to connect, and the mcp_test agent was reactivated (Loaded state).
[screenshot]

If mcp_polling_interval is not set (or is 0), then the polling system is disabled (this is the default). The progress display will show Deactivated as the status of MCP Server Polling.
[screenshot]

Likewise, if an agent is Loaded and an MCP server goes offline, the agent will be deactivated again on the next polling interval.

This change is 100% backwards compatible: the new mcp_polling_interval parameter defaults to None, and the existing behavior is unchanged. The user can choose to give mcp_polling_interval a value, after which the new polling and agent registration/deregistration takes effect. Additionally, when in polling mode, Fast Agent will not crash on startup if an agent has a broken MCP server connection.
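
To make the behaviour described above concrete, here is a simplified sketch of one polling cycle. The helper callables (check_server, activate_agent, deactivate_agent) and the agent attributes are hypothetical stand-ins, not fast-agent APIs.

```python
import asyncio


async def polling_loop(servers, agents, interval, check_server, activate_agent, deactivate_agent):
    """One background task: test every MCP server, then flip agent status accordingly."""
    if not interval:  # None or 0: polling disabled (the default)
        return
    while True:
        # "Running": test each configured server with a temporary connection
        status = {name: await check_server(name) for name in servers}
        for agent in agents:
            servers_ok = all(status[s] for s in agent.servers)
            if servers_ok and not agent.active:
                await activate_agent(agent)      # server is back: agent shows "Loaded"
            elif not servers_ok and agent.active:
                await deactivate_agent(agent)    # server is gone: agent shows "Deactivated"
        # "Ready": sleep until the next cycle
        await asyncio.sleep(interval)
```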

@evalstate (Owner)

Hi @jdecker76, this looks good at first pass, and I think it intersects with some improvements that are necessary for any production-level client dealing with Streamable HTTP.

Below is my list of connection "things" that I wanted to get covered somehow:

MCP Server Connection Handling.
 - Configure Client to Server ping interval and timeout handling
 - Show most recent inbound/outbound Server communication (MCP)
 - Show most recent inbound/outbound Server communication (Ping)

STDIO
 - Attempt restart on unexpected termination (one time only) 

Remote (SSE/SHTTP)
 - Identify/Display whether a Streamable HTTP Server is in "Server Push" mode
 - Identify/Display whether a remote Server has assigned a SessionID
 - Session resumption - configure whether 404 on reconnect is a "failure" or attempt new Session
 - Identify HTTP Connection health for "Server Push" connections

Some of this is driven from: https://huggingface.co/blog/building-hf-mcp
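
Purely to make that list concrete, here is one hypothetical way the options could be grouped; none of these fields exist in fast-agent today, this is only a sketch of how the requirements might be expressed per transport.

```python
from dataclasses import dataclass


@dataclass
class MCPConnectionOptions:
    ping_interval: float = 30.0      # client-to-server ping cadence (seconds)
    ping_timeout: float = 10.0       # treat a ping as failed after this many seconds
    restart_stdio_once: bool = True  # STDIO: attempt a single restart on unexpected termination
    resume_on_404: bool = False      # SHTTP: start a new session rather than fail on 404 at reconnect
    track_last_traffic: bool = True  # record most recent inbound/outbound MCP and ping messages
```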

Q - What does it mean for an agent to be "Deactivated"? It can't meaningfully participate in workflows, so without a queueing/resume system (perhaps A2A participation) I'm not sure what the desired behaviour is.

@jdecker76 (Contributor, Author)

Sorry for the late reply, I just got back from vacation.

To answer your question above, in the context of this PR a "Deactivated" agent is not available for use, but it is still registered with fast-agent. Once all of its MCP servers are available, it becomes active and available for use again.

I just added PR 342 for SSE reconnection as a less invasive alternative, but it is still not perfect (i.e. fast-agent still aborts at startup if an MCP server is not available, though it does allow an SSE MCP server to reconnect once fast-agent is up and running).

I'm open to suggestions for a more thorough solution, but I need something in place for my deployments. For example, without this PR (or PR 342), if I redeploy my MCP servers then I absolutely must restart my agents. That sounds trivial, but it involves some downtime, plus my deployments are serverless on AWS ECS/Fargate, which adds quite a bit of complexity to the situation. With one of these PRs, I can publish my MCP server changes and my agents reconnect gracefully. I think this is very important for production systems, whether it's one of these solutions or another solution that fits the project better.

When you get a chance, let's discuss this so I can better understand your vision in this area - I'm eager to use fast-agent in production (not a large project by any means, but with ~800 active users it could be a support nightmare for our small team if we don't solve these types of issues).

@evalstate (Owner)

Agreed, this is an important feature; quick question - is there a way to flag this on/off? IIRC the progress display always showed the new watchdog, is that right? And would you be able to help with the reqs/implementation on the MCP handling on the 0.3.0 branch? I think we should have good options there too.
