Conversation

jdecker76 (Contributor)

refs #289

Adds comprehensive MCP server polling and dynamic agent management

  • Fast Agent will no longer crash when it tries to load an agent whose dependent MCP server is not available
  • New polling system can poll MCP servers and activate/deactivate agents dynamically based on MCP server availability (polling is off by default)
  • Introduces a new FastAgent parameter, mcp_polling_interval, which defaults to None (i.e. polling is disabled by default)
  • Implement simple agent status management when polling is enabled:
    • Deactivate agents when their servers go offline during runtime
    • Reactivate agents when their servers come back online
  • Updated progress display:
    • Show the MCP Server Polling in the progress display
    • Show "Deactivated" status when polling is disabled
    • Show "Running" during active server health checks
    • Show "Ready" between cycles
  • Give users full control over the polling frequency vs. performance trade-off - they can choose whether to enable polling and pick an interval that works best for their situation

This provides comprehensive MCP server monitoring with automatic agent lifecycle
management while maintaining zero performance impact by default.
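
For illustration, here is a minimal sketch of how polling would be enabled from user code. It assumes the FastAgent constructor accepts the new mcp_polling_interval keyword described above; the import path and decorator usage follow fast-agent's usual pattern, and the agent/server names are placeholders.

```python
# Minimal sketch: enabling the new polling behaviour. Assumes the FastAgent
# constructor accepts the mcp_polling_interval keyword added by this PR.
import asyncio
from mcp_agent.core.fastagent import FastAgent

# Poll configured MCP servers every 30 seconds; the default (None) keeps
# polling disabled and preserves today's behaviour.
fast = FastAgent("polling-demo", mcp_polling_interval=30)

# "mcp_test" and "sse_server" are placeholder names; the server itself is
# defined in fastagent.config.yaml as usual.
@fast.agent(name="mcp_test", instruction="Use the SSE server's tools.", servers=["sse_server"])
async def main() -> None:
    async with fast.run() as agent:
        await agent("hello")

if __name__ == "__main__":
    asyncio.run(main())
```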

If this is considered for inclusion in Fast Agent, I will update the docs repo

don't register the same agent multiple times
Disable human input when not in interactive mode
and undo the input suppression in server mode
Added Visible INFO Logging for Agent Deactivation
Fixed Animated Dots for DEACTIVATED Status
Hide MCP Server Entries from Progress Display
Fixed Background Polling for Agent Reactivation
fix indentation issues
fix import - I think this is why the agent was not being reactivated once the MCP server came back online
dynamic handling is now working, and the progress display is nice
Adds comprehensive MCP server polling and dynamic agent management

- Change mcp_polling_interval default from 60s to None (no polling by default)
- Add proactive monitoring of ALL MCP servers during polling cycles
- Implement simple agent status management:
  * Deactivate agents when their servers go offline during runtime
  * Reactivate agents when their servers come back online
- Use direct server connectivity testing via temporary connections
- Add 3-retry logic with exponential backoff for tool calls
- Update progress display:
  * Show the MCP Server Polling in the progress display
  * Show "Deactivated" status when polling is disabled
  * Show "Running" during active server health checks
  * Show "Ready" between cycles with server status details
- Give users full control over polling frequency vs performance trade-offs - they can set an interval that works best for their situation

This provides comprehensive server monitoring with automatic agent lifecycle
management while maintaining zero performance impact by default.
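
As a rough illustration of the retry item above (not the PR's actual code), here is a standalone sketch of 3-attempt exponential backoff around an async tool call; the callable and the exception type are placeholders, not fast-agent internals.

```python
import asyncio


async def call_tool_with_retry(call_tool, *args, retries: int = 3, base_delay: float = 1.0):
    """Retry a failing async tool call, doubling the delay between attempts."""
    for attempt in range(retries):
        try:
            return await call_tool(*args)
        except ConnectionError:
            if attempt == retries - 1:
                raise  # last attempt failed: propagate the error
            # Exponential backoff: 1s, 2s, 4s with the defaults above
            await asyncio.sleep(base_delay * (2 ** attempt))
```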
Restore the cli handling that was accidentally deleted
Use mcp_polling_frequency as a switch to also enable/disable automatic disabling of agents at startup when the MCP servers they need are not available. This preserves the existing behavior, since mcp_polling_frequency defaults to None

The failing test should now pass
This should fix the broken tests
@jdecker76 (Contributor, Author) commented Jul 16, 2025

Here you can see that MCP Server Polling is in the Ready state, and the mcp_test agent is disabled (because the SSE MCP server is not available).
[screenshot]

When the mcp_polling_interval is reached, you can see that the MCP Server Polling status changes to Running.
[screenshot]

For this example, I started the SSE server. On the next polling interval, the MCP server was able to connect, and the mcp_test agent was reactivated (Loaded state).
[screenshot]

If mcp_polling_interval is not set (or is 0), then the polling system is disabled (this is the default). The progress display will show Deactivated as the status of MCP Server Polling.
[screenshot]

Likewise, if an agent is Loaded and an MCP server goes offline, the agent will be deactivated again on the next polling interval.

This change is 100% backwards compatible: the new mcp_polling_interval parameter defaults to None, and the existing behavior is unchanged. The user can choose to give mcp_polling_interval a value, after which the new polling and agent registration/deregistration takes effect. Additionally, when in polling mode, Fast Agent will not crash on startup if an agent has a broken MCP server connection.
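
To make the behaviour described above concrete, here is a simplified sketch of one polling cycle. The helper callables (check_server, activate_agent, deactivate_agent) and the agent attributes are hypothetical stand-ins, not fast-agent APIs.

```python
import asyncio


async def polling_loop(servers, agents, interval, check_server, activate_agent, deactivate_agent):
    """One background task: test every MCP server, then flip agent status accordingly."""
    if not interval:  # None or 0: polling disabled (the default)
        return
    while True:
        # "Running": test each configured server with a temporary connection
        status = {name: await check_server(name) for name in servers}
        for agent in agents:
            servers_ok = all(status[s] for s in agent.servers)
            if servers_ok and not agent.active:
                await activate_agent(agent)      # server is back: agent shows "Loaded"
            elif not servers_ok and agent.active:
                await deactivate_agent(agent)    # server is gone: agent shows "Deactivated"
        # "Ready": sleep until the next cycle
        await asyncio.sleep(interval)
```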

@evalstate (Owner)

Hi @jdecker76, this looks good at first pass, and I think it intersects with some improvements that are necessary for any production-level client dealing with Streamable HTTP.

Below is my list of connection "things" that I wanted to get covered somehow:

MCP Server Connection Handling.
 - Configure Client to Server ping interval and timeout handling
 - Show most recent inbound/outbound Server communication (MCP)
 - Show most recent inbound/outbound Server communication (Ping)

STDIO
 - Attempt restart on unexpected termination (one time only) 

Remote (SSE/SHTTP)
 - Identify/Display whether a Streamable HTTP Server is in "Server Push" mode
 - Identify/Display whether a remote Server has assigned a SessionID
 - Session resumption - configure whether 404 on reconnect is a "failure" or attempt new Session
 - Identify HTTP Connection health for "Server Push" connections

Some of this is driven from: https://huggingface.co/blog/building-hf-mcp
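
Purely to make that list concrete, here is one hypothetical way the options could be grouped; none of these fields exist in fast-agent today, this is only a sketch of how the requirements might be expressed per transport.

```python
from dataclasses import dataclass


@dataclass
class MCPConnectionOptions:
    ping_interval: float = 30.0      # client-to-server ping cadence (seconds)
    ping_timeout: float = 10.0       # treat a ping as failed after this many seconds
    restart_stdio_once: bool = True  # STDIO: attempt a single restart on unexpected termination
    resume_on_404: bool = False      # SHTTP: start a new session rather than fail on 404 at reconnect
    track_last_traffic: bool = True  # record most recent inbound/outbound MCP and ping messages
```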

Q - What does it mean for an agent to be "Deactivated"? It can't meaningfully participate in workflows, so without a queueing/resume system (perhaps A2A participation) I'm not sure what the desired behaviour is.

@jdecker76 (Contributor, Author)

Sorry for the late reply, I just got back from vacation.

To answer your question above, in the context of this PR a "Deactivated" agent is not available for use, but it is still registered with fast-agent. Once all of its MCP servers are available, it becomes active and available for use again.

I just added PR 342 for SSE reconnection as a less invasive alternative, but it is still not perfect (i.e. fast-agent still aborts at startup if an MCP server is not available, though it does allow an SSE MCP server to reconnect once fast-agent is up and running).

I'm open to suggestions for a more thorough solution, but I need something in place for my deployments. For example, without this PR (or PR 342), if I redeploy my MCP servers then I absolutely must restart my agents. That sounds trivial, but it involves some downtime, plus my deployments are serverless on AWS ECS/Fargate, which adds quite a bit of complexity to the situation. With one of these PRs, I can publish my MCP server changes and my agents reconnect gracefully. I think this is very important for production systems, whether it's one of these solutions or another solution that fits the project better.

When you get a chance, let's discuss this so I can better understand your vision in this area - I'm eager to use fast-agent in production (not a large project by any means, but with ~800 active users it could be a support nightmare for our small team if we don't solve these types of issues).

@evalstate (Owner)

Agreed, this is an important feature; quick question - is there a way to flag this on/off? IIRC the progress display always showed the new watchdog, is that right? And would you be able to help with the reqs/implementation on the MCP handling on the 0.3.0 branch? I think we should have good options there too.
