A Twilio ConversationRelay integration that uses the AG-UI protocol to communicate with backend AI systems. It lets you build voice assistants that work with any AG-UI-compatible backend rather than a single LLM provider.
Phone Call → Twilio → ConversationRelay → WebSocket → TwilioAgent → AG-UI Protocol → Backend Agent
The key components:
- TwilioAgent - Translates between Twilio's WebSocket protocol and AG-UI events
- HttpAgent - AG-UI client for connecting to HTTP/SSE backends
- AG-UI Protocol - Standardized event-based protocol for agent communication
- ✅ Real-time voice conversations via Twilio
- ✅ Streaming responses with proper token handling
- ✅ Interrupt handling (user can interrupt the assistant mid-response)
- ✅ Configurable conversation modes (stateful/stateless)
- ✅ Support for any AG-UI compatible backend
- ✅ Clean separation between voice interface and AI logic
- ✅ Runtime schema validation with AG-UI core
- Node.js 18+
- Twilio account with a phone number
- ngrok (for exposing local server to Twilio)
- AG-UI compatible backend server
- Clone and install dependencies:
  npm install
- Configure environment variables:
  cp .env.example .env
  # Edit .env with your values
- Start ngrok:
  ngrok http 8080
  Copy the HTTPS URL (e.g., https://abc123.ngrok.io)
- Update .env with your configuration:
  NGROK_URL=abc123.ngrok.io # Just the domain, no https://
  AGUI_BACKEND_URL=http://localhost:3000/chat # Your AG-UI backend
  # AGUI_API_KEY=your-api-key # Optional authorization
  # STATEFUL=false # Optional: send only current message instead of full history
- Start the server:
  npm start
- Configure Twilio:
  - Go to your Twilio phone number settings
  - Set the webhook URL to: https://YOUR_NGROK_URL/twiml
  - Set the HTTP method to: POST
- Test it:
  - Call your Twilio phone number
  - Start talking after the greeting!
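For orientation, the /twiml webhook typically answers with TwiML that connects the call to ConversationRelay over a WebSocket. The sketch below builds such a response. The `buildTwiml` name, the `/ws` WebSocket path, and the greeting text are assumptions for illustration and are not taken from this repository's server.js.

```javascript
// Hypothetical helper: builds the TwiML returned from the /twiml webhook.
// The "/ws" WebSocket path and the default greeting are illustrative assumptions.
function buildTwiml(ngrokDomain, greeting = "Hello! How can I help you today?") {
  // Escape characters that are not allowed inside XML attribute values.
  const escapeXml = (s) =>
    s
      .replace(/&/g, "&amp;")
      .replace(/</g, "&lt;")
      .replace(/>/g, "&gt;")
      .replace(/"/g, "&quot;");

  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    "<Response>",
    "  <Connect>",
    `    <ConversationRelay url="wss://${escapeXml(ngrokDomain)}/ws" welcomeGreeting="${escapeXml(greeting)}" />`,
    "  </Connect>",
    "</Response>",
  ].join("\n");
}

console.log(buildTwiml("abc123.ngrok.io"));
```

Note that the WebSocket URL uses the bare ngrok domain, which is why NGROK_URL is stored without the https:// prefix.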
├── server.js # Main server entry point
├── TwilioAgent.js # Protocol translation layer
├── package.json # Dependencies
├── .env.example # Environment variables template
└── README.md # This file
The system supports two conversation modes:
Stateful Mode (Default):
- Maintains full conversation history
- Sends complete message thread with each request
- Provides full context to the backend
- Higher bandwidth usage as conversation grows
Stateless Mode:
- Sends only the current user message
- Minimal bandwidth usage
- Backend must handle context externally
Set STATEFUL=false in your .env file to enable stateless mode.
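The difference between the two modes comes down to which messages are sent on each turn. A minimal sketch, assuming the call history is kept as an array of `{ role, content }` objects; the `selectMessages` helper is hypothetical and not part of TwilioAgent.js:

```javascript
// Hypothetical helper: picks which messages go to the backend on each turn.
// `history` is the full list of { role, content } messages for the call.
function selectMessages(history, stateful = true) {
  if (stateful) {
    // Stateful: send a copy of the whole thread, which grows with the call.
    return [...history];
  }
  // Stateless: send only the most recent user message, if there is one.
  const lastUser = [...history].reverse().find((m) => m.role === "user");
  return lastUser ? [lastUser] : [];
}

const history = [
  { role: "user", content: "What's the weather?" },
  { role: "assistant", content: "Sunny and mild." },
  { role: "user", content: "And tomorrow?" },
];

console.log(selectMessages(history, true).length);  // full thread
console.log(selectMessages(history, false));        // only "And tomorrow?"
```

In stateless mode the backend only sees "And tomorrow?", so it has to look up the earlier turns itself, for example by thread ID.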
Your backend must accept RunAgentInput requests and return AG-UI events via Server-Sent Events (SSE). The system uses the @ag-ui/client package's HttpAgent to communicate with your backend.
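To make the backend contract more concrete, here is a rough sketch of the SSE stream a backend might emit for one text reply. The `toSseFrame` and `replyEvents` helpers are hypothetical, and the RUN_STARTED / RUN_FINISHED events and the `messageId`, `threadId`, and `runId` fields are assumptions that should be checked against the schemas in @ag-ui/core.

```javascript
// Serialize one event as a Server-Sent Events frame: "data: <json>\n\n".
function toSseFrame(event) {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Hypothetical generator: the event sequence for a single text reply.
// Field names beyond `type` and `delta` are assumptions, not confirmed here.
function* replyEvents({ threadId, runId }, messageId, chunks) {
  yield { type: "RUN_STARTED", threadId, runId };
  yield { type: "TEXT_MESSAGE_START", messageId, role: "assistant" };
  for (const delta of chunks) {
    yield { type: "TEXT_MESSAGE_CONTENT", messageId, delta };
  }
  yield { type: "TEXT_MESSAGE_END", messageId };
  yield { type: "RUN_FINISHED", threadId, runId };
}

// Print the frames a backend would write to the HTTP response body.
for (const event of replyEvents({ threadId: "t1", runId: "r1" }, "m1", ["Hello ", "there."])) {
  process.stdout.write(toSseFrame(event));
}
```

A real backend would write these frames to a response with the `Content-Type: text/event-stream` header and flush after each one so the voice side can start speaking early.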
Twilio sends three types of messages:
- setup - Initialize the session with a call ID
- prompt - User's spoken text
- interrupt - User interrupted the assistant
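The three message types above can be handled with a simple dispatch on `type`. This is a sketch only: `handleTwilioMessage` and the `session` shape are hypothetical, and the `callSid` and `voicePrompt` field names are assumptions to confirm against Twilio's ConversationRelay documentation.

```javascript
// Hypothetical dispatcher for incoming ConversationRelay WebSocket messages.
// `session` is an assumed per-call object: { callSid, history, lastSpoken }.
function handleTwilioMessage(raw, session) {
  const msg = JSON.parse(raw);
  switch (msg.type) {
    case "setup":
      // Remember which call this WebSocket belongs to.
      session.callSid = msg.callSid;
      return "setup";
    case "prompt":
      // The caller's transcribed speech becomes a user message.
      session.history.push({ role: "user", content: msg.voicePrompt });
      return "prompt";
    case "interrupt":
      // Record how much of the reply was actually spoken.
      session.lastSpoken = msg.utteranceUntilInterrupt;
      return "interrupt";
    default:
      // Any other message type is ignored in this sketch.
      return "ignored";
  }
}

const session = { callSid: null, history: [], lastSpoken: null };
handleTwilioMessage(JSON.stringify({ type: "setup", callSid: "CA123" }), session);
handleTwilioMessage(JSON.stringify({ type: "prompt", voicePrompt: "Hi there" }), session);
console.log(session);
```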
Twilio expects responses in this format:
{
type: "text",
token: "Hello ", // Incremental text
last: false // true when message is complete
}
The AG-UI protocol uses events like:
- TEXT_MESSAGE_START - Beginning of assistant response
- TEXT_MESSAGE_CONTENT - Incremental content (delta)
- TEXT_MESSAGE_END - End of response
- TOOL_CALL_* - Tool/function calling events
- STATE_* - State management events
The TwilioAgent handles the translation:
// AG-UI Event → Twilio Token
TEXT_MESSAGE_CONTENT: { delta: "Hello " } → { type: "text", token: "Hello ", last: false }
TEXT_MESSAGE_END: {} → { type: "text", token: "", last: true }
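The mapping above can be written as a small pure function. This is a sketch of the idea rather than the code in TwilioAgent.js; the `toTwilioToken` name is hypothetical, and returning `null` for events that produce no speech is an assumed convention.

```javascript
// Maps one AG-UI event to a Twilio text token, or null when nothing is spoken.
function toTwilioToken(event) {
  switch (event.type) {
    case "TEXT_MESSAGE_CONTENT":
      // Each delta is forwarded as an incremental token.
      return { type: "text", token: event.delta, last: false };
    case "TEXT_MESSAGE_END":
      // An empty token with last: true closes the message.
      return { type: "text", token: "", last: true };
    default:
      // TEXT_MESSAGE_START, TOOL_CALL_*, STATE_* and others yield no token here.
      return null;
  }
}

const events = [
  { type: "TEXT_MESSAGE_START" },
  { type: "TEXT_MESSAGE_CONTENT", delta: "Hello " },
  { type: "TEXT_MESSAGE_CONTENT", delta: "world." },
  { type: "TEXT_MESSAGE_END" },
];

// Keep only the events that translate to a token.
const tokens = events.map(toTwilioToken).filter((t) => t !== null);
console.log(tokens);
```

Keeping the translation free of side effects makes it easy to test without a live WebSocket; the caller is then responsible for `ws.send(JSON.stringify(token))`.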
The system supports AG-UI tool calling events. You could announce tool usage to the user:
case "TOOL_CALL_START":
ws.send(JSON.stringify({
type: "text",
token: `Let me check that for you... `,
last: false
}));
break;
The AG-UI protocol supports state snapshots and deltas, allowing you to maintain conversation context across calls or implement more complex conversation flows.
You can handle custom AG-UI events to implement domain-specific features.
npm run dev # Uses node --watch for auto-reload
Set LOG_LEVEL=debug in your .env file for detailed logging.
Key configuration options in .env:
- NGROK_URL - Your ngrok domain (without https://)
- AGUI_BACKEND_URL - Your AG-UI backend endpoint
- AGUI_API_KEY - Optional authorization token
- STATEFUL - Conversation mode (true/false)
- LOG_LEVEL - Logging verbosity (info/debug)
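One way to read these options at startup is sketched below. The variable names match the options listed above; the `loadConfig` helper, the required/optional split, the scheme stripping, and the fallback values are assumptions rather than the behavior of server.js.

```javascript
// Hypothetical config loader. Pass an object for testing; defaults to process.env.
function loadConfig(env = process.env) {
  if (!env.NGROK_URL) throw new Error("NGROK_URL is required");
  if (!env.AGUI_BACKEND_URL) throw new Error("AGUI_BACKEND_URL is required");
  return {
    // Strip an accidental scheme or trailing slash so the value is a bare domain.
    ngrokDomain: env.NGROK_URL.replace(/^https?:\/\//, "").replace(/\/+$/, ""),
    backendUrl: env.AGUI_BACKEND_URL,
    apiKey: env.AGUI_API_KEY || null,
    // Stateful is the default; only the literal string "false" turns it off.
    stateful: env.STATEFUL !== "false",
    logLevel: env.LOG_LEVEL || "info",
  };
}

console.log(
  loadConfig({
    NGROK_URL: "https://abc123.ngrok.io/",
    AGUI_BACKEND_URL: "http://localhost:3000/chat",
    STATEFUL: "false",
  })
);
```

Failing fast on a missing NGROK_URL or AGUI_BACKEND_URL surfaces misconfiguration at startup instead of on the first phone call.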
- Check that ngrok is running and the URL is correct in .env
- Ensure the server is running
- Check your AG-UI backend is running and accessible
- Verify AGUI_BACKEND_URL is correct
- Look at server logs for errors
- Test your backend endpoint directly
- This is normal if the assistant is speaking very quickly
- The system tracks what was actually spoken via utteranceUntilInterrupt
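As a rough illustration of that tracking, an interrupt handler could trim the last assistant message in the history down to the part the caller actually heard. The `applyInterrupt` helper and the `{ role, content }` history shape are hypothetical; this is not necessarily how TwilioAgent.js does it.

```javascript
// Hypothetical helper: returns a new history in which the last assistant
// message is cut down to the text that was spoken before the interruption.
function applyInterrupt(history, utteranceUntilInterrupt) {
  const idx = history.map((m) => m.role).lastIndexOf("assistant");
  if (idx === -1) return history; // nothing to trim

  const full = history[idx].content;
  const at = full.indexOf(utteranceUntilInterrupt);
  // If the spoken text is found in the reply, keep everything up to its end;
  // otherwise fall back to the spoken text as reported.
  const spoken =
    at === -1 ? utteranceUntilInterrupt : full.slice(0, at + utteranceUntilInterrupt.length);

  return history.map((m, i) => (i === idx ? { ...m, content: spoken } : m));
}

const before = [
  { role: "user", content: "Tell me about your plans." },
  { role: "assistant", content: "We offer three plans: basic, plus, and premium." },
];
console.log(applyInterrupt(before, "We offer three plans:"));
```

Trimming the history this way keeps the backend's view of the conversation consistent with what the caller heard, which matters most in stateful mode.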
MIT
Contributions welcome! This is a reference implementation showing how to integrate Twilio with the AG-UI protocol. Feel free to:
- Add support for more AG-UI events
- Implement additional backends
- Improve error handling
- Add tests
- Built on Fastify 5.x with WebSocket support
- Uses @ag-ui/client for backend communication
- Runtime schema validation with @ag-ui/core
- Each phone call gets an isolated agent instance
- Proper interrupt handling with abort controller management
- Real-time streaming responses with conversation tracking
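The abort controller management mentioned above can be pictured as one controller per in-flight run, replaced on every new prompt and aborted on interrupt. A minimal sketch using the standard `AbortController` available in Node.js 18+; the `CallSession` class is hypothetical and not the repository's implementation.

```javascript
// Hypothetical per-call session that cancels the in-flight run on interrupt.
class CallSession {
  constructor() {
    this.controller = null;
  }

  // Start a new run, aborting any run that is still in flight.
  beginRun() {
    this.abort();
    this.controller = new AbortController();
    return this.controller.signal;
  }

  // Cancel the current run, if any (called on "interrupt" or hang-up).
  abort() {
    if (this.controller) {
      this.controller.abort();
      this.controller = null;
    }
  }
}

const call = new CallSession();
const firstRun = call.beginRun();
const secondRun = call.beginRun(); // a new prompt arrives mid-response
console.log(firstRun.aborted, secondRun.aborted);
```

The returned signal can be checked inside the streaming loop (`signal.aborted`) or passed to an API that accepts an `AbortSignal`, so that tokens from a cancelled run are never sent to Twilio.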