9 changes: 9 additions & 0 deletions docs/additional-features/fastapi-integration.mdx
@@ -42,6 +42,8 @@ Optionally, you can specify following parameters:
- enable_agui (default: `False`) - Enable AG-UI protocol compatibility for streaming endpoints
- enable_logging (default: `False`) - Enable request tracking and expose `/get_logs` endpoint
- logs_dir (default: `"activity-logs"`) - Directory for log files when logging is enabled
- enable_realtime (default: `False`) - Mount `/your_agency/realtime` websocket routes in addition to the REST endpoints
- realtime_options (default: `{}`) - Optional overrides for the realtime bridge (`model`, `turn_detection`, `input_audio_format`, etc.)

This will create 4 endpoints for the agency:
- `/test_agency/get_response`
@@ -112,6 +114,8 @@ run_fastapi(
"test_agency_2": create_agency_2,
},
tools=[example_tool, test_tool],
enable_realtime=True,
realtime_options={"model": "gpt-realtime-mini"},
)
```

@@ -126,6 +130,7 @@ This will create the following endpoints:
- `/test_agency_2/get_metadata`
- `/tool/ExampleTool` (for BaseTools) or `/tool/example_tool` (for function tools)
- `/tool/TestTool` (for BaseTools) or `/tool/test_tool` (for function tools)
- `/test_agency_1/realtime` and `/test_agency_2/realtime` websocket routes when `enable_realtime=True`

If `enable_logging=True`, a `/get_logs` endpoint is also added.

@@ -185,6 +190,10 @@ print("Response:", tool_response.json())
When `enable_logging=True`:
- `/get_logs` (POST)

- **Realtime Websocket Endpoint:**
When `enable_realtime=True`:
- `/your_agency_name/realtime` (WebSocket)

- **AG-UI Protocol:**
When `enable_agui=True`, only the streaming endpoint is exposed and follows the AG-UI protocol for enhanced frontend integration. The cancel endpoint is not registered in AG-UI mode.
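Clients reach the realtime route over WebSocket, so the scheme changes from `http`/`https` to `ws`/`wss`. The sketch below derives that URL from a server's base URL. `realtime_url` is a hypothetical helper, not part of Agency Swarm, and it assumes the route sits directly under the agency name as listed above; how the app token is supplied on the websocket is not covered here.

```python
from urllib.parse import urlsplit, urlunsplit


def realtime_url(base_url: str, agency_name: str) -> str:
    """Hypothetical helper: build the websocket URL for `/<agency_name>/realtime`."""
    parts = urlsplit(base_url)
    # http maps to ws, https maps to wss; any other scheme is left unchanged.
    scheme = {"http": "ws", "https": "wss"}.get(parts.scheme, parts.scheme)
    # Keep any path prefix (e.g. a reverse-proxy mount) and append the realtime route.
    path = f"{parts.path.rstrip('/')}/{agency_name}/realtime"
    return urlunsplit((scheme, parts.netloc, path, "", ""))


print(realtime_url("http://localhost:8000", "test_agency_1"))
# ws://localhost:8000/test_agency_1/realtime
```

Pass the resulting URL to whichever WebSocket client your frontend or test harness uses.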

91 changes: 91 additions & 0 deletions docs/additional-features/voice-agents/deployment.mdx
@@ -0,0 +1,91 @@
---
title: "Deployment"
description: "Run the realtime FastAPI bridge, serve the bundled web client, and connect Twilio phone calls."
icon: "server"
---

Use this guide once your agents are ready and you want to host a realtime bridge or connect phone infrastructure. It builds on the [Overview](./overview).

## Host the FastAPI bridge

`run_realtime` starts a FastAPI app that proxies between your Agency Swarm agents and the OpenAI Realtime API. The helper already converts your agency to the realtime runtime, exposes a `/realtime` websocket, and streams events back to callers.

```python
# app.py
from agency_swarm import Agency, run_realtime
from voice_agent import voice_agent  # the agent from the Overview guide, saved as voice_agent.py

agency = Agency(voice_agent)

run_realtime(
    agency=agency,
    model="gpt-realtime",
    host="0.0.0.0",
    port=8000,
    turn_detection={"type": "server_vad"},
)
```

```bash
python app.py
```

The server prints every incoming websocket connection. When your agents declare `voice=`, the bridge applies that voice automatically; omit the `voice` parameter for the entry agent so it keeps its own voice. Pass `cors_origins` when a browser client served from a different origin connects to the server.

<Tip>
`run_realtime(..., return_app=True)` returns the FastAPI `app` object if you want to mount it inside an existing application rather than start a dedicated Uvicorn process.
</Tip>

## Serve voice endpoints from `run_fastapi`

Keep your existing REST endpoints and add realtime voice routes with one flag:

```python
from agency_swarm import run_fastapi

run_fastapi(
    agencies={"support": create_agency},  # create_agency: your factory that returns an Agency
    enable_realtime=True,
    realtime_options={
        "model": "gpt-realtime",
        "turn_detection": {"type": "server_vad", "interrupt_response": True},
    },
    enable_logging=True,
)
```

`enable_realtime=True` mounts `/support/realtime` alongside the normal JSON endpoints. Pass `realtime_options` when you want to override model settings (they map directly to `run_realtime` keyword arguments). Authentication and logging apply to the new websocket route automatically.

## Serve the packaged browser client

The static site in `src/agency_swarm/ui/demos/realtime/app` ships with the library. Customize the agents it talks to by editing `examples/interactive/realtime/demo.py`, host the static files yourself, or launch the bundled server directly:

```bash
python -m agency_swarm.ui.demos.realtime.app.server
```

This serves the frontend and the websocket bridge from the same process, which is ideal for internal demos or QA.

## Twilio phone calls

Pass a Twilio number to `run_realtime` to enable a media-stream bridge. The helper then adds two routes: `/incoming-call`, which returns TwiML, and `/twilio/media-stream`, which carries bidirectional audio.

```python
run_realtime(
    agency=agency,
    model="gpt-realtime",
    twilio_number="+15551234567",
    twilio_audio_format="g711_ulaw",
    twilio_greeting="Connecting you to the assistant.",
)
```

Deployment checklist:

1. Start the server (with the required extras installed) and expose it publicly, for example with `ngrok http 8000`.
2. In the Twilio Console, set your phone number’s voice webhook to `https://<public-host>/incoming-call`.
3. Call the number. The helper streams audio in both directions and reuses your existing tools and handoffs.

For a lower-level implementation (custom playback tracking, fine-grained buffering), see `src/agency_swarm/ui/demos/realtime/twilio/README.md`.

<Note>
Store your Twilio account SID and auth token in a local `.env` file alongside `OPENAI_API_KEY`, and export them before launching the server. The packaged server reads standard environment variables, so no credentials need to live in source control.
</Note>
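A fail-fast check at the top of your launch script makes a missing credential obvious before any call comes in. The variable names here are assumptions: `OPENAI_API_KEY` is OpenAI's standard name, while `TWILIO_ACCOUNT_SID` and `TWILIO_AUTH_TOKEN` are the names commonly used with Twilio's SDKs and may not match what the packaged server reads. `missing_env` is a hypothetical helper.

```python
import os
from collections.abc import Mapping

# Assumed names; adjust to whatever your server actually reads.
REQUIRED_ENV = ("OPENAI_API_KEY", "TWILIO_ACCOUNT_SID", "TWILIO_AUTH_TOKEN")


def missing_env(
    names: tuple[str, ...] = REQUIRED_ENV, env: Mapping[str, str] = os.environ
) -> list[str]:
    """Return the names that are unset or empty in `env`."""
    return [name for name in names if not env.get(name)]
```

If `missing_env()` returns a non-empty list, raise `SystemExit` with those names instead of starting the server.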
84 changes: 84 additions & 0 deletions docs/additional-features/voice-agents/overview.mdx
@@ -0,0 +1,84 @@
---
title: "Overview"
description: "Design voice-first assistants with the same Agency Swarm agents you already use."
icon: "microphone"
---

Agency Swarm reuses your existing `Agent` definitions for voice. This page shows how to adapt agents for spoken conversations; deployment is covered in the [Deployment](./deployment) guide.

## What you can build

- **Phone receptionist**: answers calls, routes to specialists, captures caller details.
- **Live support triage**: gathers context, lets callers interrupt, and escalates to a human or another agent.
- **Language coach**: listens, corrects pronunciation, and keeps the dialogue short and encouraging.

## Prerequisites

- Access to OpenAI Realtime models (`gpt-realtime` or `gpt-realtime-mini` are the recommended latest options).
- `agency-swarm` with FastAPI extras:

```bash
pip install "agency-swarm[fastapi]"
```

## Define your agent (same API)

Keep using the standard `Agent` class: voice agents are regular agents with the same tools, handoffs, and instructions.

```python
from agency_swarm import Agent, function_tool

@function_tool
def lookup_order(order_id: str) -> str:
    """Return a short order status by ID."""
    return f"Order {order_id} has shipped and will arrive soon."

voice_agent = Agent(
    name="Voice Concierge",
    instructions=(
        "You are a friendly concierge. Answer in one or two sentences and offer to look up order "
        "details when the caller mentions a number."
    ),
    tools=[lookup_order],
    voice="nova",
)
```

<Tip>
The `voice=` argument is optional; add it only when you want to control the spoken persona.
</Tip>

Set `voice` to any of the OpenAI realtime voices: `alloy`, `ash`, `coral`, `echo`, `fable`, `onyx`, `nova`, `sage`, or `shimmer`. Each agent can declare its own voice, and the realtime bridge keeps it consistent across handoffs. If you prefer variety without manual assignments, construct your agency with `randomize_agent_voices=True`: any agent without an explicit voice receives a random pick at initialization. Also pass `voice_random_seed` to make that assignment reproducible across runs.

## Add handoffs (optional)

Handoffs work exactly as they do in text mode. Register your flows once and they will carry over to voice sessions.

```python
from agency_swarm import Agency, Agent
from agency_swarm.tools import SendMessageHandoff

billing = Agent(name="Billing", instructions="Handle billing questions briefly.")
faq = Agent(name="FAQ", instructions="Answer frequently asked questions.")

concierge = Agent(
    name="Concierge",
    instructions="Greet the caller, collect intent, then hand off when a specialist is needed.",
)

agency = Agency(
    concierge,
    communication_flows=[
        (concierge > billing, SendMessageHandoff),
        (concierge > faq, SendMessageHandoff),
    ],
)
```

When the concierge invokes `SendMessageHandoff`, the realtime session routes audio and tool access to the designated specialist agent.

## Next steps

- Try the [realtime browser demo](https://github.com/VRSEN/agency-swarm/tree/main/examples/interactive/realtime).
- [Deploy your agents](./deployment) as a realtime server or for phone calls using Twilio.
- Review the OpenAI Agents SDK realtime [Quickstart](https://openai.github.io/openai-agents-python/realtime/quickstart/) and [Guide](https://openai.github.io/openai-agents-python/realtime/guide/) for protocol details; Agency Swarm builds on those primitives.
8 changes: 8 additions & 0 deletions docs/docs.json
@@ -97,6 +97,14 @@
"additional-features/few-shot-examples",
"additional-features/guardrails",
"additional-features/streaming",
{
"group": "Voice Agents",
"icon": "microphone",
"pages": [
"additional-features/voice-agents/overview",
"additional-features/voice-agents/deployment"
]
},
"additional-features/fastapi-integration",
"additional-features/mcp-tools-server",
{
11 changes: 11 additions & 0 deletions docs/references/api.mdx
@@ -23,6 +23,8 @@ class Agency:
load_threads_callback: ThreadLoadCallback | None = None,
save_threads_callback: ThreadSaveCallback | None = None,
user_context: dict[str, Any] | None = None,
randomize_agent_voices: bool = False,
voice_random_seed: int | None = None,
**kwargs: Any,
):
"""
@@ -38,6 +40,8 @@ class Agency:
load_threads_callback: Callable used to load persisted conversation threads
save_threads_callback: Callable used to save conversation threads
user_context: Initial shared context accessible to all agents
randomize_agent_voices: Assign random voices (from the realtime voice list) to agents that do not set `voice`
voice_random_seed: Optional seed used to make voice randomization deterministic
**kwargs: Captures deprecated parameters, emitting warnings when used
"""
```
@@ -166,6 +170,8 @@ def run_fastapi(
app_token_env: str = "APP_TOKEN",
cors_origins: list[str] | None = None,
enable_agui: bool = False,
enable_realtime: bool = False,
realtime_options: dict[str, Any] | None = None,
):
"""
Serve this agency via the FastAPI integration.
@@ -176,6 +182,8 @@ def run_fastapi(
app_token_env: Environment variable name for authentication token
cors_origins: List of allowed CORS origins
enable_agui: Enable AG-UI protocol compatibility for streaming endpoints
enable_realtime: Mount `/agency_name/realtime` websocket routes in addition to REST endpoints
realtime_options: Optional overrides for realtime sessions (mirrors :func:`integrations.run_realtime`)
"""
```

@@ -314,6 +322,8 @@ class Agent(BaseAgent[MasterContext]):
tool_use_behavior ("run_llm_again" | "stop_on_first_tool" | StopAtTools | dict[str, Any] | Callable):
Tool execution policy passed through to the agents SDK
reset_tool_choice (bool | None): Whether to reset tool choice after tool calls
voice (Literal["alloy", "ash", "coral", "echo", "fable", "onyx", "nova", "sage", "shimmer"] | None):
Optional realtime voice name for audio output
"""
```

@@ -330,6 +340,7 @@ class Agent(BaseAgent[MasterContext]):
- **`throw_input_guardrail_error`** (bool): Controls input guardrail mode—False for friendly (guidance as assistant), True for strict (raises exceptions)
- **`handoff_reminder`** (str | None): Custom reminder appended to handoff prompts
- **`tool_concurrency_manager`** (ToolConcurrencyManager): Coordinates concurrent tool execution
- **`voice`** (str | None): Preferred realtime voice for the agent

### Core Execution Methods

1 change: 1 addition & 0 deletions examples/README.md
@@ -27,6 +27,7 @@ This directory contains runnable examples demonstrating key features of Agency S
- **`fastapi_integration/`** – FastAPI server and client examples
- `server.py` – FastAPI server with streaming support
- `client.py` – Client examples for testing endpoints
- **`interactive/realtime/demo.py`** – Launch the packaged realtime voice/web demo (edit to customize agents)
- **`mcp_servers.py`** – Using tools from MCP servers (local and hosted)
- **`connectors.py`** – Google Calendar integration using OpenAI hosted tools

1 change: 1 addition & 0 deletions examples/interactive/realtime/__init__.py
@@ -0,0 +1 @@
"""Package marker for interactive realtime demo examples."""
51 changes: 51 additions & 0 deletions examples/interactive/realtime/demo.py
@@ -0,0 +1,51 @@
"""
Interactive realtime voice demo.

Launches the packaged browser frontend + FastAPI backend.
Edit this file to customize the agent behavior.
"""

import sys
from pathlib import Path

# Ensure local src/ is importable when running directly from the repo checkout.
# parents[0] is realtime/, so parents[3] is the repository root that contains src/.
sys.path.insert(0, str(Path(__file__).resolve().parents[3] / "src"))

from agency_swarm import Agency, Agent, function_tool
from agency_swarm.ui.demos.realtime import RealtimeDemoLauncher


@function_tool
def lookup_order(order_id: str) -> str:
    """Return a short order status by ID."""
    return f"Order {order_id} has shipped and will arrive soon."


VOICE_AGENT = Agent(
    name="Voice Concierge",
    instructions=(
        "You are a helpful voice concierge. Answer succinctly and offer to look up order details "
        "with the provided tool when asked about an order number."
    ),
    tools=[lookup_order],
)

VOICE_AGENCY = Agency(VOICE_AGENT)


def main() -> None:
    print("Agency Swarm Realtime Browser Demo")
    print("=" * 50)
    print("Open http://localhost:8000 after launch.")
    print("Press Ctrl+C to stop.\n")

    RealtimeDemoLauncher.start(
        VOICE_AGENCY,
        model="gpt-realtime",
        voice="alloy",
        turn_detection={"type": "server_vad"},
    )


if __name__ == "__main__":
    main()
2 changes: 2 additions & 0 deletions src/agency_swarm/__init__.py
@@ -51,6 +51,7 @@
from .hooks import PersistenceHooks # noqa: E402
from .integrations.fastapi import run_fastapi # noqa: E402
from .integrations.mcp_server import run_mcp # noqa: E402
from .integrations.realtime import run_realtime # noqa: E402
from .tools import ( # noqa: E402
BaseTool,
CodeInterpreter,
@@ -96,6 +97,7 @@
"PersistenceHooks",
"SendMessage",
"run_fastapi",
"run_realtime",
"run_mcp",
# Re-exports from Agents SDK
"ModelSettings",