| Layer | Service | Purpose |
|---|---|---|
| STT | Sarvam saaras:v3 | Speech-to-text (Indian languages + English) |
| LLM | OpenAI gpt-4o | Conversation intelligence |
| TTS | Sarvam bulbul:v3 | Natural Indian-accent voice output |
| Transport | LiveKit | WebRTC real-time audio |
| Telephony | Vobiz SIP | Inbound phone call routing |

You need four sets of credentials before writing any code.
- Go to dashboard.sarvam.ai
- Sign up and go to API Keys
- Copy your key → `sk_xxxxxxxxxxxxxxxxxx`
- Go to cloud.livekit.io
- Create a project
- Go to Settings → Keys, copy:
  - `LIVEKIT_URL` → e.g. `wss://my-project-abc123.livekit.cloud`
  - `LIVEKIT_API_KEY` → e.g. `APIxxxxxxxxxxxxx`
  - `LIVEKIT_API_SECRET` → long secret string
- Go to Settings → Project, copy your:
  - SIP URI → e.g. `sip:my-project-id.sip.livekit.cloud:5060`
- Go to platform.openai.com/api-keys
- Create a new secret key → `sk-proj-xxxxxxxxxxxxxxxx`
- Go to vobiz.ai and create an account
- Add balance for inbound calls
- Create a SIP trunk (Part 3 covers this in detail)
- Purchase a DID phone number
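Before starting the agent, it can help to confirm that every credential collected above is actually set. The following is a minimal pre-flight sketch, not part of any SDK: the variable names are the ones used in the `.env` file of this guide, while `missing_credentials` and `looks_like_livekit_url` are hypothetical helpers introduced here for illustration. Vobiz credentials are not included because this guide enters them in the dashboards rather than in `.env`.

```python
import os

# Variable names follow the .env file used in this guide.
REQUIRED_VARS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "SARVAM_API_KEY",
    "OPENAI_API_KEY",
)


def missing_credentials(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not str(env.get(name, "")).strip()]


def looks_like_livekit_url(url):
    """Rough sanity check: the LiveKit Cloud URLs in this guide start with wss://."""
    return url.startswith("wss://") and len(url) > len("wss://")


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing credentials:", ", ".join(missing))
    else:
        print("All required credentials are set.")
```

Running this after `load_dotenv()` would surface a blank or misspelled key before the first call arrives, rather than as a runtime authentication error.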
# Recommended: use a virtual environment
python -m venv .venv
source .venv/bin/activate # macOS/Linux
# OR
.venv\Scripts\activate # Windows
# Install all required packages
pip install "livekit-agents[sarvam,openai,silero]~=1.3" python-dotenv

Note: The `~=1.3` specifier accepts LiveKit Agents 1.3 and any later 1.x release (but not 2.0). Version 1.3+ officially supports the Sarvam plugin.
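To make the `~=` ("compatible release") rule concrete, here is a small sketch that checks plain numeric versions against such a specifier. It is a simplified illustration of the PEP 440 rule, not pip's implementation: `satisfies_compatible_release` is a hypothetical helper, and it ignores pre-releases, post-releases, and other suffixes.

```python
def _parse(version):
    """Split a plain numeric version such as '1.3.2' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def satisfies_compatible_release(candidate, spec):
    """Return True if `candidate` satisfies `~=spec` (numeric versions only).

    `~=X.Y` means: at least X.Y, and the same leading components as the
    spec with its last component dropped (so `~=1.3` stays within 1.x).
    """
    spec_parts = _parse(spec)
    if len(spec_parts) < 2:
        raise ValueError("~= needs at least two version components")
    cand_parts = _parse(candidate)

    # Pad with zeros so that '1.3' and '1.3.0' compare as equal.
    width = max(len(cand_parts), len(spec_parts))
    cand_padded = cand_parts + (0,) * (width - len(cand_parts))
    spec_padded = spec_parts + (0,) * (width - len(spec_parts))

    prefix = spec_parts[:-1]
    return cand_padded >= spec_padded and cand_parts[: len(prefix)] == prefix


if __name__ == "__main__":
    for version in ("1.2.9", "1.3.0", "1.9.2", "2.0.0"):
        print(version, satisfies_compatible_release(version, "1.3"))
```

For the specifier used here, 1.3.0 and 1.9.2 are accepted while 1.2.9 and 2.0.0 are rejected.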
voice-agent/
├── agent.py ← Main agent logic
├── .env ← All your API keys (never commit this)
├── requirements.txt ← Pinned dependencies
└── README.md
# LiveKit
LIVEKIT_URL=wss://your-project-abc123.livekit.cloud
LIVEKIT_API_KEY=APIxxxxxxxxxxxxx
LIVEKIT_API_SECRET=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
# Sarvam AI
SARVAM_API_KEY=sk_xxxxxxxxxxxxxxxxxxxxxxxx
# OpenAI
OPENAI_API_KEY=sk-proj-xxxxxxxxxxxxxxxx

# requirements.txt
livekit-agents[sarvam,openai,silero]~=1.3
python-dotenv>=1.0

This is the production-ready agent with all Sarvam best practices applied.
import logging

from dotenv import load_dotenv
from livekit.agents import JobContext, WorkerOptions, cli
from livekit.agents.voice import Agent, AgentSession
from livekit.plugins import openai, sarvam

load_dotenv()

logger = logging.getLogger("sarvam-voice-agent")
logger.setLevel(logging.INFO)


class InboundVoiceAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="""
                You are a friendly, professional inbound voice assistant.
                Keep your responses short, clear, and conversational: you are
                speaking on a phone call. Always greet callers warmly and help
                them efficiently. Avoid long monologues; ask one question at a time.
                If you don't understand something, ask the caller to repeat it.
            """,
            # ── STT: Sarvam Saaras v3 ──────────────────────────────────
            # flush_signal=True is REQUIRED for proper turn detection
            stt=sarvam.STT(
                language="unknown",   # Auto-detect: en-IN, hi-IN, mr-IN, etc.
                model="saaras:v3",    # Latest Sarvam STT model
                mode="transcribe",    # Use "translate" to force English output
                flush_signal=True,    # Enables speech start/end events
            ),
            # ── LLM: OpenAI GPT-4o ────────────────────────────────────
            llm=openai.LLM(model="gpt-4o"),
            # ── TTS: Sarvam Bulbul v3 ─────────────────────────────────
            tts=sarvam.TTS(
                target_language_code="en-IN",  # Indian English output
                model="bulbul:v3",             # Latest Sarvam TTS model
                speaker="anand",               # Male, clear Indian accent
                # Other voices ↓
                # Female: priya, simran, ishita, kavya, ritu, neha, pooja
                # Male: aditya, rohan, shubh, rahul, amit, dev, varun
                pitch=0.0,     # Range: -20.0 to 20.0
                pace=1.0,      # Range: 0.5 to 2.0 (speed)
                loudness=1.0,  # Range: 0.5 to 2.0
            ),
        )

    async def on_enter(self):
        """Triggered when a caller connects; the agent speaks first."""
        await self.session.generate_reply(
            instructions="Greet the caller warmly. Say: 'Hello! Thanks for calling. How can I help you today?'"
        )


async def entrypoint(ctx: JobContext):
    """
    LiveKit calls this function every time a new call arrives.
    The agent name 'voice-assistant' MUST match your LiveKit dispatch rule.
    """
    logger.info(f"Inbound call connected to room: {ctx.room.name}")

    # ── AgentSession: Sarvam-optimised settings ────────────────────────
    # ❌ Do NOT pass vad= here: Sarvam handles VAD internally
    session = AgentSession(
        turn_detection="stt",        # Let Sarvam STT handle turn detection
        min_endpointing_delay=0.07,  # 70 ms, matches Sarvam STT latency
    )

    await session.start(
        agent=InboundVoiceAgent(),
        room=ctx.room,
    )


if __name__ == "__main__":
    cli.run_app(
        WorkerOptions(
            entrypoint_fnc=entrypoint,
            agent_name="voice-assistant",  # ← Must match LiveKit dispatch rule
        )
    )

curl -X POST https://api.vobiz.ai/api/v1/account/{YOUR_ACCOUNT_ID}/trunks \
-H "Authorization: Bearer YOUR_VOBIZ_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"name": "Sarvam-LiveKit-Agent-Trunk",
"auth_type": "credentials"
}'

Save from the response:
- `sip_domain` → e.g. `5f3a607b.sip.vobiz.ai`
- `username`
- `password`

curl -X POST https://api.vobiz.ai/api/v1/account/{ACCOUNT_ID}/numbers \
-H "Authorization: Bearer YOUR_VOBIZ_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"country": "IN",
"type": "local"
}'

⚠️ Critical: Remove the `sip:` prefix from the LiveKit SIP URI.

curl -X PATCH https://api.vobiz.ai/api/v1/account/{ACCOUNT_ID}/trunks/{TRUNK_ID} \
-H "Authorization: Bearer YOUR_VOBIZ_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"inbound_destination": "my-project-id.sip.livekit.cloud:5060"
}'

| LiveKit Shows | What You Enter in Vobiz |
|---|---|
| `sip:my-project.sip.livekit.cloud:5060` | `my-project.sip.livekit.cloud:5060` |
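
The prefix removal shown in the table is easy to get wrong by hand, so a tiny helper can do it. This is a sketch only: `to_vobiz_destination` is a hypothetical name, and it assumes the only transformation Vobiz needs is dropping a leading `sip:` scheme, as described above.

```python
def to_vobiz_destination(livekit_sip_uri):
    """Convert the SIP URI shown by LiveKit into the value entered in Vobiz.

    Only a leading 'sip:' scheme is removed; host and port are kept as-is.
    """
    uri = livekit_sip_uri.strip()
    if uri.lower().startswith("sip:"):
        uri = uri[len("sip:"):]
    if not uri:
        raise ValueError("SIP URI is empty after removing the 'sip:' prefix")
    return uri


if __name__ == "__main__":
    print(to_vobiz_destination("sip:my-project.sip.livekit.cloud:5060"))
```

A value that already lacks the prefix passes through unchanged, so the helper is safe to apply more than once.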
- Go to LiveKit Cloud Dashboard → Telephony → Trunks
- Click Create new trunk → Inbound
- Fill in:
  - Phone Numbers: Your Vobiz DID number (e.g. `+918071XXXXXX`)
  - Allowed Addresses: `0.0.0.0/0` (restrict to Vobiz IPs in production)
- Click Create and save the Trunk ID
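
To see what narrowing Allowed Addresses changes, the sketch below checks a source IP against a list of CIDR ranges using Python's standard `ipaddress` module. The ranges shown are reserved documentation addresses used as placeholders, not Vobiz's real IP ranges, and `is_allowed` is a hypothetical helper; LiveKit performs the actual filtering on its side.

```python
import ipaddress


def is_allowed(source_ip, allowed_cidrs):
    """Return True if source_ip falls inside any of the allowed CIDR ranges."""
    address = ipaddress.ip_address(source_ip)
    return any(address in ipaddress.ip_network(cidr) for cidr in allowed_cidrs)


if __name__ == "__main__":
    open_to_all = ["0.0.0.0/0"]          # Accepts every IPv4 source
    restricted = ["203.0.113.0/24"]      # Placeholder range, NOT a real Vobiz range

    print(is_allowed("198.51.100.7", open_to_all))
    print(is_allowed("198.51.100.7", restricted))
    print(is_allowed("203.0.113.10", restricted))
```

With `0.0.0.0/0` every IPv4 caller passes; with a restricted list, only traffic from the listed ranges would reach the trunk.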
This tells LiveKit to auto-spawn your agent when a call arrives.
- Go to Telephony → Dispatch Rules
- Click Create new dispatch rule
- Configure:
  - Rule Type: `Individual`
  - Room Prefix: `call-`
  - Match Trunks: Select your inbound trunk from Step 5.1
- Expand the "Agent dispatch" section and set:
  - Agent Name: `voice-assistant` ← Must exactly match `agent_name` in `agent.py`
- Click Create
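
Because the Agent Name must match exactly, a stray capital letter or trailing space is enough to stop the agent from being dispatched. The sketch below compares the value typed into the dashboard with the one in `agent.py` and flags near-misses. `diagnose_agent_name` is a hypothetical helper for illustration; it assumes the comparison is case- and whitespace-sensitive, which is the safe reading of "exactly match".

```python
AGENT_NAME = "voice-assistant"  # The agent_name passed to WorkerOptions in agent.py


def diagnose_agent_name(dashboard_value, worker_value=AGENT_NAME):
    """Classify how the dispatch rule's Agent Name relates to the worker's.

    Returns "match", "near-miss" (differs only by case or surrounding
    whitespace), or "mismatch".
    """
    if dashboard_value == worker_value:
        return "match"
    if dashboard_value.strip().lower() == worker_value.strip().lower():
        return "near-miss"
    return "mismatch"


if __name__ == "__main__":
    for value in ("voice-assistant", "Voice-Assistant ", "support-bot"):
        print(repr(value), "->", diagnose_agent_name(value))
```

A "near-miss" result is the case worth looking for first when calls connect but no agent joins the room.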
If you want your agent to also make outbound calls:
import asyncio

from livekit import api as livekit_api


async def setup_outbound_trunk():
    lk = livekit_api.LiveKitAPI(
        url="YOUR_LIVEKIT_URL",
        api_key="YOUR_LIVEKIT_API_KEY",
        api_secret="YOUR_LIVEKIT_API_SECRET",
    )
    try:
        trunk = await lk.sip.create_sip_outbound_trunk(
            livekit_api.CreateSIPOutboundTrunkRequest(
                trunk=livekit_api.SIPOutboundTrunkInfo(
                    name="Vobiz Outbound Trunk",
                    address="5f3a607b.sip.vobiz.ai",  # Your Vobiz sip_domain
                    auth_username="YOUR_VOBIZ_USERNAME",
                    auth_password="YOUR_VOBIZ_PASSWORD",
                    numbers=["+918071XXXXXX"],  # Your Vobiz DID number
                )
            )
        )
        print(f"Trunk created: {trunk.sip_trunk_id}")
    finally:
        await lk.aclose()  # Close the API client's HTTP session


asyncio.run(setup_outbound_trunk())

# Development mode (verbose logging)
python agent.py dev
# Production mode
python agent.py start

# In a second terminal (simulates a caller)
python agent.py console

- Ensure `agent.py dev` is running
- Call your Vobiz DID phone number from any phone
- The call routes: Phone → Vobiz SIP → LiveKit → Your Agent
- You should hear the greeting from Sarvam's `anand` voice

| Gender | Voices |
|---|---|
| Male (23) | shubh, aditya, rahul, rohan, amit, dev, ratan, varun, manan, sumit, kabir, aayan, anand, tarun, sunny, mani, gokul, vijay, mohit, rehan, soham |
| Female (16) | ritu, priya, neha, pooja, simran, kavya, ishita, shreya, roopa, amelia, sophia, tanya, shruti, suhani, kavitha, rupali |
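
A typo in the speaker name or an out-of-range tuning value is easier to catch before the agent starts. The sketch below validates TTS settings against the voices listed in the table above and the `pitch`, `pace`, and `loudness` ranges given in the comments of `agent.py`. `tts_setting_errors` is a hypothetical helper, not part of the Sarvam plugin, and the voice sets simply copy the names shown in this guide.

```python
MALE_VOICES = {
    "shubh", "aditya", "rahul", "rohan", "amit", "dev", "ratan", "varun",
    "manan", "sumit", "kabir", "aayan", "anand", "tarun", "sunny", "mani",
    "gokul", "vijay", "mohit", "rehan", "soham",
}
FEMALE_VOICES = {
    "ritu", "priya", "neha", "pooja", "simran", "kavya", "ishita", "shreya",
    "roopa", "amelia", "sophia", "tanya", "shruti", "suhani", "kavitha", "rupali",
}
ALL_VOICES = MALE_VOICES | FEMALE_VOICES

# Ranges as stated in the agent.py comments of this guide.
RANGES = {
    "pitch": (-20.0, 20.0),
    "pace": (0.5, 2.0),
    "loudness": (0.5, 2.0),
}


def tts_setting_errors(speaker, pitch=0.0, pace=1.0, loudness=1.0):
    """Return a list of human-readable problems; an empty list means OK."""
    errors = []
    if speaker not in ALL_VOICES:
        errors.append(f"unknown speaker: {speaker!r}")
    values = {"pitch": pitch, "pace": pace, "loudness": loudness}
    for name, value in values.items():
        low, high = RANGES[name]
        if not low <= value <= high:
            errors.append(f"{name}={value} is outside {low} to {high}")
    return errors


if __name__ == "__main__":
    print(tts_setting_errors("anand"))
    print(tts_setting_errors("anandd", pace=3.0))
```
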
| Language | Code |
|---|---|
| English (India) | en-IN |
| Hindi | hi-IN |
| Marathi | mr-IN |
| Tamil | ta-IN |
| Telugu | te-IN |
| Gujarati | gu-IN |
| Kannada | kn-IN |
| Bengali | bn-IN |
| Auto-detect | unknown |
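
The codes in this table can be kept in one place so that the STT `language` and TTS `target_language_code` values are never mistyped. This is a minimal sketch: `resolve_language` is a hypothetical helper, it covers only the languages listed above, and falling back to `"unknown"` for unrecognised input is a design choice made here for illustration.

```python
LANGUAGE_CODES = {
    "English (India)": "en-IN",
    "Hindi": "hi-IN",
    "Marathi": "mr-IN",
    "Tamil": "ta-IN",
    "Telugu": "te-IN",
    "Gujarati": "gu-IN",
    "Kannada": "kn-IN",
    "Bengali": "bn-IN",
}
AUTO_DETECT = "unknown"


def resolve_language(name_or_code):
    """Map a language name or code from the table to its code.

    Anything not in the table resolves to AUTO_DETECT ("unknown").
    """
    text = name_or_code.strip()
    for name, code in LANGUAGE_CODES.items():
        if text.lower() in (name.lower(), code.lower()):
            return code
    return AUTO_DETECT


if __name__ == "__main__":
    for value in ("Hindi", "ta-in", "Klingon"):
        print(value, "->", resolve_language(value))
```

Note that `"unknown"` is documented here as an STT auto-detect value; a TTS voice still needs a concrete target language code.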
Sarvam models natively handle code-mixed speech (Hinglish, Tanglish, etc.):
stt=sarvam.STT(
    language="unknown",  # Auto-detects Hindi, Marathi, Hinglish, etc.
    model="saaras:v3",
    mode="transcribe",
    flush_signal=True,
),
tts=sarvam.TTS(
    target_language_code="hi-IN",
    model="bulbul:v3",
    speaker="priya",
),

| Problem | Cause | Fix |
|---|---|---|
| Agent doesn't answer inbound call | Dispatch rule misconfigured | Verify agent name matches exactly (`voice-assistant`) |
| Call disconnects immediately | `sip:` prefix not removed | Remove `sip:` from Vobiz `inbound_destination` |
| `401 Unauthorized` | Credentials mismatch | Re-check Vobiz username/password in LiveKit trunk |
| Poor transcription quality | Wrong language code | Use language="unknown" for auto-detection |
| Agent interrupts caller mid-sentence | VAD conflict | Ensure NO vad= param in AgentSession() |
| High latency | Endpointing delay not set | Add min_endpointing_delay=0.07 to AgentSession |

- `.env` file is in `.gitignore`
- Restrict `allowed_addresses` in LiveKit inbound trunk to Vobiz IP ranges
- Set `language=` explicitly (not `"unknown"`) if you know the caller's language
- Monitor Vobiz account balance
- Add error handling / reconnection logic in `entrypoint()`
- Deploy `agent.py` to a cloud server (Railway, Fly.io, or a VPS) so it runs 24/7
- Use `python agent.py start` (not `dev`) in production

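
For the error handling / reconnection item in the checklist, one common pattern is to retry a failing async operation with exponential backoff. The sketch below is generic Python, not a LiveKit API: `retry_async` is a hypothetical helper, and which exceptions are worth retrying (here `ConnectionError` and `TimeoutError`) is an assumption that should be adjusted to the errors actually observed.

```python
import asyncio


async def retry_async(operation, attempts=3, base_delay=0.5,
                      retry_on=(ConnectionError, TimeoutError)):
    """Await operation(), retrying on selected errors with exponential backoff.

    The delay doubles after each failed attempt: base_delay, 2x, 4x, ...
    The last failure is re-raised so the caller can log or handle it.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == attempts:
                raise
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))


if __name__ == "__main__":
    state = {"calls": 0}

    async def flaky_connect():
        # Stand-in for a network call that fails twice, then succeeds.
        state["calls"] += 1
        if state["calls"] < 3:
            raise ConnectionError("transient failure")
        return "connected"

    print(asyncio.run(retry_async(flaky_connect, base_delay=0.1)))
```

Inside `entrypoint()`, a wrapper like this would only be appropriate around calls that are safe to repeat; whether a particular SDK call qualifies should be checked against its documentation.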
Caller dials DID number
↓
Vobiz SIP Trunk
↓
LiveKit SIP Gateway ←── Dispatch Rule auto-spawns agent
↓
LiveKit WebRTC Room
↓
┌─────────────────────────────────────────┐
│ InboundVoiceAgent │
│ │
│ Audio In → Sarvam STT (saaras:v3) │
│ → OpenAI GPT-4o (LLM) │
│ → Sarvam TTS (bulbul:v3) │
│ → Audio Out │
└─────────────────────────────────────────┘
Sarvam AI docs: docs.sarvam.ai | Vobiz docs: docs.vobiz.ai | LiveKit docs: docs.livekit.io/agents