This document provides a comprehensive overview of the current codebase, its architecture, the purpose of each file, and an analysis of our current objectives and recent debugging efforts.
This project implements a real-time, conversational AI voice assistant. It bridges traditional telephony (phone calls) with cutting-edge AI models using LiveKit as the real-time audio transport layer and Vobiz as the SIP trunk provider.
The system is designed to handle both outbound (system calls the user) and inbound (user calls the system) workflows.
The agent operates on a continuous loop of three core components:
- STT (Speech-to-Text): Listens to the human caller and transcribes audio to text (using Deepgram or Sarvam).
- LLM (Large Language Model): Understands the transcribed text, maintains conversation context, and generates a text response (using OpenAI GPT-4o-mini / GPT-4o).
- TTS (Text-to-Speech): Converts the LLM's text response back into natural-sounding audio (using OpenAI, Cartesia, or Sarvam) and streams it back to the caller.
agent.py: The Brain. This is the main LiveKit worker. It runs continuously in the background listening for "job requests" (either an incoming call or a dispatch command to make an outbound call).- It configures the STT, LLM, and TTS models.
- It contains the
OutboundAssistantdefining the AI's prompt and behavior. - It contains tools (like
TransferFunctions) that the AI can trigger mid-conversation to transfer calls to human agents.
make_call.py: The Trigger. A utility script used to initiate outbound calls. It takes a phone number as an argument (--to +91...), generates a unique LiveKit room, and sends anAgentDispatchRequestto LiveKit, injecting the target phone number into the job'smetadata.setup_trunk.py: The Bridge Builder. A setup script used to programmatically configure your LiveKit project to securely connect to your Vobiz SIP Trunk using the credentials stored in your.envfile.
README.md: The standard setup guide, detailing prerequisites, environment variable configuration, and basic usage commands.SOP.md(Standard Operating Protocol): A highly detailed, mandate-driven instruction manual for AI Agents (like myself). It mandates how we should architect solutions, use multiple "software skills" in tandem, review code, and structure debugging sessions to ensure high quality and security.saravm.md: Excellent specialized documentation for swapping out the default OpenAI/Deepgram pipeline for Sarvam AI, which provides highly optimized Speech-to-Text and Text-to-Speech models specifically tuned for Indian languages, accents, and code-mixed speech (e.g., Hinglish). It also details inbound SIP routing via Vobiz.mpconfig.md: Architectural documentation detailing a recent structural change to the booking flow. It explains moving the calendar booking logic from an active "during-call" tool (which causes awkward pauses) to a "post-call" shutdown hook that executes silently after the user hangs up.
.env: Contains all vital, secret API keys (LiveKit, OpenAI, Deepgram, Vobiz, etc.).requirements.txt: Pins the exact Python package versions required to run the project stably, most notablylivekit-agentsand its associated plugins.
Earlier, your agent was crashing immediately on startup with AttributeError: 'TransferFunctions' object has no attribute 'all_tools'.
- The Cause: The newest versions of the
livekit-agentsSDK changed how tools are registered. TheAgentSessionclass no longer accepts atoolsargument, andToolContextno longer uses.all_tools. - The Fix: We updated
agent.pyto usellm.find_function_tools()to extract the tools from your class, and we moved the registration of those tools directly into theOutboundAssistant(theAgentclass) rather than theAgentSession. The agent now successfully starts and registers with LiveKit.
Currently, you are experiencing an issue where running python3 make_call.py --to [NUMBER] does not result in your phone ringing.
- The Symptoms:
make_call.pysuccessfully dispatches the job.agent.pyreceives the job and connects to the room.- However, the agent logs output: "No valid JSON metadata found. This might be an inbound call."
- Because it thinks it's an inbound call, it skips the crucial block of code (
api.sip.create_sip_participant(...)) that actually dials out to your phone via the Vobiz trunk.
- Our Next Steps:
- We added debug logging to
agent.pyto print exactly what raw metadata it receives from LiveKit. - Once you trigger a test call, those logs will reveal if the metadata string is empty, malformed, or if LiveKit has changed the property name (e.g., from
ctx.job.metadatato something else). - Once we identify the malformed data, we can fix the parsing logic in
agent.pyso it properly extracts the number and dials out.
- We added debug logging to