Flask service that receives WhatsApp audio messages via Twilio and replies with a transcription using OpenAI Whisper.
WhatsApp voice notes are slow to skim. This bot turns them into text.
- Loads Whisper model once at startup
- Validates Twilio signatures
- Supports WhatsApp voice notes (OGG/Opus) and general audio/*
- Sync or async replies via env flag
- Health endpoint at /healthz
Requirements:
- Python 3.10+
- ffmpeg installed on system path
- Twilio WhatsApp Sandbox or Business API
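The Twilio signature validation mentioned in the feature list can be illustrated with a standard-library sketch. It follows Twilio's documented scheme for form-encoded webhooks: the full request URL, followed by every POST parameter name and value sorted by name, is signed with HMAC-SHA1 using the auth token, and the Base64 result is compared with the `X-Twilio-Signature` header. The function names here are hypothetical; the project itself may well rely on the Twilio helper library instead.

```python
import base64
import hashlib
import hmac
from typing import Mapping, Optional


def compute_twilio_signature(auth_token: str, url: str, params: Mapping[str, str]) -> str:
    """Return the Base64 HMAC-SHA1 signature for a form-encoded webhook request."""
    # Start from the full request URL, then append each POST parameter name
    # and value with no separators, ordered by parameter name.
    payload = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), payload.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: str,
    url: str,
    params: Mapping[str, str],
    header_signature: Optional[str],
) -> bool:
    """Compare the expected signature with the X-Twilio-Signature header value."""
    expected = compute_twilio_signature(auth_token, url, params)
    # compare_digest avoids leaking information through comparison timing.
    return hmac.compare_digest(
        expected.encode("utf-8"), (header_signature or "").encode("utf-8")
    )
```

The URL passed in must be the public URL that Twilio called (for example the ngrok address), not the local address the app is bound to; otherwise the signatures will not match.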
pip install -r requirements.txt
Install ffmpeg if you don't have it:
brew install ffmpeg
Create a .env file:
ACCOUNT_SID=ACxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
AUTH_TOKEN=your_auth_token
FROM=whatsapp:+1415xxxxxxx
MODEL_NAME=small
PORT=5000
DEBUG=false
ASYNC_REPLY=false
Alternatively, copy env.sample to .env and edit it.
Notes:
- FROM can be configured with or without the whatsapp: prefix; the app handles both.
- MODEL_NAME options: tiny, base, small, medium, large (trade speed vs accuracy).
- ASYNC_REPLY=true returns immediately and sends the transcription in a second message.
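As a rough illustration of the notes above, the settings could be read and normalized along these lines. This is a sketch, not the app's actual code: `load_settings`, `normalize_whatsapp_address`, and `parse_bool` are hypothetical names, the defaults simply mirror the sample .env, and the set of accepted truthy strings is an assumption.

```python
import os
from typing import Mapping, Optional

VALID_MODELS = {"tiny", "base", "small", "medium", "large"}


def normalize_whatsapp_address(value: str) -> str:
    """Accept a number with or without the whatsapp: prefix and always return it prefixed."""
    value = value.strip()
    return value if value.startswith("whatsapp:") else f"whatsapp:{value}"


def parse_bool(value: Optional[str], default: bool = False) -> bool:
    """Interpret common truthy strings; anything else counts as False."""
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Build a settings dict from environment variables (defaults are assumptions)."""
    model = env.get("MODEL_NAME", "small")
    if model not in VALID_MODELS:
        raise ValueError(f"MODEL_NAME must be one of {sorted(VALID_MODELS)}, got {model!r}")
    return {
        "from": normalize_whatsapp_address(env["FROM"]),
        "model_name": model,
        "port": int(env.get("PORT", "5000")),
        "debug": parse_bool(env.get("DEBUG")),
        "async_reply": parse_bool(env.get("ASYNC_REPLY")),
    }
```

Passing a plain dict instead of `os.environ` keeps the function easy to exercise without touching the real environment.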
python main.py
Expose locally for Twilio callbacks:
ngrok http 5000
Set your Twilio WhatsApp sandbox Inbound Webhook URL to:
POST https://<your-ngrok-domain>/whatsapp
- Send a voice note or audio file to your Twilio WhatsApp number
- You will receive the transcription back
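How the transcription comes back depends on the ASYNC_REPLY flag. The sketch below shows one way the two modes could be structured, using a background thread for the async case. All names (`handle_voice_note`, `transcribe`, `send_message`) are hypothetical stand-ins; the real app may organize this differently, for example with a task queue.

```python
import threading
from typing import Callable, Optional, Tuple


def handle_voice_note(
    media_url: str,
    transcribe: Callable[[str], str],
    send_message: Callable[[str], None],
    async_reply: bool,
) -> Tuple[str, Optional[threading.Thread]]:
    """Return the immediate webhook reply text and, in async mode, the worker thread."""
    if not async_reply:
        # Sync mode: transcribe inside the request and reply with the text directly.
        return transcribe(media_url), None

    def worker() -> None:
        # Async mode: transcribe in the background, then send a second message.
        send_message(transcribe(media_url))

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    # The webhook itself answers right away with an empty reply.
    return "", thread
```

Async mode is worth considering with larger models, since a long transcription could otherwise keep the webhook request open until Twilio gives up on it.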
Each audio message incurs messaging and compute costs. Whisper model size affects speed and cost; smaller models are faster and cheaper to run, at some cost in accuracy.