Real-time Speech-to-Text (STT) system with multi-client support, horizontal scaling, and LiveKit integration
# 1. Setup environment
./scripts/setup.sh # Linux/macOS
.\scripts\setup.ps1 # Windows
# 2. Download models
./scripts/download-vosk-model.sh # STT model (Linux/macOS)
.\scripts\download-vosk-model.ps1 # STT model (Windows)
./scripts/download-kokoro-model.sh # TTS model (Linux/macOS)
.\scripts\download-kokoro-model.ps1 # TTS model (Windows)
# 3. Configure environment
cp env.example .env # Edit with your LiveKit credentials
# 4. Run the system
./scripts/run-dev.sh # Development mode
./scripts/run-prod.sh # Production mode
# Start with 5 server instances behind load balancer
./scripts/scale-deploy.sh start 5 # Linux/macOS
.\scripts\scale-deploy.ps1 start 5 # Windows
# Scale to 10 instances
./scripts/scale-deploy.sh scale 10
# Check status
./scripts/scale-deploy.sh status
The system will be available at: http://localhost:8000
- System Overview
- Architecture
- Features
- Quick Start
- Documentation
- Deployment Options
- API Reference
- Monitoring
- Contributing
- Support
Mezon Call Translation is a production-ready, scalable Speech-to-Text system designed for real-time communication platforms. It provides:
- Real-time STT: Convert speech to text with low latency using Vosk engine
- Multi-client Support: Handle multiple simultaneous audio streams
- Horizontal Scaling: Scale across multiple server instances with load balancing
- LiveKit Integration: Seamless integration with LiveKit rooms and agents
- High Availability: Circuit breaker pattern, health monitoring, and graceful degradation
- Vosk: Offline speech recognition engine
- FastAPI: Modern web framework with WebSocket support
- LiveKit: Real-time communication platform
- Docker: Containerization and orchestration
- Nginx: Load balancer and proxy
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│    Multiple     │    │   Nginx Load    │    │   Server Pool   │
│    Clients      │◄──►│    Balancer     │◄──►│   (Scalable)    │
│                 │    │   (Port 8000)   │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │                      │
                                ▼                      ▼
                       ┌─────────────────┐    ┌─────────────────┐
                       │  LiveKit Agent  │    │    Vosk STT     │
                       │   (Port 8080)   │    │     Workers     │
                       └─────────────────┘    └─────────────────┘
- Server - FastAPI with multi-worker STT processing
- Agent - LiveKit integration with VAD processing
- Load Balancer - Nginx for traffic distribution and WebSocket proxy
- Session Management - Multi-client session coordination
- Health Monitoring - Comprehensive health checks and metrics
- Real-time Speech-to-Text with Vosk engine
- Multi-client Session Management with language support
- WebSocket-based Communication for low latency
- Adaptive Processing based on system load
- Circuit Breaker Pattern for fault tolerance
- Voice Activity Detection (VAD) for efficiency
- Horizontal Scaling with automatic load balancing
- Comprehensive Monitoring and health checks
- Auto-recovery and graceful degradation
- Performance Metrics and analysis tools
- Docker Containerization for easy deployment
- Configuration Management via environment variables
- LiveKit Integration for room management
- REST API for system management
- WebSocket API for real-time communication
- JWT Authentication support
- Multi-language Support for transcripts
- Setup Guide - Complete installation and configuration guide
- Environment Configuration - LiveKit credentials and system settings
- Model Management - Vosk model download and configuration
- Server Architecture - Detailed server design and components
- Agent Architecture - LiveKit agent implementation
- System Design Patterns - Circuit breaker, session management, worker pools
- Metrics Guide - Performance monitoring and log analysis
- Health Check Endpoints - System status and monitoring
- Troubleshooting - Common issues and solutions
- API Documentation - REST and WebSocket endpoints
- Configuration Reference - All environment variables and settings
- Development Setup - Hot reload and debugging
# Hot reload enabled, debug logging
./scripts/run-dev.sh
- Hot reload for code changes
- Enhanced debugging
- Local volume mounting
# Optimized for production
./scripts/run-prod.sh
- Performance optimizations
- Resource limits
- Production logging
# Multiple server instances with load balancing
./scripts/scale-deploy.sh start 5
- Multiple server instances
- Nginx load balancer
- Auto-scaling capabilities
- High availability
# Custom scaling
docker-compose up -d --scale server=3
WebSocket endpoint: ws://localhost:8000/ws/vosk/
Parameters:
- client_id: Unique client identifier
- session_id: Session identifier
- transcript: Enable transcript delivery
- translation: Enable translation delivery
- language: Client language (en, vi, etc.)
Input: Binary audio data
Output: JSON transcript/translation results
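A minimal sketch of how a client might assemble the connection URL. It assumes the parameters above are sent as query-string values and that boolean flags are encoded as `true`/`false`; neither detail is confirmed by this README, and `build_ws_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

BASE_URL = "ws://localhost:8000/ws/vosk/"  # endpoint listed in this README


def build_ws_url(client_id, session_id, transcript=True, translation=False, language="en"):
    """Build a connection URL (query-string transport is an assumption)."""
    params = {
        "client_id": client_id,
        "session_id": session_id,
        "transcript": str(transcript).lower(),    # assumed encoding: "true"/"false"
        "translation": str(translation).lower(),
        "language": language,
    }
    return f"{BASE_URL}?{urlencode(params)}"


print(build_ws_url("client-1", "room-42", language="vi"))
```

A WebSocket client would then open this URL and send binary audio frames over the connection.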
| Endpoint | Method | Description |
|---|---|---|
| /health | GET | Detailed health status |
| /health/simple | GET | Simple health check |
| /agent/join | POST | LiveKit agent dispatch |
| /ws/stats | GET | WebSocket statistics |
# Simple health check
curl http://localhost:8000/health/simple
# Detailed health information
curl http://localhost:8000/health
- Health Endpoints: Real-time system status
- Metrics Collection: Performance and usage statistics
- Log Analysis: Comprehensive logging with structured format
- Worker Statistics: STT worker performance tracking
- Audio Processing Latency: Real-time performance tracking
- Worker Load Distribution: Load balancing effectiveness
- Session Management: Client connection statistics
- Error Rates: System reliability monitoring
# Check system status
./scripts/scale-deploy.sh status
# View real-time logs
docker-compose logs -f server
# Monitor resource usage
docker stats
For detailed metrics analysis, see the Metrics Guide.
# Core Configuration
LIVEKIT_URL=wss://your-livekit-server.com
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret
VOSK_MODEL_PATH=/app/models/vosk-model
# Performance Tuning
SERVER_HOST=0.0.0.0
SERVER_PORT=8000
LOG_LEVEL=INFO
The system supports extensive configuration for:
- Audio processing parameters
- Worker pool management
- Circuit breaker settings
- Health check intervals
- Scaling parameters
See Setup Guide for complete configuration options.
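One way to read these variables in Python is sketched below. The `Settings` class and `load_settings` function are hypothetical names, the fallback values simply mirror the example above, and treating the three LiveKit variables as required is an assumption rather than documented behavior.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    livekit_url: str
    livekit_api_key: str
    livekit_api_secret: str
    vosk_model_path: str
    server_host: str
    server_port: int
    log_level: str


def load_settings(env=None):
    """Read settings from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    required = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        # Assumption: the LiveKit credentials have no usable default.
        raise RuntimeError(f"Missing required settings: {', '.join(missing)}")
    return Settings(
        livekit_url=env["LIVEKIT_URL"],
        livekit_api_key=env["LIVEKIT_API_KEY"],
        livekit_api_secret=env["LIVEKIT_API_SECRET"],
        vosk_model_path=env.get("VOSK_MODEL_PATH", "/app/models/vosk-model"),
        server_host=env.get("SERVER_HOST", "0.0.0.0"),
        server_port=int(env.get("SERVER_PORT", "8000")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )


example = load_settings({
    "LIVEKIT_URL": "wss://your-livekit-server.com",
    "LIVEKIT_API_KEY": "your-api-key",
    "LIVEKIT_API_SECRET": "your-api-secret",
})
print(example.server_port)
```

Passing the environment in as a mapping keeps the loader easy to test without touching the real process environment.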
- Fork the repository
- Follow the Setup Guide
- Use development mode: ./scripts/run-dev.sh
- Make changes with hot reload enabled
- Test thoroughly with multiple clients
- Submit pull request with documentation updates
- Follow the existing service pattern
- Maintain thread-safe operations
- Add appropriate error handling
- Include health check integration
- Update documentation
Server won't start:
# Check Vosk STT model exists
ls -la models/vosk-model/
# Download if missing
./scripts/download-vosk-model.sh
Agent TTS not working:
# Check Kokoro TTS model exists
ls -la models/kokoro_models/
# Download if missing
./scripts/download-kokoro-model.sh
# Or download with specific voices
./scripts/download-kokoro-model.sh -v "af_heart,am_adam"
Agent connection failed:
# Verify server is running
curl http://localhost:8000/health/simple
# Check environment variables
cat .env
Poor performance:
# Check resource usage
docker stats
# Scale up servers
./scripts/scale-deploy.sh scale 10
For comprehensive troubleshooting, see the Setup Guide.
- Check Documentation: Review relevant guides in /docs
- System Requirements: Ensure Docker & Docker Compose are installed
- Health Checks: Verify system status via health endpoints
- Log Analysis: Examine service logs for specific errors
- Resource Check: Ensure adequate CPU, memory, and disk space
# System status
docker-compose ps
docker stats
# Service health
curl http://localhost:8000/health
# Recent logs
docker-compose logs --tail=100 server
- Setup Guide - Installation and configuration
- Server Architecture - System design
- Operations Guide - Monitoring and troubleshooting
This project is part of the Mezon platform ecosystem. See the project documentation for licensing information.
speech-to-text real-time vosk fastapi livekit docker microservices websocket audio-processing scalable
/degraded/unhealthy)
│   ├── Uptime information
│   ├── Component details
│   └── HTTP status codes (200/503)
└── /health/simple: Simple boolean check
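The health notes above mention per-component details, status levels, and HTTP codes 200/503. A rough sketch of how such a summary could be computed follows. The `summarize` helper is hypothetical, the `healthy` level is assumed as the third status, and the mapping (only `unhealthy` returns 503, while `degraded` still returns 200) is an assumption, not documented behavior.

```python
# Severity order used to pick the worst component status.
SEVERITY = {"healthy": 0, "degraded": 1, "unhealthy": 2}


def summarize(components):
    """Return (body, http_status) for a mapping of component name -> status."""
    worst = max(components.values(), key=SEVERITY.__getitem__, default="healthy")
    # Assumption: a degraded system still answers 200 so it stays in rotation.
    http_status = 503 if worst == "unhealthy" else 200
    return {"status": worst, "components": dict(components)}, http_status


body, code = summarize({"stt_workers": "healthy", "agent": "degraded"})
print(body["status"], code)
```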
## Outstanding Technical Features
### 1. **Scalability**
- Multi-worker architecture for STT processing
- Async/await pattern for I/O operations
- Queue-based load balancing
- Adaptive processing based on system load
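The multi-worker, queue-based idea can be sketched with `asyncio`. This is an illustration under stated assumptions, not the project's worker code: `stt_worker` and `run_pool` are hypothetical names, and the recognition step is replaced by a placeholder string.

```python
import asyncio


async def stt_worker(name, queue, results):
    """Pull audio chunks from the shared queue until a None sentinel arrives."""
    while True:
        chunk = await queue.get()
        if chunk is None:
            queue.task_done()
            break
        await asyncio.sleep(0)  # yield control, standing in for real recognition work
        results.append((name, f"transcript for {len(chunk)} bytes"))
        queue.task_done()


async def run_pool(chunks, num_workers=3):
    queue = asyncio.Queue(maxsize=100)  # bounded queue applies back-pressure
    results = []
    workers = [
        asyncio.create_task(stt_worker(f"worker-{i}", queue, results))
        for i in range(num_workers)
    ]
    for chunk in chunks:
        await queue.put(chunk)
    for _ in workers:
        await queue.put(None)  # one stop sentinel per worker
    await asyncio.gather(*workers)
    return results


results = asyncio.run(run_pool([b"\x00" * 320 for _ in range(6)]))
print(len(results))
```

Because every worker reads from the same queue, whichever worker is free takes the next chunk, which is what balances the load.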
### 2. **Reliability**
- Circuit breaker pattern for error handling
- Graceful degradation (VAD fallback)
- Resource cleanup and memory management
- Health monitoring and metrics
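For readers unfamiliar with the circuit breaker pattern, here is a minimal sketch. The `CircuitBreaker` class, its thresholds, and its injectable `clock` are illustrative assumptions; the project's actual breaker may track state differently.

```python
import time


class CircuitBreaker:
    """Open after `max_failures` consecutive errors; retry after `reset_after` seconds."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: permit a trial call once the cool-down has elapsed.
        return self.clock() - self.opened_at >= self.reset_after

    def call(self, func, *args, **kwargs):
        if not self.allow():
            raise RuntimeError("circuit open: call rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()  # (re)open and restart the cool-down
            raise
        # Any success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrapping a failing dependency in `breaker.call(...)` lets callers fail fast while the dependency recovers instead of queuing more work behind it.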
### 3. **Performance Optimization**
- VAD pre-filtering to reduce STT workload
- Chunk accumulation strategy
- Overlapping audio processing
- GPU acceleration support (VAD)
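Chunk accumulation with overlap can be illustrated as follows. `ChunkAccumulator` and its byte sizes are hypothetical; the real window and overlap lengths are not stated in this document.

```python
class ChunkAccumulator:
    """Collect small audio frames into fixed-size windows that overlap."""

    def __init__(self, window_bytes=32000, overlap_bytes=3200):
        if not 0 <= overlap_bytes < window_bytes:
            raise ValueError("overlap must be smaller than the window")
        self.window_bytes = window_bytes
        self.overlap_bytes = overlap_bytes
        self.buffer = bytearray()

    def push(self, frame):
        """Add a frame and return every complete window now available."""
        self.buffer.extend(frame)
        windows = []
        while len(self.buffer) >= self.window_bytes:
            windows.append(bytes(self.buffer[: self.window_bytes]))
            # Keep the tail so the next window starts with shared context.
            del self.buffer[: self.window_bytes - self.overlap_bytes]
        return windows


acc = ChunkAccumulator(window_bytes=10, overlap_bytes=4)
for window in acc.push(bytes(range(16))):
    print(list(window))
```

Sending fewer, larger windows reduces per-call overhead, and the shared tail helps avoid cutting a word exactly at a window boundary.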
### 4. **Real-time Capabilities**
- WebSocket-based communication
- Non-blocking audio submission
- Async result dispatching
- Low-latency processing pipeline
### 5. **Horizontal Scaling with Load Balancer**
- Nginx load balancer for multiple server instances
- Docker Compose scaling capabilities
- Health check integration with load balancer
- Session-independent client routing
- Zero-downtime scaling operations
### 6. **Multi-tenant Support**
- Session-based client isolation
- Per-client language settings
- Flexible subscription model (transcript/translation)
- Resource sharing with isolation
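The session and subscription model above can be sketched with a small in-memory registry. `Client`, `SessionRegistry`, and the message shape are assumptions made for illustration; they are not taken from the project's code.

```python
from dataclasses import dataclass, field


@dataclass
class Client:
    client_id: str
    language: str = "en"
    transcript: bool = True     # subscribed to transcript results
    translation: bool = False   # subscribed to translation results
    inbox: list = field(default_factory=list)


class SessionRegistry:
    def __init__(self):
        self.sessions = {}  # session_id -> {client_id: Client}

    def join(self, session_id, client):
        self.sessions.setdefault(session_id, {})[client.client_id] = client

    def leave(self, session_id, client_id):
        clients = self.sessions.get(session_id, {})
        clients.pop(client_id, None)
        if not clients:
            self.sessions.pop(session_id, None)  # drop empty sessions

    def dispatch(self, session_id, kind, text):
        """Deliver a result only to clients in this session subscribed to `kind`."""
        if kind not in ("transcript", "translation"):
            raise ValueError(f"unknown result kind: {kind}")
        delivered = []
        for client in self.sessions.get(session_id, {}).values():
            if getattr(client, kind):
                client.inbox.append({"type": kind, "text": text, "language": client.language})
                delivered.append(client.client_id)
        return delivered
```

Keying everything by session keeps results from one room out of another while the same worker pool serves all of them.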
## Overall Operational Flow
1. **Client connection** → WebSocket with parameters
2. **Audio streaming** → Continuous audio chunks
3. **VAD filtering** → Silence elimination
4. **STT processing** → Multi-worker Vosk recognition
5. **Result dispatching** → Async delivery to subscribed clients
6. **Session management** → Multi-client coordination
7. **Resource cleanup** → Automatic maintenance
The system is designed to handle real-time speech-to-text for multiple clients simultaneously with low latency and high reliability.
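Step 3 of the flow (silence elimination) can be approximated with a simple energy gate. This is a stand-in for illustration only: the document does not say which VAD model the agent uses, and the `rms` and `drop_silence` helpers, the 16-bit PCM format, and the threshold are all assumptions.

```python
import array
import math


def rms(pcm16):
    """Root-mean-square level of native-endian signed 16-bit PCM bytes."""
    samples = array.array("h", pcm16)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def drop_silence(frames, threshold=500.0):
    """Keep only frames whose energy reaches the threshold (assumed value)."""
    return [frame for frame in frames if rms(frame) >= threshold]


silence = array.array("h", [0] * 160).tobytes()
speech = array.array("h", [2000, -2000] * 80).tobytes()
print(len(drop_silence([silence, speech, silence])))
```

Dropping silent frames before they reach the STT workers is what keeps the recognition queue short during pauses in a call.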
## Horizontal Scaling with Load Balancer
### Scaling Architecture
The system supports horizontal scaling with Nginx load balancer:
Client → Nginx Load Balancer → Multiple Server Instances → STT Processing Workers → Shared Result Queue → Agent Services
### How It Works
1. **Nginx Load Balancer**:
- Distribute traffic to multiple server instances
- Automatic health check for backend servers
- WebSocket proxy with timeout configuration
- Round-robin load balancing (other configurations are possible)
2. **Server Scaling**:
- Multiple FastAPI server instances running in parallel
- Each instance has its own set of STT workers
- Independent session management on each instance
- Agent connects via Nginx instead of direct connection
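To make the round-robin behavior concrete, the sketch below models selection that skips backends marked unhealthy. It is a conceptual illustration in Python, not Nginx's implementation; `RoundRobinBalancer` and the backend names are hypothetical.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Rotate through backends in order, skipping any marked unhealthy."""

    def __init__(self, backends):
        self.backends = list(backends)
        self.healthy = set(self.backends)
        self._ring = cycle(self.backends)

    def mark(self, backend, is_healthy):
        if is_healthy:
            self.healthy.add(backend)
        else:
            self.healthy.discard(backend)

    def next_backend(self):
        # Try each backend at most once per call before giving up.
        for _ in range(len(self.backends)):
            candidate = next(self._ring)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends available")


lb = RoundRobinBalancer(["server-1", "server-2", "server-3"])
lb.mark("server-2", False)
print([lb.next_backend() for _ in range(4)])
```

Because each instance keeps its own sessions and workers, a long-lived WebSocket stays on the instance it was first routed to; the rotation only matters when a new connection is opened.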
### Scaling Deployment
#### 1. Quick Start with Script
**Linux/macOS:**
```bash
# Start with 5 server instances
./scripts/scale-deploy.sh start 5
# Scale to 10 instances
./scripts/scale-deploy.sh scale 10
# Check status
./scripts/scale-deploy.sh status
```
**Windows:**
```powershell
# Start with 5 server instances
.\scripts\scale-deploy.ps1 start 5
# Scale to 10 instances
.\scripts\scale-deploy.ps1 scale 10
# Check status
.\scripts\scale-deploy.ps1 status
```
**Docker Compose (manual):**
```bash
# Build images
docker-compose build
# Start with 3 server instances
docker-compose up -d --scale server=3
# Scale to 5 instances
docker-compose up -d --scale server=5
# Check status
docker-compose ps
```
**Monitoring:**
- Load Balancer Health: http://localhost:8000/health/simple
- Nginx Status: Automatic health checks to backend servers
- Container Status: `docker-compose ps`
- Logs: `docker-compose logs -f [service_name]`
The nginx.conf file is optimized for:
- WebSocket proxy support
- Health check integration
- Timeout configuration for long-running connections
- Load balancing strategy (adjustable)
- Increased Throughput: Multiple servers handle concurrent requests
- High Availability: Server failure does not affect the entire system
- Zero Downtime Scaling: Add/remove instances without interrupting service
- Resource Optimization: Distribute load evenly across multiple instances