mezonai/mezon-call-translation

Mezon Call Translation

Real-time Speech-to-Text (STT) system with multi-client support, horizontal scaling, and LiveKit integration

Docker FastAPI Vosk LiveKit

🚀 Quick Start

Option 1: Basic Setup

# 1. Setup environment
./scripts/setup.sh                    # Linux/macOS
.\scripts\setup.ps1                    # Windows

# 2. Download models
./scripts/download-vosk-model.sh       # STT model (Linux/macOS)
.\scripts\download-vosk-model.ps1       # STT model (Windows)

./scripts/download-kokoro-model.sh     # TTS model (Linux/macOS)
.\scripts\download-kokoro-model.ps1     # TTS model (Windows)

# 3. Configure environment
cp env.example .env                    # Edit with your LiveKit credentials

# 4. Run the system
./scripts/run-dev.sh                   # Development mode
./scripts/run-prod.sh                  # Production mode

Option 2: Horizontal Scaling (Recommended for Production)

# Start with 5 server instances behind load balancer
./scripts/scale-deploy.sh start 5      # Linux/macOS
.\scripts\scale-deploy.ps1 start 5      # Windows

# Scale to 10 instances
./scripts/scale-deploy.sh scale 10

# Check status
./scripts/scale-deploy.sh status

The system will be available at http://localhost:8000.

🎯 System Overview

Mezon Call Translation is a production-ready, scalable Speech-to-Text system designed for real-time communication platforms. It provides:

  • Real-time STT: Convert speech to text with low latency using Vosk engine
  • Multi-client Support: Handle multiple simultaneous audio streams
  • Horizontal Scaling: Scale across multiple server instances with load balancing
  • LiveKit Integration: Seamless integration with LiveKit rooms and agents
  • High Availability: Circuit breaker pattern, health monitoring, and graceful degradation

Key Technologies

  • Vosk: Offline speech recognition engine
  • FastAPI: Modern web framework with WebSocket support
  • LiveKit: Real-time communication platform
  • Docker: Containerization and orchestration
  • Nginx: Load balancer and proxy

🏗️ Architecture

High-Level Architecture

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│   Multiple      │    │  Nginx Load     │    │  Server Pool    │
│   Clients       │◄──►│  Balancer       │◄──►│  (Scalable)     │
│                 │    │  (Port 8000)    │    │                 │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                │                       │
                                ▼                       ▼
                       ┌─────────────────┐    ┌─────────────────┐
                       │  LiveKit Agent  │    │  Vosk STT       │
                       │  (Port 8080)    │    │  Workers        │
                       └─────────────────┘    └─────────────────┘

Core Components

  1. Server - FastAPI with multi-worker STT processing
  2. Agent - LiveKit integration with VAD processing
  3. Load Balancer - Nginx for traffic distribution and WebSocket proxy
  4. Session Management - Multi-client session coordination
  5. Health Monitoring - Comprehensive health checks and metrics

✨ Features

Core Capabilities

  • ✅ Real-time Speech-to-Text with Vosk engine
  • ✅ Multi-client Session Management with language support
  • ✅ WebSocket-based Communication for low latency
  • ✅ Adaptive Processing based on system load
  • ✅ Circuit Breaker Pattern for fault tolerance
  • ✅ Voice Activity Detection (VAD) for efficiency

Scalability & Operations

  • 🚀 Horizontal Scaling with automatic load balancing
  • 📊 Comprehensive Monitoring and health checks
  • 🔄 Auto-recovery and graceful degradation
  • 📈 Performance Metrics and analysis tools
  • 🐳 Docker Containerization for easy deployment
  • 🔧 Configuration Management via environment variables

Integration Features

  • 🎤 LiveKit Integration for room management
  • 🌐 REST API for system management
  • 📡 WebSocket API for real-time communication
  • 🔐 JWT Authentication support
  • 📝 Multi-language Support for transcripts

📚 Documentation

Setup & Installation

  • Setup Guide - Complete installation and configuration guide
  • Environment Configuration - LiveKit credentials and system settings
  • Model Management - Vosk model download and configuration

Architecture Documentation

  • Server Architecture - Detailed server design and components
  • Agent Architecture - LiveKit agent implementation
  • System Design Patterns - Circuit breaker, session management, worker pools

Operations & Monitoring

  • Metrics Guide - Performance monitoring and log analysis
  • Health Check Endpoints - System status and monitoring
  • Troubleshooting - Common issues and solutions

Development

  • API Documentation - REST and WebSocket endpoints
  • Configuration Reference - All environment variables and settings
  • Development Setup - Hot reload and debugging

🚀 Deployment Options

1. Development Mode

# Hot reload enabled, debug logging
./scripts/run-dev.sh
  • ✅ Hot reload for code changes
  • ✅ Enhanced debugging
  • ✅ Local volume mounting

2. Production Mode

# Optimized for production
./scripts/run-prod.sh
  • ✅ Performance optimizations
  • ✅ Resource limits
  • ✅ Production logging

3. Horizontal Scaling (Recommended)

# Multiple server instances with load balancing
./scripts/scale-deploy.sh start 5
  • ✅ Multiple server instances
  • ✅ Nginx load balancer
  • ✅ Auto-scaling capabilities
  • ✅ High availability

4. Manual Docker Compose

# Custom scaling
docker-compose up -d --scale server=3

🔌 API Reference

WebSocket API

ws://localhost:8000/ws/vosk/

Parameters:

  • client_id: Unique client identifier
  • session_id: Session identifier
  • transcript: Enable transcript delivery
  • translation: Enable translation delivery
  • language: Client language (en, vi, etc.)

Input: Binary audio data
Output: JSON transcript/translation results
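
The README does not say how these parameters reach the server. As a minimal sketch, assuming they are sent as query-string parameters and that the two flags are encoded as `true`/`false`, a client could build its connection URL like this. The helper name `build_ws_url` and the sample IDs are hypothetical.

```python
from urllib.parse import urlencode


def build_ws_url(base_url, client_id, session_id, language="en",
                 transcript=True, translation=False):
    """Build a connection URL for the /ws/vosk/ endpoint.

    Assumption: parameters travel in the query string and booleans are
    sent as "true"/"false"; check the server code for the real format.
    """
    query = urlencode({
        "client_id": client_id,
        "session_id": session_id,
        "transcript": str(transcript).lower(),
        "translation": str(translation).lower(),
        "language": language,
    })
    # Normalise the trailing slash so the path always ends in "/ws/vosk/".
    return f"{base_url.rstrip('/')}/?{query}"


url = build_ws_url("ws://localhost:8000/ws/vosk/", "client-1", "room-42",
                   language="vi")
print(url)
```

Any WebSocket client can then open this URL, send binary audio frames, and read JSON messages back.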

REST API

| Endpoint | Method | Description |
|----------|--------|-------------|
| /health | GET | Detailed health status |
| /health/simple | GET | Simple health check |
| /agent/join | POST | LiveKit agent dispatch |
| /ws/stats | GET | WebSocket statistics |

Health Check Example

# Simple health check
curl http://localhost:8000/health/simple

# Detailed health information
curl http://localhost:8000/health
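
The same check can be scripted for readiness probes or deployment scripts. The sketch below uses only the Python standard library. It assumes that HTTP 200 means the server is healthy and HTTP 503 means it is not; the labels and function names are illustrative, not part of the project's API.

```python
import urllib.error
import urllib.request


def classify_health(status_code):
    """Map an HTTP status code to a coarse health label (assumed mapping)."""
    if status_code == 200:
        return "healthy"
    if status_code == 503:
        return "unhealthy"
    return "unknown"


def check_health(url="http://localhost:8000/health/simple", timeout=2.0):
    """Return a health label, or "unreachable" if the request cannot complete."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return classify_health(response.status)
    except urllib.error.HTTPError as exc:
        # urllib raises for 4xx/5xx responses; the code is still informative.
        return classify_health(exc.code)
    except (urllib.error.URLError, OSError):
        return "unreachable"


# Usage: print(check_health()) once the stack is running.
```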

📊 Monitoring

Built-in Monitoring

  • Health Endpoints: Real-time system status
  • Metrics Collection: Performance and usage statistics
  • Log Analysis: Comprehensive logging with structured format
  • Worker Statistics: STT worker performance tracking

Key Metrics

  • Audio Processing Latency: Real-time performance tracking
  • Worker Load Distribution: Load balancing effectiveness
  • Session Management: Client connection statistics
  • Error Rates: System reliability monitoring

Monitoring Tools

# Check system status
./scripts/scale-deploy.sh status

# View real-time logs
docker-compose logs -f server

# Monitor resource usage
docker stats

For detailed metrics analysis, see the Metrics Guide.

🔧 Configuration

Environment Variables

# Core Configuration
LIVEKIT_URL=wss://your-livekit-server.com
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret
VOSK_MODEL_PATH=/app/models/vosk-model

# Performance Tuning
SERVER_HOST=0.0.0.0
SERVER_PORT=8000
LOG_LEVEL=INFO
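
For reference, here is one way an application could read these variables at startup. This is a sketch, not the project's actual loader: the `Settings` class and `load_settings` function are hypothetical, and treating the three LiveKit values as required is an assumption. The defaults mirror the example values above.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    livekit_url: str
    livekit_api_key: str
    livekit_api_secret: str
    vosk_model_path: str
    server_host: str
    server_port: int
    log_level: str


def load_settings(env=None):
    """Build Settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    required = ("LIVEKIT_URL", "LIVEKIT_API_KEY", "LIVEKIT_API_SECRET")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise ValueError("Missing required settings: " + ", ".join(missing))
    return Settings(
        livekit_url=env["LIVEKIT_URL"],
        livekit_api_key=env["LIVEKIT_API_KEY"],
        livekit_api_secret=env["LIVEKIT_API_SECRET"],
        vosk_model_path=env.get("VOSK_MODEL_PATH", "/app/models/vosk-model"),
        server_host=env.get("SERVER_HOST", "0.0.0.0"),
        server_port=int(env.get("SERVER_PORT", "8000")),
        log_level=env.get("LOG_LEVEL", "INFO").upper(),
    )
```

Failing fast on missing credentials surfaces a misconfigured `.env` file at startup rather than on the first agent connection.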

Advanced Configuration

The system supports extensive configuration for:

  • Audio processing parameters
  • Worker pool management
  • Circuit breaker settings
  • Health check intervals
  • Scaling parameters

See Setup Guide for complete configuration options.

🤝 Contributing

Development Setup

  1. Fork the repository
  2. Follow the Setup Guide
  3. Use development mode: ./scripts/run-dev.sh
  4. Make changes with hot reload enabled
  5. Test thoroughly with multiple clients
  6. Submit pull request with documentation updates

Architecture Guidelines

  • Follow the existing service pattern
  • Maintain thread-safe operations
  • Add appropriate error handling
  • Include health check integration
  • Update documentation

🐛 Troubleshooting

Common Issues

Server won't start:

# Check Vosk STT model exists
ls -la models/vosk-model/

# Download if missing
./scripts/download-vosk-model.sh

Agent TTS not working:

# Check Kokoro TTS model exists
ls -la models/kokoro_models/

# Download if missing
./scripts/download-kokoro-model.sh

# Or download with specific voices
./scripts/download-kokoro-model.sh -v "af_heart,am_adam"

Agent connection failed:

# Verify server is running
curl http://localhost:8000/health/simple

# Check environment variables
cat .env

Poor performance:

# Check resource usage
docker stats

# Scale up servers
./scripts/scale-deploy.sh scale 10

For comprehensive troubleshooting, see the Setup Guide.

📞 Support

Getting Help

  1. Check Documentation: Review relevant guides in /docs
  2. System Requirements: Ensure Docker & Docker Compose are installed
  3. Health Checks: Verify system status via health endpoints
  4. Log Analysis: Examine service logs for specific errors
  5. Resource Check: Ensure adequate CPU, memory, and disk space

Debug Information

# System status
docker-compose ps
docker stats

# Service health
curl http://localhost:8000/health

# Recent logs
docker-compose logs --tail=100 server

📄 License

This project is part of the Mezon platform ecosystem. See the project documentation for licensing information.

🏷️ Tags

speech-to-text real-time vosk fastapi livekit docker microservices websocket audio-processing scalable

Health endpoints:

├── /health: detailed status (…/degraded/unhealthy)
│   ├── Uptime information
│   ├── Component details
│   └── HTTP status codes (200/503)
└── /health/simple: Simple boolean check


## Outstanding Technical Features

### 1. **Scalability**
- Multi-worker architecture for STT processing
- Async/await pattern for I/O operations
- Queue-based load balancing
- Adaptive processing based on system load
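
The queue-based worker model can be illustrated with a small `asyncio` sketch. It is not the project's implementation: the worker body is a placeholder where a real recognizer call would go, and all names (`stt_worker`, `run_pool`) are hypothetical.

```python
import asyncio


async def stt_worker(name, jobs, results):
    """Pull (session_id, audio) jobs from a shared queue and emit results."""
    while True:
        session_id, chunk = await jobs.get()
        try:
            # Placeholder for the real recognizer call (hypothetical).
            text = f"{len(chunk)} bytes"
            await results.put((session_id, name, text))
        finally:
            jobs.task_done()


async def run_pool(chunks, worker_count=3):
    """Process all chunks with a fixed pool of workers and collect results."""
    jobs = asyncio.Queue()
    results = asyncio.Queue()
    workers = [
        asyncio.create_task(stt_worker(f"worker-{i}", jobs, results))
        for i in range(worker_count)
    ]
    for item in chunks:
        jobs.put_nowait(item)
    await jobs.join()  # wait until every queued job is marked done
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    collected = []
    while not results.empty():
        collected.append(results.get_nowait())
    return collected


transcripts = asyncio.run(
    run_pool([("s1", b"\x00" * 320), ("s2", b"\x00" * 640)])
)
print(transcripts)
```

Because every worker pulls from the same queue, idle workers pick up the next job automatically. Speech recognition is CPU-bound, so a real worker would likely hand the recognizer call to a thread or process pool instead of running it on the event loop.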

### 2. **Reliability**
- Circuit breaker pattern for error handling
- Graceful degradation (VAD fallback)
- Resource cleanup and memory management
- Health monitoring and metrics
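
For readers unfamiliar with the circuit breaker pattern, the following generic sketch shows the idea. After a number of consecutive failures, calls are rejected immediately until a cool-down period has elapsed, and then a single trial call decides whether to close the circuit again. The class, thresholds, and state names are illustrative assumptions, not the project's code.

```python
import time


class CircuitBreaker:
    """Minimal closed/open/half-open breaker around a callable."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def state(self):
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"
        return "open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise RuntimeError("circuit open: call rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # Open on reaching the threshold, or re-open after a failed trial.
            if (self._failures >= self.failure_threshold
                    or self._opened_at is not None):
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Wrapping calls to an STT worker or to a downstream service in `breaker.call(...)` lets the caller fall back quickly instead of waiting on a component that is already failing.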

### 3. **Performance Optimization**
- VAD pre-filtering to reduce STT workload
- Chunk accumulation strategy
- Overlapping audio processing
- GPU acceleration support (VAD)
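
The first two items can be sketched together. The example below stands in a simple RMS energy threshold for the project's VAD, which is a deliberate simplification, and assumes 16 kHz mono 16-bit PCM input. The class name, threshold, and minimum batch length are all hypothetical.

```python
import array
import math

SAMPLE_RATE = 16000  # assumed input format: 16 kHz, mono, 16-bit PCM


def rms(pcm_bytes):
    """Root-mean-square amplitude of a 16-bit PCM buffer."""
    samples = array.array("h")
    samples.frombytes(pcm_bytes)
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


class ChunkAccumulator:
    """Drop quiet chunks, then batch speech until a minimum duration is reached."""

    def __init__(self, min_ms=200, energy_threshold=500.0):
        self.min_bytes = SAMPLE_RATE * 2 * min_ms // 1000
        self.energy_threshold = energy_threshold
        self._buffer = bytearray()

    def push(self, pcm_bytes):
        """Return a batch ready for STT, or None if more audio is needed."""
        if rms(pcm_bytes) < self.energy_threshold:
            return None  # treated as silence and never sent to STT
        self._buffer.extend(pcm_bytes)
        if len(self._buffer) < self.min_bytes:
            return None
        batch = bytes(self._buffer)
        self._buffer.clear()
        return batch
```

Filtering before accumulation means the recognizer only sees audio that is likely to contain speech, and it receives fewer, larger buffers instead of many small ones.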

### 4. **Real-time Capabilities**
- WebSocket-based communication
- Non-blocking audio submission
- Async result dispatching
- Low-latency processing pipeline

### 5. **Horizontal Scaling with Load Balancer**
- Nginx load balancer for multiple server instances
- Docker Compose scaling capabilities
- Health check integration with load balancer
- Session-independent client routing
- Zero-downtime scaling operations

### 6. **Multi-tenant Support**
- Session-based client isolation
- Per-client language settings
- Flexible subscription model (transcript/translation)
- Resource sharing with isolation
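
A minimal sketch of such a subscription model follows. The `Client` and `SessionRegistry` names are hypothetical, and skipping translation when the client's language matches the speaker's language is an assumption made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Client:
    client_id: str
    language: str
    wants_transcript: bool = True
    wants_translation: bool = False


class SessionRegistry:
    """Track clients per session and decide who receives which result."""

    def __init__(self):
        self._sessions = defaultdict(dict)

    def join(self, session_id, client):
        self._sessions[session_id][client.client_id] = client

    def leave(self, session_id, client_id):
        clients = self._sessions.get(session_id, {})
        clients.pop(client_id, None)
        if not clients:
            # Drop empty sessions so the registry does not grow unbounded.
            self._sessions.pop(session_id, None)

    def recipients(self, session_id, source_language):
        """Return (client_id, kind) pairs for one recognized utterance."""
        plan = []
        for client in self._sessions.get(session_id, {}).values():
            if client.wants_transcript:
                plan.append((client.client_id, "transcript"))
            if client.wants_translation and client.language != source_language:
                plan.append((client.client_id, "translation"))
        return plan
```

Results are only ever looked up by session ID, so clients in one session never receive output from another, while the STT workers behind the registry remain shared.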

## Overall Operational Flow

1. **Client connection** → WebSocket with parameters
2. **Audio streaming** → Continuous audio chunks
3. **VAD filtering** → Silence elimination
4. **STT processing** → Multi-worker Vosk recognition
5. **Result dispatching** → Async delivery to subscribed clients
6. **Session management** → Multi-client coordination
7. **Resource cleanup** → Automatic maintenance

The system is designed to handle real-time speech-to-text for multiple clients simultaneously with low latency and high reliability.

## Horizontal Scaling with Load Balancer

### Scaling Architecture
The system supports horizontal scaling with Nginx load balancer:

Client → Nginx Load Balancer → Multiple Server Instances → STT Processing Workers → Shared Result Queue → Agent Services


### How It Works
1. **Nginx Load Balancer**:

- Distributes traffic across multiple server instances
- Runs automatic health checks against backend servers
- Proxies WebSocket connections with configurable timeouts
- Uses round-robin load balancing by default (other strategies can be configured)

2. **Server Scaling**:

- Multiple FastAPI server instances running in parallel
- Each instance has its own set of STT workers
- Independent session management on each instance
- Agent connects via Nginx instead of direct connection
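
The routing behaviour described above, round-robin selection that skips backends failing their health check, can be sketched in a few lines. This illustrates the strategy only. It is not how Nginx is implemented, and the class and backend names are hypothetical.

```python
class RoundRobinBalancer:
    """Cycle through backends in order, skipping any marked unhealthy."""

    def __init__(self, backends):
        self._backends = list(backends)
        self._healthy = set(self._backends)
        self._next = 0

    def mark(self, backend, healthy):
        """Record the outcome of a health check for one backend."""
        if healthy:
            self._healthy.add(backend)
        else:
            self._healthy.discard(backend)

    def pick(self):
        """Return the next healthy backend, or raise if none is available."""
        for _ in range(len(self._backends)):
            candidate = self._backends[self._next]
            self._next = (self._next + 1) % len(self._backends)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy backend available")


balancer = RoundRobinBalancer(["server-1:8000", "server-2:8000", "server-3:8000"])
first_round = [balancer.pick() for _ in range(4)]
balancer.mark("server-2:8000", healthy=False)
after_failure = [balancer.pick() for _ in range(3)]
print(first_round, after_failure)
```

Because each instance manages its own sessions, the sketch assumes a long-lived WebSocket connection stays on the backend chosen when it was opened.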

### Scaling Deployment

#### 1. Quick Start with Script
**Linux/macOS:**
```bash
# Start with 5 server instances
./scripts/scale-deploy.sh start 5

# Scale to 10 instances
./scripts/scale-deploy.sh scale 10

# Check status
./scripts/scale-deploy.sh status
```

**Windows:**

# Start with 5 server instances
.\scripts\scale-deploy.ps1 start 5

# Scale to 10 instances
.\scripts\scale-deploy.ps1 scale 10

# Check status
.\scripts\scale-deploy.ps1 status

#### 2. Manual Docker Compose

# Build images
docker-compose build

# Start with 3 server instances
docker-compose up -d --scale server=3

# Scale to 5 instances
docker-compose up -d --scale server=5

# Check status
docker-compose ps

Health Check and Monitoring

  • Load Balancer Health: http://localhost:8000/health/simple
  • Nginx Status: Automatic health checks to backend servers
  • Container Status: docker-compose ps
  • Logs: docker-compose logs -f [service_name]

Nginx Configuration

The nginx.conf file is optimized for:

  • WebSocket proxy support
  • Health check integration
  • Timeout configuration for long-running connections
  • Load balancing strategy (adjustable)

Performance Benefits

  • Increased Throughput: Multiple servers handle concurrent requests
  • High Availability: Server failure does not affect the entire system
  • Zero Downtime Scaling: Add/remove instances without interrupting service
  • Resource Optimization: Distribute load evenly across multiple instances
