YADTQ is a distributed task queue system built with Python, Kafka, and Redis. It provides a scalable architecture for processing tasks across multiple workers with features like task retry, health monitoring, automatic worker recovery, and fault tolerance.

Core components:
- Server: Central coordinator that manages task distribution and worker health
- Workers: Process tasks and report status via heartbeats
- Worker Manager: Monitors and manages worker processes, ensuring minimum worker count
- Logger: Maintains system logs and task history
- Clients: Submit tasks and receive results
- Storage: Redis for task state management
- Message Queue: Kafka for communication between components

Key features:
- Fault Tolerance: Automatic task retries and worker recovery
- Scalability: Easily scale workers up or down
- Monitoring: Real-time monitoring of worker health
- Task Management: Queue, process, and track tasks throughout their lifecycle
- Graceful Shutdown: Properly handle shutdowns and resource cleanup
- Multiple Task Types: Supports compression, decompression, and encryption tasks

Prerequisites:
- Docker and Docker Compose
- Python 3.9+
- Available ports: 9092 (Kafka), 6379 (Redis)

Quick start:
- Clone the repository: `git clone https://github.com/prajwal678/YADTQ`, then `cd yadtq`
- Start the system using Docker: `docker-compose up -d`
- Or start using the provided script: `python start.py`
- Run a client: `python client.py`

The recommended way to run YADTQ is with Docker, which takes care of all dependencies and service coordination.
- Start all services: `docker-compose up -d`
- Scale workers as needed: `docker-compose up -d --scale worker=6`
- View logs: `docker-compose logs -f`
- Shut down: `docker-compose down`

For development or testing, you can run components directly:
- Start all components with the starter script: `python start.py --workers 4`

This will start:
- 1 server
- 1 logger
- 1 worker manager
- 4 workers (the default count, configurable via `--workers`)
All configuration is managed through environment variables with defaults in config.py. Key configurations:
- KAFKA_BROKER: Kafka broker address (default: "kafka:9092")
- REDIS_HOST: Redis host (default: "redis")
- WORKER_TIMEOUT: Worker health check timeout (default: 30s)
- TASK_MAX_RETRIES: Maximum task retry attempts (default: 3)
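For reference, these settings could be read from the environment roughly as follows; only the four names and defaults above come from this document, the rest is a sketch rather than the actual contents of config.py:

```python
# Minimal sketch of environment-driven configuration; not the actual config.py.
import os

KAFKA_BROKER = os.environ.get("KAFKA_BROKER", "kafka:9092")
REDIS_HOST = os.environ.get("REDIS_HOST", "redis")
WORKER_TIMEOUT = int(os.environ.get("WORKER_TIMEOUT", "30"))    # seconds
TASK_MAX_RETRIES = int(os.environ.get("TASK_MAX_RETRIES", "3"))
```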
Currently supported task types:
- Compression
- Decompression
- Encryption
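As an illustration, submitting one of these task types through Kafka might look like the sketch below. The `tasks` topic name, the message fields, and the use of kafka-python are assumptions, not the documented client API:

```python
import json
import uuid

from kafka import KafkaProducer

# Hypothetical task submission; topic name and payload schema are assumptions.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",        # "kafka:9092" when running inside Docker
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

task = {
    "task_id": str(uuid.uuid4()),
    "type": "compression",                     # or "decompression", "encryption"
    "payload": "some data to compress",
}
producer.send("tasks", task)
producer.flush()
```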
YADTQ has several mechanisms for fault tolerance:
- Task Retries: Failed tasks are automatically retried up to a configurable limit (see the sketch after this list)
- Worker Recovery: Dead workers are detected and replacement workers are started
- Heartbeat Monitoring: Regular health checks ensure the system knows worker status
- Graceful Shutdown: Components handle shutdown signals properly
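The retry limit could be enforced with a simple counter in Redis, as in this sketch; the key layout (`task:<id>` hash plus a `task:<id>:retries` counter) is an assumption rather than the actual server implementation:

```python
import redis

# Illustrative retry bookkeeping, not YADTQ's actual server code.
TASK_MAX_RETRIES = 3
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def handle_failure(task_id: str) -> bool:
    """Return True if the task was re-queued, False if retries are exhausted."""
    retries = r.incr(f"task:{task_id}:retries")        # atomic per-task counter
    if retries <= TASK_MAX_RETRIES:
        r.hset(f"task:{task_id}", "status", "queued")  # mark for re-dispatch
        return True
    r.hset(f"task:{task_id}", "status", "failed")      # give up permanently
    return False
```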
The Worker Manager is a new component that:
- Ensures a minimum number of workers are always running
- Detects and restarts failed workers
- Scales workers up or down based on demand
- Monitors worker health via heartbeats
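Stripped to its core, the "keep a minimum number of workers alive" loop could look like this; the `python worker.py` command line and the 5-second polling interval are assumptions:

```python
import subprocess
import time

# Simplified sketch of the worker-supervision loop, not the actual worker manager.
MIN_WORKERS = 4
procs = []

while True:
    procs = [p for p in procs if p.poll() is None]     # drop workers that have exited
    while len(procs) < MIN_WORKERS:
        procs.append(subprocess.Popen(["python", "worker.py"]))  # start a replacement
    time.sleep(5)
```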

Monitoring:
- Worker health is monitored via heartbeats (sketched after this list)
- Task status can be tracked through the logger
- System logs are available in the `logs` directory
- The Worker Manager provides real-time worker status
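Heartbeats of this kind are often implemented as Redis keys with a TTL; the key pattern and TTL-based liveness check below are assumptions about YADTQ's internals, shown only to make the idea concrete:

```python
import os
import time

import redis

WORKER_TIMEOUT = 30                        # seconds, matching the documented default
r = redis.Redis(host="localhost", port=6379, decode_responses=True)
worker_id = f"worker-{os.getpid()}"

def beat() -> None:
    # The key expires unless the worker keeps renewing it, so a missing key
    # means the worker stopped heartbeating within the timeout window.
    r.set(f"worker:{worker_id}:heartbeat", int(time.time()), ex=WORKER_TIMEOUT)

def alive_workers() -> list:
    return [key.split(":")[1] for key in r.keys("worker:*:heartbeat")]
```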

Component responsibilities:

Server:
- Manages task distribution
- Monitors worker health
- Handles task retries
- Maintains task state

Worker:
- Processes assigned tasks
- Sends regular heartbeats
- Reports task completion/failure
- Handles graceful shutdown
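A minimal consume-process-report loop along these lines (topic names, consumer group, and result schema are assumptions, not the actual worker.py):

```python
import json

from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer(
    "tasks",                                           # assumed task topic
    bootstrap_servers="localhost:9092",
    group_id="yadtq-workers",                          # assumed consumer group
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def process(task: dict) -> str:
    # placeholder for the real compression/decompression/encryption handlers
    return f"processed {task['type']} task"

for msg in consumer:
    task = msg.value
    try:
        result = {"task_id": task["task_id"], "status": "success", "result": process(task)}
    except Exception as exc:                           # report failures instead of crashing
        result = {"task_id": task["task_id"], "status": "failed", "error": str(exc)}
    producer.send("results", result)                   # assumed result topic
```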

Worker Manager:
- Starts and stops worker processes
- Ensures the system has the required number of workers
- Monitors worker health
- Restarts failed workers

Logger:
- Records system events
- Maintains task history
- Provides audit trail

Client:
- Submits tasks
- Receives results
- Tracks task status
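Tracking a task could be as simple as polling its Redis record until it reaches a terminal status; the hash layout below is an assumption, not the documented client.py behaviour:

```python
import time
from typing import Optional, Tuple

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def wait_for_result(task_id: str, timeout: float = 60.0) -> Tuple[str, Optional[str]]:
    """Poll the assumed task hash until the task succeeds or fails."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        status = r.hget(f"task:{task_id}", "status")
        if status in ("success", "failed"):
            return status, r.hget(f"task:{task_id}", "result")
        time.sleep(0.5)
    raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
```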

Data flow:
- Client submits task → Server
- Server assigns task → Worker
- Worker processes task → Returns result
- Server validates result → Forwards to Client
- All events → Logger
- Worker Manager ↔ Workers (health checks and management)
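Roughly, the messages exchanged at each hop might look like this; every field name here is illustrative, not the actual wire format:

```python
# Client → Server: new task submission
submit_msg = {"task_id": "abc123", "type": "compression", "payload": "raw data"}

# Server → Worker: the same task plus an assignment
assign_msg = {**submit_msg, "worker_id": "worker-1"}

# Worker → Server → Client: outcome of processing
result_msg = {"task_id": "abc123", "status": "success", "result": "compressed data"}

# Every step also emits an event to the logger
log_event = {"task_id": "abc123", "event": "completed", "timestamp": 1700000000}
```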

Common issues:
- Kafka Connection Problems
  - Ensure Kafka is running and the broker address in config.py is correct (see the connectivity check after this list)
- Redis Connection Problems
  - Verify Redis is running and the host/port settings are correct
- Worker Not Starting
  - Check worker logs for errors
  - Ensure the worker manager is running properly
- Task Stuck in Queue
  - Check for available workers
  - Verify the task format is correct
  - Look for errors in the server logs
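For the Kafka and Redis connection problems above, a quick connectivity check can narrow things down; the localhost addresses assume a non-Docker setup and should match your config.py:

```python
import redis
from kafka import KafkaProducer
from kafka.errors import NoBrokersAvailable

def check_redis(host: str = "localhost", port: int = 6379) -> bool:
    try:
        return bool(redis.Redis(host=host, port=port, socket_connect_timeout=2).ping())
    except redis.exceptions.ConnectionError:
        return False

def check_kafka(broker: str = "localhost:9092") -> bool:
    try:
        KafkaProducer(bootstrap_servers=broker, request_timeout_ms=5000).close()
        return True
    except NoBrokersAvailable:
        return False

if __name__ == "__main__":
    print("redis reachable:", check_redis())
    print("kafka reachable:", check_kafka())
```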