# Adaptive Self-Learning Agentic AI System

A production-ready, self-improving speech-to-text system with autonomous error detection, correction, and continuous learning capabilities. The system integrates a baseline STT model (Whisper), intelligent agent-based error detection, comprehensive data management, and automated fine-tuning pipelines.

## Table of Contents
- Overview
- Features
- Project Structure
- Quick Start
- System Components
- Running the System
- API Reference
- Development Workflows
- Testing
- Documentation
- Tutorials

## Overview

This system provides:
- Baseline STT Model: Optimized Whisper-based transcription with GPU acceleration
- Intelligent Agent: Autonomous error detection with 8+ heuristics
- Data Management: Comprehensive system for tracking failures, corrections, and performance
- Evaluation Framework: Multi-metric evaluation with visualization
- Fine-tuning Pipeline: Automated dataset preparation for model improvement
- Cloud Integration: Seamless GCP integration with cost monitoring

## Features

- ✅ Real-time transcription via REST API
- ✅ Multi-heuristic error detection (8+ error types)
- ✅ Automatic correction with learning feedback loop
- ✅ Failed case tracking and correction management
- ✅ Performance monitoring (WER, CER, latency, throughput)
- ✅ Fine-tuning dataset preparation with quality control
- ✅ Version control for datasets with checksums
- ✅ GCP integration with automated backup

Detected error types (see the sketch after this list):

- Empty/too short transcripts
- Length anomalies (transcript too long or too short)
- Repeated character patterns
- Special character overload
- Low model confidence
- Unusual word patterns
- All caps text
- Missing punctuation
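
The actual weights and thresholds live in `src/agent/error_detector.py`; the following is a minimal sketch of the idea, with illustrative thresholds that are assumptions rather than the in-repo values:

```python
import re
from typing import Optional

def detect_errors(transcript: str, confidence: Optional[float] = None) -> dict:
    """Toy multi-heuristic error detector; thresholds are illustrative."""
    errors = []
    text = transcript.strip()

    if len(text) < 3:
        errors.append("empty_or_too_short")
    if re.search(r"(.)\1{4,}", text):  # e.g. "aaaaa"
        errors.append("repeated_characters")
    if text and sum(not c.isalnum() and not c.isspace() for c in text) / len(text) > 0.3:
        errors.append("special_character_overload")
    if confidence is not None and confidence < 0.5:
        errors.append("low_confidence")
    if text.isupper() and len(text) > 10:
        errors.append("all_caps")
    if len(text.split()) > 20 and not re.search(r"[.!?]", text):
        errors.append("missing_punctuation")

    score = min(1.0, 0.2 * len(errors))  # naive aggregation into an error score
    return {"has_errors": bool(errors), "error_types": errors, "error_score": score}

print(detect_errors("THIS IS ALL CAPS WITH SOMETHING REPEATED LIKE AAAAAAA"))
```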

## Project Structure

```text
Adaptive-Self-Learning-Agentic-AI-System/
├── src/                                 # Core source code
│   ├── baseline_model.py                # Whisper STT model wrapper
│   ├── inference_api.py                 # Baseline transcription API
│   ├── agent_api.py                     # Agent-integrated API
│   ├── model_selector.py                # Model comparison utilities
│   ├── benchmark.py                     # Performance benchmarking
│   ├── agent/                           # Agent system
│   │   ├── agent.py                     # Main agent orchestrator
│   │   ├── error_detector.py            # Multi-heuristic error detection
│   │   └── self_learner.py              # Learning and feedback system
│   ├── data/                            # Data management system
│   │   ├── data_manager.py              # Failed case storage
│   │   ├── metadata_tracker.py          # Performance tracking
│   │   ├── finetuning_pipeline.py       # Dataset preparation
│   │   ├── version_control.py           # Data versioning
│   │   └── integration.py               # Unified interface
│   ├── evaluation/                      # Evaluation tools
│   │   └── metrics.py                   # WER/CER calculation
│   └── utils/                           # Utilities
│       └── gcs_utils.py                 # Google Cloud Storage
│
├── experiments/                         # Testing and evaluation scripts
│   ├── test_baseline.py                 # Test baseline model
│   ├── test_agent.py                    # Test agent functionality
│   ├── test_api.py                      # Test API endpoints
│   ├── test_data_management.py          # Test data management
│   ├── kavya_evaluation_framework.py    # Comprehensive evaluation
│   ├── evaluate_models.py               # Model evaluation
│   ├── run_benchmark.py                 # Performance benchmarking
│   ├── visualize_evaluation_results.py  # Generate charts
│   └── example_usage.py                 # Usage examples
│
├── scripts/                             # Setup and deployment
│   ├── setup_environment.py             # Environment setup
│   ├── verify_setup.py                  # Verify installation
│   ├── quick_setup.sh                   # Quick setup script
│   ├── setup_gcp_gpu.sh                 # GCP GPU VM creation
│   ├── deploy_to_gcp.py                 # Deploy to GCP
│   ├── monitor_gcp_costs.py             # Cost monitoring
│   ├── preprocess_data.py               # Data preprocessing
│   └── download_datasets.py             # Dataset downloads
│
├── data/                                # Data storage (created at runtime)
│   ├── raw/                             # Raw audio files
│   ├── processed/                       # Processed data
│   ├── failed_cases/                    # Error storage
│   ├── metadata/                        # Performance metrics
│   ├── finetuning/                      # Training datasets
│   └── versions/                        # Dataset versions
│
├── docs/                                # Documentation
│   ├── SETUP_INSTRUCTIONS.md            # Detailed setup guide
│   ├── DATA_MANAGEMENT_SYSTEM.md        # Data management guide
│   ├── QUICK_START_DATA_MANAGEMENT.md   # Quick start
│   └── GCP_SETUP_GUIDE.md               # GCP setup instructions
│
├── requirements.txt                     # Python dependencies
└── README.md                            # This file
```

## Quick Start

### Prerequisites

- Python 3.8+
- CUDA-capable GPU (optional, for faster inference)
- Google Cloud account (optional, for cloud integration)

### Installation

```bash
# 1. Clone the repository
git clone <repository-url>
cd Adaptive-Self-Learning-Agentic-AI-System

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# 3. Install dependencies
pip install -r requirements.txt

# 4. Verify installation
python scripts/verify_setup.py
```

### Basic Usage

```python
from src.baseline_model import BaselineSTTModel
from src.agent import STTAgent
from src.data.integration import IntegratedDataManagementSystem
# 1. Initialize components
baseline_model = BaselineSTTModel(model_name="whisper")
agent = STTAgent(baseline_model=baseline_model)
data_system = IntegratedDataManagementSystem(base_dir="data/production")
# 2. Transcribe with agent
result = agent.transcribe_with_agent(
audio_path="data/test_audio/test_1.wav",
enable_auto_correction=True
)
print(f"Transcript: {result['transcript']}")
print(f"Errors detected: {result['error_detection']['error_count']}")
print(f"Error types: {result['error_detection']['error_types']}")
# 3. Record failures for learning
if result['error_detection']['has_errors']:
    case_id = data_system.record_failed_transcription(
        audio_path="data/test_audio/test_1.wav",
        original_transcript=result['original_transcript'],
        corrected_transcript=None,  # Add correction later
        error_types=list(result['error_detection']['error_types'].keys()),
        error_score=result['error_detection']['error_score'],
        inference_time=result.get('inference_time_seconds', 0)
    )
print(f"Recorded case: {case_id}")GPU-optimized Whisper model wrapper for transcription.
Features:
- Automatic GPU/CPU detection
- TensorFloat-32 optimization for Ampere GPUs (see the sketch below)
- Beam search and KV cache optimization
- Model info and parameter reporting
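
On Ampere-class GPUs, TF32 is usually enabled through PyTorch backend flags. A sketch of what the wrapper's device setup plausibly looks like (an assumption, not the verbatim in-repo code):

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
if device == "cuda":
    # TF32 speeds up matmuls/convolutions on Ampere+ GPUs with
    # negligible accuracy impact for inference workloads.
    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True
print(f"Running on: {device}")
```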
Usage:

```python
from src.baseline_model import BaselineSTTModel

model = BaselineSTTModel(model_name="whisper")
result = model.transcribe("audio.wav")
info = model.get_model_info()
```

### Agent System (`src/agent/`)

Intelligent agent with error detection and self-learning.
Components:
- `error_detector.py`: Multi-heuristic error detection
- `self_learner.py`: Pattern tracking and feedback (see the toy sketch below)
- `agent.py`: Agent orchestration
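
At its core, self-learning here means counting recurring error patterns and remembering user corrections. A toy sketch of that idea (the real `self_learner.py` interface may differ):

```python
from collections import Counter

class ToySelfLearner:
    """Tracks recurring error types and user-supplied corrections."""

    def __init__(self):
        self.error_counts = Counter()
        self.corrections = {}  # original transcript -> corrected transcript

    def record_errors(self, error_types):
        self.error_counts.update(error_types)

    def record_correction(self, original, corrected):
        self.corrections[original] = corrected

    def most_common_errors(self, n=3):
        return self.error_counts.most_common(n)

learner = ToySelfLearner()
learner.record_errors(["all_caps", "missing_punctuation"])
learner.record_errors(["all_caps"])
print(learner.most_common_errors())  # [('all_caps', 2), ('missing_punctuation', 1)]
```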
Usage:

```python
from src.agent import STTAgent
agent = STTAgent(baseline_model=model, error_threshold=0.3)
result = agent.transcribe_with_agent("audio.wav", enable_auto_correction=True)
# Provide feedback
agent.record_user_feedback(
transcript_id="123",
user_feedback="Good transcription",
is_correct=True
)
# Get statistics
stats = agent.get_agent_stats()
```

### Data Management (`src/data/`)

Production-ready data management with cloud integration.
Components:
- `data_manager.py`: Failed case storage
- `metadata_tracker.py`: Performance tracking
- `finetuning_pipeline.py`: Dataset preparation
- `version_control.py`: Versioning and quality control (see the checksum sketch below)
- `integration.py`: Unified interface
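
Dataset versioning rests on content checksums. A minimal sketch of the fingerprinting idea (illustrative; not the `version_control.py` API):

```python
import hashlib
from pathlib import Path

def dataset_checksum(dataset_dir: str) -> str:
    """SHA-256 over every file in a dataset directory, in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(Path(dataset_dir).rglob("*")):
        if path.is_file():
            digest.update(path.name.encode())
            digest.update(path.read_bytes())
    return digest.hexdigest()

# Identical snapshots yield identical checksums, so a dataset version
# can be verified after a download or a GCS sync.
print(dataset_checksum("data/finetuning"))
```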
Usage:

```python
from src.data.integration import IntegratedDataManagementSystem
system = IntegratedDataManagementSystem(
base_dir="data/production",
use_gcs=True # Enable GCP sync
)
# Record failed case
case_id = system.record_failed_transcription(...)
# Add correction
system.add_correction(case_id, "corrected transcript")
# Prepare fine-tuning dataset
dataset_info = system.prepare_finetuning_dataset(
    min_error_score=0.5,
    max_samples=1000,
    create_version=True
)
# Track performance
system.record_training_performance(
model_version="whisper_v2",
wer=0.10,
cer=0.05
)
# Generate report
report = system.generate_comprehensive_report()
```

### Evaluation Framework (`experiments/kavya_evaluation_framework.py`)

Comprehensive evaluation with metrics and visualization.
Features:
- WER/CER calculation (see the sketch below)
- Error analysis
- Performance benchmarking
- Visualization generation
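
WER and CER are word- and character-level edit distances normalized by reference length. A self-contained sketch (the in-repo `metrics.py` may use a library such as `jiwer` instead):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (r != h))
    return dp[-1]

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    return edit_distance(ref, hyp) / max(len(ref), 1)

def cer(reference: str, hypothesis: str) -> float:
    return edit_distance(list(reference), list(hypothesis)) / max(len(reference), 1)

print(wer("this is a test", "this is test"))  # 0.25 (one deletion / four words)
```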
Usage:

```python
from experiments.kavya_evaluation_framework import EvaluationFramework

framework = EvaluationFramework(model_name="whisper")
results = framework.run_comprehensive_evaluation(
    eval_datasets=["data/processed/test_dataset"],
    output_report=True
)
```

## Running the System

### Baseline API

Start the baseline API for simple transcription without agent features:

```bash
# Start API
uvicorn src.inference_api:app --reload --port 8000
# Test with curl
curl -X POST "http://localhost:8000/transcribe" \
-F "file=@data/test_audio/test_1.wav"
# Get model info
curl "http://localhost:8000/model-info"
# Health check
curl "http://localhost:8000/health"Start the agent API with error detection and learning:

```bash
# Start API
uvicorn src.agent_api:app --reload --port 8000
# Transcribe with agent
curl -X POST "http://localhost:8000/agent/transcribe?auto_correction=true" \
-F "file=@data/test_audio/test_1.wav"
# Submit feedback
curl -X POST "http://localhost:8000/agent/feedback" \
-H "Content-Type: application/json" \
-d '{
"transcript_id": "123",
"user_feedback": "Good transcription",
"is_correct": true,
"corrected_transcript": "This is the correct transcript"
}'
# Get agent statistics
curl "http://localhost:8000/agent/stats"
# Get learning data
curl "http://localhost:8000/agent/learning-data"
# Baseline endpoints still work
curl -X POST "http://localhost:8000/transcribe" \
-F "file=@data/test_audio/test_1.wav"cd experiments

### Run the Evaluation Framework

```bash
cd experiments
python kavya_evaluation_framework.py
```

Output:
- `evaluation_outputs/evaluation_report.json` - Detailed results
- `evaluation_outputs/evaluation_summary.json` - Summary metrics
- `evaluation_outputs/EVALUATION_SUMMARY.md` - Human-readable report
- `evaluation_outputs/visualizations/` - Charts and graphs

### Run Benchmarks

```bash
python experiments/run_benchmark.py
```

Output:
- `evaluation_outputs/benchmark_report.json` - Performance metrics

### Generate Visualizations

```bash
python experiments/visualize_evaluation_results.py
```

Generates:
- WER/CER comparison charts
- Error distribution histograms
- Comprehensive dashboards

### Run Component Tests

```bash
# Test baseline model
python experiments/test_baseline.py

# Test agent system
python experiments/test_agent.py

# Start API in one terminal
uvicorn src.agent_api:app --reload --port 8000
# Test in another terminal
python experiments/test_api.py

# Test data management system
python experiments/test_data_management.py
```

### Monitor System Health

```python
from src.data.integration import IntegratedDataManagementSystem
system = IntegratedDataManagementSystem()
stats = system.get_system_statistics()
print(f"Total failed cases: {stats['data_management']['total_failed_cases']}")
print(f"Correction rate: {stats['data_management']['correction_rate']:.2%}")# Prepare when you have enough corrected cases (500+)
dataset_info = system.prepare_finetuning_dataset(
    min_error_score=0.5,
    max_samples=1000,
    balance_error_types=True,
    create_version=True
)
print(f"Dataset created: {dataset_info['local_path']}")
print(f"Train samples: {dataset_info['stats']['train_size']}")
print(f"Val samples: {dataset_info['stats']['val_size']}")
print(f"Test samples: {dataset_info['stats']['test_size']}")report = system.generate_comprehensive_report(
output_path="data/production/reports/monthly_report.json"
)# Check system stats
python -c "
from src.data.integration import IntegratedDataManagementSystem
system = IntegratedDataManagementSystem()
stats = system.get_system_statistics()
print(f\"New cases today: {stats['data_management']['total_failed_cases']}\")
print(f\"Correction rate: {stats['data_management']['correction_rate']:.1%}\")
"# Generate report and prepare dataset if ready
python -c "
from src.data.integration import IntegratedDataManagementSystem
system = IntegratedDataManagementSystem()
report = system.generate_comprehensive_report()
stats = system.get_system_statistics()
if stats['data_management']['corrected_cases'] >= 500:
    print('Ready to prepare fine-tuning dataset!')
    dataset = system.prepare_finetuning_dataset(max_samples=1000)
"Transcribe audio file (baseline model only).
Request:
curl -X POST "http://localhost:8000/transcribe" \
-F "[email protected]"Response:

```json
{
  "transcript": "transcribed text",
  "model": "whisper",
  "inference_time_seconds": 0.5
}
```

### GET /model-info

Get model information.
Response:

```json
{
  "name": "whisper",
  "parameters": 72593920,
  "device": "cpu",
  "trainable_params": 71825920
}
```

### GET /health

Health check endpoint.
Response:

```json
{
  "status": "healthy",
  "model": "whisper"
}
```

### POST /agent/transcribe

Transcribe with agent error detection.
Parameters:
- `auto_correction` (optional): Enable automatic correction (default: `false`)
Request:
curl -X POST "http://localhost:8000/agent/transcribe?auto_correction=true" \
-F "[email protected]"Response:

```json
{
  "transcript": "corrected transcript",
  "original_transcript": "original transcript",
  "error_detection": {
    "has_errors": true,
    "error_count": 2,
    "error_score": 0.65,
    "errors": [...],
    "error_types": {"all_caps": 1, "missing_punctuation": 1}
  },
  "corrections": {
    "applied": true,
    "count": 2,
    "details": [...]
  },
  "inference_time_seconds": 0.5
}
```

### POST /agent/feedback

Submit user feedback for learning.
Request:
curl -X POST "http://localhost:8000/agent/feedback" \
-H "Content-Type: application/json" \
-d '{
"transcript_id": "unique-id",
"user_feedback": "feedback text",
"is_correct": true,
"corrected_transcript": "corrected version"
}'Response:

```json
{
  "status": "success",
  "message": "Feedback recorded for learning"
}
```

### GET /agent/stats

Get agent statistics.
Response:

```json
{
  "total_transcriptions": 100,
  "error_detection": {
    "threshold": 0.3,
    "total_errors_detected": 25,
    "error_rate": 0.25
  },
  "learning": {
    "total_errors_learned": 25,
    "total_corrections": 20,
    "feedback_count": 15
  }
}
```

### GET /agent/learning-data

Get in-memory learning data (for external persistence).
Response:

```json
{
  "error_patterns": [...],
  "correction_history": [...],
  "feedback_records": [...]
}
```

## Development Workflows

### Continuous Learning Loop

```python
# Initialize system
from src.baseline_model import BaselineSTTModel
from src.agent import STTAgent
from src.data.integration import IntegratedDataManagementSystem
baseline = BaselineSTTModel()
agent = STTAgent(baseline)
data_system = IntegratedDataManagementSystem()
# Process audio
result = agent.transcribe_with_agent("audio.wav", enable_auto_correction=True)
# Automatically record if errors detected
if result['error_detection']['has_errors']:
    case_id = data_system.record_failed_transcription(
        audio_path="audio.wav",
        original_transcript=result['original_transcript'],
        error_types=list(result['error_detection']['error_types'].keys()),
        error_score=result['error_detection']['error_score']
    )

# User provides correction
if user_correction:
    data_system.add_correction(case_id, user_correction)
```

### Model Evaluation

```python
from experiments.kavya_evaluation_framework import EvaluationFramework
# Evaluate baseline model
framework = EvaluationFramework(model_name="whisper")
results = framework.run_comprehensive_evaluation(
    eval_datasets=["data/processed/test_dataset"]
)

# Generate visualizations
framework.generate_visualizations()

# Get metrics
print(f"WER: {results['overall_metrics']['mean_wer']:.4f}")
print(f"CER: {results['overall_metrics']['mean_cer']:.4f}")
```

### Fine-tuning Cycle

```python
from src.data.integration import IntegratedDataManagementSystem
system = IntegratedDataManagementSystem()
# 1. Check if ready for fine-tuning
stats = system.get_system_statistics()
if stats['data_management']['corrected_cases'] >= 500:
    # 2. Prepare dataset
    dataset_info = system.prepare_finetuning_dataset(
        min_error_score=0.5,
        max_samples=1000,
        balance_error_types=True,
        create_version=True
    )

    # 3. Fine-tune model (external script)
    # train_model(dataset_info['local_path'])

    # 4. Record training performance
    system.record_training_performance(
        model_version="whisper_finetuned_v1",
        wer=0.08,
        cer=0.04,
        training_metadata={"epochs": 10, "batch_size": 16}
    )

    # 5. Compare versions
    comparison = system.metadata_tracker.compare_model_versions(
        "whisper_base",
        "whisper_finetuned_v1"
    )
```

## Testing

```bash
# Test baseline model
python experiments/test_baseline.py
# Test agent system
python experiments/test_agent.py
# Test data management
python experiments/test_data_management.py
# Test API (requires API to be running)
python experiments/test_api.py
```

Expected output:

```text
✅ Testing baseline model transcription...
✅ Testing agent error detection...
✅ Testing data management system...
✅ Testing fine-tuning pipeline...
✅ All tests passed!
```

## GCP Setup

```bash
# 1. Install gcloud CLI
curl https://sdk.cloud.google.com | bash
# 2. Authenticate
gcloud auth login
gcloud config set project stt-agentic-ai-2025
# 3. Enable APIs
gcloud services enable compute.googleapis.com
gcloud services enable storage-api.googleapis.com
# 4. Create storage bucket
gsutil mb gs://stt-project-datasets
# 5. Verify setup
bash scripts/quick_setup.sh
```
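
With the bucket in place, programmatic sync (the role `src/utils/gcs_utils.py` plays) can go through the official client. A sketch assuming the `google-cloud-storage` package and the bucket created above; the object path is illustrative:

```python
from google.cloud import storage

client = storage.Client()  # picks up gcloud auth credentials
bucket = client.bucket("stt-project-datasets")

# Upload a failed-case record (illustrative file name)
blob = bucket.blob("failed_cases/case_001.json")
blob.upload_from_filename("data/failed_cases/case_001.json")
print(f"Uploaded to gs://{bucket.name}/{blob.name}")
```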

```bash
# Create GPU-enabled VM for faster inference
bash scripts/setup_gcp_gpu.sh
```

```bash
# Deploy code and run on GCP
python scripts/deploy_to_gcp.py
```

```bash
# Check GCP usage and costs
python scripts/monitor_gcp_costs.py
```

## Documentation

- docs/SETUP_INSTRUCTIONS.md - Detailed setup guide
- docs/DATA_MANAGEMENT_SYSTEM.md - Complete data management guide
- docs/QUICK_START_DATA_MANAGEMENT.md - Quick start for data management
- docs/DATA_MANAGEMENT_SYSTEM.md - Complete data management API
- docs/GCP_SETUP_GUIDE.md - GCP setup instructions

## Tutorials

Watch our comprehensive tutorial video that covers everything you need to get started:
**Watch Complete Tutorial Video**
What's included in the video:
- **Project Presentation** - Overview of the Adaptive Self-Learning Agentic AI System
- **System Demo** - Live demonstration of transcription, error detection, and data management
- **Repository Setup** - Step-by-step guide to setting up the repository and dependencies
- **Key Features Walkthrough** - Deep dive into the agent system, data management, and fine-tuning
- docs/UI_TUTORIAL.md - Control Panel UI walkthrough
- docs/CONTROL_PANEL_GUIDE.md - Complete control panel guide
- docs/FINETUNING_QUICK_START.md - Fine-tuning tutorial
- QUICK_REFERENCE.md - Quick command reference

## Performance

### Baseline Model (Whisper)

- Model: 72.6M parameters
- WER: 0.10 (10%)
- CER: 0.0227 (2.27%)
- CPU Latency: 5.29s per sample
- GPU Latency: ~0.1-0.2s per sample (estimated)
- Throughput: 2.65 samples/second (CPU); see the timing sketch below

### Agent System

- Error Detection: 8+ heuristic types
- Detection Overhead: ~5-10% additional processing
- Correction Accuracy: Tracked via user feedback

### Data Management

- Storage: ~1KB per failed case
- Record Speed: ~10ms (local) + ~100ms (GCS)
- Scalability: Handles 100,000+ cases
- Dataset Prep: ~1-5 seconds per 1000 samples
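
Figures like the latency and throughput above can be reproduced with a simple timing loop (or via `experiments/run_benchmark.py`). A sketch against the `BaselineSTTModel` interface shown earlier; the sample paths are placeholders:

```python
import time

from src.baseline_model import BaselineSTTModel

model = BaselineSTTModel(model_name="whisper")
samples = [f"data/test_audio/test_{i}.wav" for i in range(1, 6)]  # placeholder paths

start = time.perf_counter()
for path in samples:
    model.transcribe(path)
elapsed = time.perf_counter() - start

print(f"Mean latency: {elapsed / len(samples):.2f}s per sample")
print(f"Throughput:   {len(samples) / elapsed:.2f} samples/second")
```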

## Project Status

- ✅ Week 1: Baseline model, evaluation framework, benchmarking, GCP integration
- ✅ Week 2: Agent system, error detection, data management, fine-tuning pipeline

Current capabilities:

- ✅ Baseline STT model with GPU optimization
- ✅ Real-time inference API (baseline + agent)
- ✅ Multi-heuristic error detection
- ✅ Self-learning feedback system
- ✅ Comprehensive data management
- ✅ Fine-tuning dataset preparation
- ✅ Version control and quality assurance
- ✅ Performance tracking and reporting
- ✅ Evaluation framework with visualizations
- ✅ GCP integration with cost monitoring
- Model fine-tuning (in progress)
- Automated retraining pipeline (planned)

## Future Enhancements

- Automated model retraining based on collected data
- Multi-model support (Wav2Vec2, Conformer)
- Real-time streaming transcription
- Multi-language support
- Advanced error correction using LLMs
- Automated A/B testing framework
- Production deployment with load balancing

## Development

This project follows standard Python development practices:
- Code style: Black formatter
- Linting: Flake8
- Testing: Pytest
- Documentation: Docstrings with type hints

## Team

Part of the Adaptive Self-Learning Agentic AI System project.
- Team Member 1: Agent Integration & Error Detection
- Team Member 2: Data Management & Infrastructure
- Team Member 3: Evaluation Framework & Benchmarking

## Support

For issues, questions, or contributions:

- Check the documentation in `docs/`
- Run example scripts in `experiments/`
- Review weekly deliverable reports
Last Updated: November 24, 2025
Version: Week 2 Complete
Status: Production Ready ✅