This repository contains code for fine-tuning the Whisper speech-to-text model. It utilizes Weights & Biases (wandb) for logging metrics and storing models. Key features include:
- Multi-Dataset Validation 🆕 - Evaluate on multiple validation sets simultaneously with macro averaging
- Comprehensive Metrics 🆕 - WER, CER, NLL, log-probability, entropy, and calibration (ECE)
- Production-Ready Tests 🆕 - Fast unit tests with pytest
- Timestamp training
- Prompt training
- Stochastic depth implementation for improved model generalization
- Correct implementation of SpecAugment for robust audio data augmentation
- Checkpointing functionality to save and resume training progress, crucial for handling long-running experiments and potential interruptions
- Integration with Weights & Biases (wandb) for experiment tracking and model versioning
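The stochastic-depth feature listed above follows a standard technique: during training, each residual block is randomly bypassed with some probability, and at inference the block's output is scaled by its survival probability. A minimal sketch of the idea (the repository's actual implementation may differ):

```python
import numpy as np

def residual_block_with_stochastic_depth(x, block, survival_prob, training, rng):
    """Apply a residual block, randomly skipping it during training.

    With probability (1 - survival_prob) the block is bypassed entirely,
    leaving only the identity path; at inference the block output is
    scaled by survival_prob to match the training-time expectation.
    """
    if training:
        if rng.random() < survival_prob:
            return x + block(x)
        return x  # block skipped: identity path only
    return x + survival_prob * block(x)

rng = np.random.default_rng(0)
x = np.ones(4)
# Inference path: x + 0.8 * (2 * x) = 2.6 * x
out = residual_block_with_stochastic_depth(x, lambda v: 2 * v, 0.8, training=False, rng=rng)
```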
Evaluate your model on multiple validation datasets (e.g., clean speech, noisy environments, different microphones) with comprehensive metrics beyond WER:
- 6 metrics per dataset: WER, CER, NLL, log-prob, entropy, ECE
- Macro averaging: Unweighted mean across datasets (each dataset contributes equally)
- Per-utterance tracking: Detailed metrics for in-depth analysis
- Smart checkpointing: All models saved locally, manual W&B upload to avoid clutter
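Macro averaging as described above is an unweighted mean over datasets, so a small noisy test set counts as much as a large clean one. A minimal sketch (metric names and values here are illustrative, not the repository's exact output):

```python
def macro_average(per_dataset_metrics):
    """Unweighted mean of each metric across datasets.

    Every dataset contributes equally, regardless of how many
    utterances it contains.
    """
    names = per_dataset_metrics[0].keys()
    n = len(per_dataset_metrics)
    return {name: sum(m[name] for m in per_dataset_metrics) / n for name in names}

metrics = [
    {"wer": 0.10, "cer": 0.04},  # clean speech
    {"wer": 0.30, "cer": 0.12},  # noisy environment
]
avg = macro_average(metrics)  # {"wer": 0.20, "cer": 0.08}
```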
- Clone the repository:

  ```bash
  git clone https://github.com/i4ds/whisper-finetune.git
  cd whisper-finetune
  ```

- Create and activate a virtual environment (strongly recommended) with Python 3.11 or higher.

- Install the package in editable mode:

  ```bash
  pip install -e .
  ```

  Or using UV (very strongly recommended):

  ```bash
  uv pip install -e .
  ```
Please have a look at https://github.com/i4Ds/whisper-prep for preparing your data. The data is passed to the script as a 🤗 Dataset.
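To illustrate the kind of row such a dataset might contain, here is a hypothetical example; the exact column names depend on how the data was prepared with whisper-prep, so treat this schema as an assumption:

```python
# Hypothetical dataset row: an audio column (array + sampling rate)
# paired with a transcript column. Column names are illustrative.
example = {
    "audio": {"array": [0.0] * 16000, "sampling_rate": 16000},  # 1 s of silence
    "text": "hello world",
}
```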
- Create a configuration file (see `configs/example_config.yaml` for a fully documented example).

- Run the fine-tuning script:

  ```bash
  python src/whisper_finetune/scripts/finetune.py --config configs/example_config.yaml
  ```

- (Optional) Merge LoRA weights into a standard Whisper checkpoint (saved via `save_model`):

  ```bash
  python src/whisper_finetune/scripts/merge_lora_weights.py \
      --input /path/to/best_model.pt \
      --config configs/config_lora_only.yaml \
      --output /path/to/last_model_merged.pt
  ```
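For intuition, merging LoRA weights folds the learned low-rank update back into the frozen base weight, `W_merged = W + (alpha / r) * B @ A`, so the merged checkpoint runs as a plain Whisper model with no LoRA-specific code. A minimal numpy sketch of that arithmetic (not the repository's merge script; dimensions and scaling convention are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 6, 2, 4.0            # model dim, LoRA rank, LoRA scaling
W = rng.standard_normal((d, d))    # frozen base weight
A = rng.standard_normal((r, d))    # LoRA down-projection
B = rng.standard_normal((d, r))    # LoRA up-projection

# Fold the scaled low-rank update into the base weight.
W_merged = W + (alpha / r) * B @ A

# The merged weight reproduces base path + scaled LoRA path exactly.
x = rng.standard_normal(d)
assert np.allclose(W_merged @ x, W @ x + (alpha / r) * B @ (A @ x))
```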
Run the test suite to ensure everything is working:

```bash
# Install dev dependencies
pip install -e ".[dev]"

# Run tests
pytest

# Run with verbose output and coverage
pytest -v --cov=whisper_finetune
```

See `tests/README.md` for more details.
We suggest using faster-whisper for inference. To convert your fine-tuned model, you can use the script located at `src/whisper_finetune/scripts/convert_c2t.py`.
Further improvements in quality can be achieved by serving requests with whisperx.
Modify the YAML files in the configs/ directory to customize your fine-tuning process. Refer to the existing configuration files for examples of available options.
The starting point for this repository was the excellent repository by Jumon at https://github.com/jumon/whisper-finetuning.
We welcome contributions! Please feel free to submit a Pull Request.
If you encounter any problems, please file an issue along with a detailed description.
- Vincenzo Timmel ([email protected])
- Claudio Paonessa ([email protected])
This project is licensed under the MIT License - see the LICENSE file for details.