Whisper-Finetune


This repository contains code for fine-tuning the Whisper speech-to-text model. It utilizes Weights & Biases (wandb) for logging metrics and storing models. Key features include:

  • Multi-Dataset Validation 🆕 - Evaluate on multiple validation sets simultaneously with macro averaging
  • Comprehensive Metrics 🆕 - WER, CER, NLL, log-probability, entropy, and calibration (ECE)
  • Production-Ready Tests 🆕 - Fast unit tests with pytest
  • Timestamp training
  • Prompt training
  • Stochastic depth implementation for improved model generalization
  • Correct implementation of SpecAugment for robust audio data augmentation
  • Checkpointing functionality to save and resume training progress, crucial for handling long-running experiments and potential interruptions
  • Integration with Weights & Biases (wandb) for experiment tracking and model versioning

What's New

Multi-Dataset Validation System

Evaluate your model on multiple validation datasets (e.g., clean speech, noisy environments, different microphones) with comprehensive metrics beyond WER:

  • 6 metrics per dataset: WER, CER, NLL, log-prob, entropy, ECE
  • Macro averaging: Unweighted mean across datasets (each dataset contributes equally)
  • Per-utterance tracking: Detailed metrics for in-depth analysis
  • Smart checkpointing: All models saved locally, manual W&B upload to avoid clutter
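The macro-averaging idea above can be sketched in a few lines. This is a minimal illustration, not the repository's actual implementation; the function names and the two toy validation sets are hypothetical:

```python
# Minimal sketch of macro averaging across validation sets.
# Each dataset's mean WER contributes equally, regardless of its size.

def wer(ref: str, hyp: str) -> float:
    """Word error rate via word-level Levenshtein distance."""
    r, h = ref.split(), hyp.split()
    # DP table: d[i][j] = edit distance between r[:i] and h[:j]
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(r)][len(h)] / max(len(r), 1)

def macro_wer(per_dataset_pairs):
    """Unweighted mean of per-dataset mean WERs (macro average)."""
    per_dataset_means = [
        sum(wer(ref, hyp) for ref, hyp in pairs) / len(pairs)
        for pairs in per_dataset_pairs
    ]
    return sum(per_dataset_means) / len(per_dataset_means)

# Two hypothetical validation sets: clean (perfect) and noisy (1 error in 2 words).
clean = [("hello world", "hello world")]
noisy = [("good morning", "good evening")]
print(macro_wer([clean, noisy]))  # 0.25: the mean of 0.0 and 0.5
```

Because the mean is unweighted, a small noisy-speech set influences the headline number as much as a large clean-speech set, which is usually what you want when comparing robustness across conditions.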

Installation

  1. Clone the repository:

    git clone https://github.com/i4ds/whisper-finetune.git
    cd whisper-finetune
  2. Create and activate a virtual environment using Python 3.11 or higher (strongly recommended).

  3. Install the package in editable mode:

    pip install -e .

    Or, using uv (very strongly recommended):

    uv pip install -e .

Data

Please have a look at https://github.com/i4Ds/whisper-prep. The data is passed to the script as a 🤗 Dataset.

Usage

  1. Create a configuration file (see configs/example_config.yaml for a fully documented example)

  2. Run the fine-tuning script:

    python src/whisper_finetune/scripts/finetune.py --config configs/example_config.yaml
  3. (Optional) Merge LoRA weights into a standard Whisper checkpoint (saved via save_model):

    python src/whisper_finetune/scripts/merge_lora_weights.py \
        --input /path/to/best_model.pt \
        --config configs/config_lora_only.yaml \
        --output /path/to/last_model_merged.pt

Testing

Run the test suite to ensure everything is working:

    # Install dev dependencies
    pip install -e ".[dev]"

    # Run tests
    pytest

    # Run with verbose output and coverage
    pytest -v --cov=whisper_finetune

See tests/README.md for more details.

Deployment

We suggest using faster-whisper for deployment. To convert your fine-tuned model, use the script located at src/whisper_finetune/scripts/convert_c2t.py.

Transcription quality can be further improved by serving requests with whisperx.

Configuration

Modify the YAML files in the configs/ directory to customize your fine-tuning process. Refer to the existing configuration files for examples of available options.
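For orientation only, a config might look roughly like the sketch below. The key names here are hypothetical and meant to convey the overall shape; the authoritative reference is configs/example_config.yaml:

```yaml
# Hypothetical sketch -- consult configs/example_config.yaml for the real key names.
model:
  init_name: large-v3          # Whisper checkpoint to start from
training:
  epochs: 3
  lr: 1.0e-5
dataset:
  train_datasets:
    - path/to/train_hf_dataset
  val_datasets:                # multiple validation sets -> macro-averaged metrics
    - path/to/clean_val
    - path/to/noisy_val
```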

Thank you

The starting point of this repository was the excellent repository by Jumon at https://github.com/jumon/whisper-finetuning

Contributing

We welcome contributions! Please feel free to submit a Pull Request.

Support

If you encounter any problems, please file an issue along with a detailed description.

Maintainer

Developers

License

This project is licensed under the MIT License - see the LICENSE file for details.
