Releases: coqui-ai/TTS
v0.4.0
🐸 v0.4.0
- Update multi-speaker training API.
- VCTK recipes for all the TTS models.
- Documentation for multi-speaker training.
- Pre-trained Ukrainian GlowTTS model from 👑 https://github.com/robinhad/ukrainian-tts
- Pre-trained FastPitch VCTK model
- Dataset downloaders for LJSpeech and VCTK under TTS.utils.downloaders
- Documentation reformatting.
- Trainer V2 and compatibility updates in model implementations.
This update makes Trainer V2 responsible only for training a model. Everything else is moved out of the trainer and must be done either in the model or before calling the trainer.
Try out new models
- Pre-trained FastPitch VCTK model
tts --model_name tts_models/en/vctk/fast_pitch --text "This is my sample text to voice." --speaker_idx VCTK_p229
- Pre-trained Ukrainian GlowTTS model from 👑 https://github.com/robinhad/ukrainian-tts
tts --model_name tts_models/uk/mai/glow-tts --text "Це зразок тексту, щоб спробувати нашу модель."
v0.3.1
v0.3.0
🐸 v0.3.0
New ForwardTTS implementation
This version implements a new ForwardTTS interface that can be configured as any feed-forward TTS model that uses a duration predictor at inference time. Currently, we provide 3 pre-configured models and plan to implement one more.
- SpeedySpeech
- FastSpeech
- FastPitch
- FastSpeech 2 (TODO)
Through this API, any model can be trained in two ways: either with pre-computed durations from a pre-trained Tacotron model, or with an alignment network that learns durations from the dataset. The alignment network is used only during training and is discarded at inference. You select the mode by setting the use_aligner field in the configuration.
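To make "a duration predictor at inference time" concrete, here is a minimal sketch of the length-regulation step that feed-forward models of this family rely on: each input-token encoding is repeated for as many output frames as its duration says. It is a plain-Python illustration, not the ForwardTTS code. The function name expand_by_durations and the toy values are assumptions made for this example. Whichever training mode is used, durations are what this step consumes.

```python
from typing import List, Sequence


def expand_by_durations(
    frames: Sequence[Sequence[float]], durations: Sequence[int]
) -> List[List[float]]:
    """Repeat each input-token encoding durations[i] times along the time axis."""
    if len(frames) != len(durations):
        raise ValueError("need exactly one duration per input token")
    expanded: List[List[float]] = []
    for vector, n_frames in zip(frames, durations):
        # A duration of 0 drops the token; negative values are clamped to 0.
        expanded.extend(list(vector) for _ in range(max(0, int(n_frames))))
    return expanded


# Three token encodings (2-dimensional for readability) and their durations in frames.
token_encodings = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
durations = [2, 0, 3]
decoder_inputs = expand_by_durations(token_encodings, durations)
print(len(decoder_inputs))  # 5 frames: 2 + 0 + 3
```

In a real model the durations would come from the duration predictor at inference, and from either the pre-computed alignments or the aligner during training.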
This new API will help us design a more efficient inference runtime for all these models using ONNX-like runtime optimizers.
The old FastPitch and SpeedySpeech implementations are deprecated in favor of this new implementation.
Fine-Tuning Documentation
This version introduces documentation for model fine-tuning. You can see it under https://tts.readthedocs.io/ when this is merged.
New Model Releases
- English Speedy Speech model on LJSpeech
Try out:
tts --text "This is a sample text for my model to speak." --model_name tts_models/en/ljspeech/speedy-speech
- Fine-tuned UnivNet Vocoder
Try out:
tts --text "This is how it is." --model_name tts_models/en/ljspeech/tacotron2-DDC_ph
v0.2.2
🐸 v0.2.2
This release implements the FastPitch model with an Aligner Network, along with the changes that accompany it.
- Alignment Network: https://arxiv.org/abs/2108.10447
- Fast Pitch Model: https://arxiv.org/abs/2006.06873
Thanks to 👑 @kaiidams for his Japanese g2p update.
Try FastPitch model:
tts --model_name tts_models/en/ljspeech/fast_pitch --text "This is my sample text to voice."
v0.2.1
🐸 v0.2.1
🐞Bug Fixes
- Fix distributed training and solve compatibility issues with the Trainer API.
- Fix bugs in the VITS model implementation that caused training instabilities.
- Fix some Abstract Class usage issues in WaveRNN and WaveGrad models.
💾 Code updates
- Use a single gradient scaler for all the optimizers in TrainerAPI. Previously, we used one scaler per optimizer.
🏃‍♀️ Operational Updates
- Update to Pylint 2.10.2
Thanks to 👑 @fijipants for his fixes 🛠️
Thanks to 👑 @agrinh for flagging and discussing the DDP issues
v0.2.0
🐸 v0.2.0
🐞Bug Fixes
- Fix phoneme pre-compute issue.
- Fix multi-speaker setup in Tacotron models.
- Fix small issues in the Trainer regarding multi-optimizer training.
💾 Code updates
- W&B integration for model logging and experiment tracking. (👑 @AyushExel)
The code uses Tensorboard by default. For W&B, you need to set the log_dashboard option in the config and define project_name and wandb_entity.
- Use fsspec for model saving/loading. (👑 @agrinh)
- Allow models to define their own symbol list with an in-class make_symbols().
- Allow choosing whether the LR scheduler updates after each epoch or after each step with scheduler_after_epoch.
- Make converting spectrograms from amplitude to dB optional with the do_amp_to_db_linear and do_amp_to_db_mel options.
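As a rough sketch of what an optional amplitude-to-dB step looks like, the snippet below applies the conventional 20 * log10 conversion only when a flag is set. The helper names, the clamp floor of 1e-5, and the single do_amp_to_db flag are assumptions made for illustration. This is not the library's audio processing code.

```python
import math
from typing import List


def amp_to_db(amplitudes: List[float], floor: float = 1e-5) -> List[float]:
    """Convert linear amplitudes to decibels, clamping at floor to avoid log(0)."""
    return [20.0 * math.log10(max(floor, a)) for a in amplitudes]


def finalize_spectrogram(frame: List[float], do_amp_to_db: bool = True) -> List[float]:
    # Hypothetical helper: the flag decides whether the dB conversion is applied.
    return amp_to_db(frame) if do_amp_to_db else list(frame)


frame = [1.0, 0.1, 0.0]
print(finalize_spectrogram(frame, do_amp_to_db=True))   # values in dB
print(finalize_spectrogram(frame, do_amp_to_db=False))  # unchanged amplitudes
```

Keeping the conversion behind a flag lets a model that expects raw amplitudes skip the log compression without touching the rest of the pipeline.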
🗒️ Docs updates
- Add GlowTTS and VITS docs.
🤖 Model implementations
- VITS implementation with pre-trained models (https://arxiv.org/abs/2106.06103)
🚀 Model releases
- vocoder_models--ja--kokoro--hifigan_v1 (👑 @kaiidams)
HiFiGAN model trained on Kokoro dataset to complement the existing Japanese model.
Try it out:
tts --model_name tts_models/ja/kokoro/tacotron2-DDC --text "こんにちは、今日はいい天気ですか?"
- tts_models--en--ljspeech--tacotronDDC_ph
TacotronDDC with phonemes trained on LJSpeech. It fixes the pronunciation errors caused by the raw text in the released TacotronDDC model.
Try it out:
tts --model_name tts_models/en/ljspeech/tacotronDDC_ph --text "hello, how are you today?"
- tts_models--en--ljspeech--vits
VITS model trained on LJSpeech.
Try it out:
tts --model_name tts_models/en/ljspeech/vits --text "hello, how are you today?"
- tts_models--en--vctk--vits
VITS model trained on VCTK with multi-speaker support.
Try it out:
tts-server --model_name tts_models/en/vctk/vits
- vocoder_models--en--ljspeech--univnet
UnivNet model trained on LJSpeech to complement the TacotronDDC model above.
Try it out:
tts --model_name tts_models/en/ljspeech/tacotronDDC_ph --text "hello, how are you today?"
v0.1.3
🐸 v0.1.3
🐞Bug Fixes
- Fix Tacotron stopnet training
Models trained after v0.1 had the problem that the stopnet was not trained. It caused models not to generate audio at evaluation and inference time.
- Fix test_run at training. (👑 @WeberJulian)
In training, 🐸TTS would skip the test_run and not generate test audio samples. Now it is fixed :).
- Fix server.py for multi-speaker models.
💾 Code updates
- Refactoring in compute_embeddings.py for efficiency and compatibility with the latest speaker encoder. (👑 @Edresson)
🚀 Model releases
- New Fullband-MelGAN model for Thorsten German dataset. (👑 @thorstenMueller)
Try it:
tts --model_name tts_models/de/thorsten/tacotron2-DCA --text "Was geschehen ist geschehen, es ist geschichte."
v0.1.2
v0.1.1
v0.1.0
🐸 v0.1.0
In a nutshell, there are a ton of updates in this release. I don't know if we can cover them all here but let's try.
After this release, 🐸 TTS stands on the following architecture.
- Trainer API for training.
- Synthesizer API for inference.
- ModelManager API for managing 🐸TTS model zoo.
- SpeakerManager API for managing speakers in a multi-speaker setting.
- (TBI) Exporter API for exporting models to ONNX, TorchScript, etc.
- (TBI) Data Processing API for making a dataset ready for training.
- Model API for implementing models, compatible with all the other components above.
Updates
💾 Code updates
- Brand new Trainer API
We unified all the training code in a lightweight but feature-complete Trainer API. From now on, all the 🐸TTS models will use this new API for training.
It provides mixed precision (with Nvidia's APEX or torch.amp) and multi-GPU training for all the models.
- Brand new Model API
The abstract BaseModel and its BaseTTS and BaseVocoder child classes are now the basis of the 🐸TTS models.
Any model that implements one of these classes works seamlessly with the Trainer and Synthesizer.
- Brand new 🐸TTS recipes.
We decided to merge the recipes into the main project. We now host recipes for the LJSpeech dataset, covering all the implemented models, so you can pick the model you want, change the parameters, and train your own model easily.
Thanks to the new Trainer API and 👩‍✈️ Coqpit integration, we could implement these recipes in pure Python.
- Updated SpeakerManager API
TTS.utils.SpeakerManager is now the core unit to manage speakers in a multi-speaker model and to interface a SpeakerEncoder model with the tts and vocoder models.
- Updated model training mechanics.
You can now use pure Python to define your model and run the training. This is useful for training models in a Jupyter Notebook or other Python environments.
We also keep the old mechanics through TTS/bin/train_tts.py or TTS/bin/train_vocoder.py. You just need to replace the previous training script name with one of these two, based on your model.
python TTS/bin/train_tacotron.py --config_path config.json
becomes
python TTS/bin/train_tts.py --config_path config.json
- Use 👩‍✈️ Coqpit for managing model class arguments.
Now all the model arguments are defined in a coqpit class and imported by the model config.
- gruut based character to phoneme conversion. (👑 @synesthesiam)
It is a drop-in replacement for the previous solution and is compatible with the released models, so all these models are functional again without version nitpicking.
- Set test_sentences in the config rather than providing a txt file.
- Set the maximum number of decoder steps of Tacotron1-2 models in the config.
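The idea behind the Model API and config-class items above can be sketched in a few lines: model arguments live in a config class, models implement a shared abstract base, and the training loop depends only on that base. Everything below is a toy stand-in. ToyModelArgs is a stdlib dataclass rather than a Coqpit class, and ToyBaseModel, ToyTTS, toy_fit, and the train_step signature are hypothetical names, not the 🐸TTS classes.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToyModelArgs:
    """Stand-in for a config class holding model arguments (not Coqpit itself)."""
    scale: float = 2.0


class ToyBaseModel(ABC):
    """Hypothetical minimal interface; a real base model defines much more."""

    @abstractmethod
    def train_step(self, batch: List[float]) -> Dict[str, float]:
        ...


class ToyTTS(ToyBaseModel):
    def __init__(self, args: ToyModelArgs) -> None:
        self.args = args

    def train_step(self, batch: List[float]) -> Dict[str, float]:
        # Pretend loss: mean absolute value of the scaled batch.
        loss = sum(abs(x * self.args.scale) for x in batch) / len(batch)
        return {"loss": loss}


def toy_fit(model: ToyBaseModel, batches: List[List[float]]) -> List[float]:
    """The loop relies only on the base interface, so any subclass plugs in."""
    return [model.train_step(batch)["loss"] for batch in batches]


losses = toy_fit(ToyTTS(ToyModelArgs(scale=2.0)), [[1.0, -1.0], [0.5, 0.5]])
print(losses)  # [2.0, 1.0]
```

Because the loop never inspects the concrete class, swapping in another subclass requires no change to the training code, which is the property the release notes describe.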
🏃‍♀️ Operational Updates
- FINALLY DOCUMENTATION!! https://tts.readthedocs.io
- Enable support for Python 3.9
- Changes for PyTorch 1.9.0
🏅 Model implementations
- Univnet GAN Vocoder: https://arxiv.org/pdf/2106.07889.pdf (👑 @rishikksh20)
🚀 Model releases
We solved the compatibility issues and re-released some of the models. You can see them in the released binaries section.
You don't need to change anything. If you use v0.1.0, by default, it uses these new models.