
Commit 25347bb: rename tacotron2, test=tts

Parent: e0280ff
30 files changed (+381, -51 lines)

Diff for: CHANGELOG.md (+22, -7)

@@ -1,23 +1,38 @@
 # Changelog
 
-Date: 2022-1-19, Author: yt605155624.
-Add features to: T2S:
-- Add csmsc Tacotron2.
+Date: 2022-1-29, Author: yt605155624.
+Add features to: T2S:
+- Update aishell3 vc0 with new Tacotron2.
+- PRLink: https://github.com/PaddlePaddle/PaddleSpeech/pull/1419
+
+Date: 2022-1-29, Author: yt605155624.
+Add features to: T2S:
+- Add ljspeech Tacotron2.
+- PRLink: https://github.com/PaddlePaddle/PaddleSpeech/pull/1416
+
+Date: 2022-1-24, Author: yt605155624.
+Add features to: T2S:
+- Add csmsc WaveRNN.
+- PRLink: https://github.com/PaddlePaddle/PaddleSpeech/pull/1379
+
+Date: 2022-1-19, Author: yt605155624.
+Add features to: T2S:
+- Add csmsc Tacotron2.
 - PRLink: https://github.com/PaddlePaddle/PaddleSpeech/pull/1314
 
 
 Date: 2022-1-10, Author: Jackwaterveg.
-Add features to: CLI:
-- Support English (librispeech/asr1/transformer).
+Add features to: CLI:
+- Support English (librispeech/asr1/transformer).
 - Support choosing `decode_method` for conformer and transformer models.
 - Refactor the config, using the unified config.
 - PRLink: https://github.com/PaddlePaddle/PaddleSpeech/pull/1297
 
 ***
 
 Date: 2022-1-17, Author: Jackwaterveg.
-Add features to: CLI:
-- Support deepspeech2 online/offline model(aishell).
+Add features to: CLI:
+- Support deepspeech2 online/offline model(aishell).
 - PRLink: https://github.com/PaddlePaddle/PaddleSpeech/pull/1356
 
 ***

Diff for: README.md (+13, -7)

@@ -317,14 +317,15 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
 </tr>
 <tr>
 <td rowspan="4">Acoustic Model</td>
-<td >Tacotron2</td>
-<td rowspan="2" >LJSpeech</td>
+<td>Tacotron2</td>
+<td>LJSpeech / CSMSC</td>
 <td>
-<a href = "./examples/ljspeech/tts0">tacotron2-ljspeech</a>
+<a href = "./examples/ljspeech/tts0">tacotron2-ljspeech</a> / <a href = "./examples/csmsc/tts0">tacotron2-csmsc</a>
 </td>
 </tr>
 <tr>
 <td>Transformer TTS</td>
+<td>LJSpeech</td>
 <td>
 <a href = "./examples/ljspeech/tts1">transformer-ljspeech</a>
 </td>
@@ -344,7 +345,7 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
 </td>
 </tr>
 <tr>
-<td rowspan="5">Vocoder</td>
+<td rowspan="6">Vocoder</td>
 <td >WaveFlow</td>
 <td >LJSpeech</td>
 <td>
@@ -378,7 +379,14 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
 <td>
 <a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a>
 </td>
-<tr>
+</tr>
+<tr>
+<td >WaveRNN</td>
+<td >CSMSC</td>
+<td>
+<a href = "./examples/csmsc/voc6">WaveRNN-csmsc</a>
+</td>
+</tr>
 <tr>
 <td rowspan="3">Voice Cloning</td>
 <td>GE2E</td>
@@ -416,7 +424,6 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
 </tr>
 </thead>
 <tbody>
-
 <tr>
 <td>Audio Classification</td>
 <td>ESC-50</td>
@@ -440,7 +447,6 @@ PaddleSpeech supports a series of most popular models. They are summarized in [r
 </tr>
 </thead>
 <tbody>
-
 <tr>
 <td>Punctuation Restoration</td>
 <td>IWLST2012_zh</td>

Diff for: README_cn.md (+13, -8)

@@ -315,14 +315,15 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
 </tr>
 <tr>
 <td rowspan="4">声学模型</td>
-<td >Tacotron2</td>
-<td rowspan="2" >LJSpeech</td>
+<td>Tacotron2</td>
+<td>LJSpeech / CSMSC</td>
 <td>
-<a href = "./examples/ljspeech/tts0">tacotron2-ljspeech</a>
+<a href = "./examples/ljspeech/tts0">tacotron2-ljspeech</a> / <a href = "./examples/csmsc/tts0">tacotron2-csmsc</a>
 </td>
 </tr>
 <tr>
 <td>Transformer TTS</td>
+<td>LJSpeech</td>
 <td>
 <a href = "./examples/ljspeech/tts1">transformer-ljspeech</a>
 </td>
@@ -342,7 +343,7 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
 </td>
 </tr>
 <tr>
-<td rowspan="5">声码器</td>
+<td rowspan="6">声码器</td>
 <td >WaveFlow</td>
 <td >LJSpeech</td>
 <td>
@@ -376,7 +377,14 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
 <td>
 <a href = "./examples/csmsc/voc5">HiFiGAN-csmsc</a>
 </td>
-<tr>
+</tr>
+<tr>
+<td >WaveRNN</td>
+<td >CSMSC</td>
+<td>
+<a href = "./examples/csmsc/voc6">WaveRNN-csmsc</a>
+</td>
+</tr>
 <tr>
 <td rowspan="3">声音克隆</td>
 <td>GE2E</td>
@@ -415,8 +423,6 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
 </tr>
 </thead>
 <tbody>
-
-
 <tr>
 <td>声音分类</td>
 <td>ESC-50</td>
@@ -440,7 +446,6 @@ PaddleSpeech 的 **语音合成** 主要包含三个模块:文本前端、声
 </tr>
 </thead>
 <tbody>
-
 <tr>
 <td>标点恢复</td>
 <td>IWLST2012_zh</td>

Diff for: docs/source/released_model.md (+3, -1)

@@ -1,3 +1,4 @@
+
 # Released Models
 
 ## Speech-to-Text Models
@@ -32,7 +33,8 @@ Language Model | Training Data | Token-based | Size | Descriptions
 ### Acoustic Models
 Model Type | Dataset| Example Link | Pretrained Models|Static Models|Size (static)
 :-------------:| :------------:| :-----: | :-----:| :-----:| :-----:
-Tacotron2|LJSpeech|[tacotron2-vctk](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts0)|[tacotron2_ljspeech_ckpt_0.3.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_ljspeech_ckpt_0.3.zip)|||
+Tacotron2|LJSpeech|[tacotron2-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts0)|[tacotron2_ljspeech_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_ljspeech_ckpt_0.2.0.zip)|||
+Tacotron2|CSMSC|[tacotron2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts0)|[tacotron2_csmsc_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_csmsc_ckpt_0.2.0.zip)|[tacotron2_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_csmsc_static_0.2.0.zip)|94.95MB|
 TransformerTTS| LJSpeech| [transformer-ljspeech](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/ljspeech/tts1)|[transformer_tts_ljspeech_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/transformer_tts/transformer_tts_ljspeech_ckpt_0.4.zip)|||
 SpeedySpeech| CSMSC | [speedyspeech-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts2) |[speedyspeech_nosil_baker_ckpt_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_nosil_baker_ckpt_0.5.zip)|[speedyspeech_nosil_baker_static_0.5.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/speedyspeech/speedyspeech_nosil_baker_static_0.5.zip)|12MB|
 FastSpeech2| CSMSC |[fastspeech2-csmsc](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/csmsc/tts3)|[fastspeech2_nosil_baker_ckpt_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_baker_ckpt_0.4.zip)|[fastspeech2_nosil_baker_static_0.4.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/fastspeech2/fastspeech2_nosil_baker_static_0.4.zip)|157MB|

Diff for: docs/source/tts/quick_start_cn.md (+1, -1)

@@ -202,4 +202,4 @@ sf.write(
     audio_path,
     wav.numpy(),
     samplerate=fastspeech2_config.fs)
-```
+```

(The visible line content is identical; the change appears to be end-of-file whitespace, e.g. a trailing newline.)

Diff for: examples/aishell3/vc0/README.md (+1, -2)

@@ -1,4 +1,3 @@
-
 # Tacotron2 + AISHELL-3 Voice Cloning
 This example contains code used to train a [Tacotron2](https://arxiv.org/abs/1712.05884) model with [AISHELL-3](http://www.aishelltech.com/aishell_3). The trained model can be used in Voice Cloning Task, We refer to the model structure of [Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis](https://arxiv.org/pdf/1806.04558.pdf). The general steps are as follows:
 1. Speaker Encoder: We use Speaker Verification to train a speaker encoder. Datasets used in this task are different from those used in `Tacotron2` because the transcriptions are not needed, we use more datasets, refer to [ge2e](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/ge2e).
@@ -17,7 +16,7 @@ mkdir data_aishell3
 tar zxvf data_aishell3.tgz -C data_aishell3
 ```
 ### Get MFA Result and Extract
-We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get durations for aishell3_fastspeech2.
+We use [MFA2.x](https://github.com/MontrealCorpusTools/Montreal-Forced-Aligner) to get phonemes for Tacotron2, the durations of MFA are not needed here.
 You can download from here [aishell3_alignment_tone.tar.gz](https://paddlespeech.bj.bcebos.com/MFA/AISHELL-3/with_tone/aishell3_alignment_tone.tar.gz), or train your MFA model reference to [mfa example](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/mfa) (use MFA1.x now) of our repo.
 
 ## Pretrained GE2E Model

Diff for: examples/aishell3/vc0/path.sh (+1, -1)

@@ -9,5 +9,5 @@ export PYTHONDONTWRITEBYTECODE=1
 export PYTHONIOENCODING=UTF-8
 export PYTHONPATH=${MAIN_ROOT}:${PYTHONPATH}
 
-MODEL=new_tacotron2
+MODEL=tacotron2
 export BIN_DIR=${MAIN_ROOT}/paddlespeech/t2s/exps/${MODEL}
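For context, `path.sh` builds the recipe's entry-point directory from `MODEL`, so this rename changes which `exps` subdirectory every script resolves to. A minimal sketch of that derivation (the `MAIN_ROOT` value below is a placeholder, not the repo's actual checkout path):

```shell
#!/usr/bin/env bash
# Sketch: how path.sh derives BIN_DIR from MODEL.
# MAIN_ROOT is a placeholder here; in the recipe it points at the PaddleSpeech checkout.
MAIN_ROOT=/opt/PaddleSpeech

MODEL=tacotron2   # was: new_tacotron2 before this commit
BIN_DIR=${MAIN_ROOT}/paddlespeech/t2s/exps/${MODEL}
echo "${BIN_DIR}"   # prints: /opt/PaddleSpeech/paddlespeech/t2s/exps/tacotron2
```

After the rename, `BIN_DIR` points at `paddlespeech/t2s/exps/tacotron2`, matching the renamed module directory.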

Diff for: examples/aishell3/vc1/README.md (-1)

@@ -1,4 +1,3 @@
-
 # FastSpeech2 + AISHELL-3 Voice Cloning
 This example contains code used to train a [FastSpeech2](https://arxiv.org/abs/2006.04558) model with [AISHELL-3](http://www.aishelltech.com/aishell_3). The trained model can be used in Voice Cloning Task, We refer to the model structure of [Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis](https://arxiv.org/pdf/1806.04558.pdf). The general steps are as follows:
 1. Speaker Encoder: We use Speaker Verification to train a speaker encoder. Datasets used in this task are different from those used in `FastSpeech2` because the transcriptions are not needed, we use more datasets, refer to [ge2e](https://github.com/PaddlePaddle/PaddleSpeech/tree/develop/examples/other/ge2e).

Diff for: examples/csmsc/tts0/README.md (+2)

@@ -212,6 +212,8 @@
 Pretrained Tacotron2 model with no silence in the edge of audios:
 - [tacotron2_csmsc_ckpt_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_csmsc_ckpt_0.2.0.zip)
 
+The static model can be downloaded here [tacotron2_csmsc_static_0.2.0.zip](https://paddlespeech.bj.bcebos.com/Parakeet/released_models/tacotron2/tacotron2_csmsc_static_0.2.0.zip).
+
 
 Model | Step | eval/loss | eval/l1_loss | eval/mse_loss | eval/bce_loss| eval/attn_loss
 :-------------:| :------------:| :-----: | :-----: | :--------: |:--------:|:---------:

Diff for: examples/csmsc/tts0/local/synthesize_e2e.sh (+1)

@@ -7,6 +7,7 @@ ckpt_name=$3
 stage=0
 stop_stage=0
 
+# TODO: the dygraph-to-static output of tacotron2 is not as loud as the static-graph result; some function in decode may be misaligned between dynamic and static graphs
 if [ ${stage} -le 0 ] && [ ${stop_stage} -ge 0 ]; then
     FLAGS_allocator_strategy=naive_best_fit \
     FLAGS_fraction_of_gpu_memory_to_use=0.01 \

Diff for: examples/csmsc/tts0/path.sh (+1, -1)

@@ -9,5 +9,5 @@ export PYTHONDONTWRITEBYTECODE=1
 export PYTHONIOENCODING=UTF-8
 export PYTHONPATH=${MAIN_ROOT}:${PYTHONPATH}
 
-MODEL=new_tacotron2
+MODEL=tacotron2
 export BIN_DIR=${MAIN_ROOT}/paddlespeech/t2s/exps/${MODEL}

Diff for: examples/csmsc/tts0/run.sh (+5)

@@ -35,3 +35,8 @@ if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
     # synthesize_e2e, vocoder is pwgan
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
+
+if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
+    # inference with static model
+    CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1
+fi

Diff for: examples/csmsc/tts3/run.sh (+5)

@@ -36,3 +36,8 @@ if [ ${stage} -le 3 ] && [ ${stop_stage} -ge 3 ]; then
     CUDA_VISIBLE_DEVICES=${gpus} ./local/synthesize_e2e.sh ${conf_path} ${train_output_path} ${ckpt_name} || exit -1
 fi
 
+if [ ${stage} -le 4 ] && [ ${stop_stage} -ge 4 ]; then
+    # inference with static model
+    CUDA_VISIBLE_DEVICES=${gpus} ./local/inference.sh ${train_output_path} || exit -1
+fi
+
