hoarse pronunciation

First of all, I would like to express my sincere gratitude to the authors. This is an excellent piece of work! I have used ConvNext_TTS, and its synthesis quality is impressive and its inference is very fast.

I trained the model on a roughly 300-hour dataset of mixed Chinese and English speech. However, the synthesized speech occasionally turns hoarse on individual words, and training for longer does not seem to resolve the issue: I've trained for approximately 4M steps, and the problem persists.

![image](https://github.com/user-attachments/assets/6ff6bbe7-e0c4-44de-9bd3-7f2a29995851)
For example, the last word of this speech segment seems to lack properly generated harmonics.
[baker_004.zip](https://github.com/user-attachments/files/17671959/baker_004.zip)
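
In case it helps to locate the affected frames, here is a rough sketch of how the weak harmonics could be flagged numerically. It is not part of ConvNext_TTS; the function name `frame_periodicity`, the 70-400 Hz pitch range, and the frame sizes are my own assumptions. It scores each frame by the peak of its normalized autocorrelation within the pitch-lag range, so a vowel with clear harmonics should score high and a hoarse or noisy frame low. Silence and unvoiced consonants also score low, so the score only means something on vowels.

```python
import numpy as np


def frame_periodicity(signal, sample_rate, frame_ms=40.0, hop_ms=10.0,
                      f0_min=70.0, f0_max=400.0):
    """Per-frame peak of the normalized autocorrelation in the pitch-lag range.

    Hypothetical diagnostic helper: values near 1 suggest strong periodicity
    (clear harmonics), values near 0 suggest noise-like or silent frames.
    """
    signal = np.asarray(signal, dtype=np.float64)
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    lag_min = int(sample_rate / f0_max)  # shortest pitch period, in samples
    lag_max = int(sample_rate / f0_min)  # longest pitch period, in samples

    scores = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frame = signal[start:start + frame_len]
        frame = frame - frame.mean()
        energy = np.dot(frame, frame)
        if energy < 1e-10:
            # Treat (near-)silent frames as non-periodic.
            scores.append(0.0)
            continue
        # Keep non-negative lags only, then normalize so lag 0 equals 1.
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:] / energy
        scores.append(float(ac[lag_min:lag_max + 1].max()))
    return np.array(scores)


# Synthetic check: a harmonic tone versus white noise (not real model output).
sr = 16000
t = np.arange(sr) / sr
voiced = sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 6))
noise = np.random.default_rng(0).standard_normal(sr)
print("voiced min score:", frame_periodicity(voiced, sr).min())
print("noise max score:", frame_periodicity(noise, sr).max())
```

Running this on the attached sample and on a reference recording of the same sentence, then comparing the scores over the last word, should show whether the drop in periodicity is limited to that word.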


hoarse pronunciation #14
