hoarse pronunciation #14

@Shenkailai

First of all, I would like to express my sincere gratitude to the authors. This is an excellent piece of work! I have used ConvNext_TTS, and its synthesis quality is impressive, with very fast inference speed.

I trained the model on roughly 300 hours of Chinese and English speech. However, the synthesized speech occasionally turns hoarse on individual words, and training longer does not seem to help: the problem persists after approximately 4M steps.

image
For example, the last word of this speech segment seems to lack properly generated harmonics.
baker_004.zip
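
To make "lacks properly generated harmonics" measurable rather than visual, here is a minimal sketch of a per-frame periodicity score based on normalized autocorrelation. It is not part of ConvNext_TTS; the function name `frame_periodicity`, the frame and hop sizes, and the 60 to 500 Hz pitch search range are all assumptions chosen for illustration.

```python
import numpy as np


def frame_periodicity(signal, sample_rate, frame_len=1024, hop=256,
                      f0_min=60.0, f0_max=500.0):
    """Peak of the normalized autocorrelation per frame, within the pitch lag range.

    Values near 1 suggest a strongly periodic (harmonic) frame; values near 0
    suggest a noise-like frame. All default parameters are illustrative assumptions.
    """
    signal = np.asarray(signal, dtype=np.float64)
    lag_min = int(sample_rate / f0_max)  # shortest pitch period, in samples
    lag_max = int(sample_rate / f0_min)  # longest pitch period, in samples
    if lag_max >= frame_len:
        raise ValueError("frame_len is too short for the requested f0_min")

    scores = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        frame = frame - frame.mean()
        energy = np.dot(frame, frame)
        if energy < 1e-10:
            # Silent frame: report zero periodicity instead of dividing by ~0.
            scores.append(0.0)
            continue
        # Autocorrelation at non-negative lags, scaled so that lag 0 equals 1.
        ac = np.correlate(frame, frame, mode="full")[frame_len - 1:] / energy
        scores.append(float(ac[lag_min:lag_max + 1].max()))
    return np.array(scores)
```

Running this on the synthesized waveform and on a ground-truth recording of the same sentence, then comparing the scores over the final word, would show whether the hoarse region really is less periodic. Any threshold used to flag frames would need to be tuned by ear on a few examples.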
