Harshness of speech

Hi there,

Thank you all for the great work on Optispeech. @[w11wo](https://github.com/w11wo) and I have been getting some great results.

I have a sound engineering background, and I’ve got a good idea of the different kinds of issues related to voice quality. I’d like to help and contribute from that perspective.

Optispeech doesn’t have the usual artefacts we’ve heard before, which has been great.

However, I’ve noticed a “harshness” in our trained voices. The Audacity spectrogram screenshot below shows this: the bottom track is the original recorded voice, and the top track is the Optispeech output.

The screenshot highlights one example of an “s” (sibilant) sound. In the original, there is a gentle rise and fall of high-frequency energy from about 4 to 18 kHz. In the Optispeech output, the sibilance is loud across the whole spectrum, with an abrupt start and stop. As a result, the sibilants stand out audibly in the speech.
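In case it helps to put a number on what the spectrogram shows, here is a minimal sketch that measures, frame by frame, what fraction of the spectral energy falls in the 4 to 18 kHz band. This is not part of Optispeech; the function name `sibilance_band_ratio`, the frame and hop sizes, and the band edges as defaults are my own assumptions for illustration. It uses only NumPy and expects mono audio as a 1-D array.

```python
import numpy as np


def sibilance_band_ratio(signal, sample_rate, low_hz=4000.0, high_hz=18000.0,
                         frame_len=1024, hop=256):
    """Return, for each frame, the fraction of spectral energy in [low_hz, high_hz].

    Hypothetical helper for illustration only; the defaults are assumptions.
    """
    signal = np.asarray(signal, dtype=np.float64)
    if signal.ndim != 1:
        raise ValueError("expected a mono (1-D) signal")
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")

    window = np.hanning(frame_len)
    # Frequency (Hz) of each rFFT bin for this frame length.
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    band = (freqs >= low_hz) & (freqs <= high_hz)

    n_frames = 1 + (len(signal) - frame_len) // hop
    ratios = np.empty(n_frames)
    for i in range(n_frames):
        frame = signal[i * hop: i * hop + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        total = power.sum()
        # Silent frames have no energy at all; report 0 rather than divide by zero.
        ratios[i] = power[band].sum() / total if total > 0 else 0.0
    return ratios
```

Running this on the same sentence from the original recording and from the synthesized output, then comparing the two curves, should show both effects described above: a higher ratio during sibilants, and steeper frame-to-frame jumps (for example via `np.abs(np.diff(ratios))`) at their onsets and offsets. Note that the band is cut off at the Nyquist frequency, so audio sampled below 36 kHz cannot cover the full range up to 18 kHz.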

Please let me know if you have any questions or if there is any way that I can help the project in general.

![screenshot_2024-09-23_at_10 35 30___am](https://github.com/user-attachments/assets/bc2761ac-e3fb-4329-9d84-33a1d2bdf2e4)

