First 3 words getting cut off #21

Open
iskng opened this issue Jan 22, 2025 · 5 comments

Comments

iskng commented Jan 22, 2025

Try this sentence with any voice.

We have entered a moment in history where artificial intelligence stands poised to redefine our very conception of progress.

@lucasjinreal (Owner)

Some styles might actually have issues, while others do not. I think this might be caused by the Kokoro model's training dataset for certain speakers.

Can you take a deeper look at the different speakers?

iskng (Author) commented Jan 23, 2025

The first complete word heard from the test sentence, per voice (shown as position in the sentence, then the word):
"We have entered a moment in history where artificial intelligence stands poised to redefine our very conception of progress."

af: 6 history
af_bella: 6 history
af_nicole: 3 a
af_sarah: 5 in
af_sky: 4 moment
am_adam: 4 moment
am_michael: 4 moment
bf_emma: 2 entered
bf_isabella: 2 entered
bm_george: 2 entered
bm_lewis: 3 a

@lucasjinreal (Owner)

Are the first several words muted, or just at a relatively low volume? Are all sentences like this?

mrorigo (Contributor) commented Jan 28, 2025

I tried a hack in ort_koko.rs: I prepended the token sequence with 3 'silent' tokens, and it seems to get rid of this problem:

    pub fn infer(
        &self,
        mut tokens: Vec<Vec<i64>>,
        styles: Vec<Vec<f32>>,
    ) -> Result<ArrayBase<OwnedRepr<f32>, IxDyn>, Box<dyn std::error::Error>> {
        // Prepend three 'silent' tokens to the first sequence so the
        // model has some lead-in before the real speech starts.
        for _ in 0..3 {
            tokens[0].insert(0, 30); // token 30 seems to be kinda silent..
        }
        // ... rest of the inference is unchanged

@lucasjinreal I don't know if this is a viable solution; it works for me, but it might introduce a short pause when inferring multiple sentences.
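
If that pause shows up, one option (just a sketch, not tested against this repo: it assumes the prepended tokens map to a roughly fixed number of leading output samples, and `samples_per_token` is a hypothetical constant that would depend on the model's hop length and sample rate) is to drop the corresponding lead-in from the generated audio:

    /// Drop the audio produced by the prepended silent tokens.
    /// `samples_per_token` is an assumption here; the real value
    /// depends on the Kokoro model's frame/hop size.
    fn trim_lead_in(audio: &mut Vec<f32>, padded_tokens: usize, samples_per_token: usize) {
        let n = (padded_tokens * samples_per_token).min(audio.len());
        audio.drain(..n);
    }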

@lucasjinreal (Owner)

@mrorigo This is a solution; however, I would suggest adding the tokens outside of ort_koko, since it is meant for universal inference only. The tokens can be prepended outside just as well.

Would you consider making a PR for the feature?
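
A minimal sketch of what that could look like at the call site (the helper name and the `tokenize`/`infer` calls are hypothetical, not this repo's actual API; token id 30 and the count of 3 are carried over from the hack above, not verified constants of the Kokoro vocabulary):

    /// Prepend a few near-silent tokens to the first sentence, so
    /// ort_koko's infer() stays a plain, universal inference wrapper.
    fn pad_leading_silence(tokens: &mut Vec<Vec<i64>>, silent_token: i64, count: usize) {
        if let Some(first) = tokens.first_mut() {
            for _ in 0..count {
                first.insert(0, silent_token);
            }
        }
    }

    // Hypothetical usage at the call site:
    // let mut tokens = tokenize(text);
    // pad_leading_silence(&mut tokens, 30, 3);
    // let audio = model.infer(tokens, styles)?;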
