repetition_penalty detuning and degrading the audio #45
Comments
Thanks for the reminder. We adjusted the repetition penalty in the early 0.5B checkpoints because they were producing repetitive outputs. I'll try your suggestion soon and see if it makes sense to turn it off.
We have found that setting repetition_penalty to 1.0 makes the generation results better, and we have updated the script to make this the default setting. Thank you!
Sorry to revive this, but I've been noticing that the generations are less creative without the repetition_penalty, which I suppose makes sense. Do you think this might be remedied by increasing the temperature or top_p, or maybe by decreasing the guidance? I'll keep experimenting; this model is occupying all my time, it's just so good!
How many samples did you evaluate on?
I've only recently been able to generate longer samples between 1:00 and 1:30, so I've been able to complete maybe 5-10 tracks at that length. I'm randomizing the prompts, so it's very unscientific, just a feeling I had. Edit: I'll try to do some controlled tests w/ a fixed key tomorrow. :)
We ran a 20-sample A/B test: rp=1.0 is generally more stable, but the backing track sometimes sounds repetitive. We will change the default rp to 1.1 and add --rp to the args.
Oh, you've really tested this, that would melt my computer! Thanks for your time! 🍻
I've been trying to tune the parameters to my taste and have had incredible luck with this setup:
Apparently top_k+top_p is magic sauce, at least for me... I suspect that reducing the model's choices to a few good ones helps stabilize the generation. The high temperature increases diversity without using repetition_penalty. |
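As a sketch of why that combination can stabilize sampling: top_k first caps the candidate set to the k highest-scoring tokens, then top_p (nucleus) trims it further to the smallest subset covering most of the probability mass. The function name and the example thresholds below are illustrative, not YuE's actual defaults:

```python
import math

def top_k_top_p_filter(logits, top_k=50, top_p=0.93):
    """Return the token indices that survive top_k then top_p filtering.

    top_k keeps only the k highest logits; top_p then keeps the smallest
    prefix (in probability order) whose cumulative probability >= top_p.
    """
    # Sort token indices by logit, highest first, and apply the top_k cap.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    kept = order[:top_k]

    # Softmax over the surviving tokens (shift by max for stability).
    m = max(logits[i] for i in kept)
    exps = [math.exp(logits[i] - m) for i in kept]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Nucleus cut: accumulate probability until top_p is reached.
    nucleus, cum = [], 0.0
    for idx, p in zip(kept, probs):
        nucleus.append(idx)
        cum += p
        if cum >= top_p:
            break
    return nucleus
```

With a very peaked distribution the nucleus can collapse to a single token, which is the "few good choices" effect described above; a high temperature (applied to the logits before this step) then re-spreads probability among those survivors.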
Interesting. Thank you for spending time tuning this and sharing your suggested setup. Let us test it~ @hf-lin
Oh cool, I hope it's reproducible since my inference setup is completely hacked apart at this point.
Edit: I'm realizing I may have had too much caffeine, sorry for the crazy posts!
Using YuE Exllamav2, I'm running batches of 40 generations per prompt, ~45 seconds of music each, and these settings made a huge difference in making the outcome more structured and predictable (i.e. usable).
First, thank you for creating this wonderful model, I've been waiting for something like this since Suno/Udio launched last year. It's just amazing!
While tinkering with it, I've noticed repetition_penalty tends to put instruments out of tune and reduce the overall clarity. Repeating chords or bass lines will drift up and down, presumably to avoid repeating tokens too much. Setting repetition_penalty=1 fixes this for me.
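The drift is consistent with how the standard CTRL-style penalty (used by most HF-like samplers) works: any token id that has already appeared gets its logit divided by the penalty (or multiplied, if negative), so a sustained chord token is pushed down a little every step until a neighboring, slightly out-of-tune token wins. A minimal sketch, not YuE's exact code:

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.1):
    """CTRL-style repetition penalty.

    Every token id already present in generated_ids has its score damped:
    positive logits are divided by `penalty`, negative ones multiplied.
    penalty=1.0 is a no-op, which is why it stops the detuning.
    """
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty   # shrink positive scores
        else:
            out[tok] *= penalty   # make negative scores more negative
    return out
```

Note the penalty is applied once per step regardless of how often the token appeared, but because a held note re-enters the history every frame, its score stays suppressed for as long as it sustains.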
If I understand the YuE architecture correctly, it's generating interleaved tokens to represent the 2 channels: [inst][vocal][inst][vocal][inst][vocal]
The two channels are speaking completely different "languages" as they use different codecs, but repetition_penalty has no concept of this, and treats them all as a single "sentence". That would be another reason to avoid using it.
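The cross-channel concern can be sketched in a few lines, assuming HF-style penalty logic over a single shared id space (the ids and the per-channel split below are toy values, not YuE's real codebooks):

```python
def apply_repetition_penalty(logits, history, penalty):
    # HF-style: damp every token id seen anywhere in the history.
    out = list(logits)
    for tok in set(history):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out

# Interleaved stream: even positions are instrumental codec tokens,
# odd positions are vocal codec tokens (toy ids).
history = [101, 7, 101, 7, 101, 9]

logits = [0.0] * 128
logits[7] = 3.0    # a vocal token we may legitimately want again
logits[101] = 3.0  # an instrumental token sharing the same id space

# Penalty over the whole interleaved history: both ids get damped,
# even though 101 only ever appeared on the instrumental channel.
leaked = apply_repetition_penalty(logits, history, penalty=1.3)

# A hypothetical fix: penalize each channel against its own history only.
vocal_history = history[1::2]  # odd positions -> [7, 7, 9]
per_channel = apply_repetition_penalty(logits, vocal_history, penalty=1.3)
# Here 101 keeps its original score; only the vocal ids are damped.
```

In other words, even if each channel's repetition were desirable to suppress, a flat penalty also punishes coincidental id collisions between the two codecs, which supports turning it off (or at least keeping it near 1.0) for interleaved streams.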
Anyways, I could be way off base here, just thought I'd share this little observation.