
repetition_penalty detuning and degrading the audio #45

Open
PolyVector opened this issue Feb 2, 2025 · 11 comments
Labels
enhancement New feature or request

Comments

@PolyVector

PolyVector commented Feb 2, 2025

First, thank you for creating this wonderful model. I've been waiting for something like this since Suno/Udio launched last year. It's just amazing!

While tinkering with it, I've noticed repetition_penalty tends to put instruments out of tune and reduce the overall clarity. Repeating chords or bass lines will drift up and down, presumably to avoid repeating tokens too much. Setting repetition_penalty=1 fixes this for me.

If I understand the YuE architecture correctly, it's generating interleaved tokens to represent the 2 channels: [inst][vocal][inst][vocal][inst][vocal]
The two channels are speaking completely different "languages" since they use different codecs, but repetition_penalty has no concept of this and treats them all as a single "sentence". That would be another reason to avoid using it.
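To make that concrete, here is a rough sketch of the usual HF-style repetition penalty; it walks the whole interleaved history with no notion of tracks (the function and tensor names are just illustrative, not YuE's actual code):

```python
import torch

def naive_repetition_penalty(logits: torch.Tensor, history: torch.LongTensor, penalty: float) -> torch.Tensor:
    # `history` is the full interleaved sequence [inst, vocal, inst, vocal, ...];
    # every token id seen anywhere in it gets damped, with no notion of which
    # track it came from.
    score = logits.gather(-1, history)
    score = torch.where(score < 0, score * penalty, score / penalty)
    return logits.scatter(-1, history, score)
```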

Anyways, I could be way off base here, just thought I'd share this little observation.

@a43992899
Collaborator

Thanks for the reminder. We adjusted the repetition penalty in the early 0.5B checkpoints because they were producing repetitive outputs. I’ll try your suggestion soon and see if it makes sense to turn it off.

@hf-lin
Collaborator

hf-lin commented Feb 3, 2025

We have found that setting repetition_penalty to 1.0 makes the generation results better. We have updated the script to make this the default setting. Thank you!

@hf-lin hf-lin closed this as completed Feb 3, 2025
@PolyVector
Author

Sorry to revive this, but I've been noticing that the generations are less creative without the repetition_penalty, which I suppose makes sense. Do you think this might be remedied by increasing the temperature or top_p, or maybe by decreasing the guidance?

I'll keep experimenting, this model is occupying all my time, it's just so good!

@hf-lin
Collaborator

hf-lin commented Feb 5, 2025

How many samples did you evaluate on?

@PolyVector
Author

PolyVector commented Feb 5, 2025

I've only recently been able to generate longer samples between 1:00 and 1:30, so I've completed maybe 5-10 tracks at that length. I'm randomizing the prompts, so it's very unscientific; just a feeling I had.

Edit: I'll try to do some controlled tests w/ a fixed key tomorrow. :)

@a43992899
Collaborator

We did a 20-sample A/B test:

  • rp=1.0 is generally more stable, but the backing track sometimes sounds repetitive.
  • rp=1.2 results in a less repetitive backing-track pattern, but produces some bad cases with silence.

We will change the default rp to 1.1 and add --rp to the args.
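For anyone wiring this up locally, a minimal sketch of what exposing that knob could look like (the flag name, default, and plumbing here are assumptions, not the actual YuE script):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--repetition_penalty", type=float, default=1.1,
                    help="Repetition penalty for stage-1 generation; 1.0 disables it.")
args = parser.parse_args()

# ...later forwarded to the sampling call, e.g.:
# model.generate(..., repetition_penalty=args.repetition_penalty)
```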

@a43992899 a43992899 reopened this Feb 5, 2025
@a43992899 a43992899 added the enhancement New feature or request label Feb 5, 2025
@PolyVector
Author

Oh you've really tested this, that would melt my computer! Thanks for your time! 🍻

@PolyVector
Author

PolyVector commented Feb 8, 2025

I've been trying to tune the parameters to my taste and have had incredible luck with this setup:

  • top_k = 4
  • temperature = ~1.25
  • top_p = ~0.9
  • repetition_penalty = 1

Apparently top_k+top_p is magic sauce, at least for me... I suspect that reducing the model's choices to a few good ones helps stabilize the generation. The high temperature increases diversity without using repetition_penalty.
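For reference, a rough sketch of how these settings map onto a standard transformers sampling call (the model/input names and token budget are placeholders; YuE's own script may expose different flags):

```python
import torch

def sample_stage1(model, input_ids: torch.LongTensor) -> torch.LongTensor:
    # `model` is assumed to be the stage-1 LM loaded via transformers.
    return model.generate(
        input_ids,
        do_sample=True,
        top_k=4,                 # keep only the 4 most likely tokens each step
        top_p=0.9,               # nucleus filtering on top of that
        temperature=1.25,        # higher temperature restores diversity
        repetition_penalty=1.0,  # i.e. the penalty is effectively off
        max_new_tokens=3000,     # placeholder length
    )
```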

@a43992899
Collaborator

a43992899 commented Feb 9, 2025

Interesting. Thank you for spending time tuning this and sharing the suggested setup. Let us test it~ @hf-lin

@PolyVector
Author

PolyVector commented Feb 9, 2025

Oh cool, I hope it's reproducible since my inference setup is completely hacked apart at this point.

I'm not sure that top_k is entirely necessary; maybe if top_p were perfectly tuned it would accomplish the same thing? Then again, it probably is required because the high temperature skews the distribution.

Feel free to ignore this, but in case anyone is interested, I threw together a LogitsProcessor for repetition penalty that's YuE-friendly. It only looks at tokens on the same track, so instruments don't affect vocals. I haven't had time to actually test it yet. I assume that as it looks back across song sections, the tokens won't be guaranteed to stay on the same even/odd track, so it would require a padding token for any lyric section with an odd number of tokens.
Edit: I'll remove the code since there's no way it works correctly; it should be trivial to implement for someone who knows what they're doing. 🤷

Edit: I'm realizing I may have had too much caffeine, sorry for the crazy posts!
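For anyone interested, here is a rough, untested sketch of that same-track idea as a transformers LogitsProcessor; it assumes the even/odd interleave lines up with sequence position, which, as noted above, isn't guaranteed across sections:

```python
import torch
from transformers import LogitsProcessor

class TrackAwareRepetitionPenalty(LogitsProcessor):
    """Penalize only tokens previously emitted on the same track, assuming a strict
    [inst][vocal][inst][vocal] interleave with no parity flips between sections."""

    def __init__(self, penalty: float):
        self.penalty = penalty

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        # The token being sampled sits at position input_ids.shape[1], so its track
        # is that position's parity; gather the history with stride 2 to stay on-track.
        parity = input_ids.shape[1] % 2
        same_track = input_ids[:, parity::2]
        if same_track.numel() == 0:
            return scores
        score = scores.gather(1, same_track)
        score = torch.where(score < 0, score * self.penalty, score / self.penalty)
        return scores.scatter(1, same_track, score)
```

It would be slotted in through a LogitsProcessorList alongside the usual sampling warpers.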

@WrongProtocol

WrongProtocol commented Feb 9, 2025

I've been trying to tune the parameters to my taste and have had incredible luck with this setup:

  • top_k = 4
  • temperature = ~1.25
  • top_p = ~0.9
  • repetition_penalty = 1

Apparently top_k+top_p is magic sauce, at least for me... I suspect that reducing the model's choices to a few good ones helps stabilize the generation. The high temperature increases diversity without using repetition_penalty.

Using YuE Exllamav2, I'm running batches of 40 generations per prompt, ~45 seconds of music each, and these settings made a huge difference in making the outcome more structured and predictable (i.e. usable).
