
repetition_penalty detuning and degrading the audio #45

Open
PolyVector opened this issue Feb 2, 2025 · 11 comments
Labels
enhancement New feature or request

Comments

@PolyVector

PolyVector commented Feb 2, 2025

First, thank you for creating this wonderful model. I've been waiting for something like this since Suno/Udio launched last year. It's just amazing!

While tinkering with it, I've noticed repetition_penalty tends to put instruments out of tune and reduce the overall clarity. Repeating chords or bass lines will drift up and down, presumably to avoid repeating tokens too much. Setting repetition_penalty=1 fixes this for me.

If I understand the YuE architecture correctly, it's generating interleaved tokens to represent the 2 channels: [inst][vocal][inst][vocal][inst][vocal]
The two channels are speaking completely different "languages" since they use different codecs, but repetition_penalty has no concept of this and treats them all as a single "sentence". That would be another reason to avoid using it.
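To make that concrete, here is a rough sketch of the usual HF-style repetition penalty; it walks the whole interleaved history with no notion of tracks (the function and tensor names are just illustrative, not YuE's actual code):

```python
import torch

def naive_repetition_penalty(logits: torch.Tensor, history: torch.LongTensor, penalty: float) -> torch.Tensor:
    # `history` is the full interleaved sequence [inst, vocal, inst, vocal, ...];
    # every token id seen anywhere in it gets damped, with no notion of which
    # track it came from.
    score = logits.gather(-1, history)
    score = torch.where(score < 0, score * penalty, score / penalty)
    return logits.scatter(-1, history, score)
```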

Anyways, I could be way off base here, just thought I'd share this little observation.

@a43992899
Collaborator

Thanks for the reminder. We adjusted the repetition penalty in the early 0.5B checkpoints because they were producing repetitive outputs. I’ll try your suggestion soon and see if it makes sense to turn it off.

@hf-lin
Collaborator

hf-lin commented Feb 3, 2025

We have found that setting repetition_penalty to 1.0 makes the generation results better. We have updated the script to make this the default setting. Thank you!

@hf-lin hf-lin closed this as completed Feb 3, 2025
@PolyVector
Author

Sorry to revive this, but I've been noticing that the generations are less creative without the repetition_penalty, which I suppose makes sense. Do you think this might be remedied by increasing the temperature or top_p, or maybe by decreasing the guidance?

I'll keep experimenting, this model is occupying all my time, it's just so good!

@hf-lin
Collaborator

hf-lin commented Feb 5, 2025

How many samples did you evaluate on?

@PolyVector
Author

PolyVector commented Feb 5, 2025

I've only recently been able to generate longer samples between 1:00 and 1:30, so I've completed maybe 5-10 tracks at that length. I'm randomizing the prompts, so it's very unscientific; just a feeling I had.

Edit: I'll try to do some controlled tests w/ a fixed key tomorrow. :)

@a43992899
Collaborator

We did a 20-sample A/B test:

  • rp=1.0 is generally more stable, but the backing track sometimes sounds repetitive.
  • rp=1.2 results in a less repetitive backing-track pattern, but produces some bad cases with silence.

We will change the default rp to 1.1 and add --rp to the args.
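For anyone wiring this up locally, a minimal sketch of what exposing that knob could look like (the flag name, default, and plumbing here are assumptions, not the actual YuE script):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--repetition_penalty", type=float, default=1.1,
                    help="Repetition penalty for stage-1 generation; 1.0 disables it.")
args = parser.parse_args()

# ...later forwarded to the sampling call, e.g.:
# model.generate(..., repetition_penalty=args.repetition_penalty)
```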

@a43992899 a43992899 reopened this Feb 5, 2025
@a43992899 a43992899 added the enhancement New feature or request label Feb 5, 2025
@PolyVector
Author

Oh you've really tested this, that would melt my computer! Thanks for your time! 🍻

@PolyVector
Author

PolyVector commented Feb 8, 2025

I've been trying to tune the parameters to my taste and have had incredible luck with this setup:

  • top_k = 4
  • temperature = ~1.25
  • top_p = ~0.9
  • repetition_penalty = 1

Apparently top_k+top_p is magic sauce, at least for me... I suspect that reducing the model's choices to a few good ones helps stabilize the generation. The high temperature increases diversity without using repetition_penalty.
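For reference, a rough sketch of how these settings map onto a standard transformers sampling call (the model/input names and token budget are placeholders; YuE's own script may expose different flags):

```python
import torch

def sample_stage1(model, input_ids: torch.LongTensor) -> torch.LongTensor:
    # `model` is assumed to be the stage-1 LM loaded via transformers.
    return model.generate(
        input_ids,
        do_sample=True,
        top_k=4,                 # keep only the 4 most likely tokens each step
        top_p=0.9,               # nucleus filtering on top of that
        temperature=1.25,        # higher temperature restores diversity
        repetition_penalty=1.0,  # i.e. the penalty is effectively off
        max_new_tokens=3000,     # placeholder length
    )
```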

@a43992899
Collaborator

a43992899 commented Feb 9, 2025

Interesting. Thank you for spending time tuning this and sharing the suggested setup. Let us test it~ @hf-lin

@PolyVector
Author

PolyVector commented Feb 9, 2025

Oh cool, I hope it's reproducible since my inference setup is completely hacked apart at this point.

I'm not sure that top_k is entirely necessary; maybe if top_p were perfectly tuned it would accomplish the same thing? Then again, it probably is required because the high temperature skews the distribution.

Feel free to ignore this, but in case anyone is interested, I threw together a LogitsProcessor for repetition penalty that's YuE-friendly. It only looks at tokens on the same track, so instruments don't affect vocals. I haven't had time to actually test it yet. I assume that as it looks back across song sections, the tokens won't be guaranteed to stay on the same even/odd track, so it would require a padding token for any lyric section with an odd number of tokens.
Edit: I'll remove the code since there's no way it works correctly; it should be trivial to implement for someone who knows what they're doing. 🤷

Edit: I'm realizing I may have had too much caffeine, sorry for the crazy posts!
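For anyone interested, here is a rough, untested sketch of that same-track idea as a transformers LogitsProcessor; it assumes the even/odd interleave lines up with sequence position, which, as noted above, isn't guaranteed across sections:

```python
import torch
from transformers import LogitsProcessor

class TrackAwareRepetitionPenalty(LogitsProcessor):
    """Penalize only tokens previously emitted on the same track, assuming a strict
    [inst][vocal][inst][vocal] interleave with no parity flips between sections."""

    def __init__(self, penalty: float):
        self.penalty = penalty

    def __call__(self, input_ids: torch.LongTensor, scores: torch.FloatTensor) -> torch.FloatTensor:
        # The token being sampled sits at position input_ids.shape[1], so its track
        # is that position's parity; gather the history with stride 2 to stay on-track.
        parity = input_ids.shape[1] % 2
        same_track = input_ids[:, parity::2]
        if same_track.numel() == 0:
            return scores
        score = scores.gather(1, same_track)
        score = torch.where(score < 0, score * self.penalty, score / self.penalty)
        return scores.scatter(1, same_track, score)
```

It would be slotted in through a LogitsProcessorList alongside the usual sampling warpers.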

@WrongProtocol

WrongProtocol commented Feb 9, 2025

I've been trying to tune the parameters to my taste and have had incredible luck with this setup:

  • top_k = 4
  • temperature = ~1.25
  • top_p = ~0.9
  • repetition_penalty = 1

Apparently top_k+top_p is magic sauce, at least for me... I suspect that reducing the model's choices to a few good ones helps stabilize the generation. The high temperature increases diversity without using repetition_penalty.

Using YuE Exllamav2, I'm running batches of 40 generations per prompt, ~45 seconds of music each, and these settings made a huge difference in making the outcome more structured and predictable (i.e. usable).
