
Faster generation times #38

Open
bluenucleus opened this issue Jan 31, 2025 · 6 comments

Comments

@bluenucleus

Inference is taking too long. Are there any plans to optimise the processing times? Has anyone managed to bring generation times down? An A100 80GB takes up to 8-13 minutes for a 30-second clip. That's very expensive for a mono model.

@a43992899
Collaborator

a43992899 commented Jan 31, 2025

You can increase --stage2_batch_size to speed up inference, since you have plenty of VRAM.

Try --stage2_batch_size 16.

Issue #8 is also tracking work on quantization.

@alisson-anjos

alisson-anjos commented Jan 31, 2025

I think that if you use the NF4 models you can easily increase the batch size to 16. I'm going to run that experiment to see whether using NF4, which takes up less VRAM, makes it possible to raise the batch size compared with BF16.
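For anyone trying this, here is a minimal sketch (not the repo's actual loading code, and the checkpoint name is only a placeholder) of loading a model in NF4 with transformers and bitsandbytes, which is what frees up the VRAM headroom for a larger --stage2_batch_size:

```python
# Hypothetical NF4 loading sketch; the checkpoint id below is a placeholder,
# not necessarily the weights the YuE pipeline actually loads.
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",               # NF4 4-bit quantization
    bnb_4bit_compute_dtype=torch.bfloat16,   # run the matmuls in BF16
    bnb_4bit_use_double_quant=True,          # quantize the quantization constants too
)

model = AutoModelForCausalLM.from_pretrained(
    "m-a-p/YuE-s1-7B-anneal-en-cot",         # placeholder model id
    quantization_config=bnb_config,
    device_map="auto",
)
```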

@austin2035

austin2035 commented Feb 1, 2025

There is little point in optimizing only stage 2, since inference in stage 1 is very time-consuming and the GPU is not fully utilized at all.

tvararu added a commit to tvararu/YuE that referenced this issue Feb 1, 2025
According to multimodal-art-projection/YuE#38,
this could help with inference times.
@alisson-anjos

There was someone who made a fork, added the option of using SDPA instead of FlashAttention, and applied a patch to transformers to double the speed.

https://github.com/deepbeepmeep/YuEGP
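For context, the SDPA part of that is just a load-time switch in transformers; a minimal sketch (the checkpoint id is a placeholder, not necessarily what YuE loads):

```python
# Hypothetical sketch: select PyTorch's scaled_dot_product_attention (SDPA)
# backend instead of FlashAttention when loading the model with transformers.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "m-a-p/YuE-s1-7B-anneal-en-cot",   # placeholder model id
    torch_dtype=torch.bfloat16,
    attn_implementation="sdpa",        # instead of "flash_attention_2"
    device_map="auto",
)
```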

@jrked

jrked commented Feb 2, 2025

> There was someone who made a fork, added the option of using SDPA instead of FlashAttention, and applied a patch to transformers to double the speed.
>
> https://github.com/deepbeepmeep/YuEGP

Interesting, thanks for sharing this!

@Mozer

Mozer commented Feb 2, 2025

There's also SageAttention (using Triton), which is 2x faster than FlashAttention. Maybe someone can implement it:
thu-ml/SageAttention#21 (comment)
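If the model is running on the SDPA backend, SageAttention's README suggests a drop-in monkey-patch; a minimal sketch (assuming the sageattention package is installed, and untested against YuE):

```python
# Hypothetical sketch: globally replace PyTorch's SDPA kernel with
# SageAttention's sageattn, so any module that calls
# F.scaled_dot_product_attention picks it up. Apply before loading the model.
import torch.nn.functional as F
from sageattention import sageattn

F.scaled_dot_product_attention = sageattn
```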
