An exllamav2 implementation of YuE: 2-10x speedup and 8GB minimum VRAM #44
Comments
Can you share some benchmark numbers? For example, the time taken for 30 seconds of generation on a specific GPU, with and without quantization? So far, I'm getting 17 tok/sec on stage_1 with the defaults in the repo. Using vLLM gives me around 54 tok/sec on a 4090. I'm using n_segments = 3, so I hit a max context length of ~9000.
Speed on my 2080ti-22GB using the original bf16 model:
I updated the relevant info in the repo.
I am testing in a Colab notebook, and will update this comment with news.

EDIT: Cells so far:

Cell 1:
# Note: installing from PyPI uses the JIT version and requires nvcc + a compiler
!pip install exllamav2
# Make sure you have git-lfs installed (https://git-lfs.com)
# if you don't have root, see git-lfs/git-lfs#4134 (comment)
!sudo apt update

Cell 2:
Next step:
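A plausible Cell 2 (my guess at a continuation, not taken from the original comment; the repository URL and directory are placeholders) would finish the git-lfs setup started in Cell 1 and clone the fork:

```
# Hypothetical Cell 2: finish git-lfs setup and clone the exllamav2 fork
!sudo apt install -y git-lfs
!git lfs install
!git clone <fork-url>            # placeholder for the YuE-exllamav2 fork URL
%cd <repo-dir>                   # placeholder for the cloned directory
!pip install -r requirements.txt # assuming the repo ships a requirements file
```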
I can confirm that I tested that repo and saw significant performance improvements over the upstream solution. I recommend merging the changes made there.
I managed to get this to run on my laptop with a 6GB RTX 3060 Mobile using exl2-quantized models and a quantized k/v cache, and it seems to at least produce coherent output. Updated the page with perf stats for anyone interested, lol.
Oh boy. =)
I have the same issue. Interesting! ——— Great repo, thanks!
How do I fix this?
@rednessisaffair It seems you might have empty lyrics after preprocessing, and thus no output. Can you verify that your prompt works in the official pipeline?
I added the ICL model in exl2 quants (3, 4, 8 bpw): https://huggingface.co/Ftfyhh/YuE-s1-7B-anneal-en-icl-exl2
Hey guys, exllamav2 seems to be working well, saving memory and improving speed. It loses some musicality, but the output still sounds coherent.
I made a Gradio interface for this fork, a Docker image, and a template for RunPod: https://github.com/alisson-anjos/YuE-exllamav2-UI
@jrked Did you find a solution for this issue?
Nope.
Try adding more sections to lyrics.txt; 3-4 should be OK.
When we use the provided genre.txt, lyrics.txt, and MP3, it works, but when we supply a custom genre.txt, lyrics.txt, and MP3, we get this error.
Make sure your lyrics strictly follow the example format, for example:
RTX 4090 (24GB VRAM), 61GB RAM, base BF16 model.
Test 1, Stage 1: cache size 16384, cache mode Q4, 6 segments, 10 minutes 15 seconds
Test 2, Stage 1: cache size 16384, cache mode Q6, 6 segments, 9 minutes 35 seconds
Test 3, Stage 1: cache size 16384, cache mode Q8, 6 segments, 10 minutes 5 seconds
Test 4, Stage 1: cache size 16384, cache mode BF16, 6 segments, 9 minutes 51 seconds
Note: the cache mode probably had an impact on the VRAM used, but I didn't monitor that data. Next I need to quantize the models and test on the quantized versions. Audio files here
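For anyone wondering what those cache modes correspond to under the hood, here is a minimal sketch (my assumption about how the fork wires this up, not its actual code) using exllamav2's quantized cache classes; the model path is a hypothetical local directory:

```python
# Sketch: mapping the "cache mode" settings above onto exllamav2 cache classes.
# Assumes a recent exllamav2 release; the model path is hypothetical.
from exllamav2 import (
    ExLlamaV2, ExLlamaV2Config,
    ExLlamaV2Cache, ExLlamaV2Cache_Q4, ExLlamaV2Cache_Q6, ExLlamaV2Cache_Q8,
)

config = ExLlamaV2Config("models/YuE-s1-7B-anneal-en-cot")  # hypothetical path
model = ExLlamaV2(config)

cache_classes = {
    "BF16": ExLlamaV2Cache,    # full-precision k/v cache
    "Q4": ExLlamaV2Cache_Q4,   # 4-bit quantized k/v cache
    "Q6": ExLlamaV2Cache_Q6,
    "Q8": ExLlamaV2Cache_Q8,
}

# Cache size 16384, as in the tests above; lazy=True defers allocation
# so the model can be auto-split across available VRAM.
cache = cache_classes["Q4"](model, max_seq_len=16384, lazy=True)
model.load_autosplit(cache)
```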
Check and make sure that your custom files are formatted the same way as the examples. The way the script processes the prompt is... very particular. If it helps, there is a rough sanity check sketched below.
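The following is my own sketch, not part of the repo; it assumes the example lyrics format with bracketed section labels like [verse]/[chorus] and blank lines between sections:

```python
# Sketch: sanity-check a custom lyrics.txt before running inference.
# Assumes sections labeled like [verse] / [chorus], separated by blank lines,
# each with at least one lyric line under the label.
import re

def check_lyrics(path="lyrics.txt"):
    text = open(path, encoding="utf-8").read()
    segments = [s.strip() for s in re.split(r"\n\s*\n", text) if s.strip()]
    print(f"Found {len(segments)} segment(s)")
    for i, seg in enumerate(segments, 1):
        lines = seg.splitlines()
        label_ok = bool(re.match(r"\[\w+.*\]", lines[0].strip()))
        body_lines = [l for l in lines[1:] if l.strip()]
        print(f"  segment {i}: label ok = {label_ok}, lyric lines = {len(body_lines)}")
        if not label_ok or not body_lines:
            print("    -> this segment may be dropped by preprocessing")

check_lyrics()
```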
Could someone write up a practical, step-by-step guide for idiots (read: me)?
Thanks for any idiot-proof guide.
When using the exact same parameters, the results generated by this method are inconsistent with those generated by the official method. The vocal gender and music style differ, and sometimes the quality is worse than the official method's. Why is this happening? I use FP16.
I can't get either to finish running at all yet, but if I were you, it might be good to systematically report your sample size and testing parameters (top-p, temperature, the same random seed in both repos) and compare more directly. Unless you have a large sample size, I suspect the top-p, temperature, and random seed may be the culprit (i.e., just a large standard deviation in quality).
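For an apples-to-apples comparison, a small seeding helper could look like the sketch below. This is my own sketch; I don't know whether infer.py exposes a seed flag, so it assumes you can call this in the script before generation:

```python
# Sketch: fix the RNG state so both pipelines see the same seed.
# Call once before generation in each repo you are comparing.
import random
import numpy as np
import torch

def seed_everything(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

seed_everything(42)
```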
YuE-exllamav2 uses exllamav2 as the inference backend for stage1 and stage2. Exllamav2 can load the Hugging Face bf16 checkpoint directly, and we see a 2x speedup in stage1 and a 10x speedup in stage2. Additionally, exllamav2 allows easy quantization of the model, and we find that at 4.25 bpw the stage1 models still work well, enabling song generation with a minimum of 8GB VRAM.
Usage is the same:
python src/yue/infer.py --stage1_use_exl2 --stage2_use_exl2 --stage2_cache_size 32768 [original args]
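For the quantized route mentioned above (4.25 bpw), exllamav2's standard convert script should be usable; a sketch, assuming a local checkout of the exllamav2 repo and hypothetical paths:

```
# Sketch: quantize the stage1 model to 4.25 bpw with exllamav2's converter.
# Paths are hypothetical; -o is scratch/working space, -cf is the output folder.
python convert.py \
    -i models/YuE-s1-7B-anneal-en-cot \
    -o /tmp/exl2-work \
    -cf models/YuE-s1-7B-anneal-en-cot-exl2-4.25bpw \
    -b 4.25
```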