An exllamav2 implementation of YuE: 2-10x speedup and 8GB minimum VRAM #44

Open
sgsdxzy opened this issue Feb 2, 2025 · 21 comments
Labels: enhancement, good first issue

Comments

@sgsdxzy

sgsdxzy commented Feb 2, 2025

YuE-exllamav2 uses exllamav2 as the inference backend for stage 1 and stage 2. Exllamav2 can load the Hugging Face bf16 checkpoint directly, and we saw a 2x speedup in stage 1 and a 10x speedup in stage 2. Exllamav2 also makes it easy to quantize the model, and we find that stage 1 models still work well at 4.25 bpw, which enables song generation with a minimum of 8GB VRAM.

Usage is the same as the original: `python src/yue/infer.py --stage1_use_exl2 --stage2_use_exl2 --stage2_cache_size 32768 [original args]`
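
As a rough back-of-the-envelope check on the 8GB figure, the sketch below estimates weight storage only. It assumes the stage 1 model has about 7 billion parameters (an assumption based on the "7B" in the checkpoint names; the exact count may differ) and ignores the KV cache, activations, and the stage 2 model.

```python
def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GB (10^9 bytes) at a given bits-per-weight."""
    return n_params * bits_per_weight / 8 / 1e9


N_PARAMS = 7e9  # assumption: nominal size implied by the "7B" model name

bf16_gb = weight_gb(N_PARAMS, 16)    # unquantized bf16 checkpoint
exl2_gb = weight_gb(N_PARAMS, 4.25)  # 4.25 bpw exl2 quant

print(f"bf16: {bf16_gb:.1f} GB, 4.25 bpw: {exl2_gb:.1f} GB")
```

Under these assumptions the quantized weights take roughly a quarter of the bf16 footprint, which leaves headroom on an 8GB card for the cache.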

sgsdxzy changed the title from "An exllamav2 implementation of YuE" to "An exllamav2 implementation of YuE: 2-10x speedup and 8GB minimum VRAM" on Feb 2, 2025
@Abhinay1997

Abhinay1997 commented Feb 2, 2025

Can you share some benchmark numbers? For example, the time taken for 30 seconds of generation on a specific GPU, with and without quantization?

So far, I'm getting 17 tok/sec on stage 1 with the defaults in the repo. Using vLLM gives me around 54 tok/sec on a 4090. I'm using n_segments = 3, so I hit a max context length of ~9000.

@sgsdxzy
Author

sgsdxzy commented Feb 2, 2025

Speed on my 2080ti-22GB using the original bf16 model:

| Stage | Original Stage1 | Original Stage2 | Exllamav2 Stage1 | Exllamav2 Stage2 |
| --- | --- | --- | --- | --- |
| s/it | 300 | 600 | 90 | 50 |
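
For reference, the speedup implied by these timings can be worked out directly. This is a small sketch using only the s/it values reported above; lower s/it is faster, so the speedup is the original time divided by the exllamav2 time.

```python
# Seconds per iteration reported above (2080ti-22GB, original bf16 model)
timings = {
    "stage1": {"original": 300, "exllamav2": 90},
    "stage2": {"original": 600, "exllamav2": 50},
}


def speedup(original: float, new: float) -> float:
    """Ratio of old time to new time; values above 1 mean the new backend is faster."""
    return original / new


speedups = {
    stage: speedup(t["original"], t["exllamav2"]) for stage, t in timings.items()
}

for stage, factor in speedups.items():
    print(f"{stage}: {factor:.1f}x faster")
```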

I updated the relevant info in the repo.

@bennmann

bennmann commented Feb 2, 2025

I am testing in a colab notebook, and will update this comment with news.

EDIT:

Cells so far:

Cell 1:
```
# Refer to https://github.com/turboderp-org/exllamav2?tab=readme-ov-file#installation
# Note: installing from PyPI uses the JIT version and requires nvcc + a compiler
!pip install exllamav2

# Make sure you have git-lfs installed (https://git-lfs.com)
# If you don't have root, see git-lfs/git-lfs#4134 (comment)
!sudo apt update
!sudo apt install git-lfs
!git lfs install
!git clone https://github.com/sgsdxzy/YuE-exllamav2.git
!cd YuE-exllamav2; git clone https://huggingface.co/m-a-p/xcodec_mini_infer
```

Cell 2:
!cd YuE-exllamav2; pip install --upgrade -r requirements.txt
!pip install --upgrade protobuf

Next step:
Not sure where best to put these models for use with infer.py: Doctor-Shotgun/YuE-s1-7B-anneal-en-cot-exl2

@Subarasheese

I can confirm I tested that repo and had significant performance improvements over the upstream solution. I recommend merging the changes made there.

@DocShotgun

I managed to get this to run on my laptop with RTX 3060 mobile 6gb using exl2 quantized models and quantized k/v cache, and it seems to at least produce coherent output. Updated the page with perf stats for anyone interested lol.

@BahamutRU

> 2080ti-22GB

Oh boy. =)

> laptop with RTX 3060

Have the same. Interesting!

---

Great repo, thx!

@jrked

jrked commented Feb 3, 2025

how to fix this?

Traceback (most recent call last):
  File "/content/YuE-exllamav2/src/yue/infer_stage1.py", line 489, in <module>
    main()
  File "/content/YuE-exllamav2/src/yue/infer_stage1.py", line 466, in main
    raw_output = pipeline.generate(
                 ^^^^^^^^^^^^^^^^^^
  File "/content/YuE-exllamav2/src/yue/infer_stage1.py", line 273, in generate
    return raw_output
           ^^^^^^^^^^
UnboundLocalError: cannot access local variable 'raw_output' where it is not associated with a value

@sgsdxzy
Author

sgsdxzy commented Feb 3, 2025

@rednessisaffair It seems your lyrics may be empty after preprocessing, so there is no output. Can you verify whether your prompt works in the official pipeline?
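
A minimal sketch of how this kind of error can arise, assuming generation loops over lyric segments and only assigns `raw_output` inside the loop. The function names and bodies here are hypothetical illustrations, not the actual code in infer_stage1.py.

```python
def generate_unsafe(segments):
    # raw_output is only assigned inside the loop body
    for segment in segments:
        raw_output = f"<tokens for {segment}>"  # placeholder for real generation
    # With zero segments the loop never runs, so this raises UnboundLocalError
    return raw_output


def generate_checked(segments):
    # Fail early with a message that points at the likely cause
    if not segments:
        raise ValueError("no lyric segments found; check the lyrics file format")
    raw_output = None
    for segment in segments:
        raw_output = f"<tokens for {segment}>"
    return raw_output
```

Calling `generate_unsafe([])` reproduces the `UnboundLocalError`, while `generate_checked([])` raises a `ValueError` that names the empty-lyrics condition instead.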

@Mozer

Mozer commented Feb 3, 2025

I added the ICL model as exl2 quants (3, 4, and 8 bpw): https://huggingface.co/Ftfyhh/YuE-s1-7B-anneal-en-icl-exl2
5 and 6 bpw are in progress.

@a43992899
Collaborator

Hey guys, exllamav2 seems to be working well, saving memory and improving speed. Lost some musicality but still sounds coherent.

@alisson-anjos

alisson-anjos commented Feb 3, 2025

I made a Gradio interface for this fork, along with Docker support and a RunPod template (https://github.com/alisson-anjos/YuE-exllamav2-UI)

Image

@savank7

savank7 commented Feb 4, 2025

> how to fix this?
>
> [...]
> UnboundLocalError: cannot access local variable 'raw_output' where it is not associated with a value

@jrked did you get any solution for this issue?

@jrked

jrked commented Feb 4, 2025

> > how to fix this?
> >
> > [...]
> > UnboundLocalError: cannot access local variable 'raw_output' where it is not associated with a value
>
> @jrked did you get any solution for this issue?

nope

@Mozer

Mozer commented Feb 4, 2025

> how to fix this?
>
> [...]
> UnboundLocalError: cannot access local variable 'raw_output' where it is not associated with a value

try to add more sections into lyrics.txt. 3-4 should be ok

@savank7

savank7 commented Feb 4, 2025

> > how to fix this?
> >
> > [...]
> > UnboundLocalError: cannot access local variable 'raw_output' where it is not associated with a value
>
> try to add more sections into lyrics.txt. 3-4 should be ok

With the provided genre.txt, lyrics.txt, and mp3 it works, but with a custom genre.txt, lyrics.txt, and mp3 we get this error: "UnboundLocalError: cannot access local variable 'raw_output' where it is not associated with a value"

@sgsdxzy
Author

sgsdxzy commented Feb 4, 2025

Make sure your lyrics strictly follow the format of the example, e.g. with a tag such as [verse] above the first line of each segment.
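
One way to sanity-check a lyrics file before running inference is sketched below. It assumes each segment starts with a bracketed tag such as `[verse]` on its own line, followed by the lyric lines; the regex and the `split_lyrics` helper are illustrative and are not taken from the repo's preprocessing code.

```python
import re

# A tag like [verse] on its own line, then everything up to the next tag or end of text
SEGMENT_RE = re.compile(r"\[(\w+)\]\s*\n(.*?)(?=\n\s*\[\w+\]|\Z)", re.DOTALL)


def split_lyrics(text):
    """Return (tag, body) pairs for each tagged segment that has a non-empty body."""
    segments = []
    for tag, body in SEGMENT_RE.findall(text):
        body = body.strip()
        if body:
            segments.append((tag, body))
    return segments


good = "[verse]\nline one\nline two\n\n[chorus]\nla la\n"
bad = "line one\nline two\n"  # no segment tags at all

print(len(split_lyrics(good)), "segments in tagged lyrics")
print(len(split_lyrics(bad)), "segments in untagged lyrics")
```

If a check like this finds zero segments in a custom lyrics file, the file most likely does not match the example format.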

@alisson-anjos

alisson-anjos commented Feb 4, 2025

RTX 4090 (24GB VRAM), 61GB RAM, base model in BF16

Test 1

Stage 1: cache size 16384, cache mode Q4, 6 segments, 10 minutes 15 seconds
Stage 2: cache size 8192, cache mode Q4, 19 batch size (auto calculated), 6 minutes 16 seconds
Stage 1 + Stage 2: 16 minutes 31 seconds, audio result 2 minutes 47 seconds

Test 2

Stage 1: cache size 16384, cache mode Q6, 6 segments, 9 minutes 35 seconds
Stage 2: cache size 8192, cache mode Q6, 19 batch size (auto calculated), 6 minutes 6 seconds
Stage 1 + Stage 2: 15 minutes 42 seconds, audio result 2 minutes 40 seconds

Test 3

Stage 1: cache size 16384, cache mode Q8, 6 segments, 10 minutes 5 seconds
Stage 2: cache size 8192, cache mode Q8, 19 batch size (auto calculated), 6 minutes 13 seconds
Stage 1 + Stage 2: 16 minutes 18 seconds, audio result 2 minutes 45 seconds

Test 4

Stage 1: cache size 16384, cache mode BF16, 6 segments, 9 minutes 51 seconds
Stage 2: cache size 8192, cache mode BF16, 19 batch size (auto calculated), 5 minutes 6 seconds
Stage 1 + Stage 2: 14 minutes 56 seconds, audio result 2 minutes 47 seconds
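
To compare the four runs on an equal footing, the reported totals can be normalized by the length of the generated audio. This sketch uses only the "Stage 1 + Stage 2" and "audio result" figures listed in the tests; the `to_seconds` helper is just for illustration.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    return minutes * 60 + seconds


# (cache mode, reported total generation time, reported audio length)
runs = [
    ("Q4", to_seconds(16, 31), to_seconds(2, 47)),
    ("Q6", to_seconds(15, 42), to_seconds(2, 40)),
    ("Q8", to_seconds(16, 18), to_seconds(2, 45)),
    ("BF16", to_seconds(14, 56), to_seconds(2, 47)),
]

# Seconds of compute spent per second of generated audio
ratios = {mode: total / audio for mode, total, audio in runs}

for mode, ratio in ratios.items():
    print(f"cache {mode}: {ratio:.2f} s of compute per second of audio")
```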

Note: the cache mode probably affected VRAM usage, but I didn't monitor that.

Now I need to quantize the models and test on quantized models.

Audio files here

a43992899 added the enhancement and good first issue labels on Feb 5, 2025
@DocShotgun

> > > how to fix this?
> > >
> > > [...]
> > > UnboundLocalError: cannot access local variable 'raw_output' where it is not associated with a value
> >
> > try to add more sections into lyrics.txt. 3-4 should be ok
>
> With the provided genre.txt, lyrics.txt, and mp3 it works, but with a custom genre.txt, lyrics.txt, and mp3 we get this error: "UnboundLocalError: cannot access local variable 'raw_output' where it is not associated with a value"

Check and make sure that your custom files are formatted the same way as the examples. The way that the script processes the prompt is... very particular.

@bennmann

Could someone write up a practical step-by-step guide for idiots (read: me)? Ideally one:

  • that goes through from linux system setup almost from scratch (bash history)
  • downloading 4.25bpw and folder structure
  • lyric nuances and specific txt format requirements
  • specific full python usage example (instead of generic [original args] )

Thanks for any idiot proof guide.

@wwbnjsace

When using the exact same parameters, the results generated by this method are inconsistent with those generated by the official method. The gender and music style differ, and sometimes the quality is worse than with the official method. Why is this happening? I use FP16.

@bennmann

> When using the exact same parameters, the results generated by this method are inconsistent with those generated by the official method. The gender and music style differ, and sometimes the quality is worse than with the official method. Why is this happening? I use FP16.

I can't get either to finish running at all yet, but if I were you, I would systematically report your sample size and testing parameters (top p, temperature, and the same random seed in both repos) and compare more directly.

Unless you have a large sample size, I suspect the top p, temperature, and random seed may be the culprit (just a large standard deviation in quality).
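
To illustrate why the seed and sampling settings matter when comparing two pipelines, here is a toy sketch of temperature plus top-p (nucleus) sampling in pure Python. The logits, function names, and default values are made up for illustration and have nothing to do with either repo's actual sampler.

```python
import math
import random


def sample_top_p(logits, temperature, top_p, rng):
    """Sample one index using temperature scaling and nucleus (top-p) filtering."""
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]  # numerically stable softmax
    total = sum(exps)
    probs = [value / total for value in exps]

    # Keep the most likely tokens until their cumulative probability reaches top_p
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in order:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break

    weights = [probs[i] for i in kept]
    return rng.choices(kept, weights=weights, k=1)[0]


def generate(seed, steps=20, temperature=1.0, top_p=0.9):
    rng = random.Random(seed)  # fixed seed makes the run repeatable
    logits = [2.0, 1.5, 1.0, 0.5, 0.0]  # toy logits, identical at every step
    return [sample_top_p(logits, temperature, top_p, rng) for _ in range(steps)]


print(generate(seed=0) == generate(seed=0))  # same seed and settings: identical tokens
```

With the seed, temperature, and top p all pinned, the token sequence is repeatable; changing any of them can change every subsequent token, so a handful of unseeded samples says little about which backend is better.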
