
Fixing vLLM: Incorrect Generation Results #66

Open
a43992899 opened this issue Feb 13, 2025 · 2 comments

a43992899 (Collaborator) commented Feb 13, 2025

Description

vLLM accelerates generation by 5× on H800, but the output quality degrades significantly.

Observed Issues

  • Stage 1: As the sequence length increases, the generated audio gradually turns into noise (e.g., after ~30s).
  • Stage 2: More invalid token IDs are observed when using vLLM.

Expected Behavior

  • The generated audio should maintain quality matching the default Hugging Face Transformers implementation, regardless of sequence length.
  • No increase in invalid token IDs in Stage 2.

Possible Causes

The issue is likely in the LM part, not the audio tokenizer or GAN.
Potential causes:

  • Positional encoding misalignment?
  • PagedAttention numerical inaccuracy?
  • Decoding hyperparameter misalignment? (see the sketch below)
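For the last point, a low-effort sanity check is to pin identical decoding hyperparameters on both backends. A minimal sketch, assuming a placeholder model path and prompt and hypothetical hyperparameter values (not YuE's actual settings):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
from vllm import LLM, SamplingParams

MODEL = "path/to/yue-stage1-checkpoint"  # placeholder
PROMPT = "..."                           # placeholder

# Hypothetical values -- the point is that both backends must receive
# exactly the same settings, not that these values are correct for YuE.
TEMPERATURE, TOP_P, REP_PENALTY, MAX_NEW = 1.0, 0.93, 1.2, 3000

# Hugging Face Transformers side
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")
inputs = tok(PROMPT, return_tensors="pt").to(model.device)
hf_out = model.generate(
    **inputs,
    generation_config=GenerationConfig(
        do_sample=True, temperature=TEMPERATURE, top_p=TOP_P,
        repetition_penalty=REP_PENALTY, max_new_tokens=MAX_NEW,
    ),
)

# vLLM side -- the same settings expressed as SamplingParams
llm = LLM(model=MODEL)
vllm_out = llm.generate(
    [PROMPT],
    SamplingParams(temperature=TEMPERATURE, top_p=TOP_P,
                   repetition_penalty=REP_PENALTY, max_tokens=MAX_NEW),
)
```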

Steps to Reproduce

  1. A vllm branch has been created. @hf-lin will adapt the Hugging Face implementation into reproducible vLLM inference code.
  2. A command to compare vLLM COT (vllm branch) vs HF COT (main branch) implementations will be added here. @hf-lin

Additional Context

  • YuE System Overview: We generate lyrics-to-song sequences with interleaved text conditions and audio tokens.
  • Dual-Token Strategy:
    • One token represents the vocal track at the current frame.
    • One token represents the instrumental accompaniment at the current frame (see the interleaving sketch below).

See system diagram.
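A minimal sketch of the dual-token interleaving described above (the token IDs are illustrative, not real codebook entries):

```python
from typing import List, Tuple

def interleave(vocal_ids: List[int], accomp_ids: List[int]) -> List[int]:
    """Interleave per-frame vocal and accompaniment tokens as
    [v0, a0, v1, a1, ...], so each audio frame occupies two positions."""
    assert len(vocal_ids) == len(accomp_ids), "streams must be frame-aligned"
    out: List[int] = []
    for v, a in zip(vocal_ids, accomp_ids):
        out.extend((v, a))
    return out

def deinterleave(mixed_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Inverse: split a generated sequence back into the two streams."""
    return list(mixed_ids[0::2]), list(mixed_ids[1::2])

# Illustrative IDs only
vocal, accomp = [101, 102, 103], [201, 202, 203]
assert deinterleave(interleave(vocal, accomp)) == (vocal, accomp)
```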

a43992899 added the bug, enhancement, and help wanted labels on Feb 13, 2025
@stevenxyz

I also encountered this issue

hf-lin (Collaborator) commented Feb 13, 2025

I created a script to compare the tokens generated by vLLM and HF Transformers. To restate the problem: when a sampling-based decoding strategy is enabled, the first tokens generated by the two APIs are already different.

Tested on:

  • transformers 4.42.0
  • vllm 0.4.0
  • torch 2.1.2
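
A minimal sketch of that kind of comparison, assuming greedy decoding so both backends should be deterministic and directly comparable (model path and prompt are placeholders; the actual script may differ):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "path/to/checkpoint"  # placeholder
PROMPT = "..."                # placeholder
N = 32                        # number of leading tokens to compare

# Hugging Face Transformers, greedy decoding
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto")
inputs = tok(PROMPT, return_tensors="pt").to(model.device)
hf_ids = model.generate(**inputs, do_sample=False, max_new_tokens=N)[
    0, inputs.input_ids.shape[1]:].tolist()

# vLLM, greedy decoding (temperature=0)
llm = LLM(model=MODEL, dtype="bfloat16")
vllm_ids = list(llm.generate(
    [PROMPT], SamplingParams(temperature=0.0, max_tokens=N))[0].outputs[0].token_ids)

# Report the first position where the two token streams diverge
mismatch = next(
    (i for i, (h, v) in enumerate(zip(hf_ids, vllm_ids)) if h != v), None)
print("first divergence at index:", mismatch)
```

Under sampling, exact token-level equality is not expected even with matched hyperparameters (different RNG streams and kernel orderings), so greedy decoding is the cleaner way to check whether the logits themselves diverge.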
