
Fixing vLLM: Incorrect Generation Results #66

Open
a43992899 opened this issue Feb 13, 2025 · 2 comments

a43992899 (Collaborator) commented Feb 13, 2025

Description

vLLM accelerates generation by 5× on H800, but the output quality degrades significantly.

Observed Issues

  • Stage 1: As the sequence length increases, the generated audio gradually turns into noise (e.g., after ~30s).
  • Stage 2: More invalid token IDs are observed when using vLLM.

Expected Behavior

  • The generated audio should maintain quality matching the default Hugging Face Transformers implementation, regardless of sequence length.
  • No increase in invalid token IDs in Stage 2.

Possible Causes

The issue is likely in the LM part, not the audio tokenizer or GAN.
Potential causes:

  • Positional encoding misalignment?
  • PagedAttention numerical inaccuracy?
  • Decoding hyperparameter misalignment? (see the sketch below)
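For the last point, a low-effort sanity check is to pin identical decoding hyperparameters on both backends. A minimal sketch, assuming a placeholder model path and prompt and hypothetical hyperparameter values (not YuE's actual settings):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, GenerationConfig
from vllm import LLM, SamplingParams

MODEL = "path/to/yue-stage1-checkpoint"  # placeholder
PROMPT = "..."                           # placeholder

# Hypothetical values -- the point is that both backends must receive
# exactly the same settings, not that these values are correct for YuE.
TEMPERATURE, TOP_P, REP_PENALTY, MAX_NEW = 1.0, 0.93, 1.2, 3000

# Hugging Face Transformers side
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL, device_map="auto")
inputs = tok(PROMPT, return_tensors="pt").to(model.device)
hf_out = model.generate(
    **inputs,
    generation_config=GenerationConfig(
        do_sample=True, temperature=TEMPERATURE, top_p=TOP_P,
        repetition_penalty=REP_PENALTY, max_new_tokens=MAX_NEW,
    ),
)

# vLLM side -- the same settings expressed as SamplingParams
llm = LLM(model=MODEL)
vllm_out = llm.generate(
    [PROMPT],
    SamplingParams(temperature=TEMPERATURE, top_p=TOP_P,
                   repetition_penalty=REP_PENALTY, max_tokens=MAX_NEW),
)
```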

Steps to Reproduce

  1. A vllm branch has been created. @hf-lin will adapt the Hugging Face implementation into reproducible vLLM inference code.
  2. A command to compare vLLM COT (vllm branch) vs HF COT (main branch) implementations will be added here. @hf-lin

Additional Context

  • YuE System Overview: We generate lyrics-to-song sequences with interleaved text conditions and audio tokens.
  • Dual-Token Strategy:
    • One token represents the vocal track at the current frame.
    • One token represents the instrumental accompaniment at the current frame (see the interleaving sketch below).

See system diagram.
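A minimal sketch of the dual-token interleaving described above (the token IDs are illustrative, not real codebook entries):

```python
from typing import List, Tuple

def interleave(vocal_ids: List[int], accomp_ids: List[int]) -> List[int]:
    """Interleave per-frame vocal and accompaniment tokens as
    [v0, a0, v1, a1, ...], so each audio frame occupies two positions."""
    assert len(vocal_ids) == len(accomp_ids), "streams must be frame-aligned"
    out: List[int] = []
    for v, a in zip(vocal_ids, accomp_ids):
        out.extend((v, a))
    return out

def deinterleave(mixed_ids: List[int]) -> Tuple[List[int], List[int]]:
    """Inverse: split a generated sequence back into the two streams."""
    return list(mixed_ids[0::2]), list(mixed_ids[1::2])

# Illustrative IDs only
vocal, accomp = [101, 102, 103], [201, 202, 203]
assert deinterleave(interleave(vocal, accomp)) == (vocal, accomp)
```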

a43992899 added the bug, enhancement, and help wanted labels on Feb 13, 2025
@stevenxyz

I also encountered this issue

hf-lin (Collaborator) commented Feb 13, 2025

I created a script to compare the tokens generated by vLLM and HF Transformers. To restate the problem: when a sampling-based decoding strategy is enabled, the first tokens generated by the two APIs are already different.

Tested on:

  • transformers 4.42.0
  • vllm 0.4.0
  • torch 2.1.2
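
A minimal sketch of that kind of comparison, assuming greedy decoding so both backends should be deterministic and directly comparable (model path and prompt are placeholders; the actual script may differ):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from vllm import LLM, SamplingParams

MODEL = "path/to/checkpoint"  # placeholder
PROMPT = "..."                # placeholder
N = 32                        # number of leading tokens to compare

# Hugging Face Transformers, greedy decoding
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL, torch_dtype=torch.bfloat16, device_map="auto")
inputs = tok(PROMPT, return_tensors="pt").to(model.device)
hf_ids = model.generate(**inputs, do_sample=False, max_new_tokens=N)[
    0, inputs.input_ids.shape[1]:].tolist()

# vLLM, greedy decoding (temperature=0)
llm = LLM(model=MODEL, dtype="bfloat16")
vllm_ids = list(llm.generate(
    [PROMPT], SamplingParams(temperature=0.0, max_tokens=N))[0].outputs[0].token_ids)

# Report the first position where the two token streams diverge
mismatch = next(
    (i for i, (h, v) in enumerate(zip(hf_ids, vllm_ids)) if h != v), None)
print("first divergence at index:", mismatch)
```

Under sampling, exact token-level equality is not expected even with matched hyperparameters (different RNG streams and kernel orderings), so greedy decoding is the cleaner way to check whether the logits themselves diverge.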
