@ansh-info
Summary

Issue: #472

  • Prevent vLLM from computing a zero KV-cache budget by aligning engine defaults with Unsloth init defaults.
  • Set gpu_memory_utilization and max_model_len in engine_args to the corresponding Unsloth init_args values so large-GPU setups don’t stall.

Motivation

  • Training Qwen2.5-14B with Unsloth/vLLM on an H100 stalled at step 0; logs showed “vLLM’s KV Cache can use up to 0.0 GB” despite ample VRAM. Because the engine defaults were missing, vLLM sized the KV cache to zero.

Details

  • In get_model_config, initialize vLLM engine_args with:
    • gpu_memory_utilization = init_args["gpu_memory_utilization"]
    • max_model_len = init_args["max_seq_length"]
  • Still allow user overrides via _internal_config["engine_args"].

Co-authored-by: Apoorva Gupta <[email protected]>