
[Bug]: Parity with CUDA: ROCm image is missing amd-quark for Kimi K2.5 MXFP4 #35633

@functionstackx


Your current environment

vLLM v0.16.0 (vllm/vllm-openai-rocm:v0.16.0), Kimi K2.5, MI355X

🐛 Describe the bug

Hi @powderluv @chunfangamd @andyluo7,

One of the main features of AMD's latest flagship GPU, the MI355X, is MXFP4 support, which offers better performance at similar accuracy. Today I tried to test AMD's MXFP4 checkpoint, amd/Kimi-K2.5-MXFP4, using the ROCm image vllm/vllm-openai-rocm:v0.16.0, but unfortunately it doesn't work out of the box: the amd-quark package is missing from the Docker image, and there are other errors as well. Can you take a look?

It seems like adding amd-quark to Dockerfile.rocm would fix the ImportError in the traceback below.
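For anyone who wants to confirm the missing package before launching the server, here is a minimal sketch that can be run with the container's Python interpreter. It only checks whether a distribution named `amd-quark` (the name from the error message) is installed; the helper name `quark_status` is hypothetical and not part of vLLM.

```python
from importlib import metadata


def quark_status(dist_name: str = "amd-quark") -> str:
    """Report whether the given pip distribution is installed.

    `dist_name` defaults to the package named in the ImportError;
    this helper is illustrative and does not exist in vLLM.
    """
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return f"{dist_name} is missing; try `pip install {dist_name}`"
    return f"{dist_name} {version} is installed"


if __name__ == "__main__":
    print(quark_status())
```

If this reports the package as missing, installing it inside the running container is a workaround until the image itself ships it. It would not address the other errors mentioned above.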

(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1436, in outplace_fused_experts
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]     return fused_experts_impl(
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1726, in fused_experts_impl
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]     w1 = dequant_mxfp4(w1, w1_scale, hidden_states.dtype)
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1255, in __call__
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]     return self._op(*args, **kwargs)
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/utils/mxfp4_utils.py", line 125, in _dequant_mxfp4
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]     raise ImportError(
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863] ImportError: The package `amd-quark` is required to use MX-FP4 models. Please install it with `pip install amd-quark`.
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863] 
(EngineCore_DP0 pid=2820525) ERROR 03-01 01:15:53 [core.py:1006] EngineCore failed to start.
(EngineCore_DP0 pid=2820525) ERROR 03-01 01:15:53 [core.py:1006] Traceback (most recent call last):

https://github.com/SemiAnalysisAI/InferenceX/actions/runs/22532718866/job/65274669260?pr=825
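The excerpt above is truncated, and a full log from a run like the linked one interleaves output from several processes. As a rough sketch for pulling out only the exception lines, the snippet below strips the per-line prefix and keeps lines that look like `SomeError: message`. The prefix pattern is inferred from the excerpt in this issue, not from a documented vLLM log format, and `find_exceptions` is a hypothetical helper name.

```python
import re

# Prefix as it appears in the excerpt, e.g.
# "(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863] "
# Assumption: other vLLM log lines follow the same layout.
PREFIX = re.compile(
    r"^\((?P<proc>\w+) pid=(?P<pid>\d+)\) (?P<level>[A-Z]+) "
    r"\d{2}-\d{2} \d{2}:\d{2}:\d{2} \[[^\]]+\] ?"
)

# An exception line starts at column 0 of the message body,
# unlike the indented "File ..." and source lines of a traceback.
EXCEPTION = re.compile(r"^[A-Za-z_][\w.]*(Error|Exception): ")


def find_exceptions(log_text: str) -> list[tuple[str, str]]:
    """Return (process name, exception line) pairs found in the log text."""
    found = []
    for line in log_text.splitlines():
        match = PREFIX.match(line)
        if match is None:
            continue  # not a prefixed log line
        body = line[match.end():]
        if EXCEPTION.match(body):
            found.append((match.group("proc"), body))
    return found
```

Calling `find_exceptions(open("server.log").read())` on a saved log (the file name is only an example) would list each process alongside the exception it raised, which makes it easier to see whether the amd-quark ImportError is the only failure.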


Labels: bug, rocm
