[Bug]: Parity with CUDA: ROCm image missing amd-quark for Kimi K2.5 MXFP4

### Your current environment

vLLM v0.16.0 (`vllm/vllm-openai-rocm:v0.16.0`), Kimi K2.5 MXFP4, MI355X

### 🐛 Describe the bug

Hi @powderluv @chunfangamd @andyluo7,

One of the main features of AMD's latest flagship MI355X GPU is MXFP4, which promises better performance at similar accuracy. Today I was trying to test AMD's MXFP4 checkpoint `amd/Kimi-K2.5-MXFP4` using AMD's `vllm/vllm-openai-rocm:v0.16.0` image, but unfortunately it doesn't work out of the box: the `amd-quark` package is missing from the Docker image, and there are other errors as well. Can you take a look?

It seems like adding `amd-quark` to `Dockerfile.rocm` would fix the import error below.
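To confirm whether a given image ships the package before launching the server, a minimal pre-flight check could look like the sketch below. It assumes that `pip install amd-quark` installs the top-level module `quark`; `check_mxfp4_deps` and `QUARK_MODULE` are hypothetical names used only for illustration, not part of vLLM.

```python
import importlib.util

# Assumption: `pip install amd-quark` provides the top-level module `quark`.
QUARK_MODULE = "quark"


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be imported."""
    return importlib.util.find_spec(name) is not None


def check_mxfp4_deps() -> list[str]:
    """Hypothetical helper: return missing-dependency messages (empty means OK)."""
    problems = []
    if not has_module(QUARK_MODULE):
        problems.append(
            f"module '{QUARK_MODULE}' not found; install it with `pip install amd-quark`"
        )
    return problems


if __name__ == "__main__":
    missing = check_mxfp4_deps()
    for msg in missing:
        print("MISSING:", msg)
    if not missing:
        print("OK: MXFP4 dependencies present")
```

Running this inside the `vllm/vllm-openai-rocm:v0.16.0` container should report the missing module if the image lacks `amd-quark`.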


```
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1436, in outplace_fused_experts
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]     return fused_experts_impl(
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1726, in fused_experts_impl
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]     w1 = dequant_mxfp4(w1, w1_scale, hidden_states.dtype)
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1255, in __call__
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]     return self._op(*args, **kwargs)
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/utils/mxfp4_utils.py", line 125, in _dequant_mxfp4
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]     raise ImportError(
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863] ImportError: The package `amd-quark` is required to use MX-FP4 models. Please install it with `pip install amd-quark`.
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863] 
(EngineCore_DP0 pid=2820525) ERROR 03-01 01:15:53 [core.py:1006] EngineCore failed to start.
(EngineCore_DP0 pid=2820525) ERROR 03-01 01:15:53 [core.py:1006] Traceback (most recent call last):
```

https://github.com/SemiAnalysisAI/InferenceX/actions/runs/22532718866/job/65274669260?pr=825
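Per the traceback, `fused_experts_impl` dequantizes the MXFP4 expert weights (`w1`, `w1_scale`) to the activation dtype via `dequant_mxfp4`, and that helper requires `amd-quark`. For context, a minimal pure-Python sketch of MXFP4 dequantization following the OCP Microscaling (MX) format is shown below: FP4 E2M1 element codes share one E8M0 power-of-two scale per block of 32. The packing order (two codes per byte, low nibble first) and the function names are assumptions for illustration; this is not vLLM's or Quark's implementation.

```python
# Magnitudes representable by FP4 E2M1 (1 sign, 2 exponent, 1 mantissa bit),
# indexed by the low 3 bits of a 4-bit code.
E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32  # elements sharing one E8M0 scale in the MX format


def decode_e2m1(code: int) -> float:
    """Decode one 4-bit E2M1 code (bit 3 is the sign)."""
    magnitude = E2M1_MAGNITUDES[code & 0b0111]
    return -magnitude if code & 0b1000 else magnitude


def decode_e8m0(scale_byte: int) -> float:
    """Decode an E8M0 shared scale: a pure power of two, 2**(e - 127); 0xFF is NaN."""
    if scale_byte == 0xFF:
        return float("nan")
    return 2.0 ** (scale_byte - 127)


def dequant_mxfp4_reference(packed: bytes, scales: bytes) -> list[float]:
    """Hypothetical reference dequantizer.

    Assumption: two FP4 codes per byte, low nibble first.
    """
    values = []
    for byte in packed:
        values.append(decode_e2m1(byte & 0x0F))
        values.append(decode_e2m1(byte >> 4))
    if len(values) != len(scales) * BLOCK_SIZE:
        raise ValueError("expected one scale byte per 32 FP4 values")
    # Each element is multiplied by the scale of the block it belongs to.
    return [v * decode_e8m0(scales[i // BLOCK_SIZE]) for i, v in enumerate(values)]
```

For example, a block whose first packed byte is `0x21` with scale byte `128` (i.e. 2.0) decodes its first two elements to `1.0` and `2.0`.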


### Before submitting a new issue...

- [x] Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the [documentation page](https://docs.vllm.ai/en/latest/), which can answer lots of frequently asked questions.
