Your current environment
vLLM v0.16.0 (vllm/vllm-openai-rocm:v0.16.0), amd/Kimi-K2.5-MXFP4, MI355X
🐛 Describe the bug
Hi @powderluv @chunfangamd @andyluo7
One of the main features of AMD's latest flagship MI355X GPU is MXFP4 support, which gives better performance at similar accuracy. Today I was trying to test AMD's MXFP4 checkpoint amd/Kimi-K2.5-MXFP4 using AMD's vllm/vllm-openai-rocm:v0.16.0 image, but unfortunately it doesn't work out of the box: the amd-quark package is missing from the Docker image, and there are other errors as well. Can you take a look?
It seems like adding amd-quark to Dockerfile.rocm would fix this.
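For anyone who wants to confirm whether a given image has the package before launching the server, here is a minimal sketch of a preflight check. It only uses the standard library and the distribution name `amd-quark` taken from the error message below; the helper name `installed_version` is my own and is not part of vLLM.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "amd-quark" is the pip distribution name quoted in the ImportError message.
    version = installed_version("amd-quark")
    if version is None:
        print("amd-quark is not installed; install it with: pip install amd-quark")
    else:
        print(f"amd-quark {version} is installed")
```

Running this inside the container should show quickly whether the ImportError is expected, without waiting for the model to load.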
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1436, in outplace_fused_experts
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863] return fused_experts_impl(
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863] ^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/fused_moe/fused_moe.py", line 1726, in fused_experts_impl
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863] w1 = dequant_mxfp4(w1, w1_scale, hidden_states.dtype)
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863] File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1255, in __call__
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863] return self._op(*args, **kwargs)
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/quantization/utils/mxfp4_utils.py", line 125, in _dequant_mxfp4
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863] raise ImportError(
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863] ImportError: The package `amd-quark` is required to use MX-FP4 models. Please install it with `pip install amd-quark`.
(Worker_TP0 pid=2820752) ERROR 03-01 01:15:53 [multiproc_executor.py:863]
(EngineCore_DP0 pid=2820525) ERROR 03-01 01:15:53 [core.py:1006] EngineCore failed to start.
(EngineCore_DP0 pid=2820525) ERROR 03-01 01:15:53 [core.py:1006] Traceback (most recent call last):
https://github.com/SemiAnalysisAI/InferenceX/actions/runs/22532718866/job/65274669260?pr=825
Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.