Encounter illegal memory access in FlashInfer backend under contiguous layout #321

@lianghao208

Description

GPU type: H20 * 8
FlashInfer version: 0.5.3
vLLM version: 0.11.1

env illegal memory access is encountered
KVCACHED_CONTIGUOUS_LAYOUT=true KVCACHED_MIN_RESERVED_PAGES=32 KVCACHED_MAX_RESERVED_PAGES=64
KVCACHED_CONTIGUOUS_LAYOUT=true KVCACHED_MIN_RESERVED_PAGES=5 KVCACHED_MAX_RESERVED_PAGES=10
KVCACHED_CONTIGUOUS_LAYOUT=false KVCACHED_MIN_RESERVED_PAGES=32 KVCACHED_MAX_RESERVED_PAGES=64

It seems that the illegal memory access error is encountered only when contiguous layout is enabled and the number of reserved pages is >= 17.
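One way to pin down the suspected threshold is to bisect on KVCACHED_MIN_RESERVED_PAGES. The following is a minimal sketch, not a tested reproducer. `build_env` and `first_failing` are hypothetical helper names, and the `crashes` predicate is something you would supply yourself (for example, launch the server with the given env and check the log for the illegal memory access). It assumes the failure is monotonic in the number of reserved pages, and it sets max to twice min only because the runs above happen to use that ratio.

```python
from typing import Callable, Dict


def build_env(min_pages: int, contiguous: bool = True) -> Dict[str, str]:
    """Build the kvcached env vars for one trial (max = 2 * min, mirroring the runs above)."""
    return {
        "KVCACHED_CONTIGUOUS_LAYOUT": "true" if contiguous else "false",
        "KVCACHED_MIN_RESERVED_PAGES": str(min_pages),
        "KVCACHED_MAX_RESERVED_PAGES": str(2 * min_pages),
    }


def first_failing(
    crashes: Callable[[Dict[str, str]], bool], lo: int = 5, hi: int = 32
) -> int:
    """Binary-search the smallest min_pages in (lo, hi] for which `crashes` returns True.

    ASSUMPTION: the trial at `lo` passes, the trial at `hi` crashes,
    and crashes are monotonic in min_pages.
    """
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(build_env(mid)):
            hi = mid  # mid still crashes, so the threshold is at or below mid
        else:
            lo = mid  # mid passes, so the threshold is above mid
    return hi
```

The defaults `lo=5` and `hi=32` come from the passing and failing configurations listed above. Each call to `crashes` would be one full server launch, so the search needs about five launches for this range.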

related error log:

TMA Desc Addr:   0x7ffdf52f9fc0
format         9
dim            3
gmem_address   0x7fef2db48a00
globalDim      (128,4,4,1,1)
globalStrides  (2,1024,256,0,0)
boxDim         (64,64,1,1,1)
elementStrides (1,1,1,1,1)
interleave     0
swizzle        3
l2Promotion    2
oobFill        0
Error: Failed to initialize the TMA descriptor 700
TMA Desc Addr:   0x7ffdf52f9fc0
format         9
dim            4
gmem_address   0x1f0000000000
globalDim      (128,64,1,49408,1)
globalStrides  (2,256,256,2064384,0)
boxDim         (64,64,1,1,1)
elementStrides (1,1,1,1,1)
interleave     0
swizzle        3
l2Promotion    2
oobFill        0
Error: Failed to initialize the TMA descriptor 700
TMA Desc Addr:   0x7ffdf52f9fc0
format         9
dim            4
gmem_address   0x1f0000004000
globalDim      (128,64,1,49408,1)
globalStrides  (2,256,256,2064384,0)
boxDim         (64,8,1,1,1)
elementStrides (1,1,1,1,1)
interleave     0
swizzle        3
l2Promotion    2
oobFill        0
Error: Failed to initialize the TMA descriptor 700
TMA Desc Addr:   0x7ffdf52f9fc0
format         9
dim            3
gmem_address   0x7fef2db47a00
globalDim      (128,4,4,1,1)
globalStrides  (2,1024,256,0,0)
boxDim         (64,64,1,1,1)
elementStrides (1,1,1,1,1)
interleave     0
swizzle        3
l2Promotion    2
oobFill        0
Error: Failed to initialize the TMA descriptor 700
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] WorkerProc hit an exception.
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] Traceback (most recent call last):
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 701, in worker_busy_loop
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     output = func(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]              ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return func(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 480, in execute_model
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     output = self.model_runner.execute_model(scheduler_output, intermediate_tensors)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2719, in execute_model
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return _execute_model()
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     ^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/lib/python3.12/contextlib.py", line 81, in inner
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return func(*args, **kwds)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2718, in _execute_model
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self._execute_model(scheduler_output, intermediate_tensors)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return func(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2824, in _execute_model
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     model_output = self._model_forward(
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]                    ^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2700, in _model_forward
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self.model(
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self._call_impl(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return forward_call(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/hunyuan_v1.py", line 1164, in forward
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     model_output = self.model(
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]                    ^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 228, in __call__
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self.forward(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/hunyuan_v1.py", line 844, in forward
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     hidden_states, residual, kv_states = layer(
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]                                          ^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self._call_impl(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return forward_call(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/hunyuan_v1.py", line 741, in forward
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     hidden_states, ori_kv_states = self.self_attn(
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]                                    ^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self._call_impl(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return forward_call(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/hunyuan_v1.py", line 313, in forward
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     output, _ = self.o_proj(attn_output)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]                 ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self._call_impl(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return forward_call(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/linear.py", line 1426, in forward
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     output = tensor_model_parallel_all_reduce(output_parallel)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/communication_op.py", line 14, in tensor_model_parallel_all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return get_tp_group().all_reduce(input_)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 378, in all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return torch.ops.vllm.all_reduce(input_, group_name=self.unique_name)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self._op(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 119, in all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return group._all_reduce_out_place(tensor)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 385, in _all_reduce_out_place
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self.device_communicator.all_reduce(input_)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/cuda_communicator.py", line 154, in all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     out = ca_comm.custom_all_reduce(input_)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 279, in custom_all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self.all_reduce(input, registered=False)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 258, in all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     ops.all_reduce(
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/vllm/_custom_ops.py", line 2153, in all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     torch.ops._C_custom_ar.all_reduce(fa, inp, out, reg_buffer, reg_buffer_sz_bytes)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]   File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]     return self._op(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]            ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] torch.AcceleratorError: CUDA error: an illegal memory access was encountered
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] 
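To compare the TMA descriptors in the log above, it can help to parse the dump and compute how many bytes each descriptor spans from its gmem_address. The following is a minimal sketch under stated assumptions: `parse_tma_dump` and `byte_extent` are hypothetical helper names, and the extent calculation assumes that globalStrides are in bytes and that globalStrides[0] is the element size. The trailing 700 in the "Failed to initialize the TMA descriptor" lines matches the numeric value of CUDA's illegal-address error (cudaErrorIllegalAddress), which is consistent with the exception at the end of the traceback.

```python
import re
from typing import Dict, List

_TUPLE = re.compile(r"^(globalDim|globalStrides|boxDim|elementStrides)\s+\(([\d,]+)\)\s*$")
_ADDR = re.compile(r"^gmem_address\s+(0x[0-9a-fA-F]+)\s*$")


def parse_tma_dump(text: str) -> List[Dict[str, object]]:
    """Split a log into one record per 'TMA Desc Addr:' block, keyed by field name."""
    records: List[Dict[str, object]] = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("TMA Desc Addr:"):
            current = {}
            records.append(current)
            continue
        if current is None:
            continue  # ignore anything before the first descriptor
        m = _ADDR.match(line)
        if m:
            current["gmem_address"] = int(m.group(1), 16)
            continue
        m = _TUPLE.match(line)
        if m:
            current[m.group(1)] = tuple(int(x) for x in m.group(2).split(","))
    return records


def byte_extent(rec: Dict[str, object]) -> int:
    """Bytes from gmem_address to the end of the last addressed element.

    ASSUMPTION: globalStrides are in bytes and globalStrides[0] is the element size.
    """
    dims, strides = rec["globalDim"], rec["globalStrides"]
    return strides[0] + sum((d - 1) * s for d, s in zip(dims, strides))
```

Under those assumptions, the two 3-dimensional descriptors in the log span 4096 bytes each, while the two 4-dimensional descriptors based at 0x1f0000000000 and 0x1f0000004000 span 101995036672 bytes (roughly 95 GiB). The sketch cannot tell whether that whole range is backed by mapped memory at the time of the access; it only makes the sizes easy to compare across configurations.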
