It seems that the illegal memory access error occurs only when the number of reserved pages is >= 17 and the contiguous layout is enabled.
TMA Desc Addr: 0x7ffdf52f9fc0
format 9
dim 3
gmem_address 0x7fef2db48a00
globalDim (128,4,4,1,1)
globalStrides (2,1024,256,0,0)
boxDim (64,64,1,1,1)
elementStrides (1,1,1,1,1)
interleave 0
swizzle 3
l2Promotion 2
oobFill 0
Error: Failed to initialize the TMA descriptor 700
TMA Desc Addr: 0x7ffdf52f9fc0
format 9
dim 4
gmem_address 0x1f0000000000
globalDim (128,64,1,49408,1)
globalStrides (2,256,256,2064384,0)
boxDim (64,64,1,1,1)
elementStrides (1,1,1,1,1)
interleave 0
swizzle 3
l2Promotion 2
oobFill 0
Error: Failed to initialize the TMA descriptor 700
TMA Desc Addr: 0x7ffdf52f9fc0
format 9
dim 4
gmem_address 0x1f0000004000
globalDim (128,64,1,49408,1)
globalStrides (2,256,256,2064384,0)
boxDim (64,8,1,1,1)
elementStrides (1,1,1,1,1)
interleave 0
swizzle 3
l2Promotion 2
oobFill 0
Error: Failed to initialize the TMA descriptor 700
TMA Desc Addr: 0x7ffdf52f9fc0
format 9
dim 3
gmem_address 0x7fef2db47a00
globalDim (128,4,4,1,1)
globalStrides (2,1024,256,0,0)
boxDim (64,64,1,1,1)
elementStrides (1,1,1,1,1)
interleave 0
swizzle 3
l2Promotion 2
oobFill 0
Error: Failed to initialize the TMA descriptor 700
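As a cross-check on the descriptor dumps above, here is a minimal Python sketch that tests the dumped fields against the argument limits documented for `cuTensorMapEncodeTiled` in the CUDA driver API. The helper name `check_tma_descriptor`, the dict layout, and the reading of `globalStrides[0]` as the element size in bytes are assumptions made for illustration; they are not taken from FlashInfer. `format 9` is assumed to be `CU_TENSOR_MAP_DATA_TYPE_BFLOAT16` (2 bytes per element) and `swizzle 3` to be `CU_TENSOR_MAP_SWIZZLE_128B`.

```python
# Hypothetical helper: the field names mirror the dump above, and the limits
# are the ones documented for cuTensorMapEncodeTiled (interleave == NONE case).

SWIZZLE_128B = 3  # assumed meaning of "swizzle 3" in the dump


def check_tma_descriptor(desc, elem_size=2):
    """Return a list of violated constraints (empty if none were found)."""
    problems = []
    rank = desc["dim"]
    if not 1 <= rank <= 5:
        return [f"rank {rank} is outside 1..5"]

    if desc["gmem_address"] % 16 != 0:
        problems.append("gmem_address is not 16-byte aligned")

    for i, d in enumerate(desc["globalDim"][:rank]):
        if not 0 < d <= 2**32:
            problems.append(f"globalDim[{i}]={d} is outside 1..2^32")

    # Assumption: entry 0 of the dumped strides is the element size, so only
    # entries 1..rank-1 are the byte strides that the driver validates.
    for i, s in enumerate(desc["globalStrides"][1:rank], start=1):
        if s % 16 != 0 or s >= 2**40:
            problems.append(f"globalStrides[{i}]={s} is not a multiple of 16 below 2^40")

    for i, b in enumerate(desc["boxDim"][:rank]):
        if not 0 < b <= 256:
            problems.append(f"boxDim[{i}]={b} is outside 1..256")

    for i, e in enumerate(desc["elementStrides"][:rank]):
        if not 0 < e <= 8:
            problems.append(f"elementStrides[{i}]={e} is outside 1..8")

    inner_bytes = desc["boxDim"][0] * elem_size
    if inner_bytes % 16 != 0:
        problems.append("inner box row is not a multiple of 16 bytes")
    if desc["swizzle"] == SWIZZLE_128B and inner_bytes > 128:
        problems.append("inner box row exceeds the 128-byte swizzle span")

    return problems


# First descriptor from the log, transcribed by hand.
first_desc = {
    "dim": 3,
    "gmem_address": 0x7FEF2DB48A00,
    "globalDim": (128, 4, 4, 1, 1),
    "globalStrides": (2, 1024, 256, 0, 0),
    "boxDim": (64, 64, 1, 1, 1),
    "elementStrides": (1, 1, 1, 1, 1),
    "swizzle": 3,
}

print(check_tma_descriptor(first_desc))  # prints []
```

If the dumped descriptors pass such checks, one possible reading is that code 700 (`CUDA_ERROR_ILLEGAL_ADDRESS` in the driver API) is an earlier asynchronous fault being reported at descriptor creation rather than a malformed descriptor, but that is a guess the logs here do not confirm.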
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] WorkerProc hit an exception.
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] Traceback (most recent call last):
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 701, in worker_busy_loop
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] output = func(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return func(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 480, in execute_model
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] output = self.model_runner.execute_model(scheduler_output, intermediate_tensors)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2719, in execute_model
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return _execute_model()
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/lib/python3.12/contextlib.py", line 81, in inner
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return func(*args, **kwds)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2718, in _execute_model
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return self._execute_model(scheduler_output, intermediate_tensors)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return func(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2824, in _execute_model
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] model_output = self._model_forward(
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 2700, in _model_forward
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return self.model(
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return self._call_impl(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return forward_call(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/hunyuan_v1.py", line 1164, in forward
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] model_output = self.model(
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/compilation/decorators.py", line 228, in __call__
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return self.forward(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/hunyuan_v1.py", line 844, in forward
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] hidden_states, residual, kv_states = layer(
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return self._call_impl(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return forward_call(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/hunyuan_v1.py", line 741, in forward
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] hidden_states, ori_kv_states = self.self_attn(
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return self._call_impl(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return forward_call(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/models/hunyuan_v1.py", line 313, in forward
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] output, _ = self.o_proj(attn_output)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1773, in _wrapped_call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return self._call_impl(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/torch/nn/modules/module.py", line 1784, in _call_impl
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return forward_call(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/layers/linear.py", line 1426, in forward
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] output = tensor_model_parallel_all_reduce(output_parallel)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/communication_op.py", line 14, in tensor_model_parallel_all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return get_tp_group().all_reduce(input_)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 378, in all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return torch.ops.vllm.all_reduce(input_, group_name=self.unique_name)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return self._op(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 119, in all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return group._all_reduce_out_place(tensor)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/parallel_state.py", line 385, in _all_reduce_out_place
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return self.device_communicator.all_reduce(input_)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/cuda_communicator.py", line 154, in all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] out = ca_comm.custom_all_reduce(input_)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 279, in custom_all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return self.all_reduce(input, registered=False)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/distributed/device_communicators/custom_all_reduce.py", line 258, in all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ops.all_reduce(
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/vllm/_custom_ops.py", line 2153, in all_reduce
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] torch.ops._C_custom_ar.all_reduce(fa, inp, out, reg_buffer, reg_buffer_sz_bytes)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] File "/usr/local/lib/python3.12/dist-packages/torch/_ops.py", line 1243, in __call__
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] return self._op(*args, **kwargs)
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] ^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] torch.AcceleratorError: CUDA error: an illegal memory access was encountered
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706] Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
(Worker_TP7 pid=380351) ERROR 04-28-2026-15:50:31.931 [multiproc_executor.py:706]
GPU type: H20 * 8
FlashInfer version: 0.5.3
vLLM version: 0.11.1
Related error log: