Describe the bug
[2026-04-23 08:06:43 DP0 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3616, in run_scheduler_process
    scheduler.run_event_loop()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1300, in run_event_loop
    dispatch_event_loop(self)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3504, in dispatch_event_loop
    scheduler.event_loop_overlap_disagg_prefill()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/prefill.py", line 443, in event_loop_overlap_disagg_prefill
    self.process_batch_result(tmp_batch, tmp_result)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2822, in process_batch_result
    self.process_batch_result_disagg_prefill(batch, result)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/prefill.py", line 529, in process_batch_result_disagg_prefill
    self.send_kv_chunk(req, last_chunk=True)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/prefill.py", line 817, in send_kv_chunk
    req.disagg_kv_sender.send(page_indices, state_indices)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/nixl/conn.py", line 925, in send
    new_xfer_handles = self.kv_mgr.add_transfer_request(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/nixl/conn.py", line 786, in add_transfer_request
    state_xfer_handle = self.maybe_send_extra(
                        ^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/nixl/conn.py", line 710, in maybe_send_extra
    return self._send_kvcache_generic(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/nixl/conn.py", line 430, in _send_kvcache_generic
    xfer_handle = self.agent.initialize_xfer(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nixl_cu13/_api.py", line 584, in initialize_xfer
    handle = self.agent.createXferReq(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
nixl_cu13._bindings.nixlNotFoundError: NIXL_ERR_NOT_FOUND
With large inputs, the prefill server fails with the error above. Small inputs work at first, but sending small inputs multiple times triggers the same error.
Reproduction
P server:
python3 -m sglang.launch_server \
  --model-path /datasets/MiMo-V2-Flash/ \
  --pp-size 1 \
  --dp-size 2 \
  --tp-size 8 \
  --enable-dp-attention \
  --disaggregation-mode prefill \
  --disaggregation-transfer-backend nixl \
  --page-size 64 \
  --host 172.21.17.82 \
  --port 30000 \
  --trust-remote-code \
  --mem-fraction-static 0.75 \
  --max-running-requests 128 \
  --chunked-prefill-size 16384 \
  --reasoning-parser qwen3 \
  --tool-call-parser mimo \
  --context-length 262144 \
  --disable-cuda-graph \
  --skip-server-warmup \
  --attention-backend fa3
D server:
python3 -m sglang.launch_server \
  --model-path /datasets/MiMo-V2-Flash/ \
  --disaggregation-mode decode \
  --disaggregation-transfer-backend nixl \
  --load-balance-method round_robin \
  --prefill-round-robin-balance \
  --pp-size 1 \
  --dp-size 2 \
  --tp-size 8 \
  --enable-dp-attention \
  --page-size 64 \
  --host 172.21.17.89 \
  --port 30001 \
  --trust-remote-code \
  --mem-fraction-static 0.75 \
  --max-running-requests 128 \
  --chunked-prefill-size 16384 \
  --reasoning-parser qwen3 \
  --tool-call-parser mimo \
  --context-length 262144 \
  --skip-server-warmup \
  --attention-backend fa3
router:
python -m sglang_router.launch_router --pd-disaggregation --prefill http://172.21.17.82:30000 --decode http://172.21.17.89:30001 --host 172.21.17.82 --port 8001
client:
python3 -m sglang.bench_serving --backend sglang --dataset-name random --random-range-ratio 1.0 --num-prompts 64 --random-input-len 16384 --random-output-len 1024 --host 172.21.17.82 --port 8001 --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json
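For the second case in the description (small inputs sent multiple times), a minimal sketch of a client that repeats one short prompt against the router and reports the first request that fails may help narrow things down. It assumes the router forwards SGLang's native /generate endpoint; the prompt, the request count, and the helper names (build_payload, post_json, send_repeated) are hypothetical and were not part of the original run. The NIXL_ERR_NOT_FOUND traceback itself would still appear in the prefill server log, so the client may only see an HTTP error or a timeout.

```python
import json
import urllib.request

# Router address taken from the launch commands above.
ROUTER_URL = "http://172.21.17.82:8001"


def build_payload(prompt, max_new_tokens=16):
    """Build a request body for the /generate endpoint (values are illustrative)."""
    return {
        "text": prompt,
        "sampling_params": {"max_new_tokens": max_new_tokens, "temperature": 0},
    }


def post_json(url, payload, timeout=120.0):
    """POST a JSON body and return the decoded JSON response."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def send_repeated(base_url, prompt, count, post=post_json):
    """Send the same prompt `count` times sequentially.

    Returns the zero-based index of the first failing request,
    or None if every request succeeded.
    """
    for i in range(count):
        try:
            post(f"{base_url}/generate", build_payload(prompt))
        except Exception as exc:  # HTTP errors, timeouts, refused connections
            print(f"request {i} failed: {exc!r}")
            return i
    return None
```

Calling send_repeated(ROUTER_URL, "Hello, world.", count=200) and comparing the returned index with the timestamp of the traceback in the prefill log would show roughly how many small requests are needed before the failure appears.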
Environment
image: lmsysorg/sglang:v0.5.10.post1-cu130
ucx_info -v
Library version: 1.20.0
Library path: /lib/libucs.so.0
API headers version: 1.20.0
Git branch '', revision 4b7a6ca
Configured with: --disable-logging --disable-debug --disable-assertions --disable-params-check --enable-mt --enable-shared --disable-static --disable-doxygen-doc --enable-optimizations --enable-cma --enable-devel-headers --with-cuda=/usr/local/cuda --with-verbs --with-dm --with-efa --enable-mt