[Bug] 0.5.10 PD disaggregated nixl-cu13 error:NIXL_ERR_NOT_FOUND #23551

@DingYinfan

Checklist

  • I searched related issues but found no solution.
  • The bug persists in the latest version.
  • Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
  • If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
  • Please use English. Otherwise, it will be closed.

Describe the bug

[2026-04-23 08:06:43 DP0 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3616, in run_scheduler_process
    scheduler.run_event_loop()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1300, in run_event_loop
    dispatch_event_loop(self)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3504, in dispatch_event_loop
    scheduler.event_loop_overlap_disagg_prefill()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/prefill.py", line 443, in event_loop_overlap_disagg_prefill
    self.process_batch_result(tmp_batch, tmp_result)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2822, in process_batch_result
    self.process_batch_result_disagg_prefill(batch, result)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/prefill.py", line 529, in process_batch_result_disagg_prefill
    self.send_kv_chunk(req, last_chunk=True)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/prefill.py", line 817, in send_kv_chunk
    req.disagg_kv_sender.send(page_indices, state_indices)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/nixl/conn.py", line 925, in send
    new_xfer_handles = self.kv_mgr.add_transfer_request(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/nixl/conn.py", line 786, in add_transfer_request
    state_xfer_handle = self.maybe_send_extra(
                        ^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/nixl/conn.py", line 710, in maybe_send_extra
    return self._send_kvcache_generic(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/nixl/conn.py", line 430, in _send_kvcache_generic
    xfer_handle = self.agent.initialize_xfer(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nixl_cu13/_api.py", line 584, in initialize_xfer
    handle = self.agent.createXferReq(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
nixl_cu13._bindings.nixlNotFoundError: NIXL_ERR_NOT_FOUND

The error occurs on the prefill server when a request has a large input; a single request with a small input works fine. Sending small inputs multiple times also triggers the same error.

Reproduction

P server:
python3 -m sglang.launch_server \
  --model-path /datasets/MiMo-V2-Flash/ \
  --pp-size 1 \
  --dp-size 2 \
  --tp-size 8 \
  --enable-dp-attention \
  --disaggregation-mode prefill \
  --disaggregation-transfer-backend nixl \
  --page-size 64 \
  --host 172.21.17.82 \
  --port 30000 \
  --trust-remote-code \
  --mem-fraction-static 0.75 \
  --max-running-requests 128 \
  --chunked-prefill-size 16384 \
  --reasoning-parser qwen3 \
  --tool-call-parser mimo \
  --context-length 262144 \
  --disable-cuda-graph \
  --skip-server-warmup \
  --attention-backend fa3

D server:
python3 -m sglang.launch_server \
  --model-path /datasets/MiMo-V2-Flash/ \
  --disaggregation-mode decode \
  --disaggregation-transfer-backend nixl \
  --load-balance-method round_robin \
  --prefill-round-robin-balance \
  --pp-size 1 \
  --dp-size 2 \
  --tp-size 8 \
  --enable-dp-attention \
  --page-size 64 \
  --host 172.21.17.89 \
  --port 30001 \
  --trust-remote-code \
  --mem-fraction-static 0.75 \
  --max-running-requests 128 \
  --chunked-prefill-size 16384 \
  --reasoning-parser qwen3 \
  --tool-call-parser mimo \
  --context-length 262144 \
  --skip-server-warmup \
  --attention-backend fa3

router:
python -m sglang_router.launch_router --pd-disaggregation --prefill http://172.21.17.82:30000 --decode http://172.21.17.89:30001 --host 172.21.17.82 --port 8001

client:
python3 -m sglang.bench_serving --backend sglang --dataset-name random --random-range-ratio 1.0 --num-prompts 64 --random-input-len 16384 --random-output-len 1024 --host 172.21.17.82 --port 8001 --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json
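
To narrow down whether input size alone triggers the failure, a single request may be easier to reason about than the 64-prompt benchmark above. The following is a minimal sketch, not part of the original report. It assumes the router forwards SGLang's native `/generate` endpoint; `build_payload` and `send` are hypothetical helper names, and the word count only approximates the token count.

```python
import json
import urllib.request

# Router address taken from the router command above.
ROUTER_URL = "http://172.21.17.82:8001"


def build_payload(num_words: int, max_new_tokens: int = 16) -> dict:
    """Build a /generate request body whose prompt contains num_words words.

    The number of tokens depends on the tokenizer, so num_words is only a
    rough proxy for the input length.
    """
    prompt = " ".join(["hello"] * num_words)
    return {
        "text": prompt,
        "sampling_params": {"max_new_tokens": max_new_tokens, "temperature": 0},
    }


def send(payload: dict, base_url: str = ROUTER_URL, timeout: float = 600.0) -> dict:
    """POST the payload to <base_url>/generate and return the decoded JSON reply."""
    req = urllib.request.Request(
        base_url + "/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())
```

Calling `send(build_payload(16000))` once would exercise the large-input case, and calling `send(build_payload(32))` in a loop would exercise the repeated small-input case described under "Describe the bug".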

Environment

image: lmsysorg/sglang:v0.5.10.post1-cu130

ucx_info -v
Library version: 1.20.0
Library path: /lib/libucs.so.0
API headers version: 1.20.0
Git branch '', revision 4b7a6ca
Configured with: --disable-logging --disable-debug --disable-assertions --disable-params-check --enable-mt --enable-shared --disable-static --disable-doxygen-doc --enable-optimizations --enable-cma --enable-devel-headers --with-cuda=/usr/local/cuda --with-verbs --with-dm --with-efa --enable-mt
