[Bug] 0.5.10 PD disaggregated nixl-cu13 error: NIXL_ERR_NOT_FOUND #23551

### Checklist

- [x] I searched related issues but found no solution.
- [x] The bug persists in the latest version.
- [x] Issues without environment info and a minimal reproducible demo are hard to resolve and may receive no feedback.
- [x] If this is not a bug report but a general question, please start a discussion at https://github.com/sgl-project/sglang/discussions. Otherwise, it will be closed.
- [x] Please use English. Otherwise, it will be closed.

### Describe the bug

[2026-04-23 08:06:43 DP0 TP0] Scheduler hit an exception: Traceback (most recent call last):
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3616, in run_scheduler_process
    scheduler.run_event_loop()
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 1300, in run_event_loop
    dispatch_event_loop(self)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 3504, in dispatch_event_loop
    scheduler.event_loop_overlap_disagg_prefill()
  File "/usr/local/lib/python3.12/dist-packages/torch/utils/_contextlib.py", line 120, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/prefill.py", line 443, in event_loop_overlap_disagg_prefill
    self.process_batch_result(tmp_batch, tmp_result)
  File "/sgl-workspace/sglang/python/sglang/srt/managers/scheduler.py", line 2822, in process_batch_result
    self.process_batch_result_disagg_prefill(batch, result)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/prefill.py", line 529, in process_batch_result_disagg_prefill
    self.send_kv_chunk(req, last_chunk=True)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/prefill.py", line 817, in send_kv_chunk
    req.disagg_kv_sender.send(page_indices, state_indices)
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/nixl/conn.py", line 925, in send
    new_xfer_handles = self.kv_mgr.add_transfer_request(
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/nixl/conn.py", line 786, in add_transfer_request
    state_xfer_handle = self.maybe_send_extra(
                        ^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/nixl/conn.py", line 710, in maybe_send_extra
    return self._send_kvcache_generic(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/sgl-workspace/sglang/python/sglang/srt/disaggregation/nixl/conn.py", line 430, in _send_kvcache_generic
    xfer_handle = self.agent.initialize_xfer(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.12/dist-packages/nixl_cu13/_api.py", line 584, in initialize_xfer
    handle = self.agent.createXferReq(
             ^^^^^^^^^^^^^^^^^^^^^^^^^
nixl_cu13._bindings.nixlNotFoundError: NIXL_ERR_NOT_FOUND

The prefill scheduler crashes with the traceback above when requests carry a large input. A request with a small input works fine, but sending small inputs multiple times also triggers the same error.

### Reproduction


P (prefill) server:
python3 -m sglang.launch_server \
    --model-path /datasets/MiMo-V2-Flash/ \
    --pp-size 1 \
    --dp-size 2 \
    --tp-size 8 \
    --enable-dp-attention \
    --disaggregation-mode prefill \
    --disaggregation-transfer-backend nixl \
    --page-size 64 \
    --host 172.21.17.82 \
    --port 30000 \
    --trust-remote-code \
    --mem-fraction-static 0.75 \
    --max-running-requests 128 \
    --chunked-prefill-size 16384 \
    --reasoning-parser qwen3 \
    --tool-call-parser mimo \
    --context-length 262144 \
    --disable-cuda-graph \
    --skip-server-warmup \
    --attention-backend fa3

D (decode) server:
python3 -m sglang.launch_server \
    --model-path /datasets/MiMo-V2-Flash/ \
    --disaggregation-mode decode \
    --disaggregation-transfer-backend nixl \
    --load-balance-method round_robin \
    --prefill-round-robin-balance \
    --pp-size 1 \
    --dp-size 2 \
    --tp-size 8 \
    --enable-dp-attention \
    --page-size 64 \
    --host 172.21.17.89 \
    --port 30001 \
    --trust-remote-code \
    --mem-fraction-static 0.75 \
    --max-running-requests 128 \
    --chunked-prefill-size 16384 \
    --reasoning-parser qwen3 \
    --tool-call-parser mimo \
    --context-length 262144 \
    --skip-server-warmup \
    --attention-backend fa3

Router:
python -m sglang_router.launch_router \
    --pd-disaggregation \
    --prefill http://172.21.17.82:30000 \
    --decode http://172.21.17.89:30001 \
    --host 172.21.17.82 \
    --port 8001

Client:
python3 -m sglang.bench_serving \
    --backend sglang \
    --dataset-name random \
    --random-range-ratio 1.0 \
    --num-prompts 64 \
    --random-input-len 16384 \
    --random-output-len 1024 \
    --host 172.21.17.82 \
    --port 8001 \
    --dataset-path ShareGPT_V3_unfiltered_cleaned_split.json
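
A smaller client than `bench_serving` may help separate the two failure modes described above (one large input versus repeated small inputs). The following is only a sketch: it assumes the router forwards SGLang's native `/generate` endpoint with `text` and `sampling_params` fields, the helper names `build_payload`, `send_once`, and `run_repro` are hypothetical, and the repeat count and word counts are arbitrary illustration values, not measured thresholds.

```python
import json
import urllib.request

# Router address taken from the launch command in this report.
ROUTER_URL = "http://172.21.17.82:8001"


def build_payload(num_words: int, max_new_tokens: int = 16) -> dict:
    """Build a /generate request body with `num_words` input words."""
    # Repeating one word gives only rough control over the token count.
    text = " ".join(["hello"] * num_words)
    return {
        "text": text,
        "sampling_params": {"max_new_tokens": max_new_tokens, "temperature": 0.0},
    }


def send_once(num_words: int, timeout: float = 600.0) -> dict:
    """POST one request to the router and return the decoded JSON response."""
    body = json.dumps(build_payload(num_words)).encode("utf-8")
    req = urllib.request.Request(
        f"{ROUTER_URL}/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())


def run_repro(small_repeats: int = 20, large_words: int = 16000) -> None:
    """Send repeated small requests, then one large request (arbitrary sizes)."""
    for i in range(small_repeats):
        send_once(num_words=64)
        print(f"small request {i} returned")
    send_once(num_words=large_words)
    print("large request returned")
```

Calling `run_repro()` from a machine that can reach the router sends the requests one at a time. The client may only see a timeout or an HTTP error, so the prefill server log remains the place to look for `NIXL_ERR_NOT_FOUND`.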

### Environment

Image: lmsysorg/sglang:v0.5.10.post1-cu130

ucx_info -v
Library version: 1.20.0
Library path: /lib/libucs.so.0
API headers version: 1.20.0
Git branch '', revision 4b7a6ca
Configured with: --disable-logging --disable-debug --disable-assertions --disable-params-check --enable-mt --enable-shared --disable-static --disable-doxygen-doc --enable-optimizations --enable-cma --enable-devel-headers --with-cuda=/usr/local/cuda --with-verbs --with-dm --with-efa --enable-mt
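
The environment above lists only the image tag and the UCX build. Package versions inside the container could be collected with a short script such as the sketch below. The distribution names are assumptions: `nixl-cu13` is inferred from the `nixl_cu13` module path in the traceback and may be published under a different name, and `collect_versions` is a hypothetical helper.

```python
from importlib import metadata

# Distribution names are guesses; adjust them to match `pip list` in the image.
PACKAGES = ["sglang", "torch", "nixl-cu13"]


def collect_versions(names: list[str]) -> dict[str, str]:
    """Return the installed version for each distribution, or 'not installed'."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


def format_versions(versions: dict[str, str]) -> str:
    """Render one 'name: version' line per package."""
    return "\n".join(f"{name}: {ver}" for name, ver in versions.items())
```

Running `print(format_versions(collect_versions(PACKAGES)))` on both the prefill and decode hosts would show whether the two sides use the same NIXL build.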
