[TRTLLM-12721][fix] Bound V2 context transfer polling#15356

Draft
chienchunhung wants to merge 2 commits into NVIDIA:main from chienchunhung:codex/disagg-v2-bounded-transfer-poll

@chienchunhung

Collaborator

Summary

Follow-up draft PR for V2 Python transceiver bounded polling.

This keeps the V2 change separate from:

Behavior

  • Add an explicit blocking argument to V2 TxSession.wait_complete().
  • Preserve the raw TxSession API default as blocking, so existing direct transfer tests and callers keep their previous completion-barrier behavior.
  • Make KvCacheTransceiverV2.check_context_transfer_status(at_least_request_num=...) use nonblocking TxSession polling for bounded scheduler polls.
  • Treat None from the nonblocking TxSession poll as "not ready yet"; the request remains queued and is polled again later.
  • Preserve blockAll / at_least_request_num=None as a blocking wait-all path.
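The polling contract described above can be illustrated with a minimal sketch. It is not the code from this PR: `FakeTxSession`, `FakeTransceiver`, `submit`, `finish`, and the list-of-request-IDs return value are hypothetical stand-ins, and the sketch only distinguishes `at_least_request_num=None` (blocking wait-all) from any other value (one nonblocking pass), without modeling what the number itself means.

```python
import threading
from typing import Dict, List, Optional


class FakeTxSession:
    """Hypothetical stand-in for a V2 TxSession; not the real class."""

    def __init__(self) -> None:
        self._done = threading.Event()
        self._ok = True

    def finish(self, ok: bool = True) -> None:
        # Hypothetical helper: marks the transfer as finished.
        self._ok = ok
        self._done.set()

    def wait_complete(self, blocking: bool = True) -> Optional[bool]:
        # Default stays blocking, so direct callers keep barrier semantics.
        if blocking:
            self._done.wait()
            return self._ok
        # Nonblocking poll: None means "not ready yet".
        return self._ok if self._done.is_set() else None


class FakeTransceiver:
    """Hypothetical stand-in for KvCacheTransceiverV2."""

    def __init__(self) -> None:
        self._pending: Dict[int, FakeTxSession] = {}

    def submit(self, request_id: int, session: FakeTxSession) -> None:
        # Hypothetical helper: queue a session for later status checks.
        self._pending[request_id] = session

    def check_context_transfer_status(
        self, at_least_request_num: Optional[int] = None
    ) -> List[int]:
        # None keeps the blocking wait-all path; any other value polls once.
        blocking = at_least_request_num is None
        finished: List[int] = []
        for request_id, session in list(self._pending.items()):
            result = session.wait_complete(blocking=blocking)
            if result is None:
                continue  # not ready: stays queued and is polled again later
            del self._pending[request_id]
            finished.append(request_id)
        return finished
```

In this sketch, a scheduler that calls `check_context_transfer_status(at_least_request_num=0)` never stalls on an unfinished session, while a call without arguments still waits for every queued session to complete.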

Relationship to #15181

This PR is related to #15181's bounded-polling contract, but it is intentionally V2/Python-transceiver only and is opened against main to keep the diff focused. It can be reviewed after or alongside #15181 without carrying the C++ transceiver changes in this PR.

Local validation

  • PYTHONPYCACHEPREFIX=/private/tmp/trtllm-pycache python -m py_compile tensorrt_llm/_torch/disaggregation/base/transfer.py tensorrt_llm/_torch/disaggregation/transceiver.py tensorrt_llm/_torch/disaggregation/native/transfer.py
  • git diff --check upstream/main..HEAD
  • PATH=/opt/miniconda3/bin:$PATH PRE_COMMIT_HOME=/private/tmp/trtllm-pre-commit pre-commit run --files tensorrt_llm/_torch/disaggregation/base/transfer.py tensorrt_llm/_torch/disaggregation/transceiver.py tensorrt_llm/_torch/disaggregation/native/transfer.py

A focused pytest run was attempted but could not proceed locally because the transformers dependency is not installed.

Signed-off-by: Chien-Chun Hung <2679986+chienchunhung@users.noreply.github.com>

Collaborator Author

/bot run --disable-fail-fast

@tensorrt-cicd

Collaborator

PR_Github #54156 [ run ] triggered by Bot. Commit: cdb81d1

@tensorrt-cicd

Collaborator

PR_Github #54156 [ run ] completed with state SUCCESS. Commit: cdb81d1
/LLM/main/L0_MergeRequest_PR pipeline #43239 completed with status: 'FAILURE'

CI Report

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again
