@natureofnature natureofnature commented Jan 28, 2026

Purpose

Following #955, this PR provides an RDMA-based transfer implementation, built on the Mooncake transfer engine, for CPU<->CPU and GPU<->GPU transfers.

Progress

  1. Added d2d connector
  2. Enabled RDMA test
  3. Added cross-node testing and benchmark
  4. Support Bagel (AR->DIT) disaggregation (the relevant modifications will be released in the next PR)

TODO

Integrate with Bagel/Qwen2.5/3 model inference

Test Plan

  1. Intra-node test, 3 modes (serialization/deserialization, CPU pinned memory, GPU pinned memory), for CI
  2. Cross-node test, 3 modes (for performance testing)
  3. Intra-node model test (Bagel, Qwen2.5-Omni, Qwen3-Omni, etc.)
  4. Cross-node model test (Bagel, Qwen2.5-Omni, Qwen3-Omni, etc.)

Test Result

Intra-node functionality

  1. test_buffer_management.py passed
  2. test_mooncake_rdma.py passed

Cross-node performance

Case 1: Simulated test

Each run transfers 1 GB of data and is repeated 20 times. Three modes are compared: zero copy (using a managed buffer), gpu (GPU direct transfer), and copy (data -> buffer -> RDMA). Tested on H800 clusters.

| Mode | Throughput | Efficiency (vs 45 GB/s, the maximum bandwidth on tested servers) |
| --- | --- | --- |
| zero copy | 33.7 GB/s | 75% |
| gpu | 25.6 GB/s | 57% |
| copy | 13.2 GB/s | 29% |
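
The efficiency column appears to be the measured throughput divided by the 45 GB/s peak quoted in the table header. A minimal sketch of that arithmetic, using only the numbers reported above (the variable and function names are illustrative and not part of the PR):

```python
# Peak bandwidth and measured throughputs are copied from the table above.
PEAK_GBPS = 45.0
throughput_gbps = {"zero copy": 33.7, "gpu": 25.6, "copy": 13.2}


def efficiency_pct(measured_gbps: float, peak_gbps: float = PEAK_GBPS) -> int:
    """Return measured throughput as a rounded percentage of the peak."""
    return round(100.0 * measured_gbps / peak_gbps)


# Recompute the efficiency column for each mode.
efficiency = {mode: efficiency_pct(v) for mode, v in throughput_gbps.items()}
for mode, pct in efficiency.items():
    print(f"{mode}: {pct}%")
```

The recomputed percentages match the reported 75%, 57%, and 29%.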

Case 2: Bagel AR/DIT disaggregation test

A text-to-image request with a prompt of around 3400 tokens generates around 190 MB of KV cache between the AR->DIT stages. The performance results are below.

| Stage | Mooncake Store | RDMA with serialize/deserialize/memory copy | RDMA CPU zero-copy | RDMA GPU RDMA Direct |
| --- | --- | --- | --- | --- |
| AR (Stage-0) | 102 ms | 99 ms | 100 ms | 106 ms |
| AR→DIT data transmission | 810 ms | 300 ms | 14 ms | 14 ms |
| DIT (Stage-1) | 10,083 ms | 10,240 ms | 10,077 ms | 10,039 ms |
| Other time cost | 65 ms | 104 ms | 103 ms | 93 ms |
| E2E time | 11,060 ms | 10,743 ms | 10,294 ms | 10,252 ms |
| Overall performance gain | baseline | ~3% | >7% | >7% |
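
As a cross-check of the table above, the sketch below sums the four stage rows, compares the totals with the reported E2E times, and derives a speedup relative to the Mooncake Store baseline. It also gives a rough effective bandwidth for the transfer step, assuming "around 190MB" means 190e6 bytes. The column keys and the choice of speedup formula are assumptions made for illustration; the PR does not state how the gain was computed.

```python
# Per-stage timings in ms, copied from the table above.
# Column order: Mooncake Store, RDMA copy path, RDMA CPU zero-copy, RDMA GPU direct.
columns = ["mooncake_store", "rdma_copy", "rdma_cpu_zero_copy", "rdma_gpu_direct"]
stages_ms = {
    "ar": [102, 99, 100, 106],
    "transfer": [810, 300, 14, 14],
    "dit": [10083, 10240, 10077, 10039],
    "other": [65, 104, 103, 93],
}
reported_e2e_ms = [11060, 10743, 10294, 10252]

# E2E time per column is the sum of the four stage rows.
e2e_ms = [sum(row[i] for row in stages_ms.values()) for i in range(len(columns))]

# Speedup over the baseline column, in percent (assumed formula: baseline / t - 1).
baseline_ms = e2e_ms[0]
speedup_pct = [100.0 * (baseline_ms / t - 1.0) for t in e2e_ms]

# Rough effective bandwidth of the AR->DIT transfer step.
KV_CACHE_BYTES = 190e6  # assumption: "around 190MB" read as decimal megabytes
transfer_gbps = [KV_CACHE_BYTES / (ms / 1000.0) / 1e9 for ms in stages_ms["transfer"]]

for name, total, gain, bw in zip(columns, e2e_ms, speedup_pct, transfer_gbps):
    print(f"{name}: e2e={total} ms, speedup={gain:.1f}%, transfer={bw:.2f} GB/s")
```

The stage rows sum exactly to the reported E2E times. Because the DIT stage dominates at roughly 10 s, cutting the transfer from 810 ms to 14 ms yields only a single-digit overall gain.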

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b2cc320601

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

@natureofnature natureofnature force-pushed the d2d_connector branch 2 times, most recently from ae91253 to 743d268 Compare January 28, 2026 10:07
@natureofnature

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 743d268753

Comment on lines 684 to 1166
if ret == 0:
    self.cleanup(meta.request_id)
    response_queue.put((identity, TRANS_DONE))
else:
    response_queue.put((identity, TRANS_ERROR))

P2: Release RDMA buffers when transfers fail

When batch_transfer_sync_write fails (non‑zero ret) you respond with TRANS_ERROR but never call cleanup to release the pooled buffer. That leaves the request’s ManagedBuffer and allocation in _local_buffers, so any transient RDMA error (peer down, wrong device, timeout) leaks pool space and can exhaust the pool, causing subsequent transfers to fail even after the network issue is fixed. Consider calling cleanup(meta.request_id) on the error paths (and in the exception handler) once you decide not to retry.
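
One way to act on this suggestion is to release the buffer in a `finally` block so that the success, error, and exception paths all go through the same cleanup. The sketch below is a toy model, not the PR's code: `BufferPoolSketch`, `hold`, `handle_transfer`, and the string values of `TRANS_DONE` / `TRANS_ERROR` are hypothetical stand-ins, and it assumes the transfer is not retried after a failure.

```python
import queue

# Placeholder status values; the real TRANS_DONE / TRANS_ERROR constants are defined in the PR.
TRANS_DONE = "TRANS_DONE"
TRANS_ERROR = "TRANS_ERROR"


class BufferPoolSketch:
    """Hypothetical stand-in for the connector's pooled RDMA buffers."""

    def __init__(self):
        self._local_buffers = {}

    def hold(self, request_id, nbytes):
        # Pretend to reserve pool space for one request.
        self._local_buffers[request_id] = bytearray(nbytes)

    def cleanup(self, request_id):
        # Idempotent release: safe to call even if nothing is held.
        self._local_buffers.pop(request_id, None)


def handle_transfer(pool, request_id, identity, response_queue, do_transfer):
    """Run one transfer and release the buffer on every exit path (no retry assumed)."""
    status = TRANS_ERROR
    try:
        if do_transfer() == 0:
            status = TRANS_DONE
    except Exception:
        # An exception counts as a failed transfer; the buffer is still released below.
        status = TRANS_ERROR
    finally:
        pool.cleanup(request_id)
        response_queue.put((identity, status))
    return status


pool = BufferPoolSketch()
responses = queue.Queue()
pool.hold("req-1", 1024)
handle_transfer(pool, "req-1", "peer-a", responses, lambda: -1)  # simulated RDMA failure
print(responses.get_nowait(), len(pool._local_buffers))
```

If retries are added later, the cleanup would need to move to the point where the final retry is abandoned, as the review comment notes.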

@hsliuustc0106

Can we rename it? I don't recall RDMAConnector being used in vLLM upstream.

@natureofnature natureofnature force-pushed the d2d_connector branch 2 times, most recently from ae85693 to 791369e Compare February 10, 2026 02:03
@Gaohan123 Gaohan123 added this to the v0.16.0 milestone Feb 10, 2026
@natureofnature

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 365f163118

@natureofnature

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: efb3b316cf

@natureofnature

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 36717281d6

@natureofnature

@codex review

@natureofnature

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0b13984aed

2. enable rdma test
3. add cross node testing
4. add cross node benchmark
5. update for threading issues
6. Verified Bagel support using Mooncake RDMA (Bagel-related support
   will be submitted in the next PR)
7. update connector name/doc

Signed-off-by: natureofnature <[email protected]>
@natureofnature

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2158eeee99

@natureofnature

@codex review

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 467d032aed

Comment on lines 695 to 699
if not self.sender_host or not self.sender_zmq_port or str(self.sender_host).lower() == "auto":
    raise RuntimeError(
        f"get(metadata=None) requires sender info to be resolved, "
        f"but sender_host={self.sender_host!r}, sender_zmq_port={self.sender_zmq_port!r}. "
        f"Call update_sender_info(host, port) before using get() without metadata."

P1: Resolve sender endpoint before metadata-less gets

get(metadata=None) now hard-fails in _query_metadata_from_sender when sender_host/sender_zmq_port are unset, but the KV receive path (receive_kv_cache_for_request in kv_transfer_manager.py) always calls connector.get(...) without metadata and this commit does not wire update_sender_info() anywhere (repo search shows only the method definition/docs). As a result, receiver stages using MooncakeTransferEngineConnector for KV-cache polling will fail immediately instead of retrying/receiving unless users manually inject sender endpoint fields.
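
The failure mode and one possible wiring can be sketched as follows. This is a simplified model under stated assumptions: `ConnectorSketch`, `sender_resolved`, `receive_with_sender_info`, the `get(key, metadata=None)` signature, and the example host and port are hypothetical. Only the names `update_sender_info`, `sender_host`, and `sender_zmq_port` and the "auto" check come from the snippet quoted above.

```python
class ConnectorSketch:
    """Hypothetical stand-in that mirrors only the sender-endpoint guard."""

    def __init__(self, sender_host="auto", sender_zmq_port=None):
        self.sender_host = sender_host
        self.sender_zmq_port = sender_zmq_port

    def update_sender_info(self, host, port):
        # Record the sender endpoint once it is known.
        self.sender_host = host
        self.sender_zmq_port = port

    def sender_resolved(self):
        # Same condition as the quoted guard, inverted.
        return bool(
            self.sender_host
            and self.sender_zmq_port
            and str(self.sender_host).lower() != "auto"
        )

    def get(self, key, metadata=None):
        if metadata is None and not self.sender_resolved():
            raise RuntimeError("get(metadata=None) requires sender info to be resolved")
        # Placeholder result; a real connector would fetch the data over RDMA here.
        return f"payload-for-{key}"


def receive_with_sender_info(connector, key, sender_endpoint):
    """Hypothetical receive path: resolve the endpoint, then do a metadata-less get."""
    if not connector.sender_resolved():
        host, port = sender_endpoint
        connector.update_sender_info(host, port)
    return connector.get(key)


connector = ConnectorSketch()
print(receive_with_sender_info(connector, "req-1", ("10.0.0.1", 5555)))
```

Where the endpoint comes from (stage configuration, a registry, or a handshake) is left open here; the point is only that the receive path resolves it before calling `get()` without metadata.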

@natureofnature (Contributor, Author)

This will be used in the KV transfer manager.

@natureofnature natureofnature changed the title from "[WIP]vLLM-Omni RDMA connector" to "vLLM-Omni RDMA connector" Feb 11, 2026

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Signed-off-by: natureofnature <[email protected]>
Signed-off-by: natureofnature <[email protected]>