vLLM-Omni RDMA connector #1019
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b2cc320601
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Review threads on vllm_omni/distributed/omni_connectors/connectors/mooncake_transfer_engine_connector.py (2, resolved).
Force-pushed from ae91253 to 743d268.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 743d268753
```python
if ret == 0:
    self.cleanup(meta.request_id)
    response_queue.put((identity, TRANS_DONE))
else:
    response_queue.put((identity, TRANS_ERROR))
```
Release RDMA buffers when transfers fail
When batch_transfer_sync_write fails (non‑zero ret) you respond with TRANS_ERROR but never call cleanup to release the pooled buffer. That leaves the request’s ManagedBuffer and allocation in _local_buffers, so any transient RDMA error (peer down, wrong device, timeout) leaks pool space and can exhaust the pool, causing subsequent transfers to fail even after the network issue is fixed. Consider calling cleanup(meta.request_id) on the error paths (and in the exception handler) once you decide not to retry.
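One way to apply the suggestion, sketched below on the assumption that cleanup(request_id) is safe to call after both successful and failed transfers; _do_batch_transfer is a hypothetical stand-in for the actual batch_transfer_sync_write call and its arguments:

```python
def _complete_transfer(self, identity, meta, response_queue):
    """Hypothetical wrapper around the batch_transfer_sync_write path quoted above."""
    try:
        # _do_batch_transfer is a placeholder for the real batch_transfer_sync_write
        # invocation; its arguments are elided here.
        ret = self._do_batch_transfer(meta)
        status = TRANS_DONE if ret == 0 else TRANS_ERROR
    except Exception:
        status = TRANS_ERROR
    finally:
        # Release the pooled ManagedBuffer on every path so a transient RDMA error
        # (peer down, wrong device, timeout) cannot leak pool space.
        self.cleanup(meta.request_id)
    response_queue.put((identity, status))
```

If failed transfers are retried instead, the cleanup call would move to whatever point the retry loop finally gives up.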
Force-pushed from 743d268 to 88c1ed5.
Can we rename it? I don't recall RDMAConnector being used in vLLM upstream.
Force-pushed from ae85693 to 791369e.
Force-pushed from 4618a39 to 365f163.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 365f163118
Review threads on vllm_omni/distributed/omni_connectors/connectors/mooncake_transfer_engine_connector.py (2: one outdated, one resolved).
Force-pushed from 365f163 to ef6c4cc.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: efb3b316cf
Review threads on vllm_omni/distributed/omni_connectors/connectors/mooncake_transfer_engine_connector.py (2, outdated and resolved).
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 36717281d6
Review thread on vllm_omni/distributed/omni_connectors/connectors/mooncake_transfer_engine_connector.py (outdated, resolved).
Force-pushed from 3671728 to dd976c6.
@codex review
Force-pushed from dd976c6 to 0b13984.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0b13984aed
2. enable rdma test
3. add cross node testing
4. add cross node benchmark
5. update for threading issues
6. verified Bagel support using Mooncake RDMA (Bagel-related support will be submitted in a follow-up PR)
7. update connector name/doc

Signed-off-by: natureofnature <[email protected]>
Force-pushed from 0b13984 to 2158eee.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2158eeee99
Review threads on vllm_omni/distributed/omni_connectors/connectors/mooncake_transfer_engine_connector.py (2: one outdated, one resolved).
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 467d032aed
```python
if not self.sender_host or not self.sender_zmq_port or str(self.sender_host).lower() == "auto":
    raise RuntimeError(
        f"get(metadata=None) requires sender info to be resolved, "
        f"but sender_host={self.sender_host!r}, sender_zmq_port={self.sender_zmq_port!r}. "
        f"Call update_sender_info(host, port) before using get() without metadata."
    )
```
Resolve sender endpoint before metadata-less gets
get(metadata=None) now hard-fails in _query_metadata_from_sender when sender_host/sender_zmq_port are unset, but the KV receive path (receive_kv_cache_for_request in kv_transfer_manager.py) always calls connector.get(...) without metadata and this commit does not wire update_sender_info() anywhere (repo search shows only the method definition/docs). As a result, receiver stages using MooncakeTransferEngineConnector for KV-cache polling will fail immediately instead of retrying/receiving unless users manually inject sender endpoint fields.
will be used in kv transfer manager
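A minimal sketch of what that wiring might look like on the receiver side, assuming the KV transfer manager learns the sender endpoint from its own configuration before polling; the config attribute names and the get() call shape are assumptions, while update_sender_info(host, port) is the method referenced above:

```python
# Hypothetical wiring inside the KV receive path (kv_transfer_manager.py).
# Where the endpoint really comes from (config, handshake, service discovery)
# is an assumption made for illustration.
def receive_kv_cache_for_request(self, request_id):
    if not self._sender_endpoint_resolved:
        # Resolve the sender endpoint once so metadata-less gets do not raise.
        self.connector.update_sender_info(self.cfg.sender_host, self.cfg.sender_zmq_port)
        self._sender_endpoint_resolved = True
    # Poll the connector without metadata, as the receive path does today.
    return self.connector.get(request_id)
```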
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
Signed-off-by: natureofnature <[email protected]>
Force-pushed from 467d032 to 9eab4de.
Signed-off-by: natureofnature <[email protected]>
Force-pushed from 2876db8 to 14b397f.
Purpose
Following up on #955, this PR provides an RDMA-based transfer implementation built on the Mooncake transfer engine for CPU<->CPU and GPU<->GPU transfers.
Progress
TODO
Integrate with Bagel/Qwen2.5/3 model inference
Test Plan
Test Result
Internode functionality
Cross-node performance
Case 1: Simulated test
Transfers of 1 GB of data, repeated 20 times, comparing three paths: zero copy (using a managed buffer), gpu (GPU-direct transfer), and copy (data -> buffer -> RDMA). Tested on H800 clusters.
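As a rough illustration of the timing methodology (not the actual benchmark script), the sketch below measures average throughput over 20 repetitions of a 1 GB payload; transfer_fn is a hypothetical stand-in for whichever path is being measured (zero copy, GPU-direct, or copy-through-buffer):

```python
import time

import torch


def benchmark_transfer(transfer_fn, payload_gb: float = 1.0, repeats: int = 20) -> float:
    """Return average throughput in GB/s for `transfer_fn` over `repeats` runs."""
    # 1 GB of bytes on the host; the GPU-direct case would allocate on a CUDA device instead.
    payload = torch.empty(int(payload_gb * (1 << 30)), dtype=torch.uint8)
    elapsed = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        transfer_fn(payload)  # one full transfer of the payload
        elapsed += time.perf_counter() - start
    return payload_gb * repeats / elapsed
```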
Case 2: Bagel AR/DIT disaggregation test
A text-to-image request with a prompt of around 3,400 tokens, which generates around 190 MB of KV cache between the AR and DIT stages; the performance results are shown below.