vLLM-Omni RDMA connector #1019
Conversation
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: b2cc320601
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".
Review threads on vllm_omni/distributed/omni_connectors/connectors/mooncake_transfer_engine_connector.py (2, resolved).
Force-pushed from ae91253 to 743d268.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 743d268753
```python
if ret == 0:
    self.cleanup(meta.request_id)
    response_queue.put((identity, TRANS_DONE))
else:
    response_queue.put((identity, TRANS_ERROR))
```
Release RDMA buffers when transfers fail
When batch_transfer_sync_write fails (non‑zero ret) you respond with TRANS_ERROR but never call cleanup to release the pooled buffer. That leaves the request’s ManagedBuffer and allocation in _local_buffers, so any transient RDMA error (peer down, wrong device, timeout) leaks pool space and can exhaust the pool, causing subsequent transfers to fail even after the network issue is fixed. Consider calling cleanup(meta.request_id) on the error paths (and in the exception handler) once you decide not to retry.
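One way to apply the suggestion, sketched below on the assumption that cleanup(request_id) is safe to call after both successful and failed transfers; _do_batch_transfer is a hypothetical stand-in for the actual batch_transfer_sync_write call and its arguments:

```python
def _complete_transfer(self, identity, meta, response_queue):
    """Hypothetical wrapper around the batch_transfer_sync_write path quoted above."""
    try:
        # _do_batch_transfer is a placeholder for the real batch_transfer_sync_write
        # invocation; its arguments are elided here.
        ret = self._do_batch_transfer(meta)
        status = TRANS_DONE if ret == 0 else TRANS_ERROR
    except Exception:
        status = TRANS_ERROR
    finally:
        # Release the pooled ManagedBuffer on every path so a transient RDMA error
        # (peer down, wrong device, timeout) cannot leak pool space.
        self.cleanup(meta.request_id)
    response_queue.put((identity, status))
```

If failed transfers are retried instead, the cleanup call would move to whatever point the retry loop finally gives up.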
Force-pushed from 743d268 to 88c1ed5.
Can we rename it? I don't recall RDMAConnector being used in vLLM upstream.
Force-pushed from ae85693 to 791369e.
Force-pushed from 4618a39 to 365f163.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 365f163118
Review threads on vllm_omni/distributed/omni_connectors/connectors/mooncake_transfer_engine_connector.py (2: one outdated, one resolved).
Force-pushed from 365f163 to ef6c4cc.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: efb3b316cf
Review threads on vllm_omni/distributed/omni_connectors/connectors/mooncake_transfer_engine_connector.py (2, outdated and resolved).
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 36717281d6
Review thread on vllm_omni/distributed/omni_connectors/connectors/mooncake_transfer_engine_connector.py (outdated, resolved).
Force-pushed from 3671728 to dd976c6.
@codex review
Force-pushed from dd976c6 to 0b13984.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 0b13984aed
2. enable rdma test
3. add cross node testing
4. add cross node benchmark
5. update for threading issues
6. verified Bagel support using Mooncake RDMA (Bagel-related support will be submitted in a follow-up PR)
7. update connector name/doc

Signed-off-by: natureofnature <[email protected]>
Force-pushed from 0b13984 to 2158eee.
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 2158eeee99
Review threads on vllm_omni/distributed/omni_connectors/connectors/mooncake_transfer_engine_connector.py (2: one outdated, one resolved).
@codex review
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 467d032aed
```python
if not self.sender_host or not self.sender_zmq_port or str(self.sender_host).lower() == "auto":
    raise RuntimeError(
        f"get(metadata=None) requires sender info to be resolved, "
        f"but sender_host={self.sender_host!r}, sender_zmq_port={self.sender_zmq_port!r}. "
        f"Call update_sender_info(host, port) before using get() without metadata."
    )
```
Resolve sender endpoint before metadata-less gets
get(metadata=None) now hard-fails in _query_metadata_from_sender when sender_host/sender_zmq_port are unset, but the KV receive path (receive_kv_cache_for_request in kv_transfer_manager.py) always calls connector.get(...) without metadata and this commit does not wire update_sender_info() anywhere (repo search shows only the method definition/docs). As a result, receiver stages using MooncakeTransferEngineConnector for KV-cache polling will fail immediately instead of retrying/receiving unless users manually inject sender endpoint fields.
will be used in kv transfer manager
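A minimal sketch of what that wiring might look like on the receiver side, assuming the KV transfer manager learns the sender endpoint from its own configuration before polling; the config attribute names and the get() call shape are assumptions, while update_sender_info(host, port) is the method referenced above:

```python
# Hypothetical wiring inside the KV receive path (kv_transfer_manager.py).
# Where the endpoint really comes from (config, handshake, service discovery)
# is an assumption made for illustration.
def receive_kv_cache_for_request(self, request_id):
    if not self._sender_endpoint_resolved:
        # Resolve the sender endpoint once so metadata-less gets do not raise.
        self.connector.update_sender_info(self.cfg.sender_host, self.cfg.sender_zmq_port)
        self._sender_endpoint_resolved = True
    # Poll the connector without metadata, as the receive path does today.
    return self.connector.get(request_id)
```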
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
Signed-off-by: natureofnature <[email protected]>
Force-pushed from 467d032 to 9eab4de.
Signed-off-by: natureofnature <[email protected]>
Force-pushed from 2876db8 to 14b397f.
Purpose
Following up on #955, this PR provides an RDMA-based transfer implementation built on the Mooncake transfer engine for CPU<->CPU and GPU<->GPU transfers.
Progress
TODO
Integrate with Bagel/Qwen2.5/3 model inference
Test Plan
Test Result
Internode functionality
Cross-node performance
Case 1: Simulated test
Transfers of 1 GB of data, repeated 20 times, comparing three paths: zero copy (using a managed buffer), gpu (GPU-direct transfer), and copy (data -> buffer -> RDMA). Tested on H800 clusters.
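As a rough illustration of the timing methodology (not the actual benchmark script), the sketch below measures average throughput over 20 repetitions of a 1 GB payload; transfer_fn is a hypothetical stand-in for whichever path is being measured (zero copy, GPU-direct, or copy-through-buffer):

```python
import time

import torch


def benchmark_transfer(transfer_fn, payload_gb: float = 1.0, repeats: int = 20) -> float:
    """Return average throughput in GB/s for `transfer_fn` over `repeats` runs."""
    # 1 GB of bytes on the host; the GPU-direct case would allocate on a CUDA device instead.
    payload = torch.empty(int(payload_gb * (1 << 30)), dtype=torch.uint8)
    elapsed = 0.0
    for _ in range(repeats):
        start = time.perf_counter()
        transfer_fn(payload)  # one full transfer of the payload
        elapsed += time.perf_counter() - start
    return payload_gb * repeats / elapsed
```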
Case 2: Bagel AR/DIT disaggregation test
A text-to-image request with a prompt of around 3,400 tokens, which generates around 190 MB of KV cache between the AR and DIT stages; the performance results are shown below.