
[Miles] Update P2P RDMA weight sync with flashinfer compatibility and NIC affinity#964

Open
wduan-hai wants to merge 1 commit into radixark:main from wduan-hai:wduan/p2p-flashinfer-restore

Conversation


@wduan-hai wduan-hai commented Apr 9, 2026

Update P2P RDMA weight sync with flashinfer compatibility and NIC affinity.

Tested on 4 nodes with Qwen3-30B-A3B and on 16 nodes with Qwen3-235B-A22B over RoCE.

This depends on sgl-project/sglang#22468

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request enhances the P2P weight transfer mechanism by introducing GPU-affine InfiniBand NIC binding, increasing the transfer timeout, and improving CPU-side model loading to prevent OOM issues. It also adds necessary weight post-processing steps and robust parameter mapping. Regarding the review feedback, I recommend verifying the idempotency of post_process_weights to ensure that calling it in both _pause_and_prepare_engines and _finalize_and_resume_engines is safe, and considering whether silent skipping of unmapped parameters in _get_transfer_ready_params should be replaced with a warning or error to prevent silent model state inconsistencies.
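The GPU-affine NIC binding mentioned above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: `pick_nic_for_gpu` and its locality rule are assumptions, and a real implementation would derive the GPU-to-NIC mapping from the node's PCIe topology (e.g. via `nvidia-smi topo -m` or hwloc) rather than from a fixed index formula.

```python
def pick_nic_for_gpu(local_gpu_rank: int, nics: list[str], gpus_per_node: int = 8) -> str:
    """Map a local GPU rank to an RDMA NIC by a locality-preserving index.

    Hypothetical helper: assumes NICs are enumerated in the same PCIe order
    as the GPUs, so neighboring GPUs share the NIC closest to them.
    """
    if not nics:
        raise RuntimeError("no RDMA NICs discovered on this node")
    # e.g. 8 GPUs and 4 NICs -> GPUs 0,1 use mlx5_0, GPUs 2,3 use mlx5_1, ...
    gpus_per_nic = max(1, gpus_per_node // len(nics))
    return nics[(local_gpu_rank // gpus_per_nic) % len(nics)]
```

In practice the chosen device would be exported per rank (for example through `NCCL_IB_HCA` or the transfer backend's NIC option) so each GPU's RDMA traffic stays on its nearest PCIe switch.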

Comment on lines +121 to +125
if dist.get_rank() == 0:
    post_process_weights(
        rollout_engines=self.rollout_engines,
        restore_weights_before_load=True,
    )

medium

The post_process_weights call is performed inside _pause_and_prepare_engines and again in _finalize_and_resume_engines. If post_process_weights is idempotent, this is fine, but if it performs stateful operations, this could lead to unexpected behavior. Please ensure that the logic is safe to call multiple times or consider if the call in _pause_and_prepare_engines is necessary given the one in _finalize_and_resume_engines.
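One way to make the double call safe is to guard the restore step with a per-engine flag. This is a minimal sketch of the idempotency pattern being suggested, with hypothetical `RolloutEngine` and `post_process_weights` stand-ins, not the PR's actual implementation.

```python
class RolloutEngine:
    """Toy stand-in for a rollout engine that tracks restore state."""

    def __init__(self):
        self.weights_restored = False
        self.restore_count = 0

    def restore_weights(self):
        # Stand-in for the real (expensive, stateful) restore step.
        self.restore_count += 1
        self.weights_restored = True


def post_process_weights(rollout_engines, restore_weights_before_load=False):
    # Guarding on the flag makes repeated calls (pause path + resume path)
    # run the stateful restore at most once per sync cycle.
    for engine in rollout_engines:
        if restore_weights_before_load and not engine.weights_restored:
            engine.restore_weights()
```

With a guard like this, calling `post_process_weights` from both `_pause_and_prepare_engines` and `_finalize_and_resume_engines` becomes harmless; the flag would need to be reset at the start of each new weight-sync cycle.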

Comment on lines +335 to +336
if mapped_result is None:
    continue

medium

The check if mapped_result is None: continue is added to handle cases where mapping fails. Ensure that this silent skipping of parameters does not lead to incomplete weight updates or inconsistencies in the model state, as it might be safer to log a warning or raise an error if a parameter is expected to be mapped.
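The suggested warning could look roughly like this. The function and `map_param` callback are illustrative stand-ins for the PR's `_get_transfer_ready_params` and its mapping logic, not the actual code: the point is only that skipped names get collected and reported instead of vanishing silently.

```python
import logging

logger = logging.getLogger(__name__)


def get_transfer_ready_params(named_params, map_param):
    """Collect mapped params; warn (rather than silently drop) on failures."""
    ready, skipped = [], []
    for name, tensor in named_params:
        mapped = map_param(name, tensor)
        if mapped is None:
            skipped.append(name)
            continue
        ready.append(mapped)
    if skipped:
        # Surfacing the names makes an incomplete weight update debuggable;
        # raising here instead would be the stricter option.
        logger.warning("Skipped %d unmapped params: %s", len(skipped), skipped)
    return ready
```

Whether to warn or raise depends on whether unmapped parameters are ever expected; if every parameter should map, raising is the safer default.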

Contributor

JD-ETH commented Apr 10, 2026

You are making a lot of changes at once.

  • flashinfer double processing: can you verify this is still needed? We added a post-processing call now.
  • the _get_transfer_ready_params skip is unsafe
  • it's unclear to me why you are only calling some of the processing on rank 0
  • the GPU affinity looks reasonable, though

All my previous experiments were run on H100, so I do expect issues to come up for GB.
Can you separate the issues and maybe start by getting a config that works for GB?
