[fix] handle FSDP DTensor in broadcast_from_megatron_pp #113
Open
yxs wants to merge 1 commit into ISEEKYAN:main from
Conversation
What does this PR do?

Megatron FSDP (ZeRO-3) stores parameters as DTensors. When `export_weights` broadcasts parameters across PP ranks, `torch.distributed.broadcast()` triggers DTensor dispatch and fails because the PP group is not in the DTensor's DeviceMesh.

Fix: call `DTensor.full_tensor()` to materialize the full parameter before broadcasting. This is backward compatible and a no-op for non-FSDP parameters.
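A minimal sketch of the workaround described above. The helper name, `pp_group` handle, and `src_rank` argument are illustrative assumptions, not the repo's exact `broadcast_from_megatron_pp` signature:

```python
import torch
import torch.distributed as dist
# torch >= 2.4; on older releases DTensor lives in torch.distributed._tensor
from torch.distributed.tensor import DTensor


def _to_broadcastable(param: torch.Tensor) -> torch.Tensor:
    # FSDP (ZeRO-3) shards parameters as DTensors whose DeviceMesh only covers
    # the FSDP/data-parallel ranks. Broadcasting such a tensor over the PP group
    # dispatches through DTensor and fails, so materialize the full tensor first.
    if isinstance(param, DTensor):
        return param.full_tensor()  # all-gather shards into a plain local tensor
    return param  # non-FSDP parameters are already plain tensors; no-op


def broadcast_param_across_pp(param: torch.Tensor, src_rank: int, pp_group) -> torch.Tensor:
    # Hypothetical call site mirroring the broadcast done in export_weights.
    tensor = _to_broadcastable(param).contiguous()
    dist.broadcast(tensor, src=src_rank, group=pp_group)
    return tensor
```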
Author
@ISEEKYAN could you please take a look.