[fix] handle FSDP DTensor in broadcast_from_megatron_pp#113

Open
yxs wants to merge 1 commit into ISEEKYAN:main from yxs:fix/fsdp-dtensor-broadcast

Conversation


@yxs yxs commented Apr 2, 2026

What does this PR do?

Megatron FSDP (ZeRO-3) stores parameters as DTensors. When export_weights broadcasts parameters across PP ranks, torch.distributed.broadcast() triggers DTensor dispatch and fails because the PP process group is not part of the DTensor's DeviceMesh:

AssertionError: found no DeviceMesh from dtensor args for c10d.broadcast_.default!

Fix: call DTensor.full_tensor() to materialize the full parameter before broadcasting. This is backward compatible and a no-op for non-FSDP parameters.

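The fix follows a small, self-contained pattern; here is a minimal sketch of it. The helper name materialize_for_broadcast and the duck-typed check are illustrative assumptions, not the PR's actual code, which lives in export_weights.

```python
def materialize_for_broadcast(param):
    """Return a full (non-sharded) tensor that is safe to pass to
    torch.distributed.broadcast().

    A Megatron FSDP (ZeRO-3) parameter is a DTensor, which exposes
    .full_tensor(); a plain torch.Tensor does not, so this check is
    a no-op for non-FSDP parameters. Duck-typed here for illustration;
    real code could use isinstance(param, DTensor) instead.
    """
    if hasattr(param, "full_tensor"):
        # DTensor: gather the shards into a regular torch.Tensor first,
        # so the later broadcast never enters DTensor dispatch (whose
        # DeviceMesh does not contain the PP group).
        return param.full_tensor()
    return param  # already a plain tensor: nothing to do


# The call site would then look roughly like:
#   tensor = materialize_for_broadcast(param)
#   torch.distributed.broadcast(tensor, src=src_rank, group=pp_group)
```

The design point is that materialization happens before the collective, so the broadcast itself only ever sees ordinary tensors regardless of the sharding strategy in use.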

yxs commented Apr 3, 2026

@ISEEKYAN could you please take a look?
