@mori360 commented Nov 4, 2025

Estimate redistribute_cost from _TransformInfo rather than by comparing only the source and destination states.
Here are the changes:

  1. Use _TransformInfo to collect the shard order.
  2. Use _TransformInfo to explore the path taken during redistribute. The previous approach considered only the source and destination states and a one-step redistribute for each placement combination, which can be incorrect in some cases:
    a. S(0)S(0) -> S(0)R needs 1 allgather.
    b. S(0)S(0) -> RS(0) needs 2 allgathers, which cannot be found if we only consider S(0) -> R.
  3. Use _TransformInfo.logical_shape to estimate comm_byte. The current comm_bytes_gb is based on the tensor shape and the number of shards. In case 2.b, the comm_byte for the 2 allgathers is different.
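
To make cases 2.a and 2.b concrete, here is a minimal sketch in plain Python. It is a toy model, not the PR's implementation and not the real `_TransformInfo` API: the function name `allgather_steps`, the representation of a sharding as an ordered tuple of mesh dims, and the per-device element count used as a stand-in for a logical shape are all assumptions made for illustration. It assumes a single tensor dim sharded by several mesh dims, where the innermost shard must be undone first and re-sharding from a replicated state costs no communication.

```python
from math import prod


def allgather_steps(src_order, dst_order, mesh_shape, numel):
    """List the allgathers needed to go from src_order to dst_order.

    src_order / dst_order: tuples of mesh dims that shard tensor dim 0,
    outermost shard first (hypothetical representation).
    Returns a list of (mesh_dim, per_device_elements_after_gather).
    """
    # Shards in the common prefix of both orders can stay in place.
    keep = 0
    for s, d in zip(src_order, dst_order):
        if s != d:
            break
        keep += 1

    steps = []
    current = list(src_order)
    while len(current) > keep:
        # The innermost shard has to be gathered first.
        mesh_dim = current.pop()
        # Size of the tensor each device holds after this gather.
        out_elems = numel // prod(mesh_shape[d] for d in current)
        steps.append((mesh_dim, out_elems))
    return steps


mesh_shape = (2, 4)  # assumed 2D mesh
numel = 1024         # assumed tensor size in elements

# 2.a: S(0)S(0) -> S(0)R, only the shard on mesh dim 1 is gathered.
case_a = allgather_steps((0, 1), (0,), mesh_shape, numel)
# 2.b: S(0)S(0) -> RS(0), both shards are gathered before re-sharding.
case_b = allgather_steps((0, 1), (1,), mesh_shape, numel)
print(case_a)
print(case_b)
```

In this toy model the two gathers in case 2.b operate on different sizes, which is why a single estimate derived from the tensor shape and shard count cannot describe both steps.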

TODO:
Some compute_cost code uses comm_bytes_gb; we need to verify that it still returns the expected cost.

@meta-cla meta-cla bot added the CLA Signed This label is managed by the Meta Open Source bot. label Nov 4, 2025
@mori360 mori360 requested a review from fmassa November 5, 2025 21:00
if current == target:
    continue
num_devices_on_mesh_dim = mesh_topo.mesh_dim_devices[i]
for transform_info in transform_infos:

In general, I think what we would like to have here is the minimal redistribution cost over all possible input/output orderings.

This ensures that we don't have to enlarge the search space for AutoParallel when performing the optimization: we can focus only on the shardings (without order) and then optimize the ordering afterwards.

Does it make sense?
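
One way to read this suggestion is sketched below in plain Python. This is a toy model under stated assumptions, not code from the PR: `gather_cost` and `min_cost_over_orderings` are hypothetical names, a sharding is represented as the set of mesh dims that shard one tensor dim, and the cost is approximated by the per-device element count after each allgather.

```python
from itertools import permutations
from math import prod


def gather_cost(src_order, dst_order, mesh_shape, numel):
    """Toy cost: sum of per-device sizes after each required allgather."""
    # Shards in the common prefix of both orders stay in place.
    keep = 0
    for s, d in zip(src_order, dst_order):
        if s != d:
            break
        keep += 1

    cost = 0
    current = list(src_order)
    while len(current) > keep:
        current.pop()  # undo the innermost shard first
        cost += numel // prod(mesh_shape[d] for d in current)
    return cost


def min_cost_over_orderings(src_dims, dst_dims, mesh_shape, numel):
    """Minimize the toy cost over every source and destination ordering.

    Returns (cost, best_src_order, best_dst_order).
    """
    best = None
    for src_order in permutations(src_dims):
        for dst_order in permutations(dst_dims):
            cost = gather_cost(src_order, dst_order, mesh_shape, numel)
            candidate = (cost, src_order, dst_order)
            if best is None or candidate < best:
                best = candidate
    return best


# Source sharded over mesh dims {0, 1}, destination over mesh dim {1} only.
result = min_cost_over_orderings((0, 1), (1,), (2, 4), 1024)
print(result)
```

With the cost defined this way, the optimizer only has to choose which mesh dims shard each tensor dim; the ordering that attains the minimum can be recovered afterwards from the returned tuple. Enumerating permutations is exponential in the number of mesh dims, which is usually small.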
