Estimate redistribute_cost by _TransformInfo #230

mori360 · 2025-11-04T22:27:07Z

Estimate redistribute_cost from _TransformInfo rather than by comparing the source and destination states.
Here are the changes:

1. Use _TransformInfo to collect the shard order.
2. Use _TransformInfo to explore the path taken during redistribute. The previous approach considers only the source and destination states and a one-step redistribute for each placement combination, which can be incorrect in some cases:
   a. S(0)S(0) -> S(0)R needs 1 allgather.
   b. S(0)S(0) -> RS(0) needs 2 allgathers, which cannot be found if we only look at S(0) -> R.
3. Use _TransformInfo.logical_shape to estimate comm_byte. The current comm_bytes_gb is based on the tensor shape and the number of shards. In case 2.b, the comm_byte for the 2 allgathers is different.
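To make cases a and b concrete, here is a minimal sketch of the path-based counting. It is a toy model, not the PyTorch `_TransformInfo` API: `allgather_steps` is a hypothetical helper, placements are written as a tensor dim (`0`) for a shard or `None` for replicate, shards are assumed to be applied left to right across mesh dims, sizes are assumed to divide evenly, and only shard-to-replicate transitions are modelled (re-sharding from replicate is treated as a free local slice).

```python
from math import prod


def allgather_steps(shape, mesh, src, dst):
    """Toy model: list the all-gathers needed to go from src to dst.

    Returns [(mesh_dim, gathered_numel), ...] in execution order, where
    gathered_numel is the per-device element count after that all-gather.
    """
    to_gather = set()
    for i, (s, d) in enumerate(zip(src, dst)):
        if s is not None and s != d:
            # Changing mesh dim i forces every later mesh dim that shards
            # the same tensor dim to be unsharded first.
            to_gather.update(j for j in range(i, len(mesh)) if src[j] == s)

    # Per-device element count under the source placement.
    local = prod(shape)
    for size, s in zip(mesh, src):
        if s is not None:
            local //= size

    steps = []
    # Undo the innermost shard first, then move outward.
    for j in sorted(to_gather, reverse=True):
        local *= mesh[j]
        steps.append((j, local))
    return steps


shape, mesh = (16, 4), (2, 4)
case_a = allgather_steps(shape, mesh, [0, 0], [0, None])  # S(0)S(0) -> S(0)R
case_b = allgather_steps(shape, mesh, [0, 0], [None, 0])  # S(0)S(0) -> RS(0)
print(len(case_a), case_a)
print(len(case_b), case_b)
```

In this toy setup case a yields one all-gather and case b yields two, and the two all-gathers in case b produce different per-device sizes, which is why a single estimate from the tensor shape and shard count cannot cover both steps.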

TODO:
Some compute_cost usages involve comm_bytes_gb; need to verify that they still return the expected cost.

fmassa · 2025-11-06T17:19:01Z

autoparallel/collective_runtime_estimation.py

-        if current == target:
-            continue
-        num_devices_on_mesh_dim = mesh_topo.mesh_dim_devices[i]
+    for transform_info in transform_infos:


In general, I think what we would like to have here is the minimal redistribution cost over all possible input/output orderings.

This is to ensure that we don't have to increase the search space for AutoParallel when performing the optimization, as we can focus only on the shardings (without order) and then optimize the ordering afterwards.

Does it make sense?
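One way to read this suggestion, as a rough sketch only: take the minimum of an order-aware cost over every pair of input and output shard orders, so the optimizer sees a single cost per pair of shardings. `min_cost_over_orders` and `toy_cost` are hypothetical names for illustration, and the cost function is a stand-in, not the estimator in this PR.

```python
from itertools import permutations, product


def min_cost_over_orders(src_mesh_dims, dst_mesh_dims, cost_fn):
    """Smallest cost_fn(src_order, dst_order) over all orderings.

    src_mesh_dims / dst_mesh_dims: mesh dims that shard the tensor on each
    side; every permutation is treated as a candidate shard order.
    """
    return min(
        cost_fn(src_order, dst_order)
        for src_order, dst_order in product(
            permutations(src_mesh_dims), permutations(dst_mesh_dims)
        )
    )


def toy_cost(src_order, dst_order):
    # Stand-in cost: one unit per position where the two orders disagree,
    # plus a fixed base cost of 1.
    return 1 + sum(a != b for a, b in zip(src_order, dst_order))


print(min_cost_over_orders((0, 1), (0, 1), toy_cost))
```

The enumeration is factorial in the number of mesh dims on each side, which is usually small for a device mesh; a real implementation might prune or cache.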

mori360 added 2 commits November 4, 2025 14:18

redistribtue cost

f42b103

add is_contiguous

673ae1e

meta-cla bot added the CLA Signed label Nov 4, 2025

lint

768891e

mori360 requested a review from fmassa November 5, 2025 21:00

fmassa reviewed Nov 6, 2025
