
aggregators in a distributed future #987

@mahf708

Description


Originally posted by @mahf708 in #975 (comment)

thoughts from @mcgibbon

  • Could we "implement" spatial aggregators for now by doing a spatial gather at the top level of the aggregator, and then running the existing aggregators with a reduction along the data-parallel dim (so that the spatial roots gather to the global root naturally)?
  • That's basically what #975 does for the gradient magnitude percent difference, but we could use it as a temporary solution to make all aggregators work. It should be enough for the case we're discussing, where we can store global output data but can't store the entire latent space and its gradients globally (i.e. during training).
  • Honestly, we may want to do this long-term, except that we would want to spatially gather one variable at a time, run all the aggregators on it, and then iterate over the variables. Even at 3 km resolution, that's only ~300 MB per variable per timestep at float32 precision.
  • The bigger issue is that some (many?) metrics are probably faster to compute by doing the work on all the nodes instead of having the root do all the compute.
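The per-variable gather-then-aggregate loop described above could be sketched roughly as follows. This is a minimal, hedged sketch, not the repo's actual API: `gather_then_aggregate` and the aggregator dict are hypothetical names, and the in-process `np.concatenate` stands in for a real collective gather (e.g. `torch.distributed.gather` to the root rank). The point it illustrates is bounding root memory to roughly one global field at a time:

```python
import numpy as np

def gather_then_aggregate(rank_chunks, aggregators):
    """Sketch of the proposed pattern: for each variable, spatially
    gather the per-rank shards into one global field, run every
    aggregator on it, then discard it before the next variable.

    rank_chunks: list of {var_name: 1-D spatial shard}, one dict per rank
                 (stand-in for data living on separate ranks).
    aggregators: {metric_name: callable} applied to each global field.
    """
    results = {}
    for var in rank_chunks[0]:
        # Stand-in for a distributed gather to root: concatenate the
        # spatial shards along the spatial dimension.
        global_field = np.concatenate([chunk[var] for chunk in rank_chunks])
        # Root runs all aggregators on the single gathered variable;
        # memory stays ~one global field, not the whole latent state.
        results[var] = {name: fn(global_field) for name, fn in aggregators.items()}
    return results

# Toy data: 4 "ranks", each holding a spatial shard of two variables.
chunks = [{"T": np.full(8, float(r)), "q": np.arange(8.0) + r} for r in range(4)]
aggs = {"mean": np.mean, "max": np.max}
out = gather_then_aggregate(chunks, aggs)
```

A real version would also have non-root ranks skip the aggregator calls after contributing their shard, which is exactly the trade-off raised in the last bullet: the root does all the metric compute while the other ranks idle.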
