Originally posted by @mahf708 in #975 (comment)
Thoughts from @mcgibbon:
- Would we consider "implementing" spatial aggregators for now by doing a spatial gather at the top level of the aggregator, and then just running the existing aggregators with reduction along the data-parallel dim (so that the spatial roots naturally gather together to the global root)?
- That's basically what #975 does for grad mag percent diff, but we could use it as a temporary solution to get all aggregators working. It should be enough for the case we're talking about, where we can store the global data, just not global data for the entire latent space and gradients (i.e. for training).
- Honestly, we may want to do this long-term, except one variable at a time: spatial gather a single variable, run all the aggregators on it, then move on to the next variable. Even at 3 km resolution, that's only ~300 MB per variable per timestep at float32 precision.
- The bigger issue is that some (many?) metrics would probably be faster to compute on all the nodes than by having root do all the compute.
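
To make the "gather one variable at a time" idea concrete, here is a minimal single-process sketch. It is not the repo's API: `spatial_gather`, `split_into_tiles`, `MeanAggregator`, `MaxAggregator`, `run_aggregators` and `global_field_bytes` are all hypothetical names, and the spatial ranks are simulated as a list of NumPy tiles rather than real distributed processes. In an actual run, `spatial_gather` would be a collective that delivers the tiles to the spatial root, and the data-parallel reduction would happen afterwards in the existing aggregators.

```python
import numpy as np


def split_into_tiles(field, n_lat_blocks, n_lon_blocks):
    """Demo helper: cut a global field into row-major tiles, one per pretend spatial rank."""
    return [
        tile
        for row in np.split(field, n_lat_blocks, axis=0)
        for tile in np.split(row, n_lon_blocks, axis=1)
    ]


def spatial_gather(tiles, n_lat_blocks, n_lon_blocks):
    """Stand-in for a spatial gather: stitch row-major tiles back into the global field."""
    rows = [
        np.concatenate(tiles[i * n_lon_blocks:(i + 1) * n_lon_blocks], axis=1)
        for i in range(n_lat_blocks)
    ]
    return np.concatenate(rows, axis=0)


class MeanAggregator:
    """Toy aggregator that only ever sees global (already gathered) fields."""

    def __init__(self):
        self._sum, self._count = {}, {}

    def record(self, name, field):
        self._sum[name] = self._sum.get(name, 0.0) + float(field.sum(dtype=np.float64))
        self._count[name] = self._count.get(name, 0) + field.size

    def get(self):
        return {k: self._sum[k] / self._count[k] for k in self._sum}


class MaxAggregator:
    """Second toy aggregator, to show several aggregators sharing one gather."""

    def __init__(self):
        self._max = {}

    def record(self, name, field):
        self._max[name] = max(self._max.get(name, -np.inf), float(field.max()))

    def get(self):
        return dict(self._max)


def run_aggregators(local_data, aggregators, n_lat_blocks, n_lon_blocks):
    """Gather one variable, feed it to every aggregator, drop it, then do the next."""
    for name, tiles in local_data.items():
        global_field = spatial_gather(tiles, n_lat_blocks, n_lon_blocks)
        for agg in aggregators:
            agg.record(name, global_field)
        del global_field  # at most one gathered variable is alive at a time


def global_field_bytes(n_lat, n_lon, dtype=np.float32):
    """Size in bytes of one gathered 2D variable at one timestep."""
    return n_lat * n_lon * np.dtype(dtype).itemsize


# Tiny demo: two variables on a 4 x 8 grid, split over a pretend 2 x 2 spatial layout.
rng = np.random.default_rng(0)
global_truth = {
    "T": rng.normal(size=(4, 8)).astype(np.float32),
    "q": rng.normal(size=(4, 8)).astype(np.float32),
}
local_data = {k: split_into_tiles(v, 2, 2) for k, v in global_truth.items()}
aggregators = [MeanAggregator(), MaxAggregator()]
run_aggregators(local_data, aggregators, 2, 2)
means, maxes = (agg.get() for agg in aggregators)
```

As a sanity check on the memory figure, `global_field_bytes(6700, 13400)` is 359,120,000 bytes (about 360 MB), assuming a regular lat-lon grid with roughly 3 km spacing at the equator; the real grid may differ, but that is the same order of magnitude as the ~300 MB quoted above. Note that this sketch does all the metric compute on the root, so it does nothing for the last bullet's concern about metrics that would be faster computed on all nodes.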