Update gp utils #1507

misko · 2025-09-22T17:42:19Z

Refactor and update gputil functions

Move as much as possible outside of our custom functions; this allows the compiler to optimize
Optimize the padding operation
Use reduce_scatter, which is faster than all_reduce, in the all_gather backward
Implement a special all_gather backward for GLOO only, for CPU GP and GP tests
Update test cases to use these functions
Add an asynchronous version of all_gather in preparation for any overlapped compute + comms implementation
Combine two sequential all_reduce calls in the EFS head into one all_reduce call
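
To illustrate why reduce_scatter can stand in for all_reduce in the all_gather backward, here is a minimal pure-Python simulation. It is a sketch, not the code in this PR: it makes no torch.distributed calls, and the helper names (`all_reduce_sum`, `reduce_scatter_sum`, `backward_via_all_reduce`) and the three-rank layout are hypothetical. In the backward pass each rank holds a gradient for the full gathered tensor but only needs the summed gradient for its own shard, which is exactly what reduce_scatter delivers.

```python
# Pure-Python simulation of collectives; every name here is illustrative.

def all_reduce_sum(per_rank):
    """Every rank ends up with the full element-wise sum over all ranks."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]


def reduce_scatter_sum(per_rank, shard_sizes):
    """Rank r ends up with only its own shard of the element-wise sum.

    This simulates the result only; a real reduce_scatter avoids
    delivering the full summed tensor to every rank.
    """
    total = [sum(vals) for vals in zip(*per_rank)]
    out, start = [], 0
    for size in shard_sizes:
        out.append(total[start:start + size])
        start += size
    return out


def backward_via_all_reduce(grads, shard_sizes):
    """all_gather backward done as all_reduce followed by a local slice."""
    reduced = all_reduce_sum(grads)
    out, start = [], 0
    for rank, size in enumerate(shard_sizes):
        out.append(reduced[rank][start:start + size])
        start += size
    return out


# Three hypothetical ranks that own 2, 1 and 2 atoms respectively.
shard_sizes = [2, 1, 2]
# Each rank's gradient with respect to the full gathered tensor (5 entries).
grads = [
    [1.0, 2.0, 3.0, 4.0, 5.0],
    [10.0, 20.0, 30.0, 40.0, 50.0],
    [100.0, 200.0, 300.0, 400.0, 500.0],
]

via_all_reduce = backward_via_all_reduce(grads, shard_sizes)
via_reduce_scatter = reduce_scatter_sum(grads, shard_sizes)
```

Both paths give every rank the same local gradient; the difference is that with reduce_scatter each rank receives only its shard instead of the full summed tensor.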

Change layers to always output atom embeddings for the full system. Currently, in GP mode, we output a tensor sized for only the local atoms, which makes it hard to implement anything that overlaps communication and computation.

If we keep the current implementation, you cannot chunk the communication and do compute in between.
In this new implementation, an additional all_reduce is required to synchronize energy across systems (a float tensor of shape systems x 1).
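
The extra energy all_reduce can be pictured with a small pure-Python simulation. This is a sketch under assumptions, not the PR's implementation: it assumes each rank sums the per-atom energies of the atoms it owns into a per-system partial, and that one all_reduce then combines those partials. The names `local_system_energy`, `all_reduce_sum`, `atom_system`, and `owned_atoms` are hypothetical.

```python
# Pure-Python simulation; no torch.distributed calls are made.

def local_system_energy(atom_energy, atom_system, owned_atoms, num_systems):
    """Sum per-atom energies into per-system partials for one rank."""
    partial = [0.0] * num_systems
    for atom in owned_atoms:
        partial[atom_system[atom]] += atom_energy[atom]
    return partial


def all_reduce_sum(per_rank):
    """Every rank ends up with the element-wise sum over all ranks."""
    total = [sum(vals) for vals in zip(*per_rank)]
    return [list(total) for _ in per_rank]


# Five atoms split across two systems (a batch of two structures).
atom_energy = [0.5, 1.5, 2.0, 3.0, 4.0]
atom_system = [0, 0, 1, 1, 1]
# Two hypothetical ranks; each owns a subset of atoms from both systems.
owned_atoms = [[0, 2], [1, 3, 4]]

partials = [
    local_system_energy(atom_energy, atom_system, owned, num_systems=2)
    for owned in owned_atoms
]
# The payload is one float per system, so this collective is small.
energies = all_reduce_sum(partials)
```

Because the reduced payload holds one value per system rather than one per atom, this synchronization moves very little data compared with the embedding traffic.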

misko added 2 commits September 21, 2025 02:43

passing gp_utils test, removed old sum_grad and all_gather

973095c

progress for gp port

c81ffbe

meta-cla bot added the cla signed label Sep 22, 2025

misko added 2 commits September 22, 2025 18:49

fixes

a028b19

fixes

9c4611f

misko added the enhancement and minor labels Sep 22, 2025

misko added 8 commits September 22, 2025 23:35

fixes

f3b2ff6

fix ac

b684116

Merge branch 'main' into update_gp_utils

65341c4

cleanup unused functions

e9d5c0e

revert changes to umaspeed yaml

6bfe735

cleanup

1941e4c

cleanup

a0a21cf

cleanup

0ec2290

misko marked this pull request as ready for review September 23, 2025 23:17

Merge branch 'main' into update_gp_utils

24376f9

misko requested review from lbluque and rayg1234 September 23, 2025 23:17

mergemain

81e1b85
