Implement Zero Bubble (ZB-H1) scheduling into FlagScale, which splits BW into B and W. #405

Corle-hyz · 2025-03-11T12:03:22Z

To use ZB-H1, make the following changes in the demo.yaml file:

- sequence_parallel: True
+ sequence_parallel: False

- transformer_impl: transformer_engine
+ transformer_impl: local

+ enable_zero_bubble: True

- normalization: RMSNorm
+ normalization: LayerNorm
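As a rough illustration of the settings listed above, the following sketch checks a flat dictionary of options before a ZB-H1 run. The helper name `check_zero_bubble_settings` and the flat-dict layout are assumptions made for this example; they are not part of FlagScale, and the actual nesting of these keys in demo.yaml is not shown in this PR.

```python
# Hypothetical helper, not part of FlagScale: validates the four settings
# listed in this PR description against a flat dict of options.
REQUIRED_FOR_ZERO_BUBBLE = {
    "sequence_parallel": False,
    "transformer_impl": "local",
    "normalization": "LayerNorm",
}


def check_zero_bubble_settings(cfg):
    """Return a list of problems; the list is empty if cfg is compatible."""
    if not cfg.get("enable_zero_bubble", False):
        return []  # nothing to check when zero bubble is disabled
    problems = []
    for key, expected in REQUIRED_FOR_ZERO_BUBBLE.items():
        actual = cfg.get(key)
        if actual != expected:
            problems.append(f"{key} should be {expected!r}, got {actual!r}")
    return problems
```

Calling it with `normalization` left at `RMSNorm` while `enable_zero_bubble` is `True` returns one message naming that key.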

The figure below shows a profiling trace of ZB-H1. The weight_grad_store pop()/pop_all() entries in the trace correspond to the weight-gradient (W) part of the backward pass, which runs separately from the input-gradient (B) part.
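To make the B/W split concrete, here is a minimal sketch of a deferred weight-gradient store in plain Python. It is not the code in weight_grad_store.py: the class name `ToyWeightGradStore`, the `put()` and `flush()` methods, and the scalar `linear_backward` helper are all assumptions for illustration. Only the `pop()`/`pop_all()` names come from the description above.

```python
from collections import deque


class ToyWeightGradStore:
    """Illustrative only: queues deferred weight-gradient work per microbatch."""

    def __init__(self):
        self._pending = []     # W closures recorded during the current B pass
        self._queue = deque()  # one list of closures per finished B pass

    def put(self, compute_weight_grad):
        # Called during B: record the W computation instead of running it.
        self._pending.append(compute_weight_grad)

    def flush(self):
        # Called at the end of one microbatch's B pass.
        self._queue.append(self._pending)
        self._pending = []

    def pop(self):
        # Run the W work of the oldest queued microbatch.
        for fn in self._queue.popleft():
            fn()

    def pop_all(self):
        # Drain all remaining W work, e.g. at the end of the schedule.
        while self._queue:
            self.pop()


def linear_backward(store, x, w, grad_out, grads):
    """Scalar y = w * x: compute the input gradient now, defer the weight gradient."""
    grad_in = grad_out * w  # B: needed immediately by the previous stage

    def weight_grad():
        grads["w"] += grad_out * x  # W: can run later to fill pipeline bubbles

    store.put(weight_grad)
    return grad_in
```

In this sketch the schedule would call `pop()` whenever a pipeline stage would otherwise sit idle, and `pop_all()` before the optimizer step so that no weight gradient is left unapplied.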

megatron/megatron/core/weight_grad_store.py

megatron/megatron/core/pipeline_parallel/schedules.py

megatron/megatron/training/arguments.py

Implement Zero Bubble (ZB-H1) scheduling into FlagScale, which splits BW into B and W.

1a78809

Corle-hyz requested review from heavyrain-lzy, aoyulong and zhaoyinglia as code owners March 11, 2025 12:03

heavyrain-lzy reviewed Mar 12, 2025

Corle-hyz and others added 2 commits March 19, 2025 10:21

Merge branch 'FlagOpen:main' into zb-pr

a4808b6

Improve some impls of zero bubble

6253efa

lxd-cumt requested a review from a team as a code owner March 19, 2025 02:53

fix to pass megatron unit test

0d9494e
