Merged
8 changes: 8 additions & 0 deletions examples/auto_parallel/README.md
@@ -17,6 +17,8 @@ The CUDA driver on your machine should be ≥525.60.13, and the CUDA toolkit
## Runtime Environment Preparation
`mpirun python -m pip install -r requirements.txt --force-reinstall`

Note: paddlepaddle-gpu 3.2.0 or later is required. See the [Paddle installation guide](https://www.paddlepaddle.org.cn/install/quick?docurl=undefined).
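
The version requirement in the note can be verified before launching a job. The following is a minimal sketch, not part of this PR: the helper names (`version_tuple`, `meets_minimum`, `check_paddle`) are hypothetical, and the comparison only looks at the leading numeric release components, so a pre-release such as `3.2.0rc1` is treated as `3.2.0`.

```python
import re
from importlib import metadata

MIN_PADDLE = "3.2.0"  # minimum stated in the README note


def version_tuple(version: str) -> tuple:
    """Return the leading numeric release components, e.g. '3.2.0.post1' -> (3, 2, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed: str, minimum: str = MIN_PADDLE) -> bool:
    """Compare release tuples, padding the shorter one with zeros."""
    a, b = version_tuple(installed), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_paddle() -> str:
    """Look up the installed paddlepaddle-gpu distribution and describe its status."""
    try:
        installed = metadata.version("paddlepaddle-gpu")
    except metadata.PackageNotFoundError:
        return "paddlepaddle-gpu is not installed"
    if meets_minimum(installed):
        return f"paddlepaddle-gpu {installed} satisfies >= {MIN_PADDLE}"
    return f"paddlepaddle-gpu {installed} is older than {MIN_PADDLE}"


if __name__ == "__main__":
    print(check_paddle())
```

Running this on each node (for example through the same `mpirun python` prefix used for the install command) would surface a mismatched Paddle build before the training script starts.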

## Start Pre-Training
After the environment is ready, pre-training on 56 GPUs can be launched by:
`mpirun bash train_4p5_300B_A47B.sh`,
@@ -26,3 +28,9 @@ should be replaced according to the real environment.


The toolkit provides an auto-parallel solution for ERNIE-4.5 pre-training, including the hybrid parallelism training strategy. More advanced optimizations are on the way.


The auto-parallel intermediate API currently has some limitations, which are being addressed:

- Limited support for MoE
- Limited support for VPP in pipeline parallelism (the scripts default to USE_VPP=0; when USE_VPP=1, the basic API is used to build the model)
6 changes: 6 additions & 0 deletions examples/auto_parallel/README_zh.md
@@ -17,10 +17,16 @@
## Environment Preparation
`mpirun python -m pip install -r requirements.txt --force-reinstall`

Note: paddlepaddle-gpu version 3.2 is required; see the [installation reference](https://www.paddlepaddle.org.cn/install/quick?docurl=undefined).

## Start Training
Once the environment is ready, you can launch pre-training on 56 GPUs by running:
`mpirun bash train_4p5_300B_A47B.sh`,

- Note: replace `master_ip` and `port` in `train_4p5_300B_A47B.sh` according to your environment.

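The toolkit provides a way to pre-train ERNIE-4.5 with auto parallelism, including a multi-dimensional hybrid parallel training strategy. Further optimizations and features will be added on top of this version.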

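The auto-parallel intermediate API currently has some limitations, which are being addressed:
- Incomplete support for MoE
- Incomplete support for VPP optimization in pipeline parallelism (the scripts default to USE_VPP=0; when USE_VPP=1 is set, the basic API is used to build the network)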
5 changes: 4 additions & 1 deletion examples/auto_parallel/requirements.txt
@@ -1,2 +1,5 @@
 paddlepaddle-gpu
-paddleformers
+paddleformers>=0.2.0
+tensorboardX>=2.6.4
+decord>=0.6.0
+moviepy>=2.2.1
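
To see which of the new minimum versions an existing environment already meets, the resulting requirements can be compared against installed distributions. This is a rough sketch under stated assumptions: it handles only the bare `name` and `name>=version` forms that appear in this file, and every function name here (`parse_requirement`, `satisfies`, `unmet`, `installed_versions`) is hypothetical rather than part of the repository.

```python
import re
from importlib import metadata

# requirements.txt as it reads after this change.
REQUIREMENTS = """\
paddlepaddle-gpu
paddleformers>=0.2.0
tensorboardX>=2.6.4
decord>=0.6.0
moviepy>=2.2.1
"""


def parse_requirement(line: str):
    """Split 'name>=x.y.z' into (name, minimum); minimum is None when unpinned."""
    match = re.fullmatch(r"\s*([A-Za-z0-9_.\-]+)\s*(?:>=\s*([0-9][0-9.]*))?\s*", line)
    if match is None:
        raise ValueError(f"unsupported requirement line: {line!r}")
    return match.group(1), match.group(2)


def release(version: str) -> tuple:
    """Leading numeric components of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def satisfies(installed: str, minimum: str) -> bool:
    """True when installed >= minimum, padding the shorter tuple with zeros."""
    a, b = release(installed), release(minimum)
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)) >= b + (0,) * (width - len(b))


def unmet(requirements: str, installed: dict) -> list:
    """List requirements that are missing or below their minimum version."""
    problems = []
    for line in requirements.splitlines():
        if not line.strip() or line.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        name, minimum = parse_requirement(line)
        have = installed.get(name)
        if have is None:
            problems.append(f"{name}: not installed")
        elif minimum is not None and not satisfies(have, minimum):
            problems.append(f"{name}: {have} < {minimum}")
    return problems


def installed_versions(names) -> dict:
    """Map each distribution name that is installed to its version string."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            pass  # left out of the mapping, reported as 'not installed'
    return found


if __name__ == "__main__":
    names = [parse_requirement(l)[0] for l in REQUIREMENTS.splitlines() if l.strip()]
    for problem in unmet(REQUIREMENTS, installed_versions(names)):
        print(problem)
```

Passing the installed versions in as a plain dictionary keeps `unmet` easy to exercise without touching the real environment.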