From 2680296fec77bf6a4f8b17e3f376e530f848acc2 Mon Sep 17 00:00:00 2001
From: xuexixi
Date: Fri, 5 Sep 2025 01:37:51 +0800
Subject: [PATCH 1/2] add requirements.txt

---
 examples/auto_parallel/requirements.txt | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/examples/auto_parallel/requirements.txt b/examples/auto_parallel/requirements.txt
index b8214d254..e7a68b28c 100644
--- a/examples/auto_parallel/requirements.txt
+++ b/examples/auto_parallel/requirements.txt
@@ -1,2 +1,5 @@
 paddlepaddle-gpu
-paddleformers
+paddleformers>=0.2.0
+tensorboardX>=2.6.4
+decord>=0.6.0
+moviepy>=2.2.1

From fbf8dbd81ebbe0de35604dee896ad0f797ad73ce Mon Sep 17 00:00:00 2001
From: xuexixi
Date: Fri, 5 Sep 2025 02:15:20 +0800
Subject: [PATCH 2/2] modify readme

---
 examples/auto_parallel/README.md    | 8 ++++++++
 examples/auto_parallel/README_zh.md | 6 ++++++
 2 files changed, 14 insertions(+)

diff --git a/examples/auto_parallel/README.md b/examples/auto_parallel/README.md
index 0564c1257..b04f68b4f 100644
--- a/examples/auto_parallel/README.md
+++ b/examples/auto_parallel/README.md
@@ -17,6 +17,8 @@ The CUDA driver on your machine should be ≥525.60.13, and the CUDA toolkit
 ## Runtime Environment Preparation
 `mpirun python -m pip install -r requirements.txt --force-reinstall`
+Note: paddlepaddle-gpu version requirement: 3.2.0 or later. [install Paddle](https://www.paddlepaddle.org.cn/install/quick?docurl=undefined)
+
 ## Start Pre-Training
 After the environment is ready, pre-training on 56 GPUs can be launched by: `mpirun bash train_4p5_300B_A47B.sh`,
@@ -26,3 +28,9 @@ should be replaced according to the real environment.
 The toolkit provides an auto-parallel solution for ERNIE-4.5 pre-training, including the hybrid parallelism training strategy. More advanced optimizations are on the way.
+
+
+Currently, the auto-parallel intermediate API has some limitations under ongoing development:
+
+- Limited support for MOE
+- Limited support for VPP in pipeline parallelism (default USE_VPP=0 in scripts; when USE_VPP=1, basic API are used for modeling)
diff --git a/examples/auto_parallel/README_zh.md b/examples/auto_parallel/README_zh.md
index a86948690..477bbd4ab 100644
--- a/examples/auto_parallel/README_zh.md
+++ b/examples/auto_parallel/README_zh.md
@@ -17,6 +17,8 @@
 ## 环境准备
 `mpirun python -m pip install -r requirements.txt --force-reinstall`
+注意:paddlepaddle-gpu 需要使用 3.2 版本,安装可使用[参考](https://www.paddlepaddle.org.cn/install/quick?docurl=undefined)
+
 ## 开始训练
 在准备好环境后。您可以通过执行以下命令来进行56卡预训练: `mpirun bash train_4p5_300B_A47B.sh`,
@@ -24,3 +26,7 @@
 - 注意,您需要将 `train_4p5_300B_A47B.sh` 中的 `master_ip` 与 `port` 根据您的环境进行替换。
 该工具包提供了使用自动并行完成 ERNIE-4.5 预训练的方法,包括多维混合并行训练策略,更多的优化点和功能会基于此版本持续更新。
+
+现在自动并行中层API存在一些局限性,正在进一步支持:
+- 对 MOE 的支持不完备
+- 对流水线并行中的 VPP 优化支持不完备(脚本中默认 USE_VPP=0;当设置 USE_VPP=1 时,采用基础API完成组网)
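
Review note: `requirements.txt` leaves `paddlepaddle-gpu` unpinned, while the README note asks for 3.2.0 or later, so a quick check after installation may help catch a mismatch. The sketch below is not part of the patch. It assumes the distribution names match the entries in `requirements.txt`, and the helper names (`parse_version`, `meets_minimum`, `check_installed`) are hypothetical. It compares only the leading numeric part of a version string, so a pre-release such as `3.2.0rc1` would be treated as `3.2.0`.

```python
import re
from importlib import metadata


def parse_version(text):
    """Return the leading numeric components of a version string as a tuple of ints."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no numeric version prefix in {text!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter one with zeros."""
    a, b = parse_version(installed), parse_version(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_installed(dist_name, minimum):
    """Return (ok, installed_version); installed_version is None if the package is missing."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return False, None
    return meets_minimum(installed, minimum), installed


if __name__ == "__main__":
    # 3.2.0 comes from the README note; the other minimums come from requirements.txt.
    minimums = [
        ("paddlepaddle-gpu", "3.2.0"),
        ("paddleformers", "0.2.0"),
        ("tensorboardX", "2.6.4"),
        ("decord", "0.6.0"),
        ("moviepy", "2.2.1"),
    ]
    for name, minimum in minimums:
        ok, found = check_installed(name, minimum)
        status = "ok" if ok else "MISSING or too old"
        print(f"{name}: required >= {minimum}, found {found} -> {status}")
```

Running the script on each node after the `pip install` step prints one line per package. Adding `paddlepaddle-gpu>=3.2.0` to `requirements.txt` would be an alternative way to enforce the same constraint at install time.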