
Commit 6e9e0f0

1 parent 23ef2a7 commit 6e9e0f0

13 files changed: +92 −36 lines

applications/question_answering/unsupervised_qa/README.md (+2 −2)
@@ -356,7 +356,7 @@ python -u -m paddle.distributed.launch --gpus "1,2" --log_dir log/question_gener
      --epochs=20 \
      --batch_size=16 \
      --learning_rate=1e-5 \
-     --warmup_propotion=0.02 \
+     --warmup_proportion=0.02 \
      --weight_decay=0.01 \
      --max_seq_len=512 \
      --max_target_len=30 \
@@ -391,7 +391,7 @@ python -u -m paddle.distributed.launch --gpus "1,2" --log_dir log/question_gener
  - `batch_size` is the number of samples **per card** in each iteration.
  - `learning_rate` is the base learning rate; it is multiplied by the value produced by the learning rate scheduler to give the current learning rate.
  - `weight_decay` is the weight_decay coefficient used by the AdamW optimizer.
- - `warmup_propotion` is the fraction of total steps over which the learning rate gradually rises to the base learning rate (the learning_rate configured above).
+ - `warmup_proportion` is the fraction of total steps over which the learning rate gradually rises to the base learning rate (the learning_rate configured above).
  - `max_seq_len` is the maximum length of the model input sequence.
  - `max_target_len` is the maximum label length during training.
  - `min_dec_len` is the minimum length of generated sequences.
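
The `warmup_proportion` value renamed here is ultimately the third argument of PaddleNLP's `LinearDecayWithWarmup`, as the train.py hunks below show. A minimal sketch of that wiring, with illustrative numbers (the real scripts derive the step count from the data loader):

    from paddlenlp.transformers import LinearDecayWithWarmup

    learning_rate = 1e-5        # base LR, as in the README example above
    num_training_steps = 10000  # illustrative; train.py uses epochs * len(train_data_loader)
    warmup_proportion = 0.02    # fraction of steps spent ramping up to the base LR

    lr_scheduler = LinearDecayWithWarmup(learning_rate, num_training_steps, warmup_proportion)
    for step in range(num_training_steps):
        # ramps linearly up to learning_rate over the first 2% of steps, then decays
        lr_scheduler.step()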

applications/question_answering/unsupervised_qa/finetune/question_generation/train.py (+2 −2)
@@ -48,7 +48,7 @@ def parse_args():
      parser.add_argument('--learning_rate', type=float, default=5e-5, help='The initial learning rate.')
      parser.add_argument('--weight_decay', type=float, default=0.01, help='The weight decay for optimizer.')
      parser.add_argument('--epochs', type=int, default=3, help='Total number of training epochs to perform.')
-     parser.add_argument('--warmup_propotion', type=float, default=0.02, help='The number of warmup steps.')
+     parser.add_argument('--warmup_proportion', type=float, default=0.02, help='The number of warmup steps.')
      parser.add_argument('--max_grad_norm', type=float, default=1.0, help='The max value of grad norm.')
      parser.add_argument('--beta1', type=float, default=0.9, help='beta1')
      parser.add_argument('--beta2', type=float, default=0.98, help='beta2')
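
Note that argparse provides no alias for a renamed flag, so any caller still passing `--warmup_propotion` will now exit with an "unrecognized arguments" error; the READMEs and shell scripts touched elsewhere in this commit are updated to match.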
@@ -153,7 +153,7 @@ def run(args):
      if args.do_train:
          num_training_steps = args.epochs * len(train_data_loader)

-         lr_scheduler = LinearDecayWithWarmup(args.learning_rate, num_training_steps, args.warmup_propotion)
+         lr_scheduler = LinearDecayWithWarmup(args.learning_rate, num_training_steps, args.warmup_proportion)
          # Generate parameter names needed to perform weight decay.
          # All bias and LayerNorm parameters are excluded.
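
The comment at the end of this hunk refers to a pattern these training scripts share: weight decay is applied only to parameters whose names contain neither "bias" nor "norm". A hypothetical sketch of that pattern (the checkpoint name and surrounding values are illustrative assumptions, not part of this diff):

    from paddle.optimizer import AdamW
    from paddlenlp.transformers import LinearDecayWithWarmup, UNIMOLMHeadModel

    model = UNIMOLMHeadModel.from_pretrained("unimo-text-1.0")  # illustrative checkpoint
    lr_scheduler = LinearDecayWithWarmup(5e-5, 10000, 0.02)     # illustrative values

    # Generate parameter names needed to perform weight decay;
    # all bias and LayerNorm parameters are excluded.
    decay_params = [
        p.name for n, p in model.named_parameters()
        if not any(nd in n for nd in ["bias", "norm"])
    ]
    optimizer = AdamW(
        learning_rate=lr_scheduler,
        parameters=model.parameters(),
        weight_decay=0.01,
        apply_decay_param_fun=lambda x: x in decay_params,
    )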

applications/text_summarization/finetune/README.md (+1 −1)
@@ -206,7 +206,7 @@ python -m paddle.distributed.launch --gpus "2,3,4,5,6,7" train.py \
  - `eval_batch_size` is the number of samples **per card** in each validation pass.
  - `learning_rate` is the base learning rate; it is multiplied by the value produced by the learning rate scheduler to give the current learning rate.
  - `weight_decay` is the weight_decay coefficient used by the AdamW optimizer.
- - `warmup_propotion` is the fraction of total steps over which the learning rate gradually rises to the base learning rate (the learning_rate configured above); for its earliest use, see [this paper](https://arxiv.org/pdf/1706.02677.pdf).
+ - `warmup_proportion` is the fraction of total steps over which the learning rate gradually rises to the base learning rate (the learning_rate configured above); for its earliest use, see [this paper](https://arxiv.org/pdf/1706.02677.pdf).

  - `max_source_length` is the maximum length of the model input sequence.
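
The paper linked above (Goyal et al., 2017) motivates gradual warmup. Assuming `LinearDecayWithWarmup` implements the usual linear warmup followed by linear decay, as its name suggests, the learning rate at step t, for T total steps and warmup fraction p = warmup_proportion, is approximately

    lr(t) = lr_base * min( t / (p * T), (T - t) / ((1 - p) * T) )

so with T = 10000 and p = 0.02 the rate peaks at the base value at step 200, then declines linearly to zero.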

applications/text_summarization/pretrain/README.md (+1 −1)
@@ -152,7 +152,7 @@ python -m paddle.distributed.launch --gpus "2,3,4,5,6,7" train.py \
  - `eval_batch_size` is the number of samples **per card** in each validation pass.
  - `learning_rate` is the base learning rate; it is multiplied by the value produced by the learning rate scheduler to give the current learning rate.
  - `weight_decay` is the weight_decay coefficient used by the AdamW optimizer.
- - `warmup_propotion` is the fraction of total steps over which the learning rate gradually rises to the base learning rate (the learning_rate configured above); for its earliest use, see [this paper](https://arxiv.org/pdf/1706.02677.pdf).
+ - `warmup_proportion` is the fraction of total steps over which the learning rate gradually rises to the base learning rate (the learning_rate configured above); for its earliest use, see [this paper](https://arxiv.org/pdf/1706.02677.pdf).
  - `max_source_length` is the maximum length of the model input sequence.
  - `max_target_length` is the maximum label length during training.

examples/code_generation/codegen/README.md (+1 −1)
@@ -237,7 +237,7 @@ python -m paddle.distributed.launch --gpus 0,1 run_clm.py \
  - `train_batch_size` is the number of samples **per card** during training.
  - `eval_batch_size` is the number of samples **per card** during evaluation.
  - `learning_rate` is the base learning rate; it is multiplied by the value produced by the learning rate scheduler to give the current learning rate.
- - `warmup_propotion` is the fraction of total steps over which the learning rate gradually rises to the base learning rate (the learning_rate configured above); for its earliest use, see [this paper](https://arxiv.org/pdf/1706.02677.pdf).
+ - `warmup_proportion` is the fraction of total steps over which the learning rate gradually rises to the base learning rate (the learning_rate configured above); for its earliest use, see [this paper](https://arxiv.org/pdf/1706.02677.pdf).
  - `device` is the device to use, chosen from gpu and cpu.

  Training can be launched via `bash run_clm.sh`; for more parameter details and default values, see `run_clm.py`.

examples/question_generation/unimo-text/README.md (+2 −2)
@@ -198,7 +198,7 @@ python -m paddle.distributed.launch --gpus "1,2" --log_dir ./unimo/finetune/log
      --epochs=20 \
      --batch_size=16 \
      --learning_rate=1e-5 \
-     --warmup_propotion=0.02 \
+     --warmup_proportion=0.02 \
      --weight_decay=0.01 \
      --max_seq_len=512 \
      --max_target_len=30 \
@@ -239,7 +239,7 @@ python -m paddle.distributed.launch --gpus "1,2" --log_dir ./unimo/finetune/log
  - `batch_size` is the number of samples **per card** in each iteration.
  - `learning_rate` is the base learning rate; it is multiplied by the value produced by the learning rate scheduler to give the current learning rate.
  - `weight_decay` is the weight_decay coefficient used by the AdamW optimizer.
- - `warmup_propotion` is the fraction of total steps over which the learning rate gradually rises to the base learning rate (the learning_rate configured above).
+ - `warmup_proportion` is the fraction of total steps over which the learning rate gradually rises to the base learning rate (the learning_rate configured above).
  - `max_seq_len` is the maximum length of the model input sequence.
  - `max_target_len` is the maximum label length during training.
  - `min_dec_len` is the minimum length of generated sequences.

examples/question_generation/unimo-text/train.py (+11 −11)
@@ -12,25 +12,25 @@
  # See the License for the specific language governing permissions and
  # limitations under the License.

- import os
- import time
- import math
  import argparse
  import json
- import copy
+ import os
+ import time

  import paddle
  import paddle.distributed as dist
- import paddle.nn as nn
  import paddle.nn.functional as F
- from paddlenlp.transformers import LinearDecayWithWarmup
+ from gen_utils import create_data_loader, print_args, select_sum, set_seed
  from paddle.optimizer import AdamW

  from paddlenlp.datasets import load_dataset
- from paddlenlp.transformers import UNIMOLMHeadModel, UNIMOTokenizer, BasicTokenizer
  from paddlenlp.metrics import BLEU
-
- from gen_utils import print_args, set_seed, create_data_loader, select_sum
+ from paddlenlp.transformers import (
+     BasicTokenizer,
+     LinearDecayWithWarmup,
+     UNIMOLMHeadModel,
+     UNIMOTokenizer,
+ )


  # yapf: disable
@@ -48,7 +48,7 @@ def parse_args():
      parser.add_argument('--learning_rate', type=float, default=5e-5, help='The initial learning rate.')
      parser.add_argument('--weight_decay', type=float, default=0.01, help='The weight decay for optimizer.')
      parser.add_argument('--epochs', type=int, default=3, help='Total number of training epochs to perform.')
-     parser.add_argument('--warmup_propotion', type=float, default=0.02, help='The number of warmup steps.')
+     parser.add_argument('--warmup_proportion', type=float, default=0.02, help='The number of warmup steps.')
      parser.add_argument('--max_grad_norm', type=float, default=1.0, help='The max value of grad norm.')
      parser.add_argument('--beta1', type=float, default=0.9, help='beta1')
      parser.add_argument('--beta2', type=float, default=0.98, help='beta2')
@@ -153,7 +153,7 @@ def run(args):
      if args.do_train:
          num_training_steps = args.epochs * len(train_data_loader)

-         lr_scheduler = LinearDecayWithWarmup(args.learning_rate, num_training_steps, args.warmup_propotion)
+         lr_scheduler = LinearDecayWithWarmup(args.learning_rate, num_training_steps, args.warmup_proportion)
          # Generate parameter names needed to perform weight decay.
          # All bias and LayerNorm parameters are excluded.

examples/text_generation/unimo-text/README.md (+2 −2)
@@ -62,7 +62,7 @@ python -m paddle.distributed.launch --gpus "0" --log_dir ./log run_gen.py \
      --epochs=6 \
      --batch_size=16 \
      --learning_rate=5e-5 \
-     --warmup_propotion=0.02 \
+     --warmup_proportion=0.02 \
      --weight_decay=0.01 \
      --max_seq_len=512 \
      --max_target_len=30 \
@@ -91,7 +91,7 @@ python -m paddle.distributed.launch --gpus "0" --log_dir ./log run_gen.py \
  - `batch_size` is the number of samples **per card** in each iteration.
  - `learning_rate` is the base learning rate; it is multiplied by the value produced by the learning rate scheduler to give the current learning rate.
  - `weight_decay` is the weight_decay coefficient used by the AdamW optimizer.
- - `warmup_propotion` is the fraction of total steps over which the learning rate gradually rises to the base learning rate (the learning_rate configured above); for its earliest use, see [this paper](https://arxiv.org/pdf/1706.02677.pdf).
+ - `warmup_proportion` is the fraction of total steps over which the learning rate gradually rises to the base learning rate (the learning_rate configured above); for its earliest use, see [this paper](https://arxiv.org/pdf/1706.02677.pdf).
  - `max_seq_len` is the maximum length of the model input sequence.
  - `max_target_len` is the maximum label length during training.
  - `min_dec_len` is the minimum length of generated sequences.

examples/text_generation/unimo-text/run_gen.py (+24 −10)
@@ -1,21 +1,35 @@
+ # Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
+ import argparse
  import os
  import time
- import math
- import argparse
- import json

  import paddle
  import paddle.distributed as dist
- import paddle.nn as nn
  import paddle.nn.functional as F
- from paddlenlp.transformers import LinearDecayWithWarmup
+ from gen_utils import create_data_loader, print_args, select_sum, set_seed
  from paddle.optimizer import AdamW

  from paddlenlp.datasets import load_dataset
- from paddlenlp.transformers import UNIMOLMHeadModel, UNIMOTokenizer, BasicTokenizer
  from paddlenlp.metrics import BLEU
-
- from gen_utils import print_args, set_seed, create_data_loader, select_sum
+ from paddlenlp.transformers import (
+     BasicTokenizer,
+     LinearDecayWithWarmup,
+     UNIMOLMHeadModel,
+     UNIMOTokenizer,
+ )


  # yapf: disable
@@ -33,7 +47,7 @@ def parse_args():
      parser.add_argument('--learning_rate', type=float, default=5e-5, help='The initial learning rate.')
      parser.add_argument('--weight_decay', type=float, default=0.01, help='The weight decay for optimizer.')
      parser.add_argument('--epochs', type=int, default=3, help='Total number of training epochs to perform.')
-     parser.add_argument('--warmup_propotion', type=float, default=0.02, help='The number of warmup steps.')
+     parser.add_argument('--warmup_proportion', type=float, default=0.02, help='The number of warmup steps.')
      parser.add_argument('--max_grad_norm', type=float, default=1.0, help='The max value of grad norm.')
      parser.add_argument('--beta1', type=float, default=0.9, help='beta1')
      parser.add_argument('--beta2', type=float, default=0.98, help='beta2')
@@ -112,7 +126,7 @@ def run(args):
      if args.do_train:
          num_training_steps = args.epochs * len(train_data_loader)

-         lr_scheduler = LinearDecayWithWarmup(args.learning_rate, num_training_steps, args.warmup_propotion)
+         lr_scheduler = LinearDecayWithWarmup(args.learning_rate, num_training_steps, args.warmup_proportion)
          # Generate parameter names needed to perform weight decay.
          # All bias and LayerNorm parameters are excluded.

examples/text_generation/unimo-text/scripts/lcsts_train.sh (+15 −1)
@@ -1,3 +1,17 @@
+ # Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
  # Launch on GPU; the `--gpus` argument specifies which GPU card ids to train on, either a single card or multiple cards
  unset CUDA_VISIBLE_DEVICES

@@ -14,7 +28,7 @@ python -m paddle.distributed.launch --gpus "0,1,2,3" --log_dir ${log_dir} run_ge
      --epochs=6 \
      --batch_size=64 \
      --learning_rate=5e-5 \
-     --warmup_propotion=0.02 \
+     --warmup_proportion=0.02 \
      --weight_decay=0.01 \
      --max_seq_len=360 \
      --max_target_len=30 \

examples/text_generation/unimo-text/scripts/qg_train.sh (+15 −1)
@@ -1,3 +1,17 @@
+ # Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
  # Launch on GPU; the `--gpus` argument specifies which GPU card ids to train on, either a single card or multiple cards
  unset CUDA_VISIBLE_DEVICES

@@ -14,7 +28,7 @@ python -m paddle.distributed.launch --gpus "0,1,2,3" --log_dir ${log_dir} run_ge
      --epochs=6 \
      --batch_size=8 \
      --learning_rate=5e-5 \
-     --warmup_propotion=0.02 \
+     --warmup_proportion=0.02 \
      --weight_decay=0.01 \
      --max_seq_len=360 \
      --max_target_len=30 \

examples/text_generation/unimo-text/scripts/table_train.sh (+15 −1)
@@ -1,3 +1,17 @@
+ # Copyright (c) 2023 PaddlePaddle Authors. All Rights Reserved.
+ #
+ # Licensed under the Apache License, Version 2.0 (the "License");
+ # you may not use this file except in compliance with the License.
+ # You may obtain a copy of the License at
+ #
+ #     http://www.apache.org/licenses/LICENSE-2.0
+ #
+ # Unless required by applicable law or agreed to in writing, software
+ # distributed under the License is distributed on an "AS IS" BASIS,
+ # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ # See the License for the specific language governing permissions and
+ # limitations under the License.
+
  # Launch on GPU; the `--gpus` argument specifies which GPU card ids to train on, either a single card or multiple cards
  unset CUDA_VISIBLE_DEVICES

@@ -14,7 +28,7 @@ python -m paddle.distributed.launch --gpus "0,1,2,3" --log_dir ${log_dir} run_ge
      --epochs=6 \
      --batch_size=8 \
      --learning_rate=5e-5 \
-     --warmup_propotion=0.02 \
+     --warmup_proportion=0.02 \
      --weight_decay=0.01 \
      --max_seq_len=512 \
      --max_target_len=200 \

examples/text_summarization/unimo-text/README.md (+1 −1)
@@ -202,7 +202,7 @@ python -m paddle.distributed.launch --gpus "0,1,2,3" --log_dir ${log_dir} train.
  - `batch_size` is the number of samples **per card** in each iteration.
  - `learning_rate` is the base learning rate; it is multiplied by the value produced by the learning rate scheduler to give the current learning rate.
  - `weight_decay` is the weight_decay coefficient used by the AdamW optimizer.
- - `warmup_propotion` is the fraction of total steps over which the learning rate gradually rises to the base learning rate (the learning_rate configured above); for its earliest use, see [this paper](https://arxiv.org/pdf/1706.02677.pdf).
+ - `warmup_proportion` is the fraction of total steps over which the learning rate gradually rises to the base learning rate (the learning_rate configured above); for its earliest use, see [this paper](https://arxiv.org/pdf/1706.02677.pdf).
  - `max_seq_len` is the maximum length of the model input sequence.
  - `max_target_len` is the maximum label length during training.
  - `min_dec_len` is the minimum length of generated sequences.
