Gradient explosion: NaN or Inf found in input tensor. #130

@carpenterChina

I am running into what looks like gradient explosion at the very start of training; the full log is below. I checked the input data and found no Inf or NaN values. Please help!
optimizer settings: {'lr': 5.46875e-05, 'weight_decay': 0.0, 'eps': 1e-08, 'betas': [0.9, 0.999]}
Use step level LR scheduler!
Set warmup steps = 86135
Set warmup steps = 0
Max WD = 0.0500000, Min WD = 0.0500000
criterion = SoftTargetCrossEntropy()
Auto resume checkpoint:
Start training for 100 epochs
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
NaN or Inf found in input tensor.
Epoch: [0] [ 0/17227] eta: 3 days, 8:24:48 lr: 0.000000 min_lr: 0.000000 loss: 5.9916 (5.9916) loss_scale: 32768.0000 (32768.0000) weight_decay: 0.0500 (0.0500) grad_norm: inf (inf) time: 16.8043 data: 14.7567 max mem: 38837
Epoch: [0] [ 10/17227] eta: 14:30:57 lr: 0.000000 min_lr: 0.000000 loss: 5.9916 (5.9915) loss_scale: 16384.0000 (22341.8182) weight_decay: 0.0500 (0.0500) grad_norm: 6.5337 (inf) time: 3.0352 data: 2.0506 max mem: 39504
Epoch: [0] [ 20/17227] eta: 11:46:58 lr: 0.000000 min_lr: 0.000000 loss: 5.9915 (5.9915) loss_scale: 16384.0000 (19504.7619) weight_decay: 0.0500 (0.0500) grad_norm: 6.5185 (inf) time: 1.7482 data: 0.9025 max mem: 39504
Epoch: [0] [ 30/17227] eta: 10:44:42 lr: 0.000000 min_lr: 0.000000 loss: 5.9915 (5.9915) loss_scale: 16384.0000 (18233.8065) weight_decay: 0.0500 (0.0500) grad_norm: 6.5185 (inf) time: 1.8172 data: 1.0061 max mem: 39504
Epoch: [0] [ 40/17227] eta: 10:42:47 lr: 0.000000 min_lr: 0.000000 loss: 5.9915 (5.9915) loss_scale: 8192.0000 (15784.5854) weight_decay: 0.0500 (0.0500) grad_norm: 6.5201 (inf) time: 2.0117 data: 1.2029 max mem: 39504
Epoch: [0] [ 50/17227] eta: 9:55:55 lr: 0.000000 min_lr: 0.000000 loss: 5.9915 (5.9915) loss_scale: 8192.0000 (14295.8431) weight_decay: 0.0500 (0.0500) grad_norm: 6.5203 (inf) time: 1.8215 data: 1.0110 max mem: 39504
Epoch: [0] [ 60/17227] eta: 9:22:57 lr: 0.000000 min_lr: 0.000000 loss: 5.9915 (5.9915) loss_scale: 8192.0000 (13295.2131) weight_decay: 0.0500 (0.0500) grad_norm: 6.5211 (inf) time: 1.4009 data: 0.5881 max mem: 39504
Epoch: [0] [ 70/17227] eta: 9:24:00 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9915) loss_scale: 8192.0000 (12576.4507) weight_decay: 0.0500 (0.0500) grad_norm: 6.5454 (inf) time: 1.6939 data: 0.8822 max mem: 39504
Epoch: [0] [ 80/17227] eta: 9:10:05 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9915) loss_scale: 8192.0000 (12035.1605) weight_decay: 0.0500 (0.0500) grad_norm: 6.5321 (inf) time: 1.7946 data: 0.9825 max mem: 39504
Epoch: [0] [ 90/17227] eta: 8:57:07 lr: 0.000000 min_lr: 0.000000 loss: 5.9915 (5.9915) loss_scale: 8192.0000 (11612.8352) weight_decay: 0.0500 (0.0500) grad_norm: 6.5304 (inf) time: 1.5547 data: 0.7444 max mem: 39504
Epoch: [0] [ 100/17227] eta: 8:41:19 lr: 0.000000 min_lr: 0.000000 loss: 5.9915 (5.9915) loss_scale: 8192.0000 (11274.1386) weight_decay: 0.0500 (0.0500) grad_norm: 6.5197 (inf) time: 1.4273 data: 0.6158 max mem: 39504
Epoch: [0] [ 110/17227] eta: 8:49:36 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9915) loss_scale: 8192.0000 (10996.4685) weight_decay: 0.0500 (0.0500) grad_norm: 6.5197 (inf) time: 1.7466 data: 0.9347 max mem: 39504
Epoch: [0] [ 120/17227] eta: 8:39:56 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9915) loss_scale: 8192.0000 (10764.6942) weight_decay: 0.0500 (0.0500) grad_norm: 6.5323 (inf) time: 1.8099 data: 1.0032 max mem: 39504
Epoch: [0] [ 130/17227] eta: 8:39:45 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9915) loss_scale: 8192.0000 (10568.3053) weight_decay: 0.0500 (0.0500) grad_norm: 6.5418 (inf) time: 1.6443 data: 0.8360 max mem: 39504
Epoch: [0] [ 140/17227] eta: 8:38:59 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9915) loss_scale: 8192.0000 (10399.7730) weight_decay: 0.0500 (0.0500) grad_norm: 6.5378 (inf) time: 1.8150 data: 1.0048 max mem: 39504
Epoch: [0] [ 150/17227] eta: 8:31:00 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (10253.5629) weight_decay: 0.0500 (0.0500) grad_norm: 6.5324 (inf) time: 1.6078 data: 0.7990 max mem: 39504
Epoch: [0] [ 160/17227] eta: 8:27:28 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (10125.5155) weight_decay: 0.0500 (0.0500) grad_norm: 6.5337 (inf) time: 1.5137 data: 0.7066 max mem: 39504
Epoch: [0] [ 170/17227] eta: 8:23:23 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (10012.4444) weight_decay: 0.0500 (0.0500) grad_norm: 6.5281 (inf) time: 1.5844 data: 0.7751 max mem: 39504
Epoch: [0] [ 180/17227] eta: 8:13:24 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9911.8674) weight_decay: 0.0500 (0.0500) grad_norm: 6.5229 (inf) time: 1.3548 data: 0.5438 max mem: 39504
Epoch: [0] [ 190/17227] eta: 8:13:37 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (9821.8220) weight_decay: 0.0500 (0.0500) grad_norm: 6.5277 (inf) time: 1.4622 data: 0.6526 max mem: 39504
Epoch: [0] [ 200/17227] eta: 8:13:39 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (9740.7363) weight_decay: 0.0500 (0.0500) grad_norm: 6.5277 (inf) time: 1.7664 data: 0.9588 max mem: 39504
Epoch: [0] [ 210/17227] eta: 8:11:11 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (9667.3365) weight_decay: 0.0500 (0.0500) grad_norm: 6.5234 (inf) time: 1.6697 data: 0.8599 max mem: 39504
Epoch: [0] [ 220/17227] eta: 8:06:56 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (9600.5792) weight_decay: 0.0500 (0.0500) grad_norm: 6.5281 (inf) time: 1.4999 data: 0.6853 max mem: 39504
Epoch: [0] [ 230/17227] eta: 8:12:38 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (9539.6017) weight_decay: 0.0500 (0.0500) grad_norm: 6.5281 (inf) time: 1.8146 data: 0.9983 max mem: 39504
Epoch: [0] [ 240/17227] eta: 8:10:12 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (9483.6846) weight_decay: 0.0500 (0.0500) grad_norm: 6.5209 (inf) time: 1.8819 data: 1.0682 max mem: 39504
Epoch: [0] [ 250/17227] eta: 8:12:05 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9432.2231) weight_decay: 0.0500 (0.0500) grad_norm: 6.5226 (inf) time: 1.7399 data: 0.9249 max mem: 39504
Epoch: [0] [ 260/17227] eta: 8:09:03 lr: 0.000000 min_lr: 0.000000 loss: 5.9914 (5.9914) loss_scale: 8192.0000 (9384.7050) weight_decay: 0.0500 (0.0500) grad_norm: 6.5342 (inf) time: 1.7050 data: 0.8913 max mem: 39504
Epoch: [0] [ 270/17227] eta: 8:05:59 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9340.6937) weight_decay: 0.0500 (0.0500) grad_norm: 6.5350 (inf) time: 1.4748 data: 0.6655 max mem: 39504
Epoch: [0] [ 280/17227] eta: 8:06:40 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9299.8149) weight_decay: 0.0500 (0.0500) grad_norm: 6.5294 (inf) time: 1.6392 data: 0.8288 max mem: 39504
Epoch: [0] [ 290/17227] eta: 8:01:43 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9261.7457) weight_decay: 0.0500 (0.0500) grad_norm: 6.5184 (inf) time: 1.5291 data: 0.7187 max mem: 39504
Epoch: [0] [ 300/17227] eta: 8:03:57 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9226.2060) weight_decay: 0.0500 (0.0500) grad_norm: 6.5322 (inf) time: 1.6084 data: 0.7993 max mem: 39504
Epoch: [0] [ 310/17227] eta: 8:00:20 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9192.9518) weight_decay: 0.0500 (0.0500) grad_norm: 6.5246 (inf) time: 1.6614 data: 0.8532 max mem: 39504
Epoch: [0] [ 320/17227] eta: 8:01:15 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9161.7695) weight_decay: 0.0500 (0.0500) grad_norm: 6.5191 (inf) time: 1.5946 data: 0.7818 max mem: 39504
Epoch: [0] [ 330/17227] eta: 8:01:08 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9132.4713) weight_decay: 0.0500 (0.0500) grad_norm: 6.5284 (inf) time: 1.7839 data: 0.9708 max mem: 39504
Epoch: [0] [ 340/17227] eta: 7:59:40 lr: 0.000000 min_lr: 0.000000 loss: 5.9912 (5.9914) loss_scale: 8192.0000 (9104.8915) weight_decay: 0.0500 (0.0500) grad_norm: 6.5283 (inf) time: 1.6464 data: 0.8362 max mem: 39504
Epoch: [0] [ 350/17227] eta: 7:59:37 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9078.8832) weight_decay: 0.0500 (0.0500) grad_norm: 6.5145 (inf) time: 1.6499 data: 0.8373 max mem: 39504
Epoch: [0] [ 360/17227] eta: 7:55:51 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9054.3158) weight_decay: 0.0500 (0.0500) grad_norm: 6.5145 (inf) time: 1.4962 data: 0.6837 max mem: 39504
Epoch: [0] [ 370/17227] eta: 7:54:49 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9031.0728) weight_decay: 0.0500 (0.0500) grad_norm: 6.5290 (inf) time: 1.4254 data: 0.6160 max mem: 39504
Epoch: [0] [ 380/17227] eta: 7:53:45 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (9009.0499) weight_decay: 0.0500 (0.0500) grad_norm: 6.5321 (inf) time: 1.5879 data: 0.7780 max mem: 39504
Epoch: [0] [ 390/17227] eta: 7:55:09 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (8988.1535) weight_decay: 0.0500 (0.0500) grad_norm: 6.5472 (inf) time: 1.7530 data: 0.9418 max mem: 39504
Epoch: [0] [ 400/17227] eta: 7:54:21 lr: 0.000000 min_lr: 0.000000 loss: 5.9913 (5.9914) loss_scale: 8192.0000 (8968.2993) weight_decay: 0.0500 (0.0500) grad_norm: 6.5198 (inf) time: 1.7701 data: 0.9575 max mem: 39504
Epoch: [0] [ 410/17227] eta: 7:51:48 lr: 0.000000 min_lr: 0.000000 loss: 5.9912 (5.9914) loss_scale: 8192.0000 (8949.4112) weight_decay: 0.0500 (0.0500) grad_norm: 6.5082 (inf) time: 1.4883 data: 0.6721 max mem: 39504
Epoch: [0] [ 420/17227] eta: 7:52:38 lr: 0.000000 min_lr: 0.000000 loss: 5.9912 (5.9914) loss_scale: 8192.0000 (8931.4204) weight_decay: 0.0500 (0.0500) grad_norm: 6.5235 (inf) time: 1.6048 data: 0.7914 max mem: 39504
Epoch: [0] [ 430/17227] eta: 7:52:53 lr: 0.000000 min_lr: 0.000000 loss: 5.9912 (5.9914) loss_scale: 8192.0000 (8914.2645) weight_decay: 0.0500 (0.0500) grad_norm: 6.5302 (inf) time: 1.8107 data: 0.9991 max mem: 39504
Epoch: [0] [ 440/17227] eta: 7:50:55 lr: 0.000000 min_lr: 0.000000 loss: 5.9911 (5.9914) loss_scale: 8192.0000 (8897.8866) weight_decay: 0.0500 (0.0500) grad_norm: 6.5264 (inf) time: 1.5970 data: 0.7861 max mem: 39504
Epoch: [0] [ 450/17227] eta: 7:49:04 lr: 0.000000 min_lr: 0.000000 loss: 5.9911 (5.9914) loss_scale: 8192.0000 (8882.2350) weight_decay: 0.0500 (0.0500) grad_norm: 6.5222 (inf) time: 1.4259 data: 0.6114 max mem: 39504
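
To make the log above easier to scan, here is a minimal sketch that pulls `loss_scale` and `grad_norm` out of each line and lists the steps where the gradient norm is non-finite or the loss scale drops. It assumes every metric is printed as `name: value (aggregate)`; what the two numbers mean depends on the logger and is not confirmed here. The names `parse_metrics` and `overflow_report` are hypothetical helpers, not part of the training code.

```python
import math
import re

# Matches "name: value (aggregate)", e.g. "grad_norm: 6.5337 (inf)".
METRIC = re.compile(r"(\w+): (\S+) \((\S+)\)")
# Matches the step counter, e.g. "[ 10/17227]".
STEP = re.compile(r"\[\s*(\d+)/\d+\]")


def parse_metrics(line):
    """Return (step, {name: (value, aggregate)}) for a log line, or None."""
    step = STEP.search(line)
    if step is None:
        return None
    metrics = {
        name: (float(value), float(agg))
        for name, value, agg in METRIC.findall(line)
    }
    return int(step.group(1)), metrics


def overflow_report(lines):
    """List steps with a non-finite grad_norm or a drop in loss_scale."""
    events = []
    prev_scale = None
    for line in lines:
        parsed = parse_metrics(line)
        if parsed is None:
            continue  # not a per-step line, e.g. a warning message
        step, metrics = parsed
        grad_norm = metrics.get("grad_norm")
        loss_scale = metrics.get("loss_scale")
        if grad_norm is not None and not math.isfinite(grad_norm[0]):
            events.append((step, "non-finite grad_norm"))
        if loss_scale is not None:
            if prev_scale is not None and loss_scale[0] < prev_scale:
                events.append(
                    (step, f"loss_scale {prev_scale:g} -> {loss_scale[0]:g}")
                )
            prev_scale = loss_scale[0]
    return events


# Three lines copied verbatim from the log above (steps 0, 10 and 40).
SAMPLE = [
    "Epoch: [0] [ 0/17227] eta: 3 days, 8:24:48 lr: 0.000000 min_lr: 0.000000 loss: 5.9916 (5.9916) loss_scale: 32768.0000 (32768.0000) weight_decay: 0.0500 (0.0500) grad_norm: inf (inf) time: 16.8043 data: 14.7567 max mem: 38837",
    "Epoch: [0] [ 10/17227] eta: 14:30:57 lr: 0.000000 min_lr: 0.000000 loss: 5.9916 (5.9915) loss_scale: 16384.0000 (22341.8182) weight_decay: 0.0500 (0.0500) grad_norm: 6.5337 (inf) time: 3.0352 data: 2.0506 max mem: 39504",
    "Epoch: [0] [ 40/17227] eta: 10:42:47 lr: 0.000000 min_lr: 0.000000 loss: 5.9915 (5.9915) loss_scale: 8192.0000 (15784.5854) weight_decay: 0.0500 (0.0500) grad_norm: 6.5201 (inf) time: 2.0117 data: 1.2029 max mem: 39504",
]

if __name__ == "__main__":
    for step, event in overflow_report(SAMPLE):
        print(step, event)
```

On these three lines the report flags the non-finite `grad_norm` at step 0 and the two `loss_scale` drops (32768 to 16384, then 16384 to 8192). A shrinking loss scale is what a dynamic loss scaler typically does after it sees non-finite gradients. If the value in parentheses is a running average, a single `inf` in an early step would be enough to keep it at `inf` for the rest of the epoch, even though the per-step values stay near 6.5.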
