A refinement report #24

@HXYNODE

RMN/train.py

Line 128 in 14a9eff

loss_count /= 10 if bsz == opt.train_batch_size else i % 10

Hi, Ganchao. I found that the check above misses a case when running the project.
For example, when train_batch_size is set to 2 or 3, the train_loader has 24390 steps (48779/2 = 24389.5) or 16260 steps (48779/3 = 16259.67), respectively, where 48779 is the total number of samples in the MSVD dataset. Because the division is not exact, the final step (the 24390th or 16260th) contains only 1 or 2 samples. That batch does not satisfy bsz == opt.train_batch_size, so loss_count is divided by i % 10, which is 0 at that step. Oops! :(
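The arithmetic above can be checked with a short sketch. It assumes the step index i starts at 1 and that the loader keeps the final partial batch; neither is visible in the quoted line, and the function name last_step_info is hypothetical.

```python
import math

NUM_SAMPLES = 48779  # MSVD sample count quoted in this issue


def last_step_info(batch_size, num_samples=NUM_SAMPLES):
    """Return (number of steps, size of the final batch, final step index % 10)."""
    # The loader needs one extra step for the leftover samples.
    steps = math.ceil(num_samples / batch_size)
    # Samples remaining after all the full batches.
    last_bsz = num_samples - (steps - 1) * batch_size
    # Assumption: i is 1-based, so the final step index equals `steps`.
    return steps, last_bsz, steps % 10


for bs in (2, 3):
    print(bs, last_step_info(bs))
```

For both batch sizes the final batch is smaller than train_batch_size and the final step index is a multiple of 10, which is exactly the combination that makes i % 10 a zero divisor.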
The check could be refined as follows:

if bsz == opt.train_batch_size:
    loss_count /= 10
elif bsz < opt.train_batch_size and i % 10 == 0:
    loss_count /= 10
else:
    loss_count /= i % 10
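The three branches above boil down to one rule: divide by 10 unless the batch is a partial one and i % 10 is non-zero. A minimal sketch of that rule as a helper, assuming i is the 1-based step index and the logging window is 10 steps; the name loss_divisor and the window parameter are hypothetical and not part of RMN/train.py.

```python
def loss_divisor(i, bsz, train_batch_size, window=10):
    """Number of steps accumulated in loss_count since the last report."""
    # Full batch: the report fires at a multiple of `window`.
    if bsz == train_batch_size:
        return window
    # Partial (final) batch: use the steps since the last report,
    # falling back to `window` when the remainder would be zero.
    remainder = i % window
    return remainder if remainder != 0 else window


# The two failing cases from this issue no longer produce a zero divisor.
print(loss_divisor(24390, 1, 2))
print(loss_divisor(16260, 2, 3))
```

With such a helper the original line could become loss_count /= loss_divisor(i, bsz, opt.train_batch_size). Unlike the three-branch version, it never returns 0, even in the (normally impossible) case where bsz is larger than train_batch_size.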

I have restarted the project on my server. If it still works well after one full epoch, I will come back to report.
