Training stops on its own partway through #17

Open
mary-0830 opened this issue Jul 30, 2020 · 7 comments

Comments

@mary-0830

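After getting the code to run today, I started training without the pretrained weights, aiming for 30 epochs.

Training suddenly stopped at epoch 4, and I could not find any error in the printed log. What could be causing this?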

@WongKinYiu
Owner

What was displayed right before it stopped?

@mary-0830
Author

This warning appears (I trained twice, and it stopped both times):
WARNING: non-finite loss, ending training tensor([10.75441, nan, nan, nan], device='cuda:0')
WARNING: non-finite loss, ending training tensor([nan, nan, nan, nan], device='cuda:0')
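
The wording of the warning suggests that the training loop checks the loss components each step and ends the run, without raising an exception, once any of them is NaN or infinite. A minimal sketch of such a guard in plain Python, not the repository's actual code; `has_non_finite` and `run_epoch` are hypothetical names:

```python
import math


def has_non_finite(loss_items):
    """Return True if any loss component is NaN or infinite."""
    return any(not math.isfinite(x) for x in loss_items)


def run_epoch(batches_of_loss_items):
    """Walk over per-batch loss components and stop at the first non-finite one.

    Returns the number of batches that completed before stopping.
    """
    done = 0
    for loss_items in batches_of_loss_items:
        if has_non_finite(loss_items):
            # Mirrors the warning in the log: the loop exits quietly
            # instead of raising, so no error appears in the output.
            print("WARNING: non-finite loss, ending training", loss_items)
            break
        done += 1
    return done
```

With a guard like this, the run ends with a warning instead of a traceback, which would explain why the log shows no error.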

@WongKinYiu
Owner

My guess is that the anchors differ too much from your object sizes, which made the loss explode.

@mary-0830
Author

How should I adjust them, then? The object size is 600*600.

@WongKinYiu
Owner

Search for calc_anchors in https://github.com/AlexeyAB/darknet
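
A common way to derive anchors from a dataset is to run k-means over the label widths and heights. The sketch below is an illustration of that idea only, not darknet's implementation: it uses 1 - IoU as the distance, and `iou_wh`, `kmeans_anchors`, and the sample boxes are all hypothetical.

```python
import random


def iou_wh(box, anchor):
    """IoU of two boxes given as (width, height), aligned at a shared corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    union = box[0] * box[1] + anchor[0] * anchor[1] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=50, seed=0):
    """Cluster (width, height) pairs into k anchors using 1 - IoU as distance."""
    rng = random.Random(seed)
    anchors = rng.sample(boxes, k)  # start from k boxes picked at random
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            # Highest IoU is the same as smallest 1 - IoU distance.
            best = max(range(k), key=lambda i: iou_wh(box, anchors[i]))
            clusters[best].append(box)
        new_anchors = []
        for i, members in enumerate(clusters):
            if not members:
                new_anchors.append(anchors[i])  # keep an empty cluster's anchor
                continue
            w = sum(m[0] for m in members) / len(members)
            h = sum(m[1] for m in members) / len(members)
            new_anchors.append((w, h))
        if new_anchors == anchors:
            break  # assignments no longer change
        anchors = new_anchors
    return sorted(anchors, key=lambda a: a[0] * a[1])


# Made-up label sizes in pixels, for illustration only.
boxes = [(590, 600), (600, 610), (610, 590), (300, 320), (310, 300), (60, 50), (55, 65)]
anchors = kmeans_anchors(boxes, k=3)
```

The same IoU helper also shows why a mismatch hurts: a 600*600 object overlaps a 30*30 anchor with an IoU far below 0.01, so no default small anchor is a reasonable starting point for it.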

@mary-0830
Author

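I have a couple of questions:

  1. If I change the anchor values, does that mean I can no longer use the pretrained weights you provide directly?
  2. Why do some pretrained models start from epoch 260 or 245? Shouldn't they start from 0?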

@WongKinYiu
Owner

  1. No, you can still use them.

  2. There seem to be some bugs when using a pre-trained model; try converting it from .pt to .weights first.
    You also have to add code here to set a suitable cutoff value: https://github.com/WongKinYiu/PyTorch_YOLOv4/blob/master/models.py#L351-L354
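
As a rough illustration of what "a suitable cutoff value" could look like, the sketch below picks the number of leading layers to load from the weights filename. It is not the code at the linked lines: the table `CUTOFFS`, the function `pick_cutoff`, the filenames, the values, and the convention that -1 means "load every layer" are all assumptions made for this example.

```python
from pathlib import Path

# Hypothetical mapping from a partial-weights filename to the number of
# leading layers to load; the names and values are placeholders.
CUTOFFS = {
    "example_backbone.conv.74": 75,
    "example_tiny.conv.15": 15,
}


def pick_cutoff(weights_path, default=-1):
    """Return the layer cutoff for a weights file.

    In this sketch, -1 stands for "load every layer" (an assumed convention).
    """
    return CUTOFFS.get(Path(weights_path).name, default)
```

For a converted .weights file, the right value depends on how many layers of the network the file actually covers, so it has to be chosen per model.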
