Training/Output stops without error #164
Comments
Hello @DannyPerspective,
Sorry, I could not figure it out. I had to restart training from the latest checkpoint every time it happened.
I am getting: ModuleNotFoundError: No module named 'openpoints'. Have you faced a similar issue in your training stage?
Seems like the install.sh script hasn't worked properly. Have you executed it?
No, I tried to run install.sh again, but the issue still persists.
If there is no folder called “openpoints” in your PointNeXt folder, it didn’t work. Try executing the commands in install.sh one by one and check whether any of them reports an error.
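The folder check described above can also be scripted. This is only a minimal sketch, not part of PointNeXt: the helper name `check_openpoints` and the idea of running it from the repository root are assumptions for illustration. It reports whether an "openpoints" folder exists, whether it has any contents, and whether Python can locate the module.

```python
import importlib.util
from pathlib import Path


def check_openpoints(repo_root="."):
    """Hypothetical helper: report the state of the 'openpoints' folder and module."""
    folder = Path(repo_root) / "openpoints"
    folder_exists = folder.is_dir()
    # An empty folder would also suggest that its contents were never fetched.
    folder_populated = folder_exists and any(folder.iterdir())
    # find_spec returns None when a top-level module cannot be found on sys.path.
    importable = importlib.util.find_spec("openpoints") is not None
    return {
        "folder_exists": folder_exists,
        "folder_populated": folder_populated,
        "importable": importable,
    }


if __name__ == "__main__":
    # Run from the PointNeXt folder so the relative path points at the right place.
    for key, value in check_openpoints(".").items():
        print(f"{key}: {value}")
```

If `folder_exists` or `folder_populated` comes back False, the installation step that should provide the folder most likely did not complete, which matches the ModuleNotFoundError above.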
When I train on the S3DIS dataset, the output just stops after some epochs.
I start the training with:
CUDA_VISIBLE_DEVICES=0 nohup python examples/segmentation/main.py --cfg cfgs/s3dis/pointnext-s.yaml > outputlog 2>&1 &
Sometimes it already stops in the first epoch, sometimes it runs for five or six epochs, but eventually it stops.
The python process for the training is then still alive (status “sleeping”) and the hardware resources are also still allocated.
Can someone help?
output.log
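One way to see where a process in this state is stuck is Python's standard `faulthandler` module, which can print the stack of every thread without stopping the process. This is a minimal sketch and not something the project ships: the function name `enable_hang_diagnostics` and the 30-minute interval are assumptions for illustration. It does not fix the hang, it only makes it visible in the log.

```python
import faulthandler
import signal
import sys


def enable_hang_diagnostics(interval_seconds=1800, stream=sys.stderr):
    """Hypothetical helper: make a silent hang observable via stack dumps."""
    # Dump the tracebacks of all threads on fatal signals such as SIGSEGV.
    faulthandler.enable(file=stream, all_threads=True)
    # On POSIX systems, `kill -USR1 <pid>` then prints all thread stacks on demand.
    if hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=stream, all_threads=True)
    # Also dump the stacks every `interval_seconds` until cancelled,
    # so a stalled run leaves a trace even if nobody is watching.
    faulthandler.dump_traceback_later(interval_seconds, repeat=True, file=stream)


if __name__ == "__main__":
    enable_hang_diagnostics()
    # ... the training entry point would be started here ...
    faulthandler.cancel_dump_traceback_later()
```

Called near the start of the training script, this writes to stderr, which the `2>&1` redirect in the command above already sends to the log file. The repeated dumps then show whether the main thread is waiting on data loading, on a GPU call, or somewhere else.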