Training/Output stops without error #164

Open
DannyPerspective opened this issue Jul 26, 2024 · 6 comments

Comments

@DannyPerspective

When I train on the S3DIS dataset, the output simply stops after some epochs.

I start the training with:
CUDA_VISIBLE_DEVICES=0 nohup python examples/segmentation/main.py --cfg cfgs/s3dis/pointnext-s.yaml > outputlog 2>&1 &

Sometimes it stops as early as the first epoch, sometimes it runs for five or six epochs, but eventually it always stops.

The Python process for the training is still alive afterwards (status “sleeping”), and the hardware resources remain allocated.
Can someone help?

output.log
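
One way to find out where a process in the "sleeping" state is stuck is Python's standard-library `faulthandler` module, which can write the stack of every thread to a file when the process receives a signal. The following is a minimal sketch, not part of PointNeXt: the function name `enable_hang_diagnostics` and the file name `hang_traceback.log` are hypothetical, and it assumes a Unix system (where `SIGUSR1` exists) and that you can add a call near the top of the training script.

```python
import faulthandler
import signal


def enable_hang_diagnostics(log_path="hang_traceback.log"):
    """Dump the Python stack of every thread to log_path on SIGUSR1.

    Both the function name and the default log_path are hypothetical,
    chosen only for this example.
    """
    # The file must stay open for the life of the process: faulthandler
    # writes to its file descriptor directly from the signal handler.
    log_file = open(log_path, "w")
    faulthandler.register(signal.SIGUSR1, file=log_file, all_threads=True)
    return log_file
```

After calling `enable_hang_diagnostics()` once at startup, running `kill -USR1 <pid>` against the stalled process appends the current tracebacks to the log file without terminating the process. The tracebacks only show where each thread is waiting; they do not by themselves explain why.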

@Saleem-BIM

Hello @DannyPerspective,
Did you solve your issue, or are you still figuring it out? I am facing a similar problem.

@DannyPerspective
Author

Sorry, I could not figure it out. I had to restart training from the latest checkpoint every time it happened.

@Saleem-BIM

I am getting: ModuleNotFoundError: No module named 'openpoints'. Have you faced a similar issue during your training stage?

@DannyPerspective
Author

It seems the install.sh script did not run properly. Have you executed it?
But that is an entirely different issue from the one reported here.

@Saleem-BIM

No, I tried running install.sh again, but the issue still persists.

@DannyPerspective
Author

If there is no folder called “openpoints” in your PointNeXt folder, the installation did not work. Try executing the commands in install.sh one by one and check whether any of them reports an error.
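
The folder check described above can be scripted. This is a rough sketch under stated assumptions: the helper name `check_openpoints` is hypothetical, `repo_root` is assumed to be the path to your PointNeXt checkout, and the "importable" result depends on the Python environment and working directory the script is run from, so it may differ from what the training script sees.

```python
import importlib.util
import os


def check_openpoints(repo_root="."):
    """Report whether an openpoints folder exists under repo_root,
    whether it contains anything, and whether Python can locate
    a module named openpoints in the current environment.
    """
    folder = os.path.join(repo_root, "openpoints")
    folder_exists = os.path.isdir(folder)
    # An existing but empty folder would also explain a failed import.
    folder_non_empty = folder_exists and len(os.listdir(folder)) > 0
    # find_spec returns None when the module cannot be found on sys.path.
    importable = importlib.util.find_spec("openpoints") is not None
    return {
        "folder_exists": folder_exists,
        "folder_non_empty": folder_non_empty,
        "importable": importable,
    }


if __name__ == "__main__":
    for name, value in check_openpoints(".").items():
        print(f"{name}: {value}")
```

If `folder_exists` or `folder_non_empty` is false, the setup steps most likely did not complete, and running the install.sh commands one at a time should show which step failed.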
