Open
Description
Describe the bug
In the Training tab, training fails whenever the timeout option box is checked. This bug has existed since Nov 2024 and is reproducible on both macOS and Windows.
To Reproduce
Steps to reproduce the behavior:
- Start a new model
- Go through Curation and into Training
- Fill in all training parameters as normal, check the "timeout" box, and enter the number of minutes before training should time out
- Start training. A popup modal notifies the user that training has failed
Console logs:
Expected behavior
Training should run as normal and end once the specified timeout is reached.
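The expected behavior could be sketched roughly as follows. This is a minimal illustration only, not the plugin's actual training code: `train_with_timeout`, `run_epoch`, and the injectable `clock` parameter are hypothetical names, and it assumes the deadline is checked between epochs.

```python
import time
from typing import Callable


def train_with_timeout(
    run_epoch: Callable[[int], None],  # hypothetical callback that trains one epoch
    max_epochs: int,
    timeout_minutes: float,
    clock: Callable[[], float] = time.monotonic,
) -> int:
    """Run up to max_epochs epochs, stopping once the time budget is spent.

    Returns the number of epochs that completed.
    """
    deadline = clock() + timeout_minutes * 60.0
    completed = 0
    for epoch in range(max_epochs):
        # Check the deadline between epochs so an epoch is never cut short.
        if clock() >= deadline:
            break
        run_epoch(epoch)
        completed += 1
    return completed
```

Because the check happens only between epochs, a run under this sketch can overshoot the timeout by up to one epoch. Stopping mid-epoch would need a check inside the batch loop.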
Describe your data (image format, 2D/3D, etc.)
LaminB1 sample dataset
Environment (please complete the following information):
- OS: macOS 13.6 (22G120), Windows OS build 20348.2655 (EC2 instance)
- Plugin Version: 1.0.0rc8
- PyTorch version 2.0.1 on Windows OS
- GPU? yes on Windows OS
- CUDA version [e.g. 10.0]
Additional context
We discussed removing this feature completely, on the conditions that:
- Users should be able to estimate the training time based on how long each epoch is likely to take, and set an appropriate number of epochs
- If training needs to be stopped before it reaches the set number of epochs, or before it is auto-stopped because model performance is no longer improving, users should be able to cancel the training
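To make the two conditions concrete, here is a minimal sketch of the arithmetic and of a cancellable loop. The names `epochs_for_budget`, `train_until_cancelled`, and `run_epoch` are hypothetical and do not come from the plugin. It assumes the per-epoch duration is roughly constant and that cancellation is signalled through a `threading.Event`.

```python
import math
import threading
from typing import Callable


def epochs_for_budget(seconds_per_epoch: float, budget_minutes: float) -> int:
    """Largest whole number of epochs that fits in the time budget."""
    if seconds_per_epoch <= 0:
        raise ValueError("seconds_per_epoch must be positive")
    return math.floor(budget_minutes * 60.0 / seconds_per_epoch)


def train_until_cancelled(
    run_epoch: Callable[[int], None],  # hypothetical callback that trains one epoch
    max_epochs: int,
    cancel_event: threading.Event,
) -> int:
    """Run up to max_epochs epochs, stopping early if cancel_event is set.

    Returns the number of epochs that completed.
    """
    completed = 0
    for epoch in range(max_epochs):
        # The flag is checked between epochs; a cancel button would set it.
        if cancel_event.is_set():
            break
        run_epoch(epoch)
        completed += 1
    return completed
```

For example, if one epoch takes about 90 seconds and the user has 30 minutes, `epochs_for_budget(90, 30)` gives 20 epochs.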
