Replies: 1 comment
- Consider using Kaggle and use a GPU for your training. On CPU you are limited by speed.
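In case it helps, a minimal device-check sketch so the same training code uses a GPU when one is available (e.g. on Kaggle or a Colab GPU runtime) and falls back to CPU otherwise:

```python
import torch

# Use a GPU if the runtime has one (Kaggle / Colab GPU), otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Then move the model and each batch to that device, e.g. model.to(device)
```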
-
Hi, I have been trying to do the initial training in section 9.3, but my Colab environment crashes.
To make sure it was not my code, I have now tried with the 08 notebook from GitHub: one epoch takes 8 minutes and then it crashes with
`Your session crashed. Automatically restarting.`
I can see that nearly all the memory is being used. Colab won't allocate me a GPU (I have the Pro version), so I am stuck with the CPU.
I have tried setting the batch size for the training data to 8 instead of 32. While this reduces the memory usage, the estimated time is 1 hr 22 minutes:
`1/10 [09:06<1:22:00, 546.78s/it]`
But this really cuts into the time allocated to do this module.
Any suggestions on how to speed this up, or has anyone else had this type of issue?
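For reference, the batch-size change was roughly the following; a sketch assuming a standard `torch.utils.data.DataLoader` setup, with `FakeData` standing in for the notebook's real image datasets (sizes and class counts are just illustrative):

```python
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

BATCH_SIZE = 8  # reduced from 32: smaller batches use less RAM per step, but need more steps per epoch

# FakeData stands in for the real ImageFolder datasets built earlier in the notebook
train_data = datasets.FakeData(size=225, image_size=(3, 224, 224), num_classes=3,
                               transform=transforms.ToTensor())
test_data = datasets.FakeData(size=75, image_size=(3, 224, 224), num_classes=3,
                              transform=transforms.ToTensor())

train_dataloader = DataLoader(train_data, batch_size=BATCH_SIZE, shuffle=True)
test_dataloader = DataLoader(test_data, batch_size=BATCH_SIZE, shuffle=False)
```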
As a side note: running it on JupyterLab locally
I then set up a local Docker instance of JupyterLab on an Ubuntu workstation. This also had issues with the dataloaders and allocating shared memory. To get around this I first set `shm_size: '8gb'` in the Docker config for it, but there were still issues with the workers (maybe it is the type of CPU it has; I don't think all the cores are the same on the CPU I had), so I edited `data_setup.py` from `NUM_WORKERS = os.cpu_count()` to `NUM_WORKERS = 0`.
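As a rough sketch of what that edit looks like in context (assuming a `data_setup.py` along the lines of the course's `going_modular` script; the real file may differ slightly):

```python
# data_setup.py (relevant part, with the workaround applied)
import os

from torch.utils.data import DataLoader
from torchvision import datasets

# NUM_WORKERS = os.cpu_count()  # original: one worker process per CPU core
NUM_WORKERS = 0                 # workaround: load data in the main process instead

def create_dataloaders(train_dir: str,
                       test_dir: str,
                       transform,
                       batch_size: int,
                       num_workers: int = NUM_WORKERS):
    """Creates train/test DataLoaders from image folders."""
    train_data = datasets.ImageFolder(train_dir, transform=transform)
    test_data = datasets.ImageFolder(test_dir, transform=transform)
    class_names = train_data.classes

    train_dataloader = DataLoader(train_data, batch_size=batch_size,
                                  shuffle=True, num_workers=num_workers,
                                  pin_memory=True)
    test_dataloader = DataLoader(test_data, batch_size=batch_size,
                                 shuffle=False, num_workers=num_workers,
                                 pin_memory=True)
    return train_dataloader, test_dataloader, class_names
```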
This is probably not the best solution for speed, but it got me working on my own setup. The time is still not the best compared to the video, but at least it's only 13 minutes for all 10 epochs with a batch size of 32. Here are the results I got:
For anyone's reference, this was the error I was seeing relating to the number of workers when trying to visualise the dataloaders, after I had already increased the `shm_size`:

Just as an update, I realised that on my local setup `data_setup.create_dataloaders` already had `num_workers` as a parameter, so you can just make the change below in the notebook instead of editing `data_setup.py`, i.e. change the dataloader part to have this set to 0.
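A sketch of that notebook-side change (assuming the `create_dataloaders` signature above; `train_dir`, `test_dir` and `manual_transforms` are whatever the notebook already defines):

```python
# data_setup is imported earlier in the notebook (exact import path depends on your setup)
train_dataloader, test_dataloader, class_names = data_setup.create_dataloaders(
    train_dir=train_dir,
    test_dir=test_dir,
    transform=manual_transforms,
    batch_size=32,
    num_workers=0,  # override the os.cpu_count() default to avoid the shared-memory/worker errors
)
```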