Unexpected out of memory error #8
@adkinsty From what you provided, it is not obvious why the OOM occurs. I assume you are training a multiclass task, since the loss is 'crossentropy'. When you fit a GBDT, your feature matrix is quantized and stored in a uint8 array, so memory usage is reduced compared with the initial CPU arrays, and your train and valid matrices together should allocate less than 1 GB of GPU memory. Here are some thoughts about what may have happened:
In general, here is a rule to estimate how much GPU memory you need to train: approximately n_samples * n_features bytes + n_samples * n_outputs * 16 bytes + eps. Eps depends on the setup, but it is probably about 1-2 GB. So if, according to this rule, you should fit, we will investigate what goes wrong. Otherwise, the only option for you is to downsample. Any kind of distributed training, such as multi-GPU, is unfortunately unavailable now. We have a plan to add this feature of course, but do not expect it to be released very soon.
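As a hedged illustration (the helper name, the example sizes, and the 2 GB overhead default are my own assumptions, not py-boost API), the rule above can be written as:

```python
def estimate_gpu_mem_bytes(n_samples, n_features, n_outputs,
                           eps_bytes=2 * 1024**3):
    """Rough estimate from the rule above:
    quantized features (1 byte each)
    + per-output statistics (16 bytes per sample per output)
    + framework overhead eps (assumed ~2 GB here)."""
    return (n_samples * n_features
            + n_samples * n_outputs * 16
            + eps_bytes)

# Hypothetical sizes, for illustration only:
print(estimate_gpu_mem_bytes(2_000_000, 315, 1))  # 2809483648 bytes, ~2.6 GiB
```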
Thank you for your detailed and helpful response. This is a multiclass classification problem and the number of classes is extremely large (~5k). The target array is 1D with values ranging from 0 to n_classes - 1. The number of features is about 315 and the total number of samples is around 2e6. According to your formula, I think I would need >100 GB of memory (assuming n_classes == n_outputs). So it would seem that I need to downsample the training set if I use SketchBoost.
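Plugging these numbers into the rule from the previous comment confirms the estimate (a sanity check I added; the ~1-2 GB overhead term is omitted here since it is negligible at this scale):

```python
n_samples = 2_000_000
n_features = 315
n_classes = 5_000  # n_outputs == n_classes for multiclass

feature_bytes = n_samples * n_features   # quantized uint8 features, 1 byte each
stat_bytes = n_samples * n_classes * 16  # per-output statistics, 16 bytes each
total_gib = (feature_bytes + stat_bytes) / 1024**3

print(round(total_gib))  # ~150 GiB, well over 100 GB
```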
@adkinsty There are some options to train in your case, by splitting this big model into smaller ones or by truncating the number of classes. Once I trained a multitask regression with 20k outputs. I split the big model into ten 2k-output models, and that worked reasonably well. But the better trick in that case was to reduce the output dimensionality via TruncatedSVD to about 500 before training, then train SketchBoost, and then apply the inverse transform to the predictions. I believe this trick is very task specific, though, while splitting the model is the more universal one. Your case is more complex because of the multiclass setting - the outputs are not independent. So here is what I think you should try:
If you try any of these approaches, I would appreciate feedback on what works in your case and how it compares with other baselines and plain downsampling.
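The TruncatedSVD trick mentioned above (for the multitask regression case, not directly for multiclass) can be sketched as follows; this is a minimal illustration using a plain NumPy SVD in place of sklearn's TruncatedSVD, and all sizes and variable names are assumptions of mine:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy low-rank multi-output targets: 1000 samples x 200 outputs,
# generated from 10 latent factors (so rank(Y) <= 10).
latent = rng.normal(size=(1000, 10))
mixing = rng.normal(size=(10, 200))
Y = latent @ mixing

# "Fit" a truncated SVD on the training targets.
k = 10
_, _, Vt = np.linalg.svd(Y, full_matrices=False)
components = Vt[:k]               # (k, n_outputs)

# Train the booster on the k reduced target columns instead of all 200:
Y_reduced = Y @ components.T      # (n_samples, k)
# ... model.fit(X, Y_reduced); preds_reduced = model.predict(X_new) ...
preds_reduced = Y_reduced         # stand-in for the model's predictions

# Inverse transform back to the full output space.
preds_full = preds_reduced @ components

print(preds_full.shape)                # (1000, 200)
print(np.allclose(preds_full, Y))      # True: Y is exactly rank 10 here
```

In practice the targets are only approximately low rank, so the inverse transform introduces a reconstruction error that trades off against the memory savings.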
I am getting an out of memory error when trying to train a sketchboost model on a somewhat large dataset.
The shape of the training set is (1348244, 320) and the shape of the test set is (674022, 320).
The total data size is <3 GB and I have 24 GB of GPU memory, yet when training starts, cupy tries to allocate >20 GB of memory and runs out.
I am using cupy-cuda11x 11.6.0 and py-boost 0.4.1.
I am using the following model configuration:
Here's the error I'm getting:
Do you know why this might be happening or what I could do to fix this? For example, could I limit the amount of memory that cupy tries to allocate? Or could I use multiple GPUs for training?
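On the question of capping cupy's allocations: CuPy's default memory pool does expose a `set_limit` method. A minimal sketch (the 20 GiB figure is an arbitrary assumption; note that a pool limit may just surface the OOM earlier rather than fix the underlying memory need):

```python
GIB = 1024 ** 3
limit_bytes = 20 * GIB  # leave some headroom on a 24 GB card

try:
    import cupy
    # Cap the default memory pool; allocations past the cap raise OutOfMemoryError.
    cupy.get_default_memory_pool().set_limit(size=limit_bytes)
except Exception:
    pass  # cupy not installed or no CUDA device in this environment

print(limit_bytes)  # 21474836480
```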