
Conversation

@Ssukriti
Collaborator

The effective batch size is always per_device_batch_size * gradient_accumulation_steps.

From my testing, with gradient_accumulation_steps = 4 and num_epochs = 20:

Previously, without the fix, training ran for 80 epochs because max_steps was computed as 80, and max_steps overrides num_epochs. This made training take very long.

After the fix:
training finishes in 23 epochs, which is close to the 20 specified, and training time is reduced.

From this blog https://lightning.ai/blog/gradient-accumulation/ : if we want a batch size of 256 but can only fit a batch size of 64 into GPU memory, we can perform gradient accumulation over four batches of size 64. (After processing all four batches, we will have the accumulated gradients equivalent to a single batch of size 256.)
The same strategy is applied here as well: https://discuss.huggingface.co/t/how-do-you-calculate-max-steps/40177
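To illustrate the calculation, here is a minimal sketch of how max_steps can be derived so that it matches the requested number of epochs. The helper name `calc_max_steps` and its arguments are placeholders for illustration, not necessarily the exact code in this PR.

```python
import math

def calc_max_steps(
    num_train_samples: int,
    per_device_train_batch_size: int,
    gradient_accumulation_steps: int,
    num_epochs: int,
    num_devices: int = 1,
) -> int:
    """Derive max_steps so it corresponds to the requested number of epochs.

    One optimizer step consumes effective_batch_size samples, where
    effective_batch_size = per_device_train_batch_size
                           * gradient_accumulation_steps
                           * num_devices.
    """
    effective_batch_size = (
        per_device_train_batch_size * gradient_accumulation_steps * num_devices
    )
    steps_per_epoch = math.ceil(num_train_samples / effective_batch_size)
    return steps_per_epoch * num_epochs

# Hypothetical example: 1000 samples, per-device batch size 1,
# gradient accumulation 4, 20 epochs.
# If gradient_accumulation_steps were ignored, steps_per_epoch would be 4x
# too large, and the resulting max_steps would override num_epochs in the
# HF Trainer, stretching training to ~4x the intended number of epochs.
print(calc_max_steps(1000, 1, 4, 20))  # 250 steps/epoch * 20 epochs = 5000
```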

Signed-off-by: Sukriti-Sharma4 <[email protected]>
@Ssukriti Ssukriti changed the title calc_max_steps fix_calc_max_steps Dec 11, 2023
Signed-off-by: Sukriti-Sharma4 <[email protected]>
@Ssukriti
Collaborator Author

I am going to do a quality test with this once we figure out some other issues and have a benchmark for the code in main. Currently this is paused.
