Automatically compute train_iters when train_epochs is specified. #1283

Merged: 16 commits, Sep 29, 2024

@AI-WAIFU (Contributor) commented Sep 16, 2024

Major changes:

  • Introduces a train_epochs argument
  • Reorders events in the pretrain function. Before: model/optimizer creation -> data iterator creation. After: data loader creation -> model/optimizer creation -> data iterator creation
  • Removes save_iters property from NeoX args and replaces it with a runtime is_save_iter function
  • Adds support for non-integer checkpoint factors when using logarithmic checkpointing
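
To illustrate the last two bullets, here is a minimal sketch of what a runtime save check could look like. It is not the code from this PR: the parameter names (`checkpoint_scale`, `checkpoint_factor`, `extra_save_iters`) and the choice to floor each power of the factor are assumptions made for the example.

```python
import math


def is_save_iter(iteration, checkpoint_scale="linear", checkpoint_factor=1000,
                 extra_save_iters=()):
    """Hypothetical runtime check: should a checkpoint be written at `iteration`?

    Assumed semantics (not taken from the PR):
      * "linear": save every `checkpoint_factor` iterations.
      * "log":    save at floor(checkpoint_factor ** k) for k = 0, 1, 2, ...,
                  where `checkpoint_factor` may be a non-integer greater than 1.
    """
    if iteration in extra_save_iters:
        return True
    if iteration <= 0:
        return False
    if checkpoint_scale == "linear":
        return iteration % checkpoint_factor == 0
    if checkpoint_scale == "log":
        if checkpoint_factor <= 1:
            raise ValueError("log checkpointing needs a factor greater than 1")
        # Walk the powers of the factor until one reaches the current iteration.
        power = 1.0
        while math.floor(power) < iteration:
            power *= checkpoint_factor
        return math.floor(power) == iteration
    raise ValueError(f"unknown checkpoint_scale: {checkpoint_scale!r}")


# With a factor of 1.5 the floored powers are 1, 1, 2, 3, 5, 7, 11, ...
print([i for i in range(1, 12) if is_save_iter(i, "log", 1.5)])
```

Evaluating the check per iteration avoids precomputing a list of save iterations, which would require train_iters to be known when the args object is built.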

Note: Most of these changes follow from the fact that train_iters cannot be computed when the NeoX args object is created. At a high level, both train_epochs and train_iters are passed down to the data loader, which uses whichever one is not None to determine its behavior. If train_iters is unspecified, it is then inferred from the data loader after construction.
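
The flow described in the note can be sketched roughly as follows. This is a toy stand-in rather than the actual NeoX data pipeline: `build_batches` and `infer_train_iters` are hypothetical helpers, the "data loader" is a plain list of index batches, and dropping the final incomplete batch is an assumption.

```python
def build_batches(num_samples, batch_size, train_iters=None, train_epochs=None):
    """Hypothetical loader builder: exactly one of train_iters / train_epochs is set."""
    if (train_iters is None) == (train_epochs is None):
        raise ValueError("specify exactly one of train_iters or train_epochs")
    if train_epochs is not None:
        # Epoch-driven: cover the dataset train_epochs times.
        total_samples = num_samples * train_epochs
    else:
        # Iteration-driven: draw exactly enough samples for train_iters batches.
        total_samples = train_iters * batch_size
    num_batches = total_samples // batch_size  # assumption: drop the incomplete tail
    return [
        [(b * batch_size + i) % num_samples for i in range(batch_size)]
        for b in range(num_batches)
    ]


def infer_train_iters(train_iters, loader):
    """Keep an explicit train_iters, otherwise read it off the constructed loader."""
    return train_iters if train_iters is not None else len(loader)


# The loader is built first, so train_iters is known before the model,
# optimizer, and learning-rate schedule are created.
loader = build_batches(num_samples=10, batch_size=4, train_epochs=2)
train_iters = infer_train_iters(None, loader)
print(train_iters)
```

Under these assumptions, 10 samples with a batch size of 4 over 2 epochs yield 20 // 4 = 5 iterations.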

@AI-WAIFU AI-WAIFU marked this pull request as ready for review September 24, 2024 16:16
@Quentin-Anthony Quentin-Anthony self-assigned this Sep 24, 2024
@Quentin-Anthony (Member)

Fixes #1268

@Quentin-Anthony (Member) left a comment

Tested and working for me.

@Quentin-Anthony Quentin-Anthony merged commit a6d6af0 into main Sep 29, 2024
1 of 4 checks passed
@Quentin-Anthony Quentin-Anthony deleted the auto-itercount branch September 29, 2024 07:53
@Quentin-Anthony Quentin-Anthony restored the auto-itercount branch September 29, 2024 08:11