WIP: Eval checkpointing #8

Open
daanelson wants to merge 3 commits into main from eval_checkpointing

@daanelson
Contributor

This adds a few features to fine-tuning:

  • Saves a checkpoint every n epochs or n steps.
  • Critically, patches the HF trainer so that DeepSpeed saves only the PEFT weights at each checkpoint, instead of the entire 13/26 GB model.
  • Returns the best-performing model according to eval loss.
  • If no eval dataset is provided, automatically splits the train dataset into train and eval sets using the train_test_split input.
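
For reviewers, here is a rough sketch of the last two bullets in plain Python. It is not the code in this PR: `split_train_eval`, `best_checkpoint`, and the `train_fraction` parameter are hypothetical names, and I am assuming the split value is the fraction of examples kept for training (so a value of 1 leaves the eval set empty). The actual semantics of `train_test_split` in the PR may differ.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_train_eval(
    records: Sequence[T], train_fraction: float, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle records and split them into (train, eval) lists.

    Hypothetical helper: train_fraction is ASSUMED to be the share of
    examples kept for training. With train_fraction == 1.0 the eval list
    is empty, so the caller has to skip evaluation in that case.
    """
    if not 0.0 < train_fraction <= 1.0:
        raise ValueError("train_fraction must be in (0, 1]")
    shuffled = list(records)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


def best_checkpoint(eval_losses: Dict[str, float]) -> Optional[str]:
    """Return the checkpoint name with the lowest eval loss.

    Hypothetical helper: eval_losses maps a checkpoint name to the eval
    loss recorded when it was saved. Returns None if nothing was evaluated.
    """
    if not eval_losses:
        return None
    return min(eval_losses, key=eval_losses.get)


if __name__ == "__main__":
    train, held_out = split_train_eval(list(range(10)), train_fraction=0.8)
    print(len(train), len(held_out))
    print(best_checkpoint({"checkpoint-100": 0.9, "checkpoint-200": 0.7}))
```

The empty eval list returned for a fraction of 1 is the case that needs explicit handling, since there is then no eval loss to rank checkpoints by.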

WIP because I still need to make sure the non-happy path (train_test_split=1) works.
