
Rearrange DPOTrainer #3501

Open · wants to merge 6 commits into base: main
Conversation


@DaizeDong DaizeDong commented May 27, 2025

What does this PR do?

Rearranged the code structure in DPOTrainer. Details:

  • Split the lengthy model-preparation logic into three helper methods: _create_model_from_path, _prepare_peft_model, and _prepare_gradient_checkpointing, keeping the original logic unchanged. This follows the implementation in SFTTrainer, which also splits model preparation into focused functional parts for clarity.
  • Grouped the model-preparation code together. This includes moving disable_dropout_in_model(model) and model.warnings_issued["estimate_tokens"] earlier so that all operations on the model sit in one place. This is safe and does not change execution behavior.
  • Updated the initialization of model and processing_class. None can now be passed as the value for model and processing_class in DPOTrainer. This feature is adapted from SFTTrainer.
  • Padding-free checks: synced some checks from SFTTrainer.
  • Updated docs: most docs now align with SFTTrainer.
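The restructuring described above can be sketched as follows. This is an illustrative outline only, not the actual TRL code: the three helper names come from the PR description, while the bodies and the placeholder model object are simplified stand-ins.

```python
# Illustrative sketch of the restructured initialization, NOT the actual
# TRL implementation. The helper names (_create_model_from_path,
# _prepare_peft_model, _prepare_gradient_checkpointing) come from the PR
# description; the bodies are simplified stand-ins.

class DPOTrainerSketch:
    def __init__(self, model, peft_config=None, gradient_checkpointing=False):
        # All model preparation is grouped here and delegated to focused
        # helpers, mirroring the SFTTrainer layout referenced in the PR.
        if isinstance(model, str):
            model = self._create_model_from_path(model)
        if peft_config is not None:
            model = self._prepare_peft_model(model, peft_config)
        if gradient_checkpointing:
            model = self._prepare_gradient_checkpointing(model)
        self.model = model

    def _create_model_from_path(self, path):
        # Stand-in for loading a pretrained model from a path/identifier.
        return {"loaded_from": path}

    def _prepare_peft_model(self, model, peft_config):
        # Stand-in for wrapping the model with a PEFT adapter.
        model["peft_config"] = peft_config
        return model

    def _prepare_gradient_checkpointing(self, model):
        # Stand-in for enabling gradient checkpointing on the model.
        model["gradient_checkpointing"] = True
        return model
```

The point of the split is that each preparation concern gets a single, testable entry point instead of one long constructor body.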

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

@qgallouedec (Member) commented:

Thanks! I like it! Please let us know when it's ready for review!

@DaizeDong (Author) commented:

Thanks for your attention! It's ready for review. cc @qgallouedec

@qgallouedec (Member) left a comment:

Thanks, let's see if the CI passes! Ideally we want to have something very close to SFT, and this PR is a good move in that direction.

Comment on lines 220 to 221
if model is None:
raise ValueError("No model provided. Please provide a model to train.")
@qgallouedec (Member):
You can remove this, and replace

- model: Optional[Union[PreTrainedModel, nn.Module, str]] = None,
+ model: Union[PreTrainedModel, nn.Module, str],
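The practical difference between the two signatures can be shown with a toy example (hypothetical functions, not TRL code): with a required positional parameter, Python itself rejects a missing model with a TypeError, so the manual ValueError check becomes unnecessary.

```python
# Toy illustration of the two signatures discussed above; these are
# hypothetical functions, not TRL code.

def init_with_optional_model(model=None):
    # The old pattern: accept None as a default, then check manually.
    if model is None:
        raise ValueError("No model provided. Please provide a model to train.")
    return model

def init_with_required_model(model):
    # The suggested pattern: calling this without an argument raises a
    # TypeError from Python itself, so no manual check is needed.
    return model
```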

@DaizeDong (Author) replied:

Done

Comment on lines 285 to 286
if processing_class is None:
raise ValueError("processing_class must be specified to tokenize a DPO dataset.")
@qgallouedec (Member):

It could be nice to have this instead:

if processing_class is None:
    processing_class = AutoTokenizer.from_pretrained(model_id)
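The fallback pattern in this suggestion can be sketched in isolation. Note this is not the actual trainer code: the load_tokenizer callable below is a stand-in for AutoTokenizer.from_pretrained, passed in so the sketch stays self-contained.

```python
# Sketch of the suggested fallback, NOT the actual trainer code. The
# `load_tokenizer` callable is a stand-in for AutoTokenizer.from_pretrained.

def resolve_processing_class(processing_class, model_id, load_tokenizer):
    # If the caller did not supply a tokenizer/processor, load a default
    # one from the same identifier the model was loaded from.
    if processing_class is None:
        processing_class = load_tokenizer(model_id)
    return processing_class
```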

@DaizeDong (Author) replied:

Done

@DaizeDong (Author) commented:

Thank you for your suggestions! I've updated the mentioned parts and more to sync the DPOTrainer with SFTTrainer!
