Rearrange DPOTrainer #3501

DaizeDong · 2025-05-27T10:33:24Z

What does this PR do?

Rearranged the code structure in DPOTrainer. Details include:

Split the lengthy model preparation code into three methods: _create_model_from_path, _prepare_peft_model, and _prepare_gradient_checkpointing, while keeping the original logic unchanged. This follows the implementation in SFTTrainer, which also splits model preparation into separate functional parts for clarity.
Grouped the model preparation code together. This includes moving disable_dropout_in_model(model) and the model.warnings_issued["estimate_tokens"] assignment earlier, so that all operations on the model sit in one place. This is safe and does not change execution behavior.
Updated initialization for model and processing_class. Now we can pass None as the value for model and processing_class in DPOTrainer. This feature is adapted from SFTTrainer.
Padding-free checks. Synced some of the padding-free checks from SFTTrainer.
Updated docs. Most of the docs now align with SFTTrainer.
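
To illustrate the restructuring described above, here is a minimal sketch of an __init__ that delegates model preparation to three helpers. Only the helper names come from this PR. StubModel, SketchTrainer, the helper signatures, the flags, and the order of the calls are hypothetical stand-ins for illustration, not the actual DPOTrainer code.

```python
from typing import Union


class StubModel:
    """Hypothetical stand-in for a loaded model (not a transformers class)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.steps = []  # records which preparation steps ran, in order


class SketchTrainer:
    """Illustrative only: groups all model preparation at the top of __init__."""

    def __init__(
        self,
        model: Union[StubModel, str],
        use_peft: bool = False,  # assumed flag, for illustration
        gradient_checkpointing: bool = False,  # assumed flag, for illustration
    ) -> None:
        # A string is treated as a path or hub id and turned into a model object.
        if isinstance(model, str):
            model = self._create_model_from_path(model)
        if use_peft:
            model = self._prepare_peft_model(model)
        if gradient_checkpointing:
            model = self._prepare_gradient_checkpointing(model)
        self.model = model

    def _create_model_from_path(self, model_path: str) -> StubModel:
        model = StubModel(model_path)
        model.steps.append("created_from_path")
        return model

    def _prepare_peft_model(self, model: StubModel) -> StubModel:
        model.steps.append("peft")
        return model

    def _prepare_gradient_checkpointing(self, model: StubModel) -> StubModel:
        model.steps.append("gradient_checkpointing")
        return model


trainer = SketchTrainer("some/model-id", use_peft=True, gradient_checkpointing=True)
print(trainer.model.steps)
```

Each helper takes a model and returns it, so the steps can be read, tested, and reordered independently of the rest of __init__.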

Before submitting

This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
Did you read the contributor guideline, Pull Request section?
Was this discussed/approved via a GitHub issue? Please add a link to it if that's the case.
Did you make sure to update the documentation with your changes?
Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.

qgallouedec · 2025-05-27T21:34:41Z

Thanks! I like it! Please let us know when it's ready for review!

DaizeDong · 2025-05-28T03:22:54Z

Thanks for your attention! It's ready for review. cc @qgallouedec

qgallouedec

Thanks, let's see if the CI passes! Ideally we want to have something very close to SFT, and this PR is a good move in this direction.

qgallouedec · 2025-05-29T16:03:19Z

trl/trainer/dpo_trainer.py

        if model is None:
            raise ValueError("No model provided. Please provide a model to train.")


You can remove this, and replace

-        model: Optional[Union[PreTrainedModel, nn.Module, str]] = None,
+        model: Union[PreTrainedModel, nn.Module, str],

qgallouedec · 2025-05-29T16:05:44Z

trl/trainer/dpo_trainer.py

        if processing_class is None:
            raise ValueError("processing_class must be specified to tokenize a DPO dataset.")


This could be nice to have instead

        if processing_class is None:
            processing_class = AutoTokenizer.from_pretrained(model_id)
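
The fallback suggested above can be sketched in isolation as follows. This is only an illustration under assumptions: resolve_processing_class and its loader parameter are hypothetical names, and deriving model_id from model.config._name_or_path for non-string models is an assumption about how the id would be obtained. In real use, loader would typically be AutoTokenizer.from_pretrained; a fake loader is used here so the example runs without downloading anything.

```python
from types import SimpleNamespace
from typing import Any, Callable, Optional


def resolve_processing_class(
    model: Any,
    processing_class: Optional[Any],
    loader: Callable[[str], Any],
) -> Any:
    """Return the given processing class, or load one from the model id."""
    if processing_class is not None:
        return processing_class
    # Assumption: a string model is already the id; otherwise read it from the config.
    model_id = model if isinstance(model, str) else model.config._name_or_path
    return loader(model_id)


def fake_loader(model_id: str) -> str:
    # Hypothetical loader used only for this demonstration.
    return f"tokenizer-for:{model_id}"


loaded_model = SimpleNamespace(config=SimpleNamespace(_name_or_path="org/model"))
print(resolve_processing_class("org/model", None, fake_loader))
print(resolve_processing_class(loaded_model, None, fake_loader))
```

With this shape, an explicitly passed processing_class always wins, and the loader is only called when none was given.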

DaizeDong · 2025-05-29T17:46:35Z

Thank you for your suggestions! I've updated the mentioned parts and more to sync the DPOTrainer with SFTTrainer!

Rearrange DPOTrainer

6d80a26

DaizeDong force-pushed the main branch from 1c1c091 to 6d80a26 May 27, 2025 10:46

Merge branch 'main' into main

4eb0249

Fix PEFT Preparation

73e7046

DaizeDong force-pushed the main branch from bff9684 to 73e7046 May 28, 2025 10:05

Merge branch 'main' into main

367022a

qgallouedec reviewed May 29, 2025

DaizeDong added 2 commits May 30, 2025 01:38

Update Docs & Rearrange Tokenizer Init

0ee2edd

Padding-free Checks

0fd2356

DaizeDong force-pushed the main branch from 61c7fff to 0fd2356 May 29, 2025 17:39
