diff --git a/units/en/unit2/2.md b/units/en/unit2/2.md
index 859978d9..a7235ffc 100644
--- a/units/en/unit2/2.md
+++ b/units/en/unit2/2.md
@@ -111,8 +111,6 @@ DPO requires a [preference dataset](https://huggingface.co/docs/trl/en/dataset_f
 
 Although the `DPOTrainer` supports both explicit and implicit prompts, we recommend using explicit prompts. If provided with an implicit prompt dataset, the trainer will automatically extract the prompt from the `"chosen"` and `"rejected"` columns. For more information, refer to the [preference style](dataset_formats#preference) section.
 
-Although the `DPOTrainer` supports both explicit and implicit prompts, we recommend using explicit prompts. If provided with an implicit prompt dataset, the trainer will automatically extract the prompt from the `"chosen"` and `"rejected"` columns. For more information, refer to the [preference style](https://huggingface.co/docs/trl/en/dataset_formats#preference) section.
-
 | Parameter | Description | Recommendations |
 |-----------|-------------|-----------------|
 | **Beta (β)** | Controls the strength of preference optimization | **Range**: 0.1 to 0.5<br>**Lower values**: More conservative, closer to reference model<br>**Higher values**: Stronger preference alignment, risk of overfitting |
@@ -149,4 +147,4 @@ Another common issue is distribution shift, where the model performs well on the
 - Evaluating alignment quality and model performance
 - Deploying your aligned model
 
-After mastering DPO, explore advanced techniques in the [advanced DPO methods](3) section.
\ No newline at end of file
+After mastering DPO, explore advanced techniques in the [advanced DPO methods](3) section.
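
The diff above only removes a duplicated paragraph and fixes the missing trailing newline, but for context, here is a minimal sketch of how the explicit-prompt preference format and the `beta` parameter discussed in the touched section map onto TRL's `DPOTrainer`. The checkpoint name, the dataset rows, and `beta=0.1` are illustrative assumptions, not part of the change, and depending on your TRL version the tokenizer may need to be passed as `tokenizer=` instead of `processing_class=`.

```python
# Illustrative sketch only: model name, dataset contents, and beta value are assumptions.
from datasets import Dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "Qwen/Qwen2.5-0.5B-Instruct"  # hypothetical example checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Explicit-prompt preference dataset: separate "prompt", "chosen", and "rejected" columns,
# as recommended in the section this diff edits.
train_dataset = Dataset.from_dict({
    "prompt": ["What is the capital of France?"],
    "chosen": ["The capital of France is Paris."],
    "rejected": ["France is a country in Europe."],
})

# beta within the 0.1-0.5 range from the hyperparameter table;
# lower values keep the policy closer to the reference model.
training_args = DPOConfig(output_dir="dpo-model", beta=0.1)

trainer = DPOTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    processing_class=tokenizer,  # use tokenizer=... on older TRL releases
)
trainer.train()
```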