Alowden/fine tuning techniques dpo #1900
base: main
Conversation
"cell_type": "markdown",
"metadata": {},
"source": [
"Today, there are pre-existing Cookbooks for: \n",
nit: You could do the links with the cookbook titles, for example: [How to fine-tune chat models](https://cookbook.openai.com/examples/how_to_finetune_chat_models)
to make this a bit neater.
Could be worth including this cookbook, https://cookbook.openai.com/examples/leveraging_model_distillation_to_fine-tune_a_model, as it's recent and high-quality.
could be good to have a command to install dependencies:
! pip install openai nest-asyncio --quiet
") -> List[str]:\n",
" \"\"\"Return *k* distinct customer-service questions related to the given prompt.\"\"\"\n",
" async with sem:\n",
" resp = await async_client.chat.completions.create(\n",
Best to use the Responses API where possible, as this is our newest core API. I would change all occurrences of this to `responses` if we can.
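For illustration, a minimal sketch of what the reviewer's suggestion could look like against the Responses API. The function and model names here are placeholders, not the notebook's actual helpers; `parse_lines` is a hypothetical post-processing step added for this sketch.

```python
def parse_lines(text: str, k: int) -> list[str]:
    """Split plain-text model output into at most k non-empty lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return lines[:k]

async def ask_responses_api(async_client, prompt: str, system: str, k: int = 10):
    """Same call as the diff's chat.completions version, via the Responses API."""
    resp = await async_client.responses.create(
        model="gpt-4o-mini",     # placeholder model name
        instructions=system,     # replaces the system message
        input=prompt,            # replaces the user message
        temperature=0.3,
    )
    # output_text concatenates the response's text output items
    return parse_lines(resp.output_text, k)
```

The main changes are `responses.create` in place of `chat.completions.create`, `instructions`/`input` in place of the messages list, and `resp.output_text` in place of `resp.choices[0].message.content`.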
" },\n",
" {\"role\": \"user\", \"content\": prompt},\n",
" ],\n",
" temperature=0.3,\n",
I like how you've set temperature and max_tokens to create contrasting datapoints in the synthetic dataset. A comment on why you do this could be nice, talking about how you're controlling the verbosity and creativity of the output.
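As an illustration of the comment the reviewer is asking for: the `temperature=0.3` value appears in the diff above, but the contrasting values and the chosen/rejected naming below are assumptions about how the notebook builds its DPO preference pairs.

```python
# Low temperature + tight token budget -> concise, consistent answers
# (the "chosen" side of a DPO preference pair).
CHOSEN_SAMPLING = {
    "temperature": 0.3,   # low creativity: stays on-script and polite
    "max_tokens": 120,    # short budget: curbs verbosity
}

# High temperature + loose budget -> varied, verbose answers
# (the "rejected" side of the pair).
REJECTED_SAMPLING = {
    "temperature": 1.0,   # high creativity: riskier, less consistent phrasing
    "max_tokens": 400,    # long budget: allows rambling responses
}
```

Contrasting the two sampling configurations this way is what gives the synthetic dataset a clear preference signal for DPO to learn from.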
"cell_type": "markdown",
"metadata": {},
"source": [
"To assess the model's performance prior to fine-tuning, we'll use an automated grader (LLM-as-Judge) to score each response for friendliness and empathy. The grader will assign a score from 0 to 4 for each answer, allowing us to compute a mean baseline score for the base model."
nit: LLM-as-a-Judge
"\n",
"| **Technique** | **Good For** | **Not Good For** |\n",
"| -------------------------------- | ---------------------------------------------------------------- | ---------------------------------------------- |\n",
"| **Supervised fine-tuning (SFT)** | Emphasizing knowledge already present in the model.<br>Customizing response structure or tone.<br>Generating content in a specific format.<br>Teaching complex instructions or correcting instruction-following failures.<br>Optimizing cost/latency (saving tokens from prompt or distilling). | Adding entirely new knowledge (consider RAG instead).<br>Tasks with subjective quality. |\n",
really detailed intro that I found helpful
"1 - Neutral: correct, businesslike, minimal warmth.\n",
"0 - Rude, negative, or unhelpful.\n",
"\n",
"Return ONLY valid JSON → {\"score\": <integer>}\n",
would be best to use structured outputs here (but given this is a baseline will leave it to you if you want to keep json mode)
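If structured outputs were adopted here, one way to express the grader's contract is a strict JSON-schema `response_format`. The `score` field name and 0–4 range match the notebook's rubric; the builder function and grader name below are sketch assumptions.

```python
def score_schema() -> dict:
    """Build a strict json_schema response_format forcing {"score": <0-4 int>}."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": "judge_score",     # assumed name for this sketch
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    # enum rather than min/max: strict mode supports only a
                    # subset of JSON Schema keywords
                    "score": {"type": "integer", "enum": [0, 1, 2, 3, 4]},
                },
                "required": ["score"],
                "additionalProperties": False,
            },
        },
    }

# Assumed usage with the notebook's judge call:
# resp = sync_client.chat.completions.create(
#     model=judge_model, messages=messages, response_format=score_schema()
# )
# score = json.loads(resp.choices[0].message.content)["score"]
```

With `strict: True` the model is constrained to emit exactly this shape, so the "Return ONLY valid JSON" prompt instruction becomes a guarantee rather than a request.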
"source": [
"job = sync_client.fine_tuning.jobs.retrieve(ft.id)\n",
"if job.status == \"succeeded\":\n",
" post_scores = evaluate(job.fine_tuned_model, test_prompts, judge_model=judge_model)\n",
If I were to make one extension to this cookbook, it would be to use the OpenAI Evals API to create an evaluation in the platform UI. But the cookbook looks great to me, and this is just a suggestion.
Summary
Adds an overview of fine-tuning methods plus a DPO guide (which was missing).
Motivation
As above. Unifies the fine-tuning guides so far, adding DPO and helping technical folks determine which fine-tuning method to use.
For new content
When contributing new content, read through our contribution guidelines, and mark the following action items as completed:
We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.