Alowden/fine tuning techniques dpo #1900


Open · wants to merge 9 commits into base: main

Conversation

alexl-oai
Summary

Adds overview of fine-tuning methods + DPO guide (which was missing).

Motivation

As above. Unifies the fine-tuning guides published so far (adding DPO and helping technical folks determine which fine-tuning method to use).


For new content

When contributing new content, read through our contribution guidelines, and mark the following action items as completed:

  • I have added a new entry in registry.yaml (and, optionally, in authors.yaml) so that my content renders on the cookbook website.
  • I have conducted a self-review of my content based on the contribution guidelines:
    • Relevance: This content is related to building with OpenAI technologies and is useful to others.
    • Uniqueness: I have searched for related examples in the OpenAI Cookbook, and verified that my content offers new insights or unique information compared to existing documentation.
    • Spelling and Grammar: I have checked for spelling or grammatical mistakes.
    • Clarity: I have done a final read-through and verified that my submission is well-organized and easy to understand.
    • Correctness: The information I include is correct and all of my code executes successfully.
    • Completeness: I have explained everything fully, including all necessary references and citations.

We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.

@alexl-oai alexl-oai self-assigned this Jun 13, 2025
> Today, there are pre-existing Cookbooks for:
Contributor:

nit: You could do the links with the cookbook titles, for example: [How to fine-tune chat models](https://cookbook.openai.com/examples/how_to_finetune_chat_models) to make this a bit neater.

Could be worth including a link to https://cookbook.openai.com/examples/leveraging_model_distillation_to_fine-tune_a_model as it's recent and high-quality.

Contributor:

Could be good to have a command to install dependencies:

! pip install openai nest-asyncio --quiet

```python
) -> List[str]:
    """Return *k* distinct customer-service questions related to the given prompt."""
    async with sem:
        resp = await async_client.chat.completions.create(
```
Contributor:

Best to use the responses API where possible as this is our newest core API. I would change all occurrences of this to responses if we can
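As a hedged sketch of what that swap might look like (the Responses API accepts the same role/content message dicts as `input`, but renames `max_tokens` to `max_output_tokens`; the model name and prompt below are placeholders, not the notebook's actual values):

```python
def to_responses_kwargs(model: str, messages: list, **extra) -> dict:
    """Translate chat.completions.create kwargs into responses.create kwargs.

    The message list passes through unchanged as `input`; `max_tokens`
    is renamed to `max_output_tokens`.
    """
    kwargs = {"model": model, "input": messages}
    if "max_tokens" in extra:
        kwargs["max_output_tokens"] = extra.pop("max_tokens")
    kwargs.update(extra)
    return kwargs


kwargs = to_responses_kwargs(
    "gpt-4o-mini",  # placeholder model name
    [{"role": "user", "content": "Write one customer-service question."}],
    temperature=0.3,
    max_tokens=100,
)
# resp = await async_client.responses.create(**kwargs)
# text = resp.output_text   # instead of resp.choices[0].message.content
```

The helper keeps the migration mechanical, so every `chat.completions.create` call site in the notebook could be swapped with one edit.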

```python
        },
        {"role": "user", "content": prompt},
    ],
    temperature=0.3,
```
Contributor:

I like how you've set temperature and max_tokens to create contrasting datapoints in the synthetic dataset. A comment on why you do this could be nice, talking about how you're controlling the verbosity and creativity of the output.
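For context, a hedged sketch of how the two contrasting completions might be paired into a preference record (field names per OpenAI's preference fine-tuning JSONL format; the question and answer strings below are placeholders, not the notebook's synthetic data):

```python
import json


def dpo_record(prompt: str, preferred: str, non_preferred: str) -> dict:
    """Pair a warmer, fuller answer with a terse one as one DPO preference example."""
    return {
        "input": {"messages": [{"role": "user", "content": prompt}]},
        "preferred_output": [{"role": "assistant", "content": preferred}],
        "non_preferred_output": [{"role": "assistant", "content": non_preferred}],
    }


# e.g. `preferred` sampled with higher temperature / max_tokens (warm, detailed),
# `non_preferred` with lower settings (terse) — placeholder strings here:
record = dpo_record(
    "Where is my order?",
    "So sorry for the wait! Let me look into that for you right away.",
    "Check the tracking page.",
)
line = json.dumps(record)  # one JSONL line of the preference dataset
```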

> To assess the model's performance prior to fine-tuning, we'll use an automated grader (LLM-as-Judge) to score each response for friendliness and empathy. The grader will assign a score from 0 to 4 for each answer, allowing us to compute a mean baseline score for the base model.
Contributor:

nit: LLM-as-a-Judge

| **Technique** | **Good For** | **Not Good For** |
| --- | --- | --- |
| **Supervised fine-tuning (SFT)** | Emphasizing knowledge already present in the model.<br>Customizing response structure or tone.<br>Generating content in a specific format.<br>Teaching complex instructions or correcting instruction-following failures.<br>Optimizing cost/latency (saving tokens from prompt or distilling). | Adding entirely new knowledge (consider RAG instead).<br>Tasks with subjective quality. |
Contributor:

really detailed intro that I found helpful

> 1 - Neutral: correct, businesslike, minimal warmth.
> 0 - Rude, negative, or unhelpful.
>
> Return ONLY valid JSON → {"score": <integer>}
Contributor:

would be best to use structured outputs here (but given this is a baseline will leave it to you if you want to keep json mode)
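For reference, a hedged sketch of what the structured-outputs version might look like: a strict JSON schema passed via the `response_format` parameter replaces the "Return ONLY valid JSON" instruction (shape per the Structured Outputs docs; the schema name `judge_score` is an assumption, not from the notebook):

```python
# Strict schema for the judge's {"score": <integer>} reply (scores 0-4).
score_format = {
    "type": "json_schema",
    "json_schema": {
        "name": "judge_score",  # hypothetical name
        "strict": True,
        "schema": {
            "type": "object",
            "properties": {"score": {"type": "integer", "enum": [0, 1, 2, 3, 4]}},
            "required": ["score"],
            "additionalProperties": False,
        },
    },
}
# Passed as: client.chat.completions.create(..., response_format=score_format)
```

With `strict: True` the model is constrained to the schema, so the judge can never emit a score outside 0-4 or wrap the JSON in prose.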

```python
job = sync_client.fine_tuning.jobs.retrieve(ft.id)
if job.status == "succeeded":
    post_scores = evaluate(job.fine_tuned_model, test_prompts, judge_model=judge_model)
```
Contributor:

If I were to make one extension to this cookbook, it would be to use the OpenAI Evals API to create an evaluation in the platform UI. But the cookbook looks great to me and this is just a suggestion.
