Alowden/fine tuning techniques dpo #1900
base: main
Conversation
"cell_type": "markdown",
"metadata": {},
"source": [
"Today, there are pre-existing Cookbooks for: \n",
nit: You could do the links with the cookbook titles, for example: [How to fine-tune chat models](https://cookbook.openai.com/examples/how_to_finetune_chat_models)
to make this a bit neater.
Could be worth including this cookbook, https://cookbook.openai.com/examples/leveraging_model_distillation_to_fine-tune_a_model, as it's recent and high-quality.
could be good to have a command to install dependencies:
! pip install openai nest-asyncio --quiet
") -> List[str]:\n",
" \"\"\"Return *k* distinct customer-service questions related to the given prompt.\"\"\"\n",
" async with sem:\n",
" resp = await async_client.chat.completions.create(\n",
Best to use the Responses API where possible, as this is our newest core API. I would change all occurrences of this to `responses` if we can.
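For illustration, a minimal sketch of what the reviewer's suggestion could look like against the Responses API. The function and model names here are placeholders, not the notebook's actual helpers; `parse_lines` is a hypothetical post-processing step added for this sketch.

```python
def parse_lines(text: str, k: int) -> list[str]:
    """Split plain-text model output into at most k non-empty lines."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    return lines[:k]

async def ask_responses_api(async_client, prompt: str, system: str, k: int = 10):
    """Same call as the diff's chat.completions version, via the Responses API."""
    resp = await async_client.responses.create(
        model="gpt-4o-mini",     # placeholder model name
        instructions=system,     # replaces the system message
        input=prompt,            # replaces the user message
        temperature=0.3,
    )
    # output_text concatenates the response's text output items
    return parse_lines(resp.output_text, k)
```

The main changes are `responses.create` in place of `chat.completions.create`, `instructions`/`input` in place of the messages list, and `resp.output_text` in place of `resp.choices[0].message.content`.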
" },\n",
" {\"role\": \"user\", \"content\": prompt},\n",
" ],\n",
" temperature=0.3,\n",
I like how you've set temperature and max_tokens to create contrasting datapoints in the synthetic dataset. A comment on why you do this could be nice, talking about how you're controlling the verbosity and creativity of the output.
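As an illustration of the comment the reviewer is asking for: the `temperature=0.3` value appears in the diff above, but the contrasting values and the chosen/rejected naming below are assumptions about how the notebook builds its DPO preference pairs.

```python
# Low temperature + tight token budget -> concise, consistent answers
# (the "chosen" side of a DPO preference pair).
CHOSEN_SAMPLING = {
    "temperature": 0.3,   # low creativity: stays on-script and polite
    "max_tokens": 120,    # short budget: curbs verbosity
}

# High temperature + loose budget -> varied, verbose answers
# (the "rejected" side of the pair).
REJECTED_SAMPLING = {
    "temperature": 1.0,   # high creativity: riskier, less consistent phrasing
    "max_tokens": 400,    # long budget: allows rambling responses
}
```

Contrasting the two sampling configurations this way is what gives the synthetic dataset a clear preference signal for DPO to learn from.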
"cell_type": "markdown",
"metadata": {},
"source": [
"To assess the model's performance prior to fine-tuning, we'll use an automated grader (LLM-as-Judge) to score each response for friendliness and empathy. The grader will assign a score from 0 to 4 for each answer, allowing us to compute a mean baseline score for the base model."
nit: LLM-as-a-Judge
"\n",
"| **Technique** | **Good For** | **Not Good For** |\n",
"| -------------------------------- | ---------------------------------------------------------------- | ---------------------------------------------- |\n",
"| **Supervised fine-tuning (SFT)** | Emphasizing knowledge already present in the model.<br>Customizing response structure or tone.<br>Generating content in a specific format.<br>Teaching complex instructions or correcting instruction-following failures.<br>Optimizing cost/latency (saving tokens from prompt or distilling). | Adding entirely new knowledge (consider RAG instead).<br>Tasks with subjective quality. |\n",
really detailed intro that I found helpful
"1 - Neutral: correct, businesslike, minimal warmth.\n",
"0 - Rude, negative, or unhelpful.\n",
"\n",
"Return ONLY valid JSON → {\"score\": <integer>}\n",
would be best to use structured outputs here (but given this is a baseline will leave it to you if you want to keep json mode)
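If structured outputs were adopted here, one way to express the grader's contract is a strict JSON-schema `response_format`. The `score` field name and 0–4 range match the notebook's rubric; the builder function and grader name below are sketch assumptions.

```python
def score_schema() -> dict:
    """Build a strict json_schema response_format forcing {"score": <0-4 int>}."""
    return {
        "type": "json_schema",
        "json_schema": {
            "name": "judge_score",     # assumed name for this sketch
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    # enum rather than min/max: strict mode supports only a
                    # subset of JSON Schema keywords
                    "score": {"type": "integer", "enum": [0, 1, 2, 3, 4]},
                },
                "required": ["score"],
                "additionalProperties": False,
            },
        },
    }

# Assumed usage with the notebook's judge call:
# resp = sync_client.chat.completions.create(
#     model=judge_model, messages=messages, response_format=score_schema()
# )
# score = json.loads(resp.choices[0].message.content)["score"]
```

With `strict: True` the model is constrained to emit exactly this shape, so the "Return ONLY valid JSON" prompt instruction becomes a guarantee rather than a request.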
"source": [
"job = sync_client.fine_tuning.jobs.retrieve(ft.id)\n",
"if job.status == \"succeeded\":\n",
" post_scores = evaluate(job.fine_tuned_model, test_prompts, judge_model=judge_model)\n",
If I were to make one extension to this cookbook, it would be to use the OpenAI Evals API to create an evaluation in the platform UI. But the cookbook looks great to me, and this is just a suggestion.
Summary
Adds an overview of fine-tuning methods plus a DPO guide (which was missing).
Motivation
As above. Unifies the fine-tuning guides so far, adding DPO and helping technical folks determine which fine-tuning method to use.
For new content
When contributing new content, read through our contribution guidelines, and mark the following action items as completed:
We will rate each of these areas on a scale from 1 to 4, and will only accept contributions that score 3 or higher on all areas. Refer to our contribution guidelines for more details.