diff --git a/1_instruction_tuning/README.md b/1_instruction_tuning/README.md index a7fae79c..43f9639e 100644 --- a/1_instruction_tuning/README.md +++ b/1_instruction_tuning/README.md @@ -1,30 +1,30 @@ -# Instruction Tuning +# 指令微调 -This module will guide you through instruction tuning language models. Instruction tuning involves adapting pre-trained models to specific tasks by further training them on task-specific datasets. This process helps models improve their performance on targeted tasks. +本章内容主要聚焦大语言模型指令微调部分。指令微调会把预训练模型在特定领域的数据集上进一步训练,来适配特定的任务。这一过程能有效地提升模型在目标任务上的性能。 -In this module, we will explore two topics: 1) Chat Templates and 2) Supervised Fine-Tuning. +具体而言,本章将会重点探索两个主题:聊天模板和有监督微调。 -## 1️⃣ Chat Templates +## 1️⃣ 聊天模板 -Chat templates structure interactions between users and AI models, ensuring consistent and contextually appropriate responses. They include components like system prompts and role-based messages. For more detailed information, refer to the [Chat Templates](./chat_templates.md) section. +聊天模板的主要作用是把用户和 AI 模型之间的交互信息结构化,确保模型能够稳定输出且根据上下文作出回答。一个聊天模板包含系统提示词和人机两个角色发送的消息。本章的[聊天模板教程](./chat_templates.md)将会详细讲述这一内容。 -## 2️⃣ Supervised Fine-Tuning +## 2️⃣ 有监督微调 -Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks. It involves training the model on a task-specific dataset with labeled examples. For a detailed guide on SFT, including key steps and best practices, see the [Supervised Fine-Tuning](./supervised_fine_tuning.md) page. +有监督微调(SFT)是你将预训练模型往特定任务迁移时的一个重要过程。SFT 通过在特定领域的有标注数据集上进一步训练,来提升模型在这里应用领域的性能。本章[有监督微调教程](./supervised_fine_tuning.md)将会详细讲解相关内容,包括其中的重要步骤和最佳实践。 -## Exercise Notebooks +## 实践练习 -| Title | Description | Exercise | Link | Colab | +| 标题 | 简介 | 习题 | 链接 | Colab | |-------|-------------|----------|------|-------| -| Chat Templates | Learn how to use chat templates with SmolLM2 and process datasets into chatml format | 🐢 Convert the `HuggingFaceTB/smoltalk` dataset into chatml format
🐕 Convert the `openai/gsm8k` dataset into chatml format | [Notebook](./notebooks/chat_templates_example.ipynb) | Open In Colab | -| Supervised Fine-Tuning | Learn how to fine-tune SmolLM2 using the SFTTrainer | 🐢 Use the `HuggingFaceTB/smoltalk` dataset<br>
🐕 Try out the `bigcode/the-stack-smol` dataset<br>
🦁 Select a dataset for a real world use case | [Notebook](./notebooks/sft_finetuning_example.ipynb) | Open In Colab | - -## References - -- [Transformers documentation on chat templates](https://huggingface.co/docs/transformers/main/en/chat_templating) -- [Script for Supervised Fine-Tuning in TRL](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py) -- [`SFTTrainer` in TRL](https://huggingface.co/docs/trl/main/en/sft_trainer) -- [Direct Preference Optimization Paper](https://arxiv.org/abs/2305.18290) -- [Supervised Fine-Tuning with TRL](https://huggingface.co/docs/trl/main/en/tutorials/supervised_finetuning) -- [How to fine-tune Google Gemma with ChatML and Hugging Face TRL](https://www.philschmid.de/fine-tune-google-gemma) -- [Fine-tuning LLM to Generate Persian Product Catalogs in JSON Format](https://huggingface.co/learn/cookbook/en/fine_tuning_llm_to_generate_persian_product_catalogs_in_json_format) +| 聊天模板 | 学习使用 SmolLM2 的聊天模板,并将数据集转换为 ChatML 聊天模板的格式 | 🐢 将 `HuggingFaceTB/smoltalk` 数据集转换为 ChatML 格式<br>
🐕 将 `openai/gsm8k` 转换为 ChatML 格式 | [Notebook](./notebooks/chat_templates_example_cn.ipynb) | Open In Colab | +| 有监督微调 | 学习用 SFTTrainer 去微调 SmolLM2 模型 | 🐢 使用 `HuggingFaceTB/smoltalk` 数据集训练模型<br>
🐕 使用 `bigcode/the-stack-smol` 数据集训练模型<br>
🦁 针对一个实际场景选取数据集去训练 | [Notebook](./notebooks/sft_finetuning_example_cn.ipynb) | Open In Colab | + +## 参考资料 + +- [transformers 文档中关于聊天模板的介绍](https://huggingface.co/docs/transformers/main/en/chat_templating) +- [使用 TRL 进行有监督微调的示例脚本](https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py) +- [TRL 官方文档关于 `SFTTrainer` 的介绍](https://huggingface.co/docs/trl/main/en/sft_trainer) +- [DPO 算法论文](https://arxiv.org/abs/2305.18290) +- [TRL 官方文档中关于有监督微调的教程](https://huggingface.co/docs/trl/main/en/tutorials/supervised_finetuning) +- [博客:用 ChatML 和 TRL 微调 Google Gemma 模型](https://www.philschmid.de/fine-tune-google-gemma) +- [教程:微调大语言模型使其输出 JSON 格式的内容](https://huggingface.co/learn/cookbook/en/fine_tuning_llm_to_generate_persian_product_catalogs_in_json_format) diff --git a/1_instruction_tuning/chat_templates.md b/1_instruction_tuning/chat_templates.md index 61ff65e6..ed6fe292 100644 --- a/1_instruction_tuning/chat_templates.md +++ b/1_instruction_tuning/chat_templates.md @@ -1,18 +1,18 @@ -# Chat Templates +# 聊天模板 -Chat templates are essential for structuring interactions between language models and users. They provide a consistent format for conversations, ensuring that models understand the context and role of each message while maintaining appropriate response patterns. +如果想将模型与用户的交互信息结构化,那么一个聊天模板(chat template)就是必需的。它为对话提供了一个固定格式,让模型能够知道上下文信息以及每条消息是由谁发出的,只有这样模型才能生成恰当的回答。 -## Base Models vs Instruct Models +## 基础模型 vs 指令模型 -A base model is trained on raw text data to predict the next token, while an instruct model is fine-tuned specifically to follow instructions and engage in conversations. For example, `SmolLM2-135M` is a base model, while `SmolLM2-135M-Instruct` is its instruction-tuned variant. +基础模型(base model)指的是在未经整理的文本数据上训练、用于预测下一个 token 的模型,而指令模型(instruct model)则是通过微调来跟随指令、参与对话的模型。举例来说,`SmolLM2-135M` 就是基础模型,而 `SmolLM2-135M-Instruct` 则是前者经指令调优得到的指令模型。 -To make a base model behave like an instruct model, we need to format our prompts in a consistent way that the model can understand. This is where chat templates come in. ChatML is one such template format that structures conversations with clear role indicators (system, user, assistant). +为了让基础模型成为指令模型,我们需要对我们的输入提示词进行规范化,用一种固定的格式输入给模型,以便于模型理解。这就用到**聊天模板**了。举例来说,ChatML 就是一个这样的模板,它将对话过程完全结构化,清晰地指明了每段信息是由哪个角色(系统、用户、助手)说出的。 -It's important to note that a base model could be fine-tuned on different chat templates, so when we're using an instruct model we need to make sure we're using the correct chat template. +需要注意,一个基础模型可以往不同的聊天模板上微调。所以当我们使用训练好的指令模型时,我们也需要注意不要用错聊天模板。 -## Understanding Chat Templates +## 聊天模板简介 -At their core, chat templates define how conversations should be formatted when communicating with a language model. They include system-level instructions, user messages, and assistant responses in a structured format that the model can understand. This structure helps maintain consistency across interactions and ensures the model responds appropriately to different types of inputs. Below is an example of a chat template: +聊天模板定义了当用户和语言模型对话时,对话信息应该遵循什么样的格式。这其中包含来自三个角色的信息:系统级的指令、用户发出的信息、AI 助手的回答。这使得每次人机交互的信息格式都是一致的,确保模型针对不同问题都能恰当回答。下面就是一个聊天模板示例: ```sh <|im_start|>user @@ -24,7 +24,7 @@ Can I ask a question?<|im_end|> <|im_start|>assistant ``` -The `transformers` library will take care of chat templates for you in relation to the model's tokenizer. Read more about how transformers builds chat templates [here](https://huggingface.co/docs/transformers/en/chat_templating#how-do-i-use-chat-templates). 
All we have to do is structure our messages in the correct way and the tokenizer will take care of the rest. Here's a basic example of a conversation: +实际上,如果你使用 `transformers` 库的 tokenizer,它将会为我们将对话信息转化为聊天模板形式。你可以在[这里](https://huggingface.co/docs/transformers/en/chat_templating#how-do-i-use-chat-templates)查看相关文档。我们仅需将对话信息结构化,后面的事情交给 tokenizer 即可。比如,你可以把聊天信息写成这样: ```python messages = [ @@ -34,11 +34,11 @@ messages = [ ] ``` -Let's break down the above example, and see how it maps to the chat template format. +接下来,我们将分解聊天信息的组成:系统信息和对话部分。 -## System Messages +### 系统消息 -System messages set the foundation for how the model should behave. They act as persistent instructions that influence all subsequent interactions. For example: +系统消息从基本层面定义了模型应有的行为。它会影响接下来所有交互。看下列示例就能明白: ```python system_message = { @@ -47,9 +47,9 @@ system_message = { } ``` -## Conversations +### 对话部分 -Chat templates maintain context through conversation history, storing previous exchanges between users and the assistant. This allows for more coherent multi-turn conversations: +聊天模板也需要保留对话历史记录,将之前发生的人机对话保存下来,作为后续对话的参考。只有这样,我们才能实现多轮交互式对话。 ```python conversation = [ @@ -59,9 +59,9 @@ conversation = [ ] ``` -## Implementation with Transformers +## 使用 Transformers 构建聊天模板 -The transformers library provides built-in support for chat templates. Here's how to use them: +使用 `transformers` 构建聊天模板的示例如下: ```python from transformers import AutoTokenizer @@ -81,8 +81,14 @@ formatted_chat = tokenizer.apply_chat_template( ) ``` -## Custom Formatting -You can customize how different message types are formatted. For example, adding special tokens or formatting for different roles: +上述代码运行完后,`formatted_chat` 应该是这样: +``` +'<|im_start|>system\nYou are a helpful coding assistant.<|im_end|>\n<|im_start|>user\nWrite a Python function to sort a list<|im_end|>\n<|im_start|>assistant\n' +``` + +### 自定义聊天模板格式 + +你也可以自定义聊天模板格式,比如为不同角色的信息添加特殊的 token 来作为标识: ```python template = """ @@ -92,9 +98,9 @@ template = """ """.lstrip() ``` -## Multi-Turn Support +### 对多轮对话的支持 -Templates can handle complex multi-turn conversations while maintaining context: +聊天模板可以处理复杂多轮对话,同时保留上下文信息: ```python messages = [ @@ -105,10 +111,10 @@ messages = [ ] ``` -⏭️ [Next: Supervised Fine-Tuning](./supervised_fine_tuning.md) +⏭️ [下一节课程:有监督微调](./supervised_fine_tuning.md) -## Resources +## 其它学习资源 -- [Hugging Face Chat Templating Guide](https://huggingface.co/docs/transformers/main/en/chat_templating) -- [Transformers Documentation](https://huggingface.co/docs/transformers) -- [Chat Templates Examples Repository](https://github.com/chujiezheng/chat_templates) +- [Hugging Face 聊天模板使用指南](https://huggingface.co/docs/transformers/main/en/chat_templating) +- [Transformers 官方文档](https://huggingface.co/docs/transformers) +- [包含各种聊天模板的代码仓库](https://github.com/chujiezheng/chat_templates) diff --git a/1_instruction_tuning/notebooks/chat_templates_example.ipynb b/1_instruction_tuning/notebooks/chat_templates_example.ipynb index 93772206..a6d98a62 100644 --- a/1_instruction_tuning/notebooks/chat_templates_example.ipynb +++ b/1_instruction_tuning/notebooks/chat_templates_example.ipynb @@ -6,9 +6,9 @@ "id": "vZAvFVIAtFlq" }, "source": [ - "# Exploring Chat Templates with SmolLM2\n", + "# 借助 SmolLM2 探索聊天模板\n", "\n", - "This notebook demonstrates how to use chat templates with the `SmolLM2` model. Chat templates help structure interactions between users and AI models, ensuring consistent and contextually appropriate responses." 
+ "这个教程演示如何在 `SmolLM2` 下使用聊天模板。借助聊天模板,人机之间的交互信息可以被结构化,这样我们才能确保模型的回答和上下文信息保持一致性。" ] }, { @@ -54,9 +54,9 @@ "id": "XTVOqbuetFlr" }, "source": [ - "## SmolLM2 Chat Template\n", + "## SmolLM2 的聊天模板\n", "\n", - "Let's explore how to use a chat template with the `SmolLM2` model. We'll define a simple conversation and apply the chat template." + "我们先看看怎样使用 `SmolLM2` 的聊天模板。这里我们先定义一个简单的对话,然后用聊天模板转换它。" ] }, { @@ -201,9 +201,9 @@ "id": "Ve4dgtjstFls" }, "source": [ - "# Apply chat template without tokenization\n", + "# 先不进行 tokenize,应用一下聊天模板\n", "\n", - "The tokenizer represents the conversation as a string with special tokens to describe the role of the user and the assistant.\n" + "先把 `tokenize` 这个入参设为 `False`。可以看到,下面例子增加了额外的角色信息。" ] }, { @@ -241,9 +241,9 @@ "id": "sfvdglOqtFls" }, "source": [ - "# Decode the conversation\n", + "# 对已经 tokenize 过的对话进行解码\n", "\n", - "Note that the conversation is represented as above but with a further assistant message.\n" + "如果已经把 `tokenize` 设为 `True`,可以通过解码的方式查看对话内容。可以看到,这里多了一个 assistant 信息。" ] }, { @@ -284,10 +284,9 @@ "id": "UcZQpspEtFlt" }, "source": [ - "# Tokenize the conversation\n", + "# tokenize 过后的对话是什么形式\n", "\n", - "Of course, the tokenizer also tokenizes the conversation and special token as ids that relate to the model's vocabulary.\n", - "\n" + "对对话应用聊天模板并进行 tokenize 之后,我们发现,整个对话,包括特殊 token,都被转化为了一系列整数。\n" ] }, { @@ -322,14 +321,21 @@ }, "source": [ "
\n",
-    "    <h2>Exercise: Process a dataset for SFT</h2>\n",
-    "    <p>Take a dataset from the Hugging Face hub and process it for SFT.</p>\n",
-    "    <p><b>Difficulty Levels</b></p>\n",
-    "    <p>🐢 Convert the `HuggingFaceTB/smoltalk` dataset into chatml format.</p>\n",
-    "    <p>🐕 Convert the `openai/gsm8k` dataset into chatml format.</p>\n",
+    "    <h2>练习:为 SFT 准备数据集</h2>\n",
+    "    <p>从 HuggingFace 上找一个数据集,然后为后续 SFT 转化它</p>\n",
+    "    <p><b>难度等级</b></p>\n",
+    "    <p>🐢 将 `HuggingFaceTB/smoltalk` 数据集转换为 ChatML 形式</p>\n",
+    "    <p>🐕 将 `openai/gsm8k` 数据集转换为 ChatML 形式</p>\n",
     "</div>
" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, { "cell_type": "code", "execution_count": null, @@ -531,7 +537,10 @@ "execution_count": null, "metadata": { "collapsed": true, - "id": "bWUSv7NMtFlu" + "id": "bWUSv7NMtFlu", + "jupyter": { + "outputs_hidden": true + } }, "outputs": [], "source": [ @@ -557,11 +566,11 @@ "id": "qlXCuRKotFlu" }, "source": [ - "## Conclusion\n", + "## 总结\n", "\n", - "This notebook demonstrated how to apply chat templates to different models, `SmolLM2`. By structuring interactions with chat templates, we can ensure that AI models provide consistent and contextually relevant responses.\n", + "本文讲解了在 `SmolLM2` 模型中如何使用聊天模板。通过聊天模板来结构化人机交互信息,我们可以确保 AI 模型可以给出和上下文保持一致性的回答。\n", "\n", - "In the exercise you tried out converting a dataset into chatml format. Luckily, TRL will do this for you, but it's useful to understand what's going on under the hood." + "在练习中,你将尝试将数据集转为 ChatML 格式。但实际情况中,TRL 会为你做这些事情。不过,充分理解背后的处理逻辑还是很有必要的。" ] } ], @@ -570,7 +579,7 @@ "provenance": [] }, "kernelspec": { - "display_name": "py310", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -584,7 +593,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.15" + "version": "3.9.6" }, "widgets": { "application/vnd.jupyter.widget-state+json": { @@ -5722,5 +5731,5 @@ } }, "nbformat": 4, - "nbformat_minor": 0 + "nbformat_minor": 4 } diff --git a/1_instruction_tuning/notebooks/sft_finetuning_example.ipynb b/1_instruction_tuning/notebooks/sft_finetuning_example.ipynb index d18479a9..7d259da3 100644 --- a/1_instruction_tuning/notebooks/sft_finetuning_example.ipynb +++ b/1_instruction_tuning/notebooks/sft_finetuning_example.ipynb @@ -4,17 +4,17 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Supervised Fine-Tuning with SFTTrainer\n", + "# 用 SFTTrainer 实现有监督微调\n", "\n", - "This notebook demonstrates how to fine-tune the `HuggingFaceTB/SmolLM2-135M` model using the `SFTTrainer` from the `trl` library. The notebook cells run and will finetune the model. You can select your difficulty by trying out different datasets.\n", + "本教程将展示如何使用 `trl` 代码库中的 `SFTTrainer` 来微调 `HuggingFaceTB/SmolLM2-135M`。文中代码模块将会展示具体步骤。你可以根据难度选取在不同数据集上微调。\n", "\n", "
\n",
-    "    <h2>Exercise: Fine-Tuning SmolLM2 with SFTTrainer</h2>\n",
-    "    <p>Take a dataset from the Hugging Face hub and finetune a model on it.</p>\n",
-    "    <p><b>Difficulty Levels</b></p>\n",
-    "    <p>🐢 Use the `HuggingFaceTB/smoltalk` dataset</p>\n",
-    "    <p>🐕 Try out the `bigcode/the-stack-smol` dataset and finetune a code generation model on a specific subset `data/python`.</p>\n",
-    "    <p>🦁 Select a dataset that relates to a real world use case your interested in</p>\n",
+    "    <h2>练习:用 SFTTrainer 微调 SmolLM2</h2>\n",
+    "    <p>从 HuggingFace 上选取一个数据集然后微调模型</p>\n",
+    "    <p><b>难度等级</b></p>\n",
+    "    <p>🐢 使用 `HuggingFaceTB/smoltalk` 数据集</p>\n",
+    "    <p>🐕 尝试 `bigcode/the-stack-smol` 数据集,在 `data/python` 这个子集上微调一个代码生成模型</p>\n",
+    "    <p>🦁 选取一个你感兴趣的领域,使用真实世界数据集,为你的特定应用微调模型</p>\n",
     "</div>
" ] }, @@ -72,9 +72,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Generate with the base model\n", + "# 用基准模型生成\n", "\n", - "Here we will try out the base model which does not have a chat template. " + "我们首先看看不使用聊天模板的基础模型是什么效果。" ] }, { @@ -101,11 +101,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Dataset Preparation\n", + "## 准备数据集\n", "\n", - "We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model.\n", + "接下来我们载入一个简单数据集,并转换格式。这个数据集需要结构化为“输入-输出”这样成对的形式,其中输入是一个提示语,输出就是我们期待的模型回答。\n", "\n", - "**TRL will format input messages based on the model's chat templates.** They need to be represented as a list of dictionaries with the keys: `role` and `content`,." + "**TRL 会自动根据聊天模板转换输入信息。**这些信息需要写成一个字典(dict)的列表,其中 key 是 `role` 和 `content`,分别代表哪个角色以及对应的话语。" ] }, { @@ -117,7 +117,7 @@ "# Load a sample dataset\n", "from datasets import load_dataset\n", "\n", - "# TODO: define your dataset and config using the path and name parameters\n", + "# TODO: 稍后你可以使用你自己的数据集,替换以下的路径、名字等参数配置\n", "ds = load_dataset(path=\"HuggingFaceTB/smoltalk\", name=\"everyday-conversations\")" ] }, @@ -127,16 +127,16 @@ "metadata": {}, "outputs": [], "source": [ - "# TODO: 🦁 If your dataset is not in a format that TRL can convert to the chat template, you will need to process it. Refer to the [module](../chat_templates.md)" + "# TODO: 🦁 如果你的数据集不是 TRL 能转换成聊天模板的格式,你还需要额外处理一下。参考[聊天模板](../chat_templates_cn.md)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Configuring the SFTTrainer\n", + "## 配置 SFTTrainer\n", "\n", - "The `SFTTrainer` is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources." + "我们还需要配置 `SFTTrainer` 的各项参数,来控制训练过程。这包括训练步数、batch size、学习率,以及评估策略。你需要根据你自己的情况和计算资源来调节。" ] }, { @@ -170,16 +170,16 @@ " eval_dataset=ds[\"test\"],\n", ")\n", "\n", - "# TODO: 🦁 🐕 align the SFTTrainer params with your chosen dataset. For example, if you are using the `bigcode/the-stack-smol` dataset, you will need to choose the `content` column`" + "# TODO: 🦁 🐕 根据你自己的数据集调节 SFTTrainer 的参数。假如你用的是 `bigcode/the-stack-smol` 数据集,那你就需要选择 `content` 这一列。" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Training the Model\n", + "## 训练模型\n", "\n", - "With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss." + " 当上述配置完成后,我们就可以训练了。这个过程将遍历数据集、计算损失函数、更新模型参数等。" ] }, { @@ -209,8 +209,8 @@ "metadata": {}, "source": [ "
\n",
-    "    <h2>Bonus Exercise: Generate with fine-tuned model</h2>\n",
-    "    <p>🐕 Use the fine-tuned to model generate a response, just like with the base example..</p>\n",
+    "    <h2>额外练习:用微调过的模型来生成</h2>\n",
+    "    <p>🐕 使用微调过的模型来生成回答,就像之前用基准模型一样。对比一下效果</p>\n",
     "</div>
" ] }, @@ -232,26 +232,26 @@ "# Generate response\n", "inputs = tokenizer(formatted_prompt, return_tensors=\"pt\").to(device)\n", "\n", - "# TODO: use the fine-tuned to model generate a response, just like with the base example." + "# TODO: 用微调过的模型生成相同的回答,看看有没有变化改进" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## 💐 You're done!\n", + "## 💐 完成了!\n", "\n", - "This notebook provided a step-by-step guide to fine-tuning the `HuggingFaceTB/SmolLM2-135M` model using the `SFTTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out:\n", + "本文提供了使用 `SFTTrainer` 微调 `HuggingFaceTB/SmolLM2-135M` 的指南。依照这些步骤,你可以将模型适配到各种特殊任务场景上。如果你觉得这个课程还不错,你还可以尝试这些:\n", "\n", - "- Try this notebook on a harder difficulty\n", - "- Review a colleagues PR\n", - "- Improve the course material via an Issue or PR." + "- 使用这个 notebook 完成难点分级更高的任务\n", + "- 在 GitHub 上评审别人的 pull request\n", + "- 通过提出 Issue 或 PR 进一步改进我们的课程资料" ] } ], "metadata": { "kernelspec": { - "display_name": "py310", + "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, @@ -265,9 +265,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.15" + "version": "3.9.6" } }, "nbformat": 4, - "nbformat_minor": 2 + "nbformat_minor": 4 } diff --git a/1_instruction_tuning/supervised_fine_tuning.md b/1_instruction_tuning/supervised_fine_tuning.md index 8c6c3df4..e7250c38 100644 --- a/1_instruction_tuning/supervised_fine_tuning.md +++ b/1_instruction_tuning/supervised_fine_tuning.md @@ -1,41 +1,41 @@ -# Supervised Fine-Tuning +# 有监督微调 -Supervised Fine-Tuning (SFT) is a critical process for adapting pre-trained language models to specific tasks or domains. While pre-trained models have impressive general capabilities, they often need to be customized to excel at particular use cases. SFT bridges this gap by further training the model on carefully curated datasets with human-validated examples. +有监督微调(Supervised Fine-Tuning),简称 SFT,是将预训练模型往特定领域或任务迁移的一个重要过程。虽然预训练模型总体性能也很不错,但如果需要应对特定场景,就必须针对场景进行定制化。SFT 通过在人工筛选的高质量数据上进一步训练,将预训练模型往特定任务上迁移。 -## Understanding Supervised Fine-Tuning +## SFT 原理简介 -At its core, supervised fine-tuning is about teaching a pre-trained model to perform specific tasks through examples of labeled tokens. The process involves showing the model many examples of the desired input-output behavior, allowing it to learn the patterns specific to your use case. +SFT 核心思想是让预训练模型学习标注过的、特定领域的数据。这个过程会向模型提供很多我们想要的输入-输出,让模型去学习我们想要的特定回答模式。 -SFT is effective because it uses the foundational knowledge acquired during pre-training while adapting the model's behavior to match your specific needs. +由于利用了预训练模型学到的基础知识,通过 SFT 将模型进一步适配到特定领域的训练还是非常高效的。 -## When to Use Supervised Fine-Tuning +## 什么时候使用 SFT -The decision to use SFT often comes down to the gap between your model's current capabilities and your specific requirements. SFT becomes particularly valuable when you need precise control over the model's outputs or when working in specialized domains. +SFT 通常在你当前模型的能力和你的特殊需求还存在差距时使用,尤其是当你想要精细控制模型输出,或是想让模型在特定领域发挥作用时。 -For example, if you're developing a customer service application, you might want your model to consistently follow company guidelines and handle technical queries in a standardized way. Similarly, in medical or legal applications, accuracy and adherence to domain-specific terminology becomes crucial. 
In these cases, SFT can help align the model's responses with professional standards and domain expertise. +举例来说,如果你在开发客户服务相关的应用,那你可能需要让模型严格遵循公司规定、用标准化的流程去处理技术性的咨询。或者,在医疗或法律领域,准确表达特定领域的专业术语也很重要。在这些情况下,SFT 就能将模型性能对齐到领域内专家的水平,使得模型的回答符合专业标准。 -## The Fine-Tuning Process +## 微调的过程 -The supervised fine-tuning process involves adjusting a model's weights on a task-specific dataset. +SFT 主要就是在特定任务的数据集上训练模型。 -First, you'll need to prepare or select a dataset that represents your target task. This dataset should include diverse examples that cover the range of scenarios your model will encounter. The quality of this data is important - each example should demonstrate the kind of output you want your model to produce. Next comes the actual fine-tuning phase, where you'll use frameworks like Hugging Face's `transformers` and `trl` to train the model on your dataset. +首先,你需要准备一个能反映任务类型的数据集,这个数据集需要涵盖尽可能广泛的领域内问答场景。数据的质量非常重要,每条数据都应该向模型展示你希望得到的回答类型。接下来就是实际微调阶段了,你可以使用 HuggingFace 的 `transformers` 和 `trl` 在数据集上训练模型。 -Throughout the process, continuous evaluation is essential. You'll want to monitor the model's performance on a validation set to ensure it's learning the desired behaviors without losing its general capabilities. In [module 4](../4_evaluation), we'll cover how to evaluate your model. +整个过程中,不断地对模型进行评测也是很重要的。你可以找一个验证集,然后实时监控模型性能,来确保模型学到了特定领域内的回答,同时又不丧失它原有的通用能力。在[第四章](../4_evaluation),我们将讲解如何评测模型。 -## The Role of SFT in Preference Alignment +## 通过 SFT 对齐特定偏好 -SFT plays a fundamental role in aligning language models with human preferences. Techniques such as Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) rely on SFT to form a base level of task understanding before further aligning the model’s responses with desired outcomes. Pre-trained models, despite their general language proficiency, may not always generate outputs that match human preferences. SFT bridges this gap by introducing domain-specific data and guidance, which improves the model’s ability to generate responses that align more closely with human expectations. +此外,SFT 也广泛用于将语言模型对齐到特定的人类偏好上。如 RLHF 和 DPO 等技术依靠 SFT 来形成对任务的基本理解,然后再进一步对模型的响应进行优化,以达到预期效果。预训练模型,尽管在通用的语言能力上效果很强,但它的回答可能不符合人类的偏好。SFT 通过引入专业领域的数据进行指导,可以改善模型生成与人类期望的匹配程度。 -## Supervised Fine-Tuning With Transformer Reinforcement Learning +## 使用 Transformer Reinforcement Learning 进行 SFT -A key software package for Supervised Fine-Tuning is Transformer Reinforcement Learning (TRL). TRL is a toolkit used to train transformer language models using reinforcement learning (RL). +TRL(Transformer Reinforcement Learning)是常用于 SFT 的一个重要的软件包。如果你使用强化学习训练 transformer 系列的语言模型,TRL 将为你提供有用工具。 -Built on top of the Hugging Face Transformers library, TRL allows users to directly load pretrained language models and supports most decoder and encoder-decoder architectures. The library facilitates major processes of RL used in language modelling, including supervised fine-tuning (SFT), reward modeling (RM), proximal policy optimization (PPO), and Direct Preference Optimization (DPO). We will use TRL in a number of modules throughout this repo. 
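As a quick, concrete illustration of that workflow, a minimal SFT run with TRL might look like the sketch below. It reuses the `HuggingFaceTB/SmolLM2-135M-Instruct` model and `HuggingFaceTB/smoltalk` dataset that appear elsewhere in this module; the hyperparameters are placeholder values for a smoke test, not recommended settings.

```python
# Minimal TRL SFT sketch (illustrative values, not tuned settings).
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Conversational dataset used in this module's notebooks.
dataset = load_dataset("HuggingFaceTB/smoltalk", "everyday-conversations", split="train")

training_args = SFTConfig(
    output_dir="./sft_output",      # where checkpoints and logs are written
    max_steps=100,                  # keep the run short for a quick test
    per_device_train_batch_size=4,  # adjust to the available GPU memory
)

trainer = SFTTrainer(
    model="HuggingFaceTB/SmolLM2-135M-Instruct",  # instruct variant already ships a chat template
    args=training_args,
    train_dataset=dataset,  # TRL formats the `messages` column with the model's chat template
)
trainer.train()
```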
+TRL 基于 HuggingFace 的 `transformers` 构建,允许用户直接载入预训练模型,并支持大部分 decoder 或 encoder-decoder 的架构。这个库涵盖了针对语言模型的主流强化学习算法,包括 SFT、奖励模型、PPO、DPO 等。在本课程中,我们将大量使用 TRL。 -# Next Steps +# 接下来的学习 -Try out the following tutorials to get hands on experience with SFT using TRL: +请通过下列 notebook 上手 SFT: -⏭️ [Chat Templates Tutorial](./notebooks/chat_templates_example.ipynb) +⏭️ [聊天模板教程](./notebooks/chat_templates_example_cn.ipynb) -⏭️ [Supervised Fine-Tuning Tutorial](./notebooks/sft_finetuning_example.ipynb) +⏭️ [有监督微调教程](./notebooks/supervised_fine_tuning_tutorial_cn.ipynb) \ No newline at end of file diff --git a/2_preference_alignment/README.md b/2_preference_alignment/README.md index f85b02de..48232423 100644 --- a/2_preference_alignment/README.md +++ b/2_preference_alignment/README.md @@ -1,41 +1,40 @@ -# Preference Alignment +# 偏好对齐 -This module covers techniques for aligning language models with human preferences. While supervised fine-tuning helps models learn tasks, preference alignment encourages outputs to match human expectations and values. +本章将学习如何将语言模型的输出和人类偏好对齐。虽然有监督微调(SFT)已经将模型适配到特定的任务领域了,但偏好对齐(Prefenrece Alignment)将会迫使模型的输出更加匹配人类的期望、符合人类的价值观。 -## Overview +## 概览 -Typical alignment methods involve multiple stages: -1. Supervised Fine-Tuning (SFT) to adapt models to specific domains -2. Preference alignment (like RLHF or DPO) to improve response quality +典型的偏好对齐方法一般都包含这几个步骤: +1. 使用 SFT 将模型适配到特定的领域 +2. 使用偏好对齐(如 RLHF 或 DPO 算法)进一步提升模型回答的质量 -Alternative approaches like ORPO combine instruction tuning and preference alignment into a single process. Here, we will focus on DPO and ORPO algorithms. +其它偏好对齐算法还包括 ORPO,这个算法将指令微调和偏好对齐结合进了一个单一步骤中。本章我们将重点学习 DPO 和 ORPO 算法。 -If you would like to learn more about the different alignment techniques, you can read more about them in the [Argilla Blog](https://argilla.io/blog/mantisnlp-rlhf-part-8). +如果你还想进一步学习相关对齐算法,你可以阅读[这篇博客](https://argilla.io/blog/mantisnlp-rlhf-part-8)。 -### 1️⃣ Direct Preference Optimization (DPO) +### 1️⃣ 直接偏好优化(DPO) -Direct Preference Optimization (DPO) simplifies preference alignment by directly optimizing models using preference data. This approach eliminates the need for separate reward models and complex reinforcement learning, making it more stable and efficient than traditional Reinforcement Learning from Human Feedback (RLHF). For more details, you can refer to the [Direct Preference Optimization (DPO) documentation](./dpo.md). +直接偏好优化(Direct Preference Optimization),简称 DPO,直接使用偏好数据对模型进行参数更新。这简化了偏好对齐的过程。这个方法无需额外设置激励模型、无需复杂强化学习步骤,比基于人类反馈的强化学习(RLHF)更高效更稳定。本章中对应的学习资料在[这里](./dpo.md)。 +### 2️⃣ 基于优势比的偏好优化(ORPO) -### 2️⃣ Odds Ratio Preference Optimization (ORPO) +基于优势比的偏好优化(Odds Ratio Preference Optimization),简称 ORPO,是一种将指令微调和偏好对齐结合在一起的方法。通过在 token 层面定义一个优势比(Odds),并在优势比上使用负对数似然损失函数,ORPO 改变了传统的语言建模的目标函数。ORPO 训练步骤简单、无需 DPO 中的参考模型,计算效率也更高。该方法在多项评测基准上展现了优秀的效果,尤其在 AlpacaEval 超越了传统方法。本章中对应的学习资料在[这里](./orpo.md)。 -ORPO introduces a combined approach to instruction tuning and preference alignment in a single process. It modifies the standard language modeling objective by combining negative log-likelihood loss with an odds ratio term on a token level. The approach features a unified single-stage training process, reference model-free architecture, and improved computational efficiency. ORPO has shown impressive results across various benchmarks, demonstrating better performance on AlpacaEval compared to traditional methods. For more details, you can refer to the [Odds Ratio Preference Optimization (ORPO) documentation](./orpo.md). 
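For reference, the two objectives summarised above can be written compactly as follows (notation simplified from the DPO and ORPO papers: $y_w$ and $y_l$ are the preferred and rejected responses, $\pi_\theta$ the model being trained, $\pi_{\text{ref}}$ the frozen reference model, and $\sigma$ the sigmoid function):

$$
\mathcal{L}_{\text{DPO}} = -\mathbb{E}_{(x,\,y_w,\,y_l)}\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]
$$

$$
\mathcal{L}_{\text{ORPO}} = \mathcal{L}_{\text{SFT}} + \lambda \cdot \mathbb{E}\left[-\log \sigma\!\left(\log \frac{\text{odds}_\theta(y_w \mid x)}{\text{odds}_\theta(y_l \mid x)}\right)\right], \qquad \text{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}
$$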
+## 实践练习 -## Exercise Notebooks - -| Title | Description | Exercise | Link | Colab | +| 标题 | 简介 | 习题 | 链接 | Colab | |-------|-------------|----------|------|-------| -| DPO Training | Learn how to train models using Direct Preference Optimization | 🐢 Train a model using the Anthropic HH-RLHF dataset
🐕 Use your own preference dataset
🦁 Experiment with different preference datasets and model sizes | [Notebook](./notebooks/dpo_finetuning_example.ipynb) | Open In Colab | -| ORPO Training | Learn how to train models using Odds Ratio Preference Optimization | 🐢 Train a model using instruction and preference data
🐕 Experiment with different loss weightings
🦁 Compare ORPO results with DPO | [Notebook](./notebooks/orpo_finetuning_example.ipynb) | Open In Colab | +| DPO 训练 | 学习用 DPO 训练模型 | 🐢 在 Anthropic HH-RLHF 数据集上训练模型
🐕 使用你自己的偏好数据集
🦁 对不同的偏好数据集和不同大小的模型进行实验 | [Notebook](./notebooks/dpo_finetuning_example.ipynb) | Open In Colab | +| ORPO 训练 | 学习用 ORPO 训练模型 | 🐢 用指令数据和偏好数据训练模型
🐕 对不同的损失权重进行实验
🦁 对比 ORPO 和 DPO 的结果 | [Notebook](./notebooks/orpo_finetuning_example.ipynb) | Open In Colab | -## Resources +## 参考资料 -- [TRL Documentation](https://huggingface.co/docs/trl/index) - Documentation for the Transformers Reinforcement Learning (TRL) library, which implements various alignment techniques including DPO. -- [DPO Paper](https://arxiv.org/abs/2305.18290) - Original research paper introducing Direct Preference Optimization as a simpler alternative to RLHF that directly optimizes language models using preference data. -- [ORPO Paper](https://arxiv.org/abs/2403.07691) - Introduces Odds Ratio Preference Optimization, a novel approach that combines instruction tuning and preference alignment in a single training stage. -- [Argilla RLHF Guide](https://argilla.io/blog/mantisnlp-rlhf-part-8/) - A guide explaining different alignment techniques including RLHF, DPO, and their practical implementations. -- [Blog post on DPO](https://huggingface.co/blog/dpo-trl) - Practical guide on implementing DPO using the TRL library with code examples and best practices. -- [TRL example script on DPO](https://github.com/huggingface/trl/blob/main/examples/scripts/dpo.py) - Complete example script demonstrating how to implement DPO training using the TRL library. -- [TRL example script on ORPO](https://github.com/huggingface/trl/blob/main/examples/scripts/orpo.py) - Reference implementation of ORPO training using the TRL library with detailed configuration options. -- [Hugging Face Alignment Handbook](https://github.com/huggingface/alignment-handbook) - Resource guides and codebase for aligning language models using various techniques including SFT, DPO, and RLHF. +- [TRL 官方文档](https://huggingface.co/docs/trl/index) - TRL 是一个基于 Transformers 的强化学习库,这里实现了包括 DPO 在内的各种对齐算法。 +- [DPO 论文](https://arxiv.org/abs/2305.18290) - 该论文针对当时已有的 RLHF 方法,提出了新的对齐方法,可以直接使用偏好数据优化模型参数。 +- [ORPO 论文](https://arxiv.org/abs/2403.07691) - ORPO 算法将指令微调和偏好优化和并进一个训练步骤中。 +- [RLHF 相关博客](https://argilla.io/blog/mantisnlp-rlhf-part-8/) - 这篇博客介绍了包括 RLHF、DPO 在内的对齐算法,同时也介绍了具体实现方法。 +- [DPO 相关博客](https://huggingface.co/blog/dpo-trl) - 介绍了使用 TRL 实现 DPO 的具体步骤,包括示例代码和其它最佳实践经验。 +- [DPO 示例训练脚本](https://github.com/huggingface/trl/blob/main/examples/scripts/dpo.py) - 完整的基于 TRL 的 DPO 训练代码。 +- [ORPO 示例训练脚本](https://github.com/huggingface/trl/blob/main/examples/scripts/orpo.py) - 完整的基于 TRL 的 ORPO 训练代码。 +- [Hugging Face 关于对齐训练的资料](https://github.com/huggingface/alignment-handbook) - 包括 SFT、DPO、RLHF 的语言模型对齐算法介绍,包括理论指导和实践代码。 \ No newline at end of file diff --git a/2_preference_alignment/dpo.md b/2_preference_alignment/dpo.md index a2207489..ea3b23d8 100644 --- a/2_preference_alignment/dpo.md +++ b/2_preference_alignment/dpo.md @@ -1,24 +1,24 @@ -# Direct Preference Optimization (DPO) +# 直接偏好优化(DPO) -Direct Preference Optimization (DPO) offers a simplified approach to aligning language models with human preferences. Unlike traditional RLHF methods that require separate reward models and complex reinforcement learning, DPO directly optimizes the model using preference data. +直接偏好优化(Direct Preference Optimization),简称 DPO,是一种非常简洁的使用人类偏好数据对齐模型的算法。DPO 直接使用偏好数据优化模型,无需 RLHF 的激励模型和强化学习步骤。 -## Understanding DPO +## 理解 DPO -DPO recasts preference alignment as a classification problem on human preference data. Traditional RLHF approaches require training a separate reward model and using complex reinforcement learning algorithms like PPO to align model outputs. 
DPO simplifies this process by defining a loss function that directly optimizes the model's policy based on preferred vs non-preferred outputs. +DPO 将偏好对齐任务转化为了一个在偏好数据上训练的分类任务。传统的 RLHF 需要训练一个额外的激励模型,并使用强化学习方法(如 PPO)去对齐模型输出。DPO简化了这个步骤,通过定义一个损失函数,直接在“倾向的输出”和“不倾向的输出”上进行训练。 -This approach has proven highly effective in practice, being used to train models like Llama. By eliminating the need for a separate reward model and reinforcement learning stage, DPO makes preference alignment more accessible and stable. +这个方法在实践中十分高效,Llama 模型的训练就使用了 DPO。同时,没有了激励模型了强化学习,DPO 训练也更简单、更稳定。 -## How DPO Works +## DPO 工作原理 -The DPO process requires supervised fine-tuning (SFT) to adapt the model to the target domain. This creates a foundation for preference learning by training on standard instruction-following datasets. The model learns basic task completion while maintaining its general capabilities. +在 DPO 之前,我们需要使用 SFT 微调模型,用指令跟随的数据集先把模型适配到特定任务领域中,让模型在这个领域具备基本能力。 -Next comes preference learning, where the model is trained on pairs of outputs - one preferred and one non-preferred. The preference pairs help the model understand which responses better align with human values and expectations. +接下来才是偏好学习。模型将在“倾向的输出”和“不倾向的输出”这样成对的数据上训练,学习哪种类型的回答更符合人类的喜好。 -The core innovation of DPO lies in its direct optimization approach. Rather than training a separate reward model, DPO uses a binary cross-entropy loss to directly update the model weights based on preference data. This streamlined process makes training more stable and efficient while achieving comparable or better results than traditional RLHF. +DPO 的关键原理在于它直接使用偏好数据进行优化。不同于 RLHF,DPO 使用了二分类的交叉墒损失函数,这里的损失直接在“倾向的输出”和“不倾向的输出”这样的成对数据上计算。这使得模型训练更稳定、更高效,同时效果甚至还比 RLHF 好。 -## DPO datasets +## DPO 数据集 -Datasets for DPO are typically created by annotating pairs of responses as preferred or non-preferred. This can be done manually or using automated filtering techniques. Below is an example structure of single turn preference dataset for DPO: +构造 DPO 专用数据集,一般需要对回答进行“倾向”和“不倾向”的标注。使用人工标注或自动化方法都可以实现这一步骤。下表就是一个示例数据集: | Prompt | Chosen | Rejected | |--------|--------|----------| @@ -26,14 +26,14 @@ Datasets for DPO are typically created by annotating pairs of responses as prefe | ... | ... | ... | | ... | ... | ... | -The `Prompt` column contains the prompt used to generate the `Chosen` and `Rejected` responses. The `Chosen` and `Rejected` columns contain the responses that are preferred and non-preferred respectively. There are variations on this structure, for example, including a system prompt column or `Input` column containing reference material. The values of `chosen` and `rejected` can be be represented as strings for single turn conversations or as conversation lists. +`Prompt` 这一栏提供问题, `Chosen` 和 `Rejected` 分别代表针对这个问题我们倾向的回答和不倾向的回答。`chosen` 和 `rejected` 也可以是一个列表形式,包含多个不同的回答。 -You can find a collection of DPO datasets on Hugging Face [here](https://huggingface.co/collections/argilla/preference-datasets-for-dpo-656f0ce6a00ad2dc33069478). +你可以在 Hugging Face 的[这个地方](https://huggingface.co/collections/argilla/preference-datasets-for-dpo-656f0ce6a00ad2dc33069478)找到很多 DPO 数据集。 -## Implementation with TRL +## 用 TRL 实现 DPO -The Transformers Reinforcement Learning (TRL) library makes implementing DPO straightforward. The `DPOConfig` and `DPOTrainer` classes follow the same `transformers` style API. 
-Here's a basic example of setting up DPO training: +使用 TRL 实现 DPO 非常简单直接,仅需配置 `DPOConfig` 和 `DPOTrainer` 即可。这两个类遵循 `transformers` 的 API 风格。 +下面就是一个简单的例子: ```python from trl import DPOConfig, DPOTrainer @@ -55,18 +55,18 @@ trainer = DPOTrainer( trainer.train() ``` -We will cover more details on how to use the `DPOConfig` and `DPOTrainer` classes in the [DPO Tutorial](./notebooks/dpo_finetuning_example.ipynb). +我们还将在 [DPO 教程](./notebooks/dpo_finetuning_example.ipynb) 中详细讲解 `DPOConfig` 和 `DPOTrainer` 的配置。 -## Best Practices +## 最佳实践 -Data quality is crucial for successful DPO implementation. The preference dataset should include diverse examples covering different aspects of desired behavior. Clear annotation guidelines ensure consistent labeling of preferred and non-preferred responses. You can improve model performance by improving the quality of your preference dataset. For example, by filtering down larger datasets to include only high quality examples, or examples that relate to your use case. +数据质量对 DPO 的成败至关重要。偏好数据集必须足够多样,涵盖不同的想要的回答。在数据标注过程中,需要制定清晰明确的标注指导。通过提高数据集质量一般都可以提升模型性能,可能的做法包括对大规模数据集进行过滤,仅保留高质量数据,或仅保留和应用领域相关的数据。 -During training, carefully monitor the loss convergence and validate performance on held-out data. The beta parameter may need adjustment to balance preference learning with maintaining the model's general capabilities. Regular evaluation on diverse prompts helps ensure the model is learning the intended preferences without overfitting. +训练过程中,仔细监视损失的收敛情况、及时验证性能也很重要。及时调节 $\beta$ 参数,在偏好学习和通用能力间找到平衡。有规律地在多样的问题上做验证测试,确保模型不过你和。这些也都很重要。 -Compare the model's outputs with the reference model to verify improvement in preference alignment. Testing on a variety of prompts, including edge cases, helps ensure robust preference learning across different scenarios. +同时,也要对比一下原模型和优化后模型针对同一问题的回答,看看 模型是否学到了偏好。在包括极端情况下的问题集上测试,确保模型健壮性。 -## Next Steps +## 接下来的学习 -⏩ To get hands-on experience with DPO, try the [DPO Tutorial](./notebooks/dpo_finetuning_example.ipynb). This practical guide will walk you through implementing preference alignment with your own model, from data preparation to training and evaluation. +⏩ 在 [DPO 教程](./notebooks/dpo_finetuning_example.ipynb)中,你可以直接上手实践。该教程将会带你实践 DPO 的整个过程,从数据准备指导模型训练和验证。 -⏭️ After completing the tutorial, you can explore the [ORPO](./orpo.md) page to learn about another preference alignment technique. \ No newline at end of file +⏭️ 之后,你还可以学习 [ORPO](./orpo.md),了解更多偏好优化算法。 \ No newline at end of file diff --git a/2_preference_alignment/notebooks/dpo_finetuning_example.ipynb b/2_preference_alignment/notebooks/dpo_finetuning_example.ipynb index 3ddef6a6..e3d9869b 100644 --- a/2_preference_alignment/notebooks/dpo_finetuning_example.ipynb +++ b/2_preference_alignment/notebooks/dpo_finetuning_example.ipynb @@ -4,18 +4,17 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Preference Alignment with Direct Preference Optimization (DPO)\n", + "# 使用直接偏好优化(DPO)进行偏好对齐\n", "\n", - "This notebook will guide you through the process of fine-tuning a language model using Direct Preference Optimization (DPO). We will use the SmolLM2-135M-Instruct model which has already been through a SFT training, so it it compatible with DPO. 
You can also use the model you trained in [1_instruction_tuning](../../1_instruction_tuning/notebooks/sft_finetuning_example.ipynb).\n", + "本教程将会带你使用直接偏好优化(DPO)的方法去微调一个语言模型。这里我们使用的是 SmolLM2-135M-Instruct 模型,它已经经过了 SFT 这一训练步骤,所以可以接着进行 DPO。当然你也可以使用你在[第一章](../../1_instruction_tuning/notebooks/sft_finetuning_example_cn.ipynb)训练好的模型。\n", "\n", "
\n",
-    "    <h2>Exercise: Aligning SmolLM2 with DPOTrainer</h2>\n",
-    "    <p>Take a dataset from the Hugging Face hub and align a model on it.</p>\n",
-    "    <p><b>Difficulty Levels</b></p>\n",
-    "    <p>🐢 Use the `trl-lib/ultrafeedback_binarized` dataset</p>\n",
-    "    <p>🐕 Try out the `argilla/ultrafeedback-binarized-preferences` dataset</p>\n",
-    "    <p>🦁 Select a dataset that relates to a real-world use case you’re interested in, or use the model you trained in \n",
-    "    1_instruction_tuning</p>\n",
+    "    <h2>练习:用 DPOTrainer 对 SmolLM2 做对齐训练</h2>\n",
+    "    <p>从 Hugging Face hub 上找一个数据集并在上面对齐模型。</p>\n",
+    "    <p><b>难度等级</b></p>\n",
+    "    <p>🐢 使用 `trl-lib/ultrafeedback_binarized` 数据集</p>\n",
+    "    <p>🐕 使用 `argilla/ultrafeedback-binarized-preferences` 数据集</p>\n",
+    "    <p>🦁 选用你感兴趣的真实世界数据集,或者使用你在上一章 SFT 训练好的模型</p>\n",
     "</div>
" ] }, @@ -41,7 +40,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Import libraries\n" + "## 准备好需要的 Python 库\n" ] }, { @@ -60,7 +59,7 @@ "import os\n", "from transformers import AutoModelForCausalLM, AutoTokenizer\n", "from datasets import load_dataset\n", - "from trl import DPOTrainer, DPOConfig" + "from trl import DPOTrainer, DPOConfig\n" ] }, { @@ -69,7 +68,7 @@ "id": "d8CvUgROUDw-" }, "source": [ - "## Format dataset" + "## 配置数据集及其格式" ] }, { @@ -82,7 +81,7 @@ "source": [ "# Load dataset\n", "\n", - "# TODO: 🦁🐕 change the dataset to one of your choosing\n", + "# TODO: 🦁🐕 你也可以使用你感兴趣的其他数据集\n", "dataset = load_dataset(path=\"trl-lib/ultrafeedback_binarized\", split=\"train\")" ] }, @@ -92,20 +91,20 @@ "metadata": {}, "outputs": [], "source": [ - "# TODO: 🐕 If your dataset is not represented as conversation lists, you can use the `process_dataset` function to convert it." + "# TODO: 🐕 如果你的数据集不是对话列表形式,你可以使用 `process_dataset` 函数进行转化" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Select the model\n", + "## 选择一个经 SFT 训练过模型\n", "\n", - "We will use the SmolLM2-135M-Instruct model which has already been through a SFT training, so it it compatible with DPO. You can also use the model you trained in [1_instruction_tuning](../../1_instruction_tuning/notebooks/sft_finetuning_example.ipynb).\n", + "我们这里使用 SmolLM2-135M-Instruct 模型,因为它已经经过 SFT 了,所以可以接着进行 DPO 训练。你也可以使用你[第一章](../../1_instruction_tuning/notebooks/sft_finetuning_example_cn.ipynb) 训练好的模型。\n", "\n", "\n", "
\n",
-    "    <p>🦁 change the model to the path or repo id of the model you trained in 1_instruction_tuning</p>\n",
+    "    <p>🦁 如果你想使用第一章你自己训练的模型,你需要修改下面代码的 model_name 到指定的路径</p>\n",
     "</div>
\n" ] }, @@ -115,14 +114,14 @@ "metadata": {}, "outputs": [], "source": [ - "# TODO: 🦁 change the model to the path or repo id of the model you trained in [1_instruction_tuning](../../1_instruction_tuning/notebooks/sft_finetuning_example.ipynb)\n", - "\n", "model_name = \"HuggingFaceTB/SmolLM2-135M-Instruct\"\n", "\n", "device = (\n", " \"cuda\"\n", " if torch.cuda.is_available()\n", - " else \"mps\" if torch.backends.mps.is_available() else \"cpu\"\n", + " else \"mps\"\n", + " if torch.backends.mps.is_available()\n", + " else \"cpu\"\n", ")\n", "\n", "# Model to fine-tune\n", @@ -145,7 +144,7 @@ "id": "DeT5eUK_UJgK" }, "source": [ - "## Train model with DPO" + "## 开始 DPO 训练" ] }, { @@ -242,45 +241,38 @@ "source": [ "# Training arguments\n", "training_args = DPOConfig(\n", - " # Training batch size per GPU\n", + " # 单 GPU 上 batch size 的大小\n", " per_device_train_batch_size=4,\n", - " # Number of updates steps to accumulate before performing a backward/update pass\n", - " # Effective batch size = per_device_train_batch_size * gradient_accumulation_steps\n", + " # 每经过多少步才进行一次反向传播、参数更新\n", + " # 每个 GPU 实际起作用的 batch size 等于 per_device_train_batch_size 乘 gradient_accumulation_steps\n", " gradient_accumulation_steps=4,\n", - " # Saves memory by not storing activations during forward pass\n", - " # Instead recomputes them during backward pass\n", + " # 这个操作在前向传播时不保存中间结果激活值,可以节省内存\n", + " # 在反向传播时,模型重新计算前向过程,获取计算梯度所需要的激活值\n", " gradient_checkpointing=True,\n", - " # Base learning rate for training\n", + " # 训练时的基本学习率\n", " learning_rate=5e-5,\n", - " # Learning rate schedule - 'cosine' gradually decreases LR following cosine curve\n", + " # 训练过程中学习率变化策略:\"cosine\" 策略根据余弦函数的曲线形状逐步递减学习率\n", " lr_scheduler_type=\"cosine\",\n", - " # Total number of training steps\n", + " # 总的训练部数\n", " max_steps=200,\n", - " # Disables model checkpointing during training\n", + " # 在模型训练过程中不去保存 ckeckpoint\n", " save_strategy=\"no\",\n", - " # How often to log training metrics\n", + " # 打印 log 的频率\n", " logging_steps=1,\n", - " # Directory to save model outputs\n", + " # 保存结果的目录\n", " output_dir=\"smol_dpo_output\",\n", - " # Number of steps for learning rate warmup\n", + " # 学习率 warmup 的步数\n", " warmup_steps=100,\n", - " # Use bfloat16 precision for faster training\n", + " # 是否使用 bfloat16 精度来加速训练\n", " bf16=True,\n", - " # Disable wandb/tensorboard logging\n", - " report_to=\"none\",\n", - " # Keep all columns in dataset even if not used\n", + " # 这里我们不使用 wandb 或 tensorboard \n", + " report_to=None,\n", + " # 这里我们保留数据集里没用到的数据列\n", " remove_unused_columns=False,\n", - " # Enable MPS (Metal Performance Shaders) for Mac devices\n", + " # 如果你不使用 Mac 电脑,这里就是 False\n", " use_mps_device=device == \"mps\",\n", - " # Model ID for HuggingFace Hub uploads\n", + " # 如果你要将训练好的模型上传到 HuggingFace Hub,这里就是你的模型名字 \n", " hub_model_id=finetune_name,\n", - " # DPO-specific temperature parameter that controls the strength of the preference model\n", - " # Lower values (like 0.1) make the model more conservative in following preferences\n", - " beta=0.1,\n", - " # Maximum length of the input prompt in tokens\n", - " max_prompt_length=1024,\n", - " # Maximum combined length of prompt + response in tokens\n", - " max_length=1536,\n", ")" ] }, @@ -299,13 +291,13 @@ " train_dataset=dataset,\n", " # Tokenizer for processing inputs\n", " processing_class=tokenizer,\n", - " # DPO-specific temperature parameter that controls the strength of the preference model\n", - " # Lower values (like 0.1) make the model more conservative in following 
preferences\n", - " # beta=0.1,\n", - " # Maximum length of the input prompt in tokens\n", - " # max_prompt_length=1024,\n", - " # Maximum combined length of prompt + response in tokens\n", - " # max_length=1536,\n", + " # DPO 特有的 temperature 参数,控制参考模型的强度\n", + " # 较小的值可以让模型更保守、更缓慢地学习偏好\n", + " beta=0.1,\n", + " # 提示语的最大长度\n", + " max_prompt_length=1024,\n", + " # 提示语 + 回答 的序列最大长度\n", + " max_length=1536,\n", ")" ] }, @@ -330,13 +322,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 💐 You're done!\n", + "## 💐 完成\n", "\n", - "This notebook provided a step-by-step guide to fine-tuning the `HuggingFaceTB/SmolLM2-135M` model using the `DPOTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out:\n", + "本教程带你一步步地学习了用 `DPOTrainer` 训练 `HuggingFaceTB/SmolLM2-135M`的过程。利用这些步骤,你可以高效快速地进行特定任务领域的偏好优化训练。感兴趣的话,你还可以:\n", "\n", - "- Try this notebook on a harder difficulty\n", - "- Review a colleagues PR\n", - "- Improve the course material via an Issue or PR." + "- 尝试本教程难度等级更高的任务\n", + "- 在 GitHub 上评审别人的 Pull Request\n", + "- 通过 Issue 或 PR 提出课程改进的意见" ] } ], diff --git a/2_preference_alignment/notebooks/orpo_finetuning_example.ipynb b/2_preference_alignment/notebooks/orpo_finetuning_example.ipynb index a1ec3a18..39c1bd9a 100644 --- a/2_preference_alignment/notebooks/orpo_finetuning_example.ipynb +++ b/2_preference_alignment/notebooks/orpo_finetuning_example.ipynb @@ -4,17 +4,17 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Preference Alignment with Odds Ratio Preference Optimization (ORPO)\n", + "# 使用基于优势比的偏好优化(ORPO)进行偏好对齐\n", "\n", - "This notebook will guide you through the process of fine-tuning a language model using Odds Ratio Preference Optimization (ORPO). We will use the SmolLM2-135M model which has **not** been through SFT training, so it is not compatible with DPO. This means, you cannot use the model you trained in [1_instruction_tuning](../../1_instruction_tuning/notebooks/sft_finetuning_example.ipynb).\n", + "本教程将会带你使用基于优势比的偏好优化(ORPO)算法去微调一个语言模型。这里我们使用 SmolLM2-135M 模型,因为它还**没有**经过 SFT 的训练(所以这个模型是不能直接进行 DPO 训练的)。这里你就不能使用你在[第一章](../../1_instruction_tuning/notebooks/sft_finetuning_example_cn.ipynb)训练好的模型了。\n", "\n", "
\n",
-    "    <h2>Exercise: Aligning SmolLM2 with ORPOTrainer</h2>\n",
-    "    <p>Take a dataset from the Hugging Face hub and align a model on it.</p>\n",
-    "    <p><b>Difficulty Levels</b></p>\n",
-    "    <p>🐢 Use the `trl-lib/ultrafeedback_binarized` dataset</p>\n",
-    "    <p>🐕 Try out the `argilla/ultrafeedback-binarized-preferences` dataset</p>\n",
-    "    <p>🦁 Try on a subset of mlabonne's `orpo-dpo-mix-40k` dataset</p>\n",
+    "    <h2>练习:借助 ORPOTrainer 对齐 SmolLM2 模型</h2>\n",
+    "    <p>从 Hugging Face hub 上找一个数据集并在上面对齐模型。</p>\n",
+    "    <p><b>难度等级</b></p>\n",
+    "    <p>🐢 使用 `trl-lib/ultrafeedback_binarized` 数据集</p>\n",
+    "    <p>🐕 尝试使用 `argilla/ultrafeedback-binarized-preferences` 数据集</p>\n",
+    "    <p>🦁 尝试使用 `mlabonne/orpo-dpo-mix-40k` 数据集的一个子集进行训练</p>\n",
     "</div>
\n", "\n" ] @@ -23,7 +23,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Import libraries\n" + "## 准备好需要的 Python 库\n" ] }, { @@ -66,7 +66,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Format dataset" + "## 配置数据集及其格式" ] }, { @@ -86,7 +86,7 @@ "source": [ "# Load dataset\n", "\n", - "# TODO: 🦁🐕 change the dataset to one of your choosing\n", + "# TODO: 🦁🐕 你也可以使用你感兴趣的其他数据集\n", "dataset = load_dataset(path=\"trl-lib/ultrafeedback_binarized\")" ] }, @@ -96,14 +96,14 @@ "metadata": {}, "outputs": [], "source": [ - "# TODO: 🐕 If your dataset is not represented as conversation lists, you can use the `process_dataset` function to convert it." + "# TODO: 🐕 如果你的数据集不是对话列表形式,你可以使用 `process_dataset` 函数进行转化" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "## Define the model" + "## 定义好你要使用的模型" ] }, { @@ -279,7 +279,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Train model with ORPO" + "## 开始 ORPO 训练" ] }, { @@ -320,38 +320,38 @@ "outputs": [], "source": [ "orpo_args = ORPOConfig(\n", - " # Small learning rate to prevent catastrophic forgetting\n", + " # 使用较小的学习率可以预防“灾难性遗忘”\n", " learning_rate=8e-6,\n", - " # Linear learning rate decay over training\n", + " # 学习率衰减策略\n", " lr_scheduler_type=\"linear\",\n", - " # Maximum combined length of prompt + completion\n", + " # 提示词 + 回答 的最大长度(最大 token 数量)\n", " max_length=1024,\n", - " # Maximum length for input prompts\n", + " # 输入的提示词的最大长度\n", " max_prompt_length=512,\n", - " # Controls weight of the odds ratio loss (λ in paper)\n", + " # 优势比损失函数的权重(论文中的 λ 超参数)\n", " beta=0.1,\n", - " # Batch size for training\n", + " # 单个 GPU 的 batch size\n", " per_device_train_batch_size=2,\n", " per_device_eval_batch_size=2,\n", - " # Helps with training stability by accumulating gradients before updating\n", + " # 每多少步前向过程才更新一次参数,调节这里可以稳定训练\n", " gradient_accumulation_steps=4,\n", - " # Memory-efficient optimizer for CUDA, falls back to adamw_torch for CPU/MPS\n", + " # 使用 paged_adamw_8bit 优化器可以为 CUDA 节省内存,如果是 CPU 或 MPS 设备,则退回到使用 adamw_torch 优化器\n", " optim=\"paged_adamw_8bit\" if device == \"cuda\" else \"adamw_torch\",\n", - " # Number of training epochs\n", + " # 训练的 epoch 数量\n", " num_train_epochs=1,\n", - " # When to run evaluation\n", + " # 训练过程中验证的策略\n", " evaluation_strategy=\"steps\",\n", - " # Evaluate every 20% of training\n", + " # 每完成 20% 训练就验证一次\n", " eval_steps=0.2,\n", - " # Log metrics every step\n", + " # log 打印频率\n", " logging_steps=1,\n", - " # Gradual learning rate warmup\n", + " # 学习率 warmup 的步数\n", " warmup_steps=10,\n", - " # Disable external logging\n", - " report_to=\"none\",\n", - " # Where to save model/checkpoints\n", + " # 这里我们不使用 wandb 或 tensorboard,不使用外部打印 log 的功能\n", + " report_to=None,\n", + " # 保存训练结果的目录\n", " output_dir=\"./results/\",\n", - " # Enable MPS (Metal Performance Shaders) if available\n", + " # 如果是苹果电脑,将会使用 MPS 硬件加速\n", " use_mps_device=device == \"mps\",\n", " hub_model_id=finetune_name,\n", ")" @@ -392,13 +392,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 💐 You're done!\n", + "## 💐 完成\n", "\n", - "This notebook provided a step-by-step guide to fine-tuning the `HuggingFaceTB/SmolLM2-135M` model using the `ORPOTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. 
If you want to carry on working on this course, here are steps you could try out:\n", + "本教程带你一步步地学习了用 `ORPOTrainer` 去训练 `HuggingFaceTB/SmolLM2-135M` 模型。利用这些步骤,你可以高效快速地进行特定任务领域的偏好优化训练。感兴趣的话,你还可以:\n", "\n", - "- Try this notebook on a harder difficulty\n", - "- Review a colleagues PR\n", - "- Improve the course material via an Issue or PR." + "- 尝试本教程难度等级更高的任务\n", + "- 在 GitHub 上评审别人的 Pull Request\n", + "- 通过 Issue 或 PR 提出课程改进的意见" ] } ], diff --git a/2_preference_alignment/orpo.md b/2_preference_alignment/orpo.md index 29ede4eb..a8808871 100644 --- a/2_preference_alignment/orpo.md +++ b/2_preference_alignment/orpo.md @@ -1,47 +1,46 @@ -# Odds Ratio Preference Optimization (ORPO) +# 基于优势比的偏好优化(ORPO) -ORPO (Odds Ratio Preference Optimization) is a novel fine-tuning technique that combines fine-tuning and preference alignment into a single unified process. This combined approach offers advantages in efficiency and performance compared to traditional methods like RLHF or DPO. +基于优势比的偏好优化(Odds Ratio Preference Optimization)或 ORPO,是一种更新颖的偏好对齐方法,它把微调和偏好对齐结合,组成一个统一的过程。这个算法相比于 RLHF 和 DPO 有着更高的效率和更好的性能。 -## Understanding ORPO +## 理解 ORPO -Alignment with methods like DPO typically involve two separate steps: supervised fine-tuning to adapt the model to a domain and format, followed by preference alignment to align with human preferences. While SFT effectively adapts models to target domains, it can inadvertently increase the probability of generating both desirable and undesirable responses. ORPO addresses this limitation by integrating both steps into a single process, as illustrated in the comparison below: +诸如 DPO 的对齐方法一般包含两个步骤:使用 SFT 先让模型适配如这个领域或回答格式,然后进行偏好对齐训练。虽然 SFT 已经将模型对齐到了特定任务领域,但模型不可避免可能会产生我们不期望的回答,所以我们还需要进行下一步的偏好对齐。ORPO 则合并了这两个步骤。下图取自 ORPO 论文,对比了 RLHF、DPO 和 ORPO 的差异: ![Alignment Techniques Comparison](https://argilla.io/images/blog/mantisnlp-rlhf/part-8-alignments.png) -*Comparison of different model alignment techniques* +*三种对齐算法的对比* -## How ORPO Works +## ORPO 工作原理 -The training process leverages a preference dataset similar to what we used for DPO, where each training example contains an input prompt along with two responses: one that is preferred, and another that is rejected. Unlike other alignment methods that require separate stages and reference models, ORPO integrates preference alignment directly into the supervised fine-tuning process. This monolithic approach makes it reference model-free, computationally more efficient, and memory efficient with fewer FLOPs. +ORPO 训练使用的数据集和 DPO 相似:针对一个输入的问题包含两个可能输出:一个是“倾向的输出”,另一个是“不倾向的输出”。不同的是,ORPO 直接将偏好对齐加入到 SFT 中。这一整体性方法使得它无需 DPO 中的参考模型,同时也更高效、节省内存。 -ORPO creates a new objective by combining two main components: +ORPO 的损失函数包含两个部分: -1. **SFT Loss**: The standard negative log-likelihood loss used in language modeling, which maximizes the probability of generating reference tokens. This helps maintain the model's general language capabilities. +1. **SFT Loss**:这里使用标准的负对数似然函数,和 DPO 中的类似,用于扩大想要的 token 的生成概率。这个损失函数也有助于模型保持通用的语言能力。 +2. **Odds Ratio Loss**:这是新提出的损失函数。由于上述 SFT Loss 不能惩罚不想要的输出,所以这个函数在激励倾向的输出的同时,也惩罚不倾向的输出。具体来说,这里定义了计算优势比(Odds Ratio)的公式,通过抬高倾向输出和不倾向输出两者的优势比比值,在奖励倾向输出的同时,也压低不倾向输出的生成概率。 -2. **Odds Ratio Loss**: A novel component that penalizes undesirable responses while rewarding preferred ones. This loss function uses odds ratios to effectively contrast between favored and disfavored responses at the token level. 
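To make the two components above concrete, here is a rough sketch of how the combined objective can be computed from per-sequence average log-probabilities. It follows the formulation in the ORPO paper rather than TRL's exact implementation, and the names `nll_loss`, `chosen_logps`, `rejected_logps` and `lam` are illustrative:

```python
import torch
import torch.nn.functional as F

def orpo_loss(nll_loss, chosen_logps, rejected_logps, lam=0.1):
    """Sketch of the ORPO objective: SFT (NLL) loss plus a weighted odds-ratio term."""
    # log odds(y|x) = log p - log(1 - p), computed in log space for stability,
    # where p is the average per-token probability of the response under the model.
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))

    # Odds-ratio loss: push the preferred response's odds above the rejected one's.
    odds_ratio_loss = -F.logsigmoid(log_odds_chosen - log_odds_rejected).mean()

    # Final objective: keep the standard SFT loss and add the weighted OR penalty.
    return nll_loss + lam * odds_ratio_loss
```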
+在两个损失函数共同作用下,模型不仅被适配进了相应的任务领域,也压低了不倾向的回答的生成概率。其中,优势比这个机制提供了一个很直观的方法,模拟了倾向回答和不倾向回答之间的差异程度。你也可以阅读 [ORPO 的论文](https://arxiv.org/abs/2403.07691),进一步了解其中的数学理论。如果你对具体的实现感兴趣,你可以阅读 [TRL 中关于这部分的实现](https://github.com/huggingface/trl/blob/b02189aaa538f3a95f6abb0ab46c0a971bfde57e/trl/trainer/orpo_trainer.py#L660)。 -Together, these components guide the model to adapt to desired generations for the specific domain while actively discouraging generations from the set of rejected responses. The odds ratio mechanism provides a natural way to measure and optimize the model's preference between chosen and rejected outputs. If you want to deep dive into the math, you can read the [ORPO paper](https://arxiv.org/abs/2403.07691). If you want to learn about ORPO from the implementation perspective, you should check out how loss for ORPO is calculated in the [TRL library](https://github.com/huggingface/trl/blob/b02189aaa538f3a95f6abb0ab46c0a971bfde57e/trl/trainer/orpo_trainer.py#L660). +## 训练结果 -## Performance and Results - -ORPO has demonstrated impressive results across various benchmarks. On MT-Bench, it achieves competitive scores across different categories: +ORPO 在很多测试基准上都取得了不错的效果。以下是它在 MT-Bench 测试基准上的结果(根据任务类别划分): ![MT-Bench Results](https://argilla.io/images/blog/mantisnlp-rlhf/part-8-mtbench.png) -*MT-Bench results by category for Mistral-ORPO models* +*Mistral-ORPO 模型在 MT-Bench 不同任务领域的结果* -When compared to other alignment methods, ORPO shows superior performance on AlpacaEval 2.0: +在 AlpacaEval 2.0 上,ORPO 展现了超越其它对齐算法的效果: ![AlpacaEval Results](https://argilla.io/images/blog/mantisnlp-rlhf/part-8-winrate.png) -*AlpacaEval 2.0 scores across different alignment methods* +*不同对齐算法在 AlpacaEval 2.0 的得分* -Compared to SFT+DPO, ORPO reduces computational requirements by eliminating the need for a reference model and halving the number of forward passes per batch. Also, the training process is more stable across different model sizes and datasets, requiring fewer hyperparameters to tune. Performance-wise, ORPO matches larger models while showing better alignment with human preferences. +相较于 SFT 加 DPO 的做法,ORPO 通过去除参考模型、减半前向推理的策略,大大降低了计算资源的要求。同时,训练过程也更稳定,需要调节的超参数也更少。在性能上,ORPO 对人类偏好的适配做得也更好。 -## Implementation +## 代码实现 -Successful implementation of ORPO depends heavily on high-quality preference data. The training data should follow clear annotation guidelines and provide a balanced representation of preferred and rejected responses across diverse scenarios. +成功训练 ORPO 也极度依赖高质量数据集。所以标注训练数据时,我们也需要清晰明确的标注标准,确保对话场景的多样性,同时倾向和不倾向的回答需要分布均匀。 -### Implementation with TRL +### 用 TRL 实现 ORPO -ORPO can be implemented using the Transformers Reinforcement Learning (TRL) library. Here's a basic example: +以下代码提供了用 TRL 实现 ORPO 的基本示例: ```python from trl import ORPOConfig, ORPOTrainer @@ -68,17 +67,17 @@ trainer = ORPOTrainer( trainer.train() ``` -Key parameters to consider: -- `orpo_alpha`: Controls the strength of preference optimization -- `orpo_beta`: Temperature parameter for the odds ratio calculation -- `learning_rate`: Should be relatively small to prevent catastrophic forgetting -- `gradient_accumulation_steps`: Helps with training stability +关键参数: +- `orpo_alpha`:用来控制偏好优化部分的权重 +- `orpo_beta`:计算优势比(Odds Ratio)时的 Temperature 参数 +- `learning_rate`:这里需要用较小的学习率,用来防治灾难性遗忘(catastrophic forgetting) +- `gradient_accumulation_steps`:调节这个也能稳定训练 -## Next Steps +## 接下来的学习 -⏩ Try the [ORPO Tutorial](./notebooks/orpo_finetuning_example.ipynb) to implement this unified approach to preference alignment. 
+⏩ 学习 [ORPO 教程](./notebooks/orpo_finetuning_example.ipynb) 来实践 ORPO 算法。 -## Resources -- [ORPO Paper](https://arxiv.org/abs/2403.07691) -- [TRL Documentation](https://huggingface.co/docs/trl/index) +## 学习资源 +- [ORPO 论文](https://arxiv.org/abs/2403.07691) +- [TRL 官方文档](https://huggingface.co/docs/trl/index) - [Argilla RLHF Guide](https://argilla.io/blog/mantisnlp-rlhf-part-8/) \ No newline at end of file diff --git a/3_parameter_efficient_finetuning/README.md b/3_parameter_efficient_finetuning/README.md index 7dabe6a8..15d13900 100644 --- a/3_parameter_efficient_finetuning/README.md +++ b/3_parameter_efficient_finetuning/README.md @@ -1,39 +1,39 @@ -# Parameter-Efficient Fine-Tuning (PEFT) +# 高效的参数微调(PEFT) -As language models grow larger, traditional fine-tuning becomes increasingly challenging. A full fine-tuning of even a 1.7B parameter model requires substantial GPU memory, makes storing separate model copies expensive, and risks catastrophic forgetting of the model's original capabilities. Parameter-efficient fine-tuning (PEFT) methods address these challenges by modifying only a small subset of model parameters while keeping most of the model frozen. +随着语言模型越来越大,使用传统的模型微调方法去微调 LLM 已经变得越来越难了。举例来说,微调一个 1.7B 参数量的模型需要把所有参数都放进 GPU 显存、保存模型优化和状态信息,甚至还需要保存模型备份,这就需要很大的 GPU 显存使用了。同时,微调所有参数还有很大的“灾难性遗忘”风险,可能损失模型原有的能力。针对此问题,高效的参数微调(Parameter-efficient fine-tuning 或 PEFT)被提出。它在微调模型时,保留大部分参数不变,只微调一小部分参数,大大节省了计算资源的需求。 -Traditional fine-tuning updates all model parameters during training, which becomes impractical for large models. PEFT methods introduce approaches to adapt models using fewer trainable parameters - often less than 1% of the original model size. This dramatic reduction in trainable parameters enables: +传统微调需要更新所有模型参数,对大模型很不现实。而 PEFT 相关方法则发现,仅更新一小部分参数,就足以对模型进行适配,达到微调所期待的效果。这部分需要更新的参数甚至还不到总参数量的 1%。这一重大改进使得以下操作成为可能: -- Fine-tuning on consumer hardware with limited GPU memory -- Storing multiple task-specific adaptations efficiently -- Better generalization in low-data scenarios -- Faster training and iteration cycles +- 在 GPU 显存受限的消费级显卡上微调 LLM +- 高效地为不同任务领域保存不同的微调结果 +- 在数据量不足的微调场景下也可保持很好的泛化性 +- 微调训练耗时更少 + +## PEFT 相关算法 -## Available Methods - -In this module, we will cover two popular PEFT methods: +在本章教程中,我们主要讲解两种比较常用的 PEFT 算法: ### 1️⃣ LoRA (Low-Rank Adaptation) -LoRA has emerged as the most widely adopted PEFT method, offering an elegant solution to efficient model adaptation. Instead of modifying the entire model, **LoRA injects trainable matrices into the model's attention layers.** This approach typically reduces trainable parameters by about 90% while maintaining comparable performance to full fine-tuning. We will explore LoRA in the [LoRA (Low-Rank Adaptation)](./lora_adapters.md) section. - +LoRA 可以说是最常用的 PEFT 算法了,它为高效模型微调提供了一个非常优雅的解决方案。LoRA 在需要更新的参数(一般是 attention layers 的参数)上插入可以训练的参数矩阵,训练时仅训练这部分参数。当模型训练好后,我们会利用这部分参数对原有模型进行重参数化(re-parameterization)。这样可以不改变 LLM 的整体结构和参数数量。通过这种方法,需要更新的参数量能至少减少 90%,同时性能也不差于全量参数微调。我们将在 [LoRA 低秩分解](./lora_adapters_cn.md)部分进一步讲解。 + ### 2️⃣ Prompt Tuning -Prompt tuning offers an **even lighter** approach by **adding trainable tokens to the input** rather than modifying model weights. Prompt tuning is less popular than LoRA, but can be a useful technique for quickly adapting a model to new tasks or domains. We will explore prompt tuning in the [Prompt Tuning](./prompt_tuning.md) section. 
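针对上文 LoRA 部分提到的“低秩矩阵”与“重参数化”，这里用几行 PyTorch 代码做一个极简示意（维度、`alpha` 等数值均为随意假设，与 `peft` 库的实际实现无关）：

```python
import torch

d, r, alpha = 768, 8, 16                 # 隐层维度、低秩的秩、缩放系数（数值仅作示意）
W = torch.randn(d, d)                    # 冻结的原始权重
A = torch.randn(r, d) * 0.01             # 低秩矩阵 A（可训练）
B = torch.randn(d, r) * 0.01             # 低秩矩阵 B（可训练；实际 LoRA 中通常初始化为 0）

x = torch.randn(d)
# 训练 / 推理时：原权重的输出加上低秩旁路的输出
h = W @ x + (alpha / r) * (B @ (A @ x))

# 部署时可把旁路直接合并回权重，即上文所说的“重参数化”，模型结构与原来完全一致
W_merged = W + (alpha / r) * (B @ A)
assert torch.allclose(W_merged @ x, h, atol=1e-4)
```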
+Prompt Tuning 则更加轻量化。它通过在输入部分加入**可训练的 token** 来微调,而不是改变模型的参数。Prompt Tuning 没有 LoRA 那么常用,但对于适配模型到新的任务领域来说,是个非常有用的技术。我们将在 [Prompt Tuning](./prompt_tuning_cn.md) 部分进一步讲解。 -## Exercise Notebooks +## 实践练习 -| Title | Description | Exercise | Link | Colab | +| 标题 | 简介 | 习题 | 链接 | Colab | |-------|-------------|----------|------|-------| -| LoRA Fine-tuning | Learn how to fine-tune models using LoRA adapters | 🐢 Train a model using LoRA
🐕 Experiment with different rank values
🦁 Compare performance with full fine-tuning | [Notebook](./notebooks/finetune_sft_peft.ipynb) | Open In Colab | -| Load LoRA Adapters | Learn how to load and use trained LoRA adapters | 🐢 Load pre-trained adapters
🐕 Merge adapters with base model
🦁 Switch between multiple adapters | [Notebook](./notebooks/load_lora_adapter.ipynb) | Open In Colab | +| LoRA 微调 | 学习使用 LoRA adapters 微调模型 | 🐢 用 LoRA 训练一个模型
🐕 试验不同低秩值的效果
🦁 与全量参数微调的效果进行对比 | [Notebook](./notebooks/finetune_sft_peft.ipynb) | Open In Colab | +| 载入 LoRA Adapter | 学习如何加载并使用 LoRA adapters | 🐢 加载训练好的 adapter
🐕 将 adapter 融入原有模型中
🦁 实现不同 adapter 的切换 | [Notebook](./notebooks/load_lora_adapter.ipynb) | Open In Colab | -## Resources -- [PEFT Documentation](https://huggingface.co/docs/peft) -- [LoRA Paper](https://arxiv.org/abs/2106.09685) -- [QLoRA Paper](https://arxiv.org/abs/2305.14314) -- [Prompt Tuning Paper](https://arxiv.org/abs/2104.08691) -- [Hugging Face PEFT Guide](https://huggingface.co/blog/peft) -- [How to Fine-Tune LLMs in 2024 with Hugging Face](https://www.philschmid.de/fine-tune-llms-in-2024-with-trl) -- [TRL](https://huggingface.co/docs/trl/index) +## 参考资料 +- [PEFT 代码库官方文档](https://huggingface.co/docs/peft) +- [LoRA 论文](https://arxiv.org/abs/2106.09685) +- [QLoRA 论文](https://arxiv.org/abs/2305.14314) +- [Prompt Tuning 论文](https://arxiv.org/abs/2104.08691) +- [Hugging Face PEFT 代码库相关博客](https://huggingface.co/blog/peft) +- [文章:How to Fine-Tune LLMs in 2024 with Hugging Face](https://www.philschmid.de/fine-tune-llms-in-2024-with-trl) +- [TRL 代码库官方文档](https://huggingface.co/docs/trl/index) diff --git a/3_parameter_efficient_finetuning/lora_adapters.md b/3_parameter_efficient_finetuning/lora_adapters.md index bc41b85c..147cd872 100644 --- a/3_parameter_efficient_finetuning/lora_adapters.md +++ b/3_parameter_efficient_finetuning/lora_adapters.md @@ -1,16 +1,16 @@ -# LoRA (Low-Rank Adaptation) +# LoRA(低秩分解) -LoRA has become the most widely adopted PEFT method. It works by adding small rank decomposition matrices to the attention weights, typically reducing trainable parameters by about 90%. +LoRA 是最常用的 PEFT 算法。它针对 attention 层的权重加入参数量较少、低秩分解过的参数矩阵,用模型原有参数和低秩分解参数计算出的激活值之和代表微调过后的激活值。这样当我们只更新低秩分解过的参数矩阵时,我们需要训练的参数量能减少大约 90%。 -## Understanding LoRA +## 理解 LoRA -LoRA (Low-Rank Adaptation) is a parameter-efficient fine-tuning technique that freezes the pre-trained model weights and injects trainable rank decomposition matrices into the model's layers. Instead of training all model parameters during fine-tuning, LoRA decomposes the weight updates into smaller matrices through low-rank decomposition, significantly reducing the number of trainable parameters while maintaining model performance. For example, when applied to GPT-3 175B, LoRA reduced trainable parameters by 10,000x and GPU memory requirements by 3x compared to full fine-tuning. You can read more about LoRA in the [LoRA paper](https://arxiv.org/pdf/2106.09685). +LoRA 全称是 Low-Rank Adaptation,或叫做“低秩分解”。它的基本做法是,在微调时,冻结所有预训练模型的参数,同时为需要微调的模型层注入额外的可训练的参数矩阵(通常称之为 Adapter)。通过对需要微调的层的参数矩阵进行低秩分解,可以得到两个参数量较小的新参数矩阵;而这一层的前向计算激活值则可以用“原有参数矩阵计算出的激活值”加上“低秩分解出的两个矩阵计算出的激活值”而得到。训练时,我们只需要训练两个低秩分解的矩阵即可,这样极大减少了所需微调的参数量,同时也能保持原有模型性能。例如,如果我们用 LoRA 微调 GPT-3 175B 模型,相比于全量参数的微调,LoRA 需要参与训练的参数量可减少至万分之一、GPU 现存需求可减少至三分之一。感兴趣的读者可以阅读 [LoRA 的论文](https://arxiv.org/pdf/2106.09685)。 -LoRA works by adding pairs of rank decomposition matrices to transformer layers, typically focusing on attention weights. During inference, these adapter weights can be merged with the base model, resulting in no additional latency overhead. LoRA is particularly useful for adapting large language models to specific tasks or domains while keeping resource requirements manageable. +一般而言,LoRA 都是对 transformer 层的参数进行低秩分解的,尤其是在与注意力机制相关的参数上。在推理过程中,这些 adapter 的参数可以被直接融合进模型中,得到与原模型结构完全一致的新模型,无需增加新的层。得益于此,LoRA 尤其适合在低计算资源情况下,将大模型适配进入特定的任务领域。 -## Loading LoRA Adapters +## 如何载入 LoRA 的 Adapters -Adapters can be loaded onto a pretrained model with load_adapter(), which is useful for trying out different adapters whose weights aren’t merged. Set the active adapter weights with the set_adapter() function. 
To return the base model, you could use unload() to unload all of the LoRA modules. This makes it easy to switch between different task-specific weights. +如果你使用 `peft` 库,你可以用 `load_adapter()` 载入 LoRA Adapters。这对你尝试不同的 adapter 非常管用,因为它还不会将参数融合进原模型。你可以使用 `set_adapter()` 指定哪个 LoRA Adapter 在生效。如果想返回原模型,你可以使用 `unload()` 卸载所有 LoRA 参数。这种设定使得在不同任务领域间切换模型变得非常容易。 ```python from transformers import AutoModelForCausalLM @@ -23,27 +23,25 @@ model = PeftModel.from_pretrained(base_model, peft_model_id) ![lora_load_adapter](./images/lora_adapter.png) -## Merging LoRA Adapters +## 将 LoRA 参数融入原模型 -After training with LoRA, you might want to merge the adapter weights back into the base model for easier deployment. This creates a single model with the combined weights, eliminating the need to load adapters separately during inference. +如果在 LoRA 微调结束后,你想直接获取一套新的模型参数,而不是每次使用的时候都需要加载 LoRA 的 Adapter,你可以直接将 LoRA 的参数融入原模型中。 -The merging process requires attention to memory management and precision. Since you'll need to load both the base model and adapter weights simultaneously, ensure sufficient GPU/CPU memory is available. Using `device_map="auto"` in `transformers` will help with automatic memory management. Maintain consistent precision (e.g., float16) throughout the process, matching the precision used during training and saving the merged model in the same format for deployment. Before deploying, always validate the merged model by comparing its outputs and performance metrics with the adapter-based version. +融合的时候,我们首先需要注意内存的管理以及参数的精度。因为我们要同时载入原模型和 LoRA 参数,需要注意 GPU 或 CPU 的内存是否够用。在 `transformers` 中使用 `device_map="auto"` 可以替我们自动进行内存管理。同时,要注意原模型、LoRA 参数的精度需保持一致。融合后,检查模型输出是否和未融合是一致也很重要。 -Adapters are also be convenient for switching between different tasks or domains. You can load the base model and adapter weights separately. This allows for quick switching between different task-specific weights. +## 代码实现 -## Implementation Guide +在 `notebooks/` 目录下,有 PEFT 相关方法的实践教程以及练习题。我们首先会在 `load_lora_adapter_example.ipynb` 学习加载 LoRA Adapter 相关的内容,然后在 `lora_finetuning.ipynb` 中,我们将学习如果用 LoRA 进行 SFT。 -The `notebooks/` directory contains practical tutorials and exercises for implementing different PEFT methods. Begin with `load_lora_adapter_example.ipynb` for a basic introduction, then explore `lora_finetuning.ipynb` for a more detailed look at how to fine-tune a model with LoRA and SFT. - -When implementing PEFT methods, start with small rank values (4-8) for LoRA and monitor training loss. Use validation sets to prevent overfitting and compare results with full fine-tuning baselines when possible. The effectiveness of different methods can vary by task, so experimentation is key. +一个比较好的 LoRA 训练流程应该是,首先从较低的秩开始,一般是 4 到 8,同时观察训练损失值。使用验证集及时查看避免过拟合。不同任务可能有较大差异,所有还是要以实际实验现象为准。 ## OLoRA -[OLoRA](https://arxiv.org/abs/2406.01775) utilizes QR decomposition to initialize the LoRA adapters. OLoRA translates the base weights of the model by a factor of their QR decompositions, i.e., it mutates the weights before performing any training on them. This approach significantly improves stability, accelerates convergence speed, and ultimately achieves superior performance. +[OLoRA](https://arxiv.org/abs/2406.01775) 使用 QR 分解来初始化 LoRA 的 Adapter。该算法对原有的参数矩阵 W 分解为 Q 和 R 两个矩阵,其中 Q 矩阵包含 W 矩阵的 r 个正交向量,使得优化能够在一个较好的子空间进行。这样可以很大地提升收敛速度,同时也达到了非常好的效果。 -## Using TRL with PEFT +## TRL 与 PEFT 结合使用 -PEFT methods can be combined with TRL (Transformers Reinforcement Learning) for efficient fine-tuning. 
This integration is particularly useful for RLHF (Reinforcement Learning from Human Feedback) as it reduces memory requirements. +PEFT 也可以和 TRL 库一起使用,这对 RLHF(Reinforcement Learning from Human Feedback)尤其实用。 ```python from peft import LoraConfig @@ -67,11 +65,14 @@ model = AutoModelForCausalLM.from_pretrained( ) ``` -Above, we used `device_map="auto"` to automatically assign the model to the correct device. You can also manually assign the model to a specific device using `device_map={"": device_index}`. You could also scale training across multiple GPUs while keeping memory usage efficient. +在上述代码中,我们用 `device_map="auto"` 自动分配模型到正确的计算设备上。关于具体计算设备,你也可以手动修改:`device_map={"": device_index}`;当然你也可以扩大训练规模,如实用多 GPU 训练等。 + +## 基本的参数融合实现 + -## Basic Merging Implementation -After training a LoRA adapter, you can merge the adapter weights back into the base model. Here's how to do it: +训练好 LoRA adapter 后,将权重融合回原模型的方法如下: + ```python import torch @@ -103,7 +104,7 @@ except RuntimeError as e: merged_model.save_pretrained("path/to/save/merged_model") ``` -If you encounter size discrepancies in the saved model, ensure you're also saving the tokenizer: +保存的时候,你可能也需要保存 tokenizer 到相应目录。 ```python # Save both model and tokenizer @@ -112,13 +113,14 @@ merged_model.save_pretrained("path/to/save/merged_model") tokenizer.save_pretrained("path/to/save/merged_model") ``` -## Next Steps +## 接下来 + +⏩ 继续学习 [Prompt Tuning](prompt_tuning_cn.md),了解这种微调方式如何运作。 -⏩ Move on to the [Prompt Tuning](prompt_tuning.md) guide to learn how to fine-tune a model with prompt tuning. -⏩ Move on the [Load LoRA Adapters Tutorial](./notebooks/load_lora_adapter.ipynb) to learn how to load LoRA adapters. +⏩ 实践 [加载 LoRA Adapters 的教程](./notebooks/load_lora_adapter_cn.ipynb) 练习加载 LoRA adapters。 -# Resources +# 学习资源 - [LORA: LOW-RANK ADAPTATION OF LARGE LANGUAGE MODELS](https://arxiv.org/pdf/2106.09685) -- [PEFT Documentation](https://huggingface.co/docs/peft) -- [Hugging Face blog post on PEFT](https://huggingface.co/blog/peft) +- [PEFT 官方文档](https://huggingface.co/docs/peft) +- [Hugging Face 有关 PEFT 的博客](https://huggingface.co/blog/zh/peft) diff --git a/3_parameter_efficient_finetuning/notebooks/finetune_sft_peft.ipynb b/3_parameter_efficient_finetuning/notebooks/finetune_sft_peft.ipynb index 9028d477..4e0e0bd2 100644 --- a/3_parameter_efficient_finetuning/notebooks/finetune_sft_peft.ipynb +++ b/3_parameter_efficient_finetuning/notebooks/finetune_sft_peft.ipynb @@ -6,30 +6,37 @@ "id": "z-6LLOPZouLg" }, "source": [ - "# How to Fine-Tune LLMs with LoRA Adapters using Hugging Face TRL\n", - "\n", - "This notebook demonstrates how to efficiently fine-tune large language models using LoRA (Low-Rank Adaptation) adapters. LoRA is a parameter-efficient fine-tuning technique that:\n", - "- Freezes the pre-trained model weights\n", - "- Adds small trainable rank decomposition matrices to attention layers\n", - "- Typically reduces trainable parameters by ~90%\n", - "- Maintains model performance while being memory efficient\n", - "\n", - "We'll cover:\n", - "1. Setup development environment and LoRA configuration\n", - "2. Create and prepare the dataset for adapter training\n", - "3. Fine-tune using `trl` and `SFTTrainer` with LoRA adapters\n", - "4. Test the model and merge adapters (optional)\n" + "# 在 TRL 框架下用 LoRA 微调大语言模型\n", + "\n", + "这个 notebook 展示如何用 LoRA 高效微调大语言模型。LoRA 是一种高效的参数微调方法,有如下优点:\n", + "- 不更新预训练模型权重\n", + "- 仅在注意力层添加少量低秩分解矩阵作为训练参数\n", + "- 基本能减少 90% 训练参数\n", + "- 能保留模型原有的能力\n", + "\n", + "本文涵盖这些步骤:\n", + "1. 
配置开发环境、设定 LoRA 相关配置\n", + "2. 准备数据集\n", + "3. 使用 `trl` 框架下的 `SFTTrainer` 进行 LoRA 微调\n", + "4. 测试模型性能、学习加载 adapter\n" ] }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, { "cell_type": "markdown", "metadata": { "id": "fXqd9BXgouLi" }, "source": [ - "## 1. Setup development environment\n", + "## 1. 配置开发环境\n", "\n", - "Our first step is to install Hugging Face Libraries and Pytorch, including trl, transformers and datasets. If you haven't heard of trl yet, don't worry. It is a new library on top of transformers and datasets, which makes it easier to fine-tune, rlhf, align open LLMs.\n" + "我们首先需要安装 PyTorch 和 Hugging Face 相关的库,这包括 `trl`、`transformers`、`datasets`。其中 `trl` 基于 `transformers` 和 `datasets`,用以微调模型、进行 RLHF、对齐 LLM 等。" ] }, { @@ -58,7 +65,7 @@ "id": "XHUzfwpKouLk" }, "source": [ - "## 2. Load the dataset" + "## 2. 载入数据集" ] }, { @@ -92,7 +99,7 @@ "# Load a sample dataset\n", "from datasets import load_dataset\n", "\n", - "# TODO: define your dataset and config using the path and name parameters\n", + "# TODO: 你也可以用自己的数据集\n", "dataset = load_dataset(path=\"HuggingFaceTB/smoltalk\", name=\"everyday-conversations\")\n", "dataset" ] @@ -103,27 +110,26 @@ "id": "9TOhJdtsouLk" }, "source": [ - "## 3. Fine-tune LLM using `trl` and the `SFTTrainer` with LoRA\n", - "\n", - "The [SFTTrainer](https://huggingface.co/docs/trl/sft_trainer) from `trl` provides integration with LoRA adapters through the [PEFT](https://huggingface.co/docs/peft/en/index) library. Key advantages of this setup include:\n", - "\n", - "1. **Memory Efficiency**: \n", - " - Only adapter parameters are stored in GPU memory\n", - " - Base model weights remain frozen and can be loaded in lower precision\n", - " - Enables fine-tuning of large models on consumer GPUs\n", - "\n", - "2. **Training Features**:\n", - " - Native PEFT/LoRA integration with minimal setup\n", - " - Support for QLoRA (Quantized LoRA) for even better memory efficiency\n", - "\n", - "3. **Adapter Management**:\n", - " - Adapter weight saving during checkpoints\n", - " - Features to merge adapters back into base model\n", - "\n", - "We'll use LoRA in our example, which combines LoRA with 4-bit quantization to further reduce memory usage without sacrificing performance. The setup requires just a few configuration steps:\n", - "1. Define the LoRA configuration (rank, alpha, dropout)\n", - "2. Create the SFTTrainer with PEFT config\n", - "3. Train and save the adapter weights\n" + "## 3. 在 `trl` 框架下用 `SFTTrainer` 实现大语言模型的 LoRA 微调\n", + "\n", + "在 `trl` 中,[SFTTrainer](https://huggingface.co/docs/trl/sft_trainer) 通过 [PEFT](https://huggingface.co/docs/peft/en/index) 提供了 LoRA Adapter 的集成。这样的设定有以下几个好处:\n", + "\n", + "1. **高效利用内存**:\n", + " - 仅 Adapter 的参数会保存在 GPU 显存中\n", + " - 原模型参数被冻结,所以可以用低精度载入\n", + " - 这使得在消费级显卡上也可以微调大模型\n", + "2. **训练层面**:\n", + " - 原生 PEFT/LoRA 集成,用户开发所需代码量少\n", + " - 支持 QLoRA(量化版 LoRA),可以进一步减少内存使用\n", + "\n", + "3. **Adapter 管理**:\n", + " - 可以方便地训练过程中保存 Adapter\n", + " - 可以方便地把 Adapter 融合进原模型\n", + "\n", + "本文将会进行 LoRA 微调,当然你也可以尝试 4-bit 量化来进一步减少内存使用。配置步骤包含以下几步:\n", + "1. 定义好 LoRA 的相关参数(主要是 rank、alpha、dropout)\n", + "2. 创建一个 SFTTrainer 的实例\n", + "3. 训练模型、保存 adapter 的参数" ] }, { @@ -166,15 +172,15 @@ "id": "ZbuVArTHouLk" }, "source": [ - "The `SFTTrainer`  supports a native integration with `peft`, which makes it super easy to efficiently tune LLMs using, e.g. LoRA. 
We only need to create our `LoraConfig` and provide it to the trainer.\n", + "由于 `SFTTrainer`  原生支持 `peft`,使用 LoRA 训练 LLM 就变得非常简单。我们需要配置的只有 `LoraConfig`。\n", "\n", "
\n", - "

Exercise: Define LoRA parameters for finetuning

\n", - "

Take a dataset from the Hugging Face hub and finetune a model on it.

\n", - "

Difficulty Levels

\n", - "

🐢 Use the general parameters for an abitrary finetune

\n", - "

🐕 Adjust the parameters and review in weights & biases.

\n", - "

🦁 Adjust the parameters and show change in inference results.

\n", + "

练习:为微调定义好 LoRA 相关参数

\n", + "

从 Hugging Face hub 找一个合适的数据然后微调模型

\n", + "

难度等级

\n", + "

🐢 使用默认的 LoRA 超参数直接微调

\n", + "

🐕 改变一些超参数,学习通过 weights & biases 平台查看

\n", + "

🦁 改变一些超参数,训练后查看这些改变是否影响了推理性能

\n", "
" ] }, @@ -188,20 +194,20 @@ "source": [ "from peft import LoraConfig\n", "\n", - "# TODO: Configure LoRA parameters\n", - "# r: rank dimension for LoRA update matrices (smaller = more compression)\n", + "# TODO: 配置 LoRA 参数\n", + "# r: 低秩分解的秩,越小则需要训练的参数量越少\n", "rank_dimension = 6\n", - "# lora_alpha: scaling factor for LoRA layers (higher = stronger adaptation)\n", + "# lora_alpha: 将训练好的参数加到原模型上时的缩放倍数,越大则微调参数作用越明显\n", "lora_alpha = 8\n", - "# lora_dropout: dropout probability for LoRA layers (helps prevent overfitting)\n", + "# lora_dropout: LoRA 相关层的 dropout,可以用来应对过拟合\n", "lora_dropout = 0.05\n", "\n", "peft_config = LoraConfig(\n", - " r=rank_dimension, # Rank dimension - typically between 4-32\n", - " lora_alpha=lora_alpha, # LoRA scaling factor - typically 2x rank\n", + " r=rank_dimension, # 一般选择 4 到 32\n", + " lora_alpha=lora_alpha, # 一般是 rank 的 2 倍\n", " lora_dropout=lora_dropout, # Dropout probability for LoRA layers\n", " bias=\"none\", # Bias type for LoRA. the corresponding biases will be updated during training.\n", - " target_modules=\"all-linear\", # Which modules to apply LoRA to\n", + " target_modules=\"all-linear\", # 模型哪些层会添加 LoRA Adapter\n", " task_type=\"CAUSAL_LM\", # Task type for model architecture\n", ")" ] @@ -212,7 +218,7 @@ "id": "l5NUDPcaouLl" }, "source": [ - "Before we can start our training we need to define the hyperparameters (`TrainingArguments`) we want to use." + "此外我们还需定义训练的超参数(`TrainingArguments`)。" ] }, { @@ -249,7 +255,7 @@ " bf16=True, # Use bfloat16 precision\n", " # Integration settings\n", " push_to_hub=False, # Don't push to HuggingFace Hub\n", - " report_to=\"none\", # Disable external logging\n", + " report_to=None, # Disable external logging\n", ")" ] }, @@ -259,7 +265,7 @@ "id": "cGhR7uFBouLl" }, "source": [ - "We now have every building block we need to create our `SFTTrainer` to start then training our model." + "配置完毕,我们可以创建 `SFTTrainer` 来训练模型了。" ] }, { @@ -294,7 +300,7 @@ "id": "zQ_kRN24ouLl" }, "source": [ - "Start training our model by calling the `train()` method on our `Trainer` instance. This will start the training loop and train our model for 3 epochs. Since we are using a PEFT method, we will only save the adapted model weights and not the full model." + "通过启动 `train()` 函数,我们开始训练。本次训练包含 3 个 epoch。由于我们用了 PEFT,我们可以在训练过程中或结束后,只保存 adapter 的参数,无需保存原模型参数。" ] }, { @@ -343,8 +349,7 @@ "id": "y4HHSYYzouLl" }, "source": [ - "The training with Flash Attention for 3 epochs with a dataset of 15k samples took 4:14:36 on a `g5.2xlarge`. The instance costs `1.21$/h` which brings us to a total cost of only ~`5.3$`.\n", - "\n" + "我们模型使用了 Flash Attention 加速训练。在当前数据集(15k 的样本量)训练了 3 轮,在一个 `g5.2xlarge` 机器上用了 4 小时 14 分钟 36 秒。该机器报价 `1.21$/h`,所以我们总花费仅 `5.3$`。\n" ] }, { @@ -353,13 +358,12 @@ "id": "C309KsXjouLl" }, "source": [ - "### Merge LoRA Adapter into the Original Model\n", + "### 将 LoRA Adapter 融入原模型\n", "\n", - "When using LoRA, we only train adapter weights while keeping the base model frozen. During training, we save only these lightweight adapter weights (~2-10MB) rather than a full model copy. However, for deployment, you might want to merge the adapters back into the base model for:\n", - "\n", - "1. **Simplified Deployment**: Single model file instead of base model + adapters\n", - "2. **Inference Speed**: No adapter computation overhead\n", - "3. **Framework Compatibility**: Better compatibility with serving frameworks\n" + "训练 LoRA 时,我们只训练 adapter 里的参数,而不训练原模型。所以保存的参数也只有 adapter 里的参数(可能也就 2MB 到 10MB)。然而在部署阶段,你可能需要把 Adapter 融合进原模型:\n", + "1. 
**简化的部署流程**:仅载入一个模型参数文件即可,无需额外载入 adapter 参数文件\n", + "2. **推理速度提升**:adapter 引入的计算已经融合进了模型中\n", + "3. **框架的兼容性**:更能和服务框架适配" ] }, { @@ -391,20 +395,9 @@ "id": "-yO6E9quouLl" }, "source": [ - "## 3. Test Model and run Inference\n", + "## 3. 测试模型、进行推理\n", "\n", - "After the training is done we want to test our model. We will load different samples from the original dataset and evaluate the model on those samples, using a simple loop and accuracy as our metric.\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "
\n", - "

Bonus Exercise: Load LoRA Adapter

\n", - "

Use what you learnt from the ecample note book to load your trained LoRA adapter for inference.

\n", - "
" + "训练结束后,我们可能需要测试模型。可以从数据集找一些样本,然后看看模型在这些样本上的性能。" ] }, { @@ -449,7 +442,7 @@ "id": "99uFDAuuouLl" }, "source": [ - "Lets test some prompt samples and see how the model performs." + "现在我们找些样本来测试:" ] }, { @@ -486,6 +479,16 @@ " print(f\" response:\\n{test_inference(prompt)}\")\n", " print(\"-\" * 50)" ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "

额外练习:载入自己训练的 LoRA Adapter

\n", + "

结束本教程后,你可以使用学到的技术,自己训练一个 LoRA,然后载入 LoRA Adapter

\n", + "
" + ] } ], "metadata": { diff --git a/3_parameter_efficient_finetuning/notebooks/load_lora_adapter.ipynb b/3_parameter_efficient_finetuning/notebooks/load_lora_adapter.ipynb index b288a4a1..b879831b 100644 --- a/3_parameter_efficient_finetuning/notebooks/load_lora_adapter.ipynb +++ b/3_parameter_efficient_finetuning/notebooks/load_lora_adapter.ipynb @@ -156,7 +156,7 @@ "id": "S65GcxNGA9kz" }, "source": [ - "## Load adapters from the Hub\n", + "## 从 HuggingFace Hub 加载 adapters\n", "\n" ] }, @@ -214,7 +214,7 @@ ")\n", "tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)\n", "\n", - "# Load the Lora model\n", + "# 载入 Lora \n", "model = PeftModel.from_pretrained(model, peft_model_id)" ] }, diff --git a/3_parameter_efficient_finetuning/prompt_tuning.md b/3_parameter_efficient_finetuning/prompt_tuning.md index 66926167..2372354c 100644 --- a/3_parameter_efficient_finetuning/prompt_tuning.md +++ b/3_parameter_efficient_finetuning/prompt_tuning.md @@ -1,22 +1,22 @@ # Prompt Tuning -Prompt tuning is a parameter-efficient approach that modifies input representations rather than model weights. Unlike traditional fine-tuning that updates all model parameters, prompt tuning adds and optimizes a small set of trainable tokens while keeping the base model frozen. +Prompt tuning 也是一个高效的微调手段,不同于微调模型参数,它改变的是输入模型的表征。具体来说,prompt tuning 在训练前会添加几个额外的 token,训练过程中,token 的 embedding 被更新,而模型参数一直保持不变。 -## Understanding Prompt Tuning +## 理解 Prompt Tuning -Prompt tuning is a parameter-efficient alternative to model fine-tuning that prepends trainable continuous vectors (soft prompts) to the input text. Unlike discrete text prompts, these soft prompts are learned through backpropagation while keeping the language model frozen. The method was introduced in ["The Power of Scale for Parameter-Efficient Prompt Tuning"](https://arxiv.org/abs/2104.08691) (Lester et al., 2021), which demonstrated that prompt tuning becomes more competitive with model fine-tuning as model size increases. Within the paper, at around 10 billion parameters, prompt tuning matches the performance of model fine-tuning while only modifying a few hundred parameters per task. +Prompt tuning 通常把可以训练的连续向量(也称为soft prompts)接在输入文本的前面。与离散的文本提示词不同,这些 soft prompts 是通过训练过程中经反向传播更新的,而语言模型则在训练中不变。该方法在 [The Power of Scale for Parameter-Efficient Prompt Tuning](https://arxiv.org/abs/2104.08691) 中提出,展示了其在模型尺寸增大时的有效性:当模型尺寸在 10B 参数量左右时,prompt tuning 只更新 soft prompts 的几百个参数,即可达到模型全量微调的效果。 -These soft prompts are continuous vectors in the model's embedding space that get optimized during training. Unlike traditional discrete prompts that use natural language tokens, soft prompts have no inherent meaning but learn to elicit the desired behavior from the frozen model through gradient descent. The technique is particularly effective for multi-task scenarios since each task requires storing only a small prompt vector (typically a few hundred parameters) rather than a full model copy. This approach not only maintains a minimal memory footprint but also enables rapid task switching by simply swapping prompt vectors without any model reloading. +Soft prompts 是模型的 embedding 空间中的一些连续数值的向量,它们会在微调过程中被更新。传统的 prompts 是一些离散的 tokens,在自然语言层面代表某些语义信息;soft prompt 则没有这些内在含义,经过参数更新后,它被用来从模型中引出一些特定行为。这种方法在多任务领域尤其有效,因为每个任务仅需保存一个 soft prompt 的向量(通常仅几百个参数),而不是对整个模型复制。这种做法不仅现存占用少,而且还能支持快速的任务切换,无需模型重新加载。 -## Training Process +## 训练过程 -Soft prompts typically number between 8 and 32 tokens and can be initialized either randomly or from existing text. 
The initialization method plays a crucial role in the training process, with text-based initialization often performing better than random initialization. +Soft prompts 一般包含 8 到 32 个 tokens,它的初始化可以是随机初始化,也可以是来自现有的文本。初始化过程很影响训练,后者的方法通常效果更好。 -During training, only the prompt parameters are updated while the base model remains frozen. This focused approach uses standard training objectives but requires careful attention to the learning rate and gradient behavior of the prompt tokens. +训练过程中,仅这些 soft prompts 的参数会被更新;训练用的损失函数也没有变化;但学习率需要我们认真调节,同时也建议观察 soft prompts 上面的梯度信息,以免训练失败。 -## Implementation with PEFT +## 基于 PEFT 的代码实现 -The PEFT library makes implementing prompt tuning straightforward. Here's a basic example: +使用 PEFT 库实现 prompt tuning 非常简单直接,以下是一个简单的例子: ```python from peft import PromptTuningConfig, TaskType, get_peft_model @@ -29,8 +29,8 @@ tokenizer = AutoTokenizer.from_pretrained("your-base-model") # Configure prompt tuning peft_config = PromptTuningConfig( task_type=TaskType.CAUSAL_LM, - num_virtual_tokens=8, # Number of trainable tokens - prompt_tuning_init="TEXT", # Initialize from text + num_virtual_tokens=8, # soft prompts 的 token 数量 + prompt_tuning_init="TEXT", # 初始化方法为从文本初始化 prompt_tuning_init_text="Classify if this text is positive or negative:", tokenizer_name_or_path="your-base-model", ) @@ -39,36 +39,36 @@ peft_config = PromptTuningConfig( model = get_peft_model(model, peft_config) ``` -## Comparison to Other Methods +## 与其它方法的对比 -When compared to other PEFT approaches, prompt tuning stands out for its efficiency. While LoRA offers low parameter counts and memory usage but requires loading adapters for task switching, prompt tuning achieves even lower resource usage and enables immediate task switching. Full fine-tuning, in contrast, demands significant resources and requires separate model copies for different tasks. +对比与其它 PEFT 方法,prompt tuning 胜在它的高效性。虽然 LoRA 也减少了训练参数以及所需显存,但反复加载 adapter 来切换任务就很麻烦。Prompt tuning 需要的训练参数更少,且任务切换更加方便。而全量参数微调则既需要超大的训练资源,也需要通过全量参数重新载入来切换任务。 -| Method | Parameters | Memory | Task Switching | +| 方法 | 训练参数量 | 显存需求 | 任务切换难度 | |--------|------------|---------|----------------| -| Prompt Tuning | Very Low | Minimal | Easy | -| LoRA | Low | Low | Requires Loading | -| Full Fine-tuning | High | High | New Model Copy | +| Prompt Tuning | 很低 | 很低 | 非常简单 | +| LoRA | 低 | 低 | 需要加载 Adapter | +| 全量参数微调 | 很高 | 很高 | 加载所有参数 | -When implementing prompt tuning, start with a small number of virtual tokens (8-16) and increase only if the task complexity demands it. Text initialization typically yields better results than random initialization, especially when using task-relevant text. The initialization strategy should reflect the complexity of your target task. +在实际 prompt tuning 训练过程中,建议先从较小的 virtual tokens 数量开始(如 8 到 16 个),仅当任务复杂度增加时,再增加 virtual tokens 数量。从文本初始化通常比随机初始化更好,尤其是你使用和任务相关的文本时。初始化方法需要和你的任务复杂度匹配。 -Training requires slightly different considerations than full fine-tuning. Higher learning rates often work well, but careful monitoring of prompt token gradients is essential. Regular validation on diverse examples helps ensure robust performance across different scenarios. +训练过程中,你还需注意学习率。如果使用较大的学习率,你需要时刻观察 soft prompt 更新时的梯度信息,以防训练崩溃。训练过程中,定期的验证也是确保性能的良好手段。 -## Application +## 应用 -Prompt tuning excels in several scenarios: +Prompt tuning 在这些场景下优势明显: -1. Multi-task deployment -2. Resource-constrained environments -3. Rapid task adaptation -4. Privacy-sensitive applications +1. 多任务下的大语言模型部署 +2. 计算资源有限的训练场景 +3. 需要在不同任务间快速切换的场景 +4. 
针对隐私敏感的应用场景 -As models get smaller, prompt tuning becomes less competitive compared to full fine-tuning. For example, on models like SmolLM2 scales prompt tuning is less relevant than full fine-tuning. +而当模型变小时,prompt tuning 就没有那么有竞争力了。比如在 SmolLM2 这样的模型尺寸下,prompt tuning 的意义就不大,可能还不如全量微调。 -## Next Steps +## 接下来 -⏭️ Move on to the [LoRA Adapters Tutorial](./notebooks/finetune_sft_peft.ipynb) to learn how to fine-tune a model with LoRA adapters. +⏭️ 学习 [LoRA Adapters 的教程](./notebooks/finetune_sft_peft.ipynb),了解如何实操 LoRA 微调模型的过程。 -## Resources -- [PEFT Documentation](https://huggingface.co/docs/peft) -- [Prompt Tuning Paper](https://arxiv.org/abs/2104.08691) -- [Hugging Face Cookbook](https://huggingface.co/learn/cookbook/prompt_tuning_peft) +## 学习资源 +- [PEFT 官方文档](https://huggingface.co/docs/peft) +- [Prompt Tuning 论文](https://arxiv.org/abs/2104.08691) +- [Hugging Face Cookbook 中关于 prompt tuning 的部分](https://huggingface.co/learn/cookbook/prompt_tuning_peft) diff --git a/4_evaluation/README.md b/4_evaluation/README.md index 8140cd4c..aaa4700a 100644 --- a/4_evaluation/README.md +++ b/4_evaluation/README.md @@ -1,39 +1,40 @@ -# Evaluation +# 评测 -Evaluation is a critical step in developing and deploying language models. It helps us understand how well our models perform across different capabilities and identify areas for improvement. This module covers both standard benchmarks and domain-specific evaluation approaches to comprehensively assess your smol model. +在开发和部署语言模型的过程中,模型评测是一个非常关键的步骤。它有助于我们理解模型在不同方面的能力究竟如何,并找到进一步提示改进的空间。本章将涵盖标准基准测试和特定领域的评估方法,以全面评估你的 smol 模型。 -We'll use [`lighteval`](https://github.com/huggingface/lighteval), a powerful evaluation library developed by Hugging Face that integrates seamlessly with the Hugging Face ecosystem. For a deeper dive into evaluation concepts and best practices, check out the evaluation [guidebook](https://github.com/huggingface/evaluation-guidebook). +我们将使用 [`lighteval`](https://github.com/huggingface/lighteval) 这个强大的评测代码库。它由 HuggingFace 开发,并完美地集成入了 HuggingFace 的生态系统。我们还提供了[指南书籍](https://github.com/huggingface/evaluation-guidebook)以便读者想要深入学习评测的相关概念和最佳实践。 -## Module Overview +## 章节总览 -A thorough evaluation strategy examines multiple aspects of model performance. We assess task-specific capabilities like question answering and summarization to understand how well the model handles different types of problems. We measure output quality through factors like coherence and factual accuracy. Safety evaluation helps identify potential harmful outputs or biases. Finally, domain expertise testing verifies the model's specialized knowledge in your target field. +一个全面的评估策略会检查模型多个方面的性能。我们将评估模型特定领域的能力,比如回答问题、概括总结,来理解模型处理不同问题的能力。我们通过生成的连贯性和事实准确性等因素来衡量输出质量。同时,我们也需要安全评测,防治模型输出有害的信息或带有偏见的观点。最后,我们还可以进行特定领域的专业性测试,来确认模型是否在特定领域掌握了专业知识。 -## Contents +## 目录 -### 1️⃣ [Automatic Benchmarks](./automatic_benchmarks.md) +### 1️⃣ [自动化基准测试](./automatic_benchmarks_cn.md) -Learn to evaluate your model using standardized benchmarks and metrics. We'll explore common benchmarks like MMLU and TruthfulQA, understand key evaluation metrics and settings, and cover best practices for reproducible evaluation. +学习如何使用标准的测试基准和指标来评估模型。我们将会学习常见的测试基准,如 MMLU 和 TruthfulQA,理解重要指标和相关配置,同时为可复现的评估结果提供最佳实践。 +### 2️⃣ [自定义领域的评测](./custom_evaluation_cn.md) -### 2️⃣ [Custom Domain Evaluation](./custom_evaluation.md) -Discover how to create evaluation pipelines tailored to your specific use case. 
We'll walk through designing custom evaluation tasks, implementing specialized metrics, and building evaluation datasets that match your requirements. +学习怎样为你的特定任务领域量身定做评估流程。我们将学习设计自定义评估任务、代码实现特定的指标,以及构建符合你要求的评估数据集。 -### 3️⃣ [Domain Evaluation Project](./project/README.md) -Follow a complete example of building a domain-specific evaluation pipeline. You'll learn to generate evaluation datasets, use Argilla for data annotation, create standardized datasets, and evaluate models using LightEval. +### 3️⃣ [领域评估的项目示例](./project/README_CN.md) -### Exercise Notebooks +通过一个完整的例子,学习构建特定领域的评测流程。这包含:生成评测数据集、使用 Argilla 平台进行数据标注、构建标准化的数据集、用 LightEval 评测模型。 -| Title | Description | Exercise | Link | Colab | +### Notebook 练习 + +| 标题 | 简述 | 习题 | 链接 | Colab | |-------|-------------|----------|------|-------| -| Evaluate and Analyze Your LLM | Learn how to use LightEval to evaluate and compare models on specific domains | 🐢 Use medical domain tasks to evaluate a model
🐕 Create a new domain evaluation with different MMLU tasks
🦁 Create a custom evaluation task for your domain | [Notebook](./notebooks/lighteval_evaluate_and_analyse_your_LLM.ipynb) | Open In Colab | +| 评测并分析你的大语言模型 | 学习使用 LightEval 在特定领域评测、比较模型 | 🐢 使用医学相关领域的任务评估模型
🐕 选择不同的 MMLU 任务（如 computer science、mathematics、physics），创建一个新领域的评测
🦁 为你的特定任务领域创建一个自定义的评测任务 | [Notebook](./notebooks/lighteval_evaluate_and_analyse_your_LLM.ipynb) | Open In Colab | ## Resources -- [Evaluation Guidebook](https://github.com/huggingface/evaluation-guidebook) - Comprehensive guide to LLM evaluation -- [LightEval Documentation](https://github.com/huggingface/lighteval) - Official docs for the LightEval library -- [Argilla Documentation](https://docs.argilla.io) - Learn about the Argilla annotation platform -- [MMLU Paper](https://arxiv.org/abs/2009.03300) - Paper describing the MMLU benchmark -- [Creating a Custom Task](https://github.com/huggingface/lighteval/wiki/Adding-a-Custom-Task) -- [Creating a Custom Metric](https://github.com/huggingface/lighteval/wiki/Adding-a-New-Metric) -- [Using existing metrics](https://github.com/huggingface/lighteval/wiki/Metric-List) \ No newline at end of file +- [评测的指南书籍](https://github.com/huggingface/evaluation-guidebook) - 大语言模型评测领域的全面指南 +- [LightEval 文档](https://github.com/huggingface/lighteval) - LightEval 官方文档 +- [Argilla 文档](https://docs.argilla.io) - 了解 Argilla 标注平台 +- [MMLU 论文](https://arxiv.org/abs/2009.03300) - 关于 MMLU 测评基准的论文 +- [LightEval 如何添加自定义任务](https://github.com/huggingface/lighteval/wiki/Adding-a-Custom-Task) +- [LightEval 如何添加自定义测评指标](https://github.com/huggingface/lighteval/wiki/Adding-a-New-Metric) +- [LightEval 如何使用现有的测评指标](https://github.com/huggingface/lighteval/wiki/Metric-List) \ No newline at end of file diff --git a/4_evaluation/automatic_benchmarks.md b/4_evaluation/automatic_benchmarks.md index 6a980fa4..9d63a003 100644 --- a/4_evaluation/automatic_benchmarks.md +++ b/4_evaluation/automatic_benchmarks.md @@ -1,76 +1,81 @@ -# Automatic Benchmarks +# 自动化基准测试 -Automatic benchmarks serve as standardized tools for evaluating language models across different tasks and capabilities. While they provide a useful starting point for understanding model performance, it's important to recognize that they represent only one piece of a comprehensive evaluation strategy. +自动化基准测试可以作为一个标准化的工具,来衡量语言模型在不同任务上的能力。不过,虽然它可以有效地用来了解模型当前的性能,但我们还要认识到,这些评测结果仅仅是模型全面评估的一小部分,不能完全反映模型性能。 -## Understanding Automatic Benchmarks +## 理解自动化基准评测 -Automatic benchmarks typically consist of curated datasets with predefined tasks and evaluation metrics. These benchmarks aim to assess various aspects of model capability, from basic language understanding to complex reasoning. The key advantage of using automatic benchmarks is their standardization - they allow for consistent comparison across different models and provide reproducible results. +自动化基准评测一般是在定义好的领域和评测指标下,对特定的数据集进行测试。这种基准测试会评估模型多方面的能力,从最基础的语言理解到复杂的逻辑推理。其最主要的优点还是标准型方面,不同模型都使用相同标准来测试,可以用来对比不同模型的效果,同时测评出的结果也是可复现的。 -However, it's crucial to understand that benchmark performance doesn't always translate directly to real-world effectiveness. A model that excels at academic benchmarks may still struggle with specific domain applications or practical use cases. +然而,我们也需要知道,这种基准测试也不是完全反映模型的现实能力的。比如,一个在学术评测基准上表现优异的模型,也许在其它应用领域或实践层面表现很差。 -## Benchmarks and Their Limitations +## 现有评测基准及其局限性 -### General Knowledge Benchmarks +### 通识层面的评测基准 -MMLU (Massive Multitask Language Understanding) tests knowledge across 57 subjects, from science to humanities. While comprehensive, it may not reflect the depth of expertise needed for specific domains. TruthfulQA evaluates a model's tendency to reproduce common misconceptions, though it can't capture all forms of misinformation. 
+MMLU(Massive Multitask Language Understanding)这个评测基准,涵盖了从科学到人文共 57 个学科的知识,是一个通识层面的评测基准。但虽然全面,它可能在某些领域的专业深度还不算够。另一方面,TruthfulQA 这个评测基准,涵盖 38 个学科问答,则会评测模型输出的真实性如何。 -### Reasoning Benchmarks -BBH (Big Bench Hard) and GSM8K focus on complex reasoning tasks. BBH tests logical thinking and planning, while GSM8K specifically targets mathematical problem-solving. These benchmarks help assess analytical capabilities but may not capture the nuanced reasoning required in real-world scenarios. +### 推理层面的评测基准 -### Language Understanding -HELM provides a holistic evaluation framework, while WinoGrande tests common sense through pronoun disambiguation. These benchmarks offer insights into language processing capabilities but may not fully represent the complexity of natural conversation or domain-specific terminology. +BBH(Big Bench Hard)和 GSM8K 这两个评测基准重点关注复杂的推理任务。BBH 主要测试逻辑思考和规划能力,GSM8K 则特别关注数学问题的求解。这些评测基准可以用来评测模型的分析问题能力,但人在现实世界中微妙的推理细节可能会被忽略。 -## Alternative Evaluation Approaches +### 语言理解层面的评测基准 -Many organizations have developed alternative evaluation methods to address the limitations of standard benchmarks: +在语言理解层面,HELM 提供了一个全面的评测框架,而 WinoGrande 则通过代词指代的歧义消除,测试模型在常识层面的能力。这些评测基准让我们能深入了解模型在语言处理层面的水平,但缺点是暂未模仿到人与人之间对话的复杂性,同时暂未测评到专业术语。 -### LLM-as-Judge -Using one language model to evaluate another's outputs has become increasingly popular. This approach can provide more nuanced feedback than traditional metrics, though it comes with its own biases and limitations. +## 其它评测方法 -### Evaluation Arenas -Platforms like Anthropic's Constitutional AI Arena allow models to interact and evaluate each other in controlled environments. This can reveal strengths and weaknesses that might not be apparent in traditional benchmarks. +除了上述评基准,很多机构也开发了其它评测方法,以应对标准化基准测试的缺陷: -### Custom Benchmark Suites -Organizations often develop internal benchmark suites tailored to their specific needs and use cases. These might include domain-specific knowledge tests or evaluation scenarios that mirror actual deployment conditions. +### 用大语言模型作为评审 + +用一个大语言模型去评测另一个大语言模型的输出,这种方法最近开始常用起来。相比于传统的评测指标,这种方法可以提供更细致入微的反馈。缺点是作为评审的大语言模型自己也有偏见和局限性,可能导致评测结果不够好。 + +### 在竞技场内相互评测 + +像 Anthropic's Constitutional AI Arena 这样的平台,可以让模型在里面相互互动、评测。这样的评测场景也有助于模型发现各自的强项和弱点。 + +### 自定义评测基准 + +很多组织也会自己开发对内的评测基准,通常是针对特定的需求或应用场景开发的。这样开发出来的评测基准一般包含特定的专业领域知识、反映产品的实际应哟过场景。 ## Creating Your Own Evaluation Strategy -Remember that while LightEval makes it easy to run standard benchmarks, you should also invest time in developing evaluation methods specific to your use case. +虽然使用 LightEval 可以很方便地进行标准化的基准评测,但作为一个 LLM 开发者,你必须也要针对你们产品的应用场景开发自己的评测方案。标准化的基准评测仅仅是一测评开始的第一步,你绝不能只用它进行模型测试。 -While standard benchmarks provide a useful baseline, they shouldn't be your only evaluation method. Here's how to develop a more comprehensive approach: +如何自定义你的方位测评?方法如下: -1. Start with relevant standard benchmarks to establish a baseline and enable comparison with other models. +1. 首先从相关的标准化基准测试中开始,建立一个基准,保证能够和其它模型进行对比。 -2. Identify the specific requirements and challenges of your use case. What tasks will your model actually perform? What kinds of errors would be most problematic? +2. 针对你的应用场景的独特需求,确认你的模型将会应对的挑战。例如,你的模型上线后将会主要执行什么任务?有可能出现哪些问题?哪些 bad case 是最应该避免的? -3. Develop custom evaluation datasets that reflect your actual use case. This might include: - - Real user queries from your domain - - Common edge cases you've encountered - - Examples of particularly challenging scenarios +3. 开发你自己的测试数据集,以便专门应对你的测试场景。这可能包括: + - 在你的特定领域里,真实用户的请求 + - 常见的边缘案例 + - 可能发生的有挑战性的情况 -4. 
Consider implementing a multi-layered evaluation strategy: - - Automated metrics for quick feedback - - Human evaluation for nuanced understanding - - Domain expert review for specialized applications - - A/B testing in controlled environments +4. 也需要考虑开发一个多层级的评测策略: + - 首先,为了能快速获取反馈,你可以设置一个自动化的评测指标 + - 针对细微的语言理解能力,考虑人工测评 + - 针对专业领域的应用,引入行业专家的评审 + - 在控制变量的环境下,进行 A/B 测试 -## Using LightEval for Benchmarking +## 使用 LightEval 进行基准评测 -LightEval tasks are defined using a specific format: +LightEval 评测任务通过以下格式定义: ``` {suite}|{task}|{num_few_shot}|{auto_reduce} ``` -- **suite**: The benchmark suite (e.g., 'mmlu', 'truthfulqa') -- **task**: Specific task within the suite (e.g., 'abstract_algebra') -- **num_few_shot**: Number of examples to include in prompt (0 for zero-shot) -- **auto_reduce**: Whether to automatically reduce few-shot examples if prompt is too long (0 or 1) +- **suite**:哪一套基准评测方法(比如 'mmlu'、'truthfulqa') +- **task**:这一套基准评测方法中的哪个任务(比如 'abstract_algebra') +- **num_few_shot**:提示词中加入的示例的数量(如果是 0,那就是 zero-shot 测试) +- **auto_reduce**:当提示词太长时,是否自动减少提示词中 few-shot 的样本量 -Example: `"mmlu|abstract_algebra|0|0"` evaluates on MMLU's abstract algebra task with zero-shot inference. +举例来说,`"mmlu|abstract_algebra|0|0"` 就会评测 MMLU 的 abstract algebra 任务,推理是 zero-shot 形式。 -### Example Evaluation Pipeline +### 评测代码示例 -Here's a complete example of setting up and running an evaluation on automatic benchmarks relevant to one specific domain: +以下代码就是一个在某个领域进行自动化评测的示例: ```python from lighteval.tasks import Task, Pipeline @@ -114,7 +119,8 @@ results = pipeline.get_results() pipeline.show_results() ``` -Results are displayed in a tabular format showing: +测评结果会以表格形式打印出来: + ``` | Task |Version|Metric|Value | |Stderr| |----------------------------------------|------:|------|-----:|---|-----:| @@ -124,8 +130,8 @@ Results are displayed in a tabular format showing: |leaderboard:mmlu:high_school_biology:5 | 0|acc |0.1500|± |0.0819| ``` -You can also handle the results in a pandas DataFrame and visualise or represent them as you want. +使用 pandas 的 `DataFrame` 或其它可视化方式呈现结果也是可以的。 -# Next Steps +# 接下来 -⏩ Explore [Custom Domain Evaluation](./custom_evaluation.md) to learn how to create evaluation pipelines tailored to your specific needs +⏩ 学习 [Custom Domain Evaluation](./custom_evaluation_cn.md),了解如何根据你的特定需求创建自定义的评测流程。 diff --git a/4_evaluation/custom_evaluation.md b/4_evaluation/custom_evaluation.md index c82868a7..eb1451f9 100644 --- a/4_evaluation/custom_evaluation.md +++ b/4_evaluation/custom_evaluation.md @@ -1,16 +1,16 @@ -# Custom Domain Evaluation +# 在特定领域进行自定义评测 -While standard benchmarks provide valuable insights, many applications require specialized evaluation approaches tailored to specific domains or use cases. This guide will help you create custom evaluation pipelines that accurately assess your model's performance in your target domain. +虽然标准化的评测基准让我们对模型的性能有了初步的认识,但针对特定的应用场景,我们还需要专门制定评测方法,考察模型在特定领域的表现。本文将带你创建自定义的评测流程,针对你的目标领域对模型进行精准评测。 -## Designing Your Evaluation Strategy +## 设计评测策略 -A successful custom evaluation strategy starts with clear objectives. Consider what specific capabilities your model needs to demonstrate in your domain. This might include technical knowledge, reasoning patterns, or domain-specific formats. Document these requirements carefully - they'll guide both your task design and metric selection. +成功创建自定义评测策略的第一步是确定清晰的目标。你需要考虑在你的特定领域,哪些特殊能力是你的模型需要具备的?这可能涉及技术层面的知识、推理的模式、特定的格式等。你需要认真记录好这些需求,然后参考这些需求去设计测试任务、选择评测指标。 -Your evaluation should test both standard use cases and edge cases. 
For example, in a medical domain, you might evaluate both common diagnostic scenarios and rare conditions. In financial applications, you might test both routine transactions and complex edge cases involving multiple currencies or special conditions. +测试的样例不仅需要包含标准应用场景,也要考虑边缘场景。举例来说,如果是医学领域,常见的诊断场景和罕见情况都是需要考虑的。在金融领域,除了常规交易,复杂交易(比如设计多种货币或特殊条件的情况)的处理能力也需要被测试到。 -## Implementation with LightEval +## 使用 LightEval 的代码实现 -LightEval provides a flexible framework for implementing custom evaluations. Here's how to create a custom task: +LightEval 是一个非常灵活的框架,可以用来实现自定义的测评任务。下面代码展示了如何创建自定义测试任务: ```python from lighteval.tasks import Task, Doc @@ -34,9 +34,9 @@ class CustomEvalTask(Task): return response.strip() == ref.strip() ``` -## Custom Metrics +## 自定义评价指标 -Domain-specific tasks often require specialized metrics. LightEval provides a flexible framework for creating custom metrics that capture domain-relevant aspects of performance: +特定领域的测试任务通常也需要特殊的评价指标。LightEval 也可以灵活地做到这一点: ```python from aenum import extend_enum @@ -72,7 +72,7 @@ custom_metric_group = SampleLevelMetricGrouping( extend_enum(Metrics, "custom_metric_name", custom_metric_group) ``` -For simpler cases where you only need one metric value per sample: +如果每个样例只有一个指标,代码可以是这样: ```python def simple_metric(predictions: list[str], formatted_doc: Doc, **kwargs) -> bool: @@ -92,41 +92,44 @@ simple_metric_obj = SampleLevelMetric( extend_enum(Metrics, "simple_metric", simple_metric_obj) ``` -You can then use your custom metrics in your evaluation tasks by referencing them in the task configuration. The metrics will be automatically computed across all samples and aggregated according to your specified functions. +实现完代码后,你就可以在你的评测任务中引用这些指标的名称,然后在你的评测任务中使用。这些指标会在测试过程中自动在每个样本上计算,并最终统计数值。 -For more complex metrics, consider: -- Using metadata in your formatted documents to weight or adjust scores -- Implementing custom aggregation functions for corpus-level statistics -- Adding validation checks for your metric inputs -- Documenting edge cases and expected behavior +如果需要使用更复杂的评测指标,你还可以实现这些功能: -For a complete example of custom metrics in action, see our [domain evaluation project](./project/README.md). +- 使用元数据,对不同测试样本的分数进行加权或其它调整 +- 对所有样本的指标进行统计时,你可以实现一个自定义的函数(上述示例中 corpus-level 统计使用了取平均的方法) +- 对输入到你的评测指标函数中的数据进行格式检查 +- 记录边缘场景及其期望的行为 -## Dataset Creation -High-quality evaluation requires carefully curated datasets. Consider these approaches for dataset creation: -1. Expert Annotation: Work with domain experts to create and validate evaluation examples. Tools like [Argilla](https://github.com/argilla-io/argilla) make this process more efficient. +你可以学习本章 [domain evaluation project](./project/README_CN.md) 这个项目课程,真正地动手实践一下自定义测评。 -2. Real-World Data: Collect and anonymize real usage data, ensuring it represents actual deployment scenarios. +## 测试数据集的创建 -3. Synthetic Generation: Use LLMs to generate initial examples, then have experts validate and refine them. +高质量的测评需要高质量的测试数据集。在创建数据集时,需要考虑以下方面: -## Best Practices +1. 专家级的标注:与领域专家一起创建和检验测试样本。你可以用 [Argilla](https://github.com/argilla-io/argilla) 高效地进行标注。 -- Document your evaluation methodology thoroughly, including any assumptions or limitations -- Include diverse test cases that cover different aspects of your domain -- Consider both automated metrics and human evaluation where appropriate -- Version control your evaluation datasets and code -- Regularly update your evaluation suite as you discover new edge cases or requirements +2. 真实世界的数据:收集真实数据并进行脱敏,确保这些样本能代表真实部署模型的场景。 -## References +3. 
借助合成数据:使用 LLM 生成一些初始样本,然后让领域专家检查、修改。这样可以助你快速创建数据集。 -- [LightEval Custom Task Guide](https://github.com/huggingface/lighteval/wiki/Adding-a-Custom-Task) -- [LightEval Custom Metrics](https://github.com/huggingface/lighteval/wiki/Adding-a-New-Metric) -- [Argilla Documentation](https://docs.argilla.io) for dataset annotation -- [Evaluation Guidebook](https://github.com/huggingface/evaluation-guidebook) for general evaluation principles +## 最佳实践 -# Next Steps +- 全面记录你的测试方法,包括各种假设和局限 +- 保证测试样本的多样性,确保你的领域内各个方面都能被测试到 +- 如有需要,自动化的测试指标和人工评测都要用上 +- 对测评数据集和代码进行版本控制 +- 定期更新你的评测流程,不断加入新的边缘场景、完善新的需求 -⏩ For a complete example of implementing these concepts, see our [domain evaluation project](./project/README.md). \ No newline at end of file +## 参考资料 + +- [LightEval 如何添加自定义任务](https://github.com/huggingface/lighteval/wiki/Adding-a-Custom-Task) +- [LightEval 如何添加自定义测评指标](https://github.com/huggingface/lighteval/wiki/Adding-a-New-Metric) +- [Argilla 文档](https://docs.argilla.io) 可以用来进行数据标注 +- [评测的指南书籍](https://github.com/huggingface/evaluation-guidebook) 大语言模型评测领域的全面指南 +- +# 接下来 + +⏩ 完整的自定义测评请见本章 [domain evaluation project](./project/README_CN.md)。 \ No newline at end of file diff --git a/4_evaluation/notebooks/lighteval_evaluate_and_analyse_your_LLM.ipynb b/4_evaluation/notebooks/lighteval_evaluate_and_analyse_your_LLM.ipynb index 9597634b..592b84ce 100644 --- a/4_evaluation/notebooks/lighteval_evaluate_and_analyse_your_LLM.ipynb +++ b/4_evaluation/notebooks/lighteval_evaluate_and_analyse_your_LLM.ipynb @@ -6,24 +6,24 @@ "id": "TZVw9f5QYWPL" }, "source": [ - "# lighteval is your AI evaluation library\n", + "# 使用 lighteval 作为你的 AI 测评框架\n", "\n", - "This notebook explores how you can use lighteval to evaluate and compare LLMs.\n", + "本文将带你实践如何使用 lighteval 测评大语言模型,并与其它模型进行对比。\n", "\n", - "`lighteval` has been around a while and it's a great tool for getting eval score on major benchmarks. It's just been refactored to support being used like a library in Python, which makes it great for comparing models across benchmarks.\n", + "使用 `lighteval` 可以很方便地在主流评测基准上计算评测分数。最近,`lighteval` 被进行了一次重构,以便在 Python 中最为一个库进行使用。\n", "\n", - "So let's dig in to some eval scores.\n", + "接下来,我们将试着计算一些分数。\n", "\n", "
\n", - "

Exercise: Evaluate Your Own Model

\n", - "

Now that you've seen how to evaluate models on specific domains, try evaluating a model on a domain that interests you.

\n", - "

Difficulty Levels

\n", - "

🐢 Use the existing medical domain tasks but evaluate a different model from the Hugging Face hub

\n", - "

🐕 Create a new domain evaluation by selecting different MMLU tasks (e.g., computer science, mathematics, physics)

\n", - "

🦁 Create a custom evaluation task using LightEval's task framework and evaluate models on your specific domain

\n", + "

练习:评测你的模型

\n", + "

基于前面的学习,你可以试着选一个你感兴趣的领域,在上面测试模型。

\n", + "

难度等级

\n", + "

🐢 从 Hugging Face hub 另找一个模型,在已有的医学领域测试模型

\n", + "

🐕 选择一个不同的 MMLU 任务(如 comupter science、mathematics、physcics),进行模型测试

\n", + "

🦁 用 LightEval 自定义评测任务,测试你选的模型

\n", "
\n", "\n", - "## Install dependencies" + "## 安装依赖" ] }, { @@ -63,17 +63,17 @@ "id": "TDKs5ShvXw8K" }, "source": [ - "## Setup `lighteval` evaluation\n", + "## 配置 `lighteval` 评测环境和评测流程\n", "\n", - "We need to setup the evaluation environment and pipeline. Much of this we will disable because we're keeping things in the notebook, but we could also use `push_to_hub` or `push_to_tensorboard`.\n", + "我们首先需要配置 `lighteval` 评测环境和评测流程。注意 `push_to_hub` 和 `push_to_tensorboard` 的用法。\n", "\n", "### `push_to_hub`\n", "\n", - "This is useful if we're evaluating a model and want to persist its evaluation with weights and configuration on the Hugging Face hub.\n", + "如果我们想要把评测结果和模型的配置和权重绑定在一起,我们可以用 `push_to_hub` 将其保存到 Hugging Face hub 上。\n", "\n", "### `push_to_tensorboard`\n", "\n", - "This would be useful if we were building an evaluation tool or script, where we wanted to view results within tensorboard." + "如果我们建立了评测工具或脚本用来可视化结果,我们可以使用 `push_to_tensorboard` 来进行可视化。" ] }, { @@ -206,21 +206,21 @@ "id": "nsNjwzCtltkA" }, "source": [ - "# Compares models with `lighteval`\n", + "# 用 `lighteval` 对比不同模型\n", "\n", - "We are going to compare two small LLMs on a domain. We will use `Qwen2.5-0.5B` and `SmolLM2-360M-Instruct` and we will evaluate them on a medical domain.\n", + "这里我们尝试对比两个模型:`Qwen2.5-0.5B` 和 `SmolLM2-360M-Instruct`。我们选择的领域是医学领域。\n", "\n", - "We can create a domain evaluation from a subset of MMLU evaluations, by defining the evaluation tasks. In lighteval, tasks are described as strings.\n", + "这里我们是用 MMLU 的一个字任务创建评测的。在 lighteval 中,不同测试任务可以用字符串指代,格式如下:\n", "\n", "`{suite}|{task}:{subtask}|{num_few_shot}|{0 or 1 to reduce num_few_shot if prompt is too long}`\n", "\n", - "Therefore, we will pass our list of medicine related tasks like this:\n", + "所以,针对我们选择的医学领域,我们可以传入一系列任务,用逗号分隔开:\n", "\n", "```\n", "\"leaderboard|mmlu:anatomy|5|0,leaderboard|mmlu:professional_medicine|5|0,leaderboard|mmlu:high_school_biology|5|0,leaderboard|mmlu:high_school_chemistry|5|0\"\n", "```\n", "\n", - "Which can be translated to :\n", + "上述字符串代表以下四个任务:\n", "\n", "| Suite | Task | Num Fewshot Example | Limit Fewshots |\n", "|---|---|---|---|\n", @@ -229,7 +229,7 @@ "| leaderboard | mmlu:high_school_biology | 5 | False |\n", "| leaderboard | mmlu:high_school_chemistry | 5 | False |\n", "\n", - "For a full list of lighteval supported tasks. Checkout this page in [the documentation](https://github.com/huggingface/lighteval/wiki/Available-Tasks)." + "完整的任务列表,可以参考[这篇文档](https://github.com/huggingface/lighteval/wiki/Available-Tasks)。" ] }, { @@ -249,7 +249,7 @@ "id": "XwcJklSFX4H6" }, "source": [ - "# Evaluate Qwen2.5 0.5B" + "# 评测 Qwen2.5 0.5B" ] }, { @@ -716,7 +716,7 @@ "id": "CIwCaCxJX_hA" }, "source": [ - "# Evaluate SmolLM 360M" + "# 评测 SmolLM 360M" ] }, { @@ -877,9 +877,9 @@ "id": "0HD8aFwSYGHu" }, "source": [ - "# Visualize Results\n", + "# 可视化结果\n", "\n", - "Now that we have results from the two models we can visualize them side-by-side. We'll keep visualisation simple here, but with this data structure you could represent scores in many ways." + "得到结果后,我们可以可视化一下,并列在一起对比。这里我们使用了 `pandas` 简单地实现了可视化操作,不过你也可以使用不同的可视化方法:" ] }, { @@ -909,13 +909,13 @@ "id": "qJEbQeYDplKX" }, "source": [ - "# 💐 That's it!\n", + "# 💐 就是这些!\n", "\n", - "You have a handy notebook to view model evals. 
You could use this to:\n", + "本文提供了标准化评测的简单代码,使用这些代码,你还可以:\n", "\n", - "- select the right model for your inference use case\n", - "- evaluate checkpoints during training\n", - "- share model scores" + "- 选择模型进行推理\n", + "- 在训练过程中针对不同阶段保存的模型实时测试\n", + "- 分享你的测试成绩到开源社区" ] }, { @@ -924,9 +924,9 @@ "id": "jWdS38syaipm" }, "source": [ - "🏃Next Steps\n", + "🏃接下来\n", "\n", - "- If you want to go deeper into your evaluation results check out this [notebook](https://github.com/huggingface/evaluation-guidebook/blob/main/contents/examples/comparing_task_formulations.ipynb)\n" + "- 如果你还想深入学习,可以看看这个 [notebook](https://github.com/huggingface/evaluation-guidebook/blob/main/contents/examples/comparing_task_formulations.ipynb)\n" ] } ], diff --git a/4_evaluation/project/README.md b/4_evaluation/project/README.md index 98d32eca..c896e4af 100644 --- a/4_evaluation/project/README.md +++ b/4_evaluation/project/README.md @@ -1,72 +1,70 @@ -# Domain Specific Evaluation with Argilla, Distilabel, and LightEval +# 用 Argilla、Distilabel 和 LightEval 进行特定领域评测 -Most popular benchmarks look at very general capabilities (reasoning, math, code), but have you ever needed to study more specific capabilities? +绝大多数的常用评测基准关注的都是模型非常基本的能力,比如推理、数学、编程等,而忽略了特定的专业领域能力。如何进行专业领域(如金融、法律、医学等)的模型评测呢? -What should you do if you need to evaluate a model on a **custom domain** relevant to your use-cases? (For example, financial, legal, medical use cases) +本教程将会展示**自定义领域**模型测试的完整流程。我们重点关注数据部分。在相关数据收集、标注测试数据方面,我们会使用 [Argilla](https://github.com/argilla-io/argilla)、[Distilabel](https://github.com/argilla-io/distilabel) 和 [LightEval](https://github.com/huggingface/lighteval) 作为工具,生成考试问题相关的数据。 -This tutorial shows you the full pipeline you can follow, from creating relevant data and annotating your samples to evaluating your model on them, with the easy to use [Argilla](https://github.com/argilla-io/argilla), [distilabel](https://github.com/argilla-io/distilabel), and [lighteval](https://github.com/huggingface/lighteval). For our example, we'll focus on generating exam questions from multiple documents. -## Project Structure +## 项目结构 -For our process, we will follow 4 steps, with a script for each: generating a dataset, annotating it, extracting relevant samples for evaluation from it, and actually evaluating models. +本项目包含四份 Python 代码文件。我们分四个步骤完成模型在自定义领域的测评,每份代码对应一个步骤。这四个步骤分别是:数据生成、数据标注、相关测试样本的提取,以及模型评测。 -| Script Name | Description | +| 代码文件 | 概述 | |-------------|-------------| -| generate_dataset.py | Generates exam questions from multiple text documents using a specified language model. | -| annotate_dataset.py | Creates an Argilla dataset for manual annotation of the generated exam questions. | -| create_dataset.py | Processes annotated data from Argilla and creates a Hugging Face dataset. | -| evaluation_task.py | Defines a custom LightEval task for evaluating language models on the exam questions dataset. | +| generate_dataset.py | 使用一个专门的语言模型,从多个文本文档中生成考试问题。 | +| annotate_dataset.py | 用 Argilla 创建一个数据集,手动为生成的考试问题数据进行标注。 | +| create_dataset.py | 处理标注过的数据,并创建对应的 HuggingFace 数据集。 | +| evaluation_task.py | 自定义了一个 LightEval 任务,在前面建立好的测试数据上测试。 | -## Steps +## 步骤 -### 1. Generate Dataset +### 1. 数据集的生成 -The `generate_dataset.py` script uses the distilabel library to generate exam questions based on multiple text documents. It uses the specified model (default: Meta-Llama-3.1-8B-Instruct) to create questions, correct answers, and incrorect answers (known as distractors). 
You should add you own data samples and you might wish to use a different model. +使用 `generate_dataset.py`,我们可以用 `distilabel` 这个库根据几个文本文档生成一些考试问题。一个特定的模型(这里默认使用 Meta-Llama-3.1-8B-Instruct)被拿来用以生成问题、正确答案和错误答案(错误答案用来作为干扰项)。你需要加入你自己的数据,也可以切换使用别的模型。 -To run the generation: +通过以下命令可以开始生成: ```sh python generate_dataset.py --input_dir path/to/your/documents --model_id your_model_id --output_path output_directory ``` -This will create a [Distiset](https://distilabel.argilla.io/dev/sections/how_to_guides/advanced/distiset/) containing the generated exam questions for all documents in the input directory. +代码运行中将会创建一个 [Distiset](https://distilabel.argilla.io/dev/sections/how_to_guides/advanced/distiset/),它包含根据文档生成的考试问题,其中保存文档的目录是 `input_dir`。 -### 2. Annotate Dataset +### 2. 标注数据集 -The `annotate_dataset.py` script takes the generated questions and creates an Argilla dataset for annotation. It sets up the dataset structure and populates it with the generated questions and answers, randomizing the order of answers to avoid bias. Once in Argilla, you or a domain expert can validate the dataset with the correct answers. +使用 `annotate_dataset.py` 可以将生成的问题创建为一个 Argilla 数据集,用以标注。该程序会构建起数据集结构,并把生成的问题和回答填充进统一结构中,还可以改变数据样本的顺序来避免偏向性。使用 Argilla,你或者一个领域专家可以用选择的方式给出每个问题的正确回答。 -You will see suggested correct answers from the LLM in random order and you can approve the correct answer or select a different one. The duration of this process will depend on the scale of your evaluation dataset, the complexity of your domain data, and the quality of your LLM. For example, we were able to create 150 samples within 1 hour on the domain of transfer learning, using Llama-3.1-70B-Instruct, mostly by approving the correct answer and discarding the incorrect ones. +在标注界面中,已生成的几个回答会随机排列着,供标注人员选择正确答案。这其中,LLM 会提供一个建议的正确回答,你可以选择你认为正确的回答,也可以对选中的正确回答进行编辑改进。标注过程的耗时取决于你的数据集规模、领域内数据的复杂度,以及 LLM 的能力强弱。举例来说,我们是可以借助 Llama-3.1-70B-Instruct 在 1 小时内标注好迁移学习领域 150 个样本的,大多数时候,直接选择正确答案即可。 + +通过以下命令可以开始标注: -To run the annotation process: ```sh python annotate_dataset.py --dataset_path path/to/distiset --output_dataset_name argilla_dataset_name ``` -This will create an Argilla dataset that can be used for manual review and annotation. +这将会创建一个 Argilla 数据集,用来手工检查和标注数据。 ![argilla_dataset](./images/domain_eval_argilla_view.png) -If you're not using Argilla, deploy it locally or on spaces following this [quickstart guide](https://docs.argilla.io/latest/getting_started/quickstart/). +可以参考[这里的指引](https://docs.argilla.io/latest/getting_started/quickstart/),在本地或 Hugging Face 的 space 里部署标注任务。 -### 3. Create Dataset +### 3. 创建数据集 -The `create_dataset.py` script processes the annotated data from Argilla and creates a Hugging Face dataset. It handles both suggested and manually annotated answers. The script will create a dataset with the question, possible answers, and the column name for the correct answer. To create the final dataset: +使用 `create_dataset.py` 可以进一步处理 Argilla 标注的数据,并创建一个 Hugging Face 数据集。这个数据集里每条数据包含这些内容:问题、可能的回答、正确回答(所在的列的名字)。运行以下命令即可创建最终的数据集: ```sh huggingface_hub login python create_dataset.py --dataset_path argilla_dataset_name --dataset_repo_id your_hf_repo_id ``` -This will push the dataset to the Hugging Face Hub under the specified repository. 
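补充说明:如果想了解 `create_dataset.py` 这一步推送数据时大致在做什么,下面是一个极简示意,用 `datasets` 库构建数据集并推送到 Hub。其中的字段名和样本内容都是假设的,实际结构请以脚本实现和下文的数据集截图为准:

```python
from datasets import Dataset

# 假设的字段名与两条虚构样本,仅用于说明数据结构
rows = {
    "question": ["什么是迁移学习?", "下列哪一项属于参数高效微调方法?"],
    "answer_a": ["从零开始训练一个全新模型", "LoRA"],
    "answer_b": ["把预训练模型适配到新任务上", "数据增强"],
    "answer_c": ["一种数据清洗流程", "Dropout"],
    "answer_d": ["一种评测指标", "Batch Normalization"],
    "correct_answer": ["answer_b", "answer_a"],
}

dataset = Dataset.from_dict(rows)
# 运行前需要先执行 `huggingface_hub login` 登录
dataset.push_to_hub("your_hf_repo_id")
```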
You can view the sample dataset on the hub [here](https://huggingface.co/datasets/burtenshaw/exam_questions/viewer/default/train), and a preview of the dataset looks like this: +最终,数据集会被推送到 Hugging Face Hub 里。本示例的数据集已经上传了,可以在[这里](https://huggingface.co/datasets/burtenshaw/exam_questions/viewer/default/train)查看,大致是这样: ![hf_dataset](./images/domain_eval_dataset_viewer.png) -### 4. Evaluation Task - -The `evaluation_task.py` script defines a custom LightEval task for evaluating language models on the exam questions dataset. It includes a prompt function, a custom accuracy metric, and the task configuration. +### 4. 开始测评 -To evaluate a model using lighteval with the custom exam questions task: +使用 `evaluation_task.py`,你可以自定义一个 LightEval 任务,用来在前面创建的数据集上测试模型。具体执行命令如下: ```sh lighteval accelerate \ @@ -76,10 +74,8 @@ lighteval accelerate \ --output_dir "./evals" ``` -You can find detailed guides in lighteval wiki about each of these steps: - -- [Creating a Custom Task](https://github.com/huggingface/lighteval/wiki/Adding-a-Custom-Task) -- [Creating a Custom Metric](https://github.com/huggingface/lighteval/wiki/Adding-a-New-Metric) -- [Using existing metrics](https://github.com/huggingface/lighteval/wiki/Metric-List) - +此外,lighteval 的 wiki 也提供了更详细的讲解: +- [自定义评测任务](https://github.com/huggingface/lighteval/wiki/Adding-a-Custom-Task) +- [自定义评测指标](https://github.com/huggingface/lighteval/wiki/Adding-a-New-Metric) +- [使用已有的评测指标](https://github.com/huggingface/lighteval/wiki/Metric-List) diff --git a/5_vision_language_models/README.md b/5_vision_language_models/README.md index 48d95fed..cc087c22 100644 --- a/5_vision_language_models/README.md +++ b/5_vision_language_models/README.md @@ -1,34 +1,34 @@ -# Vision Language Models +# 视觉语言模型 -## 1. VLM Usage +## 1. 视觉语言模型的用处 -Vision Language Models (VLMs) process image inputs alongside text to enable tasks like image captioning, visual question answering, and multimodal reasoning. +视觉语言模型(Vision Language Models 或简称 VLM)是一种不仅仅接收语言输入、而且可以处理图片输入的模型,它可以支持诸如图像文本描述生成(image captioning)、视觉问答(visual question answering)、多模态推理(multimodal reasoning)等任务。 -A typical VLM architecture consists of an image encoder to extract visual features, a projection layer to align visual and textual representations, and a language model to process or generate text. This allows the model to establish connections between visual elements and language concepts. +一个典型的 VLM 架构包含一个图像编码器(用来提取视觉特征)、一个映射层(用来对齐视觉特征和文本表征)以及一个语言模型(用以处理视觉语言特征并输出文本)。这使得模型得以在视觉元素和语言概念之间建立起连接。 -VLMs can be used in different configurations depending on the use case. Base models handle general vision-language tasks, while chat-optimized variants support conversational interactions. Some models include additional components for grounding predictions in visual evidence or specializing in specific tasks like object detection. +VLM 可以有很多用处。基本的用途包括通用的视觉语言任务,而那些针对聊天对话场景优化过的 VLM 则可以支持对话式的人机互动。还有些模型可以根据图像中的证据预测事实,或者进行特定的任务,如物体检测。 -For more on the technicality and usage of VLMs, refer to the [VLM Usage](./vlm_usage.md) page. +关于 VLM 的更多技术和使用,建议读者在 [VLM 的使用](./vlm_usage_cn.md)这一节中学习。 -## 2. VLM Fine-Tuning +## 2. 视觉语言模型的微调 -Fine-tuning a VLM involves adapting a pre-trained model to perform specific tasks or to operate effectively on a particular dataset. The process can follow methodologies such as supervised fine-tuning, preference optimization, or a hybrid approach that combines both, as introduced in Modules 1 and 2. 
+微调一个 VLM 的过程,通常是指选择一个预训练过的模型,在一个新的数据集上学习处理特定领域的任务。这个过程可以参考本教程第 1 和 2 章的相关方法,比如有监督微调、偏好优化等。 -While the core tools and techniques remain similar to those used for LLMs, fine-tuning VLMs requires additional focus on data representation and preparation for images. This ensures the model effectively integrates and processes both visual and textual data for optimal performance. Given that the demo model, SmolVLM, is significantly larger than the language model used in the previous module, it's essential to explore methods for efficient fine-tuning. Techniques like quantization and PEFT can help make the process more accessible and cost-effective, allowing more users to experiment with the model. +虽然核心工具与技术和大语言模型(LLMs)所用的大致相同,但微调视觉语言模型(VLMs)需要格外关注图像的数据表示与准备。如此才能确保模型有效整合并处理视觉与文本数据,以实现最佳性能。鉴于演示模型 SmolVLM 比前一模块使用的语言模型大得多,探索高效的微调方法至关重要。量化和参数高效微调(PEFT)等技术,能让这一过程更易于操作且成本更低,使更多用户能够对该模型进行试验。 -For detailed guidance on fine-tuning VLMs, visit the [VLM Fine-Tuning](./vlm_finetuning.md) page. +如果你想了解详细的微调 VLM 的技术,你可以学习 [VLM 微调](./vlm_finetuning_cn.md) 这一节。 ## Exercise Notebooks -| Title | Description | Exercise | Link | Colab | +| 标题 | 概述 | 练习 | 链接 | Colab | |-------|-------------|----------|------|-------| -| VLM Usage | Learn how to load and use a pre-trained VLM for various tasks | 🐢 Process an image
🐕 Process multiple images with batch handling
🦁 Process a full video| [Notebook](./notebooks/vlm_usage_sample.ipynb) | Open In Colab | -| VLM Fine-Tuning | Learn how to fine-tune a pre-trained VLM for task-specific datasets | 🐢 Use a basic dataset for fine-tuning
🐕 Try a new dataset
🦁 Experiment with alternative fine-tuning methods | [Notebook](./notebooks/vlm_sft_sample.ipynb)| Open In Colab | +| VLM 的使用 | 学习如何载入并使用一个预训练过的 VLM 来处理各种任务 | 🐢 尝试处理一张图片
🐕 尝试以 batch 的方式一次处理多张图片<br>
🦁 尝试处理一整个视频 | [Notebook](./notebooks/vlm_usage_sample_cn.ipynb) | Open In Colab | +| VLM 微调 | 学习针对特定任务数据集来微调一个预训练过的 VLM | 🐢 使用一个基本数据集进行微调
🐕 尝试使用新数据集
🦁 试验一种不同的微调方法 | [Notebook](./notebooks/vlm_sft_sample_cn.ipynb)| Open In Colab | -## References +## 参考资料 - [Hugging Face Learn: Supervised Fine-Tuning VLMs](https://huggingface.co/learn/cookbook/fine_tuning_vlm_trl) - [Hugging Face Learn: Supervised Fine-Tuning SmolVLM](https://huggingface.co/learn/cookbook/fine_tuning_smol_vlm_sft_trl) - [Hugging Face Learn: Preference Optimization Fine-Tuning SmolVLM](https://huggingface.co/learn/cookbook/fine_tuning_vlm_dpo_smolvlm_instruct) diff --git a/5_vision_language_models/notebooks/vlm_sft_sample.ipynb b/5_vision_language_models/notebooks/vlm_sft_sample.ipynb index 5af7d6a2..9ddf65a6 100644 --- a/5_vision_language_models/notebooks/vlm_sft_sample.ipynb +++ b/5_vision_language_models/notebooks/vlm_sft_sample.ipynb @@ -4,17 +4,17 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Fine-Tuning a VLM \n", + "# 微调一个 VLM \n", "\n", - "This notebook demonstrates how to fine-tune the `HuggingFaceTB/SmolVLM-Instruct` model using the `SFTTrainer` from the `trl` library. The notebook cells run and will finetune the model. You can select your difficulty by trying out different datasets.\n", + "本文将展示如何用 `trl` 库的 `SFTTrainer` 微调 `HuggingFaceTB/SmolVLM-Instruct` 模型。文中的代码均可运行,你可选择不同的微调数据集尝试训练。\n", "\n", "
\n", - "

Exercise: Fine-Tuning SmolVLM with SFTTrainer

\n", - "

Take a dataset from the Hugging Face hub and finetune a model on it.

\n", - "

Difficulty Levels

\n", - "

🐢 Use the `HuggingFaceM4/ChartQA` dataset for SFT training run.

\n", - "

🐕 Use the fine-tuned to model generate a response, and improve upon the base example.

\n", - "

🦁 Try out the other datasets and show improvement.

\n", + "

练习:用 SFTTrainer 微调 SmolVLM

\n", + "

从 Hugging Face hub 上找一个数据集,在该数据集上微调模型

\n", + "

难度等级

\n", + "

🐢 用 `HuggingFaceM4/ChartQA` 数据集进行 SFT

\n", + "

🐕 用微调后的模型生成回答,并在基础示例之上进一步改进效果

\n", + "

🦁 尝试使用不同的数据集进行微调训练,并对比效果

\n", "
" ] }, @@ -92,11 +92,11 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Dataset Preparation\n", + "## 数据集准备\n", "\n", - "We will load a sample dataset and format it for training. The dataset should be structured with input-output pairs, where each input is a prompt and the output is the expected response from the model.\n", + "我们将加载一个示例数据集,并对其进行格式化以用于训练。该数据集应构建为输入-输出对的形式,其中每个输入是一个提示语(包含文本和图像),输出是模型预期的响应。\n", "\n", - "**TRL will format input messages based on the model's chat templates.** They need to be represented as a list of dictionaries with the keys: `role` and `content`,." + "**TRL 将根据模型的聊天模板对输入消息进行格式化。** 它们会被整理成一个字典列表,字典包含的 key 有 `role` 和 `content`。" ] }, { @@ -209,7 +209,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Set Up LoRA" + "## 配置 LoRA" ] }, { @@ -249,9 +249,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Configuring the Trainer\n", + "## 配置 Trainer\n", "\n", - "The `Trainer` is configured with various parameters that control the training process. These include the number of training steps, batch size, learning rate, and evaluation strategy. Adjust these parameters based on your specific requirements and computational resources." + "通过调整 `Trainer` 的各种参数,我们可以控制训练过程。这些参数包括训练步数、batch size、学习率以及评估策略。你可根据具体需求和计算资源来调整这些参数。\n" ] }, { @@ -379,9 +379,9 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Training the Model\n", + "## 训练模型\n", "\n", - "With the trainer configured, we can now proceed to train the model. The training process will involve iterating over the dataset, computing the loss, and updating the model's parameters to minimize this loss." + "训练器配置完成后,我们现在就可以着手训练模型了。训练过程将包括对数据集进行迭代,计算损失值,并更新模型参数以最小化该损失。 " ] }, { @@ -427,13 +427,13 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## 💐 You're done!\n", + "## 💐 完成了!\n", "\n", - "This notebook provided a step-by-step guide to fine-tuning the `HuggingFaceTB/SmolVLM` model using the `SFTTrainer`. By following these steps, you can adapt the model to perform specific tasks more effectively. If you want to carry on working on this course, here are steps you could try out:\n", + "本文提供了一份使用 `SFTTrainer` 对 `HuggingFaceTB/SmolVLM` 模型进行微调的分步指南。按照这些步骤操作,你可以让模型更有效地执行特定任务。如果你想继续学习本课程,不妨尝试以下步骤:\n", "\n", - "- Try this notebook on a harder difficulty\n", - "- Review a colleagues PR\n", - "- Improve the course material via an Issue or PR." 
+ "- 尝试更高难度等级的练习题。\n", + "- 对 GitHub 上的 PRs 进行review。\n", + "- 通过 GitHub 提出 issue 或 PR,为完善本课程材料贡献力量。\n" ] } ], diff --git a/5_vision_language_models/notebooks/vlm_usage_sample.ipynb b/5_vision_language_models/notebooks/vlm_usage_sample.ipynb index 59f29d42..7554d217 100644 --- a/5_vision_language_models/notebooks/vlm_usage_sample.ipynb +++ b/5_vision_language_models/notebooks/vlm_usage_sample.ipynb @@ -4,14 +4,14 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "# Processing images and text with VLMs \n", + "# 用 VLM 处理图片或视频相关的多模态任务\n", "\n", - "This notebook demonstrates how to utilize the `HuggingFaceTB/SmolVLM-Instruct` 4bit-quantized model for various multimodal tasks such as:\n", - "- Visual Question Answering (VQA): Answering questions based on image content.\n", - "- Text Recognition (OCR): Extracting and interpreting text in images.\n", - "- Video Understanding: Describing videos through sequential frame analysis.\n", + "本文以 `HuggingFaceTB/SmolVLM-Instruct` 的 4bit 量化模型为例,展示如何处理以下多模态任务:\n", + "- 视觉问题回答(VQA):基于图片内容回答问题\n", + "- 文本识别(OCR):提取并识别图片中的文本文字\n", + "- 视频理解:通过分析一系列的视频帧,来生成对视频的描述\n", "\n", - "By structuring prompts effectively, you can leverage the model for many applications, such as scene understanding, document analysis, and dynamic visual reasoning." + "通过有效地组织提示语(文本 + 视觉),你可以将该模型应用于诸多领域,如场景理解、文档分析以及动态视觉推理。" ] }, { @@ -76,15 +76,16 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Processing Images" + "## 图像级输入的情况" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ - "Let's start with generating captions and answering questions about an image. We'll also explore processing multiple images.\n", - "### 1. Single Image Captioning" + "我们先探索生成图片的文本描述,以及针对图片提问的情况。后面我们也会探索多图输入的情况。\n", + "\n", + "### 1. 单副图片的文本描述生成" ] }, { @@ -182,8 +183,8 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### 2. Comparing Multiple Images\n", - "The model can process and compare multiple images. Let's determine the common theme between two images." + "### 2. 对比多幅图片\n", + "模型也可以处理并对比多幅图片。我们接下来让模型总结一下两张图片的共同主题。" ] }, { @@ -243,9 +244,10 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "### 🔠 Text Recognition (OCR)\n", - "VLM can also recognize and interpret text in images, making it suitable for tasks like document analysis.\n", - "You could try experimenting on images with denser text." + "### 🔠 文字识别(OCR)\n", + "\n", + "VLM 也可以识别并提取文字,这使得它也可以用来处理诸如文档分析之类的任务。\n", + "你也可以试试使用文字更密集的图片。" ] }, { @@ -310,14 +312,15 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "## Processing videos\n", + "## 视频级输入的情况\n", "\n", - "Visual-Language Models (VLMs) can process videos indirectly by extracting keyframes and reasoning over them in temporal order. While VLMs lack the temporal modeling capabilities of dedicated video models, they can still:\n", - "- Describe actions or events by analyzing sampled frames sequentially.\n", - "- Answer questions about videos based on representative keyframes.\n", - "- Summarize video content by combining textual descriptions of multiple frames.\n", + "视觉语言模型(VLMs)可以通过提取关键帧并按时间顺序对其进行推理,从而间接处理视频。虽然视觉语言模型缺乏专用视频模型的时间建模能力,但它们仍然能够:\n", "\n", - "Let experiment on an example:\n", + "- 通过按顺序分析采样帧来描述动作或事件。\n", + "- 根据具有代表性的关键帧回答有关视频的问题。\n", + "- 通过组合多个帧的文本描述来总结视频内容。\n", + "\n", + "我们用这个视频作为例子:\n", "\n", "