Releases: modelscope/ms-swift

Patch release v4.1.2

18 Apr 15:56

Patch release v4.1.1

13 Apr 14:21

v4.1.0

07 Apr 07:54

English Version

New Features

  1. Megatron-SWIFT
    a. mcore-bridge has been split from ms-swift into an independent repository, providing megatron-core model definitions for state-of-the-art models: https://github.com/modelscope/mcore-bridge
    b. Support for GRPO Router Replay via the --router_replay_mode parameter. Thanks to @XianlongLi from the CMB Tech team for the contribution.
    c. Qwen3.5 removes the TP size restriction imposed by num_query_groups, with added support for CP, sequence packing, and multimodal MTP. Refer to the Qwen3.5 best practices: https://swift.readthedocs.io/zh-cn/latest/BestPractices/Qwen3_5-Best-Practice.html
    d. New model support: GLM-5, DeepSeek-V3.2, and MiniMax-M2.5.
    e. Support for muon and dist_muon optimizers. Training script reference: https://github.com/modelscope/ms-swift/blob/main/examples/megatron/muon.sh
    f. Support for --tuner_type lora_llm, enabling LoRA training on the LLM component and full-parameter training on ViT/Aligner. Training script reference: https://github.com/modelscope/ms-swift/tree/main/examples/megatron/multimodal/lora_llm_vit_full
  2. RL
    a. Support for the OPSD algorithm, with the ability to set the teacher model as the training model and configure teacher_prompt. Refer to: https://swift.readthedocs.io/zh-cn/latest/Instruction/GKD.html#opsd-on-policy-self-distillation
    b. Support for the REAL algorithm via the --loss_type real parameter. Thanks to @li2zhi from the CMB Tech team for the contribution.
    c. Support for QLoRA GRPO. Refer to: https://github.com/modelscope/ms-swift/blob/main/examples/train/grpo/internal/qlora.sh
    d. Added clamp operation to GRPO K3-KL computation for training stability.
    e. Changed the default value of top-k from 50 to -1, and top-p from 0.95 to 1.
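The clamp added in 2d can be illustrated with a minimal, self-contained sketch (the function name and the clamp bound here are illustrative assumptions, not ms-swift's internal API). The k3 estimator exp(r) - r - 1 of the KL divergence is non-negative but explodes for large log-ratios, so the log-ratio is clamped first:

```python
import math

def k3_kl(policy_logprob, ref_logprob, clamp=20.0):
    # Per-token log-ratio between reference and policy. Without clamping,
    # a large ratio makes exp(log_ratio) blow up and destabilize training.
    log_ratio = ref_logprob - policy_logprob
    log_ratio = max(-clamp, min(clamp, log_ratio))
    # k3 estimator of KL(policy || ref): exp(r) - r - 1 >= 0
    return math.exp(log_ratio) - log_ratio - 1.0

k3_kl(-1.0, -1.0)  # identical log-probs -> 0.0
```

On 2e, top-k = -1 follows the common sampling convention (e.g. in vLLM) of disabling top-k truncation, and top-p = 1 likewise disables nucleus filtering, so the new defaults sample from the full distribution.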
  3. Training
    a. Improved support for YAML-based launch configurations. Refer to: https://github.com/modelscope/ms-swift/tree/main/examples/yaml
    b. Added architecture documentation: https://swift.readthedocs.io/zh-cn/latest/Customization/Architecture.html
    c. Added Metax support best practices: https://swift.readthedocs.io/zh-cn/latest/BestPractices/Metax-support.html
    d. Added support for installing ms-swift via uv.
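
The YAML launch in 3a lets command-line flags live in a version-controlled file. A minimal illustrative fragment (the key names simply mirror the CLI flags; the exact schema and invocation are documented in the linked examples/yaml directory, and the model/dataset names below are placeholders):

```yaml
# train.yaml — illustrative only; keys mirror the command-line flags
model: Qwen/Qwen2.5-7B-Instruct
train_type: lora
dataset:
  - AI-ModelScope/alpaca-gpt4-data-en
num_train_epochs: 1
output_dir: output
```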

New Models

  1. Text-Only Models
    a. MiniMax/MiniMax-M2.5
    b. deepseek-ai/DeepSeek-V3.2
    c. Alibaba-AAIG/YuFeng-XGuard-Reason-0.6B series (Thanks to @ciaoyizhen for the contribution)
  2. Multimodal Models
    a. google/gemma-4-E2B-it series. Training script reference: https://github.com/modelscope/ms-swift/blob/main/examples/models/gemma4/train.sh

Patch release v4.0.4

03 Apr 22:36

Patch release v4.0.3

29 Mar 04:21

Patch release v4.0.2

14 Mar 14:20

Patch release v4.0.1

08 Mar 04:33

v4.0.0

03 Mar 08:25

English Version

New Features

  1. Architecture Optimization
    a. Refactored the directory structure and optimized module dependencies using a modular design, improving the architecture's extensibility and customizability.
    b. Decoupling of model_type and template to simplify support for models with multiple templates under the same model_type.
    c. Rewrote the Megatron-SWIFT training loop on top of megatron-core, replacing the megatron-lm dependency. (Compatible with Ascend NPU)
  2. Megatron-SWIFT
    a. New model support: Qwen3.5 series, GLM4.7-Flash, MiniMax-M2.1, OLMoE.
    b. Embedding task support. Training example: https://github.com/modelscope/ms-swift/tree/main/examples/megatron/embedding
    c. Reranker task support. Training example: https://github.com/modelscope/ms-swift/tree/main/examples/megatron/reranker
    d. Added the save_total_limit parameter to automatically clean up older checkpoints while always retaining the best-metric and the latest weights.
    e. Added apply_wd_to_qk_layernorm parameter for Qwen3-Next/Qwen3.5 to support weight decay on qk layernorm.
    f. LoRA for multimodal MoE models now supports the --target_modules all-router configuration.
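
The retention rule in 2d can be sketched in a few lines (a hypothetical helper, not ms-swift's internal API): drop the oldest checkpoints beyond the limit, but never the best-metric or the most recent one.

```python
def prune_checkpoints(checkpoints, save_total_limit, best):
    # `checkpoints` is ordered oldest -> newest; `best` is the
    # best-metric checkpoint and must appear in the list.
    keep = set(checkpoints[-save_total_limit:]) | {best, checkpoints[-1]}
    return [c for c in checkpoints if c in keep]

prune_checkpoints(["ckpt-100", "ckpt-200", "ckpt-300", "ckpt-400"],
                  save_total_limit=2, best="ckpt-100")
# -> ["ckpt-100", "ckpt-300", "ckpt-400"]
```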
  3. RL
    a. Support for the GDPO algorithm for advantage computation via the --scale_rewards gdpo parameter. (Thanks to @Auraithm)
    b. GKD supports computing the KL over top-k logits to save GPU memory, via the --gkd_topk_logits parameter.
    c. GKD supports using a teacher server, avoiding explicitly loading the teacher model.
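
The memory saving in 3b comes from evaluating the KL on only the teacher's k most probable tokens rather than the full vocabulary. A minimal sketch of that idea (the function name and the renormalization over the top-k subset are assumptions about the approach, not ms-swift's exact implementation):

```python
import math

def topk_kl(teacher_logprobs, student_logprobs, k):
    # Indices of the teacher's k most probable tokens.
    idx = sorted(range(len(teacher_logprobs)),
                 key=lambda i: teacher_logprobs[i], reverse=True)[:k]
    # Renormalize both distributions over that subset,
    # then compute KL(teacher || student) on it.
    t = [math.exp(teacher_logprobs[i]) for i in idx]
    s = [math.exp(student_logprobs[i]) for i in idx]
    zt, zs = sum(t), sum(s)
    t = [p / zt for p in t]
    s = [p / zs for p in s]
    return sum(tp * math.log(tp / sp) for tp, sp in zip(t, s))
```

Only k student logits per position enter the loss, which is what cuts activation memory when the vocabulary is large.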
  4. Training
    a. Added Muon-CLIP optimizer support. Training example: https://github.com/modelscope/ms-swift/blob/main/examples/train/optimizer/muonclip.sh (Thanks to @vx120)
    b. Dependency updates: Compatible with latest dependencies including python3.12, transformers 5.2.0, vllm 0.15.1, trl 0.28, liger-kernel 0.7.0, etc.
    c. Optimized generative reranker lm_head computation to reduce memory usage.
    d. FSDP2 supports enabling CPU offload; added DeepSpeed elastic support. (Thanks to @meichangsu1 from the CMB Tech team)

New Models

  1. Text-Only Models
    a. Qwen/Qwen3-Coder-Next
    b. ZhipuAI/GLM-4.7-Flash, ZhipuAI/GLM-5
    c. MiniMaxAI/MiniMax-M2.1
    d. Tencent-YouTu-Research/Youtu-LLM-2B
    e. IQuestLab/IQuest-Coder-V1-40B-Instruct
    f. allenai/OLMoE-1B-7B-0924-Instruct series (Thanks to @qianhao0713)
  2. Multimodal Models
    a. Qwen/Qwen3.5-35B-A3B, Qwen/Qwen3.5-9B series. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/models/qwen3_5
    b. Qwen3-VL-Embedding, Qwen3-VL-Reranker. Training scripts: https://github.com/modelscope/ms-swift/tree/main/examples/train/embedding/qwen3, https://github.com/modelscope/ms-swift/tree/main/examples/train/reranker/qwen3
    c. deepseek-ai/DeepSeek-OCR-2
    d. ZhipuAI/GLM-OCR
    e. PaddlePaddle/PaddleOCR-VL-1.5
    f. OpenBMB/MiniCPM-o-4_5
    g. stepfun-ai/Step3-VL-10B
    h. google/medgemma-4b-it series

Patch release v3.12.6

28 Feb 01:46

What's Changed

Full Changelog: v3.12.5...v3.12.6

Patch release v3.12.5

14 Feb 10:10
