Feature[add model mistralai/ministral3]#4293

Open
learncat163 wants to merge 9 commits into PaddlePaddle:develop from learncat163:feature/add-model-ministral3

Conversation

@learncat163

This PR adds the Ministral 3 models from the mistralai family.

Weights

Support and weight conversion are currently provided for two versions: Ministral-3-3B-Instruct-2512 and Ministral-3-8B-Instruct-2512.

The code can load the original Hugging Face weights directly, and it also supports loading Paddle-format weights directly.

Precision alignment

The TestMistral3DiffAlignment class in tests/transformers/ministral3/test_modeling.py implements the precision-alignment test assertions (top-10 tokens and logits diff).

Top-10 token alignment

Prompt used: 'Hello, how are you today?'

Output token IDs:

| Step | Torch | Paddle | Status |
|------|-------|--------|--------|
| 1    | 1362  | 1362   | OK     |
| 2    | 4525  | 4525   | OK     |
| 3    | 7771  | 7771   | OK     |
| 4    | 1044  | 1044   | OK     |
| 5    | 15412 | 15412  | OK     |
| 6    | 1636  | 1636   | OK     |
| 7    | 1046  | 1046   | OK     |
| 8    | 3075  | 3075   | OK     |
| 9    | 2314  | 2314   | OK     |
| 10   | 1636  | 1636   | OK     |
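The per-step agreement in the table reduces to an element-wise comparison of the two frameworks' generated ids. A minimal sketch, with the ids copied from the table (variable names are illustrative, not the test's actual code):

```python
# Token ids copied from the table above.
torch_ids  = [1362, 4525, 7771, 1044, 15412, 1636, 1046, 3075, 2314, 1636]
paddle_ids = [1362, 4525, 7771, 1044, 15412, 1636, 1046, 3075, 2314, 1636]

# Print a Step / Torch / Paddle / Status row per generated token.
for step, (t, p) in enumerate(zip(torch_ids, paddle_ids), start=1):
    status = "OK" if t == p else "MISMATCH"
    print(f"{step:>4}  {t:>6}  {p:>6}  {status}")

assert torch_ids == paddle_ids, "per-step token ids diverged"
```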

PyTorch generated text: " I'm fine, thank you. How about you"

Paddle generated text: " I'm fine, thank you. How about you"

Logits diff of the final layer's output:

logits max_diff: 5.912781e-05 (threshold: 0.01)
logits mean_diff: 3.532475e-06
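A minimal numpy sketch of the logits-diff assertion (the threshold mirrors the 0.01 above; the tiny arrays are stand-ins for real last-layer logits, not the test's actual data):

```python
import numpy as np

# Stand-in logits; in the real test these are the last-layer outputs
# of the Torch and Paddle models on the same input.
torch_logits = np.array([1.00000, -2.50000, 0.30000], dtype=np.float64)
paddle_logits = np.array([1.00002, -2.50001, 0.29999], dtype=np.float64)

# Element-wise absolute difference, reduced to max and mean.
diff = np.abs(torch_logits - paddle_logits)
max_diff = diff.max()
mean_diff = diff.mean()
print(f"logits max_diff: {max_diff:.6e} (threshold: 0.01)")
print(f"logits mean_diff: {mean_diff:.6e}")
assert max_diff < 0.01
```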

Fine-tuning loss comparison

| step | ms-swift | paddle   |
|------|----------|----------|
| 2    | 1.182000 | 0.982295 |
| 3    | 1.659000 | 0.715159 |
| 4    | 1.533000 | 0.894038 |
| 5    | 1.773000 | 0.490606 |
| 6    | 0.929800 | 1.107558 |
| 7    | 0.995100 | 1.125524 |
| 8    | 0.517400 | 0.672211 |
| 9    | 0.671700 | 0.469049 |
| 10   | 1.126000 | 0.149397 |
| 11   | 1.089000 | 0.210774 |
| 12   | 1.221000 | 0.229611 |
| 13   | 1.037000 | 0.142172 |
| 14   | 0.757500 | 0.381565 |
| 15   | 0.828500 | 0.190437 |
| 16   | 0.630000 | 0.078277 |
| 17   | 0.552100 | 0.073114 |
| 18   | 0.424100 | 0.036691 |
| 19   | 0.732700 | 0.061551 |
| 20   | 0.363100 | 0.018962 |
| 21   | 0.490500 | 0.022519 |
| ...  | ...      | ...      |

total 97
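For tabulating two runs like the comparison above, here is a hedged sketch of pulling per-step loss out of a training log; the log-line format is an assumption for illustration, not what either trainer actually prints:

```python
import re

# Assumed log-line shape; adjust the pattern to the real trainer output.
LOSS_RE = re.compile(r"step[:\s]+(\d+).*?loss[:\s]+([0-9.]+)", re.IGNORECASE)

def extract_losses(lines):
    """Map step number -> loss for every line the pattern matches."""
    return {int(m.group(1)): float(m.group(2))
            for m in map(LOSS_RE.search, lines) if m}

demo = ["step: 2, loss: 1.182000", "step: 3, loss: 1.659000", "saving ckpt"]
parsed = extract_losses(demo)
print(parsed)  # {2: 1.182, 3: 1.659}
```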

Paddle uses the config tests/config/ci/ministral3_sft.yaml.

Note for ms-swift: the following dependency must be installed, otherwise problems may occur: `pip install "mistral-common>=1.8.6" -U`

The ms-swift configuration is as follows:

Template registration (my_register.py):

```python
from swift.template import TemplateMeta, register_template

register_template(
    TemplateMeta(
        template_type='mistral_2512_text',
        prefix=['<s>'],
        prompt=['[INST]{{QUERY}}[/INST]'],
        chat_sep=['</s>'],
        suffix=['</s>'],
        system_prefix=['<s>[SYSTEM_PROMPT]{{SYSTEM}}[/SYSTEM_PROMPT]'],
        default_system=None,
        auto_add_bos=False,
    )
)
print("Registered 'mistral_2512_text' template")
```
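To show what this template produces, here is a hypothetical rendering with plain Python format fields standing in for the {{QUERY}}/{{SYSTEM}} placeholders; the `render` helper is illustrative and not part of ms-swift:

```python
# Template pieces copied from the TemplateMeta above, with Python format
# fields standing in for the {{QUERY}}/{{SYSTEM}} placeholders.
PREFIX = '<s>'
PROMPT = '[INST]{query}[/INST]'
SYSTEM_PREFIX = '<s>[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT]'

def render(query, system=None):
    # The system prefix replaces the plain prefix when a system prompt is set.
    head = SYSTEM_PREFIX.format(system=system) if system else PREFIX
    return head + PROMPT.format(query=query)

print(render('Hello, how are you today?'))
# -> <s>[INST]Hello, how are you today?[/INST]
```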

```python
from transformers.quantizers.auto import AUTO_QUANTIZER_MAPPING

_fp8_quantizer_cls = AUTO_QUANTIZER_MAPPING.get('fp8')
if _fp8_quantizer_cls:
    _orig_fp8_init = _fp8_quantizer_cls.__init__

    def _patched_fp8_init(self, quantization_config, **kwargs):
        quantization_config.dequantize = True  # convert FP8 weights to BF16 at load time
        _orig_fp8_init(self, quantization_config, **kwargs)

    _fp8_quantizer_cls.__init__ = _patched_fp8_init
    print("[PATCH] FP8 Quantizer: force dequantize=True")
else:
    print("[WARN] FP8 Quantizer not found in AUTO_QUANTIZER_MAPPING")
```
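The FP8 patch follows the standard wrap-the-constructor pattern. A self-contained, generic version of the same shape (the `Quantizer` class here is a toy stand-in, not the transformers one):

```python
# A toy quantizer standing in for the real FP8 quantizer class.
class Quantizer:
    def __init__(self, config, **kwargs):
        self.config = config

# Save the original constructor, then install a wrapper that forces a
# flag before delegating to it -- the same shape as the FP8 patch above.
_orig_init = Quantizer.__init__

def _patched_init(self, config, **kwargs):
    config["dequantize"] = True
    _orig_init(self, config, **kwargs)

Quantizer.__init__ = _patched_init

q = Quantizer({"dtype": "fp8"})
print(q.config)  # {'dtype': 'fp8', 'dequantize': True}
```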
Launch command:

```bash
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_DIR="$(cd "$SCRIPT_DIR/../../../.." && pwd)"
MODEL_PATH="$HOME/llm/mistralai/Ministral-3-8B-Instruct-2512"
TRAIN_DATA="$PROJECT_DIR/tmp/gsm8k/gsm8k_train_swift.jsonl"
OUTPUT_DIR="$PROJECT_DIR/tmp/mistral3-8b-sft-swift"
REGISTER_SCRIPT="$SCRIPT_DIR/my_register.py"

# HF mirror
export HF_ENDPOINT=https://hf-mirror.com

mkdir -p "$OUTPUT_DIR"

echo "========================================"
echo "ms-swift SFT Training - Ministral-3-8B"
echo "========================================"
echo "model:       $MODEL_PATH"
echo "data:        $TRAIN_DATA"
echo "output:      $OUTPUT_DIR"
echo "register:    $REGISTER_SCRIPT"
echo "max_steps:   200"
echo "========================================"

# Check that the model path exists
if [ ! -d "$MODEL_PATH" ]; then
    echo "ERROR: Model path not found: $MODEL_PATH"
    exit 1
fi

# Check that the training data file exists
if [ ! -f "$TRAIN_DATA" ]; then
    echo "ERROR: Training data not found: $TRAIN_DATA"
    echo "Please run prepare_data.py first."
    exit 1
fi

# Activate the swift conda environment and run training
source ~/miniconda3/etc/profile.d/conda.sh && conda activate swift && \
swift sft \
    --model "$MODEL_PATH" \
    --model_type mistral3 \
    --tuner_type full \
    --template mistral_2512_text \
    --custom_register_path "$REGISTER_SCRIPT" \
    --dataset "$TRAIN_DATA" \
    --max_length 2048 \
    --per_device_train_batch_size 1 \
    --learning_rate 2e-5 \
    --optim adamw_torch \
    --weight_decay 0 \
    --max_steps 200 \
    --warmup_steps 2 \
    --gradient_accumulation_steps 1 \
    --num_train_epochs 1 \
    --bf16 true \
    --seed 23 \
    --max_grad_norm -1 \
    --output_dir "$OUTPUT_DIR" \
    --logging_steps 1 \
    --save_strategy no \
    2>&1 | tee "$OUTPUT_DIR/training.log"
```

@paddle-bot

paddle-bot Bot commented Apr 16, 2026

Thanks for your contribution!

@codecov-commenter

Codecov Report

❌ Patch coverage is 61.56584% with 216 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@e660ee5). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| paddleformers/transformers/ministral3/modeling.py | 55.55% | 208 Missing ⚠️ |
| ...leformers/transformers/ministral3/configuration.py | 92.77% | 6 Missing ⚠️ |
| paddleformers/cli/utils/llm_utils.py | 0.00% | 2 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (61.56%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #4293   +/-   ##
==========================================
  Coverage           ?   35.00%           
==========================================
  Files              ?      478           
  Lines              ?    89900           
  Branches           ?        0           
==========================================
  Hits               ?    31467           
  Misses             ?    58433           
  Partials           ?        0           

☔ View full report in Codecov by Sentry.