Feature[add model mistralai/ministral3]#4293

Open
learncat163 wants to merge 9 commits into PaddlePaddle:develop from learncat163:feature/add-model-ministral3

Conversation

@learncat163

This PR adds the Ministral 3 models from the mistralai family.

Weights

Support and weight conversion are currently provided for two versions: Ministral-3-3B-Instruct-2512 and Ministral-3-8B-Instruct-2512.

The code can load the original Hugging Face weights directly, and it also supports loading Paddle-format weights directly.

Precision alignment

The TestMistral3DiffAlignment class in tests/transformers/ministral3/test_modeling.py implements the precision-alignment test assertions (top-10 tokens and logits diff).

Top-10 token alignment

Prompt used: 'Hello, how are you today?'

Output token IDs:

| Step | Torch | Paddle | Status |
|------|-------|--------|--------|
| 1    | 1362  | 1362   | OK     |
| 2    | 4525  | 4525   | OK     |
| 3    | 7771  | 7771   | OK     |
| 4    | 1044  | 1044   | OK     |
| 5    | 15412 | 15412  | OK     |
| 6    | 1636  | 1636   | OK     |
| 7    | 1046  | 1046   | OK     |
| 8    | 3075  | 3075   | OK     |
| 9    | 2314  | 2314   | OK     |
| 10   | 1636  | 1636   | OK     |
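The per-step agreement in the table reduces to an element-wise comparison of the two frameworks' generated ids. A minimal sketch, with the ids copied from the table (variable names are illustrative, not the test's actual code):

```python
# Token ids copied from the table above.
torch_ids  = [1362, 4525, 7771, 1044, 15412, 1636, 1046, 3075, 2314, 1636]
paddle_ids = [1362, 4525, 7771, 1044, 15412, 1636, 1046, 3075, 2314, 1636]

# Print a Step / Torch / Paddle / Status row per generated token.
for step, (t, p) in enumerate(zip(torch_ids, paddle_ids), start=1):
    status = "OK" if t == p else "MISMATCH"
    print(f"{step:>4}  {t:>6}  {p:>6}  {status}")

assert torch_ids == paddle_ids, "per-step token ids diverged"
```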

PyTorch generated text: " I'm fine, thank you. How about you"

Paddle generated text: " I'm fine, thank you. How about you"

Logits diff of the final layer's output:

logits max_diff: 5.912781e-05 (threshold: 0.01)
logits mean_diff: 3.532475e-06
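A minimal numpy sketch of the logits-diff assertion (the threshold mirrors the 0.01 above; the tiny arrays are stand-ins for real last-layer logits, not the test's actual data):

```python
import numpy as np

# Stand-in logits; in the real test these are the last-layer outputs
# of the Torch and Paddle models on the same input.
torch_logits = np.array([1.00000, -2.50000, 0.30000], dtype=np.float64)
paddle_logits = np.array([1.00002, -2.50001, 0.29999], dtype=np.float64)

# Element-wise absolute difference, reduced to max and mean.
diff = np.abs(torch_logits - paddle_logits)
max_diff = diff.max()
mean_diff = diff.mean()
print(f"logits max_diff: {max_diff:.6e} (threshold: 0.01)")
print(f"logits mean_diff: {mean_diff:.6e}")
assert max_diff < 0.01
```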

Fine-tuning loss comparison

| step | ms-swift | paddle   |
|------|----------|----------|
| 2    | 1.182000 | 0.982295 |
| 3    | 1.659000 | 0.715159 |
| 4    | 1.533000 | 0.894038 |
| 5    | 1.773000 | 0.490606 |
| 6    | 0.929800 | 1.107558 |
| 7    | 0.995100 | 1.125524 |
| 8    | 0.517400 | 0.672211 |
| 9    | 0.671700 | 0.469049 |
| 10   | 1.126000 | 0.149397 |
| 11   | 1.089000 | 0.210774 |
| 12   | 1.221000 | 0.229611 |
| 13   | 1.037000 | 0.142172 |
| 14   | 0.757500 | 0.381565 |
| 15   | 0.828500 | 0.190437 |
| 16   | 0.630000 | 0.078277 |
| 17   | 0.552100 | 0.073114 |
| 18   | 0.424100 | 0.036691 |
| 19   | 0.732700 | 0.061551 |
| 20   | 0.363100 | 0.018962 |
| 21   | 0.490500 | 0.022519 |
| ...  | ...      | ...      |

total 97
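For tabulating two runs like the comparison above, here is a hedged sketch of pulling per-step loss out of a training log; the log-line format is an assumption for illustration, not what either trainer actually prints:

```python
import re

# Assumed log-line shape; adjust the pattern to the real trainer output.
LOSS_RE = re.compile(r"step[:\s]+(\d+).*?loss[:\s]+([0-9.]+)", re.IGNORECASE)

def extract_losses(lines):
    """Map step number -> loss for every line the pattern matches."""
    return {int(m.group(1)): float(m.group(2))
            for m in map(LOSS_RE.search, lines) if m}

demo = ["step: 2, loss: 1.182000", "step: 3, loss: 1.659000", "saving ckpt"]
parsed = extract_losses(demo)
print(parsed)  # {2: 1.182, 3: 1.659}
```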

Paddle uses the config tests/config/ci/ministral3_sft.yaml.

Note for ms-swift: the following dependency must be installed, otherwise problems may occur: `pip install "mistral-common>=1.8.6" -U`

The ms-swift configuration is as follows:

Template registration (my_register.py):

```python
from swift.template import TemplateMeta, register_template

register_template(
    TemplateMeta(
        template_type='mistral_2512_text',
        prefix=['<s>'],
        prompt=['[INST]{{QUERY}}[/INST]'],
        chat_sep=['</s>'],
        suffix=['</s>'],
        system_prefix=['<s>[SYSTEM_PROMPT]{{SYSTEM}}[/SYSTEM_PROMPT]'],
        default_system=None,
        auto_add_bos=False,
    )
)
print("Registered 'mistral_2512_text' template")
```
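To show what this template produces, here is a hypothetical rendering with plain Python format fields standing in for the {{QUERY}}/{{SYSTEM}} placeholders; the `render` helper is illustrative and not part of ms-swift:

```python
# Template pieces copied from the TemplateMeta above, with Python format
# fields standing in for the {{QUERY}}/{{SYSTEM}} placeholders.
PREFIX = '<s>'
PROMPT = '[INST]{query}[/INST]'
SYSTEM_PREFIX = '<s>[SYSTEM_PROMPT]{system}[/SYSTEM_PROMPT]'

def render(query, system=None):
    # The system prefix replaces the plain prefix when a system prompt is set.
    head = SYSTEM_PREFIX.format(system=system) if system else PREFIX
    return head + PROMPT.format(query=query)

print(render('Hello, how are you today?'))
# -> <s>[INST]Hello, how are you today?[/INST]
```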

```python
from transformers.quantizers.auto import AUTO_QUANTIZER_MAPPING

_fp8_quantizer_cls = AUTO_QUANTIZER_MAPPING.get('fp8')
if _fp8_quantizer_cls:
    _orig_fp8_init = _fp8_quantizer_cls.__init__

    def _patched_fp8_init(self, quantization_config, **kwargs):
        quantization_config.dequantize = True  # convert FP8 weights to BF16 at load time
        _orig_fp8_init(self, quantization_config, **kwargs)

    _fp8_quantizer_cls.__init__ = _patched_fp8_init
    print("[PATCH] FP8 Quantizer: force dequantize=True")
else:
    print("[WARN] FP8 Quantizer not found in AUTO_QUANTIZER_MAPPING")
```
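The FP8 patch follows the standard wrap-the-constructor pattern. A self-contained, generic version of the same shape (the `Quantizer` class here is a toy stand-in, not the transformers one):

```python
# A toy quantizer standing in for the real FP8 quantizer class.
class Quantizer:
    def __init__(self, config, **kwargs):
        self.config = config

# Save the original constructor, then install a wrapper that forces a
# flag before delegating to it -- the same shape as the FP8 patch above.
_orig_init = Quantizer.__init__

def _patched_init(self, config, **kwargs):
    config["dequantize"] = True
    _orig_init(self, config, **kwargs)

Quantizer.__init__ = _patched_init

q = Quantizer({"dtype": "fp8"})
print(q.config)  # {'dtype': 'fp8', 'dequantize': True}
```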
Launch command:

```bash
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
PROJECT_DIR="$(cd "$SCRIPT_DIR/../../../.." && pwd)"
MODEL_PATH="$HOME/llm/mistralai/Ministral-3-8B-Instruct-2512"
TRAIN_DATA="$PROJECT_DIR/tmp/gsm8k/gsm8k_train_swift.jsonl"
OUTPUT_DIR="$PROJECT_DIR/tmp/mistral3-8b-sft-swift"
REGISTER_SCRIPT="$SCRIPT_DIR/my_register.py"

# HF mirror
export HF_ENDPOINT=https://hf-mirror.com

mkdir -p "$OUTPUT_DIR"

echo "========================================"
echo "ms-swift SFT Training - Ministral-3-8B"
echo "========================================"
echo "model:       $MODEL_PATH"
echo "data:        $TRAIN_DATA"
echo "output:      $OUTPUT_DIR"
echo "register:    $REGISTER_SCRIPT"
echo "max_steps:   200"
echo "========================================"

# Check that the model path exists
if [ ! -d "$MODEL_PATH" ]; then
    echo "ERROR: Model path not found: $MODEL_PATH"
    exit 1
fi

# Check that the training data file exists
if [ ! -f "$TRAIN_DATA" ]; then
    echo "ERROR: Training data not found: $TRAIN_DATA"
    echo "Please run prepare_data.py first."
    exit 1
fi

# Activate the swift conda environment and run training
source ~/miniconda3/etc/profile.d/conda.sh && conda activate swift && \
swift sft \
    --model "$MODEL_PATH" \
    --model_type mistral3 \
    --tuner_type full \
    --template mistral_2512_text \
    --custom_register_path "$REGISTER_SCRIPT" \
    --dataset "$TRAIN_DATA" \
    --max_length 2048 \
    --per_device_train_batch_size 1 \
    --learning_rate 2e-5 \
    --optim adamw_torch \
    --weight_decay 0 \
    --max_steps 200 \
    --warmup_steps 2 \
    --gradient_accumulation_steps 1 \
    --num_train_epochs 1 \
    --bf16 true \
    --seed 23 \
    --max_grad_norm -1 \
    --output_dir "$OUTPUT_DIR" \
    --logging_steps 1 \
    --save_strategy no \
    2>&1 | tee "$OUTPUT_DIR/training.log"
```

@paddle-bot

paddle-bot Bot commented Apr 16, 2026

Thanks for your contribution!

@codecov-commenter

Codecov Report

❌ Patch coverage is 61.56584% with 216 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (develop@e660ee5). Learn more about missing BASE report.

| Files with missing lines | Patch % | Lines |
|--------------------------|---------|-------|
| paddleformers/transformers/ministral3/modeling.py | 55.55% | 208 Missing ⚠️ |
| ...leformers/transformers/ministral3/configuration.py | 92.77% | 6 Missing ⚠️ |
| paddleformers/cli/utils/llm_utils.py | 0.00% | 2 Missing ⚠️ |

❌ Your patch status has failed because the patch coverage (61.56%) is below the target coverage (75.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files
@@            Coverage Diff             @@
##             develop    #4293   +/-   ##
==========================================
  Coverage           ?   35.00%           
==========================================
  Files              ?      478           
  Lines              ?    89900           
  Branches           ?        0           
==========================================
  Hits               ?    31467           
  Misses             ?    58433           
  Partials           ?        0           

☔ View full report in Codecov by Sentry.