Multimodal GRPO/OPD: wrong port, loss drops to 0, rollout keeps printing AttributeError: 'GKDTrainer' object has no attribute 'is_multimodal' #9439

@Asunatan

Checklist

  • I have searched existing issues, and this is a new question or discussion topic.

Question Description

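Hello, I have just started working on multimodal OPD. I first ran LoRA + SFT and am now running LoRA + OPD on top of it to recover instruction-following ability. My LoRA is applied to the LLM. Here are my scripts.
Launch the teacher model: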

MAX_PIXELS=602112 \
CUDA_VISIBLE_DEVICES=0 \
vllm serve /home/user02/SCY/Model/Qwen3.6-27B \
    --host 10.116.39.70 \
    --port 6000 \
    --tensor-parallel-size 1 \
    --max-model-len 20480 \
    --gpu-memory-utilization 0.9 \
    --max-logprobs 64 \
    --limit-mm-per-prompt '{"image": 100,"video": 1}' \
    --reasoning-parser qwen3

Launch the async rollout server:

 MAX_PIXELS=602112 \
 CUDA_VISIBLE_DEVICES=2 \
 ROOT_IMAGE_DIR=/home/user02/SCY/thyroid_benchmark_desensitization \
 swift rollout \
     --model /home/user02/SCY/Model/Qwen3.5-9B \
     --vllm_use_async_engine true \
     --vllm_max_model_len 20480 \
     --vllm_gpu_memory_utilization 0.9 \
     --vllm_enable_lora true \
     --host 10.116.39.70 \
     --port 6001 \
     --vllm_max_lora_rank 64 \
     --vllm_limit_mm_per_prompt '{"image": 100,"video": 1}'

Here I noticed a problem: I specified port 6001, but when I checked the log after startup, it showed:

[Images: two screenshots of the rollout server log]

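It printed many warnings, for example: (EngineCore pid=235500) WARNING 05-28 16:32:28 [model_manager.py:373] Regarding Qwen3_5ForConditionalGeneration, no matching PunicaWrapper is found; visual.blocks.25.attn.qkv will be ignored. I asked a chatbot, which suggested setting --disable-punica. I am not sure about that, and the warnings do not seem to have any effect. In addition, the server ended up on port 6007 even though I explicitly set 6001. That is strange, but it does not matter much: I only need to pass --vllm_server_port 6007 in the OPD launch script.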
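
Before hard-coding 6007, one way to confirm which port actually accepts connections is a plain TCP probe. The sketch below is only a diagnostic aid, not part of ms-swift or vLLM: probe_port is a hypothetical helper name, and it assumes bash (for its /dev/tcp redirection) and the GNU coreutils timeout command are available on the machine.

```shell
# probe_port HOST PORT
# Hypothetical helper: prints "open" if a TCP connection to HOST:PORT
# succeeds within 2 seconds, otherwise prints "closed".
# Assumes bash (for /dev/tcp) and GNU coreutils `timeout` are installed.
probe_port() {
    local host="$1" port="$2"
    # bash opens a TCP socket when redirecting to /dev/tcp/HOST/PORT;
    # a refused or timed-out connection makes the command fail.
    if timeout 2 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
        echo "open"
    else
        echo "closed"
    fi
}
```

With the setup above, running `probe_port 10.116.39.70 6001` and `probe_port 10.116.39.70 6007` would show which of the two ports is really listening. An open port only proves that something accepts TCP connections there, not that it is the rollout server.
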
Launch OPD training:

PYTORCH_CUDA_ALLOC_CONF='expandable_segments:True' \
NNODES=1 \
NODE_RANK=0 \
NPROC_PER_NODE=4 \
MAX_PIXELS=602112 \
MASTER_PORT=10086 \
ROOT_IMAGE_DIR=/home/user02/SCY/thyroid_benchmark_desensitization \
CUDA_VISIBLE_DEVICES=4,5,6,7 \
swift rlhf \
    --rlhf_type gkd \
    --model /home/user02/SCY/Model/Qwen3.5-9B \
    --enable_thinking true \
    --teacher_model_server http://10.116.39.70:6000 \
    --gkd_logits_topk 64 \
    --tuner_type lora \
    --lora_rank 64 \
    --lora_alpha 128 \
    --target_modules all-linear \
    --freeze_vit True \
    --freeze_aligner True \
    --freeze_llm false \
    --adapters /home/user02/SCY/thyroid_benchmark/code/swift/checkpoint/v6-20260516-002949/checkpoint-1042 \
    --use_vllm true \
    --vllm_mode server \
    --vllm_server_host 10.116.39.70 \
    --vllm_server_port 6007 \
    --max_completion_length 10240 \
    --max_length 10240 \
    --truncation_strategy delete \
    --lmbda 1 \
    --seq_kd false \
    --beta 0.5 \
    --dataset  /home/user02/SCY/thyroid_benchmark/code/swift/data_json/train_RL.json \
    --attn_impl flash_attention_2  \
    --torch_dtype bfloat16 \
    --load_from_cache_file True \
    --split_dataset_ratio 0.0 \
    --num_train_epochs 1 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --learning_rate 1e-5 \
    --gradient_accumulation_steps 8 \
    --save_steps 20 \
    --logging_steps 1 \
    --warmup_ratio 0.1 \
    --dataloader_num_workers 32 \
    --output_dir /home/user02/SCY/thyroid_benchmark/code/swift/checkpoint/OPD \
    --dataset_num_proc 128 \
    --deepspeed zero2 \
    --packing True \
    --report_to tensorboard
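
As a quick sanity check on the numbers in the three scripts, the derived quantities can be computed directly. This is only a sketch: the variable names are made up for illustration, the batch-size figure ignores the effect of --packing, and the length check assumes the worst case is a full-length prompt (--max_length) followed by a full-length completion (--max_completion_length), which may not match how the trainer actually applies these limits.

```shell
# Values copied from the scripts above (variable names are illustrative).
nproc_per_node=4
per_device_train_batch_size=1
gradient_accumulation_steps=8
max_length=10240
max_completion_length=10240
server_max_model_len=20480

# Samples consumed per optimizer step across all ranks (ignoring packing).
effective_batch=$(( nproc_per_node * per_device_train_batch_size * gradient_accumulation_steps ))
echo "samples per optimizer step: ${effective_batch}"

# ASSUMPTION: worst case is a full-length prompt plus a full-length completion.
worst_case_tokens=$(( max_length + max_completion_length ))
if [ "${worst_case_tokens}" -le "${server_max_model_len}" ]; then
    echo "length budget ok: ${worst_case_tokens} <= ${server_max_model_len}"
else
    echo "length budget exceeded: ${worst_case_tokens} > ${server_max_model_len}"
fi
```

Under these assumptions the worst-case sequence exactly fills the 20480-token limit configured on both servers, so there is no headroom if any extra tokens are added on the server side.
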

I then ran into an even stranger problem: the loss drops straight to 0 after the second update:

[Image: screenshot of the training log]

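Could you tell me whether any of the parameters in my three scripts are set incorrectly? I am online 366 days a year, 25 hours a day, and look forward to your reply.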
