This error is reported when starting on an A100. How can it be resolved? #132

### System Info

Following https://github.com/vllm-project/vllm/issues/32637, I am loading the model with the vLLM Docker image.

### Who can help?

The model is loaded with the vLLM Docker image, and the following error is reported:
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] WorkerProc failed to start.
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] Traceback (most recent call last):
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 737, in worker_main
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766]     worker = WorkerProc(*args, **kwargs)
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766]              ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 575, in __init__
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766]     self.worker.load_model()
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 274, in load_model
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766]     self.model_runner.load_model(eep_scale_up=eep_scale_up)
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766]   File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3839, in load_model
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766]     self.model = model_loader.load_model(
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766]                  ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 58, in load_model
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766]     self.load_weights(model, model_config)
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766]   File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 301, in load_weights
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766]     raise ValueError(
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] ValueError: Following weights were not initialized from checkpoint: {'model.layers.35.self_attn.q_b_proj.weight', 'model.layers.25.self_attn.kv_b_proj.weight', 'model.layers.35.post_attention_layernorm.weight', 'model.layers.35.input_layernorm.weight', 'model.layers.25.self_attn.q_b_proj.weight', 'model.layers.25.self_attn.kv_a_proj_with_mqa.weight', 'model.layers.25.post_attention_layernorm.weight', 'model.layers.25.self_attn.kv_a_layernorm.weight', 'model.layers.35.mlp.experts.w13_weight', 'model.layers.25.mlp.experts.w2_weight', 'model.layers.25.mlp.shared_experts.gate_up_proj.weight', 'model.layers.35.mlp.gate.e_score_correction_bias', 'model.layers.35.mlp.shared_experts.down_proj.weight', 'model.layers.35.mlp.shared_experts.gate_up_proj.weight', 'model.layers.25.mlp.gate.e_score_correction_bias', 'model.layers.25.self_attn.q_a_layernorm.weight', 'model.layers.25.self_attn.o_proj.weight', 'model.layers.25.self_attn.q_a_proj.weight', 'model.layers.25.mlp.shared_experts.down_proj.weight', 'model.layers.35.self_attn.q_a_layernorm.weight', 'model.layers.35.self_attn.o_proj.weight', 'model.layers.35.self_attn.kv_a_proj_with_mqa.weight', 'model.layers.25.mlp.experts.w13_weight', 'model.layers.35.self_attn.kv_a_layernorm.weight', 'model.layers.25.input_layernorm.weight', 'model.layers.35.self_attn.q_a_proj.weight', 'model.layers.25.mlp.gate.weight', 'model.layers.35.self_attn.kv_b_proj.weight', 'model.layers.35.mlp.experts.w2_weight', 'model.layers.35.mlp.gate.weight'}
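
To make the error easier to read, the uninitialized weight names can be grouped by layer index. Every name in the message above falls under `model.layers.25` or `model.layers.35`. The following is a minimal sketch, not part of vLLM: `group_by_layer` is a hypothetical helper, and `sample` is a shortened excerpt of the message rather than the full set.

```python
import re
from collections import defaultdict


def group_by_layer(error_text: str) -> dict[int, list[str]]:
    """Group the quoted weight names in the error text by layer index.

    Hypothetical helper for reading the log; it only parses text and
    does not touch vLLM or the checkpoint.
    """
    # Each match is a (full_name, layer_index) tuple.
    matches = re.findall(r"'(model\.layers\.(\d+)\.[^']+)'", error_text)
    grouped = defaultdict(list)
    for full_name, layer in matches:
        grouped[int(layer)].append(full_name)
    return {layer: sorted(names) for layer, names in sorted(grouped.items())}


# Shortened excerpt of the ValueError above (three of the listed names).
sample = (
    "ValueError: Following weights were not initialized from checkpoint: "
    "{'model.layers.35.self_attn.q_b_proj.weight', "
    "'model.layers.25.self_attn.kv_b_proj.weight', "
    "'model.layers.35.mlp.gate.weight'}"
)

result = group_by_layer(sample)
for layer, names in result.items():
    print(f"layer {layer}: {len(names)} uninitialized weight(s)")
```

Pasting the full log line in place of `sample` lists every affected weight per layer, which can help when comparing against the files in the model directory.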

### Information

- [x] The official example scripts
- [x] My own modified scripts

### Reproduction

This is the docker compose configuration:
    environment:
      VLLM_USE_FLASH_ATTN: "1"
      #VLLM_USE_TRITON: "0"
      #VLLM_DISABLE_CUDA_GRAPHS: "1"
      VLLM_ATTENTION_BACKEND: "FLASH_ATTN"
      #VLLM_USE_FLASHINFER_SAMPLER: "0"
      #NVIDIA_TF32_OVERRIDE: "0"
    ports:
      - "8888:8000"
    volumes:
      - /data/ai/models/vllm:/models
    command:
      --tensor-parallel-size 2
      --gpu-memory-utilization 0.95
      --model /models/GLM-4.7-Flash
      --max-num-batched-tokens 65536
      --block-size 16
      --max-model-len 32768
      --max_num_seqs 16
      --tool-call-parser glm47
      --reasoning-parser glm45
      --enable-auto-tool-choice
      --trust-remote-code
      --served-model-name GLM_4.7_30B
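
One thing that may be worth ruling out is an incomplete model directory behind the `/data/ai/models/vllm:/models` mount. The sketch below assumes the checkpoint is stored as sharded safetensors with a Hugging Face style `model.safetensors.index.json` (this is an assumption, the issue does not show the directory contents). `missing_shards` and `layers_in_index` are hypothetical helper names, not vLLM functions.

```python
import json
import re
from pathlib import Path

INDEX_NAME = "model.safetensors.index.json"  # assumed Hugging Face index file


def _weight_map(model_dir: str) -> dict[str, str]:
    """Load the weight-name -> shard-file mapping from the index file."""
    index = json.loads((Path(model_dir) / INDEX_NAME).read_text())
    return index["weight_map"]


def missing_shards(model_dir: str) -> list[str]:
    """Return shard files named in the index that are absent or empty on disk."""
    root = Path(model_dir)
    shards = sorted(set(_weight_map(model_dir).values()))
    return [
        shard
        for shard in shards
        if not (root / shard).is_file() or (root / shard).stat().st_size == 0
    ]


def layers_in_index(model_dir: str) -> list[int]:
    """Return the sorted layer indices that have at least one weight in the index."""
    layers = set()
    for name in _weight_map(model_dir):
        match = re.match(r"model\.layers\.(\d+)\.", name)
        if match:
            layers.add(int(match.group(1)))
    return sorted(layers)
```

On the host, the directory corresponding to `--model /models/GLM-4.7-Flash` would be `/data/ai/models/vllm/GLM-4.7-Flash`, so `missing_shards("/data/ai/models/vllm/GLM-4.7-Flash")` returning a non-empty list would point to an incomplete download. An empty list does not prove the files are intact; it only shows that every shard named in the index is present and non-empty.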

### Expected behavior

I expect the server to start normally without any errors.
