Error when starting on A100: how can this be resolved? #132

@simonjhy

System Info

Following https://github.com/vllm-project/vllm/issues/32637, I am loading the model with the vLLM Docker image.

Who can help?

Loading the model with the vLLM Docker image fails with the following error:
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] WorkerProc failed to start.
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] Traceback (most recent call last):
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 737, in worker_main
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] worker = WorkerProc(*args, **kwargs)
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/executor/multiproc_executor.py", line 575, in __init__
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] self.worker.load_model()
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_worker.py", line 274, in load_model
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] File "/usr/local/lib/python3.12/dist-packages/vllm/v1/worker/gpu_model_runner.py", line 3839, in load_model
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] self.model = model_loader.load_model(
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] ^^^^^^^^^^^^^^^^^^^^^^^^
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/base_loader.py", line 58, in load_model
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] self.load_weights(model, model_config)
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] File "/usr/local/lib/python3.12/dist-packages/vllm/model_executor/model_loader/default_loader.py", line 301, in load_weights
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] raise ValueError(
(Worker_TP1 pid=339) ERROR 01-21 20:31:50 [multiproc_executor.py:766] ValueError: Following weights were not initialized from checkpoint: {'model.layers.35.self_attn.q_b_proj.weight', 'model.layers.25.self_attn.kv_b_proj.weight', 'model.layers.35.post_attention_layernorm.weight', 'model.layers.35.input_layernorm.weight', 'model.layers.25.self_attn.q_b_proj.weight', 'model.layers.25.self_attn.kv_a_proj_with_mqa.weight', 'model.layers.25.post_attention_layernorm.weight', 'model.layers.25.self_attn.kv_a_layernorm.weight', 'model.layers.35.mlp.experts.w13_weight', 'model.layers.25.mlp.experts.w2_weight', 'model.layers.25.mlp.shared_experts.gate_up_proj.weight', 'model.layers.35.mlp.gate.e_score_correction_bias', 'model.layers.35.mlp.shared_experts.down_proj.weight', 'model.layers.35.mlp.shared_experts.gate_up_proj.weight', 'model.layers.25.mlp.gate.e_score_correction_bias', 'model.layers.25.self_attn.q_a_layernorm.weight', 'model.layers.25.self_attn.o_proj.weight', 'model.layers.25.self_attn.q_a_proj.weight', 'model.layers.25.mlp.shared_experts.down_proj.weight', 'model.layers.35.self_attn.q_a_layernorm.weight', 'model.layers.35.self_attn.o_proj.weight', 'model.layers.35.self_attn.kv_a_proj_with_mqa.weight', 'model.layers.25.mlp.experts.w13_weight', 'model.layers.35.self_attn.kv_a_layernorm.weight', 'model.layers.25.input_layernorm.weight', 'model.layers.35.self_attn.q_a_proj.weight', 'model.layers.25.mlp.gate.weight', 'model.layers.35.self_attn.kv_b_proj.weight', 'model.layers.35.mlp.experts.w2_weight', 'model.layers.35.mlp.gate.weight'}
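
The set in the ValueError is hard to read as a single line. Every name in it sits under model.layers.25 or model.layers.35, and grouping the names by layer index makes that pattern visible. The following is only a minimal sketch for reading the message: the function name group_missing_by_layer is hypothetical and is not part of vLLM, and the sample string is a shortened excerpt of the error above.

```python
import re
from collections import defaultdict


def group_missing_by_layer(error_text: str) -> dict[int, list[str]]:
    """Group 'model.layers.<N>.<rest>' names found in an error message by layer index.

    Hypothetical helper for reading the log; it only parses text.
    """
    pattern = re.compile(r"model\.layers\.(\d+)\.([\w.]+)")
    grouped: dict[int, set[str]] = defaultdict(set)
    for layer, rest in pattern.findall(error_text):
        grouped[int(layer)].add(rest)
    # Sort layers and names so the summary is stable between runs.
    return {layer: sorted(names) for layer, names in sorted(grouped.items())}


if __name__ == "__main__":
    # Shortened excerpt of the ValueError text from the log above.
    sample = (
        "ValueError: Following weights were not initialized from checkpoint: "
        "{'model.layers.35.self_attn.q_b_proj.weight', "
        "'model.layers.25.self_attn.kv_b_proj.weight', "
        "'model.layers.25.mlp.gate.weight'}"
    )
    for layer, names in group_missing_by_layer(sample).items():
        print(f"layer {layer}: {len(names)} missing -> {names}")
```

Pasting the full error line into the sample string gives the complete per-layer summary.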

Information

  • The official example scripts
  • My own modified scripts

Reproduction

This is the docker compose configuration:
environment:
  VLLM_USE_FLASH_ATTN: "1"
  #VLLM_USE_TRITON: "0"
  #VLLM_DISABLE_CUDA_GRAPHS: "1"
  VLLM_ATTENTION_BACKEND: "FLASH_ATTN"
  #VLLM_USE_FLASHINFER_SAMPLER: "0"
  #NVIDIA_TF32_OVERRIDE: "0"
ports:
  - "8888:8000"
volumes:
  - /data/ai/models/vllm:/models
command:
  --tensor-parallel-size 2
  --gpu-memory-utilization 0.95
  --model /models/GLM-4.7-Flash
  --max-num-batched-tokens 65536
  --block-size 16
  --max-model-len 32768
  --max_num_seqs 16
  --tool-call-parser glm47
  --reasoning-parser glm45
  --enable-auto-tool-choice
  --trust-remote-code
  --served-model-name GLM_4.7_30B
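
Since the model is read from a mounted directory, one thing that may be worth ruling out is an incomplete copy of the checkpoint under /models/GLM-4.7-Flash. This is an unconfirmed guess, not a diagnosis. The sketch below assumes the checkpoint is a sharded safetensors checkpoint with the usual Hugging Face model.safetensors.index.json file (whose "weight_map" maps tensor names to shard files). The function name shards_for_layers is hypothetical, and layers 25 and 35 are taken from the error message.

```python
import json
from pathlib import Path


def shards_for_layers(model_dir: str, layers: list[int]) -> dict[str, bool]:
    """Map each shard file that holds the given layers to whether it exists on disk.

    Assumes a Hugging Face style model.safetensors.index.json in model_dir.
    """
    root = Path(model_dir)
    index = json.loads((root / "model.safetensors.index.json").read_text())
    # The trailing dot keeps "model.layers.2." from matching layer 25.
    prefixes = tuple(f"model.layers.{n}." for n in layers)
    shards = {
        shard
        for name, shard in index["weight_map"].items()
        if name.startswith(prefixes)
    }
    return {shard: (root / shard).is_file() for shard in sorted(shards)}


if __name__ == "__main__":
    # Path as seen inside the container, taken from the --model flag above.
    model_dir = "/models/GLM-4.7-Flash"
    if (Path(model_dir) / "model.safetensors.index.json").is_file():
        for shard, present in shards_for_layers(model_dir, [25, 35]).items():
            print(f"{shard}: {'present' if present else 'MISSING'}")
    else:
        print(f"No model.safetensors.index.json found under {model_dir}")
```

An empty result would mean the index has no entries for those layers at all. A shard reported as missing would point to an incomplete download or copy. If every shard is present, the cause lies elsewhere.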

Expected behavior

The server should start normally without any errors.
