
The weights after sft of video data cannot be inferred #33

Open
orzgugu opened this issue Dec 11, 2024 · 3 comments

@orzgugu

orzgugu commented Dec 11, 2024

I used the LongVU/scripts/train_video_qwen.sh script to perform SFT on video data. When I load the resulting checkpoint with the inference code you provided, an error occurs. The error message is as follows:

```
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Loading checkpoint shards:   0%|          | 0/6 [00:05<?, ?it/s]
Traceback (most recent call last):
  File "/.../LongVU/scripts/inference_video.py", line 20, in <module>
    tokenizer, model, image_processor, context_len = load_pretrained_model(
  File "/.../LongVU/longvu/builder.py", line 159, in load_pretrained_model
    model = CambrianQwenForCausalLM.from_pretrained(
  File "/.../conda_env/longuv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3838, in from_pretrained
    ) = cls._load_pretrained_model(
  File "/.../conda_env/longuv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 4298, in _load_pretrained_model
    new_error_msgs, offload_index, state_dict_index = _load_state_dict_into_meta_model(
  File "/.../conda_env/longuv/lib/python3.10/site-packages/transformers/modeling_utils.py", line 895, in _load_state_dict_into_meta_model
    set_module_tensor_to_device(model, param_name, param_device, **set_module_kwargs)
  File "/.../conda_env/longuv/lib/python3.10/site-packages/accelerate/utils/modeling.py", line 373, in set_module_tensor_to_device
    raise ValueError(
ValueError: Trying to set a tensor of shape torch.Size([544997376]) in "weight" (which has shape torch.Size([152064, 3584])), this looks incorrect.
```

I am not sure whether this is because the provided inference code is incompatible with the checkpoint produced by SFT. This is urgent for me, and I would greatly appreciate your help!

@Amshaker

Same problem.

@orzgugu
Author

orzgugu commented Dec 17, 2024

> Same problem.

To fix this:

1. Delete the files with the `.safetensors` suffix.
2. Delete the `model.safetensors.index.json` file.
3. Rename `pytorch_model_fsdp.bin` to `pytorch_model.bin`.
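The three steps above can be sketched as a small shell snippet. Note this is a sketch, not from the LongVU repo: `mock_checkpoint` is a placeholder for the real SFT output directory, and the `mkdir`/`touch` lines only create a mock checkpoint layout so the snippet runs standalone; drop them when applying this to an actual checkpoint.

```shell
checkpoint_dir="./mock_checkpoint"

# Mock checkpoint layout for demonstration only (remove for real use).
mkdir -p "$checkpoint_dir"
touch "$checkpoint_dir/model-00001-of-00006.safetensors" \
      "$checkpoint_dir/model.safetensors.index.json" \
      "$checkpoint_dir/pytorch_model_fsdp.bin"

# 1. Delete the sharded .safetensors weight files.
rm "$checkpoint_dir"/*.safetensors
# 2. Delete the safetensors shard index, so from_pretrained stops
#    looking for the (now deleted) safetensors shards.
rm "$checkpoint_dir/model.safetensors.index.json"
# 3. Rename the FSDP checkpoint to the filename from_pretrained expects.
mv "$checkpoint_dir/pytorch_model_fsdp.bin" "$checkpoint_dir/pytorch_model.bin"

ls "$checkpoint_dir"   # prints: pytorch_model.bin
```

After this, only `pytorch_model.bin` remains, so `from_pretrained` loads the FSDP-saved weights instead of the mismatched safetensors shards.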

@HenryHZY

HenryHZY commented Jan 4, 2025

> Same problem.
>
> Delete the safetensors suffix file, delete the model.safetensors.index.json file, and rename pytorch_model_fsdp.bin to pytorch_model.bin

This works for me.
