long videos inference error #17
When I run inference.py, the output is a long string of exclamation marks ("!!!!!!!!!!..."). Why? |
Hi @cmhhw, I have tested the model locally on an A100 80G and it produces the correct output.
|
Yes, I use the code provided in "click for quick inference code," and the weights loaded are LongVU_Qwen2_7B. I still encounter this issue. My environment is torch=2.5.0, python=3.10.15, and CUDA 12.4, and I run the test on an A100. Since my GPU has 40GB of VRAM, I noticed that during inference the model is automatically spread across multiple GPUs. What should I do to avoid this? |
Hi, @cmhhw, I am using torch==2.1.2 as shown in the conda env requirements.txt. You can set model.to('cuda:0'). |
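As a complement to the model.to('cuda:0') suggestion, a minimal sketch of restricting the whole process to one GPU is shown here. The helper name pin_to_single_gpu is hypothetical and not part of LongVU; the only real mechanism used is the CUDA_VISIBLE_DEVICES environment variable, which has to be set before CUDA is initialized (in practice, before importing torch). Whether the 7B checkpoint then fits on a single 40GB card is not confirmed in this thread.

```python
import os


def pin_to_single_gpu(gpu_index: int = 0) -> str:
    """Expose only one physical GPU to this process.

    Hypothetical helper for illustration. It must be called before
    torch (or any other CUDA library) initializes CUDA, otherwise the
    setting has no effect on the already-created CUDA context.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Call this at the very top of the inference script.
visible = pin_to_single_gpu(0)

# Any later `import torch` sees exactly one device, addressed as "cuda:0",
# so automatic multi-GPU placement has nowhere else to put the model.
```

With only one device visible, model.to('cuda:0') and any automatic device placement end up on the same GPU.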
@beatriceadel are you using the correct tokenizer as we provided in LongVU_Qwen2_7B? |
Thank you very much. Following your suggestions, I have resolved the issue of garbled output in long videos. However, I encountered some problems while reading the "quick inference code." Here is the code:
import numpy as np
from decord import VideoReader, cpu

vr = VideoReader(video_path, ctx=cpu(0), num_threads=1)
fps = float(vr.get_avg_fps())
# Take one frame per second of video (every round(fps)-th frame).
frame_indices = np.array([i for i in range(0, len(vr), round(fps))])
video = []
for frame_index in frame_indices:
    img = vr[frame_index].asnumpy()
    video.append(img)
The above code retrieves certain frame indices from the original video and appends the corresponding frames to the video list, which is then used as input to the model. I am confused about how this method can help the model understand the entire video. Doesn't this approach lose a lot of information?
|
Yes, I double-checked, and I did use the tokenizer provided on Hugging Face. |
In this code, we sample the video at 1 fps, which is already dense sampling compared with most previous baselines, which are limited to uniformly sampling ~64 frames. |
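To make the comparison concrete, here is a small sketch in plain Python that counts how many frames each strategy keeps. The 10-minute, 30 fps video is an illustrative assumption, and the function names are hypothetical; the 1 fps rule mirrors the range(0, len(vr), round(fps)) logic from the quick inference code, while the uniform sampler is just one simple way to pick 64 evenly spaced frames.

```python
def sample_one_per_second(num_frames: int, fps: float) -> list[int]:
    """Indices of one frame per second, as in the quick inference code."""
    step = max(1, round(fps))  # guard against a zero step for very low fps
    return list(range(0, num_frames, step))


def sample_uniform(num_frames: int, k: int = 64) -> list[int]:
    """Indices of k frames spread evenly over the whole video."""
    if k == 1:
        return [0]
    return [round(i * (num_frames - 1) / (k - 1)) for i in range(k)]


# Assumed example: a 10-minute video at 30 fps (18,000 frames).
num_frames = 10 * 60 * 30
one_fps = sample_one_per_second(num_frames, fps=30.0)
uniform = sample_uniform(num_frames, k=64)

print(len(one_fps), "frames at 1 fps vs", len(uniform), "uniform frames")
```

For this assumed video, 1 fps sampling keeps one frame for every second of footage, whereas a fixed budget of 64 frames leaves gaps of several seconds between neighbours, and those gaps grow with video length.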
Hi. I'm having the same problem as you: the model outputs a bunch of exclamation marks after I installed a newer version of PyTorch than the one in requirements.txt. I tried using the exact original requirements.txt (torch==2.1.2) but encountered an error saying that the method register_fake() is not in torch.library. UPDATE
the model still outputs !!!!!...
|
OK, I see. We have never tested with 8-bit. You may need to switch to a GPU with more VRAM and run inference in float16. |
Had the same issue where the model only produces exclamation marks on the provided llama-3B checkpoint, but things work fine on the provided llama-1B checkpoint. |
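When comparing checkpoints, dtypes, or PyTorch versions as in the reports above, it can help to flag the failure automatically instead of reading every output by eye. The following is a minimal sketch; is_degenerate_output and its min_length threshold are hypothetical and not part of LongVU. It only detects the symptom (one character repeated over and over) and says nothing about the cause.

```python
def is_degenerate_output(text: str, min_length: int = 16) -> bool:
    """Return True if the text is one character repeated many times.

    Hypothetical helper: short answers such as "!!!" are not flagged,
    because they can be legitimate; min_length is an arbitrary cutoff.
    """
    stripped = text.strip()
    if len(stripped) < min_length:
        return False
    return len(set(stripped)) == 1


# Example: check a generated answer before scoring or logging it.
bad = is_degenerate_output("!" * 200)
good = is_degenerate_output("The video shows a person cooking pasta.")
```

A check like this can be run over a handful of videos after each environment change (for example, switching torch versions or precision) to see quickly which configurations reproduce the problem.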
Actually doing inference with
Update: the eval numbers are still much worse than the ones reported in the paper, though. I was only able to get 37.85 on MLVU for llama-3B (instead of 55.9 in the paper). Its capability on long videos still seems limited. |
@yanlai00 You should set |
Hello, when I load the model for inference, the results on short videos meet expectations, but on long videos (the example provided in the project) the model produces a series of special characters.
This is my example: