Uniformize LlavaNextVideoProcessor kwargs #35613

yonigozlan · 2025-01-10T16:30:29Z

What does this PR do?

Adds uniformized processor kwargs for LlavaNextVideoProcessor, following #31911.

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

HuggingFaceDocBuilderDev · 2025-01-10T16:57:41Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

zucchini-nlp

Sorry for the late review, I forgot about this PR.

Cool that we're standardizing video LLMs. Overall LGTM, we just need a few tests with video processors to make sure nothing breaks.

zucchini-nlp · 2025-01-13T12:34:57Z

src/transformers/models/llava_next_video/processing_llava_next_video.py

        images: ImageInput = None,
+        text: Union[TextInput, PreTokenizedInput, List[TextInput], List[PreTokenizedInput]] = None,
+        audio=None,


Tiny concern about squeezing in audio before videos, but I don't think any user passes videos as a positional arg, so we are probably OK.

I don't want to add more complexity by trying to validate the order of audio and videos now, so let's leave it as is and just take note of this.
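
The positional-argument concern can be illustrated with a minimal sketch. The two functions below are hypothetical stand-ins, not the actual LlavaNextVideoProcessor.__call__ signature; they only show how inserting a parameter before videos changes what a positional caller ends up passing.

```python
# Hypothetical stand-ins for a processor __call__ before and after the change.
def call_before(images=None, text=None, videos=None):
    return {"images": images, "text": text, "audio": None, "videos": videos}


def call_after(images=None, text=None, audio=None, videos=None):
    return {"images": images, "text": text, "audio": audio, "videos": videos}


clip = ["frame0", "frame1"]

# A caller that passes the video positionally is silently rebound to `audio`.
positional_before = call_before(None, "describe the clip", clip)
positional_after = call_after(None, "describe the clip", clip)

# A caller that uses keywords is unaffected by the reordering.
keyword_after = call_after(text="describe the clip", videos=clip)
```

Callers that already pass videos by keyword keep working, which is why leaving the order unvalidated is a reasonable trade-off here.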

zucchini-nlp · 2025-01-13T12:35:24Z

src/transformers/models/llava_onevision/processing_llava_onevision.py

@@ -39,7 +39,7 @@ class LlavaOnevisionProcessorKwargs(ProcessingKwargs, total=False):
            "padding": False,
        },
        "image_kwargs": {},
-        "video_kwargs": {},
+        "videos_kwargs": {},


zucchini-nlp · 2025-01-13T12:36:00Z

src/transformers/pipelines/image_text_to_text.py

+        model_inputs = self.processor(images=images, text=text, return_tensors=self.framework, **processing_kwargs).to(
+            dtype=self.torch_dtype
+        )


Am I right that we don't need legacy=False anymore?

I ended up keeping it because I had put v5.0.0 as the deprecation version

zucchini-nlp · 2025-01-13T12:37:04Z

tests/models/llava_next_video/test_processor_llava_next_video.py

+    def test_processor_to_json_string(self):
+        processor = self.get_processor()
+        obj = json.loads(processor.to_json_string())
+        print(processor)


Oops thanks for catching that :)

zucchini-nlp · 2025-01-13T12:39:34Z

tests/models/llava_next_video/test_processor_llava_next_video.py

+
+
+@require_vision
+class LlavaNextVideoProcessorTest(ProcessorTesterMixin, unittest.TestCase):


AFAIK, ProcessorTesterMixin doesn't test videos_kwargs yet. I think we need to add video tests to make sure that the llava-next-video processor works as expected.

Indeed! I added tests for videos_kwargs, very similar to those for images_kwargs :)
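
As a rough idea of what a videos_kwargs override test checks, here is a minimal sketch. ToyVideoProcessor and its rescale_factor argument are hypothetical stand-ins chosen for illustration; this is not the test added in the PR, nor the ProcessorTesterMixin API.

```python
import unittest


class ToyVideoProcessor:
    """Hypothetical stand-in: rescales every pixel of every frame."""

    def __init__(self, rescale_factor=0.5):
        self.rescale_factor = rescale_factor

    def __call__(self, videos, **videos_kwargs):
        # A call-time kwarg overrides the value set at construction.
        factor = videos_kwargs.get("rescale_factor", self.rescale_factor)
        return {"pixel_values_videos": [[pixel * factor for pixel in frame] for frame in videos]}


class ToyVideosKwargsTest(unittest.TestCase):
    def test_call_kwarg_overrides_default(self):
        processor = ToyVideoProcessor()
        video = [[2, 4], [6, 8]]  # two frames, two pixels each

        default_out = processor(video)["pixel_values_videos"]
        override_out = processor(video, rescale_factor=-1)["pixel_values_videos"]

        # The default factor applies unless the call passes its own value.
        self.assertEqual(default_out[0], [1.0, 2.0])
        self.assertEqual(override_out[0], [-2, -4])
```

Saved to a file, the sketch can be run with python -m unittest. The same pattern (set a default, override it at call time, assert on the output) mirrors what image kwargs tests typically do.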

yonigozlan · 2025-01-14T19:58:05Z

Thanks for the feedback, @zucchini-nlp!
I also just noticed an issue with the LlavaOneVision processor tests: test_chat_template_dict fails with this error:

src/transformers/models/llava_onevision/processing_llava_onevision.py:167: in __call__
    one_video = to_numpy_array(video_inputs.get("pixel_values_videos")[0])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

    def to_numpy_array(img) -> np.ndarray:
        if not is_valid_image(img):
>           raise ValueError(f"Invalid image type: {type(img)}")
E           ValueError: Invalid image type: <class 'list'>

src/transformers/image_utils.py:231: ValueError

This also fails on main, so it seems unrelated to this PR.

zucchini-nlp · 2025-01-15T08:02:40Z

Yep, will be fixed by #35660

ArthurZucker

cool thanks!

yonigozlan · 2025-02-14T17:09:39Z

I had to make some modifications to the llava_next_video processor tests following PR #35953.
@zucchini-nlp Could you confirm these modifications are fine? I had to add return_tensors=None in some of the chat_template tests to get consistent output types when comparing.
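
One way such a type mismatch can break a comparison is sketched below. toy_process is a hypothetical function, and the standard-library array type is used only as a stand-in for a framework tensor; the actual outputs compared in the tests are not shown in this thread.

```python
from array import array


def toy_process(token_ids, return_tensors=None):
    """Hypothetical processor output: a plain list, or a typed container on request."""
    if return_tensors is None:
        return list(token_ids)
    return array("i", token_ids)  # stand-in for a framework tensor


ids = [101, 2023, 102]

# Mixed container types compare unequal even though the values match.
mixed_equal = toy_process(ids, return_tensors="pt") == toy_process(ids)

# Requesting the same output type on both sides makes the comparison meaningful.
consistent_equal = toy_process(ids, return_tensors=None) == toy_process(ids, return_tensors=None)
```

Pinning return_tensors on both sides of the comparison removes the container type as a source of spurious failures.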

zucchini-nlp

Yep, LGTM, thanks!

molbap

LGTM as well. For the common processing tests for videos, a couple of models were recently merged or will be merged soon; it would be cool to check that they work!

molbap · 2025-02-18T11:18:26Z

src/transformers/models/llava_next_video/processing_llava_next_video.py

+        "image_kwargs": {},
+        "videos_kwargs": {},


Not sure you have to specify an empty dictionary here!
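
A minimal sketch of why the empty entries may be redundant, assuming defaults are looked up per modality with an empty-dict fallback. The resolve helper and the MODALITIES tuple are hypothetical and do not reproduce the library's merging code.

```python
# Hypothetical per-modality defaults, written with and without empty entries.
defaults_explicit = {
    "text_kwargs": {"padding": False},
    "images_kwargs": {},
    "videos_kwargs": {},
}
defaults_minimal = {"text_kwargs": {"padding": False}}

MODALITIES = ("text_kwargs", "images_kwargs", "videos_kwargs")


def resolve(defaults):
    """Assumed lookup: a modality with no entry falls back to an empty dict."""
    return {modality: dict(defaults.get(modality, {})) for modality in MODALITIES}


# Both spellings resolve to the same effective defaults.
equivalent = resolve(defaults_explicit) == resolve(defaults_minimal)
```

Under that assumption the explicit empty dictionaries add nothing, which is consistent with the later "remove unnecessary default kwargs" commit in this PR.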

yonigozlan requested review from molbap, qubvel, Rocketknight1 and ArthurZucker as code owners January 10, 2025 16:30

yonigozlan mentioned this pull request Jan 10, 2025

Uniform kwargs for processors #31911

Open

40 tasks

yonigozlan requested a review from zucchini-nlp January 10, 2025 16:31

zucchini-nlp reviewed Jan 14, 2025

yonigozlan added 2 commits January 14, 2025 19:53

Uniformize processor kwargs and add tests

7a411e9

add videos_kwargs tests

fd69d5c

yonigozlan force-pushed the uniformize-llava-next-video-processor branch from a75909c to fd69d5c Compare January 14, 2025 19:54

fix copies

ef23921

qubvel removed their request for review January 20, 2025 18:23

ArthurZucker approved these changes Feb 13, 2025

yonigozlan added 3 commits February 13, 2025 22:24

Merge remote-tracking branch 'upstream/main' into uniformize-llava-ne…

5840121

…xt-video-processor

Merge remote-tracking branch 'upstream/main' into uniformize-llava-ne…

fe3cf47

…xt-video-processor

fix llava_next_video chat template tests

8992cac

yonigozlan requested a review from zucchini-nlp February 14, 2025 23:13

Merge branch 'main' into uniformize-llava-next-video-processor

269235b

zucchini-nlp approved these changes Feb 17, 2025

molbap approved these changes Feb 18, 2025

yonigozlan added 3 commits February 18, 2025 12:22

Merge branch 'main' into uniformize-llava-next-video-processor

6f7e3a7

remove unnecessary default kwargs

1460bbf

Merge branch 'main' into uniformize-llava-next-video-processor

e885600

yonigozlan merged commit 9b479a2 into huggingface:main Feb 18, 2025
25 checks passed
