Skip to content

feat(openai): add pass_video_url and enable_thinking_kwarg for vLLM-served video tasks#1366

Open
min1321 wants to merge 1 commit into
EvolvingLMMs-Lab:mainfrom
min1321:feat/openai-pass-video-url
Open

feat(openai): add pass_video_url and enable_thinking_kwarg for vLLM-served video tasks#1366
min1321 wants to merge 1 commit into
EvolvingLMMs-Lab:mainfrom
min1321:feat/openai-pass-video-url

Conversation

@min1321

@min1321 min1321 commented Jun 16, 2026

Copy link
Copy Markdown

Summary

  • Add two opt-in init params to OpenAICompatible: pass_video_url and enable_thinking_kwarg, so a vLLM-served Qwen3-VL / Qwen3.5-VL backend can do server-side video decoding instead of forcing client-side frame extraction.
  • pass_video_url=True sends each video as {"type": "video_url", "video_url": {"url": "file://..."}} so vLLM can apply media_io_kwargs.num_frames and attach absolute-time signals; enable_thinking_kwarg forwards chat_template_kwargs.enable_thinking via extra_body.
  • Defaults are unchanged — existing tasks/configs are unaffected.

In scope

  • lmms_eval/models/chat/openai.py:
    • OpenAICompatible.__init__: accept pass_video_url: bool = False and enable_thinking_kwarg: object = None.
    • build_payload_for_index: when pass_video_url=True, build the OpenAI messages list manually (skip to_openai_messages) and emit each video as a video_url part. When either flag is set, populate payload["extra_body"] with media_io_kwargs and/or chat_template_kwargs.

Out of scope

  • Other adapters (vllm, sglang, huggingface, litellm, …). Same idea would apply but each has its own video path; happy to follow up if maintainers want.
  • Changes to qwen_vl_utils or protocol.py.
  • Audio / image handling — unchanged.

Validation

  • python -m lmms_eval --model openai --tasks extremewhenbench --model_args "...,pass_video_url=True,max_frames_num=768,enable_thinking_kwarg=False,..." against vLLM-served Qwen3.5-9B | sample size: N=2,273 | key metrics: mIoU | result: 0.048 with new flags vs. 0.003 default (existing path) — pass. Matches a hand-rolled openai-client reference (0.047) within run-to-run noise.
  • Default-path regression: existing tasks via --model openai without the new flags produce identical output | result: pass.

Risk / Compatibility

  • Defaults preserve existing behavior; new flags are opt-in. No new dependencies.
  • extra_body is OpenAI-API spec-compliant; non-vLLM backends that don't understand media_io_kwargs simply ignore it (server-specific).

Type of Change

  • Bug fix (non-breaking change)
  • New feature
  • New benchmark/task
  • New model integration
  • Breaking change
  • Documentation update
  • Refactoring (no functional changes)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant