Pull requests: EvolvingLMMs-Lab/lmms-eval
Pull requests list
feat: add ExtremeWhenBench (hour-scale natural-language temporal grounding)
#1367
opened Jun 16, 2026 by
min1321
1 of 7 tasks
feat(openai): add pass_video_url and enable_thinking_kwarg for vLLM-served video tasks
#1366
opened Jun 16, 2026 by
min1321
1 of 7 tasks
[ICLR 2026] XmodBench. New MCQ benchmark + omni-LLM interleave wrappers
#1365
opened Jun 14, 2026 by
XingruiWang
3 of 7 tasks
Add Qwen-native JSON coordinate variants for pointing tasks
#1361
opened Jun 5, 2026 by
njb-nvidia
Contributor
fix: guard choices[0] and message=None before content access (41 sites, 32 files)
#1332
opened May 17, 2026 by
qizwiz
feat: add Bedrock and local vLLM providers for llm_judge
#1298
opened Apr 14, 2026 by
ShownX
Fix missing Task import for type annotation in evaluator
#1291
opened Apr 10, 2026 by
luv-oct22
2 tasks
feat: add physics reasoning benchmarks (PhysBench, ContPhy, PhysGame, PhysicsRW, PhysReason)
#1272
opened Mar 26, 2026 by
Luodian
Contributor
4 tasks
feat: add VBench video generation evaluation benchmark
#1271
opened Mar 26, 2026 by
Luodian
Contributor
3 tasks
feat: add MiniMax as LLM judge provider (default model: MiniMax-M3)
#1263
opened Mar 22, 2026 by
octo-patch
3 tasks done