EvolvingLMMs-Lab / lmms-eval Public

Pull requests: EvolvingLMMs-Lab/lmms-eval

16 Open 820 Closed

Pull requests list

feat: add ExtremeWhenBench (hour-scale natural-language temporal grounding)

#1367 opened Jun 16, 2026 by min1321

1 of 7 tasks

feat(openai): add pass_video_url and enable_thinking_kwarg for vLLM-served video tasks

#1366 opened Jun 16, 2026 by min1321

1 of 7 tasks

[ICLR 2026] XmodBench. New MCQ benchmark + omni-LLM interleave wrappers

#1365 opened Jun 14, 2026 by XingruiWang

3 of 7 tasks

feat(vstat): add VSTAT benchmark task

#1363 opened Jun 7, 2026 by pinzhihuang

6 tasks

Add Qwen-native JSON coordinate variants for pointing tasks

#1361 opened Jun 5, 2026 by njb-nvidia Contributor

fix: guard choices[0] and message=None before content access (41 sites, 32 files)

#1332 opened May 17, 2026 by qizwiz

Feat/ollama model

#1322 opened May 5, 2026 by eliasubz

1 of 7 tasks

feat: vLLM-Omni for video generation models

#1314 opened Apr 27, 2026 by pufanyi Collaborator • Draft

fix(evaluator): auto-init gloo process group for multi-rank launches

#1306 opened Apr 23, 2026 by Luodian Contributor • Draft

2 tasks

fix(api/task): guard empty results list in process_results

#1305 opened Apr 23, 2026 by Luodian Contributor • Draft

2 tasks

fix(eval): abort all ranks on per-rank exception instead of deadlocking

#1304 opened Apr 23, 2026 by Luodian Contributor • Draft

2 tasks

feat: add Bedrock and local vLLM providers for llm_judge

#1298 opened Apr 14, 2026 by ShownX

Fix missing Task import for type annotation in evaluator

#1291 opened Apr 10, 2026 by luv-oct22

2 tasks

feat: add physics reasoning benchmarks (PhysBench, ContPhy, PhysGame, PhysicsRW, PhysReason)

#1272 opened Mar 26, 2026 by Luodian Contributor

4 tasks

feat: add VBench video generation evaluation benchmark

#1271 opened Mar 26, 2026 by Luodian Contributor

3 tasks

feat: add MiniMax as LLM judge provider (default model: MiniMax-M3)

#1263 opened Mar 22, 2026 by octo-patch

3 tasks done
