This folder contains the open-source code release for MMR-Bench (offline, cost-aware multimodal LLM routing).
MMR-Bench is a comprehensive benchmark designed to evaluate multimodal LLM routing under diverse settings.
It supports systematic comparison across scenarios and provides analysis beyond single-dataset evaluation (e.g., cost–accuracy trade-offs, cross-dataset generalization, and modality transfer).
From this directory:
```
pip install -e .
```

Optional environment variables:
- `HF_HOME`: Hugging Face cache directory (defaults to `~/.cache/huggingface`).
If you want CLIP/OpenCLIP embeddings (text+image) for the baseline routers:
```
pip install -e '.[embedding]'
```

Generate a tiny synthetic offline benchmark (CSV + images), then run a router:
```
python scripts/make_toy_data.py
mmrbench --data-root data/toy --dataset toy --mode 22 --router kmeansnew
```

You can also run via the module entrypoint:

```
python -m mmrbench --data-root data/toy --dataset toy --mode 22 --router kmeansnew
```

This writes a cost–accuracy curve to `outputs/` and prints a JSON summary including nAUC, Ps, and QNC.
The full benchmark is distributed as:

- image folders (e.g. `MathVerse/`, `SEEDBenchv2Plus/`, …)
- a merged outcomes table `MMR_Bench.csv`

Place them under `data/` (see `data/README.md`). Example run:
```
python -m mmrbench --data-root data --dataset ocrbench+seedbench+mmstar --mode 22 --router linearmf
```

Optional: download helper script (requires HF access):

```
export HF_HOME=~/.cache/huggingface  # optional
pip install -e '.[hf]'
python scripts/prepare_hf_mmr_bench.py --dest data
```

MMR-Bench is evaluated offline: for each instance and each candidate model, you provide:
- `question` (string)
- `img_path` (string; optional but recommended)
- for each model name `M`: `M_correct` (0/1) and `M_cost` (float; any consistent cost unit)
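To make the schema concrete, here is a minimal sketch of an outcomes table with two hypothetical candidate models ("gpt4o" and "llava" are placeholder names, not part of the benchmark); the column layout follows the fields listed above:

```python
import csv
import io

# Each row is one benchmark instance; per-model outcome columns use the
# pattern <model>_correct (0/1) and <model>_cost (float, consistent unit).
rows = [
    {"question": "What is shown in the chart?", "img_path": "toy/0001.png",
     "gpt4o_correct": 1, "gpt4o_cost": 0.012,
     "llava_correct": 0, "llava_cost": 0.001},
    {"question": "Solve: 2 + 3 = ?", "img_path": "",
     "gpt4o_correct": 1, "gpt4o_cost": 0.010,
     "llava_correct": 1, "llava_cost": 0.001},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

Any consistent cost unit works (dollars, tokens, latency), as long as it is comparable across models.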
You can route across multiple datasets by concatenating names with `+` (e.g. `ocrbench+mathvista`). By default the loader expects:

- a CSV at `<data-root>/<dataset>.csv`
- optional images under `<data-root>/<dataset>/`
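The default convention can be sketched as follows; `resolve_dataset_paths` is a hypothetical helper for illustration, not the loader's actual API:

```python
from pathlib import Path

def resolve_dataset_paths(data_root: str, dataset: str):
    """Mirror the default convention: split a '+'-joined dataset string
    and map each name to its CSV and optional image directory."""
    root = Path(data_root)
    return [
        {
            "name": name,
            "csv": root / f"{name}.csv",   # <data-root>/<dataset>.csv
            "img_dir": root / name,        # <data-root>/<dataset>/ (optional)
        }
        for name in dataset.split("+")
    ]

for entry in resolve_dataset_paths("data", "ocrbench+mathvista"):
    print(entry["name"], entry["csv"])
```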
If `<data-root>/MMR_Bench.csv` exists and `--dataset` is a `+`-separated subset of
{ocrbench, seedbench, mmstar, realworldqa, mathvista, mathvision, mathverse},
the loader will use the merged CSV and infer image paths from `dataset_idx`.
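The merged-CSV fallback condition can be sketched like this; `use_merged_csv` is a hypothetical helper illustrating the check described above, not the loader's real function:

```python
from pathlib import Path

# Dataset names covered by the merged MMR_Bench.csv, per the list above.
MERGED_DATASETS = {"ocrbench", "seedbench", "mmstar", "realworldqa",
                   "mathvista", "mathvision", "mathverse"}

def use_merged_csv(data_root: str, dataset: str) -> bool:
    """The merged table is used only when MMR_Bench.csv exists under
    data_root AND every '+'-separated dataset name is in the merged set."""
    requested = set(dataset.split("+"))
    merged = Path(data_root) / "MMR_Bench.csv"
    return merged.exists() and requested <= MERGED_DATASETS
```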
This release focuses on the routing algorithms + offline evaluation. If you have the full MMR-Bench outcome tables from the paper (CSV/Parquet), point --data-root to them and run the corresponding routers/modes.
If you find MMR-Bench useful, please consider citing our paper.





