`eval.defaults` defines inference parameters shared by every dataset entry. Override them inside an individual dataset block if needed.

`eval.datasets` enumerates the datasets to evaluate. Each entry should specify:

- `name`: a short identifier that appears in logs and dashboards.
- `path`: the path to the dataset JSONL file.
- `rm_type`: which reward function to use for scoring.
- `n_samples_per_eval_prompt`: how many candidate completions to generate per prompt.
- When `ifbench` is used, `slime/rollout/rm_hub/ifbench.py` automatically prepares the scoring environment, so no manual setup is required beyond providing the dataset path.
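Putting the fields above together, a configuration might look like the following sketch. The structure (`eval.defaults`, `eval.datasets`, and the four per-dataset keys) comes from this section; the parameter names under `defaults` other than `n_samples_per_eval_prompt`, the dataset paths, and the `math` reward type are illustrative assumptions, not values prescribed by slime:

```yaml
eval:
  defaults:
    # Shared inference parameters; names here are illustrative.
    temperature: 0.7
    n_samples_per_eval_prompt: 1
  datasets:
    - name: aime24                  # identifier shown in logs and dashboards
      path: /data/aime24.jsonl      # hypothetical path to the dataset JSONL file
      rm_type: math                 # assumed reward-function name for scoring
      n_samples_per_eval_prompt: 8  # per-dataset override of the shared default
    - name: ifbench
      path: /data/ifbench.jsonl     # hypothetical path
      rm_type: ifbench              # scoring handled by slime/rollout/rm_hub/ifbench.py
```

In this sketch the first dataset overrides `n_samples_per_eval_prompt` inside its own block, while the second inherits the shared default.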