Nano3 full evaluation config release #167

**Is your feature request related to a problem? Please describe.**
The [eval configuration](https://github.com/NVIDIA-NeMo/Nemotron/blob/02d6bd036ec364c34dc9f0145413c7fe72a890e6/src/nemotron/recipes/nano3/stage3_eval/config/default.yaml#L154) for Nano3 currently includes only the following tasks: `adlr_mmlu`, `adlr_arc_challenge_llama_25_shot`, `adlr_winogrande_5_shot`, `hellaswag`, and `openbookqa`. I don’t see configurations for the benchmarks reported in Table 3 of the Nemotron 3 Nano paper.

I tried manually configuring LiveCodeBench, SciCode, IFBench, RULER, MMLU-Pro, and MMLU-ProX, then reran the evaluation on `nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16` using a `vllm` deployment, but my results didn’t match the reported numbers. I suspect this is due to differences in the evaluation setup (for example prompt format, few-shot count, or sampling parameters).
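To make the gap concrete, here is a minimal sketch that compares the tasks listed in the linked `default.yaml` against the benchmarks I added by hand. The configured task names are copied from the description above; the benchmark names are display names only and are an assumption on my side, not verified task identifiers for the evaluation harness.

```python
# Tasks currently present in the linked default.yaml (copied from this issue).
configured_tasks = [
    "adlr_mmlu",
    "adlr_arc_challenge_llama_25_shot",
    "adlr_winogrande_5_shot",
    "hellaswag",
    "openbookqa",
]

# Benchmarks configured manually. Display names only (assumption):
# the real task identifiers in the harness may differ.
manually_added = [
    "LiveCodeBench",
    "SciCode",
    "IFBench",
    "RULER",
    "MMLU-Pro",
    "MMLU-ProX",
]


def normalize(name: str) -> str:
    """Lowercase and replace hyphens so names compare loosely against task ids."""
    return name.lower().replace("-", "_")


# A benchmark counts as covered only if its normalized name appears
# inside one of the configured task identifiers.
missing = [
    bench
    for bench in manually_added
    if not any(normalize(bench) in task for task in configured_tasks)
]

print("Benchmarks without a released config:", missing)
```

None of the six benchmarks has a matching entry in the released config, which is why I had to guess their settings.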

Could you point me to the full evaluation configuration used for the reported benchmarks? I wasn’t able to find detailed information about the eval setup in the paper.

Any guidance would be greatly appreciated!
