Nano3 full Evaluation config release #167

@hieuchi911

Is your feature request related to a problem? Please describe.
The eval configuration for Nano3 currently includes only the following tasks: adlr_mmlu, adlr_arc_challenge_llama_25_shot, adlr_winogrande_5_shot, hellaswag, and openbookqa. I don’t see configurations for the benchmarks reported in Table 3 of the Nemotron 3 Nano paper.

I tried manually configuring LiveCodeBench, SciCode, IFBench, RULER, MMLU-Pro, and MMLU-ProX, then reran the evaluation on nvidia/NVIDIA-Nemotron-3-Nano-30B-A3B-BF16 served through a vLLM deployment, but my results didn’t match the reported numbers. I suspect this is due to differences in the evaluation setup.

Could you point me to the full evaluation configuration used for the reported benchmarks? I wasn’t able to find detailed information about the eval setup in the paper.

Any guidance would be greatly appreciated!

Labels: enhancement (New feature or request)
