HAE-RAE/hr-simple-evals

hr-simple-evals

This repository contains simple evaluation utilities for Korean language models.

Running evaluations

Use evaluation.py to run a model on a dataset from the HAERAE-HUB/KoSimpleEval collection. The basic command looks like:

python evaluation.py \
  --model <model-id-or-path> \
  --dataset <subset-name> \
  --dataset_hub_id HAERAE-HUB/KoSimpleEval \
  --temperature 0.7 \
  --top_p 0.9 \
  --max_tokens 1024

Supported subset names include:

  • ArenaHard
  • ClinicalQA
  • HRB1_0
  • KMMLU_Redux
  • KMMLU-Pro
  • KMMLU-HARD
  • KorMedLawQA
  • MCLM
  • gpqa-diamond (use the gpqa-diamond subset of Idavidrein/gpqa for evaluation)
  • KSM (use the KSM subset of HAERAE-HUB/HRM8K for evaluation)
  • AIME2024 (use the train split of HuggingFaceH4/aime_2024 for evaluation)
  • AIME2025 (use the default split of yentinglin/aime_2025 for evaluation)
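
To run several subsets in one go, the basic command can be generated per subset. The following is a minimal sketch, not part of the repository: the helper name build_command and the list KOSIMPLEEVAL_SUBSETS are hypothetical, and it assumes that the first eight subsets listed above are hosted in HAERAE-HUB/KoSimpleEval. It only uses the flags shown in the basic command.

```python
import shlex

# Assumption: these subsets from the list above are hosted in
# HAERAE-HUB/KoSimpleEval (the remaining four point to other hub IDs).
KOSIMPLEEVAL_SUBSETS = [
    "ArenaHard",
    "ClinicalQA",
    "HRB1_0",
    "KMMLU_Redux",
    "KMMLU-Pro",
    "KMMLU-HARD",
    "KorMedLawQA",
    "MCLM",
]


def build_command(model, subset, temperature=0.7, top_p=0.9, max_tokens=1024):
    """Build the argv list for one evaluation.py run (hypothetical helper)."""
    return [
        "python", "evaluation.py",
        "--model", model,
        "--dataset", subset,
        "--dataset_hub_id", "HAERAE-HUB/KoSimpleEval",
        "--temperature", str(temperature),
        "--top_p", str(top_p),
        "--max_tokens", str(max_tokens),
    ]


if __name__ == "__main__":
    # Dry run: print one shell-quoted command per subset.
    for subset in KOSIMPLEEVAL_SUBSETS:
        print(shlex.join(build_command("<model-id-or-path>", subset)))
```

To execute the runs instead of printing them, each argv list can be passed to subprocess.run(cmd, check=True) from the repository root.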

Replace <model-id-or-path> with a Hugging Face model ID or the path to a local checkpoint.

The script generates responses with the specified model and scores them according to the dataset configuration defined in dataset_configs.py.

Using AIME datasets

To evaluate models on the AIME (American Invitational Mathematics Examination) datasets, point --dataset_hub_id at the corresponding Hugging Face dataset and set --split accordingly:

AIME 2025

python evaluation.py \
  --model <model-id-or-path> \
  --dataset_hub_id yentinglin/aime_2025 \
  --split default \
  --temperature 0.0 \
  --max_tokens 1024

AIME 2024

python evaluation.py \
  --model <model-id-or-path> \
  --dataset_hub_id HuggingFaceH4/aime_2024 \
  --split train \
  --temperature 0.0 \
  --max_tokens 1024

These datasets contain challenging mathematics problems that test a model's mathematical reasoning capabilities.
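
AIME answers are integers from 0 to 999, so responses are usually scored by exact match on the final answer. The sketch below illustrates that idea only; it is not the repository's scoring code (which is driven by dataset_configs.py). The function names are hypothetical, and it assumes the model writes its final answer as \boxed{...}.

```python
import re

# Matches \boxed{N} where N is a 1-3 digit integer, allowing inner spaces.
BOXED_INT = re.compile(r"\\boxed\{\s*(\d{1,3})\s*\}")


def extract_aime_answer(response):
    # Return the last boxed integer in the response, or None if absent.
    # Assumption: the final answer is wrapped in \boxed{...}.
    matches = BOXED_INT.findall(response)
    return int(matches[-1]) if matches else None


def is_correct(response, gold):
    # Exact match after integer normalization, so "042" and 42 are equal.
    predicted = extract_aime_answer(response)
    return predicted is not None and predicted == int(gold)


if __name__ == "__main__":
    sample = r"Adding the cases gives \boxed{042}."
    print(extract_aime_answer(sample), is_correct(sample, "42"))
```

Taking the last boxed value rather than the first is a design choice: reasoning traces often box intermediate results before the final answer.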
