data available for which problems were solved by which models? #3

rawwerks · 2025-01-02T19:41:33Z

@paul-gauthier - i'm really inspired by this benchmark!

in your blog post, you mentioned "The new benchmark uses the 225 problems that were solved by 3 or fewer models."

do you have the data on which problems were solved by which models? i looked here but it only seems to be the summaries.

seeing this data would help me partition the benchmark into easy/medium/hard problems. i'm also interested in running optimizations to get a specific model to solve problems it previously got wrong, without re-running the whole benchmark every time (which is expensive for some models).
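
to make the request concrete, here is a rough sketch of what i have in mind. everything in it is an assumption on my side: the `results` layout (model name, then problem name, then pass/fail), the made-up model and problem names, and the `easy_min` / `hard_max` cutoffs are all hypothetical, not the benchmark's actual format or data.

```python
from collections import defaultdict

# hypothetical per-model results: model -> {problem -> passed?}
# (placeholder names only, not real benchmark data)
results = {
    "model-a": {"problem-1": True, "problem-2": False, "problem-3": False},
    "model-b": {"problem-1": True, "problem-2": True, "problem-3": False},
    "model-c": {"problem-1": True, "problem-2": False, "problem-3": False},
}


def solve_counts(results):
    """Count how many models solved each problem."""
    counts = defaultdict(int)
    for per_problem in results.values():
        for problem, passed in per_problem.items():
            counts[problem] += int(passed)
    return dict(counts)


def partition(counts, easy_min, hard_max):
    """Bucket problems by solve count.

    easy:   solved by at least `easy_min` models
    hard:   solved by at most `hard_max` models
    medium: everything in between
    """
    tiers = {"easy": [], "medium": [], "hard": []}
    for problem in sorted(counts):
        n = counts[problem]
        if n >= easy_min:
            tiers["easy"].append(problem)
        elif n <= hard_max:
            tiers["hard"].append(problem)
        else:
            tiers["medium"].append(problem)
    return tiers


def failed_problems(results, model):
    """List the problems one model failed, i.e. the subset worth re-running."""
    return sorted(p for p, passed in results[model].items() if not passed)


tiers = partition(solve_counts(results), easy_min=3, hard_max=0)
rerun = failed_problems(results, "model-a")
```

with per-problem data in roughly this shape, `tiers` would give me the easy/medium/hard split and `rerun` would be the small set of problems to iterate on for one model.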

thanks!
