@paul-gauthier, I'm really inspired by this benchmark!
In your blog post, you mentioned: "The new benchmark uses the 225 problems that were solved by 3 or fewer models."
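If I understand the selection correctly, it amounts to counting how many models solved each problem and keeping the problems with a count of 3 or less. Here is a rough sketch of that. It assumes a hypothetical mapping from each problem name to the set of models that solved it. I don't know how the actual results are stored, and the toy data below is made up.

```python
from typing import Dict, List, Set


def hard_subset(solved_by: Dict[str, Set[str]], max_solvers: int = 3) -> List[str]:
    """Return the problems solved by at most `max_solvers` models, sorted by name.

    `solved_by` is a hypothetical format: problem name -> set of models that solved it.
    """
    return sorted(p for p, models in solved_by.items() if len(models) <= max_solvers)


# Made-up toy data, for illustration only.
example = {
    "problem_a": {"model_1", "model_2", "model_3", "model_4"},  # 4 solvers: excluded
    "problem_b": {"model_1"},                                   # 1 solver: kept
    "problem_c": set(),                                         # 0 solvers: kept
}
print(hard_subset(example))  # ['problem_b', 'problem_c']
```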
Do you have the data on which problems were solved by which models? I looked here, but it only seems to contain the summaries.
Seeing this data would help me partition the benchmark into easy/medium/hard problems. I'm also interested in running optimizations to get a specific model to overcome problems it previously got wrong, without having to rerun the whole benchmark every time (which is expensive for some models).
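With per-problem results, both tasks would be simple to script. Below is a minimal sketch. It assumes a hypothetical mapping from each problem name to the set of models that solved it. The tier thresholds `easy_min` and `hard_max` are arbitrary placeholders, not anything from the blog post.

```python
from typing import Dict, List, Set


def difficulty_tiers(solved_by: Dict[str, Set[str]], easy_min: int, hard_max: int) -> Dict[str, List[str]]:
    """Bucket problems by how many models solved them (thresholds are placeholders)."""
    if hard_max >= easy_min:
        raise ValueError("hard_max must be smaller than easy_min")
    tiers: Dict[str, List[str]] = {"easy": [], "medium": [], "hard": []}
    for problem, models in sorted(solved_by.items()):
        n = len(models)
        if n >= easy_min:
            tiers["easy"].append(problem)
        elif n <= hard_max:
            tiers["hard"].append(problem)
        else:
            tiers["medium"].append(problem)
    return tiers


def failed_by(solved_by: Dict[str, Set[str]], model: str) -> List[str]:
    """Problems the given model did not solve: the subset worth rerunning."""
    return sorted(p for p, models in solved_by.items() if model not in models)


# Made-up toy data, for illustration only.
results = {
    "p1": {"m1", "m2", "m3", "m4", "m5"},
    "p2": {"m1", "m2", "m3"},
    "p3": {"m2"},
    "p4": set(),
}
print(difficulty_tiers(results, easy_min=4, hard_max=1))
# {'easy': ['p1'], 'medium': ['p2'], 'hard': ['p3', 'p4']}
print(failed_by(results, "m1"))  # ['p3', 'p4']
```

The list returned by `failed_by` would be the subset to rerun for a given model, instead of the whole benchmark.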
Thanks!