Some wrong cases #2
Comments
Hi, indeed, if you want to parse LaTeX, it needs to be in a LaTeX environment (so wrap it in any LaTeX environment notation such as $$, \[ \], or others). What doesn't need to be wrapped are simple expressions like 1/2 or 1.0222, which can be picked up from the string directly. Can you try rerunning with the $$ and show whether there are still any failure cases?
I don't see an easy way to address this. The way it works is that each target (latex or expr) has a set of regexes that are used to identify the answer. All latex regexes require the LaTeX environment to match, so if the model doesn't output it, no LaTeX parsing will be done. Why is it done this way? Because recalling what the answer is from the text is incredibly hard with rule-based parsing; here an LLM could probably be useful :). So yeah, the only way to fix this, imo, is to use an LLM to recall what the answer is.
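To illustrate the design described above, here is a minimal Python sketch. It is not the library's actual code: the regexes and the names `LATEX_ENV_PATTERNS`, `EXPR_PATTERN`, and `extract_candidates` are hypothetical and only show the idea. It shows why a LaTeX answer written without an environment is never picked up by the latex target, while a plain number or simple fraction still is.

```python
import re

# Hypothetical patterns for illustration only; the real regexes differ.
LATEX_ENV_PATTERNS = [
    re.compile(r"\$\$(.+?)\$\$", re.DOTALL),   # $$ ... $$
    re.compile(r"\\\[(.+?)\\\]", re.DOTALL),   # \[ ... \]
    re.compile(r"(?<!\$)\$([^$]+?)\$(?!\$)"),  # $ ... $
]
# Plain numbers and simple fractions such as 1.0222 or 1/2.
EXPR_PATTERN = re.compile(r"-?\d+(?:\.\d+)?(?:/\d+)?")


def extract_candidates(text: str) -> dict:
    """Collect answer candidates per target ("latex" and "expr")."""
    latex = []
    for pattern in LATEX_ENV_PATTERNS:
        # A latex candidate exists only if an environment delimiter matched.
        latex.extend(m.group(1).strip() for m in pattern.finditer(text))
    expr = EXPR_PATTERN.findall(text)
    return {"latex": latex, "expr": expr}
```

With these assumed patterns, `extract_candidates(r"The answer is \frac{1}{12}")` yields no latex candidates at all, whereas wrapping the same answer in `$...$` yields one.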
Do you have an example? I tuned the settings based on popular models, and from what I have seen most of them can easily output the LaTeX environment. Frankly, if the model can't output a LaTeX environment, I am very doubtful about its math abilities. There are two things that are parsable without a LaTeX environment: simple \frac and the \boxed env.
Thank you for your prompt reply! Let me first address the issue with the code snippet provided:
eval_dict = [
{"pred": "$0.0833333333333333$", "gt": "$\\frac{1}{12}$"},
{"pred": "$1,4.5$", "gt": "$1,\\frac{9}{2}$"},
{"pred": "$\\frac{x}{7}+\\frac{2}{7}$",
"gt": "$\\frac{x+2}{7}$", "timeout": True},
{"pred": "$\\sec^2(y)$", "gt": "$\\tan^2(y)+1$", "timeout": True},
{"pred": "$\\begin{pmatrix}-\\frac{7}{4}&-2\\\\4&\\frac{1}{4}\\end{pmatrix}$",
"gt": "$(\\begin{pmatrix}-\\frac{7}{4}&-2\\\\4&\\frac{1}{4}\\\\\\end{pmatrix})$", "timeout": True},
{"pred": '$\\begin{pmatrix}\\frac{1}{3x^{2/3}}&0&0\\\\0&1&0\\\\-\\sin(x)&0&0\\end{pmatrix}$',
"gt": '$(\\begin{pmatrix}\\frac{1}{3\\sqrt[3]{x}^2}&0&0\\\\0&1&0\\\\-\\sin(x)&0&0\\\\\\end{pmatrix})$', "timeout": True},
{"pred": '$-\\frac{8x^2}{9(x^2-2)^{5/3}}+\\frac{2}{3(x^2-2)^{2/3}}$',
"gt": '$-\\frac{2(x^2+6)}{9(x^2-2)\\sqrt[3]{x^2-2}^2}$', "timeout": True},
{"pred": '$-34x-45y+20z-100=0$', "gt": '$34x+45y-20z+100=0$'},
{"pred": '$\\frac{100}{3}$', "gt": '$33.3$'},
{"pred": '$\\begin{pmatrix}0.290243531202435\\\\0.196008371385084\\\\-0.186381278538813\\end{pmatrix}$',
"gt": '$(\\begin{pmatrix}0.29\\\\0.196\\\\-0.186\\\\\\end{pmatrix})$', "timeout": True},
{"pred": '$\\frac{\\sqrt{\\sqrt{11}+\\sqrt{194}}}{2\\sqrt{33}+15}$',
"gt": '$\\frac{\\sqrt{\\sqrt{11}+\\sqrt{194}}}{15+2\\sqrt{33}}$', "timeout": True},
{"pred": '$(+5)(b+2)$', "gt": '$(a+5)(b+2)$', "timeout": True},
{"pred": '$\\frac{1+\\sqrt{5}}{2}$', "gt": '$2$', "timeout": True},
{"pred": '$\\frac{34}{16}+\\frac{\\sqrt{1358}}{16}$',
"gt": '$4$', "timeout": True},
{"pred": '$1$', "gt": '$1\\\\sqrt{19}$', "timeout": True},
{"pred": '$(0.6,2.6667]$',
"gt": "$(\\frac{3}{5},\\frac{8}{3}]$", "timeout": True},
{"pred": '$x+2n+1$', "gt": '$x+1$', "timeout": True},
{"pred": "$1$", "gt": "$2\\frac{1}{2}$"}
]
And the output is:
The mixed fractions should be working now :)
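For reference, a mixed fraction such as `2\frac{1}{2}` denotes the sum 2 + 1/2 = 5/2, not the product 2 · 1/2. Below is a minimal Python sketch of that reading. The helper `parse_mixed_fraction` and its regex are hypothetical and not part of the library; they only handle the simple integer-plus-`\frac` form.

```python
import re
from fractions import Fraction

# Hypothetical pattern: optional sign, whole part, then \frac{num}{den}.
MIXED = re.compile(r"^(-?)(\d+)\\frac\{(\d+)\}\{(\d+)\}$")


def parse_mixed_fraction(latex: str) -> Fraction:
    """Read e.g. '$2\\frac{1}{2}$' as 2 + 1/2 and return an exact Fraction."""
    m = MIXED.match(latex.strip().strip("$"))
    if m is None:
        raise ValueError(f"not a simple mixed fraction: {latex!r}")
    sign, whole, num, den = m.groups()
    value = int(whole) + Fraction(int(num), int(den))
    # The sign applies to the whole mixed number, not just the integer part.
    return -value if sign else value
```

Under this reading, a prediction of `$1$` against a ground truth of `$2\frac{1}{2}$` should compare as unequal, since the ground truth is 5/2.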
Thank you for your excellent work on this project! I have been testing the evaluation functionality with various mathematical cases, but I noticed some discrepancies in the results.
Here is the test case I used:
The output I received is:
The evaluation results for cases 1, 2, 3, 5, 6, 10, 14, 15, and 17 are all wrong.