Allow precision inference from gold #4

Closed
hynky1999 opened this issue Jan 28, 2025 · 1 comment
Comments

@hynky1999
Collaborator

Thank You for Your Prompt Reply!

Let me first address the issue with the code snippet provided:

eval_dict = [
    {"pred": "$0.0833333333333333$", "gt": "$\\frac{1}{12}$"},
    {"pred": "$1,4.5$", "gt": "$1,\\frac{9}{2}$"},
    {"pred": "$\\frac{x}{7}+\\frac{2}{7}$",
        "gt": "$\\frac{x+2}{7}$", "timeout": True},
    {"pred": "$\\sec^2(y)$", "gt": "$\\tan^2(y)+1$", "timeout": True},
    {"pred": "$\\begin{pmatrix}-\\frac{7}{4}&-2\\\\4&\\frac{1}{4}\\end{pmatrix}$",
        "gt": "$(\\begin{pmatrix}-\\frac{7}{4}&-2\\\\4&\\frac{1}{4}\\\\\\end{pmatrix})$", "timeout": True},
    {"pred": '$\\begin{pmatrix}\\frac{1}{3x^{2/3}}&0&0\\\\0&1&0\\\\-\\sin(x)&0&0\\end{pmatrix}$',
     "gt": '$(\\begin{pmatrix}\\frac{1}{3\\sqrt[3]{x}^2}&0&0\\\\0&1&0\\\\-\\sin(x)&0&0\\\\\\end{pmatrix})$', "timeout": True},
    {"pred": '$-\\frac{8x^2}{9(x^2-2)^{5/3}}+\\frac{2}{3(x^2-2)^{2/3}}$',
     "gt": '$-\\frac{2(x^2+6)}{9(x^2-2)\\sqrt[3]{x^2-2}^2}$', "timeout": True},
    {"pred": '$-34x-45y+20z-100=0$', "gt": '$34x+45y-20z+100=0$'},
    {"pred": '$\\frac{100}{3}$', "gt": '$33.3$'},
    {"pred": '$\\begin{pmatrix}0.290243531202435\\\\0.196008371385084\\\\-0.186381278538813\\end{pmatrix}$',
        "gt": '$(\\begin{pmatrix}0.29\\\\0.196\\\\-0.186\\\\\\end{pmatrix})$', "timeout": True},
    {"pred": '$\\frac{\\sqrt{\\sqrt{11}+\\sqrt{194}}}{2\\sqrt{33}+15}$',
        "gt": '$\\frac{\\sqrt{\\sqrt{11}+\\sqrt{194}}}{15+2\\sqrt{33}}$', "timeout": True},
    {"pred": '$(+5)(b+2)$', "gt": '$(a+5)(b+2)$', "timeout": True},
    {"pred": '$\\frac{1+\\sqrt{5}}{2}$', "gt": '$2$', "timeout": True},
    {"pred": '$\\frac{34}{16}+\\frac{\\sqrt{1358}}{16}$',
        "gt": '$4$', "timeout": True},
    {"pred": '$1$', "gt": '$1\\\\sqrt{19}$', "timeout": True},
    {"pred": '$(0.6,2.6667]$',
     "gt": "$(\\frac{3}{5},\\frac{8}{3}]$", "timeout": True},
    {"pred": '$x+2n+1$', "gt": '$x+1$', "timeout": True},
    {"pred": "$1$", "gt": "$2\\frac{1}{2}$"}
]

And the output is:
[0] pred: $0.0833333333333333$, ground truth: $\frac{1}{12}$, result: True
[1] pred: $1,4.5$, ground truth: $1,\frac{9}{2}$, result: True
[2] pred: $\frac{x}{7}+\frac{2}{7}$, ground truth: $\frac{x+2}{7}$, result: True
[3] pred: $\sec^2(y)$, ground truth: $\tan^2(y)+1$, result: True
[4] pred: $\begin{pmatrix}-\frac{7}{4}&-2\\4&\frac{1}{4}\end{pmatrix}$, ground truth: $(\begin{pmatrix}-\frac{7}{4}&-2\\4&\frac{1}{4}\\\end{pmatrix})$, result: True
[5] pred: $\begin{pmatrix}\frac{1}{3x^{2/3}}&0&0\\0&1&0\\-\sin(x)&0&0\end{pmatrix}$, ground truth: $(\begin{pmatrix}\frac{1}{3\sqrt[3]{x}^2}&0&0\\0&1&0\\-\sin(x)&0&0\\\end{pmatrix})$, result: True
[6] pred: $-\frac{8x^2}{9(x^2-2)^{5/3}}+\frac{2}{3(x^2-2)^{2/3}}$, ground truth: $-\frac{2(x^2+6)}{9(x^2-2)\sqrt[3]{x^2-2}^2}$, result: True
[7] pred: $-34x-45y+20z-100=0$, ground truth: $34x+45y-20z+100=0$, result: True
[8] pred: $\frac{100}{3}$, ground truth: $33.3$, result: False
[9] pred: $\begin{pmatrix}0.290243531202435\\0.196008371385084\\-0.186381278538813\end{pmatrix}$, ground truth: $(\begin{pmatrix}0.29\\0.196\\-0.186\\\end{pmatrix})$, result: False
[10] pred: $\frac{\sqrt{\sqrt{11}+\sqrt{194}}}{2\sqrt{33}+15}$, ground truth: $\frac{\sqrt{\sqrt{11}+\sqrt{194}}}{15+2\sqrt{33}}$, result: True
[11] pred: $(+5)(b+2)$, ground truth: $(a+5)(b+2)$, result: False
[12] pred: $\frac{1+\sqrt{5}}{2}$, ground truth: $2$, result: False
[13] pred: $\frac{34}{16}+\frac{\sqrt{1358}}{16}$, ground truth: $4$, result: False
[14] pred: $1$, ground truth: $1\sqrt{19}$, result: False
[15] pred: $(0.6,2.6667]$, ground truth: $(\frac{3}{5},\frac{8}{3}]$, result: False
[16] pred: $x+2n+1$, ground truth: $x+1$, result: False
[17] pred: $1$, ground truth: $2\frac{1}{2}$, result: True

  • Cases 8, 9, 15: I think these should be considered correct, as the differences are within numerical precision limits.
  • Case 17: The result is incorrect because $2\frac{1}{2}$ corresponds to a value of 2.5, which does not match $1$.

Originally posted by @xiaobanni in #2

  1. When comparing a fraction with a float, we should infer the rounding precision from the float and use it to round the fraction,
    e.g. \frac{1}{3} = 0.3
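
The proposal above can be sketched roughly as follows. This is only an illustration built on the Python standard library, not the library's actual comparison code; the helper names `decimal_places` and `matches_at_gold_precision` are hypothetical, and it assumes the gold answer is available as a plain decimal string.

```python
from decimal import Decimal
from fractions import Fraction


def decimal_places(gold: str) -> int:
    """Count the digits after the decimal point in a literal such as '33.3'."""
    exponent = Decimal(gold).as_tuple().exponent
    return max(0, -exponent)


def matches_at_gold_precision(pred: Fraction, gold: str) -> bool:
    """Round the exact prediction to the gold's precision, then compare exactly."""
    places = decimal_places(gold)
    # round() on a Fraction with ndigits returns a Fraction (ties go to even).
    return round(pred, places) == Fraction(gold)


print(matches_at_gold_precision(Fraction(1, 3), "0.3"))     # the 1/3 vs 0.3 example
print(matches_at_gold_precision(Fraction(100, 3), "33.3"))  # case 8 from the output
```

Under this rule case 8 would be accepted, since 100/3 rounded to one decimal place is exactly 33.3.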
@hynky1999
Collaborator Author

I decided not to implement this, as it could create a lot of false positives, while the library tries to minimize them. Users always have full control over the gold labels and can prompt the evaluated subject (student/LLM) with proper instructions for the number of decimals to use. This should therefore be handled at that level rather than with heuristics that can be wrong.
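
To illustrate the false-positive concern, here is a minimal sketch, assuming the heuristic is "round the prediction to the number of decimals in the gold, then compare". The candidate values are made up for illustration and do not come from the test cases in this thread.

```python
from fractions import Fraction

gold = "0.3"   # gold label written with one decimal place
places = 1     # precision a heuristic would infer from that label

# Hypothetical predictions: only 3/10 is exactly equal to the gold.
candidates = [
    Fraction(1, 3),
    Fraction(3, 10),
    Fraction(31, 100),
    Fraction(7, 25),
    Fraction(2, 5),
]

# Keep every candidate that equals the gold after rounding to its precision.
accepted = [c for c in candidates if round(c, places) == Fraction(gold)]
print(accepted)
```

With a one-decimal gold, 1/3, 31/100 and 7/25 are all accepted alongside the exact 3/10, which is the kind of ambiguity a coarse gold label introduces.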
