
Fix _str_to_int precision loss above 2^53 and add math_eval tests #29

Merged: crazydonkey200 merged 1 commit into google-deepmind:main from chenchenpan:fix/math-eval-int-precision on Jun 14, 2026

@chenchenpan

Contributor

Summary

math_eval._str_to_int converted strings to int via float(x), which
silently loses precision for integers larger than 2^53, the point beyond which
a float64 can no longer represent every integer exactly. This PR resolves the
existing TODO on that function and adds the first test coverage for math_eval.py.

The bug

float('9007199254740993') (2^53 + 1) rounds to 9007199254740992.0, so
_str_to_int returned 9007199254740992, which is off by one. Larger values drift
further (e.g. '12345678901234567890' -> 12345678901234567168).
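
The rounding is easy to reproduce with built-ins alone. A minimal sketch, where `via_float` is a hypothetical name standing in for the old float-based conversion (it is not a function in the repo):

```python
# Hypothetical stand-in for the old conversion path: string -> float -> int.
def via_float(text: str) -> int:
  return int(float(text))


# float64 has a 53-bit significand, so 2**53 + 1 has no exact representation
# and is rounded to the nearest representable value.
print(via_float('9007199254740993'))      # 9007199254740992
print(via_float('12345678901234567890'))  # 12345678901234567168

# Python's int parses the same string with arbitrary precision.
print(int('9007199254740993'))            # 9007199254740993
```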

This matters for MATH-style answer grading: _str_to_int feeds _normalize,
so two distinct large integers could normalize to the same string and be graded
as equal.
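
To make the grading risk concrete, the sketch below shows two different answer strings collapsing to the same integer under the float path. `old_str_to_int` is an assumed reconstruction of the pre-fix behavior (comma stripping followed by `int(float(x))`); the internals of `_normalize` are not reproduced here.

```python
# Assumed reconstruction of the pre-fix behavior; the name is hypothetical.
def old_str_to_int(text: str) -> int:
  return int(float(text.replace(',', '')))


reference = '9007199254740992'  # 2**53
candidate = '9007199254740993'  # 2**53 + 1, a different answer

# The strings differ, but the float path maps both to the same integer,
# so any comparison made after this step cannot tell them apart.
collide = old_str_to_int(reference) == old_str_to_int(candidate)
print(collide)                              # True
print(int(reference) == int(candidate))     # False
```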

The fix

Parse integer-formatted strings directly with int(x) (arbitrary precision),
and fall back to float() only for non-integer formats:

def _str_to_int(x: str) -> int:
  x = x.replace(',', '')
  try:
    return int(x)
  except ValueError:
    return int(float(x))

'1.0' and '5e3' raise ValueError on int() and fall through to the float
path, so existing behavior for those is preserved.

Tests

Adds simply/utils/math_eval_test.py (absltest + parameterized, matching
the repo convention) — the first tests for this module. Covers answer
extraction, \boxed{}/\fbox{} parsing, normalization helpers, eval gating,
split_tuple, the >2^53 regression cases, and an end-to-end match test
(skipped when sympy is unavailable). 66 tests, all passing.
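
The test file itself is not shown on this page. The following is only a sketch of what the >2^53 regression cases might look like, written with the standard library's `unittest` and `subTest` in place of absltest + parameterized so that it runs without extra packages. The class name, test names, and case lists are hypothetical; `_str_to_int` is copied from the fix above.

```python
import unittest


def _str_to_int(x: str) -> int:
  x = x.replace(',', '')
  try:
    return int(x)
  except ValueError:
    return int(float(x))


class StrToIntRegressionTest(unittest.TestCase):
  """Hypothetical regression cases for integers above 2**53."""

  def test_large_integers_keep_full_precision(self):
    cases = [
        ('9007199254740993', 9007199254740993),
        ('12345678901234567890', 12345678901234567890),
        ('9,007,199,254,740,993', 9007199254740993),
    ]
    for text, expected in cases:
      with self.subTest(text=text):
        self.assertEqual(_str_to_int(text), expected)

  def test_float_formats_still_parse(self):
    # These raise ValueError in int() and take the float fallback.
    for text, expected in [('1.0', 1), ('5e3', 5000)]:
      with self.subTest(text=text):
        self.assertEqual(_str_to_int(text), expected)
```

Saved as a module, this can be run with `python -m unittest <module name>`.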

Notes

  • No public API change; _str_to_int is module-private.
  • No new dependencies.

@google-cla

google-cla Bot commented Jun 14, 2026


Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

`_str_to_int` converted strings via `int(float(x))`, which
silently lost precision for integers larger than 2^53 (beyond which
a float64 cannot represent every integer exactly). For example
'9007199254740993' (2^53 + 1) returned 9007199254740992.

Parse integer-formatted strings directly with `int(x)`, which has
arbitrary precision, and fall back to `float()` only for non-integer
formats like '1.0' or '5e3'. This resolves the existing TODO.

Also add simply/utils/math_eval_test.py, the first test coverage for
this module: answer extraction, boxed-answer parsing, normalization
helpers, eval gating, the >2^53 regression cases, and an end-to-end
`match` test that skips when sympy is unavailable. 66 tests, all pass.

chenchenpan force-pushed the fix/math-eval-int-precision branch from 6e73c05 to 069ee68 on June 14, 2026 05:54
chenchenpan added a commit to chenchenpan/simply-googledeepmind that referenced this pull request Jun 14, 2026
crazydonkey200 merged commit b1064e6 into google-deepmind:main Jun 14, 2026
5 checks passed
chenchenpan added a commit to chenchenpan/simply-googledeepmind that referenced this pull request Jun 14, 2026
chenchenpan deleted the fix/math-eval-int-precision branch June 14, 2026 23:47
chenchenpan added a commit to chenchenpan/simply-googledeepmind that referenced this pull request Jun 15, 2026

2 participants