Handle undetermined energies in BAR calculations #1098
Even after preventing BAR calculations from failing when there are undetermined energies, there is a separate issue that bootstrapped …
In 54e4b5c I modified the bisection cost function to bisect pairs of states with lowest MBAR overlap, rather than maximum bootstrapped …
This was previously optimized for BAR
df is undefined in this case
Was intended to cap computational effort, but can lead to nondeterminism
Upgrade to pymbar>=3.0.6 which fixed a bug causing maximum_iterations to be ignored: choderalab/pymbar#425
bar_result = df_from_u_kln(
    u_kln_sample,
    initial_f_k=mbar.f_k,  # warm start
q: would it make sense to add bootstrap_maximum_iterations to signature of bootstrap_bar, then forward it here?
Good call, it seems like useful flexibility to be able to specify max iterations here. Added in f7a1ab4
Unsure whether it would be useful to be able to specify max iterations separately for the point estimate and the bootstrap samples, but this can probably be added later if useful.
> Unsure whether it would be useful to be able to specify max iterations separately for the point estimate and the bootstrap samples, but this can probably be added later if useful.
Separately makes sense to me (to control expense), but can be added later if needed.
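To illustrate the point about bounding cost, here is a minimal sketch of a bootstrap whose expense is capped by an iteration count rather than a wall-clock timeout, so repeated runs with the same seed give identical results. It is not the code from this PR: `bar_self_consistent` and `bootstrap_bar_sketch` are hypothetical names, the solver is a plain Bennett self-consistent iteration in NumPy that assumes equal numbers of forward and reverse work values in reduced units, and the synthetic Gaussian work values are chosen so that the true free energy difference is 1.0.

```python
import numpy as np


def fermi(x):
    """Fermi function 1 / (1 + exp(x))."""
    return 1.0 / (1.0 + np.exp(x))


def bar_self_consistent(w_F, w_R, df_init=0.0, maximum_iterations=50, relative_tolerance=1e-6):
    """Bennett self-consistent iteration (assumes len(w_F) == len(w_R), reduced units).

    Stops after `maximum_iterations` updates even if not converged, so the
    cost is bounded deterministically.
    """
    df = df_init
    for _ in range(maximum_iterations):
        df_new = df + np.log(np.sum(fermi(w_R + df)) / np.sum(fermi(w_F - df)))
        if abs(df_new - df) <= relative_tolerance * max(abs(df_new), 1e-12):
            return df_new
        df = df_new
    return df


def bootstrap_bar_sketch(w_F, w_R, n_bootstrap=100, seed=0, maximum_iterations=50):
    """Point estimate plus bootstrap samples; always returns exactly n_bootstrap samples."""
    rng = np.random.default_rng(seed)
    df_point = bar_self_consistent(w_F, w_R, maximum_iterations=maximum_iterations)
    samples = np.empty(n_bootstrap)
    for i in range(n_bootstrap):
        f = rng.choice(w_F, size=len(w_F), replace=True)
        r = rng.choice(w_R, size=len(w_R), replace=True)
        # Warm start each bootstrap solve from the point estimate.
        samples[i] = bar_self_consistent(f, r, df_init=df_point, maximum_iterations=maximum_iterations)
    return df_point, samples


# Synthetic work values consistent with a true df of 1.0 (illustrative only).
data_rng = np.random.default_rng(2023)
w_F = data_rng.normal(1.5, 1.0, size=500)
w_R = data_rng.normal(-0.5, 1.0, size=500)

run_a = bootstrap_bar_sketch(w_F, w_R, n_bootstrap=20, seed=1)
run_b = bootstrap_bar_sketch(w_F, w_R, n_bootstrap=20, seed=1)
```

Because nothing depends on elapsed time, `run_a` and `run_b` are identical. Giving the point estimate and the bootstrap solves separate iteration caps would only require a second keyword argument.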
Minor comments, overall LGTM.
@@ -102,7 +102,7 @@ def build_extension(self, ext):
     "jaxlib>0.4.1",
     "networkx",
     "numpy",
-    "pymbar>3.0.4,<4",
+    "pymbar>=3.0.6,<4",
nit: Should this be 3.1.0 or higher? Not sure why this is different from requirements
`<3.0.6` definitely won't work because of choderalab/pymbar#425, which was merged in `3.0.6`. In general I prefer to use version constraints in `setup.py` only to exclude known-incompatible versions.
When upgrading, I opted to use the latest release <4.
tests/test_bar.py
Outdated
print(f"bootstrap uncertainty = {bootstrap_sigma}, pymbar.MBAR uncertainty = {df_err_ref}")
assert df_0 == df_ref
assert df_1 == df_ref
assert len(bootstrap_samples) == n_bootstrap, "timed out on default problem size!"
nit: Is this still relevant?
Kept assertion but removed misleading message in 28c6d9f
# should give the same result with inf
result_with_inf = estimate_free_energy_bar(np.array([u_kln_with_inf]), DEFAULT_TEMP)
assert result_with_nan.dG == result_with_inf.dG
assert result_with_nan.dG_err == result_with_inf.dG_err
nit: assert finite
Added in e52fd30
# As of pymbar 3.1.0, computation of the covariance matrix can raise an exception on incomplete convergence.
# In this case, return the unconverged estimate with NaN as uncertainty.
df = mbar.getFreeEnergyDifferences(compute_uncertainty=False)[0]
return df[0, 1], np.nan
nit: nan or inf?
Leaning toward NaN as more appropriate here, since we can't be sure of the failure reason. The choice here doesn't affect bisection, since we now use overlap.
Makes sense.
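The fallback pattern discussed in this thread can be sketched generically as follows. This is only an illustration under stated assumptions: `CovarianceError` is a hypothetical stand-in for whatever exception the estimator raises, and the two callables are placeholders for the real estimator calls, so nothing here reflects the actual pymbar API.

```python
import math


class CovarianceError(Exception):
    """Hypothetical stand-in for the exception raised when the covariance cannot be computed."""


def estimate_with_fallback(estimate_df, estimate_df_and_err):
    """Return (df, df_err); fall back to (df, NaN) if the uncertainty is unavailable.

    NaN (rather than inf) signals "unknown" without asserting a reason for the failure.
    """
    try:
        return estimate_df_and_err()
    except CovarianceError:
        return estimate_df(), math.nan


def failing_estimator():
    # Placeholder that mimics a failure while computing the uncertainty.
    raise CovarianceError("covariance not available")


df, df_err = estimate_with_fallback(lambda: 1.25, failing_estimator)
```

Callers that consume `df_err` need to check it with `math.isnan` (or `np.isnan`), since any comparison against NaN evaluates to false.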
* Change to MBAR in #1098 produces much more significant differences than BAR
* Change to MBAR in #1098 produces much more significant differences than BAR in cases of poor convergence
Since #1084, we detect numerical overflows during the evaluation of potentials and represent undetermined energies as NaNs. This has exposed some cases where we previously returned invalid energies (for example, overflows appear to occur frequently during the initial evaluation of the BAR df error and overlap between the end states, before the first iteration of bisection).

Now, when we encounter an energy overflow, the resulting NaN in the `u_kln` matrix causes `pymbar.MBAR` (used to compute overlap) to fail with a `LinAlgError`, crashing the simulation. Also, `pymbar.BAR` warns and returns zeros for the $\Delta f$ and uncertainty estimates when there are NaN work values.

This PR addresses the latter issues by:
* Checking for NaNs in `u_kln`. When we detect a NaN, we raise a warning and replace NaNs with `np.inf` (representing a configuration with zero probability).
* Replacing the `pymbar.BAR` estimator with `pymbar.MBAR` (on a 2-state `u_kn` matrix). The latter interprets `np.inf` as a configuration with zero weight (as desired), while the former returns `(0.0, 0.0)` if there are any infs or NaNs.

This uncovered a related issue where the `timeout` parameter in `bootstrap_bar`, intended to cap computational cost, can lead to nondeterminism. This was addressed by:
* Removing the `timeout` parameter from `bootstrap_bar` and `bar_with_bootstrapped_uncertainty`
* Adjusting `relative_tolerance` and `maximum_iterations` to reduce cost
* Upgrading `pymbar` from `3.0.5` to `3.1.0` (`3.0.6` fixed a bug that prevented `maximum_iterations` from being respected)

Todo:
* seems robust for non-overlapping states in testing: uncertainty estimate produced by MBAR is finite and large
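As a rough illustration of the NaN handling described above, the sketch below replaces NaN energies with `np.inf` and emits a warning. The helper name `sanitize_energies` and the toy matrix are assumptions made for this example, not code from the PR. The last line shows why `np.inf` is a natural encoding: the corresponding Boltzmann factor `exp(-u)` is exactly zero.

```python
import warnings

import numpy as np


def sanitize_energies(u_kln):
    """Return a copy of u_kln with NaNs replaced by +inf, warning if any were found."""
    u_kln = np.asarray(u_kln, dtype=float)
    nan_mask = np.isnan(u_kln)
    if nan_mask.any():
        warnings.warn(f"Encountered {int(nan_mask.sum())} NaN energies; treating them as infinite")
    # np.where always allocates a new array, so the input is never modified.
    return np.where(nan_mask, np.inf, u_kln)


# Toy reduced-energy matrix with one undetermined entry (illustrative only).
u_example = np.array([[1.0, np.nan], [0.5, 2.0]])

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    cleaned = sanitize_energies(u_example)

# An infinite reduced energy corresponds to a configuration with zero weight.
weights = np.exp(-cleaned)
```

Finite entries pass through unchanged, so the helper can be applied unconditionally before handing the matrix to an estimator.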