Robust proposals, applied to pre-equilibration #978
Conversation
This looks awesome! I left some initial comments, but am planning to take another pass on the minimization-specific stuff.
Partially addressing suggestions in #978 (comment) and #978 (comment)
Applies suggestion to vectorize test #978 (comment) for performance, and make test less brittle w.r.t. random seed #978 (comment) Co-Authored-By: Matt Wittmann <[email protected]>
…github.com/proteneer/timemachine into robust-proposals-for-initial-equilibration
Addressing #978 (comment) Co-Authored-By: Matt Wittmann <[email protected]>
timemachine/md/minimizer.py (Outdated)

flat_params = np.concatenate([param.reshape(-1) for param in params])

generic_potentials = [generic.from_gpu(p) for p in potentials]
summed_potential = generic.SummedPotential(generic_potentials, params)
nit: could this be simplified by using timemachine.lib.potentials.SummedPotential directly, rather than converting from GPU to generic and back to GPU?
Ahh, good call! Will fix -- no reason for the round-trip; this is just the code snippet I had most readily at hand.
Round-trip avoided in 11ab177
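For reference, the direct construction might look something like this (a sketch only: it assumes potentials and params are the GPU potentials and parameter arrays from the snippet above, and that timemachine.lib.potentials.SummedPotential takes the same (potentials, params) arguments as the generic version):

import numpy as np
from timemachine.lib.potentials import SummedPotential

# potentials: list of GPU potentials; params: their parameter arrays (as above)
flat_params = np.concatenate([param.reshape(-1) for param in params])

# construct the summed potential directly -- no GPU -> generic -> GPU round-trip
summed_potential = SummedPotential(potentials, params)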
du_dx_host_fxn = make_host_du_dx_fxn(mols, host_system, host_coords, ff, box, mol_coords)
grad_log_q = lambda x_host: -du_dx_host_fxn(x_host) / (BOLTZ * temperature)

# TODO: if needed, revisit choice to omit Metropolis correction
Can you comment on why it seems valid to ignore the Metropolis correction? Do we expect this to approximately correctly sample from the finite temperature distribution without the correction?
> Can you comment on why it seems valid to ignore the Metropolis correction? Do we expect this to approximately correctly sample from the finite temperature distribution without the correction?
In the limit of proposal_stddev approaching 0, I think this proposal process should sample from the finite temperature distribution even without Metropolis correction. (Following the "local balancing" argument in the reference, and based on numerical experiments.)

Omitting the Metropolis correction when proposal_stddev > 0 is not something that I think was intended by the authors, and I don't think it's mentioned / recommended in the paper.

Anecdotally, I would not expect this to outperform our default approximate sampler that omits a Metropolis correction (BAOAB) once equilibrated. (I would expect the sampling bias would be worse than BAOAB's at a comparable "step size". I'd also expect this move to be much less productive per gradient evaluation, since it's not inertial -- similar to how BAOAB with friction=+np.inf is less efficient than BAOAB with friction=1.0.)
The main advantage is robustness w.r.t. clashy initialization.
Anecdotally, the unadjusted Barker proposal with proposal_stddev = 0.0001 nm appears to be good enough to resolve Lennard-Jones clashes, and can make progress even when |force| = +inf.
...
I'll flesh out some comments in the docstring, add an optional argument to use Metropolis correction, and add some other guardrails now that proposal_stddev is exposed to a user of host_equilibrate_barker.
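For concreteness, one unadjusted Barker step can be sketched as follows (hypothetical code, not the PR's implementation; grad_log_q as defined in the snippet at the top of this thread):

import numpy as np

def barker_step(x, grad_log_q, proposal_stddev, rng):
    # propose per-coordinate Gaussian increments z, then keep or flip the
    # sign of each with logistic probability p = 1 / (1 + exp(-z * grad));
    # the gradient enters only through this bounded probability, so even
    # |grad_log_q| = +inf just saturates p to 0 or 1 instead of producing
    # an unbounded step -- hence the robustness to clashy initializations
    z = proposal_stddev * rng.standard_normal(x.shape)
    arg = np.clip(z * grad_log_q(x), -500.0, 500.0)  # avoid overflow in exp
    p_keep_sign = 1.0 / (1.0 + np.exp(-arg))
    b = np.where(rng.random(x.shape) < p_keep_sign, 1.0, -1.0)
    return x + b * z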
Sounds good. For this PR I think just a note in the docstring would be sufficient.
Added these notes, and modified the test to exercise the case of temperature == 0 (i.e. grad_log_q is all +/-infs) in 38e76d1.

Did not add a Metropolis correction, but added an assertion preventing the user from setting proposal_stddev larger than the max tested value.
nit, long-term organization: it feels a bit weird to add this to md.minimizer, since it's not really minimization. Would it make sense to put it in a different file, or generalize the name of this submodule? (I don't immediately have ideas that I like for this...)
Wanted to put it near the most similar existing functions, minimize_host_4d and equilibrate_host, which are currently in minimizer.py -- may be better to split this into multiple files, possibly adding a new file equilibrate.py.
Makes sense. I don't think any reorganization is necessary for this PR, maybe just to keep in mind for the future.
tests/test_barker.py (Outdated)

Z = np.trapz(pdf_grid, y_grid)

assert pytest.approx(Z) == 1
nit/question: Should we prefer np over pytest for numerical comparisons?
No preference -- we use both widely in the existing tests, can switch this file to np.testing if preferred.
No preference as long as we think they are of identical 'quality' in their assertions.
Took a look at the pytest recommendations here and noticed I was using this backwards (pytest.approx(actual) == expected instead of actual == pytest.approx(expected)). Fixed in d1449c2.
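For context, the orientation matters because approx computes its tolerance relative to the wrapped value, which should be the known expected value (a small illustrative example, not from the PR):

import pytest

Z = 0.9999999  # e.g. a trapezoid-rule normalization constant

assert Z == pytest.approx(1.0)  # preferred: tolerance relative to the expected value
assert pytest.approx(Z) == 1    # backwards: tolerance taken relative to the actual value Z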
timemachine/md/barker.py (Outdated)

def sample(self, x):
    """y ~ p(. | x)"""
    gauss = np.random.randn(*x.shape)
    unif = np.random.rand(*x.shape)
Seems like the class should take in a seed and have an rng, rather than using the numpy global random rng.
Good call -- will fix!
Just to note another alternative, I personally like the explicit pattern that JAX uses, where all random functions must take in a random state as an additional argument. (Here it could look like adding an argument rng: numpy.random.Generator to the sample method.) I think it's a little more flexible than objects maintaining random state (e.g. simplifies running independent chains or parallelizing sampling) and more explicit (downside is there's a little more boilerplate required to pass the rng around).
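Concretely, the explicit pattern could look something like this (hypothetical sketch, not the PR's code):

import numpy as np

class BarkerProposal:
    # holds no random state; callers pass a Generator to each call
    def __init__(self, grad_log_q, proposal_stddev):
        self.grad_log_q = grad_log_q
        self.proposal_stddev = proposal_stddev

    def sample(self, x, rng: np.random.Generator):
        # y ~ p(. | x), drawing all randomness from the caller's rng
        gauss = rng.standard_normal(x.shape)
        unif = rng.random(x.shape)
        return self._sample(x, gauss, unif)

    def _sample(self, x, gaussian_rvs, uniform_rvs):
        # deterministic given the supplied draws
        z = self.proposal_stddev * gaussian_rvs
        p_keep = 1.0 / (1.0 + np.exp(-np.clip(z * self.grad_log_q(x), -500, 500)))
        return x + np.where(uniform_rvs < p_keep, 1.0, -1.0) * z

Running independent chains is then just a matter of constructing independent generators, e.g. rngs = [np.random.default_rng(seed) for seed in range(n_chains)].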
> Here it could look like adding an argument rng: numpy.random.Generator to the sample method
That would also match timemachine/timemachine/integrator.py, lines 145 to 158 in 80b7480:
def _step(self, x, v, noise):
    """Intended to match https://github.com/proteneer/timemachine/blob/37e60205b3ae3358d9bb0967d03278ed184b8976/timemachine/cpp/src/integrator.cu#L71-L74"""
    v_mid = v + self.cb * self.force_fxn(x)
    new_v = (self.ca * v_mid) + (self.cc * noise)
    new_x = x + 0.5 * self.dt * (v_mid + new_v)
    return new_x, new_v

def step(self, x, v, rng):
    return self._step(x, v, rng.normal(size=x.shape))

def step_lax(self, key, x, v):
    return self._step(x, v, jax.random.normal(key, x.shape))
> I think it's a little more flexible than objects maintaining random state (e.g. simplifies running independent chains or parallelizing sampling) and more explicit
Good points
Thinking about this a bit more, having random states scattered around in different objects vs. a single global random state seems potentially a bit dangerous; for example, a user might reasonably expect that setting np.random.seed(123) would make the following code deterministic. But with hidden random state, we have to track down all objects that maintain their own random state and be sure to reset/reconstruct them too. Don't mean to derail this PR, but it seems to me worth returning to this in the future.
> for example, a user might reasonably expect that setting np.random.seed(123) would make the following code deterministic.
Agreed that would be an easy mistake to make. But it would also be relatively easy to detect. (User could call the "deterministic" code twice and usually observe that the outputs aren't the same.)
A possible benefit of having the object maintain random state is that certain other kinds of subtler mistakes become harder to make. (E.g. it's ~impossible to accidentally reuse the same random seed on each iteration of a simulation loop: for _ in range(T): x = prop.sample(x).)
> Don't mean to derail this PR, but it seems to me worth returning to this in the future.
This sounds like a good area to consider more carefully in the future. Maybe we can take a more unified approach to random state in timemachine's python layer.
(I don't think we currently have a unified approach here: some objects maintain their own random state, some have random functions that accept numpy rng's or jax rng key's, and some sites reference global numpy random state.)
(And a user who is being extra careful about random state can still supply draws to the deterministic function _sample(x, gaussian_rvs, uniform_rvs).)
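A small self-contained illustration of the seed-reuse pitfall (with hypothetical stand-ins for prop and x):

import numpy as np

class Prop:
    def sample(self, x, rng):
        return x + 0.01 * rng.standard_normal(x.shape)

prop, x, T = Prop(), np.zeros(3), 100

rng = np.random.default_rng(123)
for _ in range(T):
    x = prop.sample(x, rng)  # fine: the generator's state advances each step

for _ in range(T):
    # bug: a fresh generator with the same seed replays identical draws,
    # so every iteration proposes the same perturbation
    x = prop.sample(x, np.random.default_rng(123))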
> E.g. it's ~impossible to accidentally reuse the same random seed on each iteration of a simulation loop
That's a good point, this does seem like a benefit of the current approach.
> I don't think we currently have a unified approach here
Agreed that we currently have mixed conventions. (Just wanted to express some reservations over "objects that use randomness maintain their own implicit random state" becoming the convention without further discussion.)
Sounds good to me to continue this discussion in another thread to not distract from this PR.
> (Just wanted to express some reservations over "objects that use randomness maintain their own implicit random state" becoming the convention without further discussion.)
Agreed -- not obvious to me what's the best approach overall.
> Sounds good to me to continue this discussion in another thread to not distract from this PR.
Agreed! Migrated to #980
timemachine/md/minimizer.py (Outdated)

@@ -26,6 +28,15 @@ class MinimizationError(Exception):
    pass


def check_force_norm(du_dx, threshold=MAX_FORCE_NORM):
This is called force_norm but takes in du_dx; would be good to make that consistent.
norm(x) == norm(-x), but yeah, no reason to be inconsistent -- will fix
Made consistent in c516cbf
timemachine/md/minimizer.py (Outdated)

if len(mols) == 1:
    top = topology.BaseTopology(mols[0], ff)
elif len(mols) == 2:
    top = topology.DualTopologyMinimization(mols[0], mols[1], ff)
Should this be DualTopology rather than DualTopologyMinimization? The behavior looks identical (at lamb == 0, no difference if I am reading it right), but conceptually BaseTopology and DualTopologyMinimization seem different.
Good call -- I don't see a difference at lamb = 0; this is derived from the choice in minimize_host_4d:

timemachine/timemachine/md/minimizer.py, line 130 in 195ce1c:

top = topology.DualTopologyMinimization(mols[0], mols[1], ff)

Will switch to DualTopology, in case DualTopology and DualTopologyMinimization diverge in the future.
Switched in 51a918d
timemachine/md/minimizer.py (Outdated)

norm = np.linalg.norm(du_dx, axis=-1)
if not np.all(norm < threshold):
    raise MinimizationError(
        f"Minimization failed to reduce large forces below threshold: |frc| = {norm} > {threshold}"
nit: for readability it might be nice to either report the max norm or sort them. When the number of particles is large, you may not be able to see the force that is large. On line 415 in the previous version it would print the last.
Ahh, good call -- this was derived from the version in minimize_host_4d:

timemachine/timemachine/md/minimizer.py, lines 175 to 179 in 195ce1c:

norm = np.linalg.norm(du_dx, axis=-1)
if not np.all(norm < MAX_FORCE_NORM):
    raise MinimizationError(
        f"Minimization failed to reduce large forces below threshold: |frc| = {norm} > {MAX_FORCE_NORM}"
    )

but the version in local_minimize is more informative:

timemachine/timemachine/md/minimizer.py, lines 414 to 419 in 195ce1c:

# note that this over the local atoms only, as this function is not concerned
max_frc = np.amax(per_atom_force_norms)
if not max_frc < MAX_FORCE_NORM:
    raise MinimizationError(
        f"Minimization failed to reduce large forces below threshold: |frc| = {max_frc} > {MAX_FORCE_NORM}"
    )
Expanded error message in c516cbf -- report the max force norm, as well as the number of atoms that exceed the threshold.
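Roughly along these lines (a sketch of the described expansion, not the exact commit; the MAX_FORCE_NORM value is a placeholder):

import numpy as np

MAX_FORCE_NORM = 25_000.0  # placeholder value, for illustration only

class MinimizationError(Exception):
    pass

def check_force_norm(forces, threshold=MAX_FORCE_NORM):
    # report the max per-atom force norm and the number of offending atoms,
    # rather than printing the full (possibly truncated) array of norms
    per_atom_norms = np.linalg.norm(forces, axis=-1)
    bad = per_atom_norms >= threshold
    if np.any(bad):
        raise MinimizationError(
            f"Minimization failed to reduce large forces below threshold: "
            f"max |frc| = {per_atom_norms.max():.3g} > {threshold} "
            f"({int(bad.sum())} atoms exceed threshold)"
        )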
timemachine/md/minimizer.py (Outdated)

def equilibrate_host_barker(
    mols, host_system, host_coords, ff, box, mol_coords=None, temperature=300.0, proposal_stddev=0.0001, n_steps=1000
- mols, host_system, host_coords, ff, box, mol_coords=None, temperature=300.0, proposal_stddev=0.0001, n_steps=1000
+ mols, host_system, host_coords, ff, box, mol_coords=None, temperature=constants.DEFAULT_TEMP, proposal_stddev=0.0001, n_steps=1000
Applied in 97196fb
conf_list = [np.array(host_coords)]

if mol_coords is not None:
    for mc in mol_coords:
Know this is duplicated from elsewhere, but it would be good to validate len(mol_coords) == len(mols).
Input validation expanded in 757323d
timemachine/md/minimizer.py (Outdated)

# wrap gpu_impl, partially applying box, mol coords
num_host_atoms = host_coords.shape[0]
assert box.shape == (3, 3)
nit: validate this at the start of the function, before constructing potentials.
Moved to start of function in 757323d
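Gathering that with the mol_coords check above, the fail-fast validation might be sketched as a small helper (hypothetical; names and exact checks assumed):

import numpy as np

def _validate_equilibration_inputs(mols, host_coords, box, mol_coords=None):
    # fail fast, before any expensive potential construction
    assert box.shape == (3, 3), "box must be a 3x3 matrix"
    assert host_coords.ndim == 2 and host_coords.shape[1] == 3
    if mol_coords is not None:
        assert len(mol_coords) == len(mols), "need one conformer per mol"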
LGTM! (pending addition to docstring explaining assumptions around step size and finite-temperature sampling bias)
Following #978 (comment) and #978 (comment)
Partially addressing #978 (comment)
Applying #978 (comment) Co-Authored-By: Forrest York <[email protected]>
Addressing #978 (comment) and #978 (comment)
Oops, usage was backwards
Thanks for the reviews! Merging
Adds a reference implementation of the Barker proposal, based on the paper:
[Livingstone, Zanella, 2020] The Barker proposal: combining robustness and efficiency in gradient-based MCMC https://arxiv.org/abs/1908.11812
This method is very appealing to me, and could likely find several interesting FEP-related applications -- I've recently prototyped some of these for my own curiosity, and discussed some possible future directions with @mcwitt and others.
One immediate application may be to avoid some arbitrariness in methods for pre-equilibration:
- current approach (minimize_host_4d)
- proposed approach (equilibrate_host_barker): clashy initial conditions pose no problem, and it even appears stable when the Metropolis check is omitted...

This seems like an improvement in all the aspects I can think to care about for an initial minimizer (finds a configuration with lower force norm, in less wall time, requiring a smaller change to initial coords, using fewer lines of simpler code, ...). But so far I've only tested on the hif2a example in test_minimizer.py and on made-up toy examples locally (e.g. initializing a bunch of LJ particles on top of each other)...
Comparison to minimize_host_4d on the hif2a example: [comparison plots not captured in this text export]
Also not sure if we can further simplify from "equilibrate at temp ~ 0 K, then equilibrate at temp ~ 300 K" to "just equilibrate at temp ~ 300 K". This possibility is successfully exercised in the tests, but I may be overlooking other requirements that rule out this simplification.
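For concreteness, the two-stage schedule in question, using the signature shown earlier (schematic usage only; assumes equilibrate_host_barker returns the updated host coordinates):

# stage 1: temperature ~ 0 K resolves clashes (grad_log_q saturates the
# Barker sign probabilities, so moves become nearly deterministic descent)
x_host = equilibrate_host_barker(mols, host_system, host_coords, ff, box, temperature=0.0, n_steps=1000)

# stage 2: thermalize at the target temperature
x_host = equilibrate_host_barker(mols, host_system, x_host, ff, box, temperature=300.0, n_steps=1000)

The open question above is whether stage 1 can be dropped entirely.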