Robust proposals, applied to pre-equilibration #978

Merged: maxentile merged 27 commits into master from robust-proposals-for-initial-equilibration on Mar 17, 2023

Conversation

Collaborator

@maxentile maxentile commented Mar 15, 2023

Adds a reference implementation of the Barker proposal, based on the paper:

[Livingstone, Zanella, 2020] The Barker proposal: combining robustness and efficiency in gradient-based MCMC https://arxiv.org/abs/1908.11812
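For orientation, here is a minimal sketch of the (unadjusted) coordinate-wise move from the paper -- not the implementation added in this PR, and the toy usage at the bottom is made up:

```python
import numpy as np

def barker_step(x, grad_log_q, proposal_stddev, rng):
    """One unadjusted Barker step, applied coordinate-wise.

    Draw a Gaussian increment per coordinate, then keep or flip its sign with
    a probability that depends on the gradient of the log target density.
    """
    z = proposal_stddev * rng.normal(size=np.shape(x))
    g = grad_log_q(x)
    p_keep = 1.0 / (1.0 + np.exp(-z * g))  # sigmoid(z * grad_log_q(x))
    b = np.where(rng.uniform(size=np.shape(x)) < p_keep, 1.0, -1.0)
    return x + b * z

# toy usage: relax a badly initialized configuration toward a standard normal target
rng = np.random.default_rng(2023)
x = np.full(10, 5.0)
for _ in range(2_000):
    x = barker_step(x, grad_log_q=lambda x: -x, proposal_stddev=0.1, rng=rng)
```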

This method is very appealing to me, and could likely find several interesting FEP-related applications -- I've recently prototyped some of these for my own curiosity, and discussed some possible future directions with @mcwitt and others.


One immediate application may be to avoid some arbitrariness in methods for pre-equilibration.

  • Currently: "4D annealing to the desired state, followed by FIRE minimization at that state" (minimize_host_4d)
  • Option added in this PR: "just run a robust sampler at the desired state" (equilibrate_host_barker) (clashy initial conditions pose no problem, and it even appears stable when the Metropolis check is omitted...)

This seems like an improvement in every aspect I can think of caring about for an initial minimizer (it finds a configuration with lower force norm, in less wall time, requiring a smaller change to the initial coords, using fewer lines of simpler code, ...). But so far I've only tested on the hif2a example in test_minimizer.py and on made-up toy examples locally (e.g. initializing a bunch of LJ particles on top of each other)...

Comparison to minimize_host_4d on the hif2a example:

using unadjusted Barker proposal @ temperature = 0.001 K...
        force norm after low-temperature equilibration: 498.011 kJ/mol / nm
        max distance traveled = 0.085 nm
        done in 4.504 s
using 4D FIRE annealing
        force norm after 4D FIRE annealing: 673.744 kJ/mol / nm
        max distance traveled = 0.445 nm
        done in 38.187 s

Also not sure if we can further simplify from "equilibrate at temp ~ 0 K, then equilibrate at temp ~ 300 K" to "just equilibrate at temp ~ 300 K". This possibility is successfully exercised in the tests, but I may be overlooking other requirements that rule out this simplification.

using unadjusted Barker proposal @ temperature = 300.0 K...
        force norm after room-temperature equilibration: 7372.178 kJ/mol / nm
        max distance traveled = 0.032 nm
        done in 6.255 s

@maxentile maxentile marked this pull request as ready for review March 15, 2023 23:52
@maxentile maxentile requested review from mcwitt and badisa March 16, 2023 18:01
Collaborator

@mcwitt mcwitt left a comment

This looks awesome! I left some initial comments, but am planning to take another pass on the minimization-specific stuff

maxentile and others added 3 commits March 16, 2023 15:21
Applies suggestion to vectorize test #978 (comment) for performance,
and make test less brittle w.r.t. random seed #978 (comment)

Co-Authored-By: Matt Wittmann <[email protected]>
timemachine/md/minimizer.py
flat_params = np.concatenate([param.reshape(-1) for param in params])

generic_potentials = [generic.from_gpu(p) for p in potentials]
summed_potential = generic.SummedPotential(generic_potentials, params)
Collaborator

nit: could this be simplified by using timemachine.lib.potentials.SummedPotential directly, rather than converting from GPU to generic and back to GPU?

Collaborator Author

Ahh, good call! Will fix; there's no reason for the round-trip -- this was just the code snippet I had most readily at hand.

Collaborator Author

Round-trip avoided in 11ab177

du_dx_host_fxn = make_host_du_dx_fxn(mols, host_system, host_coords, ff, box, mol_coords)
grad_log_q = lambda x_host: -du_dx_host_fxn(x_host) / (BOLTZ * temperature)

# TODO: if needed, revisit choice to omit Metropolis correction
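(For context: the target density here is the Boltzmann distribution at the given temperature, so the "score" is just the scaled negative force -- a one-line statement of the relationship being assumed:)

$$q(x) \propto e^{-U(x)/(k_B T)} \quad\Longrightarrow\quad \nabla_x \log q(x) = -\frac{\nabla_x U(x)}{k_B T}$$

which is what the grad_log_q lambda above computes, with du_dx playing the role of the potential gradient.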
Collaborator

Can you comment on why it seems valid to ignore the Metropolis correction? Do we expect this to approximately correctly sample from the finite temperature distribution without the correction?

Collaborator Author

> Can you comment on why it seems valid to ignore the Metropolis correction? Do we expect this to approximately correctly sample from the finite temperature distribution without the correction?

In the limit of proposal_stddev approaching 0, I think this proposal process should sample from the finite temperature distribution even without Metropolis correction. (Following the "local balancing" argument in the reference, and based on numerical experiments.)

Omitting the Metropolis correction when proposal_stddev > 0 is not something that I think was intended by the authors, and I don't think it's mentioned / recommended in the paper.

Anecdotally, I would not expect this to outperform our default approximate sampler that omits a Metropolis correction (BAOAB) once equilibrated.

(I would expect the sampling bias would be worse than BAOAB's at a comparable "step size". I'd also expect this move to be much less productive per gradient evaluation, since it's not inertial -- similar to how BAOAB with friction=+np.inf is less efficient than BAOAB with friction=1.0.)

The main advantage is robustness w.r.t. clashy initialization.

Anecdotally, the unadjusted Barker proposal with proposal_stddev = 0.0001 nm appears to be good enough to resolve Lennard-Jones clashes, and can make progress even when |force| = +inf.
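(A small numerical illustration of that last point -- my toy sketch, not code from this PR: when grad_log_q diverges, the sign probability saturates at 0 or 1, while the step magnitude stays on the order of proposal_stddev.)

```python
import numpy as np

sigma = 1e-4      # proposal_stddev, nm
z = sigma * 0.7   # a representative Gaussian increment
for g in [0.0, 1e3, 1e9, np.inf]:  # grad_log_q values of increasing magnitude
    p_keep = 1.0 / (1.0 + np.exp(-z * g))  # saturates toward 1 (or 0 for z < 0)
    print(f"grad_log_q = {g:>12}: P(keep sign of z) = {p_keep:.3f}, |step| = {abs(z):.1e} nm")
```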

...

I'll flesh out some comments in the docstring, add an optional argument to use the Metropolis correction, and add some other guardrails now that proposal_stddev is exposed to users of equilibrate_host_barker.

Collaborator

Sounds good. For this PR I think just a note in the docstring would be sufficient.

Collaborator Author

Added these notes, and modified the test to exercise the case of temperature == 0 (i.e. grad_log_q is all +/-inf) in 38e76d1.

Did not add the Metropolis correction, but added an assertion preventing the user from setting proposal_stddev larger than the max tested value.

Collaborator

nit, long-term organization: it feels a bit weird to add this to md.minimizer, since it's not really minimization. Would it make sense to put it in a different file, or generalize the name of this submodule? (I don't immediately have ideas that I like for this...)

Collaborator Author

Wanted to put it near the most similar existing functions, minimize_host_4d and equilibrate_host, which are currently in minimizer.py -- it may be better to split this into multiple files, possibly adding a new file equilibrate.py.

Collaborator

Makes sense. I don't think any reorganization is necessary for this PR, maybe just to keep in mind for the future.


Z = np.trapz(pdf_grid, y_grid)

assert pytest.approx(Z) == 1
Collaborator

nit/question: Should we prefer np over pytest for numerical comparisons?

Collaborator Author

No preference -- we use both widely in the existing tests, can switch this file to np.testing if preferred.

Collaborator

No preference as long as we think they are of identical 'quality' in their assertions.

Collaborator Author

Took a look at pytest recommendations here and noticed I was using this backwards (pytest.approx(actual) == expected instead of actual == pytest.approx(expected)). Fixed in d1449c2.
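For reference, the corrected direction looks like this (a self-contained sketch, not the exact test changed in d1449c2):

```python
import numpy as np
import pytest

# normalized Gaussian pdf, integrated numerically over a wide grid
y_grid = np.linspace(-10.0, 10.0, 10_001)
pdf_grid = np.exp(-0.5 * y_grid**2) / np.sqrt(2.0 * np.pi)
Z = np.trapz(pdf_grid, y_grid)

# measured value on the left, expected value wrapped in pytest.approx
assert Z == pytest.approx(1.0)
```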

def sample(self, x):
    """y ~ p(. | x)"""
    gauss = np.random.randn(*x.shape)
    unif = np.random.rand(*x.shape)
Collaborator

Seems like the class should take in a seed and have an rng rather than using the numpy global random rng

Collaborator Author

Good call -- will fix!

Collaborator

@mcwitt mcwitt Mar 17, 2023

Just to note another alternative, I personally like the explicit pattern that JAX uses, where all random functions must take in a random state as an additional argument. (Here it could look like adding an argument rng: numpy.random.Generator to the sample method). I think it's a little more flexible than objects maintaining random state (e.g. simplifies running independent chains or parallelizing sampling) and more explicit (downside is there's a little more boilerplate required to pass the rng around).

Collaborator Author

> (Here it could look like adding an argument rng: numpy.random.Generator to the sample method)

That would also match

def _step(self, x, v, noise):
    """Intended to match https://github.com/proteneer/timemachine/blob/37e60205b3ae3358d9bb0967d03278ed184b8976/timemachine/cpp/src/integrator.cu#L71-L74"""
    v_mid = v + self.cb * self.force_fxn(x)
    new_v = (self.ca * v_mid) + (self.cc * noise)
    new_x = x + 0.5 * self.dt * (v_mid + new_v)
    return new_x, new_v

def step(self, x, v, rng):
    return self._step(x, v, rng.normal(size=x.shape))

def step_lax(self, key, x, v):
    return self._step(x, v, jax.random.normal(key, x.shape))

> I think it's a little more flexible than objects maintaining random state (e.g. simplifies running independent chains or parallelizing sampling) and more explicit

Good points

Collaborator Author

Thanks for both suggestions.

In the interest of keeping the signature simple, applied @badisa's suggestion of having the object maintain random state in b8dc43b.

@mcwitt: more control is still available by providing one's own source of Gaussian and uniform r.v.'s to _sample.

Collaborator

@mcwitt mcwitt Mar 17, 2023

Thinking about this a bit more, having random states scattered around in different objects vs. a single global random state seems potentially a bit dangerous; for example, a user might reasonably expect that setting np.random.seed(123) would make the following code deterministic. But with hidden random state, we have to track down all objects that maintain their own random state and be sure to reset/reconstruct them too. Don't mean to derail this PR, but it seems to me worth returning to this in the future.

Collaborator Author

> for example, a user might reasonably expect that setting np.random.seed(123) would make the following code deterministic.

Agreed that would be an easy mistake to make. But it would also be relatively easy to detect. (User could call the "deterministic" code twice and usually observe that the outputs aren't the same.)

A possible benefit of having the object maintain random state is that certain other kinds of subtler mistakes become harder to make. (E.g. it's ~impossible to accidentally reuse the same random seed on each iteration of a simulation loop: for _ in range(T): x = prop.sample(x).)

> Don't mean to derail this PR, but it seems to me worth returning to this in the future.

This sounds like a good area to consider more carefully in the future. Maybe we can take a more unified approach to random state in timemachine's python layer.

(I don't think we currently have a unified approach here: some objects maintain their own random state, some have random functions that accept numpy rng's or jax rng key's, and some sites reference global numpy random state.)

(And a user who is being extra careful about random state can still supply draws to the deterministic function _sample(x, gaussian_rvs, uniform_rvs).)
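(For illustration, here are the two patterns under discussion side by side -- hypothetical names, not timemachine classes:)

```python
import numpy as np

class ProposalWithOwnRng:
    """Object-held state: the pattern adopted in this PR."""

    def __init__(self, seed: int = 2023):
        self.rng = np.random.default_rng(seed)

    def sample(self, x):
        return x + 0.1 * self.rng.normal(size=np.shape(x))

def sample_explicit(x, rng: np.random.Generator):
    """JAX-like pattern: the caller threads random state through explicitly."""
    return x + 0.1 * rng.normal(size=np.shape(x))

# object-held state: hard to accidentally reseed inside a simulation loop
prop = ProposalWithOwnRng(seed=0)
x = np.zeros(3)
for _ in range(10):
    x = prop.sample(x)

# explicit state: independent chains just get independent generators
chains = [sample_explicit(np.zeros(3), np.random.default_rng(i)) for i in range(4)]
```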

Collaborator

> E.g. it's ~impossible to accidentally reuse the same random seed on each iteration of a simulation loop

That's a good point, this does seem like a benefit of the current approach.

> I don't think we currently have a unified approach here

Agreed that we currently have mixed conventions. (Just wanted to express some reservations over "objects that use randomness maintain their own implicit random state" becoming the convention without further discussion.)

Sounds good to me to continue this discussion in another thread to not distract from this PR.

Collaborator Author

> (Just wanted to express some reservations over "objects that use randomness maintain their own implicit random state" becoming the convention without further discussion.)

Agreed -- not obvious to me what's the best approach overall.

> Sounds good to me to continue this discussion in another thread to not distract from this PR.

Agreed! Migrated to #980

@@ -26,6 +28,15 @@ class MinimizationError(Exception):
    pass


def check_force_norm(du_dx, threshold=MAX_FORCE_NORM):
Collaborator

This is named check_force_norm but takes in du_dx; it would be good to make that consistent.

Collaborator Author

norm(x) == norm(-x), but yeah, no reason to be inconsistent -- will fix

Collaborator Author

Made consistent in c516cbf

if len(mols) == 1:
    top = topology.BaseTopology(mols[0], ff)
elif len(mols) == 2:
    top = topology.DualTopologyMinimization(mols[0], mols[1], ff)
Collaborator

Should this be DualTopology rather than DualTopologyMinimization? The behavior looks identical (at lamb == 0, no difference if I am reading it right), but conceptually BaseTopology and DualTopologyMinimization seem different.

Collaborator Author

Good call -- I don't see a difference at lamb = 0; this is derived from the choice in minimize_host_4d:

top = topology.DualTopologyMinimization(mols[0], mols[1], ff)

Will switch to DualTopology in case DualTopology and DualTopologyMinimization diverge in the future.

Collaborator Author

Switched in 51a918d

norm = np.linalg.norm(du_dx, axis=-1)
if not np.all(norm < threshold):
    raise MinimizationError(
        f"Minimization failed to reduce large forces below threshold: |frc| = {norm} > {threshold}"
    )
Collaborator

nit: for readability it might be nice to either report the max norm or sort the norms; when the number of particles is large you may not be able to see which force is large.

On line 415 in the previous it would print the last.

Collaborator Author

Ahh, good call -- this was derived from the version in minimize_host_4d

norm = np.linalg.norm(du_dx, axis=-1)
if not np.all(norm < MAX_FORCE_NORM):
    raise MinimizationError(
        f"Minimization failed to reduce large forces below threshold: |frc| = {norm} > {MAX_FORCE_NORM}"
    )

but the version in local_minimize is more informative
# note that this over the local atoms only, as this function is not concerned
max_frc = np.amax(per_atom_force_norms)
if not max_frc < MAX_FORCE_NORM:
    raise MinimizationError(
        f"Minimization failed to reduce large forces below threshold: |frc| = {max_frc} > {MAX_FORCE_NORM}"
    )

Collaborator Author

Expanded the error message in c516cbf -- it now reports the max force norm, as well as the number of atoms that exceed the threshold.
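Roughly, the expanded check looks like this (my reconstruction from the snippets above, not the exact diff in c516cbf; the MAX_FORCE_NORM value is a placeholder):

```python
import numpy as np

MAX_FORCE_NORM = 20_000.0  # kJ/mol/nm; placeholder value for illustration

class MinimizationError(Exception):
    pass

def check_force_norm(forces, threshold=MAX_FORCE_NORM):
    """Raise MinimizationError if any per-atom force norm exceeds threshold."""
    per_atom_norms = np.linalg.norm(forces, axis=-1)
    max_norm = np.max(per_atom_norms)
    if not max_norm < threshold:
        n_over = int(np.sum(per_atom_norms >= threshold))
        raise MinimizationError(
            f"Minimization failed to reduce large forces below threshold: "
            f"max |frc| = {max_norm:.3f} > {threshold} ({n_over} atoms over threshold)"
        )
```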



def equilibrate_host_barker(
    mols, host_system, host_coords, ff, box, mol_coords=None, temperature=300.0, proposal_stddev=0.0001, n_steps=1000
Collaborator

Suggested change:
-    mols, host_system, host_coords, ff, box, mol_coords=None, temperature=300.0, proposal_stddev=0.0001, n_steps=1000
+    mols, host_system, host_coords, ff, box, mol_coords=None, temperature=constants.DEFAULT_TEMP, proposal_stddev=0.0001, n_steps=1000

Collaborator Author

Applied in 97196fb

conf_list = [np.array(host_coords)]

if mol_coords is not None:
    for mc in mol_coords:
Collaborator

Know this is duplicated from elsewhere, but it would be good to validate len(mol_coords) == len(mols).

Collaborator Author

Input validation expanded in 757323d


# wrap gpu_impl, partially applying box, mol coords
num_host_atoms = host_coords.shape[0]
assert box.shape == (3, 3)
Collaborator

nit: validate this at the start of the function before constructing potentials

Collaborator Author

Moved to start of function in 757323d

Collaborator

@mcwitt mcwitt left a comment

LGTM! (pending addition to docstring explaining assumptions around step size and finite-temperature sampling bias)

@maxentile
Collaborator Author

Thanks for the reviews! Merging

@maxentile maxentile merged commit 6f9ebcc into master Mar 17, 2023
@maxentile maxentile deleted the robust-proposals-for-initial-equilibration branch March 17, 2023 21:12