Robust proposals, applied to pre-equilibration #978

Merged: maxentile merged 27 commits into master from robust-proposals-for-initial-equilibration on Mar 17, 2023

Conversation

Collaborator

@maxentile maxentile commented Mar 15, 2023

Adds a reference implementation of the Barker proposal, based on the paper:

[Livingstone, Zanella, 2020] The Barker proposal: combining robustness and efficiency in gradient-based MCMC https://arxiv.org/abs/1908.11812
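For orientation, here is a minimal sketch of the (unadjusted) coordinate-wise move from the paper -- not the implementation added in this PR, and the toy usage at the bottom is made up:

```python
import numpy as np

def barker_step(x, grad_log_q, proposal_stddev, rng):
    """One unadjusted Barker step, applied coordinate-wise.

    Draw a Gaussian increment per coordinate, then keep or flip its sign with
    a probability that depends on the gradient of the log target density.
    """
    z = proposal_stddev * rng.normal(size=np.shape(x))
    g = grad_log_q(x)
    p_keep = 1.0 / (1.0 + np.exp(-z * g))  # sigmoid(z * grad_log_q(x))
    b = np.where(rng.uniform(size=np.shape(x)) < p_keep, 1.0, -1.0)
    return x + b * z

# toy usage: relax a badly initialized configuration toward a standard normal target
rng = np.random.default_rng(2023)
x = np.full(10, 5.0)
for _ in range(2_000):
    x = barker_step(x, grad_log_q=lambda x: -x, proposal_stddev=0.1, rng=rng)
```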

This method is very appealing to me, and could likely find several interesting FEP-related applications -- I've recently prototyped some of these for my own curiosity, and discussed some possible future directions with @mcwitt and others.


One immediate application may be to avoid some arbitrariness in methods for pre-equilibration.

  • Currently: "4D annealing to the desired state, followed by FIRE minimization at that state" (minimize_host_4d)
  • Option added in this PR: "just run a robust sampler at the desired state" (equilibrate_host_barker) (clashy initial conditions pose no problem, and it even appears stable when the Metropolis check is omitted...)

This seems like an improvement in every aspect I can think of caring about for an initial minimizer (it finds a configuration with lower force norm, in less wall time, requiring a smaller change to the initial coords, using fewer lines of simpler code, ...). But so far I've only tested on the hif2a example in test_minimizer.py and on made-up toy examples locally (e.g. initializing a bunch of LJ particles on top of each other)...

Comparison to minimize_host_4d on the hif2a example:

using unadjusted Barker proposal @ temperature = 0.001 K...
        force norm after low-temperature equilibration: 498.011 kJ/mol / nm
        max distance traveled = 0.085 nm
        done in 4.504 s
using 4D FIRE annealing
        force norm after 4D FIRE annealing: 673.744 kJ/mol / nm
        max distance traveled = 0.445 nm
        done in 38.187 s

Also not sure if we can further simplify from "equilibrate at temp ~ 0 K, then equilibrate at temp ~ 300 K" to "just equilibrate at temp ~ 300 K". This possibility is successfully exercised in the tests, but I may be overlooking other requirements that rule out this simplification.

using unadjusted Barker proposal @ temperature = 300.0 K...
        force norm after room-temperature equilibration: 7372.178 kJ/mol / nm
        max distance traveled = 0.032 nm
        done in 6.255 s

@maxentile maxentile marked this pull request as ready for review March 15, 2023 23:52
@maxentile maxentile requested review from mcwitt and badisa March 16, 2023 18:01
Collaborator

@mcwitt mcwitt left a comment

This looks awesome! I left some initial comments, but am planning to take another pass on the minimization-specific stuff

maxentile and others added 3 commits March 16, 2023 15:21
Applies suggestion to vectorize test #978 (comment) for performance,
and make test less brittle w.r.t. random seed #978 (comment)

Co-Authored-By: Matt Wittmann <[email protected]>
timemachine/md/minimizer.py
flat_params = np.concatenate([param.reshape(-1) for param in params])

generic_potentials = [generic.from_gpu(p) for p in potentials]
summed_potential = generic.SummedPotential(generic_potentials, params)
Collaborator

nit: could this be simplified by using timemachine.lib.potentials.SummedPotential directly, rather than converting from GPU to generic and back to GPU?

Collaborator Author

Ahh, good call! Will fix; there's no reason for the round-trip -- this was just the code snippet I had most readily at hand.

Collaborator Author

Round-trip avoided in 11ab177

du_dx_host_fxn = make_host_du_dx_fxn(mols, host_system, host_coords, ff, box, mol_coords)
grad_log_q = lambda x_host: -du_dx_host_fxn(x_host) / (BOLTZ * temperature)

# TODO: if needed, revisit choice to omit Metropolis correction
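(For context: the target density here is the Boltzmann distribution at the given temperature, so the "score" is just the scaled negative force -- a one-line statement of the relationship being assumed:)

$$q(x) \propto e^{-U(x)/(k_B T)} \quad\Longrightarrow\quad \nabla_x \log q(x) = -\frac{\nabla_x U(x)}{k_B T}$$

which is what the grad_log_q lambda above computes, with du_dx playing the role of the potential gradient.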
Collaborator

Can you comment on why it seems valid to ignore the Metropolis correction? Do we expect this to approximately correctly sample from the finite temperature distribution without the correction?

Collaborator Author

> Can you comment on why it seems valid to ignore the Metropolis correction? Do we expect this to approximately correctly sample from the finite temperature distribution without the correction?

In the limit of proposal_stddev approaching 0, I think this proposal process should sample from the finite temperature distribution even without Metropolis correction. (Following the "local balancing" argument in the reference, and based on numerical experiments.)

Omitting the Metropolis correction when proposal_stddev > 0 is not something that I think was intended by the authors, and I don't think it's mentioned / recommended in the paper.

Anecdotally, I would not expect this to outperform our default approximate sampler that omits a Metropolis correction (BAOAB) once equilibrated.

(I would expect the sampling bias would be worse than BAOAB's at a comparable "step size". I'd also expect this move to be much less productive per gradient evaluation, since it's not inertial -- similar to how BAOAB with friction=+np.inf is less efficient than BAOAB with friction=1.0.)

The main advantage is robustness w.r.t. clashy initialization.

Anecdotally, the unadjusted Barker proposal with proposal_stddev = 0.0001 nm appears to be good enough to resolve Lennard-Jones clashes, and can make progress even when |force| = +inf.
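(A small numerical illustration of that last point -- my toy sketch, not code from this PR: when grad_log_q diverges, the sign probability saturates at 0 or 1, while the step magnitude stays on the order of proposal_stddev.)

```python
import numpy as np

sigma = 1e-4      # proposal_stddev, nm
z = sigma * 0.7   # a representative Gaussian increment
for g in [0.0, 1e3, 1e9, np.inf]:  # grad_log_q values of increasing magnitude
    p_keep = 1.0 / (1.0 + np.exp(-z * g))  # saturates toward 1 (or 0 for z < 0)
    print(f"grad_log_q = {g:>12}: P(keep sign of z) = {p_keep:.3f}, |step| = {abs(z):.1e} nm")
```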

...

I'll flesh out some comments in the docstring, add an optional argument to use the Metropolis correction, and add some other guardrails now that proposal_stddev is exposed to users of equilibrate_host_barker.

Collaborator

Sounds good. For this PR I think just a note in the docstring would be sufficient.

Collaborator Author

Added these notes, and modified the test to exercise the case of temperature == 0 (i.e. grad_log_q is all +/-inf) in 38e76d1.

Did not add the Metropolis correction, but added an assertion preventing the user from setting proposal_stddev larger than the max tested value.

Collaborator

nit, long-term organization: it feels a bit weird to add this to md.minimizer, since it's not really minimization. Would it make sense to put it in a different file, or generalize the name of this submodule? (I don't immediately have ideas that I like for this...)

Collaborator Author

Wanted to put it near the most similar existing functions, minimize_host_4d and equilibrate_host, which are currently in minimizer.py -- it may be better to split this into multiple files, possibly adding a new file equilibrate.py.

Collaborator

Makes sense. I don't think any reorganization is necessary for this PR, maybe just to keep in mind for the future.


Z = np.trapz(pdf_grid, y_grid)

assert pytest.approx(Z) == 1
Collaborator

nit/question: Should we prefer np over pytest for numerical comparisons?

Collaborator Author

No preference -- we use both widely in the existing tests, can switch this file to np.testing if preferred.

Collaborator

No preference as long as we think they are of identical 'quality' in their assertions.

Collaborator Author

Took a look at pytest recommendations here and noticed I was using this backwards (pytest.approx(actual) == expected instead of actual == pytest.approx(expected)). Fixed in d1449c2.
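For reference, the corrected direction looks like this (a self-contained sketch, not the exact test changed in d1449c2):

```python
import numpy as np
import pytest

# normalized Gaussian pdf, integrated numerically over a wide grid
y_grid = np.linspace(-10.0, 10.0, 10_001)
pdf_grid = np.exp(-0.5 * y_grid**2) / np.sqrt(2.0 * np.pi)
Z = np.trapz(pdf_grid, y_grid)

# measured value on the left, expected value wrapped in pytest.approx
assert Z == pytest.approx(1.0)
```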

def sample(self, x):
    """y ~ p(. | x)"""
    gauss = np.random.randn(*x.shape)
    unif = np.random.rand(*x.shape)
Collaborator

Seems like the class should take in a seed and have an rng rather than using the numpy global random rng

Collaborator Author

Good call -- will fix!

Collaborator

@mcwitt mcwitt Mar 17, 2023

Just to note another alternative, I personally like the explicit pattern that JAX uses, where all random functions must take in a random state as an additional argument. (Here it could look like adding an argument rng: numpy.random.Generator to the sample method). I think it's a little more flexible than objects maintaining random state (e.g. simplifies running independent chains or parallelizing sampling) and more explicit (downside is there's a little more boilerplate required to pass the rng around).

Collaborator Author

> (Here it could look like adding an argument rng: numpy.random.Generator to the sample method)

That would also match

def _step(self, x, v, noise):
    """Intended to match https://github.com/proteneer/timemachine/blob/37e60205b3ae3358d9bb0967d03278ed184b8976/timemachine/cpp/src/integrator.cu#L71-L74"""
    v_mid = v + self.cb * self.force_fxn(x)
    new_v = (self.ca * v_mid) + (self.cc * noise)
    new_x = x + 0.5 * self.dt * (v_mid + new_v)
    return new_x, new_v

def step(self, x, v, rng):
    return self._step(x, v, rng.normal(size=x.shape))

def step_lax(self, key, x, v):
    return self._step(x, v, jax.random.normal(key, x.shape))

> I think it's a little more flexible than objects maintaining random state (e.g. simplifies running independent chains or parallelizing sampling) and more explicit

Good points

Collaborator Author

Thanks for both suggestions.

In the interest of keeping the signature simple, applied @badisa's suggestion of having the object maintain random state in b8dc43b.

@mcwitt: more control is still available by providing one's own source of Gaussian and uniform r.v.'s to _sample.

Collaborator

@mcwitt mcwitt Mar 17, 2023

Thinking about this a bit more, having random states scattered around in different objects vs. a single global random state seems potentially a bit dangerous; for example, a user might reasonably expect that setting np.random.seed(123) would make the following code deterministic. But with hidden random state, we have to track down all objects that maintain their own random state and be sure to reset/reconstruct them too. Don't mean to derail this PR, but it seems to me worth returning to this in the future.

Collaborator Author

> for example, a user might reasonably expect that setting np.random.seed(123) would make the following code deterministic.

Agreed that would be an easy mistake to make. But it would also be relatively easy to detect. (User could call the "deterministic" code twice and usually observe that the outputs aren't the same.)

A possible benefit of having the object maintain random state is that certain other kinds of subtler mistakes become harder to make. (E.g. it's ~impossible to accidentally reuse the same random seed on each iteration of a simulation loop: for _ in range(T): x = prop.sample(x).)

> Don't mean to derail this PR, but it seems to me worth returning to this in the future.

This sounds like a good area to consider more carefully in the future. Maybe we can take a more unified approach to random state in timemachine's python layer.

(I don't think we currently have a unified approach here: some objects maintain their own random state, some have random functions that accept numpy rng's or jax rng key's, and some sites reference global numpy random state.)

(And a user who is being extra careful about random state can still supply draws to the deterministic function _sample(x, gaussian_rvs, uniform_rvs).)
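(For illustration, here are the two patterns under discussion side by side -- hypothetical names, not timemachine classes:)

```python
import numpy as np

class ProposalWithOwnRng:
    """Object-held state: the pattern adopted in this PR."""

    def __init__(self, seed: int = 2023):
        self.rng = np.random.default_rng(seed)

    def sample(self, x):
        return x + 0.1 * self.rng.normal(size=np.shape(x))

def sample_explicit(x, rng: np.random.Generator):
    """JAX-like pattern: the caller threads random state through explicitly."""
    return x + 0.1 * rng.normal(size=np.shape(x))

# object-held state: hard to accidentally reseed inside a simulation loop
prop = ProposalWithOwnRng(seed=0)
x = np.zeros(3)
for _ in range(10):
    x = prop.sample(x)

# explicit state: independent chains just get independent generators
chains = [sample_explicit(np.zeros(3), np.random.default_rng(i)) for i in range(4)]
```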

Collaborator

> E.g. it's ~impossible to accidentally reuse the same random seed on each iteration of a simulation loop

That's a good point, this does seem like a benefit of the current approach.

> I don't think we currently have a unified approach here

Agreed that we currently have mixed conventions. (Just wanted to express some reservations over "objects that use randomness maintain their own implicit random state" becoming the convention without further discussion.)

Sounds good to me to continue this discussion in another thread to not distract from this PR.

Collaborator Author

> (Just wanted to express some reservations over "objects that use randomness maintain their own implicit random state" becoming the convention without further discussion.)

Agreed -- not obvious to me what's the best approach overall.

> Sounds good to me to continue this discussion in another thread to not distract from this PR.

Agreed! Migrated to #980

@@ -26,6 +28,15 @@ class MinimizationError(Exception):
    pass


def check_force_norm(du_dx, threshold=MAX_FORCE_NORM):
Collaborator

This is named check_force_norm but takes in du_dx; it would be good to make that consistent.

Collaborator Author

norm(x) == norm(-x), but yeah, no reason to be inconsistent -- will fix

Collaborator Author

Made consistent in c516cbf

if len(mols) == 1:
    top = topology.BaseTopology(mols[0], ff)
elif len(mols) == 2:
    top = topology.DualTopologyMinimization(mols[0], mols[1], ff)
Collaborator

Should this be DualTopology rather than DualTopologyMinimization? The behavior looks identical (at lamb == 0, no difference if I am reading it right), but conceptually BaseTopology and DualTopologyMinimization seem different.

Collaborator Author

Good call -- I don't see a difference at lamb = 0; this is derived from the choice in minimize_host_4d:

top = topology.DualTopologyMinimization(mols[0], mols[1], ff)

Will switch to DualTopology in case DualTopology and DualTopologyMinimization diverge in the future.

Collaborator Author

Switched in 51a918d

norm = np.linalg.norm(du_dx, axis=-1)
if not np.all(norm < threshold):
    raise MinimizationError(
        f"Minimization failed to reduce large forces below threshold: |frc| = {norm} > {threshold}"
    )
Collaborator

nit: for readability it might be nice to either report the max norm or sort the norms; when the number of particles is large you may not be able to see which force is large.

On line 415 in the previous it would print the last.

Collaborator Author

Ahh, good call -- this was derived from the version in minimize_host_4d

norm = np.linalg.norm(du_dx, axis=-1)
if not np.all(norm < MAX_FORCE_NORM):
    raise MinimizationError(
        f"Minimization failed to reduce large forces below threshold: |frc| = {norm} > {MAX_FORCE_NORM}"
    )

but the version in local_minimize is more informative
# note that this over the local atoms only, as this function is not concerned
max_frc = np.amax(per_atom_force_norms)
if not max_frc < MAX_FORCE_NORM:
    raise MinimizationError(
        f"Minimization failed to reduce large forces below threshold: |frc| = {max_frc} > {MAX_FORCE_NORM}"
    )

Collaborator Author

Expanded the error message in c516cbf -- it now reports the max force norm, as well as the number of atoms that exceed the threshold.
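Roughly, the expanded check looks like this (my reconstruction from the snippets above, not the exact diff in c516cbf; the MAX_FORCE_NORM value is a placeholder):

```python
import numpy as np

MAX_FORCE_NORM = 20_000.0  # kJ/mol/nm; placeholder value for illustration

class MinimizationError(Exception):
    pass

def check_force_norm(forces, threshold=MAX_FORCE_NORM):
    """Raise MinimizationError if any per-atom force norm exceeds threshold."""
    per_atom_norms = np.linalg.norm(forces, axis=-1)
    max_norm = np.max(per_atom_norms)
    if not max_norm < threshold:
        n_over = int(np.sum(per_atom_norms >= threshold))
        raise MinimizationError(
            f"Minimization failed to reduce large forces below threshold: "
            f"max |frc| = {max_norm:.3f} > {threshold} ({n_over} atoms over threshold)"
        )
```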



def equilibrate_host_barker(
    mols, host_system, host_coords, ff, box, mol_coords=None, temperature=300.0, proposal_stddev=0.0001, n_steps=1000
Collaborator

Suggested change:
-    mols, host_system, host_coords, ff, box, mol_coords=None, temperature=300.0, proposal_stddev=0.0001, n_steps=1000
+    mols, host_system, host_coords, ff, box, mol_coords=None, temperature=constants.DEFAULT_TEMP, proposal_stddev=0.0001, n_steps=1000

Collaborator Author

Applied in 97196fb

conf_list = [np.array(host_coords)]

if mol_coords is not None:
    for mc in mol_coords:
Collaborator

Know this is duplicated from elsewhere, but it would be good to validate len(mol_coords) == len(mols).

Collaborator Author

Input validation expanded in 757323d


# wrap gpu_impl, partially applying box, mol coords
num_host_atoms = host_coords.shape[0]
assert box.shape == (3, 3)
Collaborator

nit: validate this at the start of the function before constructing potentials

Collaborator Author

Moved to start of function in 757323d

Collaborator

@mcwitt mcwitt left a comment

LGTM! (pending addition to docstring explaining assumptions around step size and finite-temperature sampling bias)

@maxentile
Collaborator Author

Thanks for the reviews! Merging

@maxentile maxentile merged commit 6f9ebcc into master Mar 17, 2023
@maxentile maxentile deleted the robust-proposals-for-initial-equilibration branch March 17, 2023 21:12