Allow computing all-pairs potential on subset of atoms, add consistency check #660

Merged: 11 commits merged into master from all-pairs-on-subset on Mar 16, 2022

Conversation

@mcwitt mcwitt (Collaborator) commented Mar 2, 2022

Related: #472

  • Adds a constructor argument atom_idxs to NonbondedAllPairs; this is used to select a subset of atoms for computing the all-pairs potential
  • Adds consistency check comparing the result of the full Nonbonded(exclusions, scales) potential with the sum of
    • NonbondedAllPairs(host)
    • NonbondedAllPairs(ligand)
    • NonbondedInteractionGroup(host, ligand)
    • NonbondedPairList(exclusions, scales)
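The decomposition being checked can be sketched with a toy pairwise energy (illustrative only: the function names and the inverse-distance energy are stand-ins, not the actual potentials or API, and exclusions are modeled here as a subtractive pair-list correction):

```python
import itertools
import numpy as np

def pair_energy(xi, xj):
    # toy stand-in for the real nonbonded kernel: inverse distance
    return 1.0 / np.linalg.norm(xi - xj)

def all_pairs(conf, atom_idxs):
    # analogue of NonbondedAllPairs restricted to a subset of atoms
    return sum(pair_energy(conf[i], conf[j])
               for i, j in itertools.combinations(atom_idxs, 2))

def interaction_group(conf, row_idxs, col_idxs):
    # analogue of NonbondedInteractionGroup: only pairs spanning the two sets
    return sum(pair_energy(conf[i], conf[j])
               for i in row_idxs for j in col_idxs)

def pair_list(conf, pairs, scales):
    # analogue of NonbondedPairList: explicit pairs with per-pair scales
    return sum(s * pair_energy(conf[i], conf[j])
               for (i, j), s in zip(pairs, scales))

rng = np.random.default_rng(2022)
conf = rng.uniform(0.0, 3.0, size=(10, 3))   # 10 toy atoms
host, ligand = range(0, 7), range(7, 10)
exclusions = [(0, 1), (7, 8)]                # toy excluded pairs
scales = [-1.0, -1.0]                        # cancel the excluded terms

# "full" potential: every pair counted once, with excluded pairs cancelled
full = all_pairs(conf, range(10)) + pair_list(conf, exclusions, scales)

# decomposition checked by this PR's consistency test
parts = (all_pairs(conf, host)
         + all_pairs(conf, ligand)
         + interaction_group(conf, host, ligand)
         + pair_list(conf, exclusions, scales))

np.testing.assert_allclose(full, parts)
```

The identity holds because host-host, ligand-ligand, and host-ligand pairs partition the set of all pairs, and the same exclusion correction appears on both sides.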

Notes for review

  • Commits starting with Rename are purely string substitutions and should contain no other changes -- these were mainly to reflect that we're no longer just doing permutations, since the potential can be computed on a subset of atoms
  • Most of the implementation changes to allow computing on a subset have been squashed into 71cc0e5

@mcwitt mcwitt force-pushed the all-pairs-on-subset branch from 24fb419 to 4bb9dba on March 2, 2022 02:23
@mcwitt mcwitt mentioned this pull request Mar 2, 2022
@mcwitt mcwitt force-pushed the all-pairs-on-subset branch 2 times, most recently from 739d651 to 834604f on March 3, 2022 17:01
@mcwitt mcwitt changed the base branch from master to fanout-summed-potential March 3, 2022 17:01
@mcwitt mcwitt force-pushed the all-pairs-on-subset branch 2 times, most recently from 605ae9d to 180ff2e on March 3, 2022 21:36
@mcwitt mcwitt force-pushed the fanout-summed-potential branch 2 times, most recently from d70204e to 0e090b9 on March 4, 2022 17:05
@mcwitt mcwitt force-pushed the all-pairs-on-subset branch from 180ff2e to e063057 on March 4, 2022 18:01
@mcwitt mcwitt changed the base branch from fanout-summed-potential to master March 4, 2022 18:01
@mcwitt mcwitt force-pushed the all-pairs-on-subset branch 2 times, most recently from cfc2a80 to a79be9a on March 4, 2022 19:31

if interpolated:
    # TODO: why does interpolation break bitwise equivalence?
    np.testing.assert_allclose(du_dp_test, du_dp_ref, rtol=1e-10, atol=1e-10)
@mcwitt mcwitt Mar 4, 2022
Not completely sure yet why the du_dps aren't bitwise equivalent (only in the interpolated case)

@mcwitt mcwitt Mar 16, 2022

Resolved offline in discussion with @proteneer. Bitwise equivalence for the interpolated case is not possible with the current structure of the code, ultimately because the distributive property of multiplication does not preserve bitwise equivalence (even when summation is done in fixed point), i.e. c * fixed_sum(a, b) != fixed_sum(c * a, c * b) (see example in gist).

Added a note in 15a17b7.
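The effect can be shown with a minimal sketch, assuming a toy fixed-point scheme with a deliberately coarse scale to make the rounding visible (this is not the actual fixed-point representation used in the code):

```python
S = 10  # toy fixed-point scale; coarse on purpose so rounding is visible

def to_fixed(x):
    # convert a float to the nearest fixed-point integer
    return round(x * S)

def fixed_sum(a, b):
    # sum in fixed point, then convert back to float
    return (to_fixed(a) + to_fixed(b)) / S

a = b = 0.06
c = 0.5

lhs = c * fixed_sum(a, b)        # multiply after the fixed-point sum
rhs = fixed_sum(c * a, c * b)    # multiply before the fixed-point sum

# 0.06 rounds up to 1/10, but 0.5 * 0.06 = 0.03 rounds down to 0/10,
# so lhs = 0.1 while rhs = 0.0: distributivity does not survive rounding
assert lhs != rhs
```

In the full-potential path the scale factor multiplies a single fixed-point sum, while in the decomposed path it multiplies each summand before the fixed-point accumulation, so the two paths round differently.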

@mcwitt mcwitt marked this pull request as ready for review March 4, 2022 19:46
@mcwitt mcwitt marked this pull request as draft March 4, 2022 20:00
@mcwitt mcwitt force-pushed the all-pairs-on-subset branch from 518c76a to cf9a360 on March 4, 2022 20:37
@mcwitt mcwitt force-pushed the all-pairs-on-subset branch 5 times, most recently from 752e49c to 91cfe42 on March 4, 2022 23:09
@mcwitt mcwitt marked this pull request as ready for review March 4, 2022 23:52
@mcwitt mcwitt requested review from maxentile, proteneer and badisa March 5, 2022 01:22
@@ -376,26 +407,31 @@ void NonbondedAllPairs<RealType, Interpolated>::execute_device(

// coords are N,3
if (d_du_dx) {
k_inv_permute_accum<<<dimGrid, tpb, 0, stream>>>(N, d_perm_, d_sorted_du_dx_, d_du_dx);
k_scatter_accum<<<dim3(ceil_divide(K_, tpb), 3, 1), tpb, 0, stream>>>(
If I understand the scatter/gather paradigm, aren't scatter and accumulate antonyms?

@mcwitt mcwitt Mar 14, 2022

Oh, maybe this is confusing. My intent was to generalize the naming when going from the existing K = N case to the more general K <= N case:

  • permute -> gather
  • inverse permute -> scatter

This is intended to be independent of whether we accumulate or assign to the result array. Does that make sense?
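In index-array terms, the convention can be pictured with a small numpy sketch (the names here are illustrative, not the kernel signatures):

```python
import numpy as np

N = 6
x = np.arange(N, dtype=float) * 10   # values for all N atoms
atom_idxs = np.array([4, 0, 2])      # the K <= N atoms in the subset

# "gather" (formerly "permute"): pull the subset into a dense K-length array
sorted_x = x[atom_idxs]              # [40., 0., 20.]

# "scatter" (formerly "inverse permute"): push per-subset results back to
# their positions in the full N-length output array
out = np.zeros(N)
out[atom_idxs] = sorted_x            # scatter-assign

# when K == N and atom_idxs is a permutation, gather/scatter reduce to
# permute/inverse-permute, which is where the old names came from
```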


I am more familiar with the fork-and-join paradigm; it seemed like fork and scatter are equivalent, and gather and join are equivalent, in which case this naming seemed backwards? I might just not understand the paradigm; it was unexpected to see a scatter (or what I interpreted as a broadcast/fork) used to reduce.

Doesn't matter as long as it is a consistent convention.


Ah, got it. I think this PR is consistently applying the convention described here, but I'm open to other suggestions.


The picture I had for "scatter" was https://pytorch-scatter.readthedocs.io/en/latest/functions/scatter.html .

(the jax / xla documentation for scatter / gather is a bit less intuitive for me)

> This is intended to be independent of whether we accumulate or assign to the result array. Does that make sense?

Will need to double-check whether these are completely independent (the case that allows repeated target indices needs some reduction operation ("accumulate sum"), while the case that disallows repeated indices can allow direct assignment).

> The picture I had for "scatter" was https://pytorch-scatter.readthedocs.io/en/latest/functions/scatter.html .

Err, posted that before I saw @mcwitt's link #660 (comment), didn't mean to override that. (I think the only difference is that the wiki definition and the current function assume no repeated scatter idxs, while pytorch_scatter reduces over any repeated scatter idxs. For the current use, only unique idxs make sense.)

> I might just not understand the paradigm, was unexpected to see a scatter (or what I interpreted as a broadcast/fork) to reduce.

I had misread the y[idxs[i]] += x[i] as a reduction, but @mcwitt clarified offline that it's always a single addition to whatever was in the output array before.
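The accumulate-vs-assign distinction from this thread can be sketched with numpy, where `np.add.at` stands in for the kernel's scatter-accumulate (an illustration, not the CUDA implementation):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0])

# unique indices: scatter-accumulate does one addition per output slot,
# so out[idxs[i]] += x[i] just adds onto whatever was there before
out = np.full(4, 100.0)
idxs = np.array([0, 2, 3])
np.add.at(out, idxs, x)    # out -> [101., 100., 102., 103.]

# repeated indices: a reduction is unavoidable; plain fancy-index
# assignment would silently keep only one of the colliding values
rep = np.array([0, 0, 1])
out2 = np.zeros(2)
np.add.at(out2, rep, x)    # out2 -> [3., 3.]  (colliding writes summed)
out3 = np.zeros(2)
out3[rep] = x              # out3 -> [2., 3.]  (last write wins)
```

With unique indices (the only case that makes sense here), scatter-assign and scatter-accumulate differ only in whether the prior contents of the output array are kept.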

@maxentile maxentile left a comment


Looks good to me!

@proteneer proteneer left a comment


lgtm!

@mcwitt mcwitt force-pushed the all-pairs-on-subset branch from 15a17b7 to 34e9375 on March 16, 2022 14:57
@mcwitt mcwitt enabled auto-merge (rebase) March 16, 2022 15:00
@mcwitt mcwitt merged commit 7caf80d into master Mar 16, 2022
@mcwitt mcwitt deleted the all-pairs-on-subset branch March 16, 2022 15:54
@proteneer proteneer added the cr_cppcuda C++ and CUDA label Apr 11, 2022
4 participants