upload_code
bojunliu0818 committed Sep 4, 2024
1 parent 731456d commit 03cb226
Showing 25 changed files with 2,147 additions and 1 deletion.
39 changes: 39 additions & 0 deletions .github/workflows/build-code.yml
@@ -0,0 +1,39 @@
name: Build MEMnets source code

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  build-code:
    runs-on: ${{ matrix.os }}
    strategy:
      max-parallel: 6
      matrix:
        python-version: ["3.9", "3.10"]
        os:
          - macOS-latest
          - ubuntu-latest
          - windows-latest
    defaults:
      run:
        shell: bash -el {0}

    steps:
      - uses: actions/checkout@v4
      - uses: conda-incubator/setup-miniconda@v3
        with:
          auto-activate-base: true
          auto-update-conda: true
          python-version: ${{ matrix.python-version }}
          activate-environment: test
      - name: Install required packages
        run: |
          python -m pip install --upgrade pip
      - name: Pip install
        run: |
          python -m pip install .
9 changes: 9 additions & 0 deletions .gitignore
@@ -0,0 +1,9 @@
# Byte-compiled / optimized / DLL files
.idea/
__pycache__/

# Distribution / packaging
build/
dist/
*.egg-info/
*.egg/
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Bojun Liu

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -0,0 +1 @@
include LICENSE
79 changes: 78 additions & 1 deletion README.md
@@ -1 +1,78 @@
# memnets
# MEMnets: Memory kernel minimization-based neural networks

### Abstract

Identifying collective variables (CVs) that accurately capture the slowest timescales of protein conformational changes is crucial to comprehending numerous biological processes. In this work, we develop a novel algorithm, the Memory kErnel Minimization based Neural Networks (MEMnets), that accurately identifies the slow CVs of biomolecular dynamics. MEMnets is distinct from popular deep-learning approaches (such as VAMPnets/SRVs) that assume Markovian dynamics. Instead, MEMnets is built on the integrative generalized master equation (IGME) theory, which incorporates non-Markovian dynamics by encoding them in a memory kernel for continuous CVs. The key innovation of MEMnets is to identify optimal CVs by minimizing time-integrated memory kernels. To accomplish this, MEMnets processes time sequences of molecular dynamics (MD) conformations with parallel encoder neural networks that project high-dimensional MD data into a low-dimensional latent space. The time-integrated memory kernels, derived from IGME theory, are then computed in the latent space as the objective function. We demonstrate that the MEMnets algorithm can identify the slow CVs involved in the folding of the FIP35 WW-domain with high accuracy, and successfully reveal its two parallel folding pathways. Furthermore, we test MEMnets on the clamp opening of a bacterial RNA polymerase (RNAP), a much more complex conformational change (a system containing over 540K atoms) where sampling from all-atom MD simulations is limited. Our results demonstrate that MEMnets greatly outperforms SRVs, which are based on Markovian dynamics and may yield disconnected dynamics along the identified CVs. We anticipate that MEMnets holds promise to be widely applied to study biomolecular conformational changes.

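As a rough, hypothetical sketch of the objective described above (the function and variable names below are ours for illustration, not the package API; the actual implementation lives in `memnets/gme/loss.py`): log-eigenvalues of latent correlation matrices are fitted linearly against the lag time, following IGME, and the deviation at the shortest lag serves as the time-integrated memory term to be minimized.

```python
import torch

def memnets_objective(latents, lagtimes, weight=1.0):
    """Hypothetical sketch of the MEMnets objective (illustrative only):
    fit log-eigenvalues of latent correlation matrices against lag time
    (IGME) and penalize the residual time-integrated memory term."""
    # latents[0]: encoder output at the reference time; latents[i] (i >= 1):
    # encoder output lagged by lagtimes[i - 1] frames.
    log_eigvals = []
    for lagged in latents[1:]:
        c = latents[0].t() @ lagged / len(lagged)   # latent correlation matrix
        vals = torch.linalg.eigvals(c).real.clamp(min=1e-6)
        log_eigvals.append(torch.log(torch.sort(vals, descending=True).values))
    log_eigvals = torch.stack(log_eigvals)

    # Linear fit log lambda(tau) ~ slope * tau + intercept over the longer lags.
    taus = torch.tensor(lagtimes[1:], dtype=log_eigvals.dtype)
    X = torch.stack([taus, torch.ones_like(taus)], dim=1)
    W = torch.linalg.inv(X.t() @ X) @ X.t() @ log_eigvals[1:]
    slope = W[0]                                    # fitted decay rate per mode

    # Time-integrated memory kernel: deviation of the shortest-lag rate
    # from the fitted slope; both terms are driven toward zero.
    m0 = (log_eigvals[0] / lagtimes[0] - slope).clamp(max=0.0)
    return (m0.abs() + weight * slope.abs()).sum()
```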
### Illustration

![figure](./docs/figs/fig1.png)

## System requirements

The software package can be installed and run on Linux, Windows, and macOS.

Required Python version and packages:

(the versions listed below have been tested; other versions should work as well)

```bash
python == 3.9
numpy == 1.26.1
scipy == 1.11.4
torch == 1.13.1
tqdm == 4.66.1
```
If these packages are not already present in your local Python environment, their latest versions will be installed automatically.

## Installation from source

The source code can be installed from a local clone:

The most time-consuming step is installing PyTorch (especially the CUDA version); the whole procedure takes around 5 minutes on a local desktop.

```bash
git clone https://github.com/xuhuihuang/memnets.git
```

```bash
python -m pip install ./memnets
```

## Quick start

### Start with a Jupyter notebook

Check this file for the demo:

```
./memnets/example/alanine_dipeptide.ipynb
```

### Start with a Python script (Linux)

The whole training procedure for the following demo takes around 30 minutes on an i9-10900K CPU at a local desktop.

```sh
python ./memnets/scripts/train_memnets.py \
--seed 1 \
--device 'cpu' \
--lagtimes 1 7 8 9 10 \
--encoder_sizes 30 30 30 30 10 3 \
--decay_rate 0.002 \
--thres 0.05 \
--learning_rate 0.001 \
--n_epochs 100 \
--train_split 0.9 \
--train_batch_size 10000 \
--data_directory ./memnets/data/alanine_dipeptide/cartesian_coordinates \
--saving_directory .
```

Or
```
sh ./memnets/scripts/train_memnets.sh
```

[Go to top](#abstract)

Binary file added docs/figs/fig1.png
484 changes: 484 additions & 0 deletions example/alanine_dipeptide.ipynb

Large diffs are not rendered by default.

Empty file added memnets/__init__.py
Empty file.
Empty file added memnets/gme/__init__.py
Empty file.
177 changes: 177 additions & 0 deletions memnets/gme/loss.py
@@ -0,0 +1,177 @@
import numpy as np
import torch
from .utils import eig_decomposition, calculate_inverse, estimate_c_tilde_matrix

class MEMLoss:
    """Compute the MEMnets loss.

    Parameters
    ----------
    lagtimes : list
        Encoder lag times of MEMnets.
    device : torch.device, default = None
        The device on which the torch modules are executed.
    epsilon : float, default = 1e-6
        The strength of the regularization/truncation under which matrices are inverted.
    mode : str, default = 'regularize'
        'regularize': regularize the eigenvalues by adding epsilon.
        'trunc': truncate the eigenvalues by filtering out the eigenvalues below epsilon.
    reversible : bool, default = True
        Whether to enforce the detailed-balance constraint.
    """

    def __init__(self, lagtimes, device=None, epsilon=1e-6, mode='regularize', reversible=True):

        self._lagtime_0 = lagtimes[0]
        self._lagtime_hat = lagtimes[1:]
        self._epsilon = epsilon
        self._mode = mode
        self._device = device
        self._X = None
        self._Y = None
        self._W = None
        self._m0_tilde = None
        self._log_lambda_hat = None

        self._lm_list = []
        self._m0_tilde_list = []
        self._log_lambda_hat_list = []
        self._rmse_list = []
        self._Y_list = []
        self._Y_predict_list = []

        self._reversible = reversible

    @property
    def m0_tilde(self):
        return self._m0_tilde

    @property
    def log_lambda_hat(self):
        return self._log_lambda_hat

    @property
    def Y(self):
        return self._Y

    def fit(self, data):

        assert len(data) > 2

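        # Estimate correlation matrices C-tilde between the reference latent
        # features (data[0]) and each time-lagged set, enforcing detailed
        # balance if requested, then collect their sorted log-eigenvalues.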
        matrices = [estimate_c_tilde_matrix(data[0], data[i + 1], reversible=self._reversible) for i in range(len(data) - 1)]
        log_eigvals = []
        for i in range(len(data) - 1):
            tmp, _ = eig_decomposition(matrices[i], epsilon=self._epsilon, mode=self._mode)
            idx = torch.argsort(tmp, descending=True)
            tmp = tmp[idx]
            log_eigvals.append(torch.log(tmp))
        log_eigvals = torch.stack(log_eigvals)

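        # Least-squares fit of the log-eigenvalues against lag time (IGME):
        # the per-mode slope of the fit gives log_lambda_hat.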
        log_eigvals_0 = log_eigvals[0]
        ones = torch.ones(len(log_eigvals) - 1).to(device=self._device)
        lagtimes = torch.tensor(self._lagtime_hat).to(device=self._device)

        self._X = torch.cat((lagtimes, ones), dim=0).reshape(2, len(log_eigvals) - 1).t()
        self._Y = log_eigvals[1:]
        left = calculate_inverse(torch.matmul(self._X.t(), self._X), epsilon=self._epsilon, mode=self._mode)
        right = torch.matmul(self._X.t(), self._Y)
        self._W = torch.matmul(left, right)

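        # Time-integrated memory kernel per mode: deviation of the shortest-lag
        # decay rate from the fitted slope, clipped to be non-positive.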
        self._log_lambda_hat = self._W[0, :]
        self._m0_tilde = torch.clip(log_eigvals_0 / self._lagtime_0 - self._log_lambda_hat, max=0)

        return self

    def loss(self, weight=1.):

        loss = torch.sum(torch.abs(self._m0_tilde) + weight * torch.abs(self._log_lambda_hat))

        return loss

    def lm(self, weight=1.):

        lm = torch.abs(self._m0_tilde) + weight * torch.abs(self._log_lambda_hat)

        return lm

    @property
    def Y_predict(self):

        Y_predict = torch.matmul(self._X, self._W)

        return Y_predict

    @property
    def rmse(self):

        Y_predict = self.Y_predict
        num_lagtimes = Y_predict.shape[0]
        delta_sq = (Y_predict - self._Y) ** 2
        rmse = torch.sqrt(torch.sum(delta_sq, dim=0) / num_lagtimes)

        return rmse

    def save(self, weight=1.):

        with torch.no_grad():

            self._lm_list.append(self.lm(weight=weight).cpu().numpy())
            self._m0_tilde_list.append(self.m0_tilde.cpu().numpy())
            self._log_lambda_hat_list.append(self.log_lambda_hat.cpu().numpy())

            self._rmse_list.append(self.rmse.cpu().numpy())
            self._Y_list.append(self.Y.cpu().numpy())
            self._Y_predict_list.append(self.Y_predict.cpu().numpy())

        return self

    def output_mean_lm(self):

        mean_lm = np.mean(np.stack(self._lm_list), axis=0)

        return mean_lm

    def output_mean_m0_tilde(self):

        mean_m0_tilde = np.mean(np.stack(self._m0_tilde_list), axis=0)

        return mean_m0_tilde

    def output_mean_log_lambda_hat(self):

        mean_log_lambda_hat = np.mean(np.stack(self._log_lambda_hat_list), axis=0)

        return mean_log_lambda_hat

    def output_mean_rmse(self):

        mean_rmse = np.mean(np.stack(self._rmse_list), axis=0)

        return mean_rmse

    def output_mean_Y(self):

        mean_Y = np.mean(np.stack(self._Y_list), axis=0)

        return mean_Y

    def output_mean_Y_predict(self):

        mean_Y_predict = np.mean(np.stack(self._Y_predict_list), axis=0)

        return mean_Y_predict

    def clear(self):

        self._lm_list = []
        self._m0_tilde_list = []
        self._log_lambda_hat_list = []
        self._rmse_list = []
        self._Y_list = []
        self._Y_predict_list = []

        return self