upload_code
bojunliu0818 committed Sep 4, 2024
1 parent 731456d commit 03cb226
Showing 25 changed files with 2,147 additions and 1 deletion.
39 changes: 39 additions & 0 deletions .github/workflows/build-code.yml
@@ -0,0 +1,39 @@
name: Build MEMnets source code

on:
  push:
    branches:
      - main
  pull_request:
    branches:
      - main

jobs:
  build-code:
    runs-on: ${{ matrix.os }}
    strategy:
      max-parallel: 6
      matrix:
        python-version: ["3.9", "3.10"]
        os:
          - macOS-latest
          - ubuntu-latest
          - windows-latest
    defaults:
      run:
        shell: bash -el {0}

    steps:
      - uses: actions/checkout@v4
      - uses: conda-incubator/setup-miniconda@v3
        with:
          auto-activate-base: true
          auto-update-conda: true
          python-version: ${{ matrix.python-version }}
          activate-environment: test
      - name: Install required packages
        run: |
          python -m pip install --upgrade pip
      - name: Pip install
        run: |
          python -m pip install .
9 changes: 9 additions & 0 deletions .gitignore
@@ -0,0 +1,9 @@
# Byte-compiled / optimized / DLL files
.idea/
__pycache__/

# Distribution / packaging
build/
dist/
*.egg-info/
*.egg/
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Bojun Liu

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
1 change: 1 addition & 0 deletions MANIFEST.in
@@ -0,0 +1 @@
include LICENSE
79 changes: 78 additions & 1 deletion README.md
@@ -1 +1,78 @@
# memnets
# MEMnets: Memory kernel minimization-based neural networks

### Abstract

Identifying collective variables (CVs) that accurately capture the slowest timescales of protein conformational changes is crucial to comprehending numerous biological processes. In this work, we develop a novel algorithm, the Memory kErnel Minimization based Neural Networks (MEMnets), that accurately identifies the slow CVs of biomolecular dynamics. MEMnets is distinct from popular deep-learning approaches (such as VAMPnets/SRVs) that assume Markovian dynamics. Instead, MEMnets is built on the integrative generalized master equation (IGME) theory, which incorporates non-Markovian dynamics by encoding them in a memory kernel for continuous CVs. The key innovation of MEMnets is to identify optimal CVs by minimizing time-integrated memory kernels. To accomplish this, MEMnets processes time sequences of molecular dynamics (MD) conformations with parallel encoder neural networks that project high-dimensional MD data into a low-dimensional latent space. The time-integrated memory kernels, derived from IGME theory, are then computed in the latent space as the objective function. We demonstrate that the MEMnets algorithm can identify the slow CVs involved in the folding of the FIP35 WW-domain with high accuracy, and successfully reveal its two parallel folding pathways. Furthermore, we test MEMnets on the clamp opening of a bacterial RNA polymerase (RNAP), a much more complex conformational change (a system containing over 540K atoms) where sampling from all-atom MD simulations is limited. Our results demonstrate that MEMnets greatly outperforms SRVs, which are based on Markovian dynamics and may yield disconnected dynamics along the identified CVs. We anticipate that MEMnets holds promise to be widely applied to study biomolecular conformational changes.

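As a rough, hypothetical sketch of the objective described above (the function and variable names below are ours for illustration, not the package API; the actual implementation lives in `memnets/gme/loss.py`): log-eigenvalues of latent correlation matrices are fitted linearly against the lag time, following IGME, and the deviation at the shortest lag serves as the time-integrated memory term to be minimized.

```python
import torch

def memnets_objective(latents, lagtimes, weight=1.0):
    """Hypothetical sketch of the MEMnets objective (illustrative only):
    fit log-eigenvalues of latent correlation matrices against lag time
    (IGME) and penalize the residual time-integrated memory term."""
    # latents[0]: encoder output at the reference time; latents[i] (i >= 1):
    # encoder output lagged by lagtimes[i - 1] frames.
    log_eigvals = []
    for lagged in latents[1:]:
        c = latents[0].t() @ lagged / len(lagged)   # latent correlation matrix
        vals = torch.linalg.eigvals(c).real.clamp(min=1e-6)
        log_eigvals.append(torch.log(torch.sort(vals, descending=True).values))
    log_eigvals = torch.stack(log_eigvals)

    # Linear fit log lambda(tau) ~ slope * tau + intercept over the longer lags.
    taus = torch.tensor(lagtimes[1:], dtype=log_eigvals.dtype)
    X = torch.stack([taus, torch.ones_like(taus)], dim=1)
    W = torch.linalg.inv(X.t() @ X) @ X.t() @ log_eigvals[1:]
    slope = W[0]                                    # fitted decay rate per mode

    # Time-integrated memory kernel: deviation of the shortest-lag rate
    # from the fitted slope; both terms are driven toward zero.
    m0 = (log_eigvals[0] / lagtimes[0] - slope).clamp(max=0.0)
    return (m0.abs() + weight * slope.abs()).sum()
```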
### Illustration

![figure](./docs/figs/fig1.png)

## System requirements

The software package can be installed and run on Linux, Windows, and macOS.

Required Python version and packages:

(the versions listed below have been tested; other versions should work as well)

```bash
python == 3.9
numpy == 1.26.1
scipy == 1.11.4
torch == 1.13.1
tqdm == 4.66.1
```
If these packages are not already present in your local Python environment, their latest versions will be installed automatically.

## Installation from source

The source code can be installed from a local clone:

The most time-consuming step is installing PyTorch (especially the CUDA version); the whole procedure takes around 5 minutes on a local desktop.

```bash
git clone https://github.com/xuhuihuang/memnets.git
```

```bash
python -m pip install ./memnets
```

## Quick start

### Start with a Jupyter notebook

Check this file for the demo:

```
./memnets/example/alanine_dipeptide.ipynb
```

### Start with a Python script (Linux)

The whole training procedure for the following demo takes around 30 minutes on an i9-10900K CPU at a local desktop.

```sh
python ./memnets/scripts/train_memnets.py \
--seed 1 \
--device 'cpu' \
--lagtimes 1 7 8 9 10 \
--encoder_sizes 30 30 30 30 10 3 \
--decay_rate 0.002 \
--thres 0.05 \
--learning_rate 0.001 \
--n_epochs 100 \
--train_split 0.9 \
--train_batch_size 10000 \
--data_directory ./memnets/data/alanine_dipeptide/cartesian_coordinates \
--saving_directory .
```

Or
```
sh ./memnets/scripts/train_memnets.sh
```

[Go to top](#abstract)

Binary file added docs/figs/fig1.png
484 changes: 484 additions & 0 deletions example/alanine_dipeptide.ipynb

Large diffs are not rendered by default.

Empty file added memnets/__init__.py
Empty file.
Empty file added memnets/gme/__init__.py
Empty file.
177 changes: 177 additions & 0 deletions memnets/gme/loss.py
@@ -0,0 +1,177 @@
import numpy as np
import torch
from .utils import eig_decomposition, calculate_inverse, estimate_c_tilde_matrix

class MEMLoss:
    """Compute the MEMnets loss.

    Parameters
    ----------
    lagtimes : list
        Encoder lag times of MEMnets.
    device : torch.device, default = None
        The device on which the torch modules are executed.
    epsilon : float, default = 1e-6
        The strength of the regularization/truncation under which matrices are inverted.
    mode : str, default = 'regularize'
        'regularize': regularize the eigenvalues by adding epsilon.
        'trunc': truncate the eigenvalues by filtering out the eigenvalues below epsilon.
    reversible : bool, default = True
        Whether to enforce the detailed-balance constraint.
    """

    def __init__(self, lagtimes, device=None, epsilon=1e-6, mode='regularize', reversible=True):

        self._lagtime_0 = lagtimes[0]
        self._lagtime_hat = lagtimes[1:]
        self._epsilon = epsilon
        self._mode = mode
        self._device = device
        self._X = None
        self._Y = None
        self._W = None
        self._m0_tilde = None
        self._log_lambda_hat = None

        self._lm_list = []
        self._m0_tilde_list = []
        self._log_lambda_hat_list = []
        self._rmse_list = []
        self._Y_list = []
        self._Y_predict_list = []

        self._reversible = reversible

    @property
    def m0_tilde(self):
        return self._m0_tilde

    @property
    def log_lambda_hat(self):
        return self._log_lambda_hat

    @property
    def Y(self):
        return self._Y

    def fit(self, data):

        assert len(data) > 2

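        # Estimate correlation matrices C-tilde between the reference latent
        # features (data[0]) and each time-lagged set, enforcing detailed
        # balance if requested, then collect their sorted log-eigenvalues.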
        matrices = [estimate_c_tilde_matrix(data[0], data[i + 1], reversible=self._reversible) for i in range(len(data) - 1)]
        log_eigvals = []
        for i in range(len(data) - 1):
            tmp, _ = eig_decomposition(matrices[i], epsilon=self._epsilon, mode=self._mode)
            idx = torch.argsort(tmp, descending=True)
            tmp = tmp[idx]
            log_eigvals.append(torch.log(tmp))
        log_eigvals = torch.stack(log_eigvals)

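        # Least-squares fit of the log-eigenvalues against lag time (IGME):
        # the per-mode slope of the fit gives log_lambda_hat.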
        log_eigvals_0 = log_eigvals[0]
        ones = torch.ones(len(log_eigvals) - 1).to(device=self._device)
        lagtimes = torch.tensor(self._lagtime_hat).to(device=self._device)

        self._X = torch.cat((lagtimes, ones), dim=0).reshape(2, len(log_eigvals) - 1).t()
        self._Y = log_eigvals[1:]
        left = calculate_inverse(torch.matmul(self._X.t(), self._X), epsilon=self._epsilon, mode=self._mode)
        right = torch.matmul(self._X.t(), self._Y)
        self._W = torch.matmul(left, right)

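        # Time-integrated memory kernel per mode: deviation of the shortest-lag
        # decay rate from the fitted slope, clipped to be non-positive.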
        self._log_lambda_hat = self._W[0, :]
        self._m0_tilde = torch.clip(log_eigvals_0 / self._lagtime_0 - self._log_lambda_hat, max=0)

        return self

    def loss(self, weight=1.):

        loss = torch.sum(torch.abs(self._m0_tilde) + weight * torch.abs(self._log_lambda_hat))

        return loss

    def lm(self, weight=1.):

        lm = torch.abs(self._m0_tilde) + weight * torch.abs(self._log_lambda_hat)

        return lm

    @property
    def Y_predict(self):

        Y_predict = torch.matmul(self._X, self._W)

        return Y_predict

    @property
    def rmse(self):

        Y_predict = self.Y_predict
        num_lagtimes = Y_predict.shape[0]
        delta_sq = (Y_predict - self._Y) ** 2
        rmse = torch.sqrt(torch.sum(delta_sq, dim=0) / num_lagtimes)

        return rmse

    def save(self, weight=1.):

        with torch.no_grad():

            self._lm_list.append(self.lm(weight=weight).cpu().numpy())
            self._m0_tilde_list.append(self.m0_tilde.cpu().numpy())
            self._log_lambda_hat_list.append(self.log_lambda_hat.cpu().numpy())

            self._rmse_list.append(self.rmse.cpu().numpy())
            self._Y_list.append(self.Y.cpu().numpy())
            self._Y_predict_list.append(self.Y_predict.cpu().numpy())

        return self

    def output_mean_lm(self):

        mean_lm = np.mean(np.stack(self._lm_list), axis=0)

        return mean_lm

    def output_mean_m0_tilde(self):

        mean_m0_tilde = np.mean(np.stack(self._m0_tilde_list), axis=0)

        return mean_m0_tilde

    def output_mean_log_lambda_hat(self):

        mean_log_lambda_hat = np.mean(np.stack(self._log_lambda_hat_list), axis=0)

        return mean_log_lambda_hat

    def output_mean_rmse(self):

        mean_rmse = np.mean(np.stack(self._rmse_list), axis=0)

        return mean_rmse

    def output_mean_Y(self):

        mean_Y = np.mean(np.stack(self._Y_list), axis=0)

        return mean_Y

    def output_mean_Y_predict(self):

        mean_Y_predict = np.mean(np.stack(self._Y_predict_list), axis=0)

        return mean_Y_predict

    def clear(self):

        self._lm_list = []
        self._m0_tilde_list = []
        self._log_lambda_hat_list = []
        self._rmse_list = []
        self._Y_list = []
        self._Y_predict_list = []

        return self