This repository contains the implementation of the paper "Structure-based drug design by denoising voxel grids":
@inproceedings{pinheiro2024voxbind,
title={Structure-based drug design by denoising voxel grids},
author={Pinheiro, Pedro O and Jamasb, Arian and Mahmood, Omar and Sresht, Vishnu and Saremi, Saeed},
booktitle={ICML},
year={2024}
}
VoxBind is a protein-pocket-conditional generative model that operates on voxelized molecules. Given a protein pocket, VoxBind generates binding ligands following the (conditional) "walk-jump sampling" approach: (i) sample smoothed (noisy) molecules with Langevin MCMC and (ii) estimate the clean molecules with a voxel denoiser.
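To make the two steps concrete, here is a minimal NumPy sketch of walk-jump sampling on a toy problem; it is not VoxBind's code. Assumptions: the clean data are Gaussian vectors rather than voxel grids, so the denoiser can be written in closed form instead of being a trained network; the "walk" uses plain overdamped Langevin updates; and pocket conditioning is left out. The key identity is Tweedie's formula, which turns a denoiser x_hat(y) into the score of the smoothed density: grad log p(y) = (x_hat(y) - y) / sigma**2.

```python
import numpy as np

# Toy setting (assumption, not VoxBind's model): clean samples x ~ N(MU, S^2 I),
# smoothed samples y = x + SIGMA * eps, so the optimal denoiser is known exactly.
MU, S, SIGMA = 2.0, 0.5, 0.9  # SIGMA plays the role of smooth_sigma


def denoiser(y):
    """Bayes-optimal estimate E[x | y] for the Gaussian toy problem."""
    w = S**2 / (S**2 + SIGMA**2)
    return MU + w * (y - MU)


def score(y):
    """Score of the smoothed density, obtained from the denoiser via Tweedie's formula."""
    return (denoiser(y) - y) / SIGMA**2


def walk_jump(n_chains=2000, dim=8, n_steps=500, step=0.05, seed=0):
    rng = np.random.default_rng(seed)
    y = rng.normal(size=(n_chains, dim))  # chains initialized from noise
    for _ in range(n_steps):
        # walk: overdamped Langevin step on the smoothed density p(y)
        y = y + step * score(y) + np.sqrt(2 * step) * rng.normal(size=y.shape)
    # jump: a single denoising step maps smoothed samples to clean estimates
    return denoiser(y)


samples = walk_jump()
print(samples.mean(), samples.std())
```

The jumped samples have a mean close to MU but a spread well below S, because the jump returns the posterior mean E[x | y] rather than a fresh draw of x; with voxel grids, that denoised estimate is what is subsequently turned back into a molecule.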
We assume the user has anaconda (or, preferably, mamba) installed and has access to a GPU. To set up the environment, run:
mamba env create -f env.yaml
conda activate voxbind
pip install -e .
To prepare the data:
- Download split_by_name.pt and crossdocked_pocket10.tar.gz from this link (provided by the TargetDiff authors; see their README), place them in dataset/data/, and decompress with tar xvzf crossdocked_pocket10.tar.gz.
- Run the following command:
  cd voxbind/dataset; python preprocess_crossdocked.py
This takes a couple of hours. The script generates the following files in the dataset/data/ folder:
- train_data.pt: contains the train/val splits
- test_data.pt: contains the test split
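Because preprocessing takes hours, it can help to confirm the inputs are in place first and the outputs exist afterwards. The sketch below is a hypothetical helper (not part of the repo); it only checks for the file names listed in the steps above, and DATA_DIR is an assumption to adjust to wherever you placed the files.

```python
from pathlib import Path

# Hypothetical helper (not part of the VoxBind repo). File names come from the
# data-preparation steps; adjust DATA_DIR to where you placed the downloads.
DATA_DIR = Path("dataset") / "data"
EXPECTED = {
    "downloaded": ["split_by_name.pt", "crossdocked_pocket10.tar.gz"],
    "preprocessed": ["train_data.pt", "test_data.pt"],
}


def missing_files(data_dir):
    """Map each stage to the expected files that are absent from data_dir."""
    root = Path(data_dir)
    return {
        stage: [name for name in names if not (root / name).is_file()]
        for stage, names in EXPECTED.items()
    }


if __name__ == "__main__":
    for stage, missing in missing_files(DATA_DIR).items():
        status = "ok" if not missing else "missing: " + ", ".join(missing)
        print(f"{stage}: {status}")
```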
To train a VoxBind model with noise level 0.9 (the commands below use four GPUs), run:
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py smooth_sigma=0.9
To train a VoxBind model with noise level 1.0, run:
CUDA_VISIBLE_DEVICES=0,1,2,3 python train.py smooth_sigma=1.0
These scripts save the results (logs and checkpoints) in exps/exp_sig0.9 and exps/exp_sig1.0, respectively. See configs/config_train.yaml for other training options; e.g., use the flag wandb=True to log experiments on wandb.
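The smooth_sigma flag sets the noise level sigma used to smooth the voxelized molecules. As a rough illustration (an assumption about the training setup, not the repo's code), a denoiser is trained on pairs (y, x) with y = x + sigma * eps, for example with a per-voxel mean-squared-error objective. The snippet only builds such pairs for random toy grids and shows that an identity "denoiser" scores about sigma**2, a handy sanity baseline if the training loss is a per-voxel MSE. Grid size, channel count, and the absence of pocket conditioning are all assumptions.

```python
import numpy as np

# Illustrative only: shapes are assumptions and pocket conditioning is omitted.
BATCH, CHANNELS, GRID = 4, 4, 16


def smooth(clean, sigma, rng):
    """Noisy ("smoothed") grids y = x + sigma * eps with eps ~ N(0, I)."""
    return clean + sigma * rng.standard_normal(clean.shape)


def mse(denoised, clean):
    """Mean squared error between denoised and clean voxel grids."""
    return float(np.mean((denoised - clean) ** 2))


rng = np.random.default_rng(0)
# Toy "clean" grids with values in [0, 1], standing in for voxelized ligands.
clean = rng.uniform(0.0, 1.0, size=(BATCH, CHANNELS, GRID, GRID, GRID))

baseline = {}
for sigma in (0.9, 1.0):  # the two smooth_sigma values used above
    noisy = smooth(clean, sigma, rng)
    # Returning the noisy input unchanged costs about sigma**2 per voxel;
    # a trained denoiser should do much better.
    baseline[sigma] = mse(noisy, clean)
print(baseline)
```

A larger sigma smooths the density more (the walk mixes more easily), but the single denoising jump then has more noise to undo, which is the trade-off behind training at both 0.9 and 1.0.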
To sample with a pretrained checkpoint, run, e.g.,
python sample.py pretrained_path=exps/exp_sig0.9 wjs.split=val wjs.n_samples_per_pocket=10 wjs.n_targets=100
This script generates 10 samples for each of 100 targets in the validation set using the checkpoint located in exps/exp_sig0.9. The generated molecules are saved in exps/exp_sig0.9/samples/. See configs/config_sample.yaml for other sampling options.
The script sample_from_file.py allows us to easily sample with VoxBind from a given protein pocket.
As an example, we show how to sample from the protein pockets 8UWP and 6AU3, two of the targets proposed in the CACHE 6 challenge. The protein PDB files and the co-crystallized ligand SDF files are in examples/. These files were downloaded from the Protein Data Bank (PDB); note that we removed all water molecules and heteroatoms from the PDB files.
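If you want to prepare your own pocket the same way, a minimal sketch of stripping waters and heteroatoms from a PDB file is shown below. It is an assumption about how such cleaning can be done, not necessarily how the example files were produced: it drops HETATM records (waters, ligands, ions, and also modified residues, which may or may not be what you want) together with ANISOU and CONECT records, which can refer to removed atoms. The file names in the usage line are hypothetical.

```python
from pathlib import Path

# Records removed by this sketch; HETATM covers waters (HOH) and other heteroatoms.
DROP = {"HETATM", "ANISOU", "CONECT"}


def strip_hetero_records(pdb_text: str) -> str:
    """Return PDB text without HETATM, ANISOU and CONECT records."""
    kept = [line for line in pdb_text.splitlines() if line[:6].strip() not in DROP]
    return "\n".join(kept) + "\n"


def clean_pdb_file(src, dst):
    """Read a PDB file, strip hetero records, and write the result to dst."""
    Path(dst).write_text(strip_hetero_records(Path(src).read_text()))
```

Usage (hypothetical file names): clean_pdb_file("my_target_raw.pdb", "my_target.pdb").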
To generate 20 ligands de novo for the 8UWP pocket (the default target), run:
python sample_from_file.py pretrained_path=exps/exp_sig0.9/ n_samples=20
This script saves the generated ligands into a single SDF file located at exps/exp_sig0.9/sample_from_file/8UWP/denovo/samples.sdf. The target PDB and the ground-truth ligand are also saved in the sample folder. Below we show the 8UWP pocket, the ground-truth ligand, and generated samples.
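A quick way to inspect samples.sdf without extra dependencies is to split it on the "$$$$" lines that terminate each SDF record. The sketch below is a hypothetical helper, not part of the repo; it assumes V2000 molfiles (atom count in the first three columns of the fourth line of each record), so for V3000 records or anything beyond a quick check, a full parser such as RDKit's Chem.SDMolSupplier is the safer choice.

```python
from pathlib import Path


def read_sdf_records(text):
    """Split SDF text into molecule records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):  # tolerate a missing final '$$$$'
        records.append("\n".join(current))
    return records


def atom_count(record):
    """Atom count from the V2000 counts line (4th line, first 3 columns)."""
    return int(record.splitlines()[3][:3])


def summarize(sdf_path):
    """Return the number of molecules and their atom counts."""
    records = read_sdf_records(Path(sdf_path).read_text())
    return len(records), [atom_count(r) for r in records]
```

Usage: n, sizes = summarize("exps/exp_sig0.9/sample_from_file/8UWP/denovo/samples.sdf").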
See configs/config_sample_from_file.yaml for all the options for sampling from a file.
For example, to sample from the 6AU3 pocket starting from the co-crystallized ligand (i.e., initializing the Langevin MCMC chain with the ligand provided in the PDB entry), run:
python sample_from_file.py \
pretrained_path=exps/exp_sig0.9/ \
target_pdb=../examples/6AU3/6au3.pdb \
ligand_sdf=../examples/6AU3/6au3_B_BWM.sdf \
wjs.chain_init=ligand \
wjs.warmup=0 \
wjs.steps=100 \
wjs.max_steps=100
This project is under the Apache License, Version 2.0. See LICENSE for details.

