This readme file documents all of the required steps to run CoBDock.
Note that code was implemented and tested on a Linux operating system only.
We have provided an Anaconda environment file for easy set up. If you do not have Anaconda installed, you can get Miniconda from here.
Then, install mamba
:
conda install -c conda-forge mamba
If one is experiencing issues installing mamba
in their base environment, then create a new environment for mamba
using the following:
conda create -n mamba-env -c conda-forge mamba
conda activate mamba-env
Create the cobdock-env
environment using the following command:
mamba env create -n cobdock-env -f environment.yml
mamba activate cobdock-env
Note that you may need to close and re-open the terminal to activate cobdock-env
.
Then, install additional packages with pip using the following command:
pip install -r requirements.txt
Before running CoBDock, one will first need to download and prepare all of the molecular docking and pocket identification algorithms.
Please note that if any programs are missing, then the corresponding values will be filled using mean value imputation.
Before downloading, prepare a ./bin
folder using the following commands:
mkdir bin
mkdir bin/docking
mkdir bin/location_analysis
Vina 1.1.2 [1] can be downloaded from here.
Execute this command to create a vina
subdirectory inside ./bin/docking
:
mkdir bin/docking/vina
and place the vina binary inside the newly created vina
subdirectory.
GalaxyDock3 [2] can be downloaded from here.
Download the archive file and extract the GalaxyDock3
directory into ./bin/docking
.
The PLANTS1.2 [3] and SPORES binaries are available here.
Execute this command to create a plants
subdirectory inside ./bin/docking
:
mkdir bin/docking/plants
and place both binaries inside the newly created plants
subdirectory.
The ZDOCK [4] binary is available here.
Also required are the following files: create_lig
, create.pl
, mark_sur
, and uniCHARMM
.
Execute this command to create a zdock
subdirectory inside ./bin/docking
:
mkdir bin/docking/zdock
and place the ZDOCK binary and all of the suppliementary files inside the newly created zdock
subdirectory.
FPocket4.0 [5] has is included in the provided Anaconda environment.
P2Rank 2.3.1 can be downloaded as an archive here.
Extract it to the ./bin/location_analysis
subdirectory so that p2rank_2.3.1
can be found inside ./bin/location_analysis
.
Confirm that the prank
binary can be found at ./bin/location_analysis/p2rank_2.3.1/prank
.
We have provided MGLTools-1.5.6 at ./MGLTools-1.5.6
.
Modify the .env
file to set the maximum number of processes for:
- Preparing targets
- Executing Vina
- Executing GalaxyDock
- Executing PLANTS
- Executing ZDOCK
We strongly advise setting a low number of processes for PLANTS since it is memory intensive. This file can also be used to configure timeouts for all of the molecular docking programs (set to "1h" (one hour) by default). To set the timeout to 30 minutes, change the timeout to "30m". To set the timeout to 10 seconds, change the timeout to "10s".
Run the following command:
python cobdock/run_cobdock.py
to execute CoBDock for aspirin and Uniprot target P23219. Crystal structures will be selected and prepared automatically.
If the run is successful, CoBDock successfully completed
will be printed to the console.
Please allow some time for a run to complete as some molecular docking algoritms can take some time.
The primary function to run CoBDock is execute_cobdock
in the file ./cobdock/run_cobdock.py
.
This function has the following input arguments:
* ligands_to_targets: a data structure mapping ligand IDs and structures to target lists.
* output_dir: directory to execute CoBDock in.
* map_uniprot_to_pdb: boolean flag to indicate that the targets given in the ligands_to_targets data structure are Uniprot accession IDs. Must be set to `False` if the input targets are given as PDB IDs.
* number_of_pdb: the number of crystal structures to select for each Uniprot target. Has no effect when map_uniprot_to_pdb is set to `False`.
* num_top_pockets: the number of top-ranked pockets to dock into.
* num_poses: the number of poses to generate at each of the top ranked pockets. The total number of poses will be num_top_pockets * num_poses.
* num_complexes: the number of top poses to convert into complexes by combining with the structure of the target.
Set
commercial_use_only = True
in ./cobdock/run_cobdock.py
to run the commercial_use
model from the article.
This model requires AutoDock Vina, Fpocket and P2rank only.
Edit the ligands_to_targets
data structure defined in ./cobdock/run_cobdock.py
to configure all of the pairs that will be input into CoBDock.
SMILES strings must be provided for all input ligands.
Targets must either be all:
- PDB IDs of crystal structure(s) of interest. Optionally, a chain can be appended with an underscore:
6NNA_B
will select only chainB
in PDB structure6NNA
. - Uniprot accession IDs of the target(s) of interest. Crystal structures for any given Uniprot targets will be selected automatically.
There is not requirement for all ligands to be docked into the same set of targets, but all targets given must be either PDB IDs OR Uniprot accession IDs.
One cannot give a list containing both PDB IDs and Uniprot accession IDs.
If Uniprot accession IDs are given, be sure to set the argument map_uniprot_to_pdb
to True
before running.
The directory example_output
provides a sample output from a run.
Output directories are structured as follows:
The directory contains four subdirectories, and it is the following two:
- pocket_locations
- docking
that are the most important.
The pocket_locations
subdirectory contains all of the primary outputs of the run:
-
pocket_locations/top_XXXX_pockets.json
provides details of the top XXXX (where XXXX is equal to the setting of the input paramternum_top_pockets
) predicted binding sites for all ligand, target pairs. The data structure is a nested set of key-value maps, mapping ligand_id to Uniprot accession ID to PDB ID with chain to pocket ID to pocket data. The pocket data is comprised of the location and size of the pocket, its machine-learning predicted pocket score, its rank, as well as its original ID for the binding site identification algorithm that predicted it. -
pocket_locations/pose_data.json
provides a summary of all of the poses that were generated after executing docking at all of the pockets defined inpocket_locations/top_XXXX_pockets.json
. The data structure is similar to that ofpocket_locations/top_XXXX_pockets.json
, however, instead of pocket data, each pocket ID now maps to pose ID, which maps to pose data. Pose data provides the location and size of the bounding box of the pose, and its binding energy (given under the keyenergy
).
The pocket_locations
subdirectory also contains a local_docking
subdirectory that contains PDB files of all of the predicted poses (and complexes).
The poses for a ligand with the ligand ID LIGAND
, docked into target ACCESSION
(crystal structure PDBID
), pocket number POCKETNUM
can be found at:
pocket_locations/local_docking/LIGAND/ACCESSION/PDBID/POCKETNUM/poses
and, if num_complexes
> 0, then any complexes can be found at:
pocket_locations/local_docking/LIGAND/ACCESSION/PDBID/POCKETNUM/complexes
We sincerely thank the authors of all of the software that comprises the CoBDock pipeline.
We thank the authors of CBDock [7] for sharing their data.
[1] Trott, Oleg, and Arthur J. Olson. "AutoDock Vina: improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading." Journal of computational chemistry 31.2 (2010): 455-461.
[2] Yang, Jinsol, Minkyung Baek, and Chaok Seok. "GalaxyDock3: Protein–ligand docking that considers the full ligand conformational flexibility." Journal of Computational Chemistry 40.31 (2019): 2739-2748.
[3] Korb, Oliver, Thomas Stützle, and Thomas E. Exner. "An ant colony optimization approach to flexible protein–ligand docking." Swarm Intelligence 1 (2007): 115-134.
[4] Chen, Rong, Li Li, and Zhiping Weng. "ZDOCK: an initial‐stage protein‐docking algorithm." Proteins: Structure, Function, and Bioinformatics 52.1 (2003): 80-87.
[5] Le Guilloux, Vincent, Peter Schmidtke, and Pierre Tuffery. "Fpocket: an open source platform for ligand pocket detection." BMC bioinformatics 10.1 (2009): 1-11.
[6] Krivák, Radoslav, and David Hoksza. "P2Rank: machine learning based tool for rapid and accurate prediction of ligand binding sites from protein structure." Journal of cheminformatics 10 (2018): 1-12.
[7] Liu, Yang, et al. "CB-Dock: A web server for cavity detection-guided protein–ligand blind docking." Acta Pharmacologica Sinica 41.1 (2020): 138-144.