Commit e198bd5

committed Aug 19, 2022
merging vanilla and CA ProteinMPNN versions into one script
1 parent 015ff82 commit e198bd5

128 files changed, +456 -39406 lines changed

README.md (+45 -139)
@@ -4,152 +4,58 @@ Read [ProteinMPNN paper](https://www.biorxiv.org/content/10.1101/2022.06.03.4945
 
 To run ProteinMPNN clone this github repo and install Python>=3.0, PyTorch, Numpy.
 
-Full protein backbone models: `vanilla_proteinmpnn`.
+Full protein backbone models: `vanilla_model_weights/v_48_002.pt, v_48_010.pt, v_48_020.pt, v_48_030.pt`.
 
-CA only models: `ca_proteinmpnn`.
+CA only models: `ca_model_weights/v_48_002.pt, v_48_010.pt, v_48_020.pt`. Enable flag `--ca_only` to use these models.
 
 Helper scripts: `helper_scripts` - helper functions to parse PDBs, assign which chains to design, which residues to fix, adding AA bias, tying residues etc.
 
 Code organization:
-* `vanilla_proteinmpnn/protein_mpnn_run.py` - the main script to initialialize and run the model.
-* `vanilla_proteinmpnn/protein_mpnn_utils.py` - utility functions for the main script.
-* `vanilla_proteinmpnn/examples/` - simple code examples.
+* `protein_mpnn_run.py` - the main script to initialialize and run the model.
+* `protein_mpnn_utils.py` - utility functions for the main script.
+* `examples/` - simple code examples.
+* `inputs/` - input PDB files for examples
+* `outputs/` - outputs from examples
+* `colab_notebooks/` - Google Colab examples
 -----------------------------------------------------------------------------------------------------
-Input flags:
-```
-argparser.add_argument("--path_to_model_weights", type=str, default="", help="Path to model weights folder;")
-argparser.add_argument("--model_name", type=str, default="v_48_020", help="ProteinMPNN model name: v_48_002, v_48_010, v_48_020, v_48_030; v_48_010=version with 48 edges 0.10A noise")
-
-argparser.add_argument("--save_score", type=int, default=0, help="0 for False, 1 for True; save score=-mean[log_probs] to npy files")
-argparser.add_argument("--save_probs", type=int, default=0, help="0 for False, 1 for True; save MPNN predicted probabilites per position")
-argparser.add_argument("--score_only", type=int, default=0, help="0 for False, 1 for True; score input backbone-sequence pairs")
-argparser.add_argument("--conditional_probs_only", type=int, default=0, help="0 for False, 1 for True; output conditional probabilities p(s_i given the rest of the sequence and backbone)")
-argparser.add_argument("--conditional_probs_only_backbone", type=int, default=0, help="0 for False, 1 for True; if true output conditional probabilities p(s_i given backbone)")
-argparser.add_argument("--unconditional_probs_only", type=int, default=0, help="0 for False, 1 for True; output unconditional probabilities p(s_i given backbone) in one forward pass")
-
-argparser.add_argument("--backbone_noise", type=float, default=0.00, help="Standard deviation of Gaussian noise to add to backbone atoms during the inference.")
-argparser.add_argument("--num_seq_per_target", type=int, default=1, help="Number of sequences to generate per target.")
-argparser.add_argument("--batch_size", type=int, default=1, help="Batch size when using GPUs.")
-argparser.add_argument("--max_length", type=int, default=20000, help="Maximum sequence length.")
-argparser.add_argument("--sampling_temp", type=str, default="0.1", help="A string of temperatures, 0.1 0.3 0.5. Sampling temperature for amino acids, T=0.0 means taking argmax, T>>1.0 means sampling randomly.")
-
-argparser.add_argument("--out_folder", type=str, help="Path to a folder to output sequences, e.g. /home/out/")
-argparser.add_argument("--pdb_path", type=str, default='', help="Path to a single PDB to be designed.")
-argparser.add_argument("--pdb_path_chains", type=str, default='', help="Define which chains need to be designed for a single PDB.")
-argparser.add_argument("--jsonl_path", type=str, help="Path to a folder with parsed PDBs into jsonl.")
-argparser.add_argument("--chain_id_jsonl",type=str, default='', help="Path to a dictionary specifying which chains need to be designed and which ones are fixed, if not specied all chains will be designed.")
-argparser.add_argument("--fixed_positions_jsonl", type=str, default='', help="Path to a dictionary with fixed positions.")
-argparser.add_argument("--omit_AAs", type=list, default='X', help="Specify which amino acids should be omitted in the generated sequence, e.g. 'AC' would omit alanine and cystine.")
-argparser.add_argument("--bias_AA_jsonl", type=str, default='', help="Path to a dictionary which specifies AA composion bias, e.g. {A: -1.1, F: 0.7} would make A less likely and F more likely.")
-argparser.add_argument("--bias_by_res_jsonl", default='', help="Path to dictionary with per position bias.")
-argparser.add_argument("--omit_AA_jsonl", type=str, default='', help="Path to a dictionary which specifies which amino acids need to be omited from design at specific chain indices.")
-argparser.add_argument("--pssm_jsonl", type=str, default='', help="Path to a dictionary with pssm.")
-argparser.add_argument("--pssm_multi", type=float, default=0.0, help="A value between [0.0, 1.0], 0.0 means do not use pssm, 1.0 ignore MPNN predictions.")
-argparser.add_argument("--pssm_threshold", type=float, default=0.0, help="A value between -inf + inf to restric per position AAs.")
-argparser.add_argument("--pssm_log_odds_flag", type=int, default=0, help="0 for False, 1 for True.")
-argparser.add_argument("--pssm_bias_flag", type=int, default=0, help="0 for False, 1 for True.")
-argparser.add_argument("--tied_positions_jsonl", type=str, default='', help="Path to a dictionary with tied positions for symmetric design.")
-```
------------------------------------------------------------------------------------------------------
-Example from `vanilla_proteinmpnn/examples/` to design a single PDB file:
-```
-path_to_PDB="../PDB_complexes/pdbs/3HTN.pdb"
-
-output_dir="../PDB_complexes/example_3_outputs"
-if [ ! -d $output_dir ]
-then
-    mkdir -p $output_dir
-fi
-
-chains_to_design="A B" #design only chains A and B while using the context of other chains
-
-python ../protein_mpnn_run.py \
-    --pdb_path $path_to_PDB \
-    --pdb_path_chains "$chains_to_design" \
-    --out_folder $output_dir \
-    --num_seq_per_target 2 \
-    --sampling_temp "0.1" \
-    --batch_size 1
-```
------------------------------------------------------------------------------------------------------
-Example from `vanilla_proteinmpnn/examples/` to design some monomers:
-```
-folder_with_pdbs="../PDB_monomers/pdbs/"
-
-output_dir="../PDB_monomers/example_1_outputs"
-if [ ! -d $output_dir ]
-then
-    mkdir -p $output_dir
-fi
-
-path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
-
-python ../../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
+Input flags for `protein_mpnn_run.py`:
+```
+argparser.add_argument("--ca_only", action="store_true", default=False, help="Parse CA-only structures and use CA-only models (default: false)")
+argparser.add_argument("--path_to_model_weights", type=str, default="", help="Path to model weights folder;")
+argparser.add_argument("--model_name", type=str, default="v_48_020", help="ProteinMPNN model name: v_48_002, v_48_010, v_48_020, v_48_030; v_48_010=version with 48 edges 0.10A noise")
+argparser.add_argument("--seed", type=int, default=0, help="If set to 0 then a random seed will be picked;")
+argparser.add_argument("--save_score", type=int, default=0, help="0 for False, 1 for True; save score=-log_prob to npy files")
+argparser.add_argument("--save_probs", type=int, default=0, help="0 for False, 1 for True; save MPNN predicted probabilites per position")
+argparser.add_argument("--score_only", type=int, default=0, help="0 for False, 1 for True; score input backbone-sequence pairs")
+argparser.add_argument("--conditional_probs_only", type=int, default=0, help="0 for False, 1 for True; output conditional probabilities p(s_i given the rest of the sequence and backbone)")
+argparser.add_argument("--conditional_probs_only_backbone", type=int, default=0, help="0 for False, 1 for True; if true output conditional probabilities p(s_i given backbone)")
+argparser.add_argument("--unconditional_probs_only", type=int, default=0, help="0 for False, 1 for True; output unconditional probabilities p(s_i given backbone) in one forward pass")
+argparser.add_argument("--backbone_noise", type=float, default=0.00, help="Standard deviation of Gaussian noise to add to backbone atoms")
+argparser.add_argument("--num_seq_per_target", type=int, default=1, help="Number of sequences to generate per target")
+argparser.add_argument("--batch_size", type=int, default=1, help="Batch size; can set higher for titan, quadro GPUs, reduce this if running out of GPU memory")
+argparser.add_argument("--max_length", type=int, default=200000, help="Max sequence length")
+argparser.add_argument("--sampling_temp", type=str, default="0.1", help="A string of temperatures, 0.2 0.25 0.5. Sampling temperature for amino acids. Suggested values 0.1, 0.15, 0.2, 0.25, 0.3. Higher values will lead to more diversity.")
+argparser.add_argument("--out_folder", type=str, help="Path to a folder to output sequences, e.g. /home/out/")
+argparser.add_argument("--pdb_path", type=str, default='', help="Path to a single PDB to be designed")
+argparser.add_argument("--pdb_path_chains", type=str, default='', help="Define which chains need to be designed for a single PDB ")
+argparser.add_argument("--jsonl_path", type=str, help="Path to a folder with parsed pdb into jsonl")
+argparser.add_argument("--chain_id_jsonl",type=str, default='', help="Path to a dictionary specifying which chains need to be designed and which ones are fixed, if not specied all chains will be designed.")
+argparser.add_argument("--fixed_positions_jsonl", type=str, default='', help="Path to a dictionary with fixed positions")
+argparser.add_argument("--omit_AAs", type=list, default='X', help="Specify which amino acids should be omitted in the generated sequence, e.g. 'AC' would omit alanine and cystine.")
+argparser.add_argument("--bias_AA_jsonl", type=str, default='', help="Path to a dictionary which specifies AA composion bias if neededi, e.g. {A: -1.1, F: 0.7} would make A less likely and F more likely.")
+argparser.add_argument("--bias_by_res_jsonl", default='', help="Path to dictionary with per position bias.")
+argparser.add_argument("--omit_AA_jsonl", type=str, default='', help="Path to a dictionary which specifies which amino acids need to be omited from design at specific chain indices")
+argparser.add_argument("--pssm_jsonl", type=str, default='', help="Path to a dictionary with pssm")
+argparser.add_argument("--pssm_multi", type=float, default=0.0, help="A value between [0.0, 1.0], 0.0 means do not use pssm, 1.0 ignore MPNN predictions")
+argparser.add_argument("--pssm_threshold", type=float, default=0.0, help="A value between -inf + inf to restric per position AAs")
+argparser.add_argument("--pssm_log_odds_flag", type=int, default=0, help="0 for False, 1 for True")
+argparser.add_argument("--pssm_bias_flag", type=int, default=0, help="0 for False, 1 for True")
+argparser.add_argument("--tied_positions_jsonl", type=str, default='', help="Path to a dictionary with tied positions")
 
-python ../protein_mpnn_run.py \
-    --jsonl_path $path_for_parsed_chains \
-    --out_folder $output_dir \
-    --num_seq_per_target 2 \
-    --sampling_temp "0.1" \
-    --batch_size 1
 ```
 -----------------------------------------------------------------------------------------------------
-Example from `vanilla_proteinmpnn/examples/` to design some homomers:
-```
-folder_with_pdbs="../PDB_homooligomers/pdbs/"
-
-output_dir="../PDB_homooligomers/example_6_outputs"
-if [ ! -d $output_dir ]
-then
-    mkdir -p $output_dir
-fi
-
-path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
-path_for_tied_positions=$output_dir"/tied_pdbs.jsonl"
-
-python ../../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
-
-python ../../helper_scripts/make_tied_positions_dict.py --input_path=$path_for_parsed_chains --output_path=$path_for_tied_positions --homooligomer 1
-
-python ../protein_mpnn_run.py \
-    --jsonl_path $path_for_parsed_chains \
-    --tied_positions_jsonl $path_for_tied_positions \
-    --out_folder $output_dir \
-    --num_seq_per_target 2 \
-    --sampling_temp "0.2" \
-    --batch_size 1
-```
------------------------------------------------------------------------------------------------------
-Example from `vanilla_proteinmpnn/examples/` to design some complexes:
-```
-folder_with_pdbs="../PDB_complexes/pdbs/"
-
-output_dir="../PDB_complexes/example_4_outputs"
-if [ ! -d $output_dir ]
-then
-    mkdir -p $output_dir
-fi
-
-path_for_parsed_chains=$output_dir"/parsed_pdbs.jsonl"
-path_for_assigned_chains=$output_dir"/assigned_pdbs.jsonl"
-path_for_fixed_positions=$output_dir"/fixed_pdbs.jsonl"
-chains_to_design="A C"
-#The first amino acid in the chain corresponds to 1 and not PDB residues index for now.
-fixed_positions="1 2 3 4 5 6 7 8 23 25, 10 11 12 13 14 15 16 17 18 19 20 40" #fixing/not designing residues 1 2 3...25 in chain A and residues 10 11 12...40 in chain C
-
-python ../../helper_scripts/parse_multiple_chains.py --input_path=$folder_with_pdbs --output_path=$path_for_parsed_chains
-
-python ../../helper_scripts/assign_fixed_chains.py --input_path=$path_for_parsed_chains --output_path=$path_for_assigned_chains --chain_list "$chains_to_design"
-
-python ../../helper_scripts/make_fixed_positions_dict.py --input_path=$path_for_parsed_chains --output_path=$path_for_fixed_positions --chain_list "$chains_to_design" --position_list "$fixed_positions"
-
-python ../protein_mpnn_run.py \
-    --jsonl_path $path_for_parsed_chains \
-    --chain_id_jsonl $path_for_assigned_chains \
-    --fixed_positions_jsonl $path_for_fixed_positions \
-    --out_folder $output_dir \
-    --num_seq_per_target 2 \
-    --sampling_temp "0.1" \
-    --batch_size 1
-```
+For example to make a conda environment to run ProteinMPNN:
+* `conda create --name mlfold` - this creates conda environment called `mlfold`
+* `source activate mlfold` - this activate environment
+* `conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch` - install pytorch following steps from https://pytorch.org/
 
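
The README change replaces the two separate script folders with a single script plus a `--ca_only` switch and two weight folders. The sketch below shows one way such a switch could pick a checkpoint path. The three flags and the folder names `vanilla_model_weights` and `ca_model_weights` come from the diff; the helper `resolve_checkpoint` and its selection logic are assumptions made for illustration and may not match what `protein_mpnn_run.py` actually does.

```python
import argparse
import os


def build_parser():
    # Only the three flags relevant to model selection, copied from the README diff.
    parser = argparse.ArgumentParser()
    parser.add_argument("--ca_only", action="store_true", default=False)
    parser.add_argument("--path_to_model_weights", type=str, default="")
    parser.add_argument("--model_name", type=str, default="v_48_020")
    return parser


def resolve_checkpoint(args):
    """Hypothetical helper: choose a weights folder from the parsed flags.

    Assumption: an explicit --path_to_model_weights takes priority, otherwise
    --ca_only switches between the two folders named in the README.
    """
    if args.path_to_model_weights:
        folder = args.path_to_model_weights
    elif args.ca_only:
        folder = "ca_model_weights"
    else:
        folder = "vanilla_model_weights"
    return os.path.join(folder, args.model_name + ".pt")


if __name__ == "__main__":
    for argv in ([], ["--ca_only"], ["--ca_only", "--model_name", "v_48_010"]):
        print(argv, "->", resolve_checkpoint(build_parser().parse_args(argv)))
```

Note that the README lists `v_48_030.pt` only under `vanilla_model_weights`, so a real implementation would probably need to reject `--ca_only --model_name v_48_030`; this sketch does not check for that.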

File renamed without changes.
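
Two flags in the README diff take compound values as a single string: `--sampling_temp` ("a string of temperatures") and `--omit_AAs` (declared with `type=list`). The sketch below illustrates how standard `argparse` treats them. The two `add_argument` calls mirror the diff; `parse_temperatures` is a hypothetical helper, and splitting on whitespace is an assumption based on the help text rather than on the script itself.

```python
import argparse

parser = argparse.ArgumentParser()
# Declarations mirrored from the README diff (help text omitted).
parser.add_argument("--sampling_temp", type=str, default="0.1")
parser.add_argument("--omit_AAs", type=list, default="X")


def parse_temperatures(raw):
    """Hypothetical helper: turn "0.1 0.2 0.3" into [0.1, 0.2, 0.3]."""
    return [float(token) for token in raw.split()]


args = parser.parse_args(["--sampling_temp", "0.1 0.2 0.3", "--omit_AAs", "AC"])
temperatures = parse_temperatures(args.sampling_temp)

# type=list calls list() on the argument string, yielding one-letter codes.
omitted = args.omit_AAs

# argparse also applies `type` to string defaults, so the default "X"
# ends up as a one-element list rather than a bare string.
default_omitted = parser.parse_args([]).omit_AAs

if __name__ == "__main__":
    print(temperatures, omitted, default_omitted)
```

On the command line the temperatures must be quoted so they arrive as one argument, as in the `--sampling_temp "0.1"` calls in the removed examples.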
