Paper (ICML 2025 GenBio Workshop) | ToxBench Dataset
This repository contains the PyTorch implementation of DualBind, a 3D structure-based deep learning model with a dual-loss framework for accurate and fast protein-ligand binding affinity prediction, along with scripts to benchmark DualBind on the ToxBench AB-FEP dataset.
Clone this repo and then use env.sh to create the conda environment.
bash env.sh
conda activate dualbind

ToxBench is the first large-scale AB-FEP dataset designed for ML development and focused on a single pharmaceutically critical target, Human Estrogen Receptor Alpha (ERα). ToxBench provides 8,770 ERα-ligand complexes with AB-FEP-calculated binding free energies. The dataset includes:
- Protein-ligand structures in PDB and SDF format.
- Binding affinities computed via AB-FEP in CSV format.
- Predefined training/validation/test splits to ensure robust model evaluation.
More details about the ToxBench dataset can be found in our paper. The full dataset is publicly available on Hugging Face.
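As a rough illustration of how the CSV labels and predefined splits might be consumed, the sketch below groups rows by split using only the Python standard library. The column names (`ligand_id`, `split`, `abfep_dg`) and the inline sample rows are assumptions made up for illustration, not the actual ToxBench schema; check the dataset card on Hugging Face for the real file and column names.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows: the column names and values are hypothetical,
# not taken from the real ToxBench CSV.
SAMPLE_CSV = """ligand_id,split,abfep_dg
lig_0001,train,-9.1
lig_0002,train,-7.4
lig_0003,val,-8.0
lig_0004,test,-10.2
"""


def load_splits(handle, id_col="ligand_id", split_col="split", label_col="abfep_dg"):
    """Group (ligand id, binding free energy) pairs by split name."""
    splits = defaultdict(list)
    for row in csv.DictReader(handle):
        splits[row[split_col]].append((row[id_col], float(row[label_col])))
    return dict(splits)


splits = load_splits(io.StringIO(SAMPLE_CSV))
for name, items in splits.items():
    print(name, len(items))
```

To read a real file, pass an open file handle instead of the `io.StringIO` wrapper and adjust the three column-name arguments to match the downloaded CSV.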
DualBind integrates supervised mean squared error (MSE) with unsupervised denoising score matching (DSM) to effectively learn the protein-ligand binding energy function.
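One common way to write such a combined objective is sketched below. Here $E_\theta(x)$ is the predicted binding energy for a complex with coordinates $x$, $y$ is the AB-FEP label, and $\tilde{x}$ is a noise-perturbed copy of $x$. The weight $\lambda$, the noise scale $\sigma$, and the Gaussian perturbation of coordinates are assumptions taken from the standard DSM formulation, not necessarily the exact form used in DualBind; see the paper for the authors' definition.

```latex
\mathcal{L}(\theta)
  = \underbrace{\bigl(E_\theta(x) - y\bigr)^2}_{\text{MSE}}
  + \lambda \,
    \underbrace{\mathbb{E}_{\epsilon \sim \mathcal{N}(0, I)}
      \left\| \nabla_{\tilde{x}} E_\theta(\tilde{x})
              - \frac{\tilde{x} - x}{\sigma^2} \right\|^2}_{\text{DSM}},
  \qquad \tilde{x} = x + \sigma \epsilon
```

Under the energy-based view $p_\theta(\tilde{x}) \propto \exp(-E_\theta(\tilde{x}))$, the model score is $-\nabla_{\tilde{x}} E_\theta(\tilde{x})$ and the score of the Gaussian perturbation kernel is $-(\tilde{x} - x)/\sigma^2$, which gives the DSM term above. The MSE term needs affinity labels, while the DSM term needs only structures.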
- Download the ToxBench dataset
- Configure training parameters in conf/train_toxbench.yaml
- Run training:

python train_toxbench.py

You can use our ToxBench-trained DualBind checkpoint (available on NGC) for inference.
- Download the DualBind checkpoint
- Configure inference parameters in conf/inference_toxbench.yaml, especially for protein_files and ligand_files
- Run inference:

cd DualBind
python inference_toxbench.py

The results will be saved in a CSV file containing predicted binding affinities.
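Once predictions are available, they can be compared against the AB-FEP labels. The sketch below computes two commonly used regression metrics, RMSE and Pearson correlation, in plain Python. The numbers are invented placeholders, and the helper names `rmse` and `pearson_r` are hypothetical, not part of this repository; in practice the two lists would be read from the inference output CSV and the ToxBench label CSV.

```python
import math


def rmse(pred, true):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def pearson_r(pred, true):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(true)
    mean_p = sum(pred) / n
    mean_t = sum(true) / n
    cov = sum((p - mean_p) * (t - mean_t) for p, t in zip(pred, true))
    var_p = sum((p - mean_p) ** 2 for p in pred)
    var_t = sum((t - mean_t) ** 2 for t in true)
    return cov / math.sqrt(var_p * var_t)


# Made-up values for illustration only (not real DualBind or AB-FEP results).
predicted = [-9.0, -7.0, -8.5, -10.0]
reference = [-9.5, -7.5, -8.0, -10.5]

print(f"RMSE: {rmse(predicted, reference):.3f}")
print(f"Pearson r: {pearson_r(predicted, reference):.3f}")
```

Both functions assume the two sequences are aligned by ligand, so join the prediction and label tables on a shared identifier before calling them.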
If you use DualBind or ToxBench in your research, please cite:
@inproceedings{
liu2025toxbench,
title={{ToxBench}: A Binding Affinity Prediction Benchmark with {AB}-{FEP}-Calculated Labels for Human Estrogen Receptor Alpha},
author={Meng Liu and Karl Leswing and Simon K.S. Chu and Farha Ramezanghorbani and Griffin Young and Gabriel Marques and Prerna Das and Anjali Panikar and Esther Jamir and Mohammed Sulaiman Shamsudeen and K. Shawn Watts and Ananya Sen and Hari Priya Devannagari and Edward B. Miller and Muyun Lihan and Howook Hwang and Janet Paulsen and Xin Yu and Kyle Gion and Timur Rvachov and Emine Kucukbenli and Saee Gopal Paliwal},
booktitle={ICML 2025 Generative AI and Biology (GenBio) Workshop},
year={2025},
url={https://openreview.net/forum?id=5lpHuVsE94}
}

The DualBind source code and checkpoint are released under an NVIDIA license for non-commercial or research purposes only. Please refer to the LICENSE file for details.

