FoldFusion automates ligand transplantation by combining AlphaFold protein models with donor complexes from experimental structures. The pipeline orchestrates external tools for pocket detection, structural alignment, ligand extraction, refinement, and evaluation. It is designed for structural biologists who need reproducible ligand placement and scoring on AlphaFold predictions.
Key features:
- End-to-end pipeline that fetches AlphaFold models, identifies binding pockets, aligns donor complexes, transplants ligands, and refines poses.
- Modular tool wrappers for DoGSite3, SIENA, LigandExtractor, and JAMDA with consistent logging and error handling.
- Resume-aware execution that skips completed UniProt IDs and records per-stage success or failure markers.
- Configurable timeouts, memory ceilings, and concurrency options for tuning to local workstations or compute clusters.
- Structured outputs, evaluation metrics, and logs to inspect pipeline progress and quality.
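The troubleshooting notes say the model fetcher falls back from AlphaFold v4 to v3 models automatically. The sketch below shows one way such a fallback could work. It is not FoldFusion's fetcher. The function names and the injectable `get` parameter are hypothetical, and the URL follows the public AlphaFold DB file-naming pattern, which is an assumption here.

```python
import urllib.request
from collections.abc import Callable

# Assumed AlphaFold DB naming scheme: AF-<accession>-F1-model_v<version>.pdb
AFDB_URL = "https://alphafold.ebi.ac.uk/files/AF-{acc}-F1-model_v{ver}.pdb"


def _http_get(url: str) -> bytes:
    """Download a URL and return the raw body; HTTP errors raise URLError/HTTPError."""
    with urllib.request.urlopen(url, timeout=60) as resp:
        return resp.read()


def fetch_alphafold_model(
    accession: str,
    versions: tuple[int, ...] = (4, 3),
    get: Callable[[str], bytes] = _http_get,
) -> tuple[int, bytes]:
    """Try each model version in order and return (version, pdb_bytes) for the first hit.

    Hypothetical sketch: `get` is injectable so the fallback logic can be
    tested without network access.
    """
    errors: list[str] = []
    for ver in versions:
        url = AFDB_URL.format(acc=accession, ver=ver)
        try:
            return ver, get(url)
        except OSError as exc:  # URLError, HTTPError and timeouts all subclass OSError
            # Remember why this version failed, then fall back to the next one.
            errors.append(f"v{ver}: {exc}")
    raise RuntimeError(f"No AlphaFold model for {accession}: {'; '.join(errors)}")
```

Putting the version list in a parameter keeps the fallback order explicit. Supporting an additional model release would then only require changing that tuple.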
Requirements:
- Linux or macOS with Python 3.12.
- GNU Make.
- External binaries installed and accessible: DoGSite3 (DoGSiteScorer), SIENA plus its database generator, LigandExtractor, and JAMDA scorer.
- Network access to download AlphaFold models unless cached locally.
- Hardware guidance: 4 CPU cores and 16 GB RAM recommended; ensure ~50 GB free storage for models, intermediate files, and results.
Installation:
- Clone the repository and enter the project folder:

      git clone https://github.com/mariusrueve/foldfusion.git
      cd foldfusion

- Create the virtual environment and install dependencies:

      make venv
      source .venv/bin/activate
      make install-dev

  Use `make install` if you only need the runtime requirements.
Quick start:
- Review and update `config.toml` with local paths to data directories and external tool executables.
- Run the pipeline with GNU Make or the CLI:

      make run
      # or
      foldfusion config.toml

- Find outputs under the directory specified by `output_dir` (default `analysis/data/foldfusion_output`). Logs are written to `foldfusion_pipeline.log` by default.
- Optionally, rebuild reports and plots:

      make analysis
Configuration (key options in `config.toml`):
- `log_level`, `log_file`: adjust verbosity and log location.
- `uniprot_ids_file`: newline-delimited UniProt accessions to process.
- `output_dir`: base directory for per-UniProt results.
- Tool executables: absolute paths for DoGSite3, SIENA, LigandExtractor, and JAMDA, plus SIENA database settings.
- `pipeline_concurrency`: number of UniProt IDs processed in parallel.
- Robustness options: set `resume` and `skip_failed_ids` to control reruns; enable optional per-tool timeouts and memory caps as needed.
Refer to config.toml for the full list of options and defaults. The LaTeX manual in docs/documentation.tex provides deeper operational guidance.
Command-line usage:
Installing the package exposes `foldfusion` as a console entry point:

    foldfusion /path/to/config.toml

The CLI reads the configuration, executes each pipeline stage in order, and writes structured output under the configured directory. Run `foldfusion --help` for usage details.
Project structure:
- `foldfusion/`: core pipeline package, including the orchestrator, tool wrappers, evaluation utilities, and helpers.
- `main.py`: thin executable that forwards to the package entry point.
- `analysis/`: generated datasets, notebooks, and figures.
- `docs/`: user manual and supporting documentation.
- `scripts/`: helper scripts for data preparation and automation.
- `tests/`: `pytest` suites covering core functionality.
- `Makefile`: convenience targets for environment setup, running the pipeline, linting, testing, and regenerating the analysis.
Development:
- Run tests with coverage: `pytest --cov=foldfusion --cov-report=term-missing`.
- Lint and type-check: `make lint`, `mypy foldfusion/`.
- Apply Ruff and Black formatting via `make fmt` if the target exists.
- The optional `viz` dependency group installs plotting extras for the analysis notebooks.
Troubleshooting:
- Download failures: confirm that the UniProt accessions exist and that network access is available; the fetcher automatically falls back from AlphaFold v4 to v3 models.
- Missing donor ligands: inspect the generated `ligand_structure.json` to confirm that ligands are present for the chosen chains.
- SIENA database errors: verify that the database path exists and that the generator has write permissions; rebuild the database if the PDB sources have changed.
- JAMDA crashes or memory errors: increase `jamda_memory_mb` or lower `pipeline_concurrency` to reduce peak memory usage.
- Resume markers: remove `Results/<UniProt>/_SUCCESS` or `Results/<UniProt>/_FAILED` under the output directory to force a rerun for individual proteins.
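The resume behavior depends on the `Results/<UniProt>/_SUCCESS` and `_FAILED` marker files described in the troubleshooting notes. The sketch below shows how markers like these could decide which IDs to run. How `resume` and `skip_failed_ids` interact here is an assumption, and `ids_to_process` and `write_marker` are hypothetical helpers.

```python
from pathlib import Path


def marker_dir(output_dir: Path, uniprot_id: str) -> Path:
    """Per-protein directory holding the _SUCCESS / _FAILED marker files."""
    return output_dir / "Results" / uniprot_id


def ids_to_process(
    output_dir: Path, ids: list[str], resume: bool = True, skip_failed_ids: bool = True
) -> list[str]:
    """Select IDs to run; assumed semantics for the resume and skip_failed_ids options."""
    todo = []
    for acc in ids:
        d = marker_dir(output_dir, acc)
        if resume and (d / "_SUCCESS").exists():
            continue  # completed in an earlier run
        if skip_failed_ids and (d / "_FAILED").exists():
            continue  # failed earlier; remove the marker to force a rerun
        todo.append(acc)
    return todo


def write_marker(output_dir: Path, uniprot_id: str, ok: bool) -> None:
    """Record the outcome, keeping only one marker so reruns see the latest result."""
    d = marker_dir(output_dir, uniprot_id)
    d.mkdir(parents=True, exist_ok=True)
    (d / ("_FAILED" if ok else "_SUCCESS")).unlink(missing_ok=True)
    (d / ("_SUCCESS" if ok else "_FAILED")).touch()
```

Empty marker files cost almost nothing to write. Because the check only looks at whether a file exists, deleting one marker is enough to rerun a single protein.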
FoldFusion is released under the MIT License. See LICENSE for details.