[ arXiv | Blog | Data | Cite ]
HESCAPE is a large-scale, pan-organ benchmark for cross-modal contrastive pretraining in spatial transcriptomics (6 gene panels, 54 donors). We evaluate state-of-the-art image and gene encoders across multiple pretraining strategies on two downstream tasks: gene-mutation classification and gene-expression prediction. We find that alignment is driven primarily by the gene encoder, with spatially pretrained gene models outperforming non-spatial and simple baselines. Paradoxically, contrastive pretraining improves mutation classification but degrades expression prediction, likely due to batch effects. HESCAPE provides standardized datasets, evaluation protocols, and tools to advance batch-robust multimodal learning.
- Install (uv):
curl -LsSf https://astral.sh/uv/install.sh | sh
git clone https://github.com/peng-lab/hescape.git
cd hescape
uv sync
- Load a small sample dataset (make sure you have obtained access to hescape-pyarrow)
from datasets import load_dataset

ds = load_dataset(
    "Peng-AI/hescape-pyarrow",
    name="human-lung-healthy-panel",
    split="train",
    num_proc=4,
)
print(ds)  # peek at the attributes
- Run a 60-second smoke test training (single GPU, local)
uv run experiments/hescape_pretrain/train.py \
    --config-name=local_config.yaml \
    launcher=local \
    training.lightning.trainer.max_steps=200 \
    training.lightning.trainer.devices=1 \
    datamodule.batch_size=8 \
    datamodule.num_workers=4
Note
The launcher=local parameter runs the training locally. This is useful for debugging or for running experiments on a local machine with one or more GPUs. For distributed training on HPC with Slurm, take a look at running_sweeps.md.
Supported: uv (recommended), Conda, pip (PyPI):
- uv

curl -LsSf https://astral.sh/uv/install.sh | sh
git clone https://github.com/peng-lab/hescape.git
cd hescape
uv sync

# contributions are welcome!
uv pip install -e ".[dev]"
- Conda + pip

conda create -n "hescape" python=3.11
conda activate hescape
git clone https://github.com/peng-lab/hescape.git
cd hescape
pip install -e .
We provide 5 independent datasets to use with HESCAPE, each based on a specific 10x Xenium gene panel. These datasets are loaded as follows:
from datasets import load_dataset
# Example: load the human breast panel
ds = load_dataset(
"Peng-AI/hescape-pyarrow",
name="human-breast-panel",
split="train",
streaming=True,
# cache_dir="/path/to/cache",
# num_proc=4
)
print(ds)

While you can stream the data during training, we recommend storing the dataset locally for faster access and easier management. When you run the training script, the dataset is downloaded automatically to the default Hugging Face cache.
To store the dataset locally for other uses, disable streaming by setting streaming=False and specify a cache_dir. You can also set num_proc in load_dataset to use multiple processes for downloading and preparing the data, as in the sketch below.
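For example, a non-streaming load that materializes the panel under a local cache might look like this (the cache path is illustrative):

```python
from datasets import load_dataset

# Download and prepare the dataset locally instead of streaming it.
ds = load_dataset(
    "Peng-AI/hescape-pyarrow",
    name="human-breast-panel",
    split="train",
    streaming=False,
    cache_dir="/path/to/cache",  # illustrative path -- point at your own storage
    num_proc=4,  # parallel processes for download/preparation
)
print(ds)
```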
See the hescape-pyarrow dataset card on Hugging Face for more information.
The HESCAPE repository consumes pretrained weights for the image and gene encoders during training. The directory structure is crucial for training to work correctly and is expected to look as follows:
├── hescape (from github)
│ ├── README.md
│ ├── data
│ ├── experiments
│ ├── notebooks
│ ├── pyproject.toml
│ ├── src
│ ├── tests
│ ├── uv.lock
│ └── ...
├── pretrain_weights
│ ├── gene
│ │ ├── nicheformer
│ │ ├── drvi
│ │ └── <predefined gene models> ...
│ └── image
│ ├── h0-mini
│ ├── uni
│ └── <predefined image models> ...

All DRVI models can be downloaded from Hugging Face. For the remaining image and gene models, we direct users to obtain the weights from the corresponding repositories and place them in the pretrain_weights/image or pretrain_weights/gene directory, respectively; a download-and-placement sketch follows below.
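A rough sketch of fetching weights and creating the expected layout, assuming the huggingface_hub CLI is available; the DRVI repo id below is a placeholder, so substitute the actual id from the model card:

```bash
# Create the expected weight directories next to the cloned repo.
mkdir -p pretrain_weights/gene/drvi pretrain_weights/image/h0-mini

# Download DRVI weights from Hugging Face (<drvi-repo-id> is a placeholder):
# huggingface-cli download <drvi-repo-id> --local-dir pretrain_weights/gene/drvi

# Image encoder weights (h0-mini, uni, ...) must be requested/downloaded from
# their respective repositories and copied into pretrain_weights/image/<model>.
```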
- Single-GPU local
source .venv/bin/activate
uv run experiments/hescape_pretrain/train.py \
    --config-name=local_config.yaml \
    launcher=local \
    model.litmodule.img_enc_name=h0-mini \
    model.litmodule.gene_enc_name=drvi \
    training.lightning.trainer.devices=1 \
    datamodule.batch_size=256 \
    datamodule.num_workers=8

- Multi-GPU local DDP (Lightning)
uv run experiments/hescape_pretrain/train.py \
    --config-name=local_config.yaml \
    launcher=local \
    training.lightning.trainer.devices=4 \
    training.lightning.trainer.strategy=ddp \
    datamodule.batch_size=256 \
    datamodule.num_workers=8
- Slurm example (Quick recipe)
srun --nodes=1 --ntasks-per-node=4 --cpus-per-task=12 --gres=gpu:4 \
    --mem=480G --time=02:00:00 --partition=<part> ... \
    bash -lc '
export WANDB_MODE=offline
export HYDRA_FULL_ERROR=1
export CUDA_VISIBLE_DEVICES=0,1,2,3
export NCCL_DEBUG=INFO
source .venv/bin/activate
uv run experiments/hescape_pretrain/train.py \
    --config-name=default_config.yaml
'
Training is launched via Hydra-based configuration, with defaults stored in local_config.yaml.
To change the defaults, either edit local_config.yaml directly or override arguments from the CLI as shown above. The parameters that can be modified are listed in the next section.
Our framework uses Hydra for flexible experiment configuration.
| Config Key | Description | Values |
|---|---|---|
| `model.litmodule.img_enc_name` | Vision encoder backbone | `h0-mini`, `gigapath`, `ctranspath`, `uni`, `conch`, `custom` |
| `model.litmodule.gene_enc_name` | Gene encoder architecture | `mlp`, `scfoundation`, `nicheformer`, `drvi`, `custom` |
| `model.litmodule.img_proj` | Projection head for image features | `mlp`, `linear`, `transformer` |
| `model.litmodule.gene_proj` | Projection head for gene features | `mlp`, `linear` |
| `model.litmodule.loss` | Contrastive loss type | `CLIP`, `SIGLIP` |
| `model.litmodule.optimizer.lr` | Learning rate | `1e-3`, `3e-4`, etc. |
| `model.litmodule.temperature` | CLIP temperature parameter | `0.05`, `0.07`, etc. |
| `training.train` / `training.test` | Toggle training or test mode | `true`, `false` |
| `training.lightning.trainer.max_steps` | Number of steps during training | `20_000`, etc. |
| `datamodule.batch_size` | Batch size for the DataLoader | `64`, `256`, etc. |
| `datamodule.num_workers` | Subprocesses used for data loading | `4`, `8`, etc. |
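Taken together, the override keys above imply a Hydra config tree roughly like the sketch below. This is inferred from the documented keys, not the verbatim local_config.yaml, which contains additional fields:

```yaml
model:
  litmodule:
    img_enc_name: h0-mini
    gene_enc_name: drvi
    img_proj: mlp
    gene_proj: linear
    loss: CLIP
    temperature: 0.07
    optimizer:
      lr: 3.0e-4
training:
  train: true
  test: false
  lightning:
    trainer:
      max_steps: 20000
      devices: 1
datamodule:
  batch_size: 256
  num_workers: 8
```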
Benchmark sweeps over different parameters are only possible in a Slurm environment with a DDP setup. Hydra automatically runs a grid search over all specified values. For example:
model.litmodule.img_enc_name: h0-mini, uni
model.litmodule.gene_enc_name: drvi, nicheformer
This will run all combinations:
h0-mini + drvi, h0-mini + nicheformer, uni + drvi, uni + nicheformer
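Using Hydra's standard multirun syntax, such a sweep would be launched roughly as follows (combined with the Slurm launcher from running_sweeps.md on HPC):

```bash
uv run experiments/hescape_pretrain/train.py --multirun \
    --config-name=default_config.yaml \
    model.litmodule.img_enc_name=h0-mini,uni \
    model.litmodule.gene_enc_name=drvi,nicheformer
```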
Running sweeps is explained in running_sweeps.md.
We provide a Jupyter notebook image_model_loading.ipynb that demonstrates how to load a pretrained model and extract features from histology images for mutation and gene expression prediction.
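As a minimal sketch of what the notebook covers, loading a backbone and embedding a histology patch with timm might look like this; the model name and checkpoint path are placeholders, so refer to image_model_loading.ipynb for the exact checkpoints:

```python
import timm
import torch
from PIL import Image

# Placeholder backbone and weight path -- see image_model_loading.ipynb for
# the exact model names and checkpoint locations used in HESCAPE.
model = timm.create_model("vit_base_patch16_224", pretrained=False, num_classes=0)
state = torch.load("pretrain_weights/image/h0-mini/checkpoint.pth", map_location="cpu")
model.load_state_dict(state, strict=False)
model.eval()

# Build the preprocessing pipeline that matches the backbone's training setup.
config = timm.data.resolve_data_config({}, model=model)
transform = timm.data.create_transform(**config)

# Embed a single patch; with num_classes=0 the output shape is (1, embed_dim).
image = Image.open("patch.png").convert("RGB")
with torch.no_grad():
    features = model(transform(image).unsqueeze(0))
print(features.shape)
```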
Recall@5 on the test subset for both Image-to-Gene (I2G) and Gene-to-Image (G2I) tasks across different tissue panels. Note: “—” indicates out-of-memory during training. Bold = best result, underlined = second-best.
| Model | 5K I2G | 5K G2I | Multi-Tissue I2G | Multi-Tissue G2I | ImmOnc I2G | ImmOnc G2I | Colon I2G | Colon G2I | Breast I2G | Breast G2I | Lung I2G | Lung G2I |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| mlp-gigapath | 0.257 | 0.257 | 0.297 | 0.215 | 0.179 | 0.132 | 0.313 | 0.297 | 0.390 | 0.288 | 0.510 | 0.493 |
| mlp-optimus | 0.235 | 0.235 | 0.209 | 0.153 | 0.173 | 0.119 | 0.296 | 0.291 | 0.309 | 0.235 | 0.358 | 0.336 |
| scfoundation-gigapath | — | — | — | — | 0.251 | 0.207 | 0.294 | 0.249 | 0.348 | 0.365 | 0.590 | 0.543 |
| scfoundation-optimus | — | — | — | — | 0.206 | 0.171 | 0.315 | 0.272 | 0.388 | 0.377 | 0.427 | 0.345 |
| nicheformer-gigapath | 0.241 | 0.255 | 0.274 | 0.285 | 0.247 | 0.267 | 0.261 | 0.269 | 0.414 | 0.447 | 0.473 | 0.554 |
| nicheformer-optimus | 0.243 | 0.273 | 0.261 | 0.277 | 0.212 | 0.215 | 0.290 | 0.278 | 0.418 | 0.451 | 0.424 | 0.498 |
| drvi-gigapath | 0.315 | 0.359 | 0.322 | 0.417 | 0.344 | 0.334 | 0.388 | 0.394 | 0.461 | 0.436 | 0.649 | 0.709 |
| drvi-optimus | 0.299 | 0.321 | 0.271 | 0.342 | 0.287 | 0.267 | 0.412 | 0.397 | 0.465 | 0.461 | 0.562 | 0.612 |
| drvi-uni | 0.322 | 0.341 | 0.312 | 0.396 | 0.326 | 0.318 | 0.404 | 0.401 | 0.450 | 0.436 | 0.610 | 0.678 |
We provide the full collection of results for all multi-modal combinations here.
- 02.09.25: 6 new datasets released. You can find them on Hugging Face.
- Benchmark your own model
- Documentation
- New Xenium datasets
- New Visium datasets
- GitHub issues are preferred.
- If GitHub issues are not possible, email rushin.gindra@helmholtz-munich.de
- We are open to contributions from the multi-modal community.
- Feel free to reach out with a pull request or via email if you have a prospective idea and need assistance implementing it.
The project was built as an adaptation of functions from cool repositories such as OpenCLIP, Hugging Face Datasets, and timm. We thank all authors and open-source developers for their contributions.
Gindra, R. H., Palla, G., Nguyen, M., Wagner, S. J., Tran, M., Theis, F. J., Saur, D., Crawford, L., & Peng, T. A Large-Scale Benchmark of Cross-Modal Learning for Histology and Gene Expression in Spatial Transcriptomics. arXiv preprint arXiv:2508.01490, August 2025.
@misc{gindra2025largescalebenchmarkcrossmodallearning,
title={A Large-Scale Benchmark of Cross-Modal Learning for Histology and Gene Expression in Spatial Transcriptomics},
author={Rushin H. Gindra and Giovanni Palla and Mathias Nguyen and Sophia J. Wagner and Manuel Tran and Fabian J Theis and Dieter Saur and Lorin Crawford and Tingying Peng},
year={2025},
eprint={2508.01490},
archivePrefix={arXiv},
primaryClass={q-bio.GN},
url={https://arxiv.org/abs/2508.01490},
}
