finalised presentation
alessandrofelder committed Jan 7, 2025
1 parent d98cba6 commit c5d9419
Showing 7 changed files with 196 additions and 47 deletions.
Binary file added img/ANTs-Template-Construction.png
Binary file added img/Figure3.png
Binary file added img/allen_mouse_annotation.png
Binary file added img/allen_mouse_template.png
Binary file added img/black_cap_whole.png
Binary file added img/malkemper-lab.png
243 changes: 196 additions & 47 deletions index.qmd
@@ -1,14 +1,14 @@
title: A template presentation
subtitle: so much fun
author: SWC Neuroinformatics Unit
title: Parallelising template building
subtitle: with qbatch
author: Alessandro Felder, Niko Sirmpilatze
enabled: true
theme: [default, niu-dark.scss]
logo: img/logo_niu_dark.png
footer: "Edit this footer | 2023-07-26"
theme: [default, niu-light.scss]
logo: img/logo_niu_light.png
footer: "Parallelising template building | 2025-01-08"
slide-number: c
numbers: true
@@ -28,7 +28,7 @@ format:
theme: [default, niu-dark.scss]
logo: img/logo_niu_dark.png
date: "2023-07-05"
date: "2025-01-08"
toc: true
code-overflow: scroll
highlight-style: atom-one
@@ -43,69 +43,218 @@ my-custom-stuff:
my-reuseable-variable: "I can use this wherever I want in the markdown, and change it in only one place :)"

## Contents
## What is an atlas? {.smaller}

:::: {.columns}

::: {.column width="50%"}

::: {.column width="50%"}


(Neuro-anatomical) atlases consist of a template image and an annotations image (e.g. the Allen Mouse Brain Common Coordinate Framework).

## Why is it useful?

A standardised annotation and a standardised coordinate system, to which experimental data can be registered, facilitate data comparability and (re-)combination, and therefore collaboration and data sharing.

Atlases have accelerated neuroscientific discovery, for species where there is an atlas!

## How are they made?

![](img/ANTs-Template-Construction.png){width=900 fig-align=center}

Some example slides (also look at the example RevealJS slides in the Quarto docs)
Good template images are an unbiased average of many (~10 to 1000s of) individuals. This requires many computationally intensive image registrations.
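
The iterative "register everyone to the current average, then re-average" idea can be illustrated with a deliberately tiny toy. This is only a sketch under strong assumptions: the "images" are 1D lists, the "registration" is a brute-force circular shift, and the helper names (`roll`, `register`, `build_template`) are made up for illustration. It is not how ANTs works internally, and it skips the shape-update step that keeps a real template unbiased.

```python
def roll(signal, shift):
    """Circularly shift a list to the right by `shift` samples."""
    shift %= len(signal)
    return signal[-shift:] + signal[:-shift] if shift else list(signal)


def register(moving, fixed):
    """Toy 'registration': the circular shift that best aligns moving to fixed."""
    def score(shift):
        return sum(m * f for m, f in zip(roll(moving, shift), fixed))
    return max(range(len(fixed)), key=score)


def build_template(signals, n_iterations=2):
    """Align every signal to the current template, then average, repeatedly."""
    template = list(signals[0])  # starting target: the first individual
    n_registrations = 0
    for _ in range(n_iterations):
        aligned = []
        for signal in signals:
            aligned.append(roll(signal, register(signal, template)))
            n_registrations += 1
        template = [sum(values) / len(values) for values in zip(*aligned)]
    return template, n_registrations


# Four hypothetical "individuals": the same bump at different positions.
bump = [0.0] * 16
bump[7], bump[8], bump[9] = 0.5, 1.0, 0.5
individuals = [roll(bump, s) for s in (-3, 0, 2, 5)]

template, n_registrations = build_template(individuals)
print(n_registrations)  # one registration per individual per iteration
```

Even in this toy, the number of registrations grows as individuals × iterations, which is why real template building is so expensive.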

* Non-executable and executable code-blocks
* bullet points with highlighting
* two-column slides
* how to include a slide from a separate MD file
* preview and link to a webpage
## A new template image {.smaller}

## Just a code block, nothing gets executed...
:::: {.columns}

... but there is some fancy highlighting
::: {.column width="30%"}

We made a template image of the Eurasian Blackcap from 18 hemispheres. At 25 µm resolution, this ran (sequentially) on the HPC for **~two weeks**.

::: {.column width="70%"}


## A more difficult challenge {.smaller}

:::: {.columns}

::: {.column width="30%"}
We have 45 mole rat hemispheres.

Averaging just 6 of them at low-res sequentially took >2 days.

We estimate that for 45 at high-res it would take **months**.

::: {.column width="70%"}

```{.python code-line-numbers="1|3|4-9"}
from pathlib import Path

home_path = Path.home()
if home_path.exists():
    data_path = home_path / "data"
    # raise some error maybe?
```
## Registration steps are independent

![](img/ANTs-Template-Construction.png){width=900 fig-align=center}

So maybe we can parallelise?
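
Because each registration only needs its own image and the current template, the calls within one iteration can in principle be dispatched concurrently. A minimal sketch of that pattern, assuming a placeholder function `register_to_template` (hypothetical, standing in for one real registration) and illustrative subject labels:

```python
from concurrent.futures import ThreadPoolExecutor


def register_to_template(subject):
    """Placeholder for one registration; the real work would be an ANTs call."""
    return f"{subject}: registered"


# Hypothetical subject labels, used only for illustration.
subjects = ["sub-b07_hemi-L", "sub-d07_hemi-R"]

# No task depends on another task's output, so they can be mapped over a pool.
with ThreadPoolExecutor(max_workers=2) as pool:
    results = list(pool.map(register_to_template, subjects))

print(results)
```

On a cluster the same idea applies at a coarser grain: independent commands become independent scheduler jobs rather than threads.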

## The template building tech stack

- `brainglobe-template-builder`
- preprocessing GUIs and high-level bash scripts/slurm jobs
- [**``** from optimisedANTs (CoBrALab)](
- which wraps ANTs (Advanced Normalisation Tools)
- which is built on top of ITK (Insight ToolKit)

## A look inside ``

Write files containing bash commands

``` {.bash}
echo --clobber \
... # lots of arguments...
```

then execute them with `qbatch`

``` {.bash}
qbatch ${_arg_block} \
... # more arguments...
```

## A code block that's actually executed at render-time
> qbatch is a tool for executing commands in parallel across a compute cluster.
#| echo: true
#| code-fold: true
`qbatch` is also developed by the CoBrA lab.
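
The "write a file of commands, then hand it to `qbatch`" pattern can be sketched as follows. This is an illustration only: the subject labels, file name and `echo` placeholders are assumptions, and the real scripts write full registration commands rather than `echo` lines.

```python
import tempfile
from pathlib import Path

# Hypothetical subjects; each line of the job list is one independent command.
subjects = ["sub-01", "sub-02", "sub-03"]
commands = [f"echo registering {subject}" for subject in subjects]

# Write one command per line, the layout a command-list file is expected to have.
job_dir = Path(tempfile.mkdtemp())
job_list = job_dir / "registration_jobs.txt"
job_list.write_text("\n".join(commands) + "\n")

print(job_list.read_text())
```

A file like this would then be passed to `qbatch` as its command-file argument, and `qbatch` decides how the lines are grouped into cluster jobs.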

from pathlib import Path
## Example registration file
``` {.bash}
... # export lots of variables
--clobber --no-fast --histogram-matching --skip-nonlinear --linear-type affine --no-mask-extract --moving-mask /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/derivatives/sub-b07_hemi-L/sub-b07_hemi-L_res-40um_orig-asr_N4_aligned_padded_use4template/sub-b07_hemi-L_res-40um_sym-mask.nii.gz --initial-transform /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/0/transforms/sub-b07_hemi-L_res-40um_sym-brain_0GenericAffine.mat --convergence 1e-9 -o /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/1/resample/sub-b07_hemi-L_res-40um_sym-brain.nii.gz /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/derivatives/sub-b07_hemi-L/sub-b07_hemi-L_res-40um_orig-asr_N4_aligned_padded_use4template/sub-b07_hemi-L_res-40um_sym-brain.nii.gz /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/0/average/template_sharpen_shapeupdate.nii.gz /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/1/transforms/sub-b07_hemi-L_res-40um_sym-brain_
--clobber --no-fast --histogram-matching --skip-nonlinear --linear-type affine --no-mask-extract --moving-mask /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/derivatives/sub-d07_hemi-R/sub-d07_hemi-R_res-40um_orig-asr_N4_aligned_padded_use4template/sub-d07_hemi-R_res-40um_sym-mask.nii.gz --initial-transform /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/0/transforms/sub-d07_hemi-R_res-40um_sym-brain_0GenericAffine.mat --convergence 1e-9 -o /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/1/resample/sub-d07_hemi-R_res-40um_sym-brain.nii.gz /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/derivatives/sub-d07_hemi-R/sub-d07_hemi-R_res-40um_orig-asr_N4_aligned_padded_use4template/sub-d07_hemi-R_res-40um_sym-brain.nii.gz /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/0/average/template_sharpen_shapeupdate.nii.gz /ceph/neuroinformatics/neuroinformatics/atlas-forge/MoleRat/templates/template_sym_res-40um_n-45/affine/1/transforms/sub-d07_hemi-R_res-40um_sym-brain_
... # 43 more calls to antsRegistration
```

print("Hello world")
## Example registration file (more simplified)
``` {.bash}
... # export lots of variables
...
...
... # 43 more calls to antsRegistration
```

## You can execute code without showing that you have by using #|echo: false
#| echo: false
## qbatch

- works with slurm 🎉 (and other job managers)
- confusing terminology and limited, but good, documentation 😕

from pathlib import Path
## qbatch

print("Hello world")
``` {.bash code-line-numbers="0-3|7|10-11"}
$ export QBATCH_PPJ=12 # requested processors per job
$ export QBATCH_CHUNKSIZE=$QBATCH_PPJ # commands to run per job
$ export QBATCH_CORES=$QBATCH_PPJ # commands to run in parallel per job
$ export QBATCH_NODES=1 # number of compute nodes to request for the job, typically for MPI jobs
$ export QBATCH_MEM="0" # requested memory per job
$ export QBATCH_MEMVARS="mem" # memory request variable to set
$ export QBATCH_SYSTEM="pbs" # queuing system to use ("pbs", "sge","slurm", or "local")
$ export QBATCH_NODES=1 # (PBS-only) nodes to request per job
$ export QBATCH_SGE_PE="smp" # (SGE-only) parallel environment name
$ export QBATCH_QUEUE="1day" # Name of submission queue
$ export QBATCH_OPTIONS="" # Arbitrary cluster options to embed in all jobs
$ export QBATCH_SCRIPT_FOLDER=".qbatch/" # Location to generate jobfiles for submission
$ export QBATCH_SHELL="/bin/sh" # Shell to use to evaluate jobfile
```

{{< include slides/extra_slide.qmd >}}
## Easy wins

## An example image
``` {.bash }
$ export QBATCH_SYSTEM="slurm"
$ export QBATCH_QUEUE="cpu"
```

Include an image:
## A naive attempt

![](img/bg_logo_wide.png){width=900 fig-align=center}
"Run 15 registrations at a time, on 15 processors, on the same node"
``` {.bash }
$ export QBATCH_PPJ=15
$ export QBATCH_CORES=15
```

Parallel `antsRegistration` commands compete with each other for processing resources on the same node, making this even slower than a sequential run. This behaviour is built into ITK.
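
A back-of-the-envelope way to see the contention: if every command tries to use all cores of the node, running several commands at once multiplies the number of runnable threads. The sketch below assumes a 15-core node and one thread per core per command; both numbers are assumptions for illustration, not measurements, and `oversubscription` is a hypothetical helper.

```python
def oversubscription(cores_on_node, concurrent_commands, threads_per_command):
    """Ratio of runnable threads to cores; values above 1 mean contention."""
    return concurrent_commands * threads_per_command / cores_on_node


cores = 15  # assumed node size, chosen to match QBATCH_PPJ=15 above

# Naive setup: 15 commands at once, each assumed to spawn one thread per core.
naive = oversubscription(cores, concurrent_commands=15, threads_per_command=cores)

# Sequential setup: one command at a time with the whole node to itself.
sequential = oversubscription(cores, concurrent_commands=1, threads_per_command=cores)

print(naive, sequential)
```

Under these assumptions the naive configuration asks for fifteen times more threads than there are cores, which is consistent with it being slower than the sequential run.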

## Optimisation 🎉

## Link and a preview a webpage:
``` {.bash code-line-numbers="1-3"}
$ export QBATCH_PPJ=12 # each antsRegistration call can use 12 processors
$ export QBATCH_CORES=1
```

Split the 45 registration commands into chunks of 3, run each chunk in a separate job (so use 15 nodes), and run the commands sequentially within each job.

::: {style="text-align: center; margin-top: 1em"}
[interoperable Python-based tools for computational neuroanatomy]({preview-link="true" style="text-align: center"}
Massive speed-up.
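
The chunking arithmetic can be checked with a few lines. This sketch uses placeholder command names and a hypothetical `chunk` helper; it only illustrates how 45 commands in chunks of 3 become 15 jobs, not how `qbatch` implements the split.

```python
def chunk(commands, chunk_size):
    """Split a list of commands into consecutive chunks of at most chunk_size."""
    return [commands[i:i + chunk_size] for i in range(0, len(commands), chunk_size)]


# Placeholder names standing in for the 45 registration commands.
commands = [f"registration_{i:02d}" for i in range(45)]

jobs = chunk(commands, chunk_size=3)

print(len(jobs))  # number of cluster jobs
print(jobs[0])    # commands that run one after another inside the first job
```

Each job then runs its 3 commands sequentially, so no two registrations compete for the same node's processors.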

## Use a variable several times
## More optimisation 🎉

Variables defined in the metadata are reusable anywhere
Exclude two of our CPU nodes that are a lot slower (and weirdly named starting with "gpu")

``` {.bash }
export QBATCH_OPTIONS="--exclude=gpu-380-24,gpu-380-25"
```

## Customise time-outs for larger jobs
Expand default wall times for short, medium and long jobs

``` {.bash code-line-numbers="8-10"}
bash --output-dir "${working_dir}" \
--starting-target first \
--stages rigid,similarity,affine,nlin \
--masks "${working_dir}/mask_paths.txt" \
--average-type "${average_type}" \
--average-prog "${average_prog}" \
--reuse-affines \
--walltime-short "01:30:00" \
--walltime-linear "02:15:00" \
--walltime-nonlinear "13:30:00" \
--no-dry-run \
```
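
When wall times need to grow with the data, it can help to scale an existing `HH:MM:SS` value rather than guess a new one. A minimal sketch, assuming a hypothetical helper `scale_walltime` and an arbitrary scaling factor; the inputs are the values from the slide above, and the factor is not a recommendation.

```python
def scale_walltime(walltime, factor):
    """Multiply an HH:MM:SS wall time by a factor, keeping the same format."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    total = round((hours * 3600 + minutes * 60 + seconds) * factor)
    hours, remainder = divmod(total, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


# Doubling the short wall time and adding 50% to the nonlinear one.
longer_short = scale_walltime("01:30:00", 2)
longer_nonlinear = scale_walltime("13:30:00", 1.5)

print(longer_short, longer_nonlinear)
```

The resulting strings can be dropped straight into the `--walltime-*` arguments.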

## Struggles ⚠️
- time to understand brief docs
- `antsRegistration` (from ANTs) is built to run across the processors of a node (empirically verified)
  - this can lead to massive slowdowns when threads from different `antsRegistration` processes compete
- doesn't seem maintained, last commit 4 years ago (but "works"?!)

## Tricks 🪄
* exclude some nodes
* increase slurm memory
* increase walltimes

## Conclusions {.smaller}

:::: {.columns}

::: {.column width="30%"}
* big step forward in template-making at NIU
* 45 mole rats for the price of 3 🤑
* maybe useful elsewhere
* atlas packaging? (maybe even from Python)

::: {.column width="70%"}

* {{< meta >}}
* {{< meta >}}
