Use the CLI when you want config-driven, manifest-based batch processing with artifacts written to disk.
If you are deciding between the Python API and the CLI:
- use the Python API for interactive in-memory work
- use the CLI for repeatable manifest-driven batch runs that save artifacts to disk
The Python API is usually the better fit for:
- interactive analysis in notebooks
- embedding one or a few slides directly in memory
- downstream workflows that immediately consume arrays or tensors
The CLI is usually the better fit for:
- batch processing many slides from a manifest
- reproducible config-file-driven runs
- generating on-disk embedding artifacts for later use
- running tiling-only or full preprocessing + embedding jobs from the terminal
python -m slide2vec --config-file /path/to/config.yaml
This command:
- loads the config file
- builds a Model, a PreprocessingConfig, and a Pipeline
- runs Pipeline.run(manifest_path=cfg.csv), where cfg.csv is the manifest path set in the config
The manifest must use the hs2p schema. mask_path and spacing_at_level_0 are optional.
sample_id,image_path,mask_path,spacing_at_level_0
slide-1,/path/to/slide-1.svs,/path/to/mask-1.png,0.25
slide-2,/path/to/slide-2.svs,,
Use spacing_at_level_0 when you need to override the slide's native level-0 spacing metadata for tiling. Leave it empty, as in the slide-2 row, to use the spacing stored in the slide.
Set the csv: key in your config file to the path of this manifest.
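If you generate manifests programmatically, the standard csv module is enough. The sketch below is a hypothetical helper (write_manifest is not part of slide2vec); only the four column names come from the hs2p schema above. Optional fields that are missing or None are written as empty cells, matching the slide-2 row.

```python
import csv
from pathlib import Path

# Column order of the hs2p manifest shown above.
MANIFEST_COLUMNS = ["sample_id", "image_path", "mask_path", "spacing_at_level_0"]


def write_manifest(rows, manifest_path):
    """Write a manifest CSV (hypothetical helper, not a slide2vec API).

    Each row is a dict with at least sample_id and image_path. The optional
    mask_path and spacing_at_level_0 keys become empty cells when absent or None.
    """
    rows = list(rows)
    # Validate everything first so a bad row never leaves a half-written file.
    for row in rows:
        if not row.get("sample_id") or not row.get("image_path"):
            raise ValueError(f"sample_id and image_path are required: {row!r}")

    manifest_path = Path(manifest_path)
    with manifest_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=MANIFEST_COLUMNS)
        writer.writeheader()
        for row in rows:
            writer.writerow(
                {col: "" if row.get(col) is None else row[col] for col in MANIFEST_COLUMNS}
            )
    return manifest_path
```

For example, passing the two slides from the sample manifest, with slide-2 given only sample_id and image_path, reproduces the rows shown above.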
The main bundled defaults live under:
- slide2vec/configs/preprocessing/default.yaml
- slide2vec/configs/models/default.yaml
- slide2vec/configs/models/*.yaml
In practice, the config controls:
- which model preset to use
- preprocessing/tiling parameters
- output directory
- batch size, workers, mixed precision, and GPU count
- whether to save tile artifacts alongside slide-level outputs
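To see how these settings nest in a config file, the sketch below renders a small dict as block-style YAML using only the standard library. Only csv, output_dir, model.name, model.level, speed.num_gpus, and speed.num_workers_embedding are keys that appear on this page; the values are placeholders, and the bundled default.yaml files remain the reference for everything else (batch size, mixed precision, tiling parameters, tile-artifact saving). to_yaml is a hypothetical helper; if PyYAML is installed, yaml.safe_dump is the usual choice.

```python
def to_yaml(config, indent=0):
    """Render a nested dict of plain scalars as block-style YAML.

    Minimal sketch: no quoting or escaping, so values are assumed to be
    simple strings and numbers (paths, names, counts).
    """
    lines = []
    pad = "  " * indent
    for key, value in config.items():
        if isinstance(value, dict):
            # A nested section: write the key, then its children one level deeper.
            lines.append(f"{pad}{key}:")
            lines.append(to_yaml(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {value}")
    return "\n".join(lines)


# Placeholder values; only the key names are taken from this page.
run_config = {
    "csv": "/path/to/manifest.csv",
    "output_dir": "/path/to/output",
    "model": {"name": "virchow2", "level": "tile"},
    "speed": {"num_gpus": 2, "num_workers_embedding": 8},
}
```

Writing to_yaml(run_config) to a file gives a starting point that you can pass with --config-file; any keys it omits fall back to whatever the defaults provide.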
You can override config values from the command line with path.key=value syntax:
python -m slide2vec \
--config-file /path/to/config.yaml \
output_dir=/tmp/slide2vec-run \
speed.num_gpus=4 \
model.name=virchow2 \
model.level=region
Common overrides:
- output_dir=/path/to/output
- speed.num_gpus=4
- speed.num_workers_embedding=8
- model.name=...
- model.level=tile|region|slide
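The path.key=value syntax addresses nested config sections by their dotted path. The sketch below shows that idea on a plain dict. It is an illustration only: apply_overrides is hypothetical, it keeps every value as a string (so speed.num_gpus=4 yields "4"), and slide2vec's own parser may convert types or validate keys differently.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of config with each "path.key=value" override applied.

    Hypothetical illustration: values stay strings, and missing intermediate
    sections are created on the fly.
    """
    result = copy.deepcopy(config)
    for item in overrides:
        path, sep, value = item.partition("=")
        if not sep or not path:
            raise ValueError(f"expected path.key=value, got {item!r}")
        *parents, leaf = path.split(".")
        node = result
        for key in parents:
            # Walk (or create) each section along the dotted path.
            node = node.setdefault(key, {})
            if not isinstance(node, dict):
                raise ValueError(f"{key!r} in {path!r} is not a section")
        node[leaf] = value
    return result
```

Applied to the example command above, the four overrides replace output_dir at the top level and name, level, and num_gpus inside their sections, while leaving the original dict untouched.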
Command-line flags:
- --run-on-cpu: forces CPU inference and disables mixed precision.
- --tiling-only: runs preprocessing/tiling without feature extraction.
- --output-dir /path/to/output: overrides output_dir from the config file.
- --skip-datetime: skips the timestamp-based run subdirectory suffix.
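When launching runs from a Python script or a scheduler, building the command as an argument list avoids shell-quoting problems. The sketch below uses only the flags and override syntax documented on this page; build_command and run are hypothetical helpers. As a precaution, it places the flags before the trailing key=value overrides, since this page does not show an example that mixes the two.

```python
import subprocess
import sys


def build_command(config_file, overrides=None, *, run_on_cpu=False, tiling_only=False,
                  output_dir=None, skip_datetime=False, python=sys.executable):
    """Assemble a slide2vec CLI invocation as an argv list (hypothetical helper)."""
    cmd = [python, "-m", "slide2vec", "--config-file", str(config_file)]
    if run_on_cpu:
        cmd.append("--run-on-cpu")
    if tiling_only:
        cmd.append("--tiling-only")
    if output_dir is not None:
        cmd += ["--output-dir", str(output_dir)]
    if skip_datetime:
        cmd.append("--skip-datetime")
    # Dotted overrides such as speed.num_gpus=2 go last, one argument each.
    cmd += [f"{key}={value}" for key, value in (overrides or {}).items()]
    return cmd


def run(cmd):
    """Run the command; raises subprocess.CalledProcessError on a non-zero exit."""
    return subprocess.run(cmd, check=True)
```

For example, run(build_command("/path/to/config.yaml", {"speed.num_gpus": 2})) corresponds to the limited-GPU batch run.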
By default, the CLI uses all available GPUs.
To cap GPU usage, set:
python -m slide2vec --config-file /path/to/config.yaml speed.num_gpus=4
If you pass --run-on-cpu, the CLI uses CPU execution instead.
The CLI writes the following artifacts under the run output directory:
- tile_embeddings/<sample_id>.pt or .npz
- tile_embeddings/<sample_id>.meta.json
- slide_embeddings/<sample_id>.pt or .npz
- slide_embeddings/<sample_id>.meta.json
- optional slide_latents/<sample_id>.pt or .npz
- process_list.csv
- the resolved config file saved for the run
- logs/, containing the main log plus stdout/stderr captures from distributed workers when multi-GPU workers are used
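To inspect .npz artifacts after a run, a sketch like the one below lists the arrays in each file and loads the matching .meta.json sidecar. It assumes NumPy is installed and that run_dir is the directory containing tile_embeddings/ and slide_embeddings/ for a finished run. The array names inside the .npz files and the metadata fields are not documented here, so the code reports whatever it finds instead of assuming particular keys. summarize_artifacts is a hypothetical helper; .pt artifacts would be loaded with torch.load instead.

```python
import json
from pathlib import Path

import numpy as np


def summarize_artifacts(run_dir, kind="slide_embeddings"):
    """Map each sample_id to its .npz array shapes and its metadata (or None)."""
    summary = {}
    for npz_path in sorted(Path(run_dir, kind).glob("*.npz")):
        sample_id = npz_path.name[: -len(".npz")]
        # NpzFile is a context manager; closing it releases the file handle.
        with np.load(npz_path) as data:
            shapes = {name: data[name].shape for name in data.files}
        meta_path = npz_path.with_name(f"{sample_id}.meta.json")
        meta = json.loads(meta_path.read_text()) if meta_path.exists() else None
        summary[sample_id] = {"arrays": shapes, "meta": meta}
    return summary
```

Calling summarize_artifacts(run_dir, "tile_embeddings") does the same for tile-level outputs.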
When stdout is an interactive terminal, the CLI shows live rich progress for:
- tiling discovery and completion
- overall slide embedding progress
- current-slide tile or region progress
- slide-level aggregation when the model pools tile features into slide embeddings
When stdout is not interactive, the CLI falls back to plain text stage updates and summaries.
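Whether a stream counts as interactive usually comes down to whether it is attached to a terminal. The standard-library sketch below shows that check; it is not slide2vec's code, and rich applies its own, more detailed terminal detection. progress_mode is a hypothetical helper.

```python
import sys


def progress_mode(stream=None):
    """Return "rich" if the stream is an interactive terminal, otherwise "plain"."""
    stream = sys.stdout if stream is None else stream
    isatty = getattr(stream, "isatty", None)
    return "rich" if callable(isatty) and isatty() else "plain"
```

Redirecting output, for example python -m slide2vec --config-file /path/to/config.yaml > run.log, makes isatty() return False, which is why output captured this way (or under most schedulers) contains the plain text stage updates rather than live progress bars.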
Full batch run:
python -m slide2vec --config-file /path/to/config.yaml
Full batch run with limited GPU count:
python -m slide2vec --config-file /path/to/config.yaml speed.num_gpus=2
Tiling only:
python -m slide2vec --config-file /path/to/config.yaml --tiling-only
CPU run:
python -m slide2vec --config-file /path/to/config.yaml --run-on-cpu