Merged
2 changes: 1 addition & 1 deletion .github/workflows/pr-test.yaml
@@ -1,4 +1,4 @@
name: Test WSI to embedding consistency
name: Test suite

on:
pull_request:
2 changes: 2 additions & 0 deletions Dockerfile
@@ -30,6 +30,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
libtiff-dev \
cmake \
zlib1g-dev \
libnuma1 \
curl \
vim screen \
zip unzip \
@@ -104,6 +105,7 @@ ENV PATH="/home/user/.local/bin:${PATH}"
RUN apt-get update && apt-get install -y --no-install-recommends \
libtiff-dev \
zlib1g-dev \
libnuma1 \
curl \
vim screen \
zip unzip \
1 change: 1 addition & 0 deletions Dockerfile.ci
@@ -20,6 +20,7 @@ WORKDIR /opt/app
RUN apt-get update && apt-get install -y --no-install-recommends \
libtiff-dev \
zlib1g-dev \
libnuma1 \
curl \
cmake \
vim screen \
156 changes: 0 additions & 156 deletions Dockerfile.coding-agents

This file was deleted.

32 changes: 16 additions & 16 deletions README.md
@@ -21,38 +21,37 @@ pip install "slide2vec[models]"
## Python API

```python
from slide2vec import Model, PreprocessingConfig
from slide2vec import Model
from slide2vec.utils.config import hf_login

model = Model.from_pretrained("virchow2", level="tile")
preprocessing = PreprocessingConfig(
    target_spacing_um=0.5,
    target_tile_size_px=224,
    tissue_threshold=0.1,
)
embedded = model.embed_slide(
    "/path/to/slide.svs",
    preprocessing=preprocessing,
)
hf_login()

model = Model.from_preset("virchow2")
embedded = model.embed_slide("/path/to/slide.svs")

tile_embeddings = embedded.tile_embeddings
coordinates = embedded.coordinates
```

By default, `ExecutionOptions()` uses all available GPUs. Set `ExecutionOptions(num_gpus=4)` when you want to cap the sharding explicitly.

Use `Pipeline(...)` for manifest-driven batch processing when you want artifacts written to disk instead of only in-memory outputs:

```python
from slide2vec import ExecutionOptions, Pipeline
from slide2vec import ExecutionOptions, Pipeline, PreprocessingConfig

pipeline = Pipeline(
    model=model,
    preprocessing=preprocessing,
    preprocessing=PreprocessingConfig(
        target_spacing_um=0.5,
        target_tile_size_px=224,
        tissue_threshold=0.1,
    ),
    execution=ExecutionOptions(output_dir="outputs/demo"),
)
result = pipeline.run(manifest_path="/path/to/slides.csv")
```

By default, `ExecutionOptions()` uses all available GPUs. Set `ExecutionOptions(num_gpus=4)` when you want to cap the sharding explicitly.

### Input Manifest

Manifest-driven runs use the schema below. `mask_path` and `spacing_at_level_0` are optional.
@@ -81,7 +80,7 @@ The package writes explicit artifact directories:

### Supported Models

`slide2vec` currently ships preset configs for 10 tile-level models and 3 slide-level models.
`slide2vec` currently ships preset configs for 20 tile-level models and 3 slide-level models.
For the full catalog and preset names, see [`docs/models.md`](docs/models.md).

## CLI
@@ -115,4 +114,5 @@ docker run --rm -it \

- [`docs/cli.md`](docs/cli.md) for the config-driven CLI guide
- [`docs/python-api.md`](docs/python-api.md) for the detailed API reference
- [`tutorials/api_walkthrough.ipynb`](tutorials/api_walkthrough.ipynb) for a notebook walkthrough of the API
- [`docs/models.md`](docs/models.md) for the full supported-model catalog
31 changes: 31 additions & 0 deletions docs/cli.md
@@ -96,6 +96,37 @@ Common overrides:

## GPU Behavior

### GPU-accelerated tile decoding (`gpu_decode`)

When using the on-the-fly cucim backend (`tiling.on_the_fly: true`, `tiling.backend: cucim` or `auto`), slide2vec can decode tiles on the GPU during embedding.

It is enabled by default. To set it explicitly in your config:

```yaml
tiling:
  gpu_decode: true  # default
```

Or override from the command line:

```shell
python -m slide2vec --config-file /path/to/config.yaml tiling.gpu_decode=true
```

When enabled, two things happen:
1. `ENABLE_CUSLIDE2=1` is set in the process environment before CuCIM is imported, activating NVIDIA's cuSlide2 GPU-accelerated SVS/TIFF reader.
2. `device="cuda"` is passed to cucim's `read_region`, so batch JPEG decoding runs on the GPU via nvImageCodec.
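
The two steps above can be sketched roughly as follows. This is an illustrative sketch, not slide2vec's actual implementation: the helper names `enable_cuslide2` and `read_tile` are hypothetical, and treating a rejected `device` keyword (or a runtime error) as the signal to fall back to CPU decoding is an assumption. `reader` stands for any object exposing a cucim-style `read_region`.

```python
import os


def enable_cuslide2(env=None):
    """Set the cuSlide2 flag. Call this BEFORE cucim is imported."""
    if env is None:
        env = os.environ
    env["ENABLE_CUSLIDE2"] = "1"


def read_tile(reader, location, size, level=0, gpu_decode=True):
    """Read one region, preferring GPU decoding when requested.

    `reader` is assumed to expose a cucim-style `read_region`.
    """
    if gpu_decode:
        try:
            # Ask the reader to decode on the GPU.
            return reader.read_region(
                location=location, size=size, level=level, device="cuda"
            )
        except (TypeError, RuntimeError):
            # Assumption: an unsupported `device` keyword or a runtime
            # failure means we quietly use the CPU path instead.
            pass
    return reader.read_region(location=location, size=size, level=level)
```

In a real process, `enable_cuslide2()` would have to run before the first `import cucim`, which is why it is kept separate from the read path.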

This can give a significant speedup (~3.8× for batch decoding) on `.svs` and `.tif` files.

**Note:** decoded pixels are currently copied back to host memory via `np.asarray` before being fed into the DataLoader. Decoding itself is faster on the GPU than on the CPU, but the data still round-trips through host memory before reaching the model. A true zero-copy path would require bypassing the DataLoader entirely and is tracked in `ideas-to-explore.md`.

**Requirements:** `libnuma1` must be installed and `nvImageCodec` must be available (included with `cucim-cu12`). If the installed CuCIM version does not support `device="cuda"`, slide2vec falls back silently to CPU decoding.
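
Because the fallback is silent, it can help to check the prerequisites up front. The sketch below uses only the Python standard library; the helper name `gpu_decode_prerequisites` is hypothetical and not part of slide2vec. It only checks that the `libnuma` shared library and the `cucim` package can be located, not that nvImageCodec or `device="cuda"` actually work.

```python
import ctypes.util
import importlib.util


def gpu_decode_prerequisites():
    """Report whether libnuma and the cucim package can be located."""
    return {
        # find_library returns None when the shared library is not found.
        "libnuma": ctypes.util.find_library("numa") is not None,
        # find_spec returns None when the top-level package is not installed.
        "cucim": importlib.util.find_spec("cucim") is not None,
    }


if __name__ == "__main__":
    for name, found in gpu_decode_prerequisites().items():
        print(f"{name}: {'found' if found else 'missing'}")
```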

**Default:** `true`. Disable with `tiling.gpu_decode: false` if needed.

### GPU count

By default, the CLI uses all available GPUs.

To cap GPU usage, set: