Merged
Commits
31 commits
- `5844aa1` various-fixes (clemsgrs, Mar 18, 2026)
- `3db9312` drop benchmark bookkeeping from pr (clemsgrs, Mar 18, 2026)
- `4c5dff4` [codex] Optimize embedding loader pipeline (clemsgrs, Mar 19, 2026)
- `f559bf4` improve ux (clemsgrs, Mar 19, 2026)
- `f80c55e` separate preprocessing from embedding backend (clemsgrs, Mar 19, 2026)
- `01e8bdf` Read embedding tiles from tar archives (clemsgrs, Mar 20, 2026)
- `3480913` make slide2vec default aligned with new hs2p tile store format (clemsgrs, Mar 20, 2026)
- `0aec7b3` Merge branch 'codex/various-fixes' of https://github.com/clemsgrs/sli… (clemsgrs, Mar 20, 2026)
- `a3c1e67` add on-the-fly CuCIM tile reading with super tile support (clemsgrs, Mar 21, 2026)
- `84d16f5` fix review issues in on-the-fly tile reading (clemsgrs, Mar 21, 2026)
- `c2278fc` remove unused tile_indices and crop_offsets from _SuperTile (clemsgrs, Mar 21, 2026)
- `899ed9a` add turbojpeg + nvimgcode register to dockerfiles (clemsgrs, Mar 21, 2026)
- `85b9622` install libjpeg-turbo 3.x in all Dockerfiles (clemsgrs, Mar 21, 2026)
- `31fc1a6` add slide2vec[cucim] extra for on-the-fly tile reading deps (clemsgrs, Mar 21, 2026)
- `2154426` fix failing tests (clemsgrs, Mar 21, 2026)
- `a30bc8a` adapt to hs2p 2.4.1: use plan.tile_indices in supertile index build (clemsgrs, Mar 22, 2026)
- `e465997` expose jpeg_backend option and use pil in consistency test (clemsgrs, Mar 22, 2026)
- `b7dbcd4` decouple cucim and DataLoader worker counts in on-the-fly path (clemsgrs, Mar 22, 2026)
- `619bde8` separate preprocessing and DataLoader worker counts (clemsgrs, Mar 22, 2026)
- `e2ffbd3` various changes (clemsgrs, Mar 23, 2026)
- `e015a71` Default backend to auto (clemsgrs, Mar 23, 2026)
- `a72304d` Untrack local documentation log (clemsgrs, Mar 23, 2026)
- `e79622d` make cucim workers a speed config arg (clemsgrs, Mar 23, 2026)
- `bd26f64` add missing data file (clemsgrs, Mar 23, 2026)
- `91b62a6` fix output consistency test: align resize interpolation across tile r… (clemsgrs, Mar 23, 2026)
- `7317b17` add leftover test (clemsgrs, Mar 23, 2026)
- `9b3d44e` Enforce recommended model input settings (clemsgrs, Mar 23, 2026)
- `d62ceaa` trim model configs (clemsgrs, Mar 23, 2026)
- `6991a07` strip comments (clemsgrs, Mar 23, 2026)
- `0fa61b7` Align model defaults and precision handling (clemsgrs, Mar 23, 2026)
- `f228768` Allow CPU runs to bypass precision checks (clemsgrs, Mar 23, 2026)
2 changes: 1 addition & 1 deletion .dockerignore
Original file line number Diff line number Diff line change
@@ -1,3 +1,3 @@
data/
output/
docker/
outputs/
1 change: 1 addition & 0 deletions .gitignore
@@ -168,3 +168,4 @@ archive/
tasks/
docs/documentation.md
docs/20*-*.md
data/
27 changes: 23 additions & 4 deletions Dockerfile
@@ -1,11 +1,10 @@
ARG UBUNTU_VERSION=22.04
ARG CUDA_MAJOR_VERSION=11.8.0
ARG CUDNN_MAJOR_VERSION=8
ARG CUDA_MAJOR_VERSION=12.8.1

########################
# Stage 1: build stage #
########################
FROM nvidia/cuda:${CUDA_MAJOR_VERSION}-cudnn${CUDNN_MAJOR_VERSION}-devel-ubuntu${UBUNTU_VERSION} AS build
FROM nvidia/cuda:${CUDA_MAJOR_VERSION}-cudnn-devel-ubuntu${UBUNTU_VERSION} AS build

ARG USER_UID=1001
ARG USER_GID=1001
@@ -29,6 +28,7 @@ ENV PATH="/home/user/.local/bin:${PATH}"

RUN apt-get update && apt-get install -y --no-install-recommends \
libtiff-dev \
cmake \
zlib1g-dev \
curl \
vim screen \
@@ -40,6 +40,16 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

# libjpeg-turbo 3.x (required by PyTurboJPEG>=2)
ARG LIBJPEG_TURBO_VERSION=3.1.0
RUN curl -fsSL https://github.com/libjpeg-turbo/libjpeg-turbo/releases/download/${LIBJPEG_TURBO_VERSION}/libjpeg-turbo-${LIBJPEG_TURBO_VERSION}.tar.gz \
| tar xz -C /tmp \
&& cd /tmp/libjpeg-turbo-${LIBJPEG_TURBO_VERSION} \
&& cmake -G"Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/usr/local . \
&& make -j"$(nproc)" && make install \
&& ldconfig \
&& rm -rf /tmp/libjpeg-turbo-${LIBJPEG_TURBO_VERSION}

WORKDIR /opt/app/

# core deps live in requirements.in; model runtime extras live in requirements-models.in
@@ -70,7 +80,7 @@ RUN python -m pip install 'flash-attn>=2.7.1,<=2.8.0' --no-build-isolation
##########################
# Stage 2: runtime stage #
##########################
FROM nvidia/cuda:${CUDA_MAJOR_VERSION}-cudnn${CUDNN_MAJOR_VERSION}-runtime-ubuntu${UBUNTU_VERSION}
FROM nvidia/cuda:${CUDA_MAJOR_VERSION}-cudnn-runtime-ubuntu${UBUNTU_VERSION}

ARG USER_UID=1001
ARG USER_GID=1001
@@ -104,6 +114,11 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

# libjpeg-turbo 3.x (copied from build stage)
COPY --from=build /usr/local/lib/libjpeg* /usr/local/lib/
COPY --from=build /usr/local/lib/libturbojpeg* /usr/local/lib/
RUN ldconfig

# install ASAP
ARG ASAP_URL=https://github.com/computationalpathologygroup/ASAP/releases/download/ASAP-2.2-(Nightly)/ASAP-2.2-Ubuntu2204.deb
RUN apt-get update && curl -L ${ASAP_URL} -o /tmp/ASAP.deb && apt-get install --assume-yes /tmp/ASAP.deb && \
@@ -116,6 +131,10 @@ RUN apt-get update && curl -L ${ASAP_URL} -o /tmp/ASAP.deb && apt-get install --
COPY --from=build /usr/local/lib/python3.10/dist-packages /usr/local/lib/python3.10/dist-packages
COPY --from=build /usr/local/bin /usr/local/bin

# register libnvimgcodec so cucim can use GPU-accelerated JPEG decoding
RUN echo "/usr/local/lib/python3.10/dist-packages/nvidia/nvimgcodec" > /etc/ld.so.conf.d/nvimgcodec.conf && \
ldconfig

# copy app code
COPY --from=build /opt/app /opt/app

15 changes: 15 additions & 0 deletions Dockerfile.ci
@@ -21,6 +21,7 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
libtiff-dev \
zlib1g-dev \
curl \
cmake \
vim screen \
zip unzip \
git \
@@ -31,6 +32,16 @@ RUN apt-get update && apt-get install -y --no-install-recommends \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

# libjpeg-turbo 3.x (required by PyTurboJPEG>=2)
ARG LIBJPEG_TURBO_VERSION=3.1.0
RUN curl -fsSL https://github.com/libjpeg-turbo/libjpeg-turbo/releases/download/${LIBJPEG_TURBO_VERSION}/libjpeg-turbo-${LIBJPEG_TURBO_VERSION}.tar.gz \
| tar xz -C /tmp \
&& cd /tmp/libjpeg-turbo-${LIBJPEG_TURBO_VERSION} \
&& cmake -G"Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/usr/local . \
&& make -j"$(nproc)" && make install \
&& ldconfig \
&& rm -rf /tmp/libjpeg-turbo-${LIBJPEG_TURBO_VERSION}

# ASAP
ARG ASAP_URL=https://github.com/computationalpathologygroup/ASAP/releases/download/ASAP-2.2-(Nightly)/ASAP-2.2-Ubuntu2204.deb
RUN set -eux; \
@@ -65,5 +76,9 @@ COPY --chown=user:user LICENSE /opt/app/LICENSE

RUN python -m pip install /opt/app

# register libnvimgcodec so cucim can use GPU-accelerated JPEG decoding
RUN echo "/usr/local/lib/python3.10/dist-packages/nvidia/nvimgcodec" > /etc/ld.so.conf.d/nvimgcodec.conf && \
ldconfig

USER user
WORKDIR /opt/app
156 changes: 156 additions & 0 deletions Dockerfile.coding-agents
@@ -0,0 +1,156 @@
ARG UBUNTU_VERSION=22.04
ARG CUDA_MAJOR_VERSION=12.8.1

########################
# Stage 1: build stage #
########################
FROM nvidia/cuda:${CUDA_MAJOR_VERSION}-cudnn-devel-ubuntu${UBUNTU_VERSION} AS build

ARG USER_UID=1001
ARG USER_GID=1001

# ensures that Python output to stdout/stderr is not buffered: prevents missing information when terminating
ENV PYTHONUNBUFFERED=1
ENV DEBIAN_FRONTEND=noninteractive TZ=Europe/Amsterdam

USER root

RUN groupadd --gid ${USER_GID} user \
&& useradd -m --no-log-init --uid ${USER_UID} --gid ${USER_GID} user

# create input/output directory
RUN mkdir /input /output && \
chown user:user /input /output

# set /home/user as working directory
WORKDIR /home/user
ENV PATH="/home/user/.local/bin:${PATH}"

RUN apt-get update && apt-get install -y --no-install-recommends \
libtiff-dev \
cmake \
zlib1g-dev \
curl \
vim screen \
zip unzip \
git \
openssh-server \
python3-pip python3-dev python-is-python3 \
&& mkdir /var/run/sshd \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

# libjpeg-turbo 3.x (required by PyTurboJPEG>=2)
ARG LIBJPEG_TURBO_VERSION=3.1.0
RUN curl -fsSL https://github.com/libjpeg-turbo/libjpeg-turbo/releases/download/${LIBJPEG_TURBO_VERSION}/libjpeg-turbo-${LIBJPEG_TURBO_VERSION}.tar.gz \
| tar xz -C /tmp \
&& cd /tmp/libjpeg-turbo-${LIBJPEG_TURBO_VERSION} \
&& cmake -G"Unix Makefiles" -DCMAKE_INSTALL_PREFIX=/usr/local . \
&& make -j"$(nproc)" && make install \
&& ldconfig \
&& rm -rf /tmp/libjpeg-turbo-${LIBJPEG_TURBO_VERSION}

WORKDIR /opt/app/

# core deps live in requirements.in; model runtime extras live in requirements-models.in
RUN python -m pip install --upgrade pip setuptools pip-tools \
&& rm -rf /home/user/.cache/pip

# install slide2vec
COPY --chown=user:user requirements.in /opt/app/requirements.in
COPY --chown=user:user requirements-models.in /opt/app/requirements-models.in
RUN python -m pip install \
--no-cache-dir \
--no-color \
--requirement /opt/app/requirements-models.in \
&& rm -rf /home/user/.cache/pip

COPY --chown=user:user slide2vec /opt/app/slide2vec
COPY --chown=user:user setup.py /opt/app/setup.py
COPY --chown=user:user setup.cfg /opt/app/setup.cfg
COPY --chown=user:user pyproject.toml /opt/app/pyproject.toml
COPY --chown=user:user MANIFEST.in /opt/app/MANIFEST.in
COPY --chown=user:user README.md /opt/app/README.md
COPY --chown=user:user LICENSE /opt/app/LICENSE

RUN python -m pip install /opt/app
RUN python -m pip install 'flash-attn>=2.7.1,<=2.8.0' --no-build-isolation


##########################
# Stage 2: runtime stage #
##########################
FROM nvidia/cuda:${CUDA_MAJOR_VERSION}-cudnn-runtime-ubuntu${UBUNTU_VERSION}

ARG USER_UID=1001
ARG USER_GID=1001

ENV PYTHONUNBUFFERED=1
ENV DEBIAN_FRONTEND=noninteractive TZ=Europe/Amsterdam

USER root

RUN groupadd --gid ${USER_GID} user \
&& useradd -m --no-log-init --uid ${USER_UID} --gid ${USER_GID} user

# create input/output directory
RUN mkdir /input /output && \
chown user:user /input /output

# set /home/user as working directory
WORKDIR /home/user
ENV PATH="/home/user/.local/bin:${PATH}"

RUN apt-get update && apt-get install -y --no-install-recommends \
libtiff-dev \
zlib1g-dev \
curl \
vim screen \
zip unzip \
git \
openssh-server \
python3-pip python3-dev python-is-python3 \
&& mkdir /var/run/sshd \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/*

# libjpeg-turbo 3.x (copied from build stage)
COPY --from=build /usr/local/lib/libjpeg* /usr/local/lib/
COPY --from=build /usr/local/lib/libturbojpeg* /usr/local/lib/
RUN ldconfig

RUN curl -fsSL https://deb.nodesource.com/setup_lts.x | bash - \
&& apt-get install -y --no-install-recommends nodejs

# install ASAP
ARG ASAP_URL=https://github.com/computationalpathologygroup/ASAP/releases/download/ASAP-2.2-(Nightly)/ASAP-2.2-Ubuntu2204.deb
RUN apt-get update && curl -L ${ASAP_URL} -o /tmp/ASAP.deb && apt-get install --assume-yes /tmp/ASAP.deb && \
SITE_PACKAGES=`python3 -c "import sysconfig; print(sysconfig.get_paths()['purelib'])"` && \
printf "/opt/ASAP/bin/\n" > "${SITE_PACKAGES}/asap.pth" && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*

# install codex
RUN npm i -g @openai/codex

# install claude
RUN curl -fsSL https://claude.ai/install.sh | bash

# copy Python libs & entrypoints from build stage (includes flash-attn, your deps, ASAP .pth)
COPY --from=build /usr/local/lib/python3.10/dist-packages /usr/local/lib/python3.10/dist-packages
COPY --from=build /usr/local/bin /usr/local/bin

# register libnvimgcodec so cucim can use GPU-accelerated JPEG decoding
RUN echo "/usr/local/lib/python3.10/dist-packages/nvidia/nvimgcodec" > /etc/ld.so.conf.d/nvimgcodec.conf && \
ldconfig

# copy app code
COPY --from=build /opt/app /opt/app

# expose port for ssh and jupyter
EXPOSE 22 8888

WORKDIR /opt/app/

# switch to user
USER user
120 changes: 120 additions & 0 deletions docs/benchmarking.md
@@ -0,0 +1,120 @@
# Benchmarking

`slide2vec` includes a benchmark runner for end-to-end embedding throughput sweeps across different GPU environments and multiple model configs.

The script samples a balanced subset of your manifest, runs untimed warmups plus repeated measured trials, and tunes only two parameters:

- `model.batch_size`
- `speed.num_workers_embedding`

It keeps the rest of each model config fixed, disables previews, resume, and Weights & Biases logging, and writes:

- `trial_results.csv`
- `best_results.csv`
- `throughput_by_gpu.png`
- `throughput_by_gpu_and_size.png`
- `tuning_<gpu>_<model>.png`

Default sweep values:

- `--n-slides 0` to benchmark the full manifest by default
- `--batch-sizes 1 32 64 128 256`
- `--embedding-workers 4 8 16 32 64 128`
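Taken together, the defaults sweep the cross-product of the two tuned lists. A minimal sketch of that grid (the full cross-product pairing is an assumption about the runner, not documented behavior):

```python
# Enumerate the assumed tuning grid: every (batch_size, embedding_workers)
# pair from the default sweep lists above.
from itertools import product

batch_sizes = [1, 32, 64, 128, 256]
embedding_workers = [4, 8, 16, 32, 64, 128]

grid = list(product(batch_sizes, embedding_workers))
print(len(grid))  # 30 candidate configurations per model per GPU
```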

## Basic Usage

```shell
python scripts/benchmark_embedding_throughput.py \
--config-files /path/to/pathojepa-small.yaml /path/to/pathojepa-base.yaml /path/to/pathojepa-large.yaml \
--model-labels PathoJEPA-S PathoJEPA-B PathoJEPA-L \
--size-labels S B L \
--csv /path/to/slides.csv \
--gpu-label "A100-80GB" \
--batch-sizes 1 32 64 128 256 \
--embedding-workers 4 8 16 32 64 128 \
--repeat 3 \
--n-slides 0 \
--output-dir /tmp/slide2vec-benchmark
```

Notes:

- the benchmark measures the full `Pipeline.run(...)` path, including tiling
- stage timings for tiling, embedding, and aggregation are also recorded when progress events are available
- embedding trials also record per-batch timing summaries from `embedding.batch.timing` events, including mean loader wait, mean ready-wait after async copy/preprocess, mean preprocess time, mean forward time, and a loader-wait fraction
- every compared model reuses the same sampled manifest within a run
- each config gets an untimed warmup before measured repeats
- benchmark config files are loaded through the same default-merge and validation path as the regular CLI, so omitted standard keys inherit the usual defaults
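If you want to post-process `trial_results.csv` yourself, a small helper like the one below works; the column names used here (`gpu`, `model`, `tiles_per_second`) are illustrative guesses, not the script's documented schema:

```python
import csv
from typing import Dict, Iterable, Tuple

def best_trials(rows: Iterable[dict]) -> Dict[Tuple[str, str], dict]:
    """Keep the highest-throughput row per (gpu, model) pair."""
    best: Dict[Tuple[str, str], dict] = {}
    for row in rows:
        key = (row["gpu"], row["model"])
        if key not in best or float(row["tiles_per_second"]) > float(best[key]["tiles_per_second"]):
            best[key] = row
    return best

# With the real file you would pass csv.DictReader(open("trial_results.csv"));
# here a tiny inline sample stands in for it.
rows = [
    {"gpu": "A100", "model": "S", "batch_size": "64", "tiles_per_second": "900"},
    {"gpu": "A100", "model": "S", "batch_size": "128", "tiles_per_second": "1100"},
]
print(best_trials(rows)[("A100", "S")]["batch_size"])  # 128
```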

Single-model usage is still supported:

```shell
python scripts/benchmark_embedding_throughput.py \
--config-file /path/to/model-config.yaml \
--csv /path/to/slides.csv \
--gpu-label "A100-80GB"
```

In multi-model mode:

- `--config-files` is the primary interface
- `--model-labels` must match the config count
- `--size-labels` must match the config count
- size labels are explicit metadata like `S`, `B`, `L`, `XL`; the script does not infer them

## Merging GPU Runs

Run the benchmark once per GPU environment, then regenerate the cross-GPU comparison chart from multiple `trial_results.csv` files:

```shell
python scripts/benchmark_embedding_throughput.py \
--chart-only \
/tmp/a100-benchmark/trial_results.csv \
/tmp/h100-benchmark/trial_results.csv \
--output-dir /tmp/slide2vec-benchmark-merged
```

The merged outputs include:

- `throughput_by_gpu.png` for best tuned model entries per GPU
- `throughput_by_gpu_and_size.png` for grouped GPU-vs-size bars, choosing the winning model for each `(gpu, size)` bucket

Use `--copy-locally` when your slide source lives on network storage and you want to reduce I/O variance during the sweep.

## End-to-End Path Comparison

For a direct full-pipeline comparison between:

- tar-based embedding (`on_the_fly=false`)
- on-the-fly `wsd_single` embedding (`backend=asap`, `use_supertiles=false`)
- on-the-fly `cucim_supertiles` embedding

use:

```shell
python scripts/benchmark_end_to_end_paths.py \
--csv /path/to/slides.csv \
--config-file /path/to/model-config.yaml \
--batch-size 256 \
--repeat 1 \
--output-dir /tmp/slide2vec-end-to-end
```

The model is taken from `--config-file`; the script does not accept a separate `--model` override.

This benchmark runs the three paths independently from raw slide input to final embedding artifact and writes:

- `trial_results.csv`
- `summary.csv`
- `end_to_end_by_path.png`
- `stage_breakdown.png`
- `embedding_subpath_breakdown.png`

The summary also includes an embedding subpath split derived from per-batch timing events:

- `mean_data_pipeline_seconds`: timed embedding seconds spent in loader wait, ready
wait, and preprocessing
- `mean_forward_seconds`: timed embedding seconds spent in model forward
- `mean_data_pipeline_fraction` / `mean_forward_fraction`: shares of the timed
embedding batches accounted for by those two buckets
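
The fraction columns follow directly from the two bucket means; a sketch of the assumed arithmetic (function and column names are hypothetical):

```python
def subpath_fractions(mean_data_pipeline_seconds: float,
                      mean_forward_seconds: float) -> tuple:
    """Split timed embedding batch time into data-pipeline vs forward shares,
    assuming the two buckets together account for the timed total."""
    total = mean_data_pipeline_seconds + mean_forward_seconds
    return (mean_data_pipeline_seconds / total, mean_forward_seconds / total)

print(tuple(round(x, 2) for x in subpath_fractions(0.03, 0.07)))  # (0.3, 0.7)
```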