Merged
41 changes: 34 additions & 7 deletions docs/models.md
@@ -8,7 +8,7 @@ The canonical model presets are registered in code and documented below. Use the

Preset-specific behavior lives in registry metadata and, where supported, `model.output_variant`.

## Tile-level models (17)
## Tile-level models (18)

| Preset | Model | Supported Spacing (um) | Notes |
| --- | --- | --- | --- |
@@ -20,6 +20,7 @@ Preset-specific behavior lives in registry metadata and, where supported, `model
| `h0-mini` | [H0-mini](https://huggingface.co/bioptimus/H0-mini) | `0.5` | Supports `output_variant="cls"` or `"cls_patch_mean"` |
| `hibou-b` | [Hibou-B](https://huggingface.co/histai/hibou-b) | `0.5` | |
| `hibou-l` | [Hibou-L](https://huggingface.co/histai/hibou-L) | `0.5` | |
| `lunit` | [Lunit ViT-S/8](https://huggingface.co/1aurent/vit_small_patch8_224.lunit_dino) | `0.5` | 384-dim; used as tile backbone for MOOZY |
| `midnight` | [MidNight12k](https://huggingface.co/kaiko-ai/midnight) | `0.25`, `0.5`, `1.0`, `2.0` | Alias: `kaiko-midnight` |
| `musk` | [MUSK](https://huggingface.co/xiangjx/musk) | `0.25`, `0.5`, `1.0` | Supports `output_variant="ms_aug"` (2048-dim, default) or `"cls"` (1024-dim). |
| `phikon` | [Phikon](https://huggingface.co/owkin/phikon) | `0.5` | |
@@ -30,10 +31,36 @@ Preset-specific behavior lives in registry metadata and, where supported, `model
| `virchow` | [Virchow](https://huggingface.co/paige-ai/Virchow) | `0.5` | Supports `output_variant="cls"` or `"cls_patch_mean"` |
| `virchow2` | [Virchow2](https://huggingface.co/paige-ai/Virchow2) | `0.5`, `1.0`, `2.0` | Supports `output_variant="cls"` or `"cls_patch_mean"` |

## Slide-level models (3)
## Slide-level models (4)

| Preset | Model | Tile Encoder | Supported Spacing (um) |
| --- | --- | --- | --- |
| `gigapath-slide` | [Prov-GigaPath](https://huggingface.co/prov-gigapath/prov-gigapath) | `gigapath` | `0.5` |
| `prism` | [PRISM](https://huggingface.co/paige-ai/PRISM) | `virchow` (cls_patch_mean) | `0.5` |
| `titan` | [TITAN](https://huggingface.co/MahmoodLab/TITAN) | `conchv15` | `0.5` |
| Preset | Model | Tile Encoder | Supported Spacing (um) | Notes |
| --- | --- | --- | --- | --- |
| `gigapath-slide` | [Prov-GigaPath](https://huggingface.co/prov-gigapath/prov-gigapath) | `gigapath` | `0.5` | |
| `moozy-slide` | [MOOZY](https://huggingface.co/AtlasAnalyticsLab/MOOZY) | `lunit` | `0.5` | 768-dim slide embedding; standalone slide encoder from the MOOZY stage-2 checkpoint |
| `prism` | [PRISM](https://huggingface.co/paige-ai/PRISM) | `virchow` (cls_patch_mean) | `0.5` | |
| `titan` | [TITAN](https://huggingface.co/MahmoodLab/TITAN) | `conchv15` | `0.5` | |

## Patient-level models (1)

Patient-level models aggregate multiple slide embeddings for the same patient into a single patient-level embedding. They require a `patient_id` column in the input manifest CSV (or `patient_id` keys in each slide dict when using the Python API).

| Preset | Model | Tile Encoder | Supported Spacing (um) | Notes |
| --- | --- | --- | --- | --- |
| `moozy` | [MOOZY](https://huggingface.co/AtlasAnalyticsLab/MOOZY) | `lunit` | `0.5` | 768-dim patient embedding; runs Lunit tile encoder → MOOZY slide encoder → CaseAggregator transformer |

### Patient manifest format

Add a `patient_id` column to the standard manifest CSV to group slides by patient:

```csv
sample_id,image_path,patient_id
slide_1a,/data/slide_1a.svs,patient_1
slide_1b,/data/slide_1b.svs,patient_1
slide_2a,/data/slide_2a.svs,patient_2
```

`sample_id` remains the unique slide identifier. Multiple rows may share the same `patient_id`.
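As a toy sketch of the grouping this manifest implies (a hypothetical helper using only the standard library, not part of slide2vec), slides sharing a `patient_id` collect into one group, ordered by first appearance:

```python
import csv
import io

manifest = """sample_id,image_path,patient_id
slide_1a,/data/slide_1a.svs,patient_1
slide_1b,/data/slide_1b.svs,patient_1
slide_2a,/data/slide_2a.svs,patient_2
"""

def group_by_patient(csv_text: str) -> dict[str, list[str]]:
    groups: dict[str, list[str]] = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        # setdefault preserves first-appearance order (dicts are insertion-ordered)
        groups.setdefault(row["patient_id"], []).append(row["sample_id"])
    return groups

print(group_by_patient(manifest))
# {'patient_1': ['slide_1a', 'slide_1b'], 'patient_2': ['slide_2a']}
```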

### Per-slide embeddings

When running a patient-level model via `Pipeline`, the intermediate per-slide MOOZY embeddings can be saved alongside the patient embeddings by setting `save_slide_embeddings: true` in config (or `ExecutionOptions(save_slide_embeddings=True)` in the Python API). Saved slide embeddings are written to `slide_embeddings/` in the output directory.
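As a sketch, a config fragment enabling this could look like the following (the `save_slide_embeddings` key follows `slide2vec/configs/default.yaml` in this PR; the `name` key for selecting the preset is an assumption — check the actual config schema):

```yaml
model:
  name: moozy                  # assumed preset-selection key
  save_slide_embeddings: true  # per-slide MOOZY embeddings written to slide_embeddings/
```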
53 changes: 51 additions & 2 deletions docs/python-api.md
@@ -2,7 +2,7 @@

`slide2vec` exposes two main workflows:

- direct in-memory embedding with `Model.embed_slide(...)` and `Model.embed_slides(...)`
- direct in-memory embedding with `Model.embed_slide(...)`, `Model.embed_slides(...)`, `Model.embed_patient(...)`, and `Model.embed_patients(...)`
- artifact generation with `Pipeline.run(...)`

## Minimal interactive usage
@@ -108,12 +108,60 @@ Common fields:
- `output_dir`
- `output_format` - `"pt"` (default) or `"npz"`
- `save_tile_embeddings` - persist tile embeddings for slide-level models (default `False`)
- `save_slide_embeddings` - persist per-slide embeddings when running a patient-level model (default `False`)
- `save_latents` - persist latent representations when available (default `False`)

`num_gpus` defaults to all available GPUs. `embed_slide(...)` uses tile sharding for one slide, and `embed_slides(...)` balances whole slides across GPUs while preserving input order.
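As an illustrative sketch of the scheduling just described (not slide2vec's actual implementation), balancing whole slides across GPUs while preserving input order might look like:

```python
# Illustrative: round-robin whole slides across GPUs, then reassemble
# results by their original input index.
def assign_round_robin(slides: list[str], num_gpus: int) -> dict[int, list[tuple[int, str]]]:
    shards: dict[int, list[tuple[int, str]]] = {g: [] for g in range(num_gpus)}
    for idx, slide in enumerate(slides):
        shards[idx % num_gpus].append((idx, slide))
    return shards

def run_and_reorder(slides: list[str], num_gpus: int) -> list[str]:
    shards = assign_round_robin(slides, num_gpus)
    results: list[str | None] = [None] * len(slides)
    for _gpu, work in shards.items():
        for idx, slide in work:
            results[idx] = f"embedding({slide})"  # placeholder for real inference
    return results  # same order as the input, regardless of shard layout

print(run_and_reorder(["a.svs", "b.svs", "c.svs"], num_gpus=2))
# ['embedding(a.svs)', 'embedding(b.svs)', 'embedding(c.svs)']
```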

If you need persisted artifact generation without using `Pipeline.run(...)`, use `Model.embed_tiles(...)` and `Model.aggregate_tiles(...)`.

## Patient-level embedding

For patient-level models (e.g. `moozy`), use `Model.embed_patient(...)` for a single patient or `Model.embed_patients(...)` for a batch of patients.

### Single patient

```python
from slide2vec import Model

model = Model.from_preset("moozy")
result = model.embed_patient(
    ["/data/slide_1a.svs", "/data/slide_1b.svs"],
    patient_id="patient_1",
)

print(result.patient_id) # "patient_1"
print(result.patient_embedding.shape) # torch.Size([768])
print(result.slide_embeddings) # {"slide_1a": tensor, "slide_1b": tensor}
```

`embed_patient(...)` returns a single `EmbeddedPatient`. The `patient_id` argument is optional — when omitted, it is read from `patient_id` keys in the slide dicts, or falls back to `sample_id`.

### Multiple patients

```python
results = model.embed_patients(
    [
        {"sample_id": "slide_1a", "image_path": "/data/slide_1a.svs", "patient_id": "patient_1"},
        {"sample_id": "slide_1b", "image_path": "/data/slide_1b.svs", "patient_id": "patient_1"},
        {"sample_id": "slide_2a", "image_path": "/data/slide_2a.svs", "patient_id": "patient_2"},
    ]
)

for r in results:
    print(r.patient_id, r.patient_embedding.shape)
```

`embed_patients(...)` returns one `EmbeddedPatient` per unique patient, ordered by first appearance. Pass an explicit `patient_id_map` dict (`{sample_id: patient_id}`) to override the per-slide `patient_id` keys.

Each `EmbeddedPatient` has:

- `patient_id`
- `patient_embedding` — tensor of shape `(D,)` (768 for MOOZY)
- `slide_embeddings` — `{sample_id: tensor}` for each contributing slide

Both methods raise a `ValueError` if called on a non-patient-level model.

## Hierarchical Feature Extraction

Hierarchical mode spatially groups tiles into regions before embedding, producing outputs with shape `(num_regions, tiles_per_region, feature_dim)`. This is useful for downstream models that consume region-level spatial structure rather than flat tile bags.
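One way to picture the spatial grouping is a toy sketch that buckets tile coordinates into fixed-size square regions (assumed square regions; this is not slide2vec's actual algorithm, and a real implementation would also pad or truncate each region to `tiles_per_region`):

```python
def group_tiles(coords: list[tuple[int, int]], tile_size: int, tiles_per_side: int):
    # Bucket tile top-left coordinates into square regions of
    # tiles_per_side x tiles_per_side tiles.
    region_px = tile_size * tiles_per_side  # region edge length in pixels
    regions: dict[tuple[int, int], list[tuple[int, int]]] = {}
    for x, y in coords:
        key = (x // region_px, y // region_px)
        regions.setdefault(key, []).append((x, y))
    return regions

coords = [(0, 0), (256, 0), (0, 256), (4096, 0)]
print(group_tiles(coords, tile_size=256, tiles_per_side=16))
# region (0, 0) holds the first three tiles; (4096, 0) starts region (1, 0)
```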
@@ -170,9 +218,10 @@ result = pipeline.run(manifest_path="/path/to/slides.csv")
- `tile_artifacts`
- `hierarchical_artifacts`
- `slide_artifacts`
- `patient_artifacts` — populated when using a patient-level model (e.g. `moozy`); one entry per unique patient, written to `patient_embeddings/` in the output directory
- `process_list_path`

The manifest schema matches HS2P and accepts optional `mask_path` and `spacing_at_level_0` columns.
The manifest schema matches HS2P and accepts optional `mask_path` and `spacing_at_level_0` columns. Patient-level models additionally require a `patient_id` column; see [Patient manifest format](models.md#patient-manifest-format).

### Reusing pre-extracted coordinates

4 changes: 4 additions & 0 deletions pyproject.toml
@@ -71,6 +71,9 @@ hibou = [
"scipy~=1.8.1",
"scikit-image~=0.19.3",
]
moozy = [
"moozy",
]
titan = [
    "torch==2.0.1",
    "timm==1.0.3",
@@ -106,6 +109,7 @@ fm = [
"scikit-survival",
"scikit-learn",
"fairscale",
"moozy",
"packaging==23.2",
"ninja==1.11.1.1",
"psutil<6",
87 changes: 87 additions & 0 deletions slide2vec/api.py
@@ -11,6 +11,7 @@

from slide2vec.artifacts import (
    HierarchicalEmbeddingArtifact,
    PatientEmbeddingArtifact,
    SlideEmbeddingArtifact,
    TileEmbeddingArtifact,
)
@@ -127,6 +128,7 @@ class ExecutionOptions:
    prefetch_factor: int = 4
    persistent_workers: bool = True
    save_tile_embeddings: bool = False
    save_slide_embeddings: bool = False
    save_latents: bool = False

    @classmethod
@@ -151,6 +153,7 @@ def from_config(cls, cfg: Any, *, run_on_cpu: bool = False) -> "ExecutionOptions
            prefetch_factor=prefetch_factor,
            persistent_workers=persistent_workers,
            save_tile_embeddings=bool(cfg.model.save_tile_embeddings),
            save_slide_embeddings=bool(cfg.model.save_slide_embeddings),
            save_latents=bool(cfg.model.save_latents),
        )

@@ -200,9 +203,17 @@ class RunResult:
    tile_artifacts: list[TileEmbeddingArtifact]
    hierarchical_artifacts: list[HierarchicalEmbeddingArtifact]
    slide_artifacts: list[SlideEmbeddingArtifact]
    patient_artifacts: list[PatientEmbeddingArtifact] = field(default_factory=list)
    process_list_path: Path | None = None


@dataclass(frozen=True, kw_only=True)
class EmbeddedPatient:
    patient_id: str
    patient_embedding: Any  # torch.Tensor [D]
    slide_embeddings: dict[str, Any]  # {sample_id: torch.Tensor [D]}


@dataclass(frozen=True, kw_only=True)
class EmbeddedSlide:
    sample_id: str
@@ -343,6 +354,82 @@ def embed_slides(
            execution=resolved,
        )

    def embed_patient(
        self,
        slides: SlideSequence,
        patient_id: str | None = None,
        *,
        preprocessing: PreprocessingConfig | None = None,
        execution: ExecutionOptions | None = None,
    ) -> "EmbeddedPatient":
        """Embed a single patient's slides and return one ``EmbeddedPatient``.

        Convenience wrapper around :meth:`embed_patients` for the common case
        where all *slides* belong to the same patient.

        Args:
            slides: All slides for this patient.
            patient_id: Optional patient identifier applied to every slide.
                When omitted, ``patient_id`` is read from slide dict keys or
                object attributes; slides that carry no ``patient_id`` fall
                back to ``sample_id``.
        """
        patient_id_map: dict | None = None
        if patient_id is not None:
            patient_id_map = {}
            for s in slides:
                if isinstance(s, (str, Path)):
                    patient_id_map[Path(s).stem] = patient_id
                elif isinstance(s, dict):
                    patient_id_map[str(s["sample_id"])] = patient_id
                else:
                    patient_id_map[str(s.sample_id)] = patient_id
        return self.embed_patients(
            slides,
            patient_id_map=patient_id_map,
            preprocessing=preprocessing,
            execution=execution,
        )[0]

    def embed_patients(
        self,
        slides: SlideSequence,
        patient_id_map: dict | None = None,
        *,
        preprocessing: PreprocessingConfig | None = None,
        execution: ExecutionOptions | None = None,
    ) -> "list[EmbeddedPatient]":
        """Embed slides and aggregate them into patient-level embeddings.

        Requires a patient-level model (e.g. ``moozy``). For each patient,
        all contributing slide embeddings are aggregated by the model's
        ``encode_patient`` method.

        Args:
            slides: Slides to process. Each entry may be a path, a
                ``SlideSpec``, or a dict with ``sample_id`` / ``image_path``
                keys. When *patient_id_map* is ``None``, a ``patient_id``
                key in each dict is used to group slides.
            patient_id_map: Optional explicit ``{sample_id: patient_id}``
                mapping. When provided, it takes precedence over any
                ``patient_id`` key embedded in the slide dicts. When
                omitted and the slide dicts carry no ``patient_id``, each
                slide is treated as its own patient.
        """
        from slide2vec.inference import embed_patients

        resolved = _coerce_execution_options(execution, model=self)
        resolved_preprocessing = _resolve_direct_api_preprocessing(self, preprocessing)
        with _auto_progress_reporting(output_dir=resolved.output_dir):
            _validate_model_config(self, resolved_preprocessing, resolved)
            return embed_patients(
                self,
                slides,
                patient_id_map=patient_id_map,
                preprocessing=resolved_preprocessing,
                execution=resolved,
            )

    def _load_backend(self) -> LoadedModel:
        if self._backend is None:
            from slide2vec.inference import load_model
53 changes: 53 additions & 0 deletions slide2vec/artifacts.py
@@ -35,6 +35,20 @@ def metadata(self) -> dict[str, Any]:
        return load_metadata(self.metadata_path)


@dataclass(frozen=True, kw_only=True)
class PatientEmbeddingArtifact:
    patient_id: str
    path: Path
    metadata_path: Path
    format: str
    feature_dim: int
    num_slides: int

    @property
    def metadata(self) -> dict[str, Any]:
        return load_metadata(self.metadata_path)


@dataclass(frozen=True, kw_only=True)
class HierarchicalEmbeddingArtifact:
    sample_id: str
@@ -223,6 +237,45 @@ def write_slide_embeddings(
    )


def write_patient_embeddings(
    patient_id: str,
    embedding,
    *,
    output_dir: str | Path,
    output_format: str = "pt",
    metadata: dict[str, Any] | None = None,
    num_slides: int = 0,
) -> PatientEmbeddingArtifact:
    output_format = _validate_output_format(output_format)
    artifact_path, metadata_path = _setup_artifact_paths(
        output_dir, "patient_embeddings", patient_id, output_format
    )
    embedding_array = _ensure_array(embedding)
    if output_format == "pt":
        torch.save(_ensure_tensor(embedding), artifact_path)
    else:
        np.savez_compressed(artifact_path, features=embedding_array)

    patient_metadata = {
        "patient_id": patient_id,
        "artifact_type": "patient_embeddings",
        "format": output_format,
        "feature_dim": int(embedding_array.shape[-1]) if embedding_array.ndim else 1,
        "num_slides": num_slides,
    }
    if metadata:
        patient_metadata.update(metadata)
    _write_metadata(metadata_path, patient_metadata)
    return PatientEmbeddingArtifact(
        patient_id=patient_id,
        path=artifact_path,
        metadata_path=metadata_path,
        format=output_format,
        feature_dim=patient_metadata["feature_dim"],
        num_slides=num_slides,
    )


def write_hierarchical_embeddings(
    sample_id: str,
    features,
1 change: 1 addition & 0 deletions slide2vec/configs/default.yaml
@@ -13,6 +13,7 @@ model:
  output_variant: # requested output variant for presets that expose multiple outputs
  batch_size: 32
  save_tile_embeddings: false # whether to save tile embeddings alongside the pooled slide embedding when level is "slide"
  save_slide_embeddings: false # whether to save per-slide embeddings when level is "patient" (e.g. moozy); requires a 'patient_id' column in the input CSV
  save_latents: false # whether to save the latent representations from the model alongside the slide embedding (only supported for 'prism')
  allow_non_recommended_settings: false # when true, non-recommended spacing / tile size / precision combinations warn instead of erroring

2 changes: 2 additions & 0 deletions slide2vec/encoders/__init__.py
@@ -6,6 +6,7 @@

from slide2vec.encoders.base import (
    Encoder,
    PatientEncoder,
    SlideEncoder,
    TileEncoder,
    TimmTileEncoder,
@@ -24,6 +25,7 @@

__all__ = [
    "Encoder",
    "PatientEncoder",
    "TileEncoder",
    "SlideEncoder",
    "TimmTileEncoder",
Expand Down