1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

### Added

- (dataset-viewer) add a trame app for dataset visual exploration.
- (sample/features) add_field: check field size consistency with geometrical support.
- (sample) add `set_trees` to `Sample` delegated methods: `sample.set_trees(...)` now works as a direct proxy to `SampleFeatures.set_trees`, consistent with other delegated tree methods.

1 change: 1 addition & 0 deletions docs/source/core_concepts.rst
@@ -20,3 +20,4 @@ For more details and examples, see the :doc:`core_concepts` and :doc:`examples_t
core_concepts/defaults
core_concepts/disk_format
core_concepts/interoperability
core_concepts/viewer
178 changes: 178 additions & 0 deletions docs/source/core_concepts/viewer.md
@@ -0,0 +1,178 @@
# Dataset viewer

The dataset viewer is a small trame/VTK web application that lets
you browse PLAID datasets stored on disk and inspect their samples in 3D.
It ships as the `plaid-viewer` console script.

## Architecture

The viewer runs as a single trame server process:

- `plaid.viewer.services.PlaidDatasetService` discovers datasets and
loads `plaid.Sample` instances. It uses
`plaid.storage.init_from_disk` to obtain `(dataset_dict,
converter_dict)` and materialises a sample on demand with
`converter.to_plaid(dataset, index)`, so every PLAID backend
(`hf_datasets`, `cgns`, `zarr`, ...) is supported uniformly.
Hugging Face Hub datasets are also supported: when a dataset id is
registered as a repo id, the service dispatches to
`plaid.storage.init_streaming_from_hub` instead, so samples are
streamed lazily without a full local copy.
- `plaid.viewer.services.ParaviewArtifactService` writes each selected
sample to a CGNS file (or `.cgns.series` sidecar for time-dependent
samples) in a per-process cache directory.
- `plaid.viewer.trame_app.server.build_server` assembles the UI
(Vuetify side drawer with dataset/split/sample selectors and display
options) and a VTK pipeline: `vtkCGNSReader` → optional cut plane →
optional threshold → composite-data geometry → mapper/actor.

There is no separate FastAPI backend and no second port: dataset
discovery, CGNS export and the 3D view are all served by trame.

## Launching the viewer

```bash
uv run plaid-viewer --datasets-root /path/to/datasets
```

Useful options:

| Option | Default | Description |
| ----------------- | ----------- | ------------------------------------------------------------------------------------------------ |
| `--datasets-root` | *required* | Directory containing one sub-directory per PLAID dataset. A single-dataset directory also works. |
| `--cache-dir` | `None` | Persistent artifact cache. When omitted, an ephemeral temp dir is used and cleaned at shutdown. |
| `--host` | `127.0.0.1` | Bind address for the trame HTTP server. |
| `--port` | `8080` | Port exposed by the trame HTTP server. |
| `--backend-id` | `disk` | PLAID backend identifier embedded in sample references and the cache key. |
| `--hub-repo` | `None` | Hugging Face Hub repo id (`namespace/name`) streamed via `init_streaming_from_hub`. Repeat the flag to pre-register multiple repos. |

Open `http://<host>:<port>/` in your browser.
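
The option table can be mirrored by a small `argparse` parser. This is an illustrative sketch only, not the actual `plaid.viewer.cli` implementation: the function name `build_parser` is hypothetical, and only the flag names and defaults are taken from the table. It mainly shows how a repeatable flag such as `--hub-repo` can be collected with `action="append"`.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the real CLI)."""
    parser = argparse.ArgumentParser(prog="plaid-viewer")
    parser.add_argument("--datasets-root", type=Path, required=True)
    # None means "use an ephemeral temp dir", as described in the table.
    parser.add_argument("--cache-dir", type=Path, default=None)
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8080)
    parser.add_argument("--backend-id", default="disk")
    # action="append" collects every occurrence of the flag into a list;
    # the attribute stays None when the flag is never given.
    parser.add_argument("--hub-repo", action="append", default=None)
    return parser


args = build_parser().parse_args(
    [
        "--datasets-root", "/path/to/datasets",
        "--hub-repo", "PLAID-lib/VKI-LS59",
        "--hub-repo", "PLAID-lib/Rotor37",
    ]
)
hub_repos = args.hub_repo or []  # normalise None to an empty list
print(args.host, args.port, hub_repos)
```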

### Streaming from the Hugging Face Hub

Hub datasets can be added at launch time with `--hub-repo` or from the
running UI through the **Hub** tab in the side drawer (the drawer now
groups the local datasets root and the Hugging Face repo input under a
`Local / Hub` tab selector, hidden when `--disable-root-change` is set).
Each registered repo shows up as a removable chip and as a new entry in
the **Dataset** dropdown. Samples are loaded on demand through
`plaid.storage.init_streaming_from_hub`, so only the selected sample's
shards are fetched.

```bash
# Start with one or more hub datasets pre-registered.
uv run plaid-viewer --hub-repo PLAID-lib/VKI-LS59 --hub-repo PLAID-lib/Rotor37
```

Streaming splits returned by PLAID are forward-only
`datasets.IterableDataset` objects without `__len__`. The viewer adapts
accordingly:

- A `streaming` chip appears in the toolbar to advertise the mode.
- The **Sample** slider starts at a single reachable step and grows by
one every time the user moves it to the right; each right-arrow press
consumes the next element from the iterator.
- Revisiting an already-fetched index simply re-renders the cached
sample; the slider cannot be rewound because the underlying iterator
cannot.
- Switching split or dataset rebuilds a fresh iterator from the Hub.
- When the stream is exhausted the slider caps at the last consumed
index and the counter label shows `(end of stream)`.
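
The slider behaviour listed above can be approximated by a small cache wrapped around a forward-only iterator. This is a minimal sketch under stated assumptions, not the viewer's code: the class name `ForwardOnlySampleCache` and its attributes are hypothetical, and the real viewer may track the reachable range differently.

```python
from __future__ import annotations

from typing import Any, Iterable, Iterator


class ForwardOnlySampleCache:
    """Cache samples pulled one by one from a forward-only stream (illustrative)."""

    def __init__(self, stream: Iterable[Any]) -> None:
        self._iterator: Iterator[Any] = iter(stream)
        self._fetched: list[Any] = []
        self.exhausted = False

    @property
    def slider_max(self) -> int:
        """Highest index the slider may offer.

        While the stream is live, one extra step past the fetched samples is
        reachable; once exhausted, the range caps at the last consumed index.
        """
        if self.exhausted:
            return max(len(self._fetched) - 1, 0)
        return len(self._fetched)

    def get(self, index: int) -> Any:
        """Return the sample at ``index``, consuming the stream if needed."""
        if index < 0 or index > len(self._fetched):
            raise IndexError("streaming samples must be visited in order")
        if index < len(self._fetched):
            # Already fetched: serve the cached sample, no iterator access.
            return self._fetched[index]
        try:
            sample = next(self._iterator)
        except StopIteration:
            self.exhausted = True
            raise IndexError("end of stream") from None
        self._fetched.append(sample)
        return sample


cache = ForwardOnlySampleCache(iter(["sample-0", "sample-1"]))
first = cache.get(0)   # consumes one element from the stream
again = cache.get(0)   # served from the cache
print(first, again, cache.slider_max)
```

Switching split or dataset would correspond to building a new cache object around a fresh iterator.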


## Using the UI

The side drawer provides, from top to bottom:

1. **Dataset / Split** - two `VSelect` controls that pick the active
   dataset and split.
2. **Sample** - a `VSlider` over the integer sample index of the current
   split; the selected `sample_id` (and the total count) is shown under
   the slider.
3. **Base** - a `VBtnToggle` with exclusive, mandatory selection: exactly
   one renderable CGNS base exposed by `vtkCGNSReader.GetBaseSelection()`
   is active at any time. Bases that contain no `Zone_t` children (for
   example, a `Global` base storing only reference scalars or
   free-standing tensors) are not rendered but are summarised in the
   **Non-visual bases** accordion further down the drawer: each
   `DataArray_t` is listed with its name, dtype, shape and a short value
   preview.
4. **Field / Colormap / Show edges** - colour the geometry by any point
   or cell array (all point and cell arrays are enabled on the reader
   by default so every field shows up in the dropdown), pick from a set
   of built-in colormaps and optionally overlay wireframe edges.
5. **Cut plane** - toggle a `vtkCutter` and interactively adjust its
   normal and signed offset along that normal (the plane origin is the
   current dataset's bounding-box centre).
6. **Threshold** - toggle a `vtkThreshold` filter on the currently
   selected field and set the `[min, max]` range. Defaults are populated
   from the field's data range.
7. **Select features** - an expandable panel listing the field paths
   available for the current dataset (retrieved from the PLAID metadata
   schema). Toggling checkboxes and clicking **Apply** filters the loaded
   samples down to the selected fields:
   - For disk-backed datasets the selection is forwarded to
     `converter.to_plaid(dataset, index, features=...)`. PLAID expands
     the list internally with
     `plaid.utils.cgns_helper.update_features_for_CGNS_compatibility`
     to preserve the CGNS conventions (coordinates, zones, grid
     locations, etc. that make the kept fields renderable). The
     user-facing selection is first intersected with the active split's
     own feature catalogue, so paths that only live in another split
     (for example a field present in `train` but not in `test`) do not
     trigger a `Missing features` error.
   - For streaming (Hugging Face Hub) datasets the expansion must be
     done ahead of `init_streaming_from_hub`. The viewer calls
     `update_features_for_CGNS_compatibility` itself and hands the
     expanded list to the streaming loader, then invalidates the
     current iterator so the next sample is materialised with the new
     filter.

   The **Clear** / **Select all** buttons in the panel header provide
   shortcuts; an empty selection loads only the geometric support
   (mesh + zones + metadata).
8. **Reset camera** - re-frames the current actor.
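
The intersection step described under **Select features** for disk-backed datasets can be sketched as follows. The helper name `restrict_to_split` and the feature paths are made up for illustration; the viewer's own helper and PLAID's real path format may differ.

```python
from typing import Iterable


def restrict_to_split(selection: Iterable[str], split_catalogue: Iterable[str]) -> list[str]:
    """Keep only the selected paths that exist in the active split, preserving order."""
    available = set(split_catalogue)
    return [path for path in selection if path in available]


# Hypothetical feature paths, for illustration only.
user_selection = ["Base/Zone/pressure", "Base/Zone/temperature"]
test_split_features = ["Base/Zone/pressure", "Base/Zone/velocity"]

# "temperature" only lives in another split, so it is dropped here instead
# of being forwarded to the loader and triggering a missing-feature error.
print(restrict_to_split(user_selection, test_split_features))
```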

The 3D view is a server-side `VtkRemoteView` (images are rendered on the
server and streamed to the browser). Camera manipulation uses the
ParaView-like trackball style:

- Left mouse button: rotate.
- Middle mouse button (or Shift + left): pan.
- Mouse wheel (or right button drag): zoom.

A status line at the bottom of the drawer reports the last action or
error.
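
As a geometric aside on the **Cut plane** control: one plausible reading of "signed offset along the normal, with the origin at the bounding-box centre" is sketched below in plain Python. The function name `cut_plane_point` is hypothetical, the bounds tuple follows the usual VTK ordering `(xmin, xmax, ymin, ymax, zmin, zmax)`, and the viewer may combine these values differently.

```python
import math


def cut_plane_point(bounds, normal, offset):
    """Return (point_on_plane, unit_normal) for a plane shifted from the bbox centre.

    bounds: (xmin, xmax, ymin, ymax, zmin, zmax)
    normal: any non-zero 3-vector; it is normalised before use
    offset: signed distance from the bounding-box centre along the normal
    """
    centre = tuple((bounds[2 * i] + bounds[2 * i + 1]) / 2.0 for i in range(3))
    length = math.sqrt(sum(component * component for component in normal))
    if length == 0.0:
        raise ValueError("the plane normal must be non-zero")
    unit = tuple(component / length for component in normal)
    point = tuple(c + offset * u for c, u in zip(centre, unit))
    return point, unit


point, unit = cut_plane_point((0, 2, 0, 4, 0, 6), normal=(0, 0, 2), offset=1.0)
print(point, unit)
```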

## Cache layout

Artifacts are written under:

```
<cache_root>/datasets/<dataset_id>/<split>/<sample_id>/<key_prefix>/
    meshes/              # one CGNS per timestep (time-dependent)
    meshes.cgns.series   # ParaView file-series sidecar (time-dependent)
    mesh.cgns            # single static mesh
    metadata.json        # cache key, sample ref, export version, ...
```

The cache key is a SHA-256 of the sample reference, backend id, PLAID
version and `ViewerConfig.export_version`. Re-running the viewer with
the same inputs reuses existing artifacts; bumping `export_version`
invalidates them.
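
To make the keying scheme concrete, here is a minimal sketch of how such a key and artifact directory could be derived. It is not the viewer's implementation: the function names, the JSON serialisation of the inputs and the 16-character `<key_prefix>` length are all assumptions chosen for illustration.

```python
import hashlib
import json
from pathlib import Path


def cache_key(sample_ref: dict, backend_id: str, plaid_version: str, export_version: int) -> str:
    """SHA-256 over a canonical JSON dump of the four inputs (assumed encoding)."""
    payload = json.dumps(
        {
            "sample_ref": sample_ref,
            "backend_id": backend_id,
            "plaid_version": plaid_version,
            "export_version": export_version,
        },
        sort_keys=True,  # stable key order keeps the hash deterministic
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def artifact_dir(cache_root, dataset_id: str, split: str, sample_id, key: str, prefix_len: int = 16) -> Path:
    """Build <cache_root>/datasets/<dataset_id>/<split>/<sample_id>/<key_prefix>."""
    return Path(cache_root) / "datasets" / dataset_id / split / str(sample_id) / key[:prefix_len]


ref = {"dataset_id": "demo", "split": "train", "sample_id": 3}  # hypothetical sample reference
key_v1 = cache_key(ref, "disk", "1.0.0", export_version=1)
key_v2 = cache_key(ref, "disk", "1.0.0", export_version=2)
print(key_v1 != key_v2)  # bumping export_version changes the key
print(artifact_dir("/tmp/plaid-cache", "demo", "train", 3, key_v1))
```

Because every input is part of the hashed payload, changing any one of them yields a different directory, and artifacts written under the old key are simply no longer looked up.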

## Programmatic usage

```python
from pathlib import Path
from plaid.viewer.cache import CacheRoot
from plaid.viewer.config import ViewerConfig
from plaid.viewer.services import ParaviewArtifactService, PlaidDatasetService
from plaid.viewer.trame_app.server import build_server

config = ViewerConfig(datasets_root=Path("/path/to/datasets"))
with CacheRoot(persistent_dir=config.cache_dir) as cache:
    datasets = PlaidDatasetService(config)
    artifacts = ParaviewArtifactService(datasets, cache.path)
    server = build_server(datasets, artifacts)
    server.start(host="127.0.0.1", port=8080, open_browser=False)
```
9 changes: 9 additions & 0 deletions pyproject.toml
@@ -85,9 +85,18 @@ dev = [
"sphinx-tabs>=3.4.7",
"sphinxcontrib-bibtex>=2.6.5",
]
viewer = [
"trame>=3.6,<4.0",
"trame-vtk>=2.8,<3.0",
"trame-vuetify>=2.7,<3.0",
"vtk>=9.6.1",
]

[tool.coverage.run]
omit = ["src/plaid/examples/*"]

[tool.pytest.ini_options]
filterwarnings = "ignore::DeprecationWarning"

[project.scripts]
plaid-viewer = "plaid.viewer.cli:main"
11 changes: 11 additions & 0 deletions src/plaid/viewer/__init__.py
@@ -0,0 +1,11 @@
"""Dataset viewer for PLAID.

This package hosts the raw PLAID dataset viewer: a single trame server
process with a VTK rendering pipeline. PLAID owns data loading, sample
interpretation, and CGNS export; trame/VTK owns the UI and the scientific
visualization.
"""

from plaid.viewer.models import ParaviewArtifact, SampleRef

__all__ = ["ParaviewArtifact", "SampleRef"]
168 changes: 168 additions & 0 deletions src/plaid/viewer/cache.py
@@ -0,0 +1,168 @@
"""Ephemeral-by-default artifact cache for the dataset viewer.

The cache lives under a per-process temporary directory by default and is
removed at shutdown. Four cleanup layers cover all practical failure modes:

1. ``atexit.register`` for normal Python exit.
2. Signal handlers for ``SIGINT`` / ``SIGTERM``.
3. A FastAPI lifespan context (provided by callers).
4. An orphan sweep at startup that removes directories left behind by
previously-crashed processes (detected via ``os.kill(pid, 0)``).
"""

from __future__ import annotations

import atexit
import errno
import logging
import os
import re
import shutil
import signal
import tempfile
import uuid
from pathlib import Path

logger = logging.getLogger(__name__)

# Ephemeral tempdir naming: ``plaid-viewer-{pid}-{uuid4.hex}``.
_EPHEMERAL_PREFIX = "plaid-viewer-"
_EPHEMERAL_PATTERN = re.compile(r"^plaid-viewer-(?P<pid>\d+)-(?P<token>[0-9a-f]+)$")


def _process_is_alive(pid: int) -> bool:
    """Return ``True`` if a process with the given pid is still running."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but is owned by someone else.
        return True
    except OSError as exc:
        return exc.errno != errno.ESRCH
    return True


def sweep_orphans(temp_root: Path | None = None) -> list[Path]:
    """Remove viewer tempdirs whose owning process is no longer running.

    Args:
        temp_root: Base temp directory to scan. Defaults to
            :func:`tempfile.gettempdir`.

    Returns:
        List of directories that were removed.
    """
    root = Path(temp_root) if temp_root is not None else Path(tempfile.gettempdir())
    removed: list[Path] = []
    if not root.is_dir():
        return removed
    for entry in root.iterdir():
        if not entry.is_dir():
            continue
        match = _EPHEMERAL_PATTERN.match(entry.name)
        if match is None:
            continue
        pid = int(match.group("pid"))
        if _process_is_alive(pid):
            continue
        try:
            shutil.rmtree(entry, ignore_errors=True)
            removed.append(entry)
            logger.info("Removed orphan viewer cache: %s", entry)
        except OSError as exc:
            logger.warning("Could not remove orphan viewer cache %s: %s", entry, exc)
    return removed


class CacheRoot:
    """Context-manager-friendly artifact cache directory.

    When ``persistent_dir`` is ``None`` (the default), a new ephemeral tempdir
    named ``plaid-viewer-{pid}-{token}`` is created. The directory is
    removed at process exit (``atexit``), on ``SIGINT`` / ``SIGTERM``, and
    when the context manager is closed.

    When ``persistent_dir`` is provided, that directory is used as-is and is
    **not** removed. Callers wanting persistence pass this.
    """

    def __init__(
        self,
        persistent_dir: Path | None = None,
        *,
        install_signal_handlers: bool = True,
        run_orphan_sweep: bool = True,
    ) -> None:
        self._ephemeral = persistent_dir is None
        if self._ephemeral:
            if run_orphan_sweep:
                sweep_orphans()
            token = uuid.uuid4().hex[:12]
            base = Path(tempfile.gettempdir())
            self._path = base / f"{_EPHEMERAL_PREFIX}{os.getpid()}-{token}"
            self._path.mkdir(parents=True, exist_ok=False)
            atexit.register(self._safe_cleanup)
            if install_signal_handlers:
                self._install_signal_handlers()
        else:
            self._path = Path(persistent_dir)
            self._path.mkdir(parents=True, exist_ok=True)
        self._closed = False

    # ------------------------------------------------------------------ API

    @property
    def path(self) -> Path:
        """Root directory of the cache."""
        return self._path

    @property
    def is_ephemeral(self) -> bool:
        """Whether the cache directory is automatically cleaned up."""
        return self._ephemeral

    def close(self) -> None:
        """Remove the cache directory if it is ephemeral."""
        if self._closed:
            return
        self._closed = True
        if self._ephemeral:
            self._safe_cleanup()

    def __enter__(self) -> "CacheRoot":  # noqa: D105
        return self

    def __exit__(self, exc_type, exc, tb) -> None:  # noqa: D105
        self.close()

    # -------------------------------------------------------------- Internals

    def _safe_cleanup(self) -> None:
        try:
            shutil.rmtree(self._path, ignore_errors=True)
        except Exception as exc:
            logger.warning("Failed to clean viewer cache %s: %s", self._path, exc)

    def _install_signal_handlers(self) -> None:
        for sig in (signal.SIGINT, signal.SIGTERM):
            try:
                previous = signal.getsignal(sig)
            except (ValueError, OSError):
                continue

            def handler(signum, frame, _prev=previous):
                self._safe_cleanup()
                if callable(_prev) and _prev not in (signal.SIG_DFL, signal.SIG_IGN):
                    _prev(signum, frame)
                # Re-raise the default behaviour to keep expected exit codes.
                signal.signal(signum, signal.SIG_DFL)
                os.kill(os.getpid(), signum)

            try:
                signal.signal(sig, handler)
            except (ValueError, OSError):
                pass