🎉 Add dataset viewer #384
Status: Open. casenave wants to merge 18 commits into `main` from `viewer`.
Commits:

- `8dc6c10` casenave: update env (dependabot alert)
- `a2f20d0` casenave: init
- `54e9b61` casenave: merge
- `2d55793` casenave: dependencies
- `ff9cf1a` casenave: update CHANGELOG
- `e2fbc13` casenave: remove comments
- `bb7257d` casenave: coverage
- `12aabf1` casenave: Potential fix for pull request finding 'Empty except'
- `1abf3af` casenave: Potential fix for pull request finding 'Empty except'
- `c3fd893` casenave: code quality
- `efe45e5` casenave: windows hack
- `b4b2a91` casenave: win trick
- `7bcc1ca` casenave: windows fix
- `7eaab56` casenave: update deps
- `5e98391` casenave: update
- `76d7194` casenave: wip
- `b4db6b7` fabiencasenave: wip
- `d4dc11a` casenave: wip
New file (`@@ -0,0 +1,178 @@`):

# Dataset viewer

The dataset viewer is a small trame/VTK web application for browsing
PLAID datasets, stored on disk or streamed from the Hugging Face Hub, and
inspecting their samples in 3D. It ships as the `plaid-viewer` console
script.
## Architecture

The viewer runs as a single trame server process:

- `plaid.viewer.services.PlaidDatasetService` discovers datasets and
  loads `plaid.Sample` instances. It uses `plaid.storage.init_from_disk`
  to obtain `(dataset_dict, converter_dict)` and materialises a sample on
  demand with `converter.to_plaid(dataset, index)`, so every PLAID backend
  (`hf_datasets`, `cgns`, `zarr`, ...) is supported uniformly. Hugging
  Face Hub datasets are also supported: when a dataset id is registered
  as a Hub repo id, the service dispatches to
  `plaid.storage.init_streaming_from_hub` instead, so samples are
  streamed lazily without a full local copy.
- `plaid.viewer.services.ParaviewArtifactService` writes each selected
  sample to a CGNS file (or, for time-dependent samples, one CGNS file per
  timestep plus a `.cgns.series` sidecar) in the artifact cache directory,
  which is an ephemeral per-process directory unless `--cache-dir` is
  given.
- `plaid.viewer.trame_app.server.build_server` assembles the UI (a
  Vuetify side drawer with dataset/split/sample selectors and display
  options) and a VTK pipeline: `vtkCGNSReader` → optional cut plane →
  optional threshold → composite-data geometry → mapper/actor.

There is no separate FastAPI backend and no second port: dataset
discovery, CGNS export and the 3D view are all served by trame.
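A minimal sketch of the dispatch and on-demand loading described above, assuming only what the list states: a loader returns `(dataset_dict, converter_dict)` keyed by split, and `converter.to_plaid(dataset, index)` materialises one sample. `SketchDatasetService`, `ListConverter` and the injected loader callables are hypothetical stand-ins, not PLAID's API; in the viewer the loaders would wrap `plaid.storage.init_from_disk` and `plaid.storage.init_streaming_from_hub`, whose exact signatures are not shown here. The Hub branch only illustrates the dispatch: real streaming splits are forward-only and are not indexed like this.

```python
from __future__ import annotations

from typing import Any, Callable, Dict, Tuple

# Hypothetical loader type: maps a dataset id (local name or Hub repo id) to
# ``(dataset_dict, converter_dict)``, both keyed by split name.
Opened = Tuple[Dict[str, Any], Dict[str, Any]]
Loader = Callable[[str], Opened]


class SketchDatasetService:
    """Illustrative stand-in for PlaidDatasetService; not the real class."""

    def __init__(self, load_from_disk: Loader, load_from_hub: Loader) -> None:
        self._load_from_disk = load_from_disk  # e.g. wraps plaid.storage.init_from_disk
        self._load_from_hub = load_from_hub  # e.g. wraps init_streaming_from_hub
        self._hub_repos: set[str] = set()
        self._opened: dict[str, Opened] = {}

    def register_hub_repo(self, repo_id: str) -> None:
        """Mark ``repo_id`` as a Hub dataset, as ``--hub-repo`` or the Hub tab would."""
        self._hub_repos.add(repo_id)

    def open_dataset(self, dataset_id: str) -> Opened:
        # Open each dataset once, dispatching on whether it is a registered Hub repo.
        if dataset_id not in self._opened:
            is_hub = dataset_id in self._hub_repos
            loader = self._load_from_hub if is_hub else self._load_from_disk
            self._opened[dataset_id] = loader(dataset_id)
        return self._opened[dataset_id]

    def load_sample(self, dataset_id: str, split: str, index: int) -> Any:
        # Disk-backed path: materialise a single sample on demand via its converter.
        dataset_dict, converter_dict = self.open_dataset(dataset_id)
        return converter_dict[split].to_plaid(dataset_dict[split], index)


class ListConverter:
    """Hypothetical demo converter: returns the raw item instead of a plaid.Sample."""

    def to_plaid(self, dataset: list, index: int) -> Any:
        return dataset[index]
```

Injecting the loaders keeps the sketch testable without PLAID installed, and caching the opened dataset means repeated sample requests do not reopen it.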
## Launching the viewer

```bash
uv run plaid-viewer --datasets-root /path/to/datasets
```

Useful options:

| Option            | Default     | Description |
| ----------------- | ----------- | ----------- |
| `--datasets-root` | *required*  | Directory containing one sub-directory per PLAID dataset. A single-dataset directory also works. |
| `--cache-dir`     | `None`      | Persistent artifact cache. When omitted, an ephemeral temp dir is used and removed at shutdown. |
| `--host`          | `127.0.0.1` | Bind address for the trame HTTP server. |
| `--port`          | `8080`      | Port exposed by the trame HTTP server. |
| `--backend-id`    | `disk`      | PLAID backend identifier embedded in sample references and in the cache key. |
| `--hub-repo`      | `None`      | Hugging Face Hub repo id (`namespace/name`) streamed via `init_streaming_from_hub`. Repeat the flag to pre-register several repos. |

Open `http://<host>:<port>/` in your browser.
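The repeatable `--hub-repo` flag and the defaults in the table map naturally onto `argparse`. The parser below is a hypothetical sketch that mirrors the table only; it is not the viewer's actual CLI code and leaves out other flags such as `--disable-root-change`.

```python
import argparse
from pathlib import Path


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not PLAID's own)."""
    parser = argparse.ArgumentParser(prog="plaid-viewer")
    # Documented as required; this sketch does not enforce it.
    parser.add_argument("--datasets-root", type=Path, default=None)
    parser.add_argument("--cache-dir", type=Path, default=None)
    parser.add_argument("--host", default="127.0.0.1")
    parser.add_argument("--port", type=int, default=8080)
    parser.add_argument("--backend-id", default="disk")
    # action="append" makes the flag repeatable: each occurrence adds one repo id.
    parser.add_argument("--hub-repo", action="append", default=[])
    return parser
```

For example, `build_parser().parse_args(["--hub-repo", "a/b", "--hub-repo", "c/d"]).hub_repo` collects both ids in order.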
### Streaming from the Hugging Face Hub

Hub datasets can be added at launch time with `--hub-repo`, or from the
running UI through the **Hub** tab of the side drawer. The drawer groups
the local datasets root and the Hugging Face repo input under a
`Local / Hub` tab selector, which is hidden when `--disable-root-change`
is set. Each registered repo shows up as a removable chip and as a new
entry in the **Dataset** dropdown. Samples are loaded on demand through
`plaid.storage.init_streaming_from_hub`, so data is fetched as samples
are requested instead of downloading the whole dataset up front.

```bash
# Start with one or more Hub datasets pre-registered.
uv run plaid-viewer --hub-repo PLAID-lib/VKI-LS59 --hub-repo PLAID-lib/Rotor37
```

Streaming splits returned by PLAID are forward-only
`datasets.IterableDataset` objects without `__len__`. The viewer adapts
accordingly:

- A `streaming` chip appears in the toolbar to indicate the mode.
- The **Sample** slider starts with a single reachable step and grows by
  one every time the user moves it to the right; each right-arrow press
  consumes the next element from the iterator.
- Revisiting an already-fetched index re-renders the cached sample;
  nothing is re-read from the stream, because the underlying iterator
  cannot be rewound.
- Switching split or dataset rebuilds a fresh iterator from the Hub.
- When the stream is exhausted, the slider caps at the last consumed
  index and the counter label shows `(end of stream)`.
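The slider behaviour over a forward-only split can be modelled as a cursor that remembers what it has already consumed. This is a hypothetical sketch (`StreamingCursor` is not part of PLAID), assuming samples are never `None` and that fetched samples can be kept in memory.

```python
from __future__ import annotations

from typing import Any, Iterable, Iterator, Optional


class StreamingCursor:
    """Hypothetical model of the Sample slider over a forward-only stream.

    Fetched samples are kept so earlier indices can be revisited; only the
    next unseen index can be requested (one right-arrow step at a time).
    """

    def __init__(self, stream: Iterable[Any]) -> None:
        self._iterator: Iterator[Any] = iter(stream)
        self._fetched: list[Any] = []
        self.exhausted = False

    @property
    def max_index(self) -> int:
        """Largest index the slider may currently reach."""
        if self.exhausted:
            return len(self._fetched) - 1  # capped at the last consumed index
        return len(self._fetched)  # one step past what has been fetched

    def get(self, index: int) -> Optional[Any]:
        """Return the sample at ``index``, or ``None`` if it is not reachable."""
        if 0 <= index < len(self._fetched):
            return self._fetched[index]  # revisit: no new fetch
        if index != len(self._fetched) or self.exhausted:
            return None  # cannot jump ahead, or the stream has ended
        try:
            sample = next(self._iterator)  # consume exactly one element
        except StopIteration:
            self.exhausted = True
            return None
        self._fetched.append(sample)
        return sample
```

Switching split or dataset then corresponds to discarding the cursor and building a new one over a fresh iterator.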
## Using the UI

The side drawer provides, from top to bottom:

1. **Dataset / Split** - two `VSelect` controls that pick the active
   dataset and split.
2. **Sample** - a `VSlider` over the integer sample index of the current
   split; the selected `sample_id` (and the total count) is shown under
   the slider.
3. **Base** - a `VBtnToggle` with exclusive, mandatory selection: exactly
   one renderable CGNS base exposed by `vtkCGNSReader.GetBaseSelection()`
   is active at any time. Bases that contain no `Zone_t` children (for
   example, a `Global` base storing only reference scalars or
   free-standing tensors) are not rendered; instead they are summarised
   in the **Non-visual bases** accordion further down the drawer, where
   each `DataArray_t` is listed with its name, dtype, shape and a short
   value preview.
4. **Field / Colormap / Show edges** - colour the geometry by any point
   or cell array (all point and cell arrays are enabled on the reader by
   default, so every field shows up in the dropdown), pick one of the
   built-in colormaps, and optionally overlay wireframe edges.
5. **Cut plane** - toggle a `vtkCutter` and interactively adjust its
   normal and its signed offset along that normal (the plane origin is
   the bounding-box centre of the current dataset).
6. **Threshold** - toggle a `vtkThreshold` filter on the currently
   selected field and set the `[min, max]` range. Defaults are taken from
   the field's data range.
7. **Select features** - an expandable panel listing the field paths
   available for the current dataset (retrieved from the PLAID metadata
   schema). Toggling checkboxes and clicking **Apply** restricts the
   loaded samples to the selected fields:
   - For disk-backed datasets, the selection is forwarded to
     `converter.to_plaid(dataset, index, features=...)`. PLAID expands
     the list internally with
     `plaid.utils.cgns_helper.update_features_for_CGNS_compatibility`,
     adding the CGNS nodes (coordinates, zones, grid locations, etc.)
     that the kept fields need in order to stay renderable. The user's
     selection is first intersected with the active split's own feature
     catalogue, so paths that only exist in another split (for example,
     a field present in `train` but not in `test`) do not trigger a
     `Missing features` error.
   - For streaming (Hugging Face Hub) datasets, the expansion must happen
     before `init_streaming_from_hub` is called. The viewer calls
     `update_features_for_CGNS_compatibility` itself, hands the expanded
     list to the streaming loader, then invalidates the current iterator
     so that the next sample is materialised with the new filter.

   The **Clear** / **Select all** buttons in the panel header provide
   shortcuts; an empty selection loads only the geometric support
   (mesh + zones + metadata).
8. **Reset camera** - re-frames the current actor.
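Two of the rules above, intersecting a feature selection with the active split's catalogue and separating renderable CGNS bases from non-visual ones, can be sketched as plain functions. Both helpers are hypothetical, not PLAID's code; the sketch assumes feature paths are plain strings and that the tree follows the CGNS/Python node layout `[name, value, children, label]`, and it only inspects the direct `DataArray_t` children of a base.

```python
from __future__ import annotations

# CGNS/Python node layout: [name, value, children, cgns_label].
Node = list


def restrict_to_split(selected: list[str], split_features: list[str]) -> list[str]:
    """Drop selected feature paths that the active split does not provide.

    The order of the user's selection is preserved.
    """
    available = set(split_features)
    return [path for path in selected if path in available]


def partition_bases(tree: Node) -> tuple[list[str], dict[str, list[str]]]:
    """Split the CGNSBase_t children of ``tree`` into renderable and non-visual bases.

    A base is renderable when it has at least one Zone_t child; otherwise the
    names of its direct DataArray_t children are collected for the summary.
    """
    renderable: list[str] = []
    non_visual: dict[str, list[str]] = {}
    for name, _value, children, label in tree[2]:
        if label != "CGNSBase_t":
            continue  # e.g. a CGNSLibraryVersion_t node
        if any(child[3] == "Zone_t" for child in children):
            renderable.append(name)
        else:
            non_visual[name] = [c[0] for c in children if c[3] == "DataArray_t"]
    return renderable, non_visual
```

Intersecting before loading is what lets a selection made on one split be reused on another without raising on missing paths.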
The 3D view is a server-side `VtkRemoteView`: images are rendered on the
server and streamed to the browser. Camera manipulation uses a
ParaView-like trackball style:

- Left mouse button: rotate.
- Middle mouse button (or Shift + left): pan.
- Mouse wheel (or right button drag): zoom.

A status line at the bottom of the drawer reports the last action or
error.
## Cache layout

Artifacts are written under:

```
<cache_root>/datasets/<dataset_id>/<split>/<sample_id>/<key_prefix>/
  meshes/              # one CGNS per timestep (time-dependent)
  meshes.cgns.series   # ParaView file-series sidecar (time-dependent)
  mesh.cgns            # single static mesh
  metadata.json        # cache key, sample ref, export version, ...
```

The cache key is a SHA-256 hash of the sample reference, backend id,
PLAID version and `ViewerConfig.export_version`. With a persistent
`--cache-dir`, re-running the viewer with the same inputs reuses existing
artifacts; bumping `export_version` invalidates them.
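A minimal sketch of such a key, assuming the four inputs are serialised as canonical JSON before hashing and that `<key_prefix>` is a leading slice of the hex digest. The field names, the JSON encoding, the 16-character prefix and both helper names are assumptions for illustration; the viewer's actual serialisation is not shown in this document.

```python
import hashlib
import json
from pathlib import Path


def cache_key(sample_ref: dict, backend_id: str, plaid_version: str, export_version: int) -> str:
    """Hypothetical SHA-256 cache key over the four documented inputs."""
    payload = {
        "sample_ref": sample_ref,
        "backend_id": backend_id,
        "plaid_version": plaid_version,
        "export_version": export_version,
    }
    # Canonical JSON (sorted keys, fixed separators) so equal inputs hash equally.
    blob = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def artifact_dir(cache_root: Path, dataset_id: str, split: str, sample_id: str,
                 key: str, prefix_len: int = 16) -> Path:
    # Assumes <key_prefix> is the first `prefix_len` hex characters of the key.
    return cache_root / "datasets" / dataset_id / split / sample_id / key[:prefix_len]
```

Sorting the keys makes the digest independent of dict ordering, and any change to `export_version` produces a different digest, which is what leaves older artifacts unused.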
## Programmatic usage

```python
from pathlib import Path

from plaid.viewer.cache import CacheRoot
from plaid.viewer.config import ViewerConfig
from plaid.viewer.services import ParaviewArtifactService, PlaidDatasetService
from plaid.viewer.trame_app.server import build_server

config = ViewerConfig(datasets_root=Path("/path/to/datasets"))
# When config.cache_dir is None, CacheRoot creates an ephemeral directory
# that is removed when the ``with`` block exits.
with CacheRoot(persistent_dir=config.cache_dir) as cache:
    datasets = PlaidDatasetService(config)  # dataset discovery and sample loading
    artifacts = ParaviewArtifactService(datasets, cache.path)  # CGNS export
    server = build_server(datasets, artifacts)  # trame UI + VTK pipeline
    server.start(host="127.0.0.1", port=8080, open_browser=False)
```
New file (`@@ -0,0 +1,11 @@`):

```python
"""Dataset viewer for PLAID.

This package hosts the raw PLAID dataset viewer, served by a single trame
process. PLAID owns the UI shell, data loading, sample interpretation, and
CGNS export; trame/VTK own the scientific visualization.
"""

from plaid.viewer.models import ParaviewArtifact, SampleRef

__all__ = ["ParaviewArtifact", "SampleRef"]
```
New file (`@@ -0,0 +1,168 @@`):

```python
"""Ephemeral-by-default artifact cache for the dataset viewer.

The cache lives under a per-process temporary directory by default and is
removed at shutdown. Four cleanup layers cover all practical failure modes:

1. ``atexit.register`` for normal Python exit.
2. Signal handlers for ``SIGINT`` / ``SIGTERM``.
3. The ``CacheRoot`` context manager (or an explicit ``close()``) in callers.
4. An orphan sweep at startup that removes directories left behind by
   previously-crashed processes (detected via ``os.kill(pid, 0)``).
"""

from __future__ import annotations

import atexit
import errno
import logging
import os
import re
import shutil
import signal
import tempfile
import uuid
from pathlib import Path

logger = logging.getLogger(__name__)

# Ephemeral tempdir naming: ``plaid-viewer-{pid}-{token}``, where ``token`` is
# the first 12 hex characters of a ``uuid4``.
_EPHEMERAL_PREFIX = "plaid-viewer-"
_EPHEMERAL_PATTERN = re.compile(r"^plaid-viewer-(?P<pid>\d+)-(?P<token>[0-9a-f]+)$")


def _process_is_alive(pid: int) -> bool:
    """Return ``True`` if a process with the given pid is still running."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but is owned by someone else.
        return True
    except OSError as exc:
        return exc.errno != errno.ESRCH
    return True


def sweep_orphans(temp_root: Path | None = None) -> list[Path]:
    """Remove viewer tempdirs whose owning process is no longer running.

    Args:
        temp_root: Base temp directory to scan. Defaults to
            :func:`tempfile.gettempdir`.

    Returns:
        List of directories that were removed.
    """
    root = Path(temp_root) if temp_root is not None else Path(tempfile.gettempdir())
    removed: list[Path] = []
    if not root.is_dir():
        return removed
    for entry in root.iterdir():
        if not entry.is_dir():
            continue
        match = _EPHEMERAL_PATTERN.match(entry.name)
        if match is None:
            continue
        pid = int(match.group("pid"))
        if _process_is_alive(pid):
            continue
        try:
            # Let removal errors propagate so they are logged, not counted as removed.
            shutil.rmtree(entry)
            removed.append(entry)
            logger.info("Removed orphan viewer cache: %s", entry)
        except OSError as exc:
            logger.warning("Could not remove orphan viewer cache %s: %s", entry, exc)
    return removed


class CacheRoot:
    """Context-manager-friendly artifact cache directory.

    When ``persistent_dir`` is ``None`` (the default), a new ephemeral tempdir
    named ``plaid-viewer-{pid}-{token}`` is created. The directory is
    removed at process exit (``atexit``), on ``SIGINT`` / ``SIGTERM``, and
    when the context manager is closed.

    When ``persistent_dir`` is provided, that directory is used as-is and is
    **not** removed. Callers wanting persistence pass this.
    """

    def __init__(
        self,
        persistent_dir: Path | None = None,
        *,
        install_signal_handlers: bool = True,
        run_orphan_sweep: bool = True,
    ) -> None:
        self._ephemeral = persistent_dir is None
        if self._ephemeral:
            if run_orphan_sweep:
                sweep_orphans()
            token = uuid.uuid4().hex[:12]
            base = Path(tempfile.gettempdir())
            self._path = base / f"{_EPHEMERAL_PREFIX}{os.getpid()}-{token}"
            self._path.mkdir(parents=True, exist_ok=False)
            atexit.register(self._safe_cleanup)
            if install_signal_handlers:
                self._install_signal_handlers()
        else:
            self._path = Path(persistent_dir)
            self._path.mkdir(parents=True, exist_ok=True)
        self._closed = False

    # ------------------------------------------------------------------ API

    @property
    def path(self) -> Path:
        """Root directory of the cache."""
        return self._path

    @property
    def is_ephemeral(self) -> bool:
        """Whether the cache directory is automatically cleaned up."""
        return self._ephemeral

    def close(self) -> None:
        """Remove the cache directory if it is ephemeral."""
        if self._closed:
            return
        self._closed = True
        if self._ephemeral:
            self._safe_cleanup()

    def __enter__(self) -> "CacheRoot":  # noqa: D105
        return self

    def __exit__(self, exc_type, exc, tb) -> None:  # noqa: D105
        self.close()

    # -------------------------------------------------------------- Internals

    def _safe_cleanup(self) -> None:
        # close(), atexit and the signal handlers may all call this; only the
        # first call finds something to remove.
        if not self._path.exists():
            return
        try:
            shutil.rmtree(self._path)
        except OSError as exc:
            logger.warning("Failed to clean viewer cache %s: %s", self._path, exc)

    def _install_signal_handlers(self) -> None:
        for sig in (signal.SIGINT, signal.SIGTERM):
            try:
                previous = signal.getsignal(sig)
            except (ValueError, OSError):
                continue

            def handler(signum, frame, _prev=previous):
                self._safe_cleanup()
                if callable(_prev) and _prev not in (signal.SIG_DFL, signal.SIG_IGN):
                    _prev(signum, frame)
                # Re-deliver the signal with the default action to keep expected exit codes.
                signal.signal(signum, signal.SIG_DFL)
                os.kill(os.getpid(), signum)

            try:
                signal.signal(sig, handler)
            except (ValueError, OSError):
                pass  # Not in the main thread or signal unsupported: skip this layer.
```