From 9eca580484872201f85b3efd6d843389b2b6b82c Mon Sep 17 00:00:00 2001
From: Xavier Roynard
Date: Sat, 18 Apr 2026 00:58:19 +0200
Subject: [PATCH 1/3] docs: add structured AGENTS.md with nested module guides

Add a comprehensive root AGENTS.md following modeles_d_agents best practices,
plus nested AGENTS.md files for the containers and storage modules.
---
 AGENTS.md                      | 182 +++++++++++++++++++++++++++++++++
 src/plaid/containers/AGENTS.md |  28 +++++
 src/plaid/storage/AGENTS.md    |  46 +++++++++
 3 files changed, 256 insertions(+)
 create mode 100644 AGENTS.md
 create mode 100644 src/plaid/containers/AGENTS.md
 create mode 100644 src/plaid/storage/AGENTS.md

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 00000000..adcc4d99
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,182 @@
+# AGENTS.md -- plaid (pyplaid)
+
+## Project identity
+
+**plaid** is the foundational data model library of the [PLAID ecosystem](https://github.com/PLAID-lib).
+Published on PyPI as `pyplaid`, it provides a structured format for representing physics
+simulation data (meshes, fields, boundary conditions) and abstracts storage backends
+(zarr, HuggingFace datasets, CGNS).
+
+Every other library in the ecosystem depends on plaid.
+
+## Expected agent behavior
+
+### Role
+
+You are a senior Python developer with experience in scientific computing, data modeling,
+and open-source library design. You prioritize backward compatibility and clean abstractions.
+
+### Decision priorities
+
+1. **Backward compatibility** > new features -- this is a foundational library, breaking downstream users is costly
+2. **Correctness** > performance -- data integrity in scientific computing is non-negotiable
+3. **Readability** > cleverness -- contributors come from diverse scientific backgrounds
+
+### When in doubt
+
+- Do not change public API signatures without explicit approval
+- Prefer adding new optional parameters with sensible defaults
+- Check if the change impacts `scimm` or `maestro` (downstream consumers)
+- Run the full test suite before proposing changes
+
+## Tech stack
+
+- **Language**: Python 3.11--3.13
+- **Package manager**: uv (with `pyproject.toml`)
+- **Build backend**: setuptools with setuptools-scm (dynamic versioning)
+- **Linter/formatter**: ruff
+- **Test framework**: pytest
+- **Documentation**: Sphinx (ReadTheDocs)
+- **CI/CD**: GitHub Actions
+
+## Project structure
+
+```
+.
+├── AGENTS.md                     <- This file
+├── pyproject.toml                <- Dependencies and project metadata
+├── ruff.toml                     <- Ruff linter/formatter configuration
+├── CHANGELOG.md                  <- Version history
+├── CONTRIBUTING.md               <- Contribution guidelines
+├── src/plaid/                    <- Source code
+│   ├── __init__.py
+│   ├── constants.py              <- Global constants
+│   ├── problem_definition.py     <- ProblemDefinition (core concept)
+│   ├── containers/               <- Dataset, Sample, Features (see nested AGENTS.md)
+│   ├── storage/                  <- Storage backends: zarr, hf_datasets, cgns (see nested AGENTS.md)
+│   ├── bridges/                  <- HuggingFace bridge utilities
+│   ├── pipelines/                <- sklearn-compatible processing blocks
+│   ├── post/                     <- Post-processing (metrics, bisection)
+│   └── examples/                 <- Built-in example datasets
+├── tests/                        <- Test suite
+├── docs/                         <- Sphinx documentation source
+├── examples/                     <- Usage examples
+└── benchmarks/                   <- Performance benchmarks
+```
+
+## Architecture and key concepts
+
+### Core abstractions
+
+| Concept | Module | Description |
+|---------|--------|-------------|
+| `ProblemDefinition` | `problem_definition.py` | Declares fields, meshes, and their roles (input/output/context) for a physics problem |
+| `Sample` | `containers/sample.py` | One simulation snapshot: mesh + field values |
+| `Dataset` | `containers/dataset.py` | Ordered collection of Samples with shared ProblemDefinition |
+| `Features` | `containers/features.py` | Named tensor-like data with metadata |
+| `FeatureIdentifier` | `containers/feature_identifier.py` | Unique key to identify a feature across samples |
+
+### Storage pattern
+
+Storage uses a **Registry pattern** (`storage/registry.py`) to dispatch read/write
+operations to the correct backend (zarr, hf_datasets, cgns). Each backend implements
+a `reader.py` and `writer.py` following a common interface defined in `storage/common/`.
+
+### Dependency graph (ecosystem)
+
+```
+plaid (pyplaid)           scimm
+    ^                        ^
+    | pyplaid>=0.1.13        | scimm>=0.2.0
+    |                        |
+    +----------+-------------+
+               |
+     maestro (glue layer)
+```
+
+plaid has **no dependency** on scimm or maestro. Changes here propagate downstream.
+
+## Code conventions
+
+### Formatting and linting
+
+Ruff is configured in `ruff.toml`:
+- **Line length**: 88 characters
+- **Lint rules**: `D` (docstrings), `E`/`W` (pycodestyle), `F` (pyflakes), `ARG` (unused arguments), `I` (import sorting)
+- **Docstring convention**: Google style
+- **Excluded directories**: `examples/`, `docs/`, `benchmarks/`
+- **Test files**: docstring rules (`D`) and `S101` (assert) are ignored
+
+```bash
+# Check linting
+uv run ruff check .
+
+# Auto-fix
+uv run ruff check --fix .
+
+# Format
+uv run ruff format .
+```
+
+### Type hints
+
+- Required on all public functions and methods
+- Use modern syntax: `list[str]`, `dict[str, int]`, `X | None` (not `Optional[X]`)
+- Never use deprecated `typing.List`, `typing.Dict`, `typing.Optional`
+
+### Docstrings
+
+- Google style (enforced by ruff rule `D` with `convention = "google"`)
+- Required on all public modules, classes, functions, and methods
+- Update docstrings whenever you modify code behavior
+
+## Testing
+
+- **Framework**: pytest
+- **Location**: `tests/`
+- **Run all**: `uv run pytest`
+- **Run specific**: `uv run pytest tests/path/to/test_file.py`
+- **With coverage**: `uv run pytest --cov=src`
+
+Guidelines:
+- Write tests for new public functions, classes, and methods
+- Test edge cases and error conditions
+- Use descriptive test names that explain the scenario
+- Mock external dependencies (file I/O, network) to keep tests fast
+- Do not test trivial code or third-party libraries
+
+## Commands
+
+```bash
+# Install dependencies
+uv sync
+
+# Run tests
+uv run pytest
+
+# Check linting
+uv run ruff check .
+
+# Auto-fix linting issues
+uv run ruff check --fix .
+
+# Format code
+uv run ruff format .
+
+# Build documentation
+cd docs && make html
+```
+
+## Contribution workflow
+
+When making changes:
+
+1. Read and understand existing code before modifying
+2. Write or update code with type hints
+3. Write unit tests for new functionality
+4. Update docstrings (Google style)
+5. Update Sphinx documentation if functionality changed
+6. Run formatter: `uv run ruff format .`
+7. Run linter: `uv run ruff check --fix .`
+8. Run tests: `uv run pytest`
+9. Check if changes are breaking and inform the reviewer if a major version bump is needed
diff --git a/src/plaid/containers/AGENTS.md b/src/plaid/containers/AGENTS.md
new file mode 100644
index 00000000..675503f8
--- /dev/null
+++ b/src/plaid/containers/AGENTS.md
@@ -0,0 +1,28 @@
+# AGENTS.md -- plaid/containers
+
+This module defines the core data containers of the PLAID data model.
+
+## Key classes
+
+| Class | File | Description |
+|-------|------|-------------|
+| `Dataset` | `dataset.py` | Ordered collection of `Sample` objects sharing a common `ProblemDefinition`. Main entry point for loading and manipulating simulation data. |
+| `Sample` | `sample.py` | Single simulation snapshot containing mesh coordinates and field values as `Features`. |
+| `Features` | `features.py` | Named tensor-like container with shape and dtype metadata. Wraps numpy arrays. |
+| `FeatureIdentifier` | `feature_identifier.py` | Immutable key (name + location) used to uniquely identify a feature across samples. |
+| `DefaultManager` | `managers/default_manager.py` | Manages default values and missing data for features within a dataset. |
+
+## Design constraints
+
+- `Dataset` is a **large class** (~1800 lines). Avoid adding new responsibilities to it. Prefer extracting logic into helper functions or dedicated modules.
+- `Sample` and `Features` are **value objects** -- they should remain simple, with minimal business logic.
+- `FeatureIdentifier` is **immutable and hashable** -- it is used as dictionary keys throughout the codebase. Do not add mutable state.
+- All containers must support **serialization** through the storage backends (zarr, hf_datasets, cgns).
+
+## Downstream impact
+
+These classes are the public API surface consumed by `maestro` and end users. Any signature change is a **breaking change** that requires a major version bump.
+
+## Testing
+
+Tests are in `tests/`. When modifying a container class, verify that storage round-trips (write then read) still produce identical data.
diff --git a/src/plaid/storage/AGENTS.md b/src/plaid/storage/AGENTS.md
new file mode 100644
index 00000000..74805e4d
--- /dev/null
+++ b/src/plaid/storage/AGENTS.md
@@ -0,0 +1,46 @@
+# AGENTS.md -- plaid/storage
+
+This module implements the multi-backend storage layer for reading and writing PLAID datasets.
+
+## Architecture
+
+Storage follows a **Registry pattern**:
+
+```
+storage/
+├── registry.py          <- Dispatches to the correct backend based on format
+├── reader.py            <- Public read API (delegates to backend readers)
+├── writer.py            <- Public write API (delegates to backend writers)
+├── common/              <- Abstract interfaces and shared utilities
+│   ├── reader.py        <- Base reader interface
+│   ├── writer.py        <- Base writer interface
+│   ├── bridge.py        <- Format conversion helpers
+│   └── preprocessor.py
+├── zarr/                <- Zarr backend (reader.py, writer.py, bridge.py)
+├── hf_datasets/         <- HuggingFace datasets backend (reader.py, writer.py, bridge.py)
+└── cgns/                <- CGNS backend (reader.py, writer.py)
+```
+
+## How it works
+
+1. The **registry** (`registry.py`) maps format identifiers to backend modules.
+2. The public `reader.py` and `writer.py` at the top level accept a format parameter and delegate to the appropriate backend.
+3. Each backend implements the interfaces defined in `common/reader.py` and `common/writer.py`.
+
+## Adding a new backend
+
+1. Create a new subdirectory under `storage/` (e.g., `storage/my_format/`).
+2. Implement `reader.py` and `writer.py` following the interfaces in `common/`.
+3. Register the new backend in `registry.py`.
+4. Add round-trip tests (write then read) to verify data integrity.
+
+## Design constraints
+
+- Backends must be **stateless** -- all configuration is passed through function parameters.
+- Read/write operations must preserve **data integrity** exactly (no lossy conversions without explicit user consent).
+- The `common/` interfaces are the **contract** -- do not add backend-specific parameters to the public API without updating the contract first.
+- `zarr` is the primary backend and the most feature-complete. Use it as the reference when implementing others.
+
+## Testing
+
+Each backend should have round-trip tests that write a dataset and read it back, asserting equality. Tests are in `tests/`.

From 5af6092266f0a52c1ee1fd8ac8501018d9de2a58 Mon Sep 17 00:00:00 2001
From: Xavier Roynard
Date: Sat, 18 Apr 2026 17:55:34 +0200
Subject: [PATCH 2/3] docs: remove private repo names from public AGENTS.md, add PR rules and confidentiality section

- Remove all mentions of scimm and maestro (private repos) from public-facing content
- Add confidentiality section warning agents not to mention private repos
- Add PR title emoji rules from .github/pull_request_template.md
- Add PR checklist reference
---
 AGENTS.md                      | 45 ++++++++++++++++++++++------------
 src/plaid/containers/AGENTS.md |  2 +-
 2 files changed, 30 insertions(+), 17 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index adcc4d99..93ac12be 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -7,7 +7,7 @@ Published on PyPI as `pyplaid`, it provides a structured format for representing
 simulation data (meshes, fields, boundary conditions) and abstracts storage backends
 (zarr, HuggingFace datasets, CGNS).
 
-Every other library in the ecosystem depends on plaid.
+Other libraries in the ecosystem depend on plaid.
 
 ## Expected agent behavior
 
@@ -26,9 +26,16 @@ and open-source library design. You prioritize backward compatibility and clean
 
 - Do not change public API signatures without explicit approval
 - Prefer adding new optional parameters with sensible defaults
-- Check if the change impacts `scimm` or `maestro` (downstream consumers)
+- Check if the change impacts downstream consumers of plaid
 - Run the full test suite before proposing changes
 
+### Confidentiality
+
+plaid is a **public** repository. Some downstream libraries in the PLAID ecosystem are private.
+Never mention private repository names, internal project names, or confidential details
+in any public-facing content (code comments, docstrings, commit messages, PR descriptions,
+issues, or documentation).
+
 ## Tech stack
 
 - **Language**: Python 3.11--3.13
@@ -82,20 +89,6 @@ Storage uses a **Registry pattern** (`storage/registry.py`) to dispatch read/wri
 operations to the correct backend (zarr, hf_datasets, cgns). Each backend implements
 a `reader.py` and `writer.py` following a common interface defined in `storage/common/`.
 
-### Dependency graph (ecosystem)
-
-```
-plaid (pyplaid)           scimm
-    ^                        ^
-    | pyplaid>=0.1.13        | scimm>=0.2.0
-    |                        |
-    +----------+-------------+
-               |
-     maestro (glue layer)
-```
-
-plaid has **no dependency** on scimm or maestro. Changes here propagate downstream.
-
 ## Code conventions
 
 ### Formatting and linting
@@ -145,6 +138,26 @@ Guidelines:
 - Mock external dependencies (file I/O, network) to keep tests fast
 - Do not test trivial code or third-party libraries
 
+## Pull request rules
+
+PR titles **must start with one of the following emojis** to indicate the type of change:
+
+| Emoji | Type |
+|-------|------|
+| 🐛 | Bug fix |
+| 📄 | Documentation |
+| 🎉 | New feature or initial commit |
+| 🚀 | Performance or deployment |
+| ♻️ | Refactor or cleanup |
+| 📦 | Packaging or dependency management |
+
+PR checklist (from `.github/pull_request_template.md`):
+- Typing enforced
+- Documentation updated
+- Changelog updated
+- Tests and example updates
+- Coverage should be 100%
+
 ## Commands
 
 ```bash
diff --git a/src/plaid/containers/AGENTS.md b/src/plaid/containers/AGENTS.md
index 675503f8..21bcb61c 100644
--- a/src/plaid/containers/AGENTS.md
+++ b/src/plaid/containers/AGENTS.md
@@ -21,7 +21,7 @@ This module defines the core data containers of the PLAID data model.
 
 ## Downstream impact
 
-These classes are the public API surface consumed by `maestro` and end users. Any signature change is a **breaking change** that requires a major version bump.
+These classes are the public API surface consumed by downstream libraries and end users. Any signature change is a **breaking change** that requires a major version bump.
 
 ## Testing
 

From 8298dfd14ef3b4fc5120f84d2890592d02173eb1 Mon Sep 17 00:00:00 2001
From: Xavier Roynard
Date: Sat, 18 Apr 2026 18:43:55 +0200
Subject: [PATCH 3/3] docs: add communication rules (English, direct tone)

---
 AGENTS.md | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/AGENTS.md b/AGENTS.md
index 93ac12be..333c9920 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -36,6 +36,11 @@ Never mention private repository names, internal project names, or confidential
 in any public-facing content (code comments, docstrings, commit messages, PR descriptions,
 issues, or documentation).
 
+### Communication rules
+
+- All interactions on this repository (issues, PRs, reviews, comments) must be in **English**.
+- Be direct and concise. Avoid compliments, flattery, or filler sentences.
+
 ## Tech stack
 
 - **Language**: Python 3.11--3.13