64 changes: 64 additions & 0 deletions AGENTS.md
@@ -31,6 +31,70 @@ Pre-commit hooks run ruff, ruff-format, and mypy. If ruff-format modifies files,

To use the PR review rules, configure the GitHub MCP server in Cursor's "Tools & MCP" settings. You will need a read-only personal access token with the following permissions: Pull requests, Issues, Contents, Metadata.

## Code Guidelines for Agents

When writing or reviewing code, apply these project-specific guidelines. See
CONTRIBUTING.md for the full details; this section highlights what agents are
most likely to miss.

### Design: abstraction and responsibility isolation

This is the highest-priority review concern. When a decision changes in the
future, as little code as possible should need to be touched.

- **Polymorphism over type-checking.** `if isinstance(x, A) ... elif
isinstance(x, B) ...` chains mean the behavior should be a method on the
types being checked. Flag these in reviews and suggest moving logic to the
relevant classes.
- **One abstraction level per function.** If a function mixes high-level
orchestration with low-level details, suggest extracting helpers.
- **Distributed concerns stay in `Distributed`.** Other code calls
`dist.method()`. Never import `torch.distributed` or backend internals
outside of the distributed module. Prefer guard methods
(`dist.require_no_model_parallelism(msg)`) over scattered if-else checks.
- **Training concerns stay in training code.** DDP wrapping, weight freezing,
and loss config should not leak into inference-capable code paths.
- **Facade pattern for multi-PR refactors.** Implement the new class
internally, keep the old class as a translation layer, swap in a final PR.

### Naming

- Config classes: append `Config` to the built type (`TrainStepperConfig`).
- Prefer descriptive names (`noise_distribution` not `distribution`).
- "scatter" implies communication; use "localize" when no communication occurs.
- Private functions get a `_` prefix.

### Config design

- Validate in `__post_init__`, not at runtime.
- No redundant options; remove unused fields.
> **Collaborator:** Do we want to have any guidance on backwards compatibility for configuration parameters of training/eval/inference? It slightly conflicts if we remove top-level configuration fields instead of deprecation warnings. Not that this is a frequent occurrence.
>
> **Contributor (author):** Added below.

- Backwards compatibility for checkpoint loading is critical; use deprecation
warnings for config removal.
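
As a sketch of what a soft removal might look like (all names here are hypothetical, not actual project classes), a removed option can be kept as an ignored field that emits a `DeprecationWarning` when set:

```python
import dataclasses
import warnings
from typing import Any


@dataclasses.dataclass
class TrainerConfig:
    """Hypothetical config illustrating soft removal of an option."""

    learning_rate: float = 1e-4
    # Removed option kept as a sentinel so old YAML files still load.
    legacy_clip_grad: Any = None

    def __post_init__(self):
        if self.legacy_clip_grad is not None:
            warnings.warn(
                "legacy_clip_grad is deprecated and ignored; configure "
                "gradient clipping on the optimizer config instead.",
                DeprecationWarning,
                stacklevel=2,
            )
```

Old configs keep loading (with a warning) rather than failing hard, which also keeps old checkpoints loadable.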

### Testing
> **Collaborator:** I think we should indicate a preference for fast-running and parsimonious tests.
>
> **Contributor (author):** Added.
>
> **Contributor (author):** I'm removing this for now in light of @frodre's article, because I've so far found Claude is very good at writing fast-running tests already.

- Create helpers for repeated test setup (threshold: 3+ instances).
- When fixing a bug, add a failing test first.
- Tests must test behavior, not re-implement logic.
- Use `xfail` for known bugs, not silent skips.
- Use non-trivial values (not all-ones) so tests exercise real behavior.
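
A minimal sketch of the last two bullets, with hypothetical names: non-trivial weights so the test exercises real behavior, and `pytest.mark.xfail` for a known bug:

```python
import numpy as np
import pytest


def weighted_mean(x: np.ndarray, weights: np.ndarray) -> float:
    """Hypothetical helper: an area-weighted mean."""
    return float((x * weights).sum() / weights.sum())


def test_weighted_mean_uses_weights():
    # Non-trivial weights: with all-ones this would equal a plain mean
    # and could never catch a bug that ignores the weights.
    x = np.array([1.0, 2.0, 3.0])
    weights = np.array([0.5, 0.25, 0.25])
    assert weighted_mean(x, weights) == pytest.approx(1.75)


@pytest.mark.xfail(reason="known bug (hypothetical): returns nan on empty input")
def test_weighted_mean_empty_input():
    assert not np.isnan(weighted_mean(np.array([]), np.array([])))
```

Unlike a silent skip, the `xfail` test starts reporting (as XPASS) as soon as the bug is fixed.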

### Code organization

- `if/raise` instead of `assert` in production code.
- Context managers for cleanup (timers, distributed contexts).
- Pass composed objects, not their parts.
> **Collaborator:** Maybe add the specification "if multiple attributes would be used within the function". Something to highlight that I shouldn't pass an object just to grab a single value from it.

- Commit vendorized code unmodified first, then modifications separately.

### Review comment conventions

One guide our team finds helpful is [Conventional Comments](https://conventionalcomments.org/).
Label every comment with a severity prefix:
- **Issue**: must fix before merge.
- **Suggestion (optional)**: non-blocking improvement.
- **Question**: seeking clarification.
- **nit**: minor style; does not need re-review.
> **Member:** This duplicates similar content added to CONTRIBUTING.md.


## Pull Request Review Assistant

Use this same workflow for both initial review and re-review.
103 changes: 103 additions & 0 deletions CONTRIBUTING.md
@@ -26,16 +26,119 @@ If you're an external user and want to implement a new feature, **please open an
issue first to discuss it with us**. This helps ensure the feature aligns with
the project's direction and prevents wasted effort.

## Code Guidelines
> **Member:** For much of the general advice below, rather than commit to developing and maintaining our own set of detailed coding guidelines and design principles, I think it would be better to simply include links to external style guides and guidelines that match our expectations for the quality of contributions, from sources that we know aren't going anywhere. This way the contributing guidelines focus on things that are specific to fme.
>
> **Contributor (author):** Do you have examples of external style guides to include? These came specifically from looking at our code reviews, rather than an external reference, so I don't have one (except for Conventional Comments). While some of these are general guidelines, they are ones that are specifically relevant to our code. It's hard to use a lot of CS-focused style guides, because the nature of our code (us being both the end user and the developer) means many of the rules can be bent or broken, and I find it hard to pick out the "right" ones. For example, isolating responsibilities to one abstraction level is specifically relevant to us because the freedom to make changes to our code in the future is high-value (our requirements constantly change).
>
> **Contributor (author):** Updated.
>
> **Collaborator:** Like @brianhenn mentioned, this feels overly verbose for CONTRIBUTING.md, since the wall of text may dissuade humans from reading it! I like the FAQ-type high-level stuff under the first abstraction section. I'd vote for keeping that and the testing section, since those would be most broadly relevant.
>
> **Contributor (author):** I think I agree. As a first pass, I'll keep those but move everything else (including the part about refactoring using facades) to AGENTS.md and add a reference to it from here. It's hard because I feel like this is text all humans should read >.< but I do agree with you.
>
> **Contributor (author):** Updated.


### Design Principles

#### Isolate responsibilities to as few abstraction levels as possible

When designing a change, think first about what absolutely must change, then
about what level of abstraction that change could be handled in. Choose the
option that splits the concern across the fewest levels of abstraction. When a
decision changes in the future, as little code as possible should need to be
touched.
> **Contributor (author):** Worth noting this does not mean you should change as few lines as possible when putting in the feature. You might need to modify APIs at several levels to isolate the feature into one level of abstraction.

- **Prefer polymorphism over type-checking.** If you see `if isinstance(x, A)
... elif isinstance(x, B) ...` chains, the behavior should be a method on
the types being checked. A single `isinstance` check can be acceptable, but
multiple branches are a sign that logic belongs on the objects themselves.
- **Keep functions at one level of abstraction.** Don't mix high-level
orchestration with low-level implementation details in the same function.
Extract helpers when a function operates at mixed levels.
- **Centralize cross-cutting concerns.** Only `Distributed` should access
distributed-aware code. Other modules call `dist.method()` rather than
importing `torch.distributed` or backend-specific code directly. Prefer
guard methods like `dist.require_no_model_parallelism(msg)` over if-else
checks scattered throughout the codebase.
- **Training-specific concerns belong in training code.** The distributed
module wrapper (DDP), weight freezing, and loss configuration are
training concerns and should not be coupled into inference-capable code.
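
A minimal sketch of the polymorphism bullet, using hypothetical class names (echoing the `noise_distribution` naming example elsewhere in this guide):

```python
from abc import ABC, abstractmethod

import numpy as np


class NoiseDistribution(ABC):
    """Each distribution owns its sampling; callers never branch on type."""

    @abstractmethod
    def sample(self, rng: np.random.Generator, shape: tuple) -> np.ndarray: ...


class GaussianNoise(NoiseDistribution):
    def __init__(self, std: float):
        self.std = std

    def sample(self, rng, shape):
        return rng.normal(0.0, self.std, shape)


class UniformNoise(NoiseDistribution):
    def __init__(self, width: float):
        self.width = width

    def sample(self, rng, shape):
        return rng.uniform(-self.width / 2, self.width / 2, shape)


# No isinstance chain: adding a new distribution touches exactly one class.
noise = GaussianNoise(std=0.1).sample(np.random.default_rng(0), (8, 8))
```

The `if isinstance(x, GaussianNoise) ... elif isinstance(x, UniformNoise) ...` version would force every call site to change whenever a distribution is added.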

#### Refactoring pattern: facade first, swap later

When refactoring configs or APIs across multiple PRs, implement the new
internal class alongside the old one. The old class becomes a facade that
constructs the new classes. The final PR deletes the facade and promotes the
new class. This avoids strange intermediate states where partially-implemented
features block on breaking YAML changes.
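
A schematic of the facade pattern, with hypothetical names (not the project's actual config classes):

```python
import dataclasses


@dataclasses.dataclass
class NewStepperConfig:
    """The new internal config, developed across several PRs."""

    learning_rate: float
    n_steps: int

    def build(self) -> dict:
        return {"lr": self.learning_rate, "n_steps": self.n_steps}


@dataclasses.dataclass
class OldStepperConfig:
    """Facade: still the public, YAML-facing entry point, but it only
    translates its old fields into the new config and delegates."""

    lr: float
    steps: int

    def build(self) -> dict:
        return NewStepperConfig(learning_rate=self.lr, n_steps=self.steps).build()
```

Existing YAML keeps working through every intermediate PR; the final PR deletes `OldStepperConfig` and promotes the new class, which is the only breaking change.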

### Naming

- Names should accurately describe behavior. "scatter" implies inter-process
communication; use "localize" when each rank computes its own local view.
- Config class names: append `Config` to the name of the thing being built
(e.g. `TrainStepperConfig` builds `TrainStepper`).
- Prefer descriptive names over abbreviations (`noise_distribution` not
  `distribution`).

> **Contributor (author):** Worth noting names should, for the most part, only include information that's available in the context of their scope. For example, we would avoid calling a function `normalize_loss(loss: Tensor)` if its internals have nothing to do with loss and could normalize anything; instead it would be `normalize(x: Tensor)`.
- Mark functions as private (prefix `_`) when they are only used internally.

### Configuration

- Validate configs eagerly in `__post_init__`, not at runtime. Catch
misconfigurations before jobs are submitted.
- Don't add config options that duplicate existing ones.
- Remove unused fields rather than leaving them around.
- Use deprecation warnings rather than hard errors when removing config
options, unless a breaking change has been communicated to the team.
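
Eager validation is a one-liner in a dataclass `__post_init__`; this sketch uses a hypothetical config class:

```python
import dataclasses


@dataclasses.dataclass
class NoiseConfig:
    """Hypothetical config: misconfiguration fails at construction time
    (e.g. while parsing YAML), not hours into a submitted job."""

    std: float = 1.0

    def __post_init__(self):
        if self.std <= 0:
            raise ValueError(f"std must be positive, got {self.std}")
```

Because dataclass construction runs `__post_init__`, a bad value raises before any job is submitted.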

### Testing

- **Test helpers over copy-paste.** Create helper functions to build common
test fixtures. If the same setup appears 3+ times, extract a helper.
> **Contributor (author):** Also worth noting we want to avoid using pytest fixtures; use helpers instead.
>
> **Member:** Agreed that fixtures can quickly become unwieldy, but they can also be very useful, e.g. for sharing scope across tests.
>
> **Contributor (author):** Updated (see extension of bullet).

- **Demonstrate bugs with failing tests.** When fixing a bug, add a test
that fails without the fix, then fix it.
- **Test behavior, not implementation.** If a test re-implements the logic
it's testing, it isn't actually verifying anything.
> **Contributor (author):** Also important to have tests that cover important user-story-level details, rather than a lot of tests covering subjective API details, since those make it harder to change those details later and should be used in moderation. Not to say we can't have both, though.
>
> **Collaborator:** Yeah, I like tests for lower-level operations with tricky logic, but once those are set they kind of feel like dead code compared to the user-story-level details of using the higher-level objects to perform ops.

- **Use xfail for known bugs.** Mark known issues with `pytest.mark.xfail`
rather than skipping them silently.
- **Exercise meaningful values.** Don't use all-ones for area weights or
trivial shapes that hide real bugs.
- **Regression tests for checkpoints.** Maintain regression checkpoints from
specific model releases. Use `strict=True` for state dict loading.
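
A sketch of the helper-over-copy-paste bullet, with hypothetical names (note it is a plain function, not a pytest fixture):

```python
import numpy as np


def build_test_inputs(n_channels: int = 2, n_points: int = 4):
    """Hypothetical helper, extracted once the same setup appeared in 3+ tests."""
    rng = np.random.default_rng(42)
    # Non-trivial values: all-ones weights or zero data would hide real bugs.
    weights = rng.uniform(0.5, 1.5, size=(n_channels, n_points))
    data = rng.normal(0.0, 1.0, size=(n_channels, n_points))
    return data, weights


def test_weighted_product_shape():
    data, weights = build_test_inputs(n_channels=3)
    assert (data * weights).shape == (3, 4)
```

Each test calls the helper with only the arguments it cares about, so the shared setup lives in one place.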

### Code Organization

- Consolidate duplicated code to shared locations (e.g. `fme/core/`).
- Remove unused code, flags, and imports proactively.
- Use `if condition: raise` instead of `assert` in production code (asserts
can be stripped by `python -O`).
- Use context managers for resource cleanup (timers, distributed contexts).
- Pass composed objects rather than their parts (e.g. `stepper.loss_scaling`
instead of two separate arguments).
> **Contributor (author):** Prefer to keep this example abstract in case the code changes, so this doesn't go out of date, or maybe remove the example altogether.
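
The `if/raise` and context-manager bullets can be sketched together; all names here are hypothetical:

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label: str, results: dict):
    """Records elapsed time even if the body raises, via try/finally."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def scale_batch(batch: dict, scaling: dict) -> dict:
    # if/raise instead of assert: this check survives `python -O`.
    if not scaling:
        raise ValueError("scaling must be non-empty")
    return {name: value * scaling[name] for name, value in batch.items()}


results: dict = {}
with timer("scale", results):
    scaled = scale_batch({"a": 2.0}, {"a": 3.0})
```

The `finally` block guarantees the timer is recorded on the error path too, which scattered start/stop calls would not.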


### Vendorized Code

When vendorizing external code, commit the unmodified copy first (with
pre-commit checks skipped if needed), then commit your modifications. This
makes review transparent by separating "what we copied" from "what we changed".

## Code Review

Automated CI tests must pass before your pull request will be reviewed. All
pull requests will be reviewed for:
- Correctness and functionality
- Code style, type-hinting and consistency
- Alignment with project goals
- Responsibility isolation and abstraction level consistency
- Appropriate test coverage

Please be responsive to feedback during the review process.

### Review comment conventions

One guide our team finds helpful is [Conventional Comments](https://conventionalcomments.org/).
Reviewers label comments to indicate priority:
- **Issue**: Must be addressed before merge.
- **Suggestion (optional)**: Worth considering but non-blocking.
- **Question**: Seeking clarification; may or may not require changes.
- **nit**: Minor style preference; does not require re-review.

### PR scope

- Keep PRs focused. Split cleanly separable changes into separate PRs.
- Non-blocking suggestions are often deferred to follow-on PRs.
- Config changes typically need multiple rounds of review.

## Developing

You can install a developer environment using the `conda` package manager with