ai2cm · mcgibbon · Mar 12, 2026 · Mar 12, 2026 · Mar 13, 2026 · Mar 13, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -29,7 +29,34 @@ Pre-commit hooks run ruff, ruff-format, and mypy. If ruff-format modifies files,
 
 ### GitHub MCP Server Setup
 
-To use the PR review rules, configure the GitHub MCP server in Cursor's "Tools & MCP" settings. You will need a read-only personal access token with the following permissions: Pull requests, Issues, Contents, Metadata.
+To use the PR review rules, configure the GitHub MCP server in Cursor's "Tools & MCP" settings.
+You will need a read-only personal access token with the following permissions: Pull requests, Issues, Contents, Metadata.
+
+## Code Guidelines for Agents
+
+### Naming
+
+- Config classes loaded from user-specified yaml: append `Config` to the built type (`TrainStepperConfig`).
+- Private functions get a `_` prefix.
+
+### Config design
+
+- Validate in `__post_init__`, not at runtime.
+- Config loading backwards compatibility for inference is critical, but can be broken for training; use deprecation warnings for config removal. Ask user if unsure.
+
+### Testing
+
+- Create helpers for repeated test setup (threshold: 3+ instances).
+- Prefer explicit helpers over pytest fixtures; use fixtures only when sharing scope across tests is valuable.
+- When fixing a bug, add a failing test first.
+- Prefer tests that cover user-story-level behavior over tests that lock down subjective API details.
+- Helper function `validate_tensor_dict` is available for regression testing against a saved reference
+
+### Code organization
+
+- Consolidate duplicated code to shared locations (e.g. `fme/core/`).
+- `fme/core` is not allowed to import from `fme.ace` or other submodules.
+- `fme/ace` is allowed to import from `fme.core`, but not from other submodules.
 
 ## Pull Request Review Assistant
 
@@ -60,7 +87,8 @@ Use these severity buckets:
 - **Suggestions (Should Consider)**: performance, error handling, clarity/design improvements
 - **Minor/Nitpicks (Optional)**: style, naming, docs polish
 
-For re-reviews, classify prior comments as **Addressed**, **Partially Addressed**, **Unaddressed**, or **Dismissed**. Treat clear author rationale as addressed when appropriate.
+For re-reviews, classify prior comments as **Addressed**, **Partially Addressed**, **Unaddressed**, or **Dismissed**.
+Treat clear author rationale as addressed when appropriate.
 
 ### 4) Output format
 

diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -26,16 +26,83 @@ If you're an external user and want to implement a new feature, **please open an
 issue first to discuss it with us**. This helps ensure the feature aligns with
 the project's direction and prevents wasted effort.
 
+## Code Guidelines
+
+For general Python style we follow the
+[Google Python Style Guide](https://google.github.io/styleguide/pyguide.html).
+The sections below cover what matters most in this project. See
+[AGENTS.md](AGENTS.md) for more detailed guidelines on naming, configuration,
+code organization, and other topics.
+
+### Design: isolate responsibilities
+
+When designing a change, think first about what absolutely must change, then
+about what level of abstraction that change could be handled in. Choose the
+option that splits the concern across the fewest levels of abstraction. When a
+decision changes in the future, as little code as possible should need to be
+touched. This does not mean minimizing the number of lines changed — you may
+need to modify APIs at several levels to properly isolate a feature into one
+level of abstraction.
+
+- **Prefer polymorphism over type-checking.** If you see `if isinstance(x, A)
+  ... elif isinstance(x, B) ...` chains, the behavior should be a method on
+  the types being checked.
+- **Keep functions at one level of abstraction.** Don't mix high-level
+  orchestration with low-level details. Extract helpers when needed.
+- **Centralize cross-cutting concerns.** Only `Distributed` should access
+  `torch.distributed`; other modules call `dist.method()`. Training concerns
+  (DDP, weight freezing, loss config) stay in training code.
+- **Facade-first refactors.** When refactoring across multiple PRs, implement
+  the new class internally, keep the old class as a translation layer, then
+  swap in a final PR.
+
+### Testing
+
+- **Test helpers over copy-paste.** Create helper functions to build common
+  test fixtures. If the same setup appears 3+ times, extract a helper.
+  Prefer explicit helper functions over pytest fixtures, which can become
+  unwieldy; use fixtures only when sharing scope across tests is valuable.
+- **Demonstrate bugs with failing tests.** When fixing a bug, add a test
+  that fails without the fix, then fix it.
+- **Test behavior, not implementation.** If a test re-implements the logic
+  it's testing, it isn't actually verifying anything. Prefer tests that
+  cover important user-story-level behavior over tests that lock down
+  subjective API details, since the latter make it harder to evolve
+  interfaces. Both have a place, but use API-level tests in moderation.
+- **Use xfail for known bugs.** Mark known issues with `pytest.mark.xfail`
+  rather than skipping them silently.
+- **Exercise meaningful values.** Don't use all-ones for area weights or
+  trivial shapes that hide real bugs.
+- **Regression tests for checkpoints.** Maintain regression checkpoints from
+  specific model releases. Use `strict=True` for state dict loading.
+
 ## Code Review
 
 Automated CI tests must pass before your pull request will be reviewed. All
 pull requests will be reviewed for:
 - Correctness and functionality
 - Code style, type-hinting and consistency
 - Alignment with project goals
+- Responsibility isolation and abstraction level consistency
+- Appropriate test coverage
 
 Please be responsive to feedback during the review process.
 
+### Review comment conventions
+
+One guide our team finds helpful is [Conventional Comments](https://conventionalcomments.org/).
+Reviewers label comments to indicate priority:
+- **Issue**: Must be addressed before merge.
+- **Suggestion (optional)**: Worth considering but non-blocking.
+- **Question**: Seeking clarification; may or may not require changes.
+- **nit**: Minor style preference; does not require re-review.
+
+### PR scope
+
+- Keep PRs focused. Split cleanly separable changes into separate PRs.
+- Non-blocking suggestions are often deferred to follow-on PRs.
+- Config changes typically need multiple rounds of review.
+
 ## Developing
 
 You can install a developer environment using the `conda` package manager with