From 91b19a472e0480ca4e4088ccadcf2140d4436065 Mon Sep 17 00:00:00 2001
From: Jeremy McGibbon <jeremy.mcgibbon@gmail.com>
Date: Thu, 12 Mar 2026 18:20:40 +0000
Subject: [PATCH 1/6] Add code and review guidelines to CONTRIBUTING.md and
 AGENTS.md

Document the team's coding standards and review conventions derived
from reviewing PR history. Key additions: responsibility isolation
principles, polymorphism over isinstance chains, config validation,
testing practices, naming conventions, and review comment labeling.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 AGENTS.md       |  63 ++++++++++++++++++++++++++++++
 CONTRIBUTING.md | 102 ++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 165 insertions(+)

diff --git a/AGENTS.md b/AGENTS.md
index 594c2b0fc..c846eb0b7 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -31,6 +31,69 @@ Pre-commit hooks run ruff, ruff-format, and mypy. If ruff-format modifies files,
 
 To use the PR review rules, configure the GitHub MCP server in Cursor's "Tools & MCP" settings. You will need a read-only personal access token with the following permissions: Pull requests, Issues, Contents, Metadata.
 
+## Code Guidelines for Agents
+
+When writing or reviewing code, apply these project-specific guidelines. See
+CONTRIBUTING.md for the full details; this section highlights what agents are
+most likely to miss.
+
+### Design: abstraction and responsibility isolation
+
+This is the highest-priority review concern. When a decision changes in the
+future, as little code as possible should need to be touched.
+
+- **Polymorphism over type-checking.** `if isinstance(x, A) ... elif
+  isinstance(x, B) ...` chains mean the behavior should be a method on the
+  types being checked. Flag these in reviews and suggest moving logic to the
+  relevant classes.
+- **One abstraction level per function.** If a function mixes high-level
+  orchestration with low-level details, suggest extracting helpers.
+- **Distributed concerns stay in `Distributed`.** Other code calls
+  `dist.method()`. Never import `torch.distributed` or backend internals
+  outside of the distributed module. Prefer guard methods
+  (`dist.require_no_model_parallelism(msg)`) over scattered if-else checks.
+- **Training concerns stay in training code.** DDP wrapping, weight freezing,
+  and loss config should not leak into inference-capable code paths.
+- **Facade pattern for multi-PR refactors.** Implement the new class
+  internally, keep the old class as a translation layer, swap in a final PR.
+
+### Naming
+
+- Config classes: append `Config` to the built type (`TrainStepperConfig`).
+- Prefer descriptive names (`noise_distribution` not `distribution`).
+- "scatter" implies communication; "localize" when no communication occurs.
+- Private functions get a `_` prefix.
+
+### Config design
+
+- Validate in `__post_init__`, not at runtime.
+- No redundant options; remove unused fields.
+- Backwards compatibility for checkpoint loading is critical; use deprecation
+  warnings for config removal.
+
+### Testing
+
+- Create helpers for repeated test setup (threshold: 3+ instances).
+- When fixing a bug, add a failing test first.
+- Tests must test behavior, not re-implement logic.
+- Use `xfail` for known bugs, not silent skips.
+- Use non-trivial values (not all-ones) so tests exercise real behavior.
+
+### Code organization
+
+- `if/raise` instead of `assert` in production code.
+- Context managers for cleanup (timers, distributed contexts).
+- Pass composed objects, not their parts.
+- Commit vendorized code unmodified first, then modifications separately.
+
+### Review comment conventions
+
+Label every comment with a severity prefix:
+- **Issue**: must fix before merge.
+- **Suggestion (optional)**: non-blocking improvement.
+- **Question**: seeking clarification.
+- **nit**: minor style; does not need re-review.
+
 ## Pull Request Review Assistant
 
 Use this same workflow for both initial review and re-review.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 9022eb535..63623637a 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -26,6 +26,92 @@ If you're an external user and want to implement a new feature, **please open an
 issue first to discuss it with us**. This helps ensure the feature aligns with
 the project's direction and prevents wasted effort.
 
+## Code Guidelines
+
+### Design Principles
+
+#### Isolate responsibilities to as few abstraction levels as possible
+
+When designing a change, think first about what absolutely must change, then
+about what level of abstraction that change could be handled in. Choose the
+option that splits the concern across the fewest levels of abstraction. When a
+decision changes in the future, as little code as possible should need to be
+touched.
+
+- **Prefer polymorphism over type-checking.** If you see `if isinstance(x, A)
+  ... elif isinstance(x, B) ...` chains, the behavior should be a method on
+  the types being checked. A single `isinstance` check can be acceptable, but
+  multiple branches are a sign that logic belongs on the objects themselves.
+- **Keep functions at one level of abstraction.** Don't mix high-level
+  orchestration with low-level implementation details in the same function.
+  Extract helpers when a function operates at mixed levels.
+- **Centralize cross-cutting concerns.** Only `Distributed` should access
+  distributed-aware code. Other modules call `dist.method()` rather than
+  importing `torch.distributed` or backend-specific code directly. Prefer
+  guard methods like `dist.require_no_model_parallelism(msg)` over if-else
+  checks scattered throughout the codebase.
+- **Training-specific concerns belong in training code.** The distributed
+  module wrapper (DDP), weight freezing, and loss configuration are
+  training concerns and should not be coupled into inference-capable code.
+
+#### Refactoring pattern: facade first, swap later
+
+When refactoring configs or APIs across multiple PRs, implement the new
+internal class alongside the old one. The old class becomes a facade that
+constructs the new classes. The final PR deletes the facade and promotes the
+new class. This avoids strange intermediate states where partially-implemented
+features block on breaking YAML changes.
+
+### Naming
+
+- Names should accurately describe behavior. "scatter" implies inter-process
+  communication; use "localize" when each rank computes its own local view.
+- Config class names: append `Config` to the name of the thing being built
+  (e.g. `TrainStepperConfig` builds `TrainStepper`).
+- Prefer descriptive names over abbreviations (`noise_distribution` not
+  `distribution`).
+- Mark functions as private (prefix `_`) when they are only used internally.
+
+### Configuration
+
+- Validate configs eagerly in `__post_init__`, not at runtime. Catch
+  misconfigurations before jobs are submitted.
+- Don't add config options that duplicate existing ones.
+- Remove unused fields rather than leaving them around.
+- Use deprecation warnings rather than hard errors when removing config
+  options, unless a breaking change has been communicated to the team.
+
+### Testing
+
+- **Test helpers over copy-paste.** Create helper functions to build common
+  test fixtures. If the same setup appears 3+ times, extract a helper.
+- **Demonstrate bugs with failing tests.** When fixing a bug, add a test
+  that fails without the fix, then fix it.
+- **Test behavior, not implementation.** If a test re-implements the logic
+  it's testing, it isn't actually verifying anything.
+- **Use xfail for known bugs.** Mark known issues with `pytest.mark.xfail`
+  rather than skipping them silently.
+- **Exercise meaningful values.** Don't use all-ones for area weights or
+  trivial shapes that hide real bugs.
+- **Regression tests for checkpoints.** Maintain regression checkpoints from
+  specific model releases. Use `strict=True` for state dict loading.
+
+### Code Organization
+
+- Consolidate duplicated code to shared locations (e.g. `fme/core/`).
+- Remove unused code, flags, and imports proactively.
+- Use `if condition: raise` instead of `assert` in production code (asserts
+  can be stripped by `python -O`).
+- Use context managers for resource cleanup (timers, distributed contexts).
+- Pass composed objects rather than their parts (e.g. `stepper.loss_scaling`
+  instead of two separate arguments).
+
+### Vendorized Code
+
+When vendorizing external code, commit the unmodified copy first (with
+pre-commit checks skipped if needed), then commit your modifications. This
+makes review transparent by separating "what we copied" from "what we changed".
+
 ## Code Review
 
 Automated CI tests must pass before your pull request will be reviewed. All
@@ -33,9 +119,25 @@ pull requests will be reviewed for:
 - Correctness and functionality
 - Code style, type-hinting and consistency
 - Alignment with project goals
+- Responsibility isolation and abstraction level consistency
+- Appropriate test coverage
 
 Please be responsive to feedback during the review process.
 
+### Review comment conventions
+
+Reviewers label comments to indicate priority:
+- **Issue**: Must be addressed before merge.
+- **Suggestion (optional)**: Worth considering but non-blocking.
+- **Question**: Seeking clarification; may or may not require changes.
+- **nit**: Minor style preference; does not require re-review.
+
+### PR scope
+
+- Keep PRs focused. Split cleanly separable changes into separate PRs.
+- Non-blocking suggestions are often deferred to follow-on PRs.
+- Config changes typically need multiple rounds of review.
+
 ## Developing
 
 You can install a developer environment using the `conda` package manager with

From 2b9c790073d50216e2e2e6a688a5e62a949d3e39 Mon Sep 17 00:00:00 2001
From: Jeremy McGibbon <jeremy.mcgibbon@gmail.com>
Date: Thu, 12 Mar 2026 18:25:08 +0000
Subject: [PATCH 2/6] Link to Conventional Comments as reference for review
 labels

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 AGENTS.md       | 1 +
 CONTRIBUTING.md | 1 +
 2 files changed, 2 insertions(+)

diff --git a/AGENTS.md b/AGENTS.md
index c846eb0b7..af96f4cd3 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -88,6 +88,7 @@ future, as little code as possible should need to be touched.
 
 ### Review comment conventions
 
+One guide our team finds helpful is [Conventional Comments](https://conventionalcomments.org/).
 Label every comment with a severity prefix:
 - **Issue**: must fix before merge.
 - **Suggestion (optional)**: non-blocking improvement.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 63623637a..ce76357bf 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -126,6 +126,7 @@ Please be responsive to feedback during the review process.
 
 ### Review comment conventions
 
+One guide our team finds helpful is [Conventional Comments](https://conventionalcomments.org/).
 Reviewers label comments to indicate priority:
 - **Issue**: Must be addressed before merge.
 - **Suggestion (optional)**: Worth considering but non-blocking.

From 824f7d0105887a60620bb2fd3eaf19406d571a2e Mon Sep 17 00:00:00 2001
From: Jeremy McGibbon <jeremy.mcgibbon@gmail.com>
Date: Fri, 13 Mar 2026 20:37:31 +0000
Subject: [PATCH 3/6] Address review comments with in-place text edits

- Remove concrete stepper.loss_scaling example, keep guidance abstract
- Add note about preferring helpers over pytest fixtures
- Add guidance on user-story-level tests vs API-level tests
- Add scope-context naming guidance (names reflect present scope)
- Clarify that isolating responsibilities != minimizing lines changed
- Clarify "pass composed objects" applies when multiple attributes used
- Add preference for fast-running, parsimonious tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 AGENTS.md       |  5 ++++-
 CONTRIBUTING.md | 20 +++++++++++++++-----
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index af96f4cd3..b053aab65 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -61,6 +61,7 @@ future, as little code as possible should need to be touched.
 
 - Config classes: append `Config` to the built type (`TrainStepperConfig`).
 - Prefer descriptive names (`noise_distribution` not `distribution`).
+  Names should reflect the present scope, not the caller's context.
 - "scatter" implies communication; "localize" when no communication occurs.
 - Private functions get a `_` prefix.
 
@@ -73,6 +74,7 @@ future, as little code as possible should need to be touched.
 
 ### Testing
 
+- Prefer fast-running, parsimonious tests.
 - Create helpers for repeated test setup (threshold: 3+ instances).
 - When fixing a bug, add a failing test first.
 - Tests must test behavior, not re-implement logic.
@@ -83,7 +85,8 @@ future, as little code as possible should need to be touched.
 
 - `if/raise` instead of `assert` in production code.
 - Context managers for cleanup (timers, distributed contexts).
-- Pass composed objects, not their parts.
+- Pass composed objects, not their parts, if multiple attributes would be
+  used within the function.
 - Commit vendorized code unmodified first, then modifications separately.
 
 ### Review comment conventions
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index ce76357bf..9a441dd9b 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -36,7 +36,9 @@ When designing a change, think first about what absolutely must change, then
 about what level of abstraction that change could be handled in. Choose the
 option that splits the concern across the fewest levels of abstraction. When a
 decision changes in the future, as little code as possible should need to be
-touched.
+touched. This does not mean minimizing the number of lines changed — you may
+need to modify APIs at several levels to properly isolate a feature into one
+level of abstraction.
 
 - **Prefer polymorphism over type-checking.** If you see `if isinstance(x, A)
   ... elif isinstance(x, B) ...` chains, the behavior should be a method on
@@ -69,7 +71,10 @@ features block on breaking YAML changes.
 - Config class names: append `Config` to the name of the thing being built
   (e.g. `TrainStepperConfig` builds `TrainStepper`).
 - Prefer descriptive names over abbreviations (`noise_distribution` not
-  `distribution`).
+  `distribution`). Names should only include information available in the
+  present scope — avoid naming based on the caller's context. For example,
+  a function that normalizes any tensor should be `normalize(x: Tensor)`,
+  not `normalize_loss(loss: Tensor)`.
 - Mark functions as private (prefix `_`) when they are only used internally.
 
 ### Configuration
@@ -85,10 +90,15 @@ features block on breaking YAML changes.
 
 - **Test helpers over copy-paste.** Create helper functions to build common
   test fixtures. If the same setup appears 3+ times, extract a helper.
+  Prefer explicit helper functions over pytest fixtures, which can become
+  unwieldy; use fixtures only when sharing scope across tests is valuable.
 - **Demonstrate bugs with failing tests.** When fixing a bug, add a test
   that fails without the fix, then fix it.
 - **Test behavior, not implementation.** If a test re-implements the logic
-  it's testing, it isn't actually verifying anything.
+  it's testing, it isn't actually verifying anything. Prefer tests that
+  cover important user-story-level behavior over tests that lock down
+  subjective API details, since the latter make it harder to evolve
+  interfaces. Both have a place, but use API-level tests in moderation.
 - **Use xfail for known bugs.** Mark known issues with `pytest.mark.xfail`
   rather than skipping them silently.
 - **Exercise meaningful values.** Don't use all-ones for area weights or
@@ -103,8 +113,8 @@ features block on breaking YAML changes.
 - Use `if condition: raise` instead of `assert` in production code (asserts
   can be stripped by `python -O`).
 - Use context managers for resource cleanup (timers, distributed contexts).
-- Pass composed objects rather than their parts (e.g. `stepper.loss_scaling`
-  instead of two separate arguments).
+- Pass composed objects rather than their parts when multiple attributes would
+  be used within the function.
 
 ### Vendorized Code
 

From cb4e439c7eadd62c84b50720e1ae64a9c30dc895 Mon Sep 17 00:00:00 2001
From: Jeremy McGibbon <jeremy.mcgibbon@gmail.com>
Date: Fri, 13 Mar 2026 20:45:08 +0000
Subject: [PATCH 4/6] Move detailed guidelines from CONTRIBUTING.md to
 AGENTS.md

CONTRIBUTING.md was too verbose per reviewer feedback. Condensed it to
keep only the Design and Testing sections (most broadly relevant),
linked to Google Python Style Guide for general advice, and pointed
readers to AGENTS.md for detailed naming, config, and code org guidance.

Moved the following into AGENTS.md:
- Naming scope-context example (normalize vs normalize_loss)
- Pytest fixtures guidance (prefer helpers)
- User-story-level testing preference
- Code consolidation and cleanup guidelines

Removed review comment conventions from AGENTS.md (duplicated
CONTRIBUTING.md content).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 AGENTS.md       | 18 ++++++-------
 CONTRIBUTING.md | 72 +++++++++----------------------------------------
 2 files changed, 21 insertions(+), 69 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index b053aab65..ef6384413 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -62,6 +62,8 @@ future, as little code as possible should need to be touched.
 - Config classes: append `Config` to the built type (`TrainStepperConfig`).
 - Prefer descriptive names (`noise_distribution` not `distribution`).
   Names should reflect the present scope, not the caller's context.
+  For example, a function that normalizes any tensor should be
+  `normalize(x: Tensor)`, not `normalize_loss(loss: Tensor)`.
 - "scatter" implies communication; "localize" when no communication occurs.
 - Private functions get a `_` prefix.
 
@@ -76,28 +78,24 @@ future, as little code as possible should need to be touched.
 
 - Prefer fast-running, parsimonious tests.
 - Create helpers for repeated test setup (threshold: 3+ instances).
+  Prefer explicit helpers over pytest fixtures; use fixtures only when
+  sharing scope across tests is valuable.
 - When fixing a bug, add a failing test first.
-- Tests must test behavior, not re-implement logic.
+- Tests must test behavior, not re-implement logic. Prefer tests that cover
+  user-story-level behavior over tests that lock down subjective API details.
 - Use `xfail` for known bugs, not silent skips.
 - Use non-trivial values (not all-ones) so tests exercise real behavior.
 
 ### Code organization
 
+- Consolidate duplicated code to shared locations (e.g. `fme/core/`).
+- Remove unused code, flags, and imports proactively.
 - `if/raise` instead of `assert` in production code.
 - Context managers for cleanup (timers, distributed contexts).
 - Pass composed objects, not their parts, if multiple attributes would be
   used within the function.
 - Commit vendorized code unmodified first, then modifications separately.
 
-### Review comment conventions
-
-One guide our team finds helpful is [Conventional Comments](https://conventionalcomments.org/).
-Label every comment with a severity prefix:
-- **Issue**: must fix before merge.
-- **Suggestion (optional)**: non-blocking improvement.
-- **Question**: seeking clarification.
-- **nit**: minor style; does not need re-review.
-
 ## Pull Request Review Assistant
 
 Use this same workflow for both initial review and re-review.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 9a441dd9b..a797d4e96 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -28,9 +28,13 @@ the project's direction and prevents wasted effort.
 
 ## Code Guidelines
 
-### Design Principles
+For general Python style we follow the
+[Google Python Style Guide](https://google.github.io/styleguide/pyguide.html).
+The sections below cover what matters most in this project. See
+[AGENTS.md](AGENTS.md) for more detailed guidelines on naming, configuration,
+code organization, and other topics.
 
-#### Isolate responsibilities to as few abstraction levels as possible
+### Design: isolate responsibilities
 
 When designing a change, think first about what absolutely must change, then
 about what level of abstraction that change could be handled in. Choose the
@@ -42,49 +46,15 @@ level of abstraction.
 
 - **Prefer polymorphism over type-checking.** If you see `if isinstance(x, A)
   ... elif isinstance(x, B) ...` chains, the behavior should be a method on
-  the types being checked. A single `isinstance` check can be acceptable, but
-  multiple branches are a sign that logic belongs on the objects themselves.
+  the types being checked.
 - **Keep functions at one level of abstraction.** Don't mix high-level
-  orchestration with low-level implementation details in the same function.
-  Extract helpers when a function operates at mixed levels.
+  orchestration with low-level details. Extract helpers when needed.
 - **Centralize cross-cutting concerns.** Only `Distributed` should access
-  distributed-aware code. Other modules call `dist.method()` rather than
-  importing `torch.distributed` or backend-specific code directly. Prefer
-  guard methods like `dist.require_no_model_parallelism(msg)` over if-else
-  checks scattered throughout the codebase.
-- **Training-specific concerns belong in training code.** The distributed
-  module wrapper (DDP), weight freezing, and loss configuration are
-  training concerns and should not be coupled into inference-capable code.
-
-#### Refactoring pattern: facade first, swap later
-
-When refactoring configs or APIs across multiple PRs, implement the new
-internal class alongside the old one. The old class becomes a facade that
-constructs the new classes. The final PR deletes the facade and promotes the
-new class. This avoids strange intermediate states where partially-implemented
-features block on breaking YAML changes.
-
-### Naming
-
-- Names should accurately describe behavior. "scatter" implies inter-process
-  communication; use "localize" when each rank computes its own local view.
-- Config class names: append `Config` to the name of the thing being built
-  (e.g. `TrainStepperConfig` builds `TrainStepper`).
-- Prefer descriptive names over abbreviations (`noise_distribution` not
-  `distribution`). Names should only include information available in the
-  present scope — avoid naming based on the caller's context. For example,
-  a function that normalizes any tensor should be `normalize(x: Tensor)`,
-  not `normalize_loss(loss: Tensor)`.
-- Mark functions as private (prefix `_`) when they are only used internally.
-
-### Configuration
-
-- Validate configs eagerly in `__post_init__`, not at runtime. Catch
-  misconfigurations before jobs are submitted.
-- Don't add config options that duplicate existing ones.
-- Remove unused fields rather than leaving them around.
-- Use deprecation warnings rather than hard errors when removing config
-  options, unless a breaking change has been communicated to the team.
+  `torch.distributed`; other modules call `dist.method()`. Training concerns
+  (DDP, weight freezing, loss config) stay in training code.
+- **Facade-first refactors.** When refactoring across multiple PRs, implement
+  the new class internally, keep the old class as a translation layer, then
+  swap in a final PR.
 
 ### Testing
 
@@ -106,22 +76,6 @@ features block on breaking YAML changes.
 - **Regression tests for checkpoints.** Maintain regression checkpoints from
   specific model releases. Use `strict=True` for state dict loading.
 
-### Code Organization
-
-- Consolidate duplicated code to shared locations (e.g. `fme/core/`).
-- Remove unused code, flags, and imports proactively.
-- Use `if condition: raise` instead of `assert` in production code (asserts
-  can be stripped by `python -O`).
-- Use context managers for resource cleanup (timers, distributed contexts).
-- Pass composed objects rather than their parts when multiple attributes would
-  be used within the function.
-
-### Vendorized Code
-
-When vendorizing external code, commit the unmodified copy first (with
-pre-commit checks skipped if needed), then commit your modifications. This
-makes review transparent by separating "what we copied" from "what we changed".
-
 ## Code Review
 
 Automated CI tests must pass before your pull request will be reviewed. All

From 6062e06feee7d388ec98fe3f22b010ae19fcf151 Mon Sep 17 00:00:00 2001
From: Jeremy McGibbon <jeremy.mcgibbon@gmail.com>
Date: Fri, 13 Mar 2026 21:04:39 +0000
Subject: [PATCH 5/6] Remove duplicated design section from AGENTS.md

The design guidelines are already in CONTRIBUTING.md. Replace with a
pointer so agents read them from the single source of truth.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---
 AGENTS.md | 26 +++-----------------------
 1 file changed, 3 insertions(+), 23 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index ef6384413..19392244f 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -33,29 +33,9 @@ To use the PR review rules, configure the GitHub MCP server in Cursor's "Tools &
 
 ## Code Guidelines for Agents
 
-When writing or reviewing code, apply these project-specific guidelines. See
-CONTRIBUTING.md for the full details; this section highlights what agents are
-most likely to miss.
-
-### Design: abstraction and responsibility isolation
-
-This is the highest-priority review concern. When a decision changes in the
-future, as little code as possible should need to be touched.
-
-- **Polymorphism over type-checking.** `if isinstance(x, A) ... elif
-  isinstance(x, B) ...` chains mean the behavior should be a method on the
-  types being checked. Flag these in reviews and suggest moving logic to the
-  relevant classes.
-- **One abstraction level per function.** If a function mixes high-level
-  orchestration with low-level details, suggest extracting helpers.
-- **Distributed concerns stay in `Distributed`.** Other code calls
-  `dist.method()`. Never import `torch.distributed` or backend internals
-  outside of the distributed module. Prefer guard methods
-  (`dist.require_no_model_parallelism(msg)`) over scattered if-else checks.
-- **Training concerns stay in training code.** DDP wrapping, weight freezing,
-  and loss config should not leak into inference-capable code paths.
-- **Facade pattern for multi-PR refactors.** Implement the new class
-  internally, keep the old class as a translation layer, swap in a final PR.
+When writing or reviewing code, read CONTRIBUTING.md for design principles and
+testing guidelines. The sections below cover additional details agents are most
+likely to miss.
 
 ### Naming
 

From a552d937d2f30a0faf202d3c5ef44a61c25504e5 Mon Sep 17 00:00:00 2001
From: Jeremy McGibbon <jeremy.mcgibbon@gmail.com>
Date: Thu, 19 Mar 2026 15:33:53 +0000
Subject: [PATCH 6/6] modify AGENTS.md to focus on constraints we want to
 enforce

---
 AGENTS.md | 39 +++++++++++----------------------------
 1 file changed, 11 insertions(+), 28 deletions(-)

diff --git a/AGENTS.md b/AGENTS.md
index 19392244f..c903ef1c5 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -29,52 +29,34 @@ Pre-commit hooks run ruff, ruff-format, and mypy. If ruff-format modifies files,
 
 ### GitHub MCP Server Setup
 
-To use the PR review rules, configure the GitHub MCP server in Cursor's "Tools & MCP" settings. You will need a read-only personal access token with the following permissions: Pull requests, Issues, Contents, Metadata.
+To use the PR review rules, configure the GitHub MCP server in Cursor's "Tools & MCP" settings.
+You will need a read-only personal access token with the following permissions: Pull requests, Issues, Contents, Metadata.
 
 ## Code Guidelines for Agents
 
-When writing or reviewing code, read CONTRIBUTING.md for design principles and
-testing guidelines. The sections below cover additional details agents are most
-likely to miss.
-
 ### Naming
 
-- Config classes: append `Config` to the built type (`TrainStepperConfig`).
-- Prefer descriptive names (`noise_distribution` not `distribution`).
-  Names should reflect the present scope, not the caller's context.
-  For example, a function that normalizes any tensor should be
-  `normalize(x: Tensor)`, not `normalize_loss(loss: Tensor)`.
-- "scatter" implies communication; "localize" when no communication occurs.
+- Config classes loaded from user-specified yaml: append `Config` to the built type (`TrainStepperConfig`).
 - Private functions get a `_` prefix.
 
 ### Config design
 
 - Validate in `__post_init__`, not at runtime.
-- No redundant options; remove unused fields.
-- Backwards compatibility for checkpoint loading is critical; use deprecation
-  warnings for config removal.
+- Config loading backwards compatibility for inference is critical, but can be broken for training; use deprecation warnings for config removal. Ask user if unsure.
 
 ### Testing
 
-- Prefer fast-running, parsimonious tests.
 - Create helpers for repeated test setup (threshold: 3+ instances).
-  Prefer explicit helpers over pytest fixtures; use fixtures only when
-  sharing scope across tests is valuable.
+- Prefer explicit helpers over pytest fixtures; use fixtures only when sharing scope across tests is valuable.
 - When fixing a bug, add a failing test first.
-- Tests must test behavior, not re-implement logic. Prefer tests that cover
-  user-story-level behavior over tests that lock down subjective API details.
-- Use `xfail` for known bugs, not silent skips.
-- Use non-trivial values (not all-ones) so tests exercise real behavior.
+- Prefer tests that cover user-story-level behavior over tests that lock down subjective API details.
+- Helper function `validate_tensor_dict` is available for regression testing against a saved reference
 
 ### Code organization
 
 - Consolidate duplicated code to shared locations (e.g. `fme/core/`).
-- Remove unused code, flags, and imports proactively.
-- `if/raise` instead of `assert` in production code.
-- Context managers for cleanup (timers, distributed contexts).
-- Pass composed objects, not their parts, if multiple attributes would be
-  used within the function.
-- Commit vendorized code unmodified first, then modifications separately.
+- `fme/core` is not allowed to import from `fme.ace` or other submodules.
+- `fme/ace` is allowed to import from `fme.core`, but not from other submodules.
 
 ## Pull Request Review Assistant
 
@@ -105,7 +87,8 @@ Use these severity buckets:
 - **Suggestions (Should Consider)**: performance, error handling, clarity/design improvements
 - **Minor/Nitpicks (Optional)**: style, naming, docs polish
 
-For re-reviews, classify prior comments as **Addressed**, **Partially Addressed**, **Unaddressed**, or **Dismissed**. Treat clear author rationale as addressed when appropriate.
+For re-reviews, classify prior comments as **Addressed**, **Partially Addressed**, **Unaddressed**, or **Dismissed**.
+Treat clear author rationale as addressed when appropriate.
 
 ### 4) Output format