64 changes: 64 additions & 0 deletions AGENTS.md
@@ -31,6 +31,70 @@ Pre-commit hooks run ruff, ruff-format, and mypy. If ruff-format modifies files,

To use the PR review rules, configure the GitHub MCP server in Cursor's "Tools & MCP" settings. You will need a read-only personal access token with the following permissions: Pull requests, Issues, Contents, Metadata.

## Code Guidelines for Agents

When writing or reviewing code, apply these project-specific guidelines. See
CONTRIBUTING.md for the full details; this section highlights what agents are
most likely to miss.

### Design: abstraction and responsibility isolation

This is the highest-priority review concern. When a decision changes in the
future, as little code as possible should need to be touched.

- **Polymorphism over type-checking.** `if isinstance(x, A) ... elif
isinstance(x, B) ...` chains mean the behavior should be a method on the
types being checked. Flag these in reviews and suggest moving logic to the
relevant classes.
- **One abstraction level per function.** If a function mixes high-level
orchestration with low-level details, suggest extracting helpers.
- **Distributed concerns stay in `Distributed`.** Other code calls
`dist.method()`. Never import `torch.distributed` or backend internals
outside of the distributed module. Prefer guard methods
(`dist.require_no_model_parallelism(msg)`) over scattered if-else checks.
- **Training concerns stay in training code.** DDP wrapping, weight freezing,
and loss config should not leak into inference-capable code paths.
- **Facade pattern for multi-PR refactors.** Implement the new class
internally, keep the old class as a translation layer, swap in a final PR.

### Naming

- Config classes: append `Config` to the built type (`TrainStepperConfig`).
- Prefer descriptive names (`noise_distribution` not `distribution`).
- "scatter" implies communication; use "localize" when no communication occurs.
- Private functions get a `_` prefix.

### Config design

- Validate in `__post_init__`, not at runtime.
- No redundant options; remove unused fields.
> **Collaborator:** Do we want to have any guidance on backwards compatibility for configuration parameters of training/eval/inference? It slightly conflicts if we remove top-level configuration fields instead of deprecation warnings. Not that this is a frequent occurrence.
>
> **Contributor (author):** Added below.

- Backwards compatibility for checkpoint loading is critical; use deprecation
warnings for config removal.
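
As a sketch of what a soft removal might look like (all names here are hypothetical, not actual project classes), a removed option can be kept as an ignored field that emits a `DeprecationWarning` when set:

```python
import dataclasses
import warnings
from typing import Any


@dataclasses.dataclass
class TrainerConfig:
    """Hypothetical config illustrating soft removal of an option."""

    learning_rate: float = 1e-4
    # Removed option kept as a sentinel so old YAML files still load.
    legacy_clip_grad: Any = None

    def __post_init__(self):
        if self.legacy_clip_grad is not None:
            warnings.warn(
                "legacy_clip_grad is deprecated and ignored; configure "
                "gradient clipping on the optimizer config instead.",
                DeprecationWarning,
                stacklevel=2,
            )
```

Old configs keep loading (with a warning) rather than failing hard, which also keeps old checkpoints loadable.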

### Testing
> **Collaborator:** I think we should indicate a preference for fast-running and parsimonious tests.
>
> **Contributor (author):** Added.
>
> **Contributor (author):** I'm removing this for now in light of @frodre's article, because I've so far found Claude is very good at writing fast-running tests already.

- Create helpers for repeated test setup (threshold: 3+ instances).
- When fixing a bug, add a failing test first.
- Tests must test behavior, not re-implement logic.
- Use `xfail` for known bugs, not silent skips.
- Use non-trivial values (not all-ones) so tests exercise real behavior.
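
A minimal sketch of the last two bullets, with hypothetical names: non-trivial weights so the test exercises real behavior, and `pytest.mark.xfail` for a known bug:

```python
import numpy as np
import pytest


def weighted_mean(x: np.ndarray, weights: np.ndarray) -> float:
    """Hypothetical helper: an area-weighted mean."""
    return float((x * weights).sum() / weights.sum())


def test_weighted_mean_uses_weights():
    # Non-trivial weights: with all-ones this would equal a plain mean
    # and could never catch a bug that ignores the weights.
    x = np.array([1.0, 2.0, 3.0])
    weights = np.array([0.5, 0.25, 0.25])
    assert weighted_mean(x, weights) == pytest.approx(1.75)


@pytest.mark.xfail(reason="known bug (hypothetical): returns nan on empty input")
def test_weighted_mean_empty_input():
    assert not np.isnan(weighted_mean(np.array([]), np.array([])))
```

Unlike a silent skip, the `xfail` test starts reporting (as XPASS) as soon as the bug is fixed.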

### Code organization

- `if/raise` instead of `assert` in production code.
- Context managers for cleanup (timers, distributed contexts).
- Pass composed objects, not their parts.
> **Collaborator:** Maybe add the specification "if multiple attributes would be used within the function". Something to highlight that I shouldn't pass an object just to grab a single value from it.

- Commit vendorized code unmodified first, then modifications separately.

### Review comment conventions

One guide our team finds helpful is [Conventional Comments](https://conventionalcomments.org/).
Label every comment with a severity prefix:
- **Issue**: must fix before merge.
- **Suggestion (optional)**: non-blocking improvement.
- **Question**: seeking clarification.
- **nit**: minor style; does not need re-review.
> **Member:** This duplicates similar content added to CONTRIBUTING.md.


## Pull Request Review Assistant

Use this same workflow for both initial review and re-review.
103 changes: 103 additions & 0 deletions CONTRIBUTING.md
@@ -26,16 +26,119 @@ If you're an external user and want to implement a new feature, **please open an
issue first to discuss it with us**. This helps ensure the feature aligns with
the project's direction and prevents wasted effort.

## Code Guidelines
> **Member:** For much of the general advice below, rather than commit to developing and maintaining our own set of detailed coding guidelines and design principles, I think it would be better to simply include links to external style guides and guidelines that match our expectations for the quality of contributions, from sources that we know aren't going anywhere. This way the contributing guidelines focus on things that are specific to fme.
>
> **Contributor (author):** Do you have examples of external style guides to include? These came specifically from looking at our code reviews, rather than an external reference, so I don't have one (except for Conventional Comments). While some of these are general guidelines, they are ones that are specifically relevant to our code. It's hard to use a lot of CS-focused style guides, because the nature of our code (us being both the end user and the developer) means many of the rules can be bent or broken, and I find it hard to pick out the "right" ones. For example, isolating responsibilities to one abstraction level is specifically relevant to us because the freedom to make changes to our code in the future is high-value (our requirements constantly change).
>
> **Contributor (author):** Updated.
>
> **Collaborator:** Like @brianhenn mentioned, this feels overly verbose for CONTRIBUTING.md, since the wall of text may dissuade humans from reading it! I like the FAQ-type high-level stuff under the first abstraction section. I'd vote for keeping that and the testing section, since those would be most broadly relevant.
>
> **Contributor (author):** I think I agree. As a first pass, I'll keep those but move everything else (including the part about refactoring using facades) to AGENTS.md and add a reference to it from here. It's hard because I feel like this is text all humans should read >.< but I do agree with you.
>
> **Contributor (author):** Updated.


### Design Principles

#### Isolate responsibilities to as few abstraction levels as possible

When designing a change, think first about what absolutely must change, then
about what level of abstraction that change could be handled in. Choose the
option that splits the concern across the fewest levels of abstraction. When a
decision changes in the future, as little code as possible should need to be
touched.
> **Contributor (author):** Worth noting this does not mean you should change as few lines as possible when putting in the feature. You might need to modify APIs at several levels to isolate the feature into one level of abstraction.

- **Prefer polymorphism over type-checking.** If you see `if isinstance(x, A)
... elif isinstance(x, B) ...` chains, the behavior should be a method on
the types being checked. A single `isinstance` check can be acceptable, but
multiple branches are a sign that logic belongs on the objects themselves.
- **Keep functions at one level of abstraction.** Don't mix high-level
orchestration with low-level implementation details in the same function.
Extract helpers when a function operates at mixed levels.
- **Centralize cross-cutting concerns.** Only `Distributed` should access
distributed-aware code. Other modules call `dist.method()` rather than
importing `torch.distributed` or backend-specific code directly. Prefer
guard methods like `dist.require_no_model_parallelism(msg)` over if-else
checks scattered throughout the codebase.
- **Training-specific concerns belong in training code.** The distributed
module wrapper (DDP), weight freezing, and loss configuration are
training concerns and should not be coupled into inference-capable code.
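
A minimal sketch of the polymorphism bullet, using hypothetical class names (echoing the `noise_distribution` naming example elsewhere in this guide):

```python
from abc import ABC, abstractmethod

import numpy as np


class NoiseDistribution(ABC):
    """Each distribution owns its sampling; callers never branch on type."""

    @abstractmethod
    def sample(self, rng: np.random.Generator, shape: tuple) -> np.ndarray: ...


class GaussianNoise(NoiseDistribution):
    def __init__(self, std: float):
        self.std = std

    def sample(self, rng, shape):
        return rng.normal(0.0, self.std, shape)


class UniformNoise(NoiseDistribution):
    def __init__(self, width: float):
        self.width = width

    def sample(self, rng, shape):
        return rng.uniform(-self.width / 2, self.width / 2, shape)


# No isinstance chain: adding a new distribution touches exactly one class.
noise = GaussianNoise(std=0.1).sample(np.random.default_rng(0), (8, 8))
```

The `if isinstance(x, GaussianNoise) ... elif isinstance(x, UniformNoise) ...` version would force every call site to change whenever a distribution is added.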

#### Refactoring pattern: facade first, swap later

When refactoring configs or APIs across multiple PRs, implement the new
internal class alongside the old one. The old class becomes a facade that
constructs the new classes. The final PR deletes the facade and promotes the
new class. This avoids strange intermediate states where partially-implemented
features block on breaking YAML changes.
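
A schematic of the facade pattern, with hypothetical names (not the project's actual config classes):

```python
import dataclasses


@dataclasses.dataclass
class NewStepperConfig:
    """The new internal config, developed across several PRs."""

    learning_rate: float
    n_steps: int

    def build(self) -> dict:
        return {"lr": self.learning_rate, "n_steps": self.n_steps}


@dataclasses.dataclass
class OldStepperConfig:
    """Facade: still the public, YAML-facing entry point, but it only
    translates its old fields into the new config and delegates."""

    lr: float
    steps: int

    def build(self) -> dict:
        return NewStepperConfig(learning_rate=self.lr, n_steps=self.steps).build()
```

Existing YAML keeps working through every intermediate PR; the final PR deletes `OldStepperConfig` and promotes the new class, which is the only breaking change.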

### Naming

- Names should accurately describe behavior. "scatter" implies inter-process
communication; use "localize" when each rank computes its own local view.
- Config class names: append `Config` to the name of the thing being built
(e.g. `TrainStepperConfig` builds `TrainStepper`).
- Prefer descriptive names over abbreviations (`noise_distribution` not
  `distribution`).

> **Contributor (author):** Worth noting names should, for the most part, only include information that's available in the context of their scope. For example, we would avoid calling a function `normalize_loss(loss: Tensor)` if its internals have nothing to do with loss and could normalize anything; instead it would be `normalize(x: Tensor)`.
- Mark functions as private (prefix `_`) when they are only used internally.

### Configuration

- Validate configs eagerly in `__post_init__`, not at runtime. Catch
misconfigurations before jobs are submitted.
- Don't add config options that duplicate existing ones.
- Remove unused fields rather than leaving them around.
- Use deprecation warnings rather than hard errors when removing config
options, unless a breaking change has been communicated to the team.
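
Eager validation is a one-liner in a dataclass `__post_init__`; this sketch uses a hypothetical config class:

```python
import dataclasses


@dataclasses.dataclass
class NoiseConfig:
    """Hypothetical config: misconfiguration fails at construction time
    (e.g. while parsing YAML), not hours into a submitted job."""

    std: float = 1.0

    def __post_init__(self):
        if self.std <= 0:
            raise ValueError(f"std must be positive, got {self.std}")
```

Because dataclass construction runs `__post_init__`, a bad value raises before any job is submitted.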

### Testing

- **Test helpers over copy-paste.** Create helper functions to build common
test fixtures. If the same setup appears 3+ times, extract a helper.
> **Contributor (author):** Also worth noting we want to avoid using pytest fixtures; use helpers instead.
>
> **Member:** Agreed that fixtures can quickly become unwieldy, but they can also be very useful, e.g. for sharing scope across tests.
>
> **Contributor (author):** Updated (see extension of bullet).

- **Demonstrate bugs with failing tests.** When fixing a bug, add a test
that fails without the fix, then fix it.
- **Test behavior, not implementation.** If a test re-implements the logic
it's testing, it isn't actually verifying anything.
> **Contributor (author):** Also important to have tests that cover important user-story-level details, rather than a lot of tests covering subjective API details, since those make it harder to change those details later and should be used in moderation. Not to say we can't have both, though.
>
> **Collaborator:** Yeah, I like tests for lower-level operations with tricky logic, but once those are set they kind of feel like dead code compared to the user-story-level details of using the higher-level objects to perform ops.

- **Use xfail for known bugs.** Mark known issues with `pytest.mark.xfail`
rather than skipping them silently.
- **Exercise meaningful values.** Don't use all-ones for area weights or
trivial shapes that hide real bugs.
- **Regression tests for checkpoints.** Maintain regression checkpoints from
specific model releases. Use `strict=True` for state dict loading.
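
A sketch of the helper-over-copy-paste bullet, with hypothetical names (note it is a plain function, not a pytest fixture):

```python
import numpy as np


def build_test_inputs(n_channels: int = 2, n_points: int = 4):
    """Hypothetical helper, extracted once the same setup appeared in 3+ tests."""
    rng = np.random.default_rng(42)
    # Non-trivial values: all-ones weights or zero data would hide real bugs.
    weights = rng.uniform(0.5, 1.5, size=(n_channels, n_points))
    data = rng.normal(0.0, 1.0, size=(n_channels, n_points))
    return data, weights


def test_weighted_product_shape():
    data, weights = build_test_inputs(n_channels=3)
    assert (data * weights).shape == (3, 4)
```

Each test calls the helper with only the arguments it cares about, so the shared setup lives in one place.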

### Code Organization

- Consolidate duplicated code to shared locations (e.g. `fme/core/`).
- Remove unused code, flags, and imports proactively.
- Use `if condition: raise` instead of `assert` in production code (asserts
can be stripped by `python -O`).
- Use context managers for resource cleanup (timers, distributed contexts).
- Pass composed objects rather than their parts (e.g. `stepper.loss_scaling`
instead of two separate arguments).
> **Contributor (author):** Prefer to keep this example abstract in case the code changes, so this doesn't go out of date, or maybe remove the example altogether.
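
The `if/raise` and context-manager bullets can be sketched together; all names here are hypothetical:

```python
import time
from contextlib import contextmanager


@contextmanager
def timer(label: str, results: dict):
    """Records elapsed time even if the body raises, via try/finally."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def scale_batch(batch: dict, scaling: dict) -> dict:
    # if/raise instead of assert: this check survives `python -O`.
    if not scaling:
        raise ValueError("scaling must be non-empty")
    return {name: value * scaling[name] for name, value in batch.items()}


results: dict = {}
with timer("scale", results):
    scaled = scale_batch({"a": 2.0}, {"a": 3.0})
```

The `finally` block guarantees the timer is recorded on the error path too, which scattered start/stop calls would not.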


### Vendorized Code

When vendorizing external code, commit the unmodified copy first (with
pre-commit checks skipped if needed), then commit your modifications. This
makes review transparent by separating "what we copied" from "what we changed".

## Code Review

Automated CI tests must pass before your pull request will be reviewed. All
pull requests will be reviewed for:
- Correctness and functionality
- Code style, type-hinting and consistency
- Alignment with project goals
- Responsibility isolation and abstraction level consistency
- Appropriate test coverage

Please be responsive to feedback during the review process.

### Review comment conventions

One guide our team finds helpful is [Conventional Comments](https://conventionalcomments.org/).
Reviewers label comments to indicate priority:
- **Issue**: Must be addressed before merge.
- **Suggestion (optional)**: Worth considering but non-blocking.
- **Question**: Seeking clarification; may or may not require changes.
- **nit**: Minor style preference; does not require re-review.

### PR scope

- Keep PRs focused. Split cleanly separable changes into separate PRs.
- Non-blocking suggestions are often deferred to follow-on PRs.
- Config changes typically need multiple rounds of review.

## Developing

You can install a developer environment using the `conda` package manager with