
Add GlobalDummyDatastore for global domain testing #444

Open
RajdeepKushwaha5 wants to merge 1 commit into mllam:main from RajdeepKushwaha5:feat/global-dummy-datastore

@RajdeepKushwaha5 RajdeepKushwaha5 commented Mar 20, 2026

Describe your changes

Adds a GlobalDummyDatastore test variant that simulates a global domain (no lateral boundaries) by returning an all-zero boundary mask. This enables testing the model's behavior when there is no boundary forcing — a prerequisite for global weather forecasting support.

Changes:

  • tests/dummy_datastore.py: Added GlobalDummyDatastore subclass of DummyDatastore with SHORT_NAME = "dummydata_global" that overrides boundary_mask to return all zeros.
  • tests/conftest.py: Registered GlobalDummyDatastore in DATASTORES and DATASTORES_EXAMPLES so all parametrized datastore tests automatically cover it.
  • tests/test_datastores.py: Relaxed test_boundary_mask to accept all-zero masks (global domains). Changed set(da_mask.values) == {0, 1} to set(da_mask.values).issubset({0, 1}) and replaced rigid sum checks with conditional logic for LAM vs global datastores.
  • tests/test_training.py: Added test_training_global that verifies training works end-to-end with an all-zero boundary mask (i.e., unroll_prediction uses only model predictions with no boundary forcing).

Motivation: As part of enabling global forecasting (#63), the test infrastructure needs to accommodate datastores without lateral boundaries. Currently test_boundary_mask rejects valid all-zero masks, blocking any global datastore from passing the existing test suite. This PR fixes that gap and adds dedicated global training coverage.
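
To illustrate why an all-zero mask amounts to "no boundary forcing", here is a minimal sketch. It is not the repository's `unroll_prediction`; it assumes the forcing step blends the prediction with the true state through the mask, and the name `apply_boundary_forcing` and the array shapes are hypothetical.

```python
import numpy as np


def apply_boundary_forcing(prediction, true_state, boundary_mask):
    """Overwrite boundary points (mask == 1) with the true state.

    Interior points (mask == 0) keep the model prediction.
    Hypothetical helper, assumed to mirror the blending step in the model.
    """
    mask = boundary_mask[:, np.newaxis]  # shape (num_grid_points, 1)
    return mask * true_state + (1 - mask) * prediction


num_grid_points, num_features = 6, 2
prediction = np.full((num_grid_points, num_features), 2.0)
true_state = np.full((num_grid_points, num_features), 5.0)

# LAM-style mask: some boundary points, some interior points
lam_mask = np.array([1, 1, 0, 0, 1, 1])
# Global-style mask: no boundary points at all
global_mask = np.zeros(num_grid_points, dtype=int)

lam_state = apply_boundary_forcing(prediction, true_state, lam_mask)
global_state = apply_boundary_forcing(prediction, true_state, global_mask)
```

With `global_mask`, the result equals `prediction` at every grid point, which is the behaviour the global test is meant to exercise.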

Issue Link

Closes #445

Part of global forecasting support: #63

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the README to cover introduced code changes
  • I have added tests that prove my fix is effective or that my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form (context).
  • I have requested a reviewer and an assignee (assignee is responsible for merging). This applies only if you have write access to the repo, otherwise feel free to tag a maintainer to add a reviewer and assignee.

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

  • the code is readable
  • the code is well tested
  • the code is documented (including return types and parameters)
  • the code is easy to maintain

Author checklist after completed review

  • I have added a line to the CHANGELOG describing this change, in a section
    reflecting type of change (add section where missing):
    • added: when you have added new functionality
    • changed: when default behaviour of the code has been changed
    • fixes: when your contribution fixes a bug
    • maintenance: when your contribution relates to repo maintenance, e.g. CI/CD or documentation

Checklist for assignee

  • PR is up to date with the base branch

Copilot AI review requested due to automatic review settings March 20, 2026 02:52

Copilot AI left a comment

Pull request overview

Adds a new global-domain dummy datastore variant to enable testing scenarios with no lateral boundary forcing, and updates tests to accept/cover all-zero boundary masks as a valid “global” case.

Changes:

  • Introduces GlobalDummyDatastore (boundary mask = all zeros) and registers it in the test datastore registry.
  • Updates test_boundary_mask to allow global (all-zero) masks in addition to LAM-style masks.
  • Adds an explicit global training test that runs end-to-end training using the new datastore.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/dummy_datastore.py | Adds GlobalDummyDatastore overriding boundary_mask to represent a global domain. |
| tests/conftest.py | Registers the new datastore kind so existing parametrized tests cover it. |
| tests/test_datastores.py | Relaxes boundary mask assertions to allow all-zero masks for global domains. |
| tests/test_training.py | Adds a dedicated global training test and imports GlobalDummyDatastore. |

Comment on lines +489 to +495
            The boundary mask with all zeros, with dimensions
            ``('grid_index',)``.
        """
        return xr.DataArray(
            np.zeros(self.num_grid_points, dtype=int),
            dims=["grid_index"],
        )

Copilot AI Mar 20, 2026

GlobalDummyDatastore.boundary_mask returns a new DataArray without the grid_index coordinates (MultiIndex) and without a name, unlike DummyDatastore.boundary_mask and the real datastores. This can break xarray alignment if the mask is ever combined with other datastore DataArrays by coordinates. Consider constructing the all-zero mask via xr.zeros_like(self.ds["boundary_mask"]).astype(int) (or explicitly passing coords={"grid_index": self.ds.grid_index} and name="boundary_mask").

Suggested change
-            The boundary mask with all zeros, with dimensions
-            ``('grid_index',)``.
-        """
-        return xr.DataArray(
-            np.zeros(self.num_grid_points, dtype=int),
-            dims=["grid_index"],
-        )
+            The boundary mask with all zeros, preserving the coordinates and
+            name of the underlying datastore's ``boundary_mask`` variable.
+        """
+        return xr.zeros_like(self.ds["boundary_mask"]).astype(int)
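
The difference between the two constructions can be seen in a small standalone sketch. The dataset below is a hypothetical stand-in for the datastore's `self.ds` (a 2 x 3 grid stacked into `grid_index`), not the actual DummyDatastore contents.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the datastore's dataset: a 2 x 3 grid whose
# (x, y) dimensions are stacked into a single grid_index MultiIndex.
ds = xr.Dataset(
    {"boundary_mask": (("x", "y"), np.array([[1, 1, 1], [1, 0, 1]]))},
    coords={"x": [0.0, 1.0], "y": [0.0, 1.0, 2.0]},
).stack(grid_index=("x", "y"))

# Construction from the PR: only the dimension name is set
bare_mask = xr.DataArray(
    np.zeros(ds.sizes["grid_index"], dtype=int),
    dims=["grid_index"],
)

# Suggested construction: inherits name and coordinates from the template
aligned_mask = xr.zeros_like(ds["boundary_mask"]).astype(int)

print(bare_mask.name, "grid_index" in bare_mask.coords)
print(aligned_mask.name, "grid_index" in aligned_mask.coords)
```

Both masks hold the same zeros, but only `aligned_mask` carries the `grid_index` coordinate and the `boundary_mask` name, so only it can be aligned by coordinate with other DataArrays from the same dataset.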

Comment on lines +131 to +142
    """Test that training works with a global-style datastore (boundary mask
    all zeros). In a global domain there are no lateral boundaries, so the
    model's predictions should be used everywhere without any boundary
    forcing."""
    datastore = GlobalDummyDatastore()

    # Verify the global property: boundary mask should be all zeros
    assert datastore.boundary_mask.sum() == 0, (
        "GlobalDummyDatastore boundary_mask should be all zeros"
    )

    run_simple_training(datastore, set_output_std=False)

Copilot AI Mar 20, 2026

This new test runs a full run_simple_training() again, but dummydata_global is already included in DATASTORES and therefore covered by the existing parametrized test_training. This duplicates the most expensive part of the test suite and can significantly increase CI runtime. Consider converting this to a lightweight assertion-only test (boundary mask is all zeros), or folding the assertion into the existing parametrized test_training branch for dummydata_global so training is executed only once.

Suggested change
-    """Test that training works with a global-style datastore (boundary mask
-    all zeros). In a global domain there are no lateral boundaries, so the
-    model's predictions should be used everywhere without any boundary
-    forcing."""
-    datastore = GlobalDummyDatastore()
-    # Verify the global property: boundary mask should be all zeros
-    assert datastore.boundary_mask.sum() == 0, (
-        "GlobalDummyDatastore boundary_mask should be all zeros"
-    )
-    run_simple_training(datastore, set_output_std=False)
+    """Test global-style datastore boundary behavior (boundary mask all zeros).
+    Training with this datastore is already exercised via the parametrized
+    test_training, so this test only verifies the global boundary mask
+    property to avoid duplicate expensive training runs.
+    """
+    datastore = init_datastore_example("dummydata_global")
+    # Verify the global property: boundary mask should be all zeros
+    assert datastore.boundary_mask.sum() == 0, (
+        "GlobalDummyDatastore boundary_mask should be all zeros"
+    )

Comment on lines +234 to +239
    if da_mask.sum() > 0:
        # LAM-style datastore: has both boundary and interior points
        assert da_mask.sum() < da_mask.size
    else:
        # Global-style datastore: no boundary points (all interior)
        assert da_mask.sum() == 0

Copilot AI Mar 20, 2026

da_mask.sum() is computed multiple times and used in both an if and asserts. To make the intent clearer and avoid repeated reductions, consider computing a single scalar once (e.g., mask_sum = int(da_mask.sum())) and using that for the conditional and assertions.

Suggested change
-    if da_mask.sum() > 0:
-        # LAM-style datastore: has both boundary and interior points
-        assert da_mask.sum() < da_mask.size
-    else:
-        # Global-style datastore: no boundary points (all interior)
-        assert da_mask.sum() == 0
+    mask_sum = int(da_mask.sum())
+    if mask_sum > 0:
+        # LAM-style datastore: has both boundary and interior points
+        assert mask_sum < da_mask.size
+    else:
+        # Global-style datastore: no boundary points (all interior)
+        assert mask_sum == 0
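
For reference, the relaxed check can be written as a standalone function. This is a sketch under stated assumptions rather than the test in tests/test_datastores.py: the name `check_boundary_mask` and its string return value are hypothetical, and it takes plain values instead of an xr.DataArray.

```python
import numpy as np


def check_boundary_mask(mask_values):
    """Validate a 0/1 boundary mask and classify the domain.

    Hypothetical helper: returns "lam" when the mask has both boundary and
    interior points, and "global" when it has no boundary points.
    """
    values = np.asarray(mask_values)
    # Only 0 and 1 are allowed, but both need not be present
    assert set(values.tolist()).issubset({0, 1})

    # Reduce once and reuse the scalar for the branch and the assertion
    mask_sum = int(values.sum())
    if mask_sum > 0:
        # LAM-style: at least one interior point must remain
        assert mask_sum < values.size
        return "lam"
    # Values are restricted to {0, 1}, so a non-positive sum means all zeros
    return "global"


print(check_boundary_mask([1, 0, 0, 1]))
print(check_boundary_mask([0, 0, 0]))
```

Because the subset check already restricts values to 0 and 1, the `else` branch's `assert mask_sum == 0` in the suggestion can never fail; it documents intent rather than adding coverage.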

@RajdeepKushwaha5

Closes #445

Per maintainer feedback (joeloskarsson on mllam#445): a global setup should
not have a boundary mask at all. If a mask is present, it must not be
all zeros.

Changes:
- base.py: boundary_mask return type -> Optional[xr.DataArray]
- ar_model.py: handle None boundary_mask (all-zero tensor fallback)
- dummy_datastore.py: GlobalDummyDatastore.boundary_mask returns None
- test_datastores.py: test_boundary_mask handles None; restores strict
  == {0, 1} check when mask is present
- test_training.py: test_training_global asserts boundary_mask is None

Closes mllam#445
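
A rough sketch of the revised contract described above, assuming NumPy arrays in place of the xr.DataArray and tensor types used in the repository. The function `resolve_boundary_mask` is hypothetical and does not correspond to a specific function in ar_model.py.

```python
from typing import Optional

import numpy as np


def resolve_boundary_mask(
    boundary_mask: Optional[np.ndarray], num_grid_points: int
) -> np.ndarray:
    """Return the mask the model should use.

    Hypothetical helper illustrating the revised contract:
    - None means a global domain, so fall back to an all-zero mask.
    - A mask that is present must contain both 0 and 1.
    """
    if boundary_mask is None:
        return np.zeros(num_grid_points, dtype=int)
    if set(boundary_mask.tolist()) != {0, 1}:
        raise ValueError(
            "boundary_mask must contain both 0 and 1; "
            "return None for a global domain instead"
        )
    return boundary_mask


global_mask = resolve_boundary_mask(None, num_grid_points=4)
lam_mask = resolve_boundary_mask(np.array([1, 0, 0, 1]), num_grid_points=4)
```

Representing "no boundary" as `None` keeps the strict `== {0, 1}` check meaningful for every mask that does exist, while the fallback confines the all-zero case to the model side.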

Development

Successfully merging this pull request may close these issues.

test_boundary_mask rejects valid global (all-zero) boundary masks

2 participants