Add GlobalDummyDatastore for global domain testing by RajdeepKushwaha5 · Pull Request #444 · mllam/neural-lam

RajdeepKushwaha5 · 2026-03-20T02:52:49Z

Describe your changes

Adds a GlobalDummyDatastore test variant that simulates a global domain (no lateral boundaries) by returning an all-zero boundary mask. This enables testing the model's behavior when there is no boundary forcing — a prerequisite for global weather forecasting support.

Changes:

tests/dummy_datastore.py: Added GlobalDummyDatastore subclass of DummyDatastore with SHORT_NAME = "dummydata_global" that overrides boundary_mask to return all zeros.
tests/conftest.py: Registered GlobalDummyDatastore in DATASTORES and DATASTORES_EXAMPLES so all parametrized datastore tests automatically cover it.
tests/test_datastores.py: Relaxed test_boundary_mask to accept all-zero masks (global domains). Changed set(da_mask.values) == {0, 1} to set(da_mask.values).issubset({0, 1}) and replaced rigid sum checks with conditional logic for LAM vs global datastores.
tests/test_training.py: Added test_training_global that verifies training works end-to-end with an all-zero boundary mask (i.e., unroll_prediction uses only model predictions with no boundary forcing).

Motivation: As part of enabling global forecasting (#63), the test infrastructure needs to accommodate datastores without lateral boundaries. Currently test_boundary_mask rejects valid all-zero masks, blocking any global datastore from passing the existing test suite. This PR fixes that gap and adds dedicated global training coverage.
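
The relaxed check described for tests/test_datastores.py can be sketched roughly as follows. This is an illustrative sketch on plain NumPy arrays, not the actual test code: `check_boundary_mask` and its return labels are hypothetical names introduced here.

```python
import numpy as np


def check_boundary_mask(mask: np.ndarray) -> str:
    """Validate a 0/1 boundary mask and classify the domain (hypothetical helper)."""
    values = set(np.unique(mask).tolist())
    # Relaxed check: an all-zero mask is valid, so only require a subset of {0, 1}
    assert values.issubset({0, 1}), f"unexpected mask values: {values}"

    # Reduce once and reuse the scalar for the branch and the assertion
    mask_sum = int(mask.sum())
    if mask_sum > 0:
        # LAM-style: boundary and interior points must both be present
        assert mask_sum < mask.size, "mask marks every point as boundary"
        return "lam"
    # Global-style: no boundary points at all
    return "global"


lam_mask = np.array([1, 1, 0, 0, 0, 1])
global_mask = np.zeros(6, dtype=int)
print(check_boundary_mask(lam_mask))
print(check_boundary_mask(global_mask))
```

A strict `== {0, 1}` comparison would fail on `global_mask`, since its value set is `{0}`; the subset test accepts it while still rejecting any value outside 0 and 1.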

Issue Link

Closes #445

Part of global forecasting support: #63

Type of change

🐛 Bug fix (non-breaking change that fixes an issue)
✨ New feature (non-breaking change that adds functionality)
💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

My branch is up-to-date with the target branch - if not update your fork with the changes from the target branch (use pull with --rebase option if possible).
I have performed a self-review of my code
For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
I have updated the README to cover introduced code changes
I have added tests that prove my fix is effective or that my feature works
I have given the PR a name that clearly describes the change, written in imperative form (context).
I have requested a reviewer and an assignee (assignee is responsible for merging). This applies only if you have write access to the repo, otherwise feel free to tag a maintainer to add a reviewer and assignee.

Checklist for reviewers

Each PR comes with its own improvements and flaws. The reviewer should check the following:

the code is readable
the code is well tested
the code is documented (including return types and parameters)
the code is easy to maintain

Author checklist after completed review

I have added a line to the CHANGELOG describing this change, in a section
reflecting type of change (add section where missing):
- added: when you have added new functionality
- changed: when default behaviour of the code has been changed
- fixes: when your contribution fixes a bug
- maintenance: when your contribution relates to repo maintenance, e.g. CI/CD or documentation

Checklist for assignee

PR is up to date with the base branch

Copilot

Pull request overview

Adds a new global-domain dummy datastore variant to enable testing scenarios with no lateral boundary forcing, and updates tests to accept/cover all-zero boundary masks as a valid “global” case.

Changes:

Introduces GlobalDummyDatastore (boundary mask = all zeros) and registers it in the test datastore registry.
Updates test_boundary_mask to allow global (all-zero) masks in addition to LAM-style masks.
Adds an explicit global training test that runs end-to-end training using the new datastore.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| tests/dummy_datastore.py | Adds `GlobalDummyDatastore` overriding `boundary_mask` to represent a global domain. |
| tests/conftest.py | Registers the new datastore kind so existing parametrized tests cover it. |
| tests/test_datastores.py | Relaxes boundary mask assertions to allow all-zero masks for global domains. |
| tests/test_training.py | Adds a dedicated global training test and imports `GlobalDummyDatastore`. |

Copilot · 2026-03-20T02:57:10Z

tests/dummy_datastore.py

+            The boundary mask with all zeros, with dimensions
+            ``('grid_index',)``.
+        """
+        return xr.DataArray(
+            np.zeros(self.num_grid_points, dtype=int),
+            dims=["grid_index"],
+        )


GlobalDummyDatastore.boundary_mask returns a new DataArray without the grid_index coordinates (MultiIndex) and without a name, unlike DummyDatastore.boundary_mask and the real datastores. This can break xarray alignment if the mask is ever combined with other datastore DataArrays by coordinates. Consider constructing the all-zero mask via xr.zeros_like(self.ds["boundary_mask"]).astype(int) (or explicitly passing coords={"grid_index": self.ds.grid_index} and name="boundary_mask").

Suggested change

-            The boundary mask with all zeros, with dimensions
-            ``('grid_index',)``.
-        """
-        return xr.DataArray(
-            np.zeros(self.num_grid_points, dtype=int),
-            dims=["grid_index"],
-        )
+            The boundary mask with all zeros, preserving the coordinates and
+            name of the underlying datastore's ``boundary_mask`` variable.
+        """
+        return xr.zeros_like(self.ds["boundary_mask"]).astype(int)
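
The difference between the two constructions can be seen on a toy array. The sketch below assumes a simple integer coordinate on `grid_index` rather than the MultiIndex mentioned in the comment, and the variable names and values are made up for illustration.

```python
import numpy as np
import xarray as xr

# Toy stand-in for a datastore's boundary_mask variable: a named DataArray
# with a coordinate on grid_index (values are made up for illustration).
n = 5
da_lam = xr.DataArray(
    np.array([1, 0, 0, 0, 1]),
    dims=["grid_index"],
    coords={"grid_index": np.arange(10, 10 + n)},
    name="boundary_mask",
)

# Built from scratch: only the dimension name is set, no coordinate, no name
da_bare = xr.DataArray(np.zeros(n, dtype=int), dims=["grid_index"])

# Built from the existing variable: coordinates and name carry over
da_like = xr.zeros_like(da_lam).astype(int)

print(da_bare.name, "grid_index" in da_bare.coords)
print(da_like.name, "grid_index" in da_like.coords)
```

Both arrays hold the same zeros; only `da_like` keeps the labels that coordinate-based alignment with other datastore arrays would rely on.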

Copilot · 2026-03-20T02:57:11Z

tests/test_training.py

+    """Test that training works with a global-style datastore (boundary mask
+    all zeros).  In a global domain there are no lateral boundaries, so the
+    model's predictions should be used everywhere without any boundary
+    forcing."""
+    datastore = GlobalDummyDatastore()
+
+    # Verify the global property: boundary mask should be all zeros
+    assert datastore.boundary_mask.sum() == 0, (
+        "GlobalDummyDatastore boundary_mask should be all zeros"
+    )
+
+    run_simple_training(datastore, set_output_std=False)


This new test runs a full run_simple_training() again, but dummydata_global is already included in DATASTORES and therefore covered by the existing parametrized test_training. This duplicates the most expensive part of the test suite and can significantly increase CI runtime. Consider converting this to a lightweight assertion-only test (boundary mask is all zeros), or folding the assertion into the existing parametrized test_training branch for dummydata_global so training is executed only once.

Suggested change

-    """Test that training works with a global-style datastore (boundary mask
-    all zeros).  In a global domain there are no lateral boundaries, so the
-    model's predictions should be used everywhere without any boundary
-    forcing."""
-    datastore = GlobalDummyDatastore()
-
-    # Verify the global property: boundary mask should be all zeros
-    assert datastore.boundary_mask.sum() == 0, (
-        "GlobalDummyDatastore boundary_mask should be all zeros"
-    )
-
-    run_simple_training(datastore, set_output_std=False)
+    """Test global-style datastore boundary behavior (boundary mask all zeros).
+
+    Training with this datastore is already exercised via the parametrized
+    test_training, so this test only verifies the global boundary mask
+    property to avoid duplicate expensive training runs.
+    """
+    datastore = init_datastore_example("dummydata_global")
+
+    # Verify the global property: boundary mask should be all zeros
+    assert datastore.boundary_mask.sum() == 0, (
+        "GlobalDummyDatastore boundary_mask should be all zeros"
+    )

Copilot · 2026-03-20T02:57:11Z

tests/test_datastores.py

+    if da_mask.sum() > 0:
+        # LAM-style datastore: has both boundary and interior points
+        assert da_mask.sum() < da_mask.size
+    else:
+        # Global-style datastore: no boundary points (all interior)
+        assert da_mask.sum() == 0


da_mask.sum() is computed multiple times and used in both an if and asserts. To make the intent clearer and avoid repeated reductions, consider computing a single scalar once (e.g., mask_sum = int(da_mask.sum())) and using that for the conditional and assertions.

Suggested change

-    if da_mask.sum() > 0:
-        # LAM-style datastore: has both boundary and interior points
-        assert da_mask.sum() < da_mask.size
-    else:
-        # Global-style datastore: no boundary points (all interior)
-        assert da_mask.sum() == 0
+    mask_sum = int(da_mask.sum())
+    if mask_sum > 0:
+        # LAM-style datastore: has both boundary and interior points
+        assert mask_sum < da_mask.size
+    else:
+        # Global-style datastore: no boundary points (all interior)
+        assert mask_sum == 0

RajdeepKushwaha5 · 2026-03-20T03:11:38Z

Closes #445

Per maintainer feedback (joeloskarsson on mllam#445): a global setup should not have a boundary mask at all. If a mask is present, it must not be all zeros.

Changes:

- base.py: boundary_mask return type -> Optional[xr.DataArray]
- ar_model.py: handle None boundary_mask (all-zero tensor fallback)
- dummy_datastore.py: GlobalDummyDatastore.boundary_mask returns None
- test_datastores.py: test_boundary_mask handles None; restores strict == {0, 1} check when mask is present
- test_training.py: test_training_global asserts boundary_mask is None

Closes mllam#445
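
The None-handling described for ar_model.py is not shown in this thread. A rough sketch of the idea, using NumPy arrays in place of model tensors, might look like this. `resolve_boundary_mask` and `blend_step` are hypothetical names, and the blending formula is an assumption about how a boundary mask is typically applied during an autoregressive step.

```python
from typing import Optional

import numpy as np


def resolve_boundary_mask(
    mask: Optional[np.ndarray], num_grid_points: int
) -> np.ndarray:
    """Return a float mask of shape (num_grid_points, 1).

    A mask of None (global domain) falls back to all zeros, meaning that no
    grid point is treated as boundary. Hypothetical helper.
    """
    if mask is None:
        return np.zeros((num_grid_points, 1), dtype=np.float32)
    return mask.astype(np.float32).reshape(num_grid_points, 1)


def blend_step(pred, boundary_state, mask):
    """Use forcing values where mask == 1 and predictions where mask == 0."""
    return mask * boundary_state + (1.0 - mask) * pred


num_grid_points, num_features = 4, 2
pred = np.full((num_grid_points, num_features), 2.0, dtype=np.float32)
forcing = np.full((num_grid_points, num_features), 9.0, dtype=np.float32)

global_mask = resolve_boundary_mask(None, num_grid_points)
lam_mask = resolve_boundary_mask(np.array([1, 0, 0, 1]), num_grid_points)

global_state = blend_step(pred, forcing, global_mask)  # predictions everywhere
lam_state = blend_step(pred, forcing, lam_mask)  # forcing on rows 0 and 3
```

With the zero fallback, downstream code can keep a single blending expression for both LAM and global datastores, while the datastore interface still distinguishes "no mask" (None) from an invalid all-zero mask.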

Copilot AI review requested due to automatic review settings March 20, 2026 02:52

Copilot started reviewing on behalf of RajdeepKushwaha5 March 20, 2026 02:53

Copilot AI reviewed Mar 20, 2026

RajdeepKushwaha5 force-pushed the feat/global-dummy-datastore branch from 983021b to ed641f2 Compare March 20, 2026 04:41

sohampatil01-svg mentioned this pull request Mar 20, 2026

feat: Add GlobalDummyDatastore for global domain testing #453

Closed

11 tasks

This was referenced Mar 21, 2026

feat: add geometry utilities for global domains (spherical coords, area weights, KNN graph) #473

Draft

Merge global forecasting capabilities from prob_model_global #63

Open

RajdeepKushwaha5 force-pushed the feat/global-dummy-datastore branch from ed641f2 to cd17add Compare March 21, 2026 18:50

sohampatil01-svg mentioned this pull request Mar 21, 2026

fix/feat/docs: Consolidated PR for #458, #453, and #452 #481

Closed

11 tasks

RajdeepKushwaha5 force-pushed the feat/global-dummy-datastore branch from cd17add to 4702dcb Compare March 21, 2026 19:16

RajdeepKushwaha5 mentioned this pull request Mar 21, 2026

test_boundary_mask rejects valid global (all-zero) boundary masks #445

Open

This was referenced Mar 24, 2026

Implementation for issue #445: Global forecasting support in test infrastructure #509

Closed

refactor: model class hierarchy into Forecaster/StepPredictor layers #513

Closed

RajdeepKushwaha5 wants to merge 1 commit into mllam:main from RajdeepKushwaha5:feat/global-dummy-datastore
