Add GlobalDummyDatastore for global domain testing#444
Add GlobalDummyDatastore for global domain testing#444RajdeepKushwaha5 wants to merge 1 commit intomllam:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
Adds a new global-domain dummy datastore variant to enable testing scenarios with no lateral boundary forcing, and updates tests to accept/cover all-zero boundary masks as a valid “global” case.
Changes:
- Introduces
GlobalDummyDatastore(boundary mask = all zeros) and registers it in the test datastore registry. - Updates
test_boundary_maskto allow global (all-zero) masks in addition to LAM-style masks. - Adds an explicit global training test that runs end-to-end training using the new datastore.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| tests/dummy_datastore.py | Adds GlobalDummyDatastore overriding boundary_mask to represent a global domain. |
| tests/conftest.py | Registers the new datastore kind so existing parametrized tests cover it. |
| tests/test_datastores.py | Relaxes boundary mask assertions to allow all-zero masks for global domains. |
| tests/test_training.py | Adds a dedicated global training test and imports GlobalDummyDatastore. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
tests/dummy_datastore.py
Outdated
| The boundary mask with all zeros, with dimensions | ||
| ``('grid_index',)``. | ||
| """ | ||
| return xr.DataArray( | ||
| np.zeros(self.num_grid_points, dtype=int), | ||
| dims=["grid_index"], | ||
| ) |
There was a problem hiding this comment.
GlobalDummyDatastore.boundary_mask returns a new DataArray without the grid_index coordinates (MultiIndex) and without a name, unlike DummyDatastore.boundary_mask and the real datastores. This can break xarray alignment if the mask is ever combined with other datastore DataArrays by coordinates. Consider constructing the all-zero mask via xr.zeros_like(self.ds["boundary_mask"]).astype(int) (or explicitly passing coords={"grid_index": self.ds.grid_index} and name="boundary_mask").
| The boundary mask with all zeros, with dimensions | |
| ``('grid_index',)``. | |
| """ | |
| return xr.DataArray( | |
| np.zeros(self.num_grid_points, dtype=int), | |
| dims=["grid_index"], | |
| ) | |
| The boundary mask with all zeros, preserving the coordinates and | |
| name of the underlying datastore's ``boundary_mask`` variable. | |
| """ | |
| return xr.zeros_like(self.ds["boundary_mask"]).astype(int) |
tests/test_training.py
Outdated
| """Test that training works with a global-style datastore (boundary mask | ||
| all zeros). In a global domain there are no lateral boundaries, so the | ||
| model's predictions should be used everywhere without any boundary | ||
| forcing.""" | ||
| datastore = GlobalDummyDatastore() | ||
|
|
||
| # Verify the global property: boundary mask should be all zeros | ||
| assert datastore.boundary_mask.sum() == 0, ( | ||
| "GlobalDummyDatastore boundary_mask should be all zeros" | ||
| ) | ||
|
|
||
| run_simple_training(datastore, set_output_std=False) |
There was a problem hiding this comment.
This new test runs a full run_simple_training() again, but dummydata_global is already included in DATASTORES and therefore covered by the existing parametrized test_training. This duplicates the most expensive part of the test suite and can significantly increase CI runtime. Consider converting this to a lightweight assertion-only test (boundary mask is all zeros), or folding the assertion into the existing parametrized test_training branch for dummydata_global so training is executed only once.
| """Test that training works with a global-style datastore (boundary mask | |
| all zeros). In a global domain there are no lateral boundaries, so the | |
| model's predictions should be used everywhere without any boundary | |
| forcing.""" | |
| datastore = GlobalDummyDatastore() | |
| # Verify the global property: boundary mask should be all zeros | |
| assert datastore.boundary_mask.sum() == 0, ( | |
| "GlobalDummyDatastore boundary_mask should be all zeros" | |
| ) | |
| run_simple_training(datastore, set_output_std=False) | |
| """Test global-style datastore boundary behavior (boundary mask all zeros). | |
| Training with this datastore is already exercised via the parametrized | |
| test_training, so this test only verifies the global boundary mask | |
| property to avoid duplicate expensive training runs. | |
| """ | |
| datastore = init_datastore_example("dummydata_global") | |
| # Verify the global property: boundary mask should be all zeros | |
| assert datastore.boundary_mask.sum() == 0, ( | |
| "GlobalDummyDatastore boundary_mask should be all zeros" | |
| ) |
tests/test_datastores.py
Outdated
| if da_mask.sum() > 0: | ||
| # LAM-style datastore: has both boundary and interior points | ||
| assert da_mask.sum() < da_mask.size | ||
| else: | ||
| # Global-style datastore: no boundary points (all interior) | ||
| assert da_mask.sum() == 0 |
There was a problem hiding this comment.
da_mask.sum() is computed multiple times and used in both an if and asserts. To make the intent clearer and avoid repeated reductions, consider computing a single scalar once (e.g., mask_sum = int(da_mask.sum())) and using that for the conditional and assertions.
| if da_mask.sum() > 0: | |
| # LAM-style datastore: has both boundary and interior points | |
| assert da_mask.sum() < da_mask.size | |
| else: | |
| # Global-style datastore: no boundary points (all interior) | |
| assert da_mask.sum() == 0 | |
| mask_sum = int(da_mask.sum()) | |
| if mask_sum > 0: | |
| # LAM-style datastore: has both boundary and interior points | |
| assert mask_sum < da_mask.size | |
| else: | |
| # Global-style datastore: no boundary points (all interior) | |
| assert mask_sum == 0 |
|
Closes #445 |
983021b to
ed641f2
Compare
ed641f2 to
cd17add
Compare
Per maintainer feedback (joeloskarsson on mllam#445): a global setup should not have a boundary mask at all. If a mask is present, it must not be all zeros. Changes: - base.py: boundary_mask return type -> Optional[xr.DataArray] - ar_model.py: handle None boundary_mask (all-zero tensor fallback) - dummy_datastore.py: GlobalDummyDatastore.boundary_mask returns None - test_datastores.py: test_boundary_mask handles None; restores strict == {0, 1} check when mask is present - test_training.py: test_training_global asserts boundary_mask is None Closes mllam#445
cd17add to
4702dcb
Compare
Describe your changes
Adds a
GlobalDummyDatastoretest variant that simulates a global domain (no lateral boundaries) by returning an all-zero boundary mask. This enables testing the model's behavior when there is no boundary forcing — a prerequisite for global weather forecasting support.Changes:
tests/dummy_datastore.py: AddedGlobalDummyDatastoresubclass ofDummyDatastorewithSHORT_NAME = "dummydata_global"that overridesboundary_maskto return all zeros.tests/conftest.py: RegisteredGlobalDummyDatastoreinDATASTORESandDATASTORES_EXAMPLESso all parametrized datastore tests automatically cover it.tests/test_datastores.py: Relaxedtest_boundary_maskto accept all-zero masks (global domains). Changedset(da_mask.values) == {0, 1}toset(da_mask.values).issubset({0, 1})and replaced rigid sum checks with conditional logic for LAM vs global datastores.tests/test_training.py: Addedtest_training_globalthat verifies training works end-to-end with an all-zero boundary mask (i.e.,unroll_predictionuses only model predictions with no boundary forcing).Motivation: As part of enabling global forecasting (#63), the test infrastructure needs to accommodate datastores without lateral boundaries. Currently
test_boundary_maskrejects valid all-zero masks, blocking any global datastore from passing the existing test suite. This PR fixes that gap and adds dedicated global training coverage.Issue Link
Closes #445
Part of global forecasting support: #63
Type of change
Checklist before requesting a review
pullwith--rebaseoption if possible).Checklist for reviewers
Each PR comes with its own improvements and flaws. The reviewer should check the following:
Author checklist after completed review
reflecting type of change (add section where missing):
Checklist for assignee