feat: Zarr export bridge for ensemble probabilistic forecasts #521

Open
Debadri-das wants to merge 4 commits into mllam:main from
Debadri-das:feature/520-zarr-export-bridge

Debadri-das commented Mar 26, 2026

Describe your changes

This PR adds comprehensive support for 4D ensemble tensors (S, T, N, F) to enable probabilistic weather forecast exports to Zarr format for mllam-verification integration.

Problem: The neural-lam codebase handled only 2D/3D deterministic forecasts and had no support for 4D ensemble tensors, so probabilistic predictions could not be exported to Zarr for external verification with mllam-verification.

Solution: Implemented full ensemble support across the data pipeline:

  1. WeatherDataset (neural_lam/weather_dataset.py):

    • Added 4D tensor shape validation with elif len(tensor.shape) == 4: branch
    • Implemented ensemble_member dimension in dims list: ["ensemble_member", "time", "grid_index", f"{category}_feature"]
    • Added ensemble_member coordinate binding logic with fallback to np.arange()
    • Updated docstring documenting 2D/3D/4D tensor support
    • Maintains full backward compatibility with existing deterministic workflows
  2. ARModel Documentation (neural_lam/models/ar_model.py):

    • Updated _create_dataarray_from_tensor() docstring with detailed ensemble support documentation
    • Clarified parameter expectations for 3D (deterministic) vs 4D (ensemble) tensors
    • Added Returns section documenting coordinate structure for Zarr export
  3. ARModel Ensemble Handling (neural_lam/models/ar_model.py):

    • Modified unroll_prediction() to detect and handle 5D ensemble inputs (vs 4D deterministic)
    • Reshapes tensors for consistent processing: (B, S, T, N, F) → (B*S, T, N, F)
    • Reshapes predictions back to ensemble form after stacking: (B*S, T, N, F) → (B, S, T, N, F)
    • Maintains deterministic behavior for 4D inputs
  4. Test Coverage (tests/test_datasets.py):

    • Added test_dataset_item_create_dataarray_from_tensor_4d_ensemble() test
    • Verifies 4D tensor conversion to xarray DataArray with correct dimensions
    • Validates ensemble_member coordinate exists with correct size
    • Checks time coordinate mapping and tensor value preservation
  5. Documentation (CHANGELOG.md):
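
The 3D/4D dimension handling described in item 1 can be sketched as follows. This is a minimal illustration, not the code from this PR: the helper name `tensor_to_dataarray`, its signature, and the example date are assumptions made for the sketch, and NumPy arrays stand in for PyTorch tensors. Only the dimension names and the `np.arange()` fallback for the `ensemble_member` coordinate are taken from the description above.

```python
import numpy as np
import xarray as xr


def tensor_to_dataarray(values, time, category="state", ensemble_members=None):
    """Wrap a 3D (T, N, F) or 4D (S, T, N, F) array in an xarray.DataArray.

    Hypothetical helper for illustration; not the neural-lam implementation.
    """
    values = np.asarray(values)
    feature_dim = f"{category}_feature"
    if values.ndim == 3:
        # Deterministic forecast: no ensemble dimension.
        dims = ["time", "grid_index", feature_dim]
        coords = {"time": time}
    elif values.ndim == 4:
        # Ensemble forecast: leading dimension indexes the members.
        dims = ["ensemble_member", "time", "grid_index", feature_dim]
        if ensemble_members is None:
            # Fallback: label members 0..S-1.
            ensemble_members = np.arange(values.shape[0])
        coords = {"ensemble_member": ensemble_members, "time": time}
    else:
        raise ValueError(f"Expected a 3D or 4D array, got shape {values.shape}")
    return xr.DataArray(values, dims=dims, coords=coords)


# Example: S=3 members, T=4 hourly steps, N=5 grid points, F=2 features.
S, T, N, F = 3, 4, 5, 2
data = np.random.default_rng(0).normal(size=(S, T, N, F))
time = np.datetime64("2026-03-26T00") + np.arange(T) * np.timedelta64(1, "h")
da = tensor_to_dataarray(data, time)
print(da.dims)
```

With the optional zarr package installed, a DataArray built this way could then be written out with xarray's `da.to_dataset(name="state").to_zarr(path)`; the variable name `"state"` is likewise only an example.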

Motivation and Context

As stated in #62, the core evaluation pipeline is moving toward mllam-verification (supported by the scores package). To support probabilistic models such as Graph-EFM, we need a bridge that exports ensemble predictions to Zarr. Without it, ensemble forecasts cannot be used with external verification tools.

Dependencies

No new dependencies are required. All code uses existing packages: PyTorch, xarray, and numpy.

Issue Link

#520

Type of change

  • 🐛 Bug fix (non-breaking change that fixes an issue)
  • ✨ New feature (non-breaking change that adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • 📖 Documentation (Addition or improvements to documentation)

Checklist before requesting a review

  • My branch is up-to-date with the target branch
  • I have performed a self-review of my code
  • For any new/modified functions/classes I have added docstrings that clearly describe its purpose, expected inputs and returned values
  • I have placed in-line comments to clarify the intent of any hard-to-understand passages of my code
  • I have updated the README to cover introduced code changes (not applicable - internal API change)
  • I have added tests that prove my feature works
  • I have given the PR a name that clearly describes the change, written in imperative form
  • I have requested a reviewer and an assignee (no write access - requesting maintainer to add)

Commits

  1. Add 4D ensemble tensor support to WeatherDataset - Implements shape validation, dimension mapping, and coordinate binding for 4D tensors
  2. Document ensemble support in ARModel._create_dataarray_from_tensor() - Updates docstring with 4D tensor documentation
  3. Add test for 4D ensemble DataArray creation - Comprehensive test coverage for ensemble DataArray conversion
  4. Update CHANGELOG for ensemble Zarr export support - Documents all changes with a reference to issue #520

Related Issues

- Support 4D tensors (S, T, N, F) for ensemble predictions
- Add ensemble_member dimension to dims list
- Implement ensemble_member coordinate binding logic
- Maintain backward compatibility with 2D/3D tensors
- Update docstring to document ensemble support

Fixes mllam#520
- Update docstring to document 4D tensor support
- Clarify parameter expectations for ensemble vs deterministic
- Add Returns section with coordinate details
- Enable Zarr export for probabilistic forecasts

Refs mllam#520
- Test tensor with shape (S, T, N, F) where S=ensemble members
- Verify ensemble_member coordinate exists and has correct size
- Verify time coordinate is properly mapped
- Verify all dimensions are correctly named
- Ensure values are preserved in conversion

Refs mllam#520
Document all changes related to issue mllam#520:
- weather_dataset.py 4D tensor support with ensemble_member coordinate
- ar_model.py ensemble prediction handling in _create_dataarray_from_tensor()
- ar_model.py ensemble dimension stacking in unroll_prediction()
- Test coverage for 4D ensemble DataArray creation
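
The ensemble dimension stacking in unroll_prediction() listed above amounts to folding the ensemble axis into the batch axis and unfolding it afterwards. The sketch below shows only that reshape pattern under stated assumptions: `toy_step` and `run_with_optional_ensemble` are hypothetical names, the "model" is a placeholder, and NumPy is used in place of PyTorch (the same `reshape` pattern applies to contiguous tensors in PyTorch).

```python
import numpy as np


def toy_step(x):
    """Placeholder for a model that only accepts (batch, T, N, F) inputs."""
    assert x.ndim == 4
    return x * 2.0


def run_with_optional_ensemble(x):
    """Fold an ensemble axis into the batch axis, run the model, unfold."""
    if x.ndim == 5:  # (B, S, T, N, F): ensemble input
        B, S = x.shape[:2]
        flat = x.reshape(B * S, *x.shape[2:])  # (B*S, T, N, F)
        out = toy_step(flat)
        return out.reshape(B, S, *out.shape[1:])  # back to (B, S, T, N, F)
    return toy_step(x)  # (B, T, N, F): deterministic path unchanged


B, S, T, N, F = 2, 3, 4, 5, 6
ens = np.arange(B * S * T * N * F, dtype=np.float64).reshape(B, S, T, N, F)
out = run_with_optional_ensemble(ens)
print(out.shape)
```

Because the reshape is row-major, member `s` of batch item `b` lands at flat index `b * S + s` and is restored to position `[b, s]` afterwards, so member ordering is preserved.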

Closes mllam#520
@Debadri-das
Author

Hi @joeloskarsson @sadamov, I am seeing a local failure in test_clamping.py::test_clamping caused by an ongoing NaN bounds check issue. It is unrelated to these DataArray modifications. Should we open a separate issue to track it?

Development

Successfully merging this pull request may close these issues.

[Feature] Zarr Export Bridge: Addition of ensemble support to DataArray creation for mllam-verification
