@clessig clessig commented Sep 4, 2025

Description

Data reader for station-like data in simple netCDF format

Type of Change

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • Documentation update

Issue Number

Closes #862

Code Compatibility

  • I have performed a self-review of my code

Code Performance and Testing

  • I ran the uv run train and (if necessary) uv run evaluate on at least one GPU node and it works
  • If the new feature introduces modifications at the config level, I have made sure to have notified the other software developers through Mattermost and updated the paths in the $WEATHER_GENERATOR_PRIVATE directory

Dependencies

  • I have ensured that the code is still pip-installable after the changes and runs
  • I have tested that new dependencies themselves are pip-installable.
  • I have not introduced new dependencies in the inference portion of the pipeline

Documentation

  • My code follows the style guidelines of this project
  • I have updated the documentation and docstrings to reflect the changes
  • I have added comments to my code, particularly in hard-to-understand areas

Additional Notes

self.len = len(ds)

self.offset_data_channels = 4
self.fillvalue = ds["air_temperature"][0, 0].values.item()

I think we should require users to set the _FillValue attribute in their NetCDF files whenever data are missing. I forgot to do this in my own file. (Strictly speaking, if you use the default fill value of 9.96921e+36 you don't need to set the attribute, but xarray doesn't recognize the default.)

When _FillValue is set, xr.open_dataset will automatically convert missing values to NaNs.
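A minimal in-memory sketch of the behaviour described above, not taken from the PR. It uses `xr.decode_cf`, which applies CF-style decoding (including `_FillValue` masking) to an already constructed dataset, so no file is needed; the variable name, dimension names, and values are made up for illustration.

```python
import numpy as np
import xarray as xr

# Default netCDF fill value for floats, used here as the "missing" marker.
FILL = np.float32(9.96921e36)

# Hypothetical raw data: two time steps, two locations, two missing entries.
raw = xr.Dataset(
    {
        "air_temperature": (
            ("time", "location"),
            np.array([[280.5, FILL], [FILL, 275.0]], dtype="float32"),
        )
    }
)

# Without a _FillValue attribute the fill values pass through unchanged.
without_attr = xr.decode_cf(raw)

# With the attribute set, decoding masks the fill values as NaN.
raw_with_attr = raw.copy(deep=True)
raw_with_attr["air_temperature"].attrs["_FillValue"] = FILL
with_attr = xr.decode_cf(raw_with_attr)

print(without_attr["air_temperature"].values)
print(with_attr["air_temperature"].values)
```

If the attribute is set in the file, the reader would not need to infer a fill value from the data itself.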

self.ds = ds
self.len = len(ds)

self.offset_data_channels = 4

Line 91 can be removed now that the user explicitly selects channels.



class DataReaderSynop(DataReaderTimestep):
    "Wrapper for SYNOP datasets from MetNo in NetCDF"

Would it be useful to specify the requirements of the NetCDF file? Here is a suggestion:

Generic parser for station data in NetCDF format. The file must have 2 dimensions: time and location.
- Data variables must have their dimensions in the order (time, location). The dimension names can be anything.
- Geoinfo variables must have dimension (location,).
- Any variable with missing values must have the _FillValue attribute set.
- Latitude and longitude variables with dimension (location,) must be provided. Their units must be degrees, and the variable names can be configured.
- A variable called time must be provided with dimension (time,). It must have a units attribute that follows CF conventions.
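The requirements above could also be checked programmatically when a file is opened. The following is only a sketch under assumptions: `check_station_dataset`, its parameters, and the example dataset are hypothetical and not part of the PR, and the `_FillValue` rule is not checked because missing values cannot be detected reliably without it.

```python
import numpy as np
import xarray as xr


def check_station_dataset(ds, lat_name="latitude", lon_name="longitude"):
    """Return a list of problems; an empty list means all checks passed.

    Hypothetical helper: the default variable names are assumptions.
    """
    problems = []
    if len(ds.sizes) != 2:
        problems.append(f"expected 2 dimensions, found {len(ds.sizes)}")

    # The time variable must exist, be 1-D, and carry a units attribute.
    if "time" not in ds.variables:
        return problems + ["missing variable 'time'"]
    time_var = ds.variables["time"]
    if time_var.ndim != 1:
        return problems + ["'time' must be one-dimensional"]
    # After CF decoding xarray moves 'units' from attrs to encoding.
    if "units" not in time_var.attrs and "units" not in time_var.encoding:
        problems.append("'time' has no units attribute")
    time_dim = time_var.dims[0]

    # Latitude and longitude must exist, be 1-D, and be given in degrees.
    loc_dim = None
    for name in (lat_name, lon_name):
        if name not in ds.variables:
            problems.append(f"missing variable '{name}'")
            continue
        var = ds.variables[name]
        if var.ndim != 1:
            problems.append(f"'{name}' must be one-dimensional")
            continue
        if "degree" not in str(var.attrs.get("units", "")):
            problems.append(f"'{name}' units must be degrees")
        loc_dim = var.dims[0]

    # Every 2-D data variable must be ordered (time, location).
    if loc_dim is not None:
        for name, var in ds.data_vars.items():
            if var.ndim == 2 and var.dims != (time_dim, loc_dim):
                problems.append(f"'{name}' has dims {var.dims}")
    return problems


# Made-up example data: three time steps at two stations.
ds = xr.Dataset(
    data_vars={
        "air_temperature": (("time", "station"), np.zeros((3, 2), dtype="float32")),
        "latitude": (("station",), np.array([59.9, 60.4]), {"units": "degrees_north"}),
        "longitude": (("station",), np.array([10.7, 5.3]), {"units": "degrees_east"}),
    },
    coords={"time": (("time",), np.arange(3), {"units": "hours since 2025-01-01 00:00:00"})},
)
bad = ds.transpose("station", "time")  # wrong dimension order for the data variable

print(check_station_dataset(ds))
print(check_station_dataset(bad))
```

Returning a list of messages rather than raising on the first failure lets a user fix all problems in a file in one pass.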

Successfully merging this pull request may close these issues.

Data reader for simple station/synop data in netCDF format
2 participants