Feature/high level object#919

Merged
sandorkertesz merged 13 commits into release/1.0.0rc0 from feature/high-level-object
Mar 10, 2026

@sandorkertesz (Collaborator) commented Mar 10, 2026

Description

from_source() and from_object() now return a high-level Data object with the following API:

>>> ds = from_source("file", "my.grib")
>>> ds.available_types
['fieldlist', 'xarray', 'pandas', 'numpy', 'array']
>>> fl = ds.to_fieldlist()
>>> len(fl)
6
>>> ds.to_target("file", "mypath/mydata.grib")
>>> ds = from_source("file", "my.nc")
>>> ds.available_types
['xarray', 'pandas', 'fieldlist', 'numpy', 'array']
>>> a = ds.to_xarray()
>>> a
xarray.Dataset
...

Data

The Data object is polymorphic. See the src/earthkit/data/data.

Features

  • Data objects have a describe() method, which is not yet fully implemented
  • Data loading is delayed for as long as possible
  • Data objects can be used in concat() and to_target(). Mixing Data and non-Data objects (e.g. fieldlists) in concat() is not allowed.
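
To illustrate the delayed loading and the concat() restriction listed above, here is a minimal, self-contained sketch. It does not use earthkit-data; LazyData, load_count, to_list() and this concat() are hypothetical stand-ins, and the real implementation may differ considerably.

```python
class LazyData:
    """Hypothetical stand-in for a Data object; not the earthkit-data class."""

    def __init__(self, path):
        self.path = path
        self._values = None  # nothing is read at construction time
        self.load_count = 0  # counts how often the (pretend) file is read

    def _load(self):
        # Pretend to read the file; the work only happens on first access.
        if self._values is None:
            self.load_count += 1
            self._values = [f"{self.path}:{i}" for i in range(3)]
        return self._values

    def to_list(self):
        return list(self._load())


def concat(*items):
    """Concatenate items, refusing to mix LazyData with other objects."""
    kinds = {isinstance(item, LazyData) for item in items}
    if len(kinds) > 1:
        raise TypeError("cannot mix Data and non-Data objects in concat()")
    if kinds == {True}:
        return [v for item in items for v in item.to_list()]
    return [v for item in items for v in item]
```

Constructing a LazyData does not touch the data; the first to_list() call does, and later calls reuse the cached values.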

Sources

Sources should implement the to_data_object() method to return an appropriate Data object. If it is not implemented, a DefaultSourceData is returned. This default was added so that Source plugins (if there are any) keep working. This might be reviewed later, and each source might be required to implement to_data_object().

The File source delegates the to_data_object() call to the underlying Reader object.
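
The fallback and the delegation described above can be sketched as follows. Only the names to_data_object() and DefaultSourceData come from this description; every class body, as well as GribData, GribReader, FileSource and PluginSource, is a hypothetical illustration and not the actual earthkit-data code.

```python
class DefaultSourceData:
    """Hypothetical fallback wrapper around a source."""

    def __init__(self, source):
        self.source = source


class Source:
    def to_data_object(self):
        # Fallback used when a subclass (e.g. a plugin) does not override this.
        return DefaultSourceData(self)


class GribData:
    """Hypothetical Data object produced by a reader."""

    def __init__(self, reader):
        self.reader = reader


class GribReader:
    def to_data_object(self):
        return GribData(self)


class FileSource(Source):
    def __init__(self, reader):
        self._reader = reader

    def to_data_object(self):
        # Delegate to the underlying reader, which knows the file format.
        return self._reader.to_data_object()


class PluginSource(Source):
    """A source that does not implement to_data_object() itself."""
```

With this layout, PluginSource().to_data_object() yields a DefaultSourceData, while a FileSource returns whatever its reader produces.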

from_source() now calls from_source_internally(), which still returns a Source. All existing internal calls to from_source() were replaced with from_source_internally().

Readers

Readers now must implement a to_data_object() method. They must also implement _encode_default().
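
One common way to enforce such a contract in Python is an abstract base class. The sketch below is an assumption about how this could look, not the actual Reader class: the _encode_default() signature, TextReader and IncompleteReader are invented for illustration.

```python
from abc import ABC, abstractmethod


class Reader(ABC):
    """Hypothetical base class; the real Reader has more responsibilities."""

    @abstractmethod
    def to_data_object(self):
        """Return the Data object wrapping this reader."""

    @abstractmethod
    def _encode_default(self, encoder):
        """Encode the content with the given encoder (signature assumed)."""


class TextReader(Reader):
    def __init__(self, text):
        self.text = text

    def to_data_object(self):
        return {"kind": "text", "reader": self}

    def _encode_default(self, encoder):
        return encoder(self.text)


class IncompleteReader(Reader):
    # _encode_default() is missing, so this class cannot be instantiated.
    def to_data_object(self):
        return {"kind": "incomplete", "reader": self}
```

Instantiating IncompleteReader raises a TypeError, so a missing method is caught when the reader is created rather than when it is first used.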

Encoders

The encoders were extended to almost all the types the Readers can handle. It is now ensured that in calls like this:

ds = from_source("url", "my_grib_file_from_a_url")
ds.to_target("file", "my.grib")

the files are copied to the target location with a simple file copy, without involving any encoding/parsing.
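
The idea of the copy shortcut can be sketched with the standard library alone. FileBackedData and to_file_target() are hypothetical names used for illustration only; the actual encoder code path in earthkit-data is not shown in this description.

```python
import shutil
from pathlib import Path


class FileBackedData:
    """Hypothetical Data object that knows the path of its backing file."""

    def __init__(self, path):
        self.path = Path(path)


def to_file_target(data, target_path):
    """Write data to target_path by copying the backing file byte for byte."""
    target = Path(target_path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # No decoding or re-encoding: the bytes are copied unchanged.
    shutil.copyfile(data.path, target)
    return target
```

Because the content is never parsed, the copy is byte-identical to the source and its cost does not depend on the data format.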

FeatureList

FeatureList is a new object similar to a FieldList, but it is an iterable of "features", where a "feature" can be anything. It is used for list-like data that cannot be represented as Fields. Right now the following data types can be converted to a FeatureList (by calling to_featurelist() on the Data object):

  • BUFR: a feature represents a BUFR message
  • GeoJson: a feature is a row in the underlying GeoPandas geodataframe
  • ShapeFile: a feature is a row in the underlying GeoPandas geodataframe
  • GeoPandas: a feature is a row in the underlying GeoPandas geodataframe

This is still work in progress, but the BUFR implementation is complete.
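
As a rough picture of the concept, a list-like container of arbitrary features could look like the sketch below. This is not the earthkit-data FeatureList; the class body is an assumption, and plain dicts stand in for the rows of a geodataframe.

```python
class FeatureList:
    """Hypothetical list-like container; a feature can be any object."""

    def __init__(self, features):
        self._features = list(features)

    def __len__(self):
        return len(self._features)

    def __iter__(self):
        return iter(self._features)

    def __getitem__(self, index):
        # Slices return a new FeatureList, single indices return the feature.
        if isinstance(index, slice):
            return FeatureList(self._features[index])
        return self._features[index]


# Plain dicts stand in for geodataframe rows in this sketch.
rows = [
    {"name": "A", "geometry": (0.0, 0.0)},
    {"name": "B", "geometry": (1.0, 2.0)},
    {"name": "C", "geometry": (3.0, 4.0)},
]
features = FeatureList(rows)
names = [feature["name"] for feature in features]
```

Here each feature is one row; for BUFR, each feature would instead be one message.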

Contributor Declaration

By opening this pull request, I affirm the following:

  • All authors agree to the Contributor License Agreement.
  • The code follows the project's coding standards.
  • I have performed self-review and added comments where needed.
  • I have added or updated tests to verify that my changes are effective and functional.
  • I have run all existing tests and confirmed they pass.

@sandorkertesz sandorkertesz merged commit 573b883 into release/1.0.0rc0 Mar 10, 2026
99 of 106 checks passed
@sandorkertesz sandorkertesz deleted the feature/high-level-object branch March 10, 2026 15:07