Import datatree in xarray? #7418
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,4 +41,5 @@ dependencies: | |
- sparse | ||
- toolz | ||
- typing_extensions | ||
- xarray-datatree | ||
- zarr |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -45,4 +45,5 @@ dependencies: | |
# - sparse | ||
- toolz | ||
- typing_extensions | ||
- xarray-datatree | ||
- zarr |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,4 +41,5 @@ dependencies: | |
# - sparse | ||
- toolz | ||
- typing_extensions | ||
- xarray-datatree | ||
- zarr |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -41,4 +41,5 @@ dependencies: | |
- sparse | ||
- toolz | ||
- typing_extensions | ||
- xarray-datatree | ||
- zarr |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -46,3 +46,5 @@ dependencies: | |
- toolz | ||
- typing_extensions | ||
- zarr | ||
- pip: | ||
- git+https://github.com/xarray-contrib/datatree | ||
Comment on lines +49 to +50
Is there a reason why we're installing from GitHub here?
Because I want to see if this commit to datatree fixes the mypy issue without releasing a whole new version of datatree just to check.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -156,6 +156,9 @@ to the original netCDF file, regardless if they exist in the original dataset. | |
Groups | ||
~~~~~~ | ||
|
||
Single groups as datasets | ||
......................... | ||
|
||
NetCDF groups are not supported as part of the :py:class:`Dataset` data model. | ||
Instead, groups can be loaded individually as Dataset objects. | ||
To do so, pass a ``group`` keyword argument to the | ||
|
@@ -228,10 +231,34 @@ Either of these groups can be loaded from the file as an independent :py:class:` | |
Data variables: | ||
b int64 ... | ||
|
||
.. note:: | ||
.. _io.netcdf_datatree_groups: | ||
|
||
Multiple Groups as a DataTree | ||
............................. | ||
|
||
For native handling of multiple groups with xarray, including I/O, you might be interested in the experimental | ||
`xarray-datatree <https://github.com/xarray-contrib/datatree>`_ package. | ||
If installed, this package's API can be imported directly from xarray, i.e. ``from xarray import DataTree``. | ||
|
||
Whilst netCDF groups can only be loaded individually as Dataset objects, a whole file of many nested groups can be loaded | ||
as a single :py:class:`DataTree` object. | ||
To open a whole netCDF file as a tree of groups use the :py:func:`open_datatree()` function. | ||
To save a DataTree object as a netCDF file containing many groups, use the :py:meth:`DataTree.to_netcdf()` method. | |
|
||
.. _netcdf.group.warning: | ||
|
||
.. warning:: | ||
``DataTree`` objects do not follow the exact same data model as netCDF files, which means that perfect round-tripping | ||
Is that intentionally preformatted, or would it make sense to convert it to a link? (that's really minor, though)
||
is not always possible. | ||
|
||
In particular in the netCDF data model dimensions are entities that can exist regardless of whether any variable possesses them. | ||
This is in contrast to `xarray's data model <https://docs.xarray.dev/en/stable/user-guide/data-structures.html>`_ | ||
(and hence `datatree's data model <https://xarray-datatree.readthedocs.io/en/latest/data-structures.html>`_) in which the dimensions of a (Dataset/Tree) | ||
object are simply the set of dimensions present across all variables in that dataset. | ||
|
||
For native handling of multiple groups with xarray, including I/O, you might be interested in the experimental | ||
`xarray-datatree <https://github.com/xarray-contrib/datatree>`_ package. | ||
This means that if a netCDF file contains dimensions but no variables which possess those dimensions, | ||
these dimensions will not be present when that file is opened as a DataTree object. | ||
Saving this DataTree object to file will therefore not preserve these "unused" dimensions. | ||
|
||
|
||
.. _io.encoding: | ||
|
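The "unused dimensions" caveat in the warning above can be illustrated with a toy model. This is only a sketch: plain dictionaries stand in for a netCDF group and for a tree node, and the helper names `open_group` and `save_node` are hypothetical, not part of xarray, datatree, or any netCDF library. It assumes, as the warning states, that a node's dimensions are just the set of dimensions used by its variables.

```python
# Toy model only: dicts stand in for a netCDF group and an xarray-style node.
# `open_group` and `save_node` are hypothetical helpers, not a real API.


def open_group(group):
    """Build a node whose dimensions are derived from its variables."""
    variables = dict(group["variables"])
    # Collect every dimension name that at least one variable uses.
    used = {dim for dim_names, _ in variables.values() for dim in dim_names}
    # Dimensions declared in the file but used by no variable are dropped here.
    dims = {dim: size for dim, size in group["dimensions"].items() if dim in used}
    return {"dims": dims, "variables": variables}


def save_node(node):
    """Write the node back out; only the derived dimensions can be written."""
    return {"dimensions": dict(node["dims"]), "variables": dict(node["variables"])}


# A netCDF-like group that declares a dimension no variable possesses.
netcdf_group = {
    "dimensions": {"x": 3, "unused": 5},
    "variables": {"a": (("x",), [1, 2, 3])},
}

node = open_group(netcdf_group)
written = save_node(node)
print(sorted(written["dimensions"]))  # ['x']
```

The variables survive the round trip unchanged, while the `unused` dimension does not, which is the asymmetry the warning describes.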
@@ -633,6 +660,21 @@ To read back a zarr dataset that has been created this way, we use the | |
ds_zarr = xr.open_zarr("path/to/directory.zarr") | ||
ds_zarr | ||
|
||
Groups | ||
~~~~~~ | ||
|
||
Like for netCDF, zarr groups can either be opened as individual :py:class:`Dataset` objects using the ``group`` keyword argument to :py:func:`open_dataset`, | ||
or alternatively nested groups in zarr stores can be represented by loading the store as a :py:class:`DataTree` object. | ||
(The latter option requires that you have the `xarray-datatree <https://github.com/xarray-contrib/datatree>`_ package installed.) | ||
|
||
To open a whole zarr store as a tree of groups use the :py:func:`open_datatree()` function. | ||
To save a DataTree object as a zarr store containing many groups, use the :py:meth:`DataTree.to_zarr()` method. | ||
|
||
.. note:: | ||
Note that perfect round-tripping should always be possible with a zarr store (:ref:`unlike for netCDF files<netcdf.group.warning>`), | ||
as zarr does not support "unused" dimensions. | ||
|
||
|
||
Cloud Storage Buckets | ||
~~~~~~~~~~~~~~~~~~~~~ | ||
|
||
|
Original file line number | Diff line number | Diff line change | ||||
---|---|---|---|---|---|---|
|
@@ -52,6 +52,12 @@ | |||||
# Disable minimum version checks on downstream libraries. | ||||||
__version__ = "999" | ||||||
|
||||||
try: | ||||||
from datatree import DataTree # noqa | ||||||
Suggested change
I think this is why the docs build is failing. Also, not sure if the error code still works with
I actually found that if I don't add that error code then ruff replaces `try: from datatree import DataTree` with `try: pass`, which obviously caused an ImportError. I thought that was surprising behavior for a linter too...
I've changed this to `from datatree import DataTree, register_datatree_accessor, open_datatree  # noqa` now though
||||||
except ImportError: | ||||||
... | ||||||
TomNicholas marked this conversation as resolved.
|
||||||
|
||||||
|
||||||
# A hardcoded __all__ variable is necessary to appease | ||||||
# `mypy --strict` running in projects that import xarray. | ||||||
__all__ = ( | ||||||
|
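The module-level `try`/`except ImportError` in this hunk is an optional re-export: the name is exposed when the dependency is installed and silently skipped otherwise. A minimal, generic sketch of the same idea follows. It is not the code in this PR; `optional_attr` is a hypothetical helper, and the module names are stand-ins chosen so the example runs without datatree installed.

```python
import importlib


def optional_attr(module_name, attr_name):
    """Return module_name.attr_name, or None if the module cannot be imported."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The dependency is optional, so a missing module is not an error here.
        return None
    return getattr(module, attr_name, None)


# A standard-library module, standing in for an installed optional dependency.
OrderedDict = optional_attr("collections", "OrderedDict")

# A deliberately made-up module name, standing in for a dependency that is absent.
Missing = optional_attr("hypothetical_missing_pkg_xyz", "Thing")

print(OrderedDict is not None, Missing is None)  # True True
```

With a plain `from ... import ...` inside `try`, as in the diff, a linter may treat the imported name as unused, which is what the `# noqa` discussion in the thread above is about.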
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3656,6 +3656,48 @@ def reduce( | |
var = self.variable.reduce(func, dim, axis, keep_attrs, keepdims, **kwargs) | ||
return self._replace_maybe_drop_dims(var) | ||
|
||
def to_datatree(self, node_name: str | None = None, name: str | None = None): | ||
""" | ||
Convert this dataarray into a datatree.DataTree. | ||
|
||
WARNING: The DataTree structure is considered experimental, | ||
and the API is less solidified than for other xarray features. | ||
|
||
The returned tree will only consist of a single node. | ||
That node will contain a copy of the dataarray's data, | ||
meaning including its coordinates, dimensions and attributes. | ||
|
||
Requires the xarray-datatree package to be installed. | ||
Find it at https://github.com/xarray-contrib/datatree. | ||
Comment on lines +3663 to +3671
should this also be moved into a warning block?
||
|
||
Parameters | ||
---------- | ||
node_name: str, optional | ||
The name of the datatree node created. | ||
name: str, optional | ||
Name to substitute for this array's name. | ||
|
||
Returns | ||
------- | ||
dt : DataTree | ||
A single-node datatree object, containing the information from this dataarray. | ||
|
||
See Also | ||
-------- | ||
datatree.DataTree | ||
""" | ||
|
||
try: | ||
from datatree import DataTree | ||
except ImportError: | ||
raise ImportError( | ||
"Could not import the datatree package. " | ||
"Find it at https://github.com/xarray-contrib/datatree" | ||
) | ||
|
||
ds = self.to_dataset(name=name) | ||
return DataTree(data=ds, name=node_name) | ||
|
||
def to_pandas(self) -> DataArray | pd.Series | pd.DataFrame: | ||
"""Convert this array into a pandas object with the same shape. | ||
|
||
|
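`to_datatree` defers the import of the optional package until the method is called and converts a failure into an `ImportError` with an installation hint. A generic sketch of that pattern is below. It is an illustration under assumptions, not the PR's implementation: `import_or_explain` is a hypothetical helper, the module names are stand-ins, and the explicit `from err` chaining is an addition that the diff does not use.

```python
import importlib


def import_or_explain(module_name, install_hint):
    """Import a module lazily, raising a more helpful ImportError on failure."""
    try:
        return importlib.import_module(module_name)
    except ImportError as err:
        # Chain the original error so the full traceback stays available.
        raise ImportError(
            f"Could not import the {module_name} package. {install_hint}"
        ) from err


# Succeeds: `json` is in the standard library.
json_module = import_or_explain("json", "It ships with Python.")

# Fails: the module name is made up, so the helpful message is raised.
try:
    import_or_explain("hypothetical_missing_pkg_xyz", "Install it first.")
except ImportError as err:
    message = str(err)

print(message)
```

Importing inside the function keeps the dependency optional for users who never call the method, at the cost of discovering the missing package only at call time.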
Original file line number | Diff line number | Diff line change | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -6116,6 +6116,45 @@ def to_array( | |||||||||||
|
||||||||||||
return DataArray._construct_direct(variable, coords, name, indexes) | ||||||||||||
|
||||||||||||
def to_datatree(self, node_name: str | None = None): | ||||||||||||
""" | ||||||||||||
Convert this dataset into a datatree.DataTree. | ||||||||||||
|
||||||||||||
.. warning:: The DataTree structure is considered experimental, | ||||||||||||
and the API is less solidified than for other xarray features. | ||||||||||||
Comment on lines +6123 to +6124
not sure if I just don't know enough about rst, but I wonder if it would be better to move the whole text into the block?
Suggested change
|
||||||||||||
|
||||||||||||
The returned tree will only consist of a single node. | ||||||||||||
That node will contain a copy of the dataset's data, | ||||||||||||
meaning all variables, coordinates, dimensions and attributes. | ||||||||||||
|
||||||||||||
Requires the xarray-datatree package to be installed. | ||||||||||||
Find it at https://github.com/xarray-contrib/datatree. | ||||||||||||
|
||||||||||||
Parameters | ||||||||||||
---------- | ||||||||||||
node_name: str, optional | ||||||||||||
The name of the datatree node created. | ||||||||||||
|
||||||||||||
Returns | ||||||||||||
------- | ||||||||||||
dt : DataTree | ||||||||||||
A single-node datatree object, containing the information from this dataset. | ||||||||||||
|
||||||||||||
See Also | ||||||||||||
-------- | ||||||||||||
datatree.DataTree | ||||||||||||
""" | ||||||||||||
|
||||||||||||
try: | ||||||||||||
from datatree import DataTree | ||||||||||||
except ImportError: | ||||||||||||
raise ImportError( | ||||||||||||
"Could not import the datatree package. " | ||||||||||||
"Find it at https://github.com/xarray-contrib/datatree" | ||||||||||||
) | ||||||||||||
|
||||||||||||
return DataTree(data=self, name=node_name) | ||||||||||||
|
||||||||||||
def _normalize_dim_order( | ||||||||||||
self, dim_order: Sequence[Hashable] | None = None | ||||||||||||
) -> dict[Hashable, int]: | ||||||||||||
|
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
@@ -0,0 +1,30 @@ | ||||||||||||||
import xarray.testing as xrt | ||||||||||||||
from xarray import Dataset | ||||||||||||||
from xarray.tests import requires_datatree | ||||||||||||||
|
||||||||||||||
since the whole module depends on
Suggested change
then we don't need to decorate every test with If we want to reuse
Suggested change
|
||||||||||||||
|
||||||||||||||
@requires_datatree | ||||||||||||||
def test_import_datatree(): | ||||||||||||||
"""Just test importing datatree package from xarray-contrib repo""" | ||||||||||||||
from xarray import DataTree | ||||||||||||||
|
||||||||||||||
DataTree() | ||||||||||||||
|
||||||||||||||
|
||||||||||||||
@requires_datatree | ||||||||||||||
def test_to_datatree(): | ||||||||||||||
from xarray import DataTree | ||||||||||||||
|
||||||||||||||
ds = Dataset({"a": ("x", [1, 2, 3])}) | ||||||||||||||
dt = ds.to_datatree(node_name="group1") | ||||||||||||||
|
||||||||||||||
assert isinstance(dt, DataTree) | ||||||||||||||
assert dt.name == "group1" | ||||||||||||||
xrt.assert_identical(dt.to_dataset(), ds) | ||||||||||||||
|
||||||||||||||
da = ds["a"] | ||||||||||||||
dt = da.to_datatree(node_name="group1") | ||||||||||||||
|
||||||||||||||
assert isinstance(dt, DataTree) | ||||||||||||||
assert dt.name == "group1" | ||||||||||||||
xrt.assert_identical(dt["a"], da) |
it's probably fine as-is (and I'm always confused about the name), but should this be xarray-datatree, given that that's the package we are installing below?
Probably!