
Commit 5f37042

Merge branch 'main' into fix/zarr-v3
* main:
  Fix multiple grouping with missing groups (pydata#9650)
  flox: Properly propagate multiindex (pydata#9649)
  Update Datatree html repr to indicate inheritance (pydata#9633)
  Re-implement map_over_datasets using group_subtrees (pydata#9636)
  fix zarr intersphinx (pydata#9652)
  Replace black and blackdoc with ruff-format (pydata#9506)
  Fix error and missing code cell in io.rst (pydata#9641)
  Support alternative names for the root node in DataTree.from_dict (pydata#9638)
  Updates to DataTree.equals and DataTree.identical (pydata#9627)
  DOC: Clarify error message in open_dataarray (pydata#9637)
  Add zip_subtrees for paired iteration over DataTrees (pydata#9623)
  Type check datatree tests (pydata#9632)
  Add missing `memo` argument to DataTree.__deepcopy__ (pydata#9631)
  Bug fixes for DataTree indexing and aggregation (pydata#9626)
  Add inherit=False option to DataTree.copy() (pydata#9628)
  docs(groupby): mention deprecation of `squeeze` kwarg (pydata#9625)
  Migration guide for users of old datatree repo (pydata#9598)
  Reimplement Datatree typed ops (pydata#9619)
2 parents ff0f2c0 + df87f69 commit 5f37042


60 files changed: +1695 -1604 lines changed

.pre-commit-config.yaml

+1-6
Original file line number | Diff line number | Diff line change
@@ -15,20 +15,15 @@ repos:
1515
# Ruff version.
1616
rev: 'v0.6.9'
1717
hooks:
18+
- id: ruff-format
1819
- id: ruff
1920
args: ["--fix", "--show-fixes"]
20-
# https://github.com/python/black#version-control-integration
21-
- repo: https://github.com/psf/black-pre-commit-mirror
22-
rev: 24.8.0
23-
hooks:
24-
- id: black-jupyter
2521
- repo: https://github.com/keewis/blackdoc
2622
rev: v0.3.9
2723
hooks:
2824
- id: blackdoc
2925
exclude: "generate_aggregations.py"
3026
additional_dependencies: ["black==24.8.0"]
31-
- id: blackdoc-autoupdate-black
3227
- repo: https://github.com/pre-commit/mirrors-mypy
3328
rev: v1.11.2
3429
hooks:

CORE_TEAM_GUIDE.md

+1-2
@@ -271,8 +271,7 @@ resources such as:
271271
[NumPy documentation guide](https://numpy.org/devdocs/dev/howto-docs.html#documentation-style)
272272
for docstring conventions.
273273
- [`pre-commit`](https://pre-commit.com) hooks for autoformatting.
274-
- [`black`](https://github.com/psf/black) autoformatting.
275-
- [`flake8`](https://github.com/PyCQA/flake8) linting.
274+
- [`ruff`](https://github.com/astral-sh/ruff) autoformatting and linting.
276275
- [python-xarray](https://stackoverflow.com/questions/tagged/python-xarray) on Stack Overflow.
277276
- [@xarray_dev](https://twitter.com/xarray_dev) on Twitter.
278277
- [xarray-dev](https://discord.gg/bsSGdwBn) discord community (normally only used for remote synchronous chat during sprints).

DATATREE_MIGRATION_GUIDE.md

+63
@@ -0,0 +1,63 @@
1+
# Migration guide for users of `xarray-contrib/datatree`
2+
3+
_15th October 2024_
4+
5+
This guide is for previous users of the prototype `datatree.DataTree` class in the `xarray-contrib/datatree` repository. That repository has now been archived, and will not be maintained. This guide is intended to help smooth your transition to using the new, updated `xarray.DataTree` class.
6+
7+
> [!IMPORTANT]
8+
> There are breaking changes! You should not expect that code written with `xarray-contrib/datatree` will work without any modifications. At the absolute minimum you will need to change the top-level import statement, but there are other changes too.
9+
10+
We have made various changes compared to the prototype version. These can be split into three categories: data model changes, which affect the hierarchical structure itself; integration with xarray's IO backends; and minor API changes, which mostly consist of renaming methods to be more self-consistent.
11+
12+
### Data model changes
13+
14+
The most important changes made are to the data model of `DataTree`. Whilst previously data in different nodes was unrelated and therefore unconstrained, now trees have "internal alignment" - meaning that dimensions and indexes in child nodes must exactly align with those in their parents.
15+
16+
These alignment checks happen at tree construction time, meaning there are some netCDF4 files and zarr stores that could previously be opened as `datatree.DataTree` objects using `datatree.open_datatree`, but now cannot be opened as `xr.DataTree` objects using `xr.open_datatree`. For these cases we added a new opener function `xr.open_groups`, which returns a `dict[str, Dataset]`. This is intended as a fallback for tricky cases, where the idea is that you can still open the entire contents of the file using `open_groups`, edit the `Dataset` objects, then construct a valid tree from the edited dictionary using `DataTree.from_dict`.
17+
18+
The alignment checks allowed us to add "Coordinate Inheritance", a much-requested feature where indexed coordinate variables are now "inherited" down to child nodes. This allows you to define common coordinates in a parent group that are then automatically available on every child node. The distinction between a locally-defined coordinate variable and an inherited coordinate that was defined on a parent node is reflected in the `DataTree.__repr__`. Generally if you prefer not to have these variables be inherited you can get more similar behaviour to the old `datatree` package by removing indexes from coordinates, as this prevents inheritance.
19+
20+
Tree structure checks between multiple trees (i.e., `DataTree.isomorphic`) and pairing of nodes in arithmetic have also changed. Nodes are now matched (with `xarray.group_subtrees`) based on their relative paths, without regard to the order in which child nodes are defined.
21+
22+
For further documentation see the page in the user guide on Hierarchical Data.
23+
24+
### Integrated backends
25+
26+
Previously `datatree.open_datatree` used a different codepath from `xarray.open_dataset`, and was hard-coded to only support opening netCDF files and Zarr stores.
27+
Now xarray's backend entrypoint system has been generalized to include `open_datatree` and the new `open_groups`.
28+
This means we can now extend other xarray backends to support `open_datatree`! If you are the maintainer of an xarray backend we encourage you to add support for `open_datatree` and `open_groups`!
29+
30+
Additionally:
31+
- A `group` kwarg has been added to `open_datatree` for choosing which group in the file should become the root group of the created tree.
32+
- Various performance improvements have been made, which should help when opening netCDF files and Zarr stores with large numbers of groups.
33+
- We anticipate further performance improvements being possible for datatree IO.
34+
35+
### API changes
36+
37+
A number of other API changes have been made, which should only require minor modifications to your code:
38+
- The top-level import has changed, from `from datatree import DataTree, open_datatree` to `from xarray import DataTree, open_datatree`. Alternatively you can now just use the `import xarray as xr` namespace convention for everything datatree-related.
39+
- The `DataTree.ds` property has been changed to `DataTree.dataset`, though `DataTree.ds` remains as an alias for `DataTree.dataset`.
40+
- Similarly the `ds` kwarg in the `DataTree.__init__` constructor has been replaced by `dataset`, i.e. use `DataTree(dataset=...)` instead of `DataTree(ds=...)`.
41+
- The method `DataTree.to_dataset()` still exists but now has different options for controlling which variables are present on the resulting `Dataset`, e.g. `inherit=True/False`.
42+
- `DataTree.copy()` also has a new `inherit` keyword argument for controlling whether or not coordinates defined on parents are copied (only relevant when copying a non-root node).
43+
- The `DataTree.parent` property is now read-only. To assign ancestral relationships directly you must instead use the `.children` property on the parent node, which remains settable.
44+
- Similarly the `parent` kwarg has been removed from the `DataTree.__init__` constructor.
45+
- DataTree objects passed to the `children` kwarg in `DataTree.__init__` are now shallow-copied.
46+
- `DataTree.as_array` has been replaced by `DataTree.to_dataarray`.
47+
- A number of methods which were not well tested have been (temporarily) disabled. In general we have tried to only keep things that are known to work, with the plan to increase API surface incrementally after release.
48+
49+
## Thank you!
50+
51+
Thank you for trying out `xarray-contrib/datatree`!
52+
53+
We welcome contributions of any kind, including good ideas that never quite made it into the original datatree repository. Please also let us know if we have forgotten to mention a change that should have been listed in this guide.
54+
55+
Sincerely, the datatree team:
56+
57+
Tom Nicholas,
58+
Owen Littlejohns,
59+
Matt Savoie,
60+
Eni Awowale,
61+
Alfonso Ladino,
62+
Justus Magin,
63+
Stephan Hoyer

ci/min_deps_check.py

+1-1
@@ -3,6 +3,7 @@
33
publication date. Compare it against requirements/min-all-deps.yml to verify the
44
policy on obsolete dependencies is being followed. Print a pretty report :)
55
"""
6+
67
from __future__ import annotations
78

89
import itertools
@@ -16,7 +17,6 @@
1617

1718
CHANNELS = ["conda-forge", "defaults"]
1819
IGNORE_DEPS = {
19-
"black",
2020
"coveralls",
2121
"flake8",
2222
"hypothesis",

ci/requirements/all-but-dask.yml

-1
@@ -3,7 +3,6 @@ channels:
33
- conda-forge
44
- nodefaults
55
dependencies:
6-
- black
76
- aiobotocore
87
- array-api-strict
98
- boto3

doc/api.rst

+20-10
@@ -749,6 +749,17 @@ Manipulate the contents of a single ``DataTree`` node.
749749
DataTree.assign
750750
DataTree.drop_nodes
751751

752+
DataTree Operations
753+
-------------------
754+
755+
Apply operations over multiple ``DataTree`` objects.
756+
757+
.. autosummary::
758+
:toctree: generated/
759+
760+
map_over_datasets
761+
group_subtrees
762+
752763
Comparisons
753764
-----------
754765

@@ -849,20 +860,20 @@ Aggregate data in all nodes in the subtree simultaneously.
849860
DataTree.cumsum
850861
DataTree.cumprod
851862

852-
.. ndarray methods
853-
.. ---------------
863+
ndarray methods
864+
---------------
854865

855-
.. Methods copied from :py:class:`numpy.ndarray` objects, here applying to the data in all nodes in the subtree.
866+
Methods copied from :py:class:`numpy.ndarray` objects, here applying to the data in all nodes in the subtree.
856867

857-
.. .. autosummary::
858-
.. :toctree: generated/
868+
.. autosummary::
869+
:toctree: generated/
859870

860-
.. DataTree.argsort
871+
DataTree.argsort
872+
DataTree.conj
873+
DataTree.conjugate
874+
DataTree.round
861875
.. DataTree.astype
862876
.. DataTree.clip
863-
.. DataTree.conj
864-
.. DataTree.conjugate
865-
.. DataTree.round
866877
.. DataTree.rank
867878
868879
.. Reshaping and reorganising
@@ -954,7 +965,6 @@ DataTree methods
954965

955966
open_datatree
956967
open_groups
957-
map_over_datasets
958968
DataTree.to_dict
959969
DataTree.to_netcdf
960970
DataTree.to_zarr

doc/conf.py

+1-1
@@ -347,7 +347,7 @@
347347
"scipy": ("https://docs.scipy.org/doc/scipy", None),
348348
"sparse": ("https://sparse.pydata.org/en/latest/", None),
349349
"xarray-tutorial": ("https://tutorial.xarray.dev/", None),
350-
"zarr": ("https://zarr.readthedocs.io/en/latest/", None),
350+
"zarr": ("https://zarr.readthedocs.io/en/stable/", None),
351351
}
352352

353353

doc/contributing.rst

+2-6
@@ -549,11 +549,7 @@ Code Formatting
549549

550550
xarray uses several tools to ensure a consistent code format throughout the project:
551551

552-
- `Black <https://black.readthedocs.io/en/stable/>`_ for standardized
553-
code formatting,
554-
- `blackdoc <https://blackdoc.readthedocs.io/en/stable/>`_ for
555-
standardized code formatting in documentation,
556-
- `ruff <https://github.com/charliermarsh/ruff/>`_ for code quality checks and standardized order in imports
552+
- `ruff <https://github.com/astral-sh/ruff>`_ for formatting, code quality checks and standardized order in imports
557553
- `absolufy-imports <https://github.com/MarcoGorelli/absolufy-imports>`_ for absolute instead of relative imports from different files,
558554
- `mypy <http://mypy-lang.org/>`_ for static type checking on `type hints
559555
<https://docs.python.org/3/library/typing.html>`_.
@@ -1069,7 +1065,7 @@ PR checklist
10691065
- Test the code using `Pytest <http://doc.pytest.org/en/latest/>`_. Running all tests (type ``pytest`` in the root directory) takes a while, so feel free to only run the tests you think are needed based on your PR (example: ``pytest xarray/tests/test_dataarray.py``). CI will catch any failing tests.
10701066
- By default, the upstream dev CI is disabled on pull request and push events. You can override this behavior per commit by adding a ``[test-upstream]`` tag to the first line of the commit message. For documentation-only commits, you can skip the CI per commit by adding a ``[skip-ci]`` tag to the first line of the commit message.
10711067
1072-
- **Properly format your code** and verify that it passes the formatting guidelines set by `Black <https://black.readthedocs.io/en/stable/>`_ and `Flake8 <http://flake8.pycqa.org/en/latest/>`_. See `"Code formatting" <https://docs.xarray.dev/en/stablcontributing.html#code-formatting>`_. You can use `pre-commit <https://pre-commit.com/>`_ to run these automatically on each commit.
1068+
- **Properly format your code** and verify that it passes the formatting guidelines set by `ruff <https://github.com/astral-sh/ruff>`_. See `"Code formatting" <https://docs.xarray.dev/en/stable/contributing.html#code-formatting>`_. You can use `pre-commit <https://pre-commit.com/>`_ to run these automatically on each commit.
10731069
10741070
- Run ``pre-commit run --all-files`` in the root directory. This may modify some files. Confirm and commit any formatting changes.
10751071

doc/user-guide/hierarchical-data.rst

+63-18
@@ -362,21 +362,26 @@ This returns an iterable of nodes, which yields them in depth-first order.
362362
for node in vertebrates.subtree:
363363
print(node.path)
364364
365-
A very useful pattern is to use :py:class:`~xarray.DataTree.subtree` conjunction with the :py:class:`~xarray.DataTree.path` property to manipulate the nodes however you wish,
366-
then rebuild a new tree using :py:meth:`xarray.DataTree.from_dict()`.
365+
Similarly, :py:class:`~xarray.DataTree.subtree_with_keys` returns an iterable of
366+
relative paths and corresponding nodes.
367367

368+
A very useful pattern is to iterate over :py:class:`~xarray.DataTree.subtree_with_keys`
369+
to manipulate nodes however you wish, then rebuild a new tree using
370+
:py:meth:`xarray.DataTree.from_dict()`.
368371
For example, we could keep only the nodes containing data by looping over all nodes,
369372
checking if they contain any data using :py:class:`~xarray.DataTree.has_data`,
370373
then rebuilding a new tree using only the paths of those nodes:
371374

372375
.. ipython:: python
373376
374-
non_empty_nodes = {node.path: node.dataset for node in dt.subtree if node.has_data}
377+
non_empty_nodes = {
378+
path: node.dataset for path, node in dt.subtree_with_keys if node.has_data
379+
}
375380
xr.DataTree.from_dict(non_empty_nodes)
376381
377382
You can see this tree is similar to the ``dt`` object above, except that it is missing the empty nodes ``a/c`` and ``a/c/d``.
378383

379-
(If you want to keep the name of the root node, you will need to add the ``name`` kwarg to :py:class:`~xarray.DataTree.from_dict`, i.e. ``DataTree.from_dict(non_empty_nodes, name=dt.root.name)``.)
384+
(If you want to keep the name of the root node, you will need to add the ``name`` kwarg to :py:class:`~xarray.DataTree.from_dict`, i.e. ``DataTree.from_dict(non_empty_nodes, name=dt.name)``.)
380385

381386
.. _manipulating trees:
382387

@@ -573,38 +578,78 @@ Then calculate the RMS value of these signals:
573578
574579
.. _multiple trees:
575580

576-
We can also use the :py:meth:`~xarray.map_over_datasets` decorator to promote a function which accepts datasets into one which
577-
accepts datatrees.
581+
We can also use :py:func:`~xarray.map_over_datasets` to apply a function over
582+
the data in multiple trees, by passing the trees as positional arguments.
578583

579584
Operating on Multiple Trees
580585
---------------------------
581586

582587
The examples so far have involved mapping functions or methods over the nodes of a single tree,
583588
but we can generalize this to mapping functions over multiple trees at once.
584589

590+
Iterating Over Multiple Trees
591+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
592+
593+
To iterate over the corresponding nodes in multiple trees, use
594+
:py:func:`~xarray.group_subtrees` instead of
595+
:py:class:`~xarray.DataTree.subtree_with_keys`. This combines well with
596+
:py:meth:`xarray.DataTree.from_dict()` to build a new tree:
597+
598+
.. ipython:: python
599+
600+
dt1 = xr.DataTree.from_dict({"a": xr.Dataset({"x": 1}), "b": xr.Dataset({"x": 2})})
601+
dt2 = xr.DataTree.from_dict(
602+
{"a": xr.Dataset({"x": 10}), "b": xr.Dataset({"x": 20})}
603+
)
604+
result = {}
605+
for path, (node1, node2) in xr.group_subtrees(dt1, dt2):
606+
result[path] = node1.dataset + node2.dataset
607+
xr.DataTree.from_dict(result)
608+
609+
Alternatively, you can apply a function directly to paired datasets at every node
610+
using :py:func:`xarray.map_over_datasets`:
611+
612+
.. ipython:: python
613+
614+
xr.map_over_datasets(lambda x, y: x + y, dt1, dt2)
615+
585616
Comparing Trees for Isomorphism
586617
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
587618

588619
For it to make sense to map a single non-unary function over the nodes of multiple trees at once,
589-
each tree needs to have the same structure. Specifically two trees can only be considered similar, or "isomorphic",
590-
if they have the same number of nodes, and each corresponding node has the same number of children.
591-
We can check if any two trees are isomorphic using the :py:meth:`~xarray.DataTree.isomorphic` method.
620+
each tree needs to have the same structure. Specifically two trees can only be considered similar,
621+
or "isomorphic", if the full paths to all of their descendent nodes are the same.
622+
623+
Applying :py:func:`~xarray.group_subtrees` to trees with different structures
624+
raises :py:class:`~xarray.TreeIsomorphismError`:
592625

593626
.. ipython:: python
594627
:okexcept:
595628
596-
dt1 = xr.DataTree.from_dict({"a": None, "a/b": None})
597-
dt2 = xr.DataTree.from_dict({"a": None})
598-
dt1.isomorphic(dt2)
629+
tree = xr.DataTree.from_dict({"a": None, "a/b": None, "a/c": None})
630+
simple_tree = xr.DataTree.from_dict({"a": None})
631+
for _ in xr.group_subtrees(tree, simple_tree):
632+
...
633+
634+
We can also explicitly check if any two trees are isomorphic using the :py:meth:`~xarray.DataTree.isomorphic` method:
635+
636+
.. ipython:: python
637+
638+
tree.isomorphic(simple_tree)
599639
600-
dt3 = xr.DataTree.from_dict({"a": None, "b": None})
601-
dt1.isomorphic(dt3)
640+
Corresponding tree nodes do not need to have the same data in order to be considered isomorphic:
602641

603-
dt4 = xr.DataTree.from_dict({"A": None, "A/B": xr.Dataset({"foo": 1})})
604-
dt1.isomorphic(dt4)
642+
.. ipython:: python
643+
644+
tree_with_data = xr.DataTree.from_dict({"a": xr.Dataset({"foo": 1})})
645+
simple_tree.isomorphic(tree_with_data)
646+
647+
They also do not need to define child nodes in the same order:
648+
649+
.. ipython:: python
605650
606-
If the trees are not isomorphic a :py:class:`~xarray.TreeIsomorphismError` will be raised.
607-
Notice that corresponding tree nodes do not need to have the same name or contain the same data in order to be considered isomorphic.
651+
reordered_tree = xr.DataTree.from_dict({"a": None, "a/c": None, "a/b": None})
652+
tree.isomorphic(reordered_tree)
608653
609654
Arithmetic Between Multiple Trees
610655
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

doc/user-guide/io.rst

+3-8
@@ -19,16 +19,11 @@ format (recommended).
1919
2020
np.random.seed(123456)
2121
22-
You can `read different types of files <https://docs.xarray.dev/en/stable/user-guide/io.html>`_
23-
in `xr.open_dataset` by specifying the engine to be used:
22+
You can read different types of files in `xr.open_dataset` by specifying the engine to be used:
2423

25-
.. ipython:: python
26-
:okexcept:
27-
:suppress:
28-
29-
import xarray as xr
24+
.. code:: python
3025
31-
xr.open_dataset("my_file.grib", engine="cfgrib")
26+
xr.open_dataset("example.nc", engine="netcdf4")
3227
3328
The "engine" provides a set of instructions that tells xarray how
3429
to read the data and pack them into a `dataset` (or `dataarray`).

doc/whats-new.rst

+7-3
@@ -23,13 +23,15 @@ New Features
2323
~~~~~~~~~~~~
2424
- ``DataTree`` related functionality is now exposed in the main ``xarray`` public
2525
API. This includes: ``xarray.DataTree``, ``xarray.open_datatree``, ``xarray.open_groups``,
26-
``xarray.map_over_datasets``, ``xarray.register_datatree_accessor`` and
27-
``xarray.testing.assert_isomorphic``.
26+
``xarray.map_over_datasets``, ``xarray.group_subtrees``,
27+
``xarray.register_datatree_accessor`` and ``xarray.testing.assert_isomorphic``.
2828
By `Owen Littlejohns <https://github.com/owenlittlejohns>`_,
2929
`Eni Awowale <https://github.com/eni-awowale>`_,
3030
`Matt Savoie <https://github.com/flamingbear>`_,
3131
`Stephan Hoyer <https://github.com/shoyer>`_ and
3232
`Tom Nicholas <https://github.com/TomNicholas>`_.
33+
- A migration guide for users of the prototype `xarray-contrib/datatree repository <https://github.com/xarray-contrib/datatree>`_ has been added, and can be found in the `DATATREE_MIGRATION_GUIDE.md` file in the repository root.
34+
By `Tom Nicholas <https://github.com/TomNicholas>`_.
3335
- Added zarr backends for :py:func:`open_groups` (:issue:`9430`, :pull:`9469`).
3436
By `Eni Awowale <https://github.com/eni-awowale>`_.
3537
- Added support for vectorized interpolation using additional interpolators
@@ -65,11 +67,13 @@ Bug fixes
6567
the non-missing times could in theory be encoded with integers
6668
(:issue:`9488`, :pull:`9497`). By `Spencer Clark
6769
<https://github.com/spencerkclark>`_.
68-
- Fix a few bugs affecting groupby reductions with `flox`. (:issue:`8090`, :issue:`9398`).
70+
- Fix a few bugs affecting groupby reductions with `flox`. (:issue:`8090`, :issue:`9398`, :issue:`9648`).
6971
By `Deepak Cherian <https://github.com/dcherian>`_.
7072
- Fix the safe_chunks validation option on the to_zarr method
7173
(:issue:`5511`, :pull:`9559`). By `Joseph Nowak
7274
<https://github.com/josephnowak>`_.
75+
- Fix binning by multiple variables where some bins have no observations. (:issue:`9630`).
76+
By `Deepak Cherian <https://github.com/dcherian>`_.
7377

7478
Documentation
7579
~~~~~~~~~~~~~
