Commit 19a0428

dcherian and max-sixty authored
GroupBy(multiple groupers) (#9372)
* GroupBy(multiple groupers)
* Add example to docs
  fix docs
  [pre-commit.ci] auto fixes from pre-commit.com hooks
  for more information, see https://pre-commit.ci
* fix docs
* More docs
* fix doc
* fix doc again
* Fix bug.
* Add whats-new note
* edit
* Error on multi-variable groupby with MultiIndex
* Update doc/user-guide/groupby.rst

---------

Co-authored-by: Maximilian Roos <[email protected]>
1 parent 6a2eddd commit 19a0428

6 files changed, +423 -106 lines changed

doc/user-guide/groupby.rst

+28 -16
@@ -81,8 +81,7 @@ You can index out a particular group:
 
     ds.groupby("letters")["b"]
 
-Just like in pandas, creating a GroupBy object is cheap: it does not actually
-split the data until you access particular values.
+To group by multiple variables, see :ref:`this section <groupby.multiple>`.
 
 Binning
 ~~~~~~~
@@ -180,19 +179,6 @@ This last line is roughly equivalent to the following::
         results.append(group - alt.sel(letters=label))
     xr.concat(results, dim='x')
 
-Iterating and Squeezing
-~~~~~~~~~~~~~~~~~~~~~~~
-
-Previously, Xarray defaulted to squeezing out dimensions of size one when iterating over
-a GroupBy object. This behaviour is being removed.
-You can always squeeze explicitly later with the Dataset or DataArray
-:py:meth:`DataArray.squeeze` methods.
-
-.. ipython:: python
-
-    next(iter(arr.groupby("x", squeeze=False)))
-
-
 .. _groupby.multidim:
 
 Multidimensional Grouping
@@ -236,6 +222,8 @@ applying your function, and then unstacking the result:
     stacked = da.stack(gridcell=["ny", "nx"])
     stacked.groupby("gridcell").sum(...).unstack("gridcell")
 
+Alternatively, you can groupby both `lat` and `lon` at the :ref:`same time <groupby.multiple>`.
+
 .. _groupby.groupers:
 
 Grouper Objects
@@ -276,7 +264,8 @@ is identical to
 
     ds.groupby(x=UniqueGrouper())
 
-and
+
+Similarly,
 
 .. code-block:: python
 
@@ -303,3 +292,26 @@ is identical to
     from xarray.groupers import TimeResampler
 
     ds.resample(time=TimeResampler("ME"))
+
+
+.. _groupby.multiple:
+
+Grouping by multiple variables
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Use grouper objects to group by multiple dimensions:
+
+.. ipython:: python
+
+    from xarray.groupers import UniqueGrouper
+
+    da.groupby(lat=UniqueGrouper(), lon=UniqueGrouper()).sum()
+
+
+Different groupers can be combined to construct sophisticated GroupBy operations.
+
+.. ipython:: python
+
+    from xarray.groupers import BinGrouper
+
+    ds.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper()).sum()
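
Note on the new "Grouping by multiple variables" section: as a rough mental model, grouping by two variables at once can be thought of as keying each value by the pair of labels it carries and then reducing within each pair. The sketch below illustrates only that idea in plain Python. It is not xarray's implementation and does not reproduce xarray's output format; the `lat`, `lon`, and `values` lists and the `sum_by_pairs` helper are hypothetical names invented for this illustration.

```python
from collections import defaultdict

# Hypothetical flattened data: each value carries one label per grouping variable.
lat = [10, 10, 20, 20, 10]
lon = [30, 40, 30, 40, 30]
values = [1.0, 2.0, 3.0, 4.0, 5.0]


def sum_by_pairs(keys_a, keys_b, data):
    """Sum `data` over every observed (a, b) label combination."""
    totals = defaultdict(float)
    for a, b, v in zip(keys_a, keys_b, data):
        # The group key is the pair of labels, not either label on its own.
        totals[(a, b)] += v
    return dict(totals)


sums = sum_by_pairs(lat, lon, values)
```

Here the first and last values share the labels `(10, 30)`, so they fall into the same group, while every other value sits in a group of its own.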

doc/whats-new.rst

+5
@@ -24,6 +24,11 @@ New Features
 ~~~~~~~~~~~~
 - Make chunk manager an option in ``set_options`` (:pull:`9362`).
   By `Tom White <https://github.com/tomwhite>`_.
+- Support for :ref:`grouping by multiple variables <groupby.multiple>`.
+  This is quite new, so please check your results and report bugs.
+  Binary operations after grouping by multiple arrays are not supported yet.
+  (:issue:`1056`, :issue:`9332`, :issue:`324`, :pull:`9372`).
+  By `Deepak Cherian <https://github.com/dcherian>`_.
 - Allow data variable specific ``constant_values`` in the dataset ``pad`` function (:pull:`9353``).
   By `Tiago Sanona <https://github.com/tsanona>`_.
 

xarray/core/dataarray.py

+7 -12
@@ -6801,27 +6801,22 @@ def groupby(
             groupers = either_dict_or_kwargs(group, groupers, "groupby")  # type: ignore
             group = None
 
-        grouper: Grouper
+        rgroupers: tuple[ResolvedGrouper, ...]
         if group is not None:
             if groupers:
                 raise ValueError(
                     "Providing a combination of `group` and **groupers is not supported."
                 )
-            grouper = UniqueGrouper()
+            rgroupers = (ResolvedGrouper(UniqueGrouper(), group, self),)
         else:
-            if len(groupers) > 1:
-                raise ValueError("grouping by multiple variables is not supported yet.")
             if not groupers:
                 raise ValueError("Either `group` or `**groupers` must be provided.")
-            group, grouper = next(iter(groupers.items()))
-
-        rgrouper = ResolvedGrouper(grouper, group, self)
+            rgroupers = tuple(
+                ResolvedGrouper(grouper, group, self)
+                for group, grouper in groupers.items()
+            )
 
-        return DataArrayGroupBy(
-            self,
-            (rgrouper,),
-            restore_coord_dims=restore_coord_dims,
-        )
+        return DataArrayGroupBy(self, rgroupers, restore_coord_dims=restore_coord_dims)
 
     @_deprecate_positional_args("v2024.07.0")
     def groupby_bins(

xarray/core/dataset.py

+8 -11
@@ -10397,25 +10397,22 @@ def groupby(
             groupers = either_dict_or_kwargs(group, groupers, "groupby")  # type: ignore
             group = None
 
+        rgroupers: tuple[ResolvedGrouper, ...]
         if group is not None:
             if groupers:
                 raise ValueError(
                     "Providing a combination of `group` and **groupers is not supported."
                 )
-            rgrouper = ResolvedGrouper(UniqueGrouper(), group, self)
+            rgroupers = (ResolvedGrouper(UniqueGrouper(), group, self),)
         else:
-            if len(groupers) > 1:
-                raise ValueError("Grouping by multiple variables is not supported yet.")
-            elif not groupers:
+            if not groupers:
                 raise ValueError("Either `group` or `**groupers` must be provided.")
-            for group, grouper in groupers.items():
-                rgrouper = ResolvedGrouper(grouper, group, self)
+            rgroupers = tuple(
+                ResolvedGrouper(grouper, group, self)
+                for group, grouper in groupers.items()
+            )
 
-        return DatasetGroupBy(
-            self,
-            (rgrouper,),
-            restore_coord_dims=restore_coord_dims,
-        )
+        return DatasetGroupBy(self, rgroupers, restore_coord_dims=restore_coord_dims)
 
     @_deprecate_positional_args("v2024.07.0")
     def groupby_bins(
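
Note on the `groupby` changes in `dataarray.py` and `dataset.py`: both hunks replace a single resolved grouper with a tuple holding one entry per keyword grouper, and drop the "not supported yet" error. The control flow can be sketched in isolation as below. This is a simplified illustration, not the library code: `UniqueGrouper` and `ResolvedGrouper` here are empty stand-ins that only record their arguments, `resolve_groupers` is a hypothetical free function, and the branch that accepts a mapping as `group` is omitted.

```python
from dataclasses import dataclass
from typing import Any


class UniqueGrouper:
    """Stand-in for xarray.groupers.UniqueGrouper; carries no behaviour here."""


@dataclass(frozen=True)
class ResolvedGrouper:
    """Stand-in that only records the (grouper, group, obj) triple."""

    grouper: Any
    group: Any
    obj: Any


def resolve_groupers(obj, group=None, **groupers):
    """Hypothetical helper mirroring the branch structure shown in the diff."""
    if group is not None:
        if groupers:
            raise ValueError(
                "Providing a combination of `group` and **groupers is not supported."
            )
        # A bare `group` is grouped by its unique values.
        return (ResolvedGrouper(UniqueGrouper(), group, obj),)
    if not groupers:
        raise ValueError("Either `group` or `**groupers` must be provided.")
    # One resolved grouper per keyword, in keyword order.
    return tuple(
        ResolvedGrouper(grouper, name, obj) for name, grouper in groupers.items()
    )


single = resolve_groupers("data", "letters")
multiple = resolve_groupers("data", x=UniqueGrouper(), letters=UniqueGrouper())
```

With this shape, the single-variable call is the one-element case of the same tuple, which is why both `DataArrayGroupBy` and `DatasetGroupBy` can be handed `rgroupers` directly in the diff.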
