Description
What happened?
Upgrading numpy from 1.26.4 to 2.1.2 breaks my code. I went through several pages of issues looking for "concat", but none seemed to fit.
xr.concat, applied to a list of DataArrays that are to be concatenated along a scalar coordinate, no longer seems to work.
When the DataArrays were created, xarray used to convert a scalar coord of np.str_ type to a numpy array with dtype <U... This conversion seems to be gone and, without it, my code no longer works.
Instead a rather cryptic error message appears (full traceback below, here the last bit):
File ~/Projects/temp/xarray_numpy_bug/.env/lib64/python3.12/site-packages/xarray/core/variable.py:1387, in Variable.set_dims(self, dim, shape)
1385 else:
1386 indexer = (None,) * (len(expanded_dims) - self.ndim) + (...,)
-> 1387 expanded_data = self.data[indexer]
1389 expanded_var = Variable(
1390 expanded_dims, expanded_data, self._attrs, self._encoding, fastpath=True
1391 )
1392 return expanded_var.transpose(*dim)
TypeError: string indices must be integers, not 'tuple'
With the latest numpy, self.data is just the np.str_ scalar holding the string version of a UUID (formerly it was converted to a numpy array), and the indexer is (None, Ellipsis).
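The failing line can be reproduced with numpy alone, which may help narrow things down. The sketch below is only my reading of the traceback, not xarray's actual code path: it builds the same indexer by hand and applies it once to the np.str_ scalar and once to a 0-d array made from it. The names `scalar`, `as_array`, `expanded` and `scalar_error` are made up for this illustration.

```python
import uuid

import numpy as np

# The indexer from the traceback for a 0-d variable that gains one dimension.
indexer = (None, Ellipsis)

# A numpy string scalar; np.str_ is also a subclass of Python's str.
scalar = np.str_(uuid.uuid4())

# Indexing the scalar falls back to plain str indexing, which rejects a tuple.
try:
    scalar[indexer]
    scalar_error = None
except TypeError as err:
    scalar_error = err

# Wrapping the scalar in a 0-d array makes the same indexer valid.
as_array = np.asarray(scalar)
expanded = as_array[indexer]

print(type(scalar_error).__name__)
print(as_array.shape, as_array.dtype)
print(expanded.shape)
```

If this reading is right, anything that turns the scalar coordinate into a 0-d array before set_dims runs should make the indexing step succeed, which matches the behaviour I see with numpy 1.26.4.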
What did you expect to happen?
In contrast to the output posted below in "Minimal Complete Verifiable Example" and "Relevant log output", I expected the output that I get with numpy version 1.26.4:
xr.concat([xarr, xarr2], dim=("scalar_coord"))
<xarray.DataArray (scalar_coord: 2, abc: 3)> Size: 48B
array([[1., 1., 1.],
[1., 1., 1.]])
Coordinates:
* abc (abc) <U1 12B 'a' 'b' 'c'
* scalar_coord (scalar_coord) <U36 288B '90ff719e-6e3b-434a-b4f1-facfa168b...
Here `xarr.coords["scalar_coord"]` looks like this (an array created from a scalar):
<xarray.DataArray 'scalar_coord' ()> Size: 144B
array('90ff719e-6e3b-434a-b4f1-facfa168b2e1', dtype='<U36')
Coordinates:
scalar_coord <U36 144B '90ff719e-6e3b-434a-b4f1-facfa168b2e1'
Minimal Complete Verifiable Example
# Python 3.12.3 (main, Apr 17 2024, 00:00:00) [GCC 14.0.1 20240411 (Red Hat 14.0.1-0)]
# Type 'copyright', 'credits' or 'license' for more information
# IPython 8.28.0 -- An enhanced Interactive Python. Type '?' for help.
import xarray as xr
import numpy as np
import uuid
xr.__version__
# '2024.9.0'
np.__version__
# '2.1.2'
xarr = xr.DataArray(np.array([1.0,] * 3, dtype=np.float64), dims=("abc"), coords=dict(abc=np.array(list("abc"), dtype="<U1"), scalar_coord=np.str_(uuid.uuid4())))
xarr.coords["scalar_coord"]
# <xarray.DataArray 'scalar_coord' ()> Size: 144B
# np.str_('56382178-7f7d-4ec8-a4c1-8ebee96ec8df')
# Coordinates:
# scalar_coord <U36 144B ...
#
xarr.coords["scalar_coord"].data
# np.str_('56382178-7f7d-4ec8-a4c1-8ebee96ec8df')
xarr2 = xr.DataArray(np.array([1.0,] * 3, dtype=np.float64), dims=("abc"), coords=dict(abc=np.array(list("abc"), dtype="<U1"), scalar_coord=np.str_(uuid.uuid4())))
xr.concat([xarr, xarr2], dim=("scalar_coord"))
# see error in "Relevant log output"
MVCE confirmation
- Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue in xarray.
- Complete example — the example is self-contained, including all data and the text of any traceback.
- Verifiable example — the example copy & pastes into an IPython prompt or Binder notebook, returning the result.
- New issue — a search of GitHub Issues suggests this is not a duplicate.
- Recent environment — the issue occurs with the latest version of xarray and its dependencies.
Relevant log output
In [10]: xr.concat([xarr, xarr2], dim=("scalar_coord"))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[10], line 1
----> 1 xr.concat([xarr, xarr2], dim=("scalar_coord"))
File ~/Projects/temp/xarray_numpy_bug/.env/lib64/python3.12/site-packages/xarray/core/concat.py:264, in concat(objs, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs, create_index_for_new_dim)
259 raise ValueError(
260 f"compat={compat!r} invalid: must be 'broadcast_equals', 'equals', 'identical', 'no_conflicts' or 'override'"
261 )
263 if isinstance(first_obj, DataArray):
--> 264 return _dataarray_concat(
265 objs,
266 dim=dim,
267 data_vars=data_vars,
268 coords=coords,
269 compat=compat,
270 positions=positions,
271 fill_value=fill_value,
272 join=join,
273 combine_attrs=combine_attrs,
274 create_index_for_new_dim=create_index_for_new_dim,
275 )
276 elif isinstance(first_obj, Dataset):
277 return _dataset_concat(
278 objs,
279 dim=dim,
(...)
287 create_index_for_new_dim=create_index_for_new_dim,
288 )
File ~/Projects/temp/xarray_numpy_bug/.env/lib64/python3.12/site-packages/xarray/core/concat.py:755, in _dataarray_concat(arrays, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs, create_index_for_new_dim)
752 arr = arr.rename(name)
753 datasets.append(arr._to_temp_dataset())
--> 755 ds = _dataset_concat(
756 datasets,
757 dim,
758 data_vars,
759 coords,
760 compat,
761 positions,
762 fill_value=fill_value,
763 join=join,
764 combine_attrs=combine_attrs,
765 create_index_for_new_dim=create_index_for_new_dim,
766 )
768 merged_attrs = merge_attrs([da.attrs for da in arrays], combine_attrs)
770 result = arrays[0]._from_temp_dataset(ds, name)
File ~/Projects/temp/xarray_numpy_bug/.env/lib64/python3.12/site-packages/xarray/core/concat.py:540, in _dataset_concat(datasets, dim, data_vars, coords, compat, positions, fill_value, join, combine_attrs, create_index_for_new_dim)
535 # case where concat dimension is a coordinate or data_var but not a dimension
536 if (
537 dim_name in coord_names or dim_name in data_names
538 ) and dim_name not in dim_names:
539 datasets = [
--> 540 ds.expand_dims(dim_name, create_index_for_new_dim=create_index_for_new_dim)
541 for ds in datasets
542 ]
544 # determine which variables to concatenate
545 concat_over, equals, concat_dim_lengths = _calc_concat_over(
546 datasets, dim_name, dim_names, data_vars, coords, compat
547 )
File ~/Projects/temp/xarray_numpy_bug/.env/lib64/python3.12/site-packages/xarray/core/dataset.py:4797, in Dataset.expand_dims(self, dim, axis, create_index_for_new_dim, **dim_kwargs)
4793 if k not in variables:
4794 if k in coord_names and create_index_for_new_dim:
4795 # If dims includes a label of a non-dimension coordinate,
4796 # it will be promoted to a 1D coordinate with a single value.
-> 4797 index, index_vars = create_default_index_implicit(v.set_dims(k))
4798 indexes[k] = index
4799 variables.update(index_vars)
File ~/Projects/temp/xarray_numpy_bug/.env/lib64/python3.12/site-packages/xarray/util/deprecation_helpers.py:143, in deprecate_dims.<locals>.wrapper(*args, **kwargs)
135 emit_user_level_warning(
136 f"The `{old_name}` argument has been renamed to `dim`, and will be removed "
137 "in the future. This renaming is taking place throughout xarray over the "
(...)
140 PendingDeprecationWarning,
141 )
142 kwargs["dim"] = kwargs.pop(old_name)
--> 143 return func(*args, **kwargs)
File ~/Projects/temp/xarray_numpy_bug/.env/lib64/python3.12/site-packages/xarray/core/variable.py:1387, in Variable.set_dims(self, dim, shape)
1385 else:
1386 indexer = (None,) * (len(expanded_dims) - self.ndim) + (...,)
-> 1387 expanded_data = self.data[indexer]
1389 expanded_var = Variable(
1390 expanded_dims, expanded_data, self._attrs, self._encoding, fastpath=True
1391 )
1392 return expanded_var.transpose(*dim)
TypeError: string indices must be integers, not 'tuple'
Anything else we need to know?
I created a fresh Fedora container with two new virtual environments and ran the exact same code in each, to make sure this depends only on the xarray and numpy versions.
I went through all 3 pages of open issues on "concat" and read those that looked possibly relevant, but none seemed to match my case. Truly sorry if I overlooked something!
$ toolbox create -i fedora-toolbox:40 xarray_fedora40
$ toolbox enter xarray_fedora40
$ cd Projects/temp
$ mkdir xarray_numpy_bug
$ cd xarray_numpy_bug/
$ python --version
Python 3.12.3
$ python -m venv .env
$ . .env/bin/activate
$ pip --isolated install --upgrade pip ipython setuptools
$ pip --isolated install xarray
$ ipython
# broken code with latest numpy
$ deactivate
$ python -m venv .env_old_numpy
$ . .env_old_numpy/bin/activate
$ pip --isolated install --upgrade pip ipython setuptools
$ pip --isolated install xarray
$ pip --isolated install numpy==1.26.4
$ ipython
# same code with "old" numpy, works as before
Environment
INSTALLED VERSIONS
commit: None
python: 3.12.3 (main, Apr 17 2024, 00:00:00) [GCC 14.0.1 20240411 (Red Hat 14.0.1-0)]
python-bits: 64
OS: Linux
OS-release: 6.10.12-200.fc40.x86_64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_GB.UTF-8
LOCALE: ('en_GB', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2024.9.0
pandas: 2.2.3
numpy: 2.1.2
scipy: None
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
zarr: None
cftime: None
nc_time_axis: None
iris: None
bottleneck: None
dask: None
distributed: None
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: None
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 75.1.0
pip: 24.2
conda: None
pytest: None
mypy: None
IPython: 8.28.0
sphinx: None