.dt accessor returns int instead of float, resulting in misrepresentation of NaT values #7928
Comments
Thanks for opening your first issue here at xarray! Be sure to follow the issue template! |
@idantene Thanks for bringing this to attention. I can reproduce and it seems the unconditional cast (as you suggested) is the cause: xarray/xarray/core/accessor_dt.py Line 122 in e739df7
If |
Hey @kmuehlbauer, thanks for addressing and confirming the issue! I think both of those are valid approaches, and would surely address the issue at hand. However, these feel a bit like stop-gap measures, and I have to wonder (as someone who hasn't contributed to xarray yet) - why are the type casts even necessary, especially in the series case? |
Yes I think we should cast to float like pandas does. |
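To illustrate why an integer result misrepresents missing timestamps, here is a minimal NumPy-only sketch. It does not call xarray or pandas; it only shows that NaT is stored as the minimum int64 value, so an integer cast keeps it as a large negative number, while a float result can hold NaN. All variable names are illustrative.

```python
import numpy as np

# Two timestamps, the second one missing (NaT).
times = np.array(["2021-12-01", "NaT"], dtype="datetime64[ns]")

# Casting datetime64 to int64 exposes the raw storage. NaT is stored as the
# minimum int64 value, so it survives the cast as a large negative number.
as_int = times.astype("int64")
nat_sentinel = np.iinfo(np.int64).min

# A float64 result can carry NaN for the missing entry instead.
as_float = np.where(np.isnat(times), np.nan, as_int.astype("float64"))

print(as_int[1] == nat_sentinel)
print(np.isnan(as_float[1]))
```

Integer dtypes have no NaN, which is why a float result is needed whenever a missing value has to stay recognisable.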
@dcherian pandas currently returns xarray/xarray/core/accessor_dt.py Line 68 in 4156ce5
Side note: If
Update: This already raises in the current main branch (and probably earlier releases) with pandas version 2.0.1:
import pandas as pd
s = pd.to_datetime(pd.Series(['2021-12-01', pd.NaT]))
s.to_xarray().dt.isocalendar()
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Cell In[11], line 3
1 import pandas as pd
2 s = pd.to_datetime(pd.Series(['2021-12-01', pd.NaT]))
----> 3 s.to_xarray().dt.isocalendar()
File /home/kai/miniconda/envs/xarray_311/lib/python3.11/site-packages/xarray/core/accessor_dt.py:363, in DatetimeAccessor.isocalendar(self)
360 if not is_np_datetime_like(self._obj.data.dtype):
361 raise AttributeError("'CFTimeIndex' object has no attribute 'isocalendar'")
--> 363 values = _get_date_field(self._obj.data, "isocalendar", np.int64)
365 obj_type = type(self._obj)
366 data_vars = {}
File /home/kai/miniconda/envs/xarray_311/lib/python3.11/site-packages/xarray/core/accessor_dt.py:122, in _get_date_field(values, name, dtype)
118 return map_blocks(
119 access_method, values, name, dtype=dtype, new_axis=new_axis, chunks=chunks
120 )
121 else:
--> 122 return access_method(values, name).astype(dtype, copy=False)
File /home/kai/miniconda/envs/xarray_311/lib/python3.11/site-packages/xarray/core/accessor_dt.py:78, in _access_through_series(values, name)
75 field_values = _season_from_months(months)
76 elif name == "isocalendar":
77 # isocalendar returns iso- year, week, and weekday -> reshape
---> 78 field_values = np.array(values_as_series.dt.isocalendar(), dtype=np.int64)
79 return field_values.T.reshape(3, *values.shape)
80 else:
File /home/kai/miniconda/envs/xarray_311/lib/python3.11/site-packages/pandas/core/generic.py:1998, in NDFrame.__array__(self, dtype)
1996 def __array__(self, dtype: npt.DTypeLike | None = None) -> np.ndarray:
1997 values = self._values
-> 1998 arr = np.asarray(values, dtype=dtype)
1999 if (
2000 astype_is_view(values.dtype, arr.dtype)
2001 and using_copy_on_write()
2002 and self._mgr.is_single_block
2003 ):
2004 # Check if both conversions can be done without a copy
2005 if astype_is_view(self.dtypes.iloc[0], values.dtype) and astype_is_view(
2006 values.dtype, arr.dtype
2007 ):
TypeError: int() argument must be a string, a bytes-like object or a real number, not 'NAType'
Looks like no test has caught this so far. |
Any update on this @dcherian @kmuehlbauer? We're stuck with |
@idantene Thanks for checking back. I lost track over summer. I've a somewhat clumsy approach over in #8084. Would be great if you can test with your setup. It checks for NaT and casts to float64 if that's the case. Unfortunately |
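The idea described above (check for NaT, cast to float64 only in that case) can be sketched as follows. This is not the code from #8084; `year_field` is a hypothetical helper that computes the year with plain NumPy so the example stays self-contained.

```python
import numpy as np


def year_field(values: np.ndarray) -> np.ndarray:
    """Hypothetical helper: calendar year of datetime64 values, NaT-aware."""
    # Truncate to year resolution; the integer view counts years since 1970.
    years = values.astype("datetime64[Y]").astype("int64") + 1970
    missing = np.isnat(values)
    if missing.any():
        # Integers cannot hold NaN, so promote to float64 only when needed.
        out = years.astype("float64")
        out[missing] = np.nan
        return out
    return years


complete = np.array(["2021-12-01", "2022-01-15"], dtype="datetime64[ns]")
with_nat = np.array(["2021-12-01", "NaT"], dtype="datetime64[ns]")

print(year_field(complete).dtype)
print(year_field(with_nat).dtype)
```

The trade-off of this design is that the output dtype depends on the data, which is convenient for in-memory arrays but harder to reason about when the values are not known up front.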
@idantene @dcherian I think this problem should be fixed upstream. I've opened an issue, pandas-dev/pandas#54657, over at pandas to see if and how this can be aligned. |
Thanks @kmuehlbauer! I'll have a test run soon to verify this fix - it looks good on paper. I'm curious as to how you'll fix this for dask, though it does not apply to our use case :D |
I've no idea how this can be made consistent for dask without a priori knowledge of whether NaT values are involved. As far as I checked it will return some |
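A rough illustration of the difficulty, using plain NumPy chunks as a stand-in for dask blocks (an assumption made for brevity; no dask is involved, and `year_field` is a hypothetical helper): with a NaT-dependent cast, the result dtype differs from chunk to chunk, whereas `map_blocks` is given a single `dtype` up front, as the traceback above shows.

```python
import numpy as np


def year_field(values: np.ndarray) -> np.ndarray:
    """Hypothetical NaT-aware year extraction (int64, or float64 if NaT)."""
    years = values.astype("datetime64[Y]").astype("int64") + 1970
    missing = np.isnat(values)
    if not missing.any():
        return years
    return np.where(missing, np.nan, years.astype("float64"))


times = np.array(
    ["2021-12-01", "2022-01-15", "NaT", "2023-03-01"], dtype="datetime64[ns]"
)

# Split into two equal chunks and evaluate each one independently,
# the way a chunked backend would.
chunks = np.split(times, 2)
chunk_dtypes = [year_field(chunk).dtype for chunk in chunks]

print(chunk_dtypes)
```

Only the second chunk contains NaT, so only that chunk is promoted to float64. A lazy backend would have to either inspect the data first or always declare float64.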
What happened?
With the latest xarray (this doesn't happen at least in version 2023.2.0), accessing .dt parts returns a strict int64 DataArray, resulting in wrongly presented missing values.
Notice how:
datetime64[ns]
.dt.year accessor returns a float64 to accommodate the missing value (it will use int32 w/o the missing value).
.dt.year returns a negative integer instead of nan.
Additionally, compare with the same snippet's output with xarray 2023.2.0:
What did you expect to happen?
The .dt accessor should return a float with missing values when needed.
Minimal Complete Verifiable Example
MVCE confirmation
Relevant log output
No response
Anything else we need to know?
No response
Environment
INSTALLED VERSIONS
commit: None
python: 3.10.6 (main, May 29 2023, 11:10:38) [GCC 11.3.0]
python-bits: 64
OS: Linux
OS-release: 5.19.0-1025-aws
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: C.UTF-8
LOCALE: ('en_US', 'UTF-8')
libhdf5: None
libnetcdf: None
xarray: 2023.5.0
pandas: 2.0.2
numpy: 1.23.4
scipy: 1.10.1
netCDF4: None
pydap: None
h5netcdf: None
h5py: None
Nio: None
zarr: None
cftime: None
nc_time_axis: None
PseudoNetCDF: None
iris: None
bottleneck: None
dask: 2023.6.0
distributed: 2023.6.0
matplotlib: None
cartopy: None
seaborn: None
numbagg: None
fsspec: 2023.6.0
cupy: None
pint: None
sparse: None
flox: None
numpy_groupies: None
setuptools: 59.6.0
pip: 23.1.2
conda: None
pytest: None
mypy: None
IPython: 8.14.0
sphinx: None