Commit c9c1c6d

spencerkclark, dcherian, keewis, and Illviljan authored
Continue to use nanosecond-precision Timestamps in precision-sensitive areas (#7731)
* [test-upstream] use nanosecond-precision timestamps for now
* [test-upstream] allow kwargs to be passed to nanosecond_precision_timestamp
* unpin `pandas`
* Add type hint for nanosecond_precision_timestamp

  Co-authored-by: Illviljan <[email protected]>
* Add one more TODO comment
* Remove deleted attributes in CFTimeIndex documentation
* Add a what's new entry
* Remove one more attribute from api-hidden.rst
* Silence conversion warnings in tests
* Silence more conversion warnings

---------

Co-authored-by: Deepak Cherian <[email protected]>
Co-authored-by: Justus Magin <[email protected]>
Co-authored-by: Illviljan <[email protected]>
1 parent 13a47fd commit c9c1c6d

22 files changed, +144 -75 lines changed

ci/requirements/all-but-dask.yml

+1 -1
@@ -25,7 +25,7 @@ dependencies:
   - numbagg
   - numpy<1.24
   - packaging
-  - pandas<2
+  - pandas
   - pint
   - pip
   - pseudonetcdf

ci/requirements/doc.yml

+1 -1
@@ -19,7 +19,7 @@ dependencies:
   - numba
   - numpy>=1.21,<1.24
   - packaging>=21.3
-  - pandas>=1.4,<2
+  - pandas>=1.4
   - pooch
   - pip
   - pre-commit

ci/requirements/environment-py311.yml

+1 -1
@@ -27,7 +27,7 @@ dependencies:
   - numexpr
   - numpy
   - packaging
-  - pandas<2
+  - pandas
   - pint
   - pip
   - pooch

ci/requirements/environment-windows-py311.yml

+1 -1
@@ -24,7 +24,7 @@ dependencies:
   # - numbagg
   - numpy
   - packaging
-  - pandas<2
+  - pandas
   - pint
   - pip
   - pre-commit

ci/requirements/environment-windows.yml

+1 -1
@@ -24,7 +24,7 @@ dependencies:
   - numbagg
   - numpy<1.24
   - packaging
-  - pandas<2
+  - pandas
   - pint
   - pip
   - pre-commit

ci/requirements/environment.yml

+1 -1
@@ -27,7 +27,7 @@ dependencies:
   - numexpr
   - numpy<1.24
   - packaging
-  - pandas<2
+  - pandas
   - pint
   - pip
   - pooch

doc/api-hidden.rst

-6
@@ -375,10 +375,8 @@
    CFTimeIndex.is_floating
    CFTimeIndex.is_integer
    CFTimeIndex.is_interval
-   CFTimeIndex.is_mixed
    CFTimeIndex.is_numeric
    CFTimeIndex.is_object
-   CFTimeIndex.is_type_compatible
    CFTimeIndex.isin
    CFTimeIndex.isna
    CFTimeIndex.isnull
@@ -399,7 +397,6 @@
    CFTimeIndex.round
    CFTimeIndex.searchsorted
    CFTimeIndex.set_names
-   CFTimeIndex.set_value
    CFTimeIndex.shift
    CFTimeIndex.slice_indexer
    CFTimeIndex.slice_locs
@@ -413,7 +410,6 @@
    CFTimeIndex.to_flat_index
    CFTimeIndex.to_frame
    CFTimeIndex.to_list
-   CFTimeIndex.to_native_types
    CFTimeIndex.to_numpy
    CFTimeIndex.to_series
    CFTimeIndex.tolist
@@ -438,8 +434,6 @@
    CFTimeIndex.hasnans
    CFTimeIndex.hour
    CFTimeIndex.inferred_type
-   CFTimeIndex.is_all_dates
-   CFTimeIndex.is_monotonic
    CFTimeIndex.is_monotonic_increasing
    CFTimeIndex.is_monotonic_decreasing
    CFTimeIndex.is_unique

doc/user-guide/weather-climate.rst

+11 -5
@@ -57,14 +57,14 @@ CF-compliant coordinate variables

 .. _CFTimeIndex:

-Non-standard calendars and dates outside the Timestamp-valid range
-------------------------------------------------------------------
+Non-standard calendars and dates outside the nanosecond-precision range
+-----------------------------------------------------------------------

 Through the standalone ``cftime`` library and a custom subclass of
 :py:class:`pandas.Index`, xarray supports a subset of the indexing
 functionality enabled through the standard :py:class:`pandas.DatetimeIndex` for
 dates from non-standard calendars commonly used in climate science or dates
-using a standard calendar, but outside the `Timestamp-valid range`_
+using a standard calendar, but outside the `nanosecond-precision range`_
 (approximately between years 1678 and 2262).

 .. note::
@@ -75,13 +75,19 @@ using a standard calendar, but outside the `Timestamp-valid range`_
    any of the following are true:

    - The dates are from a non-standard calendar
-   - Any dates are outside the Timestamp-valid range.
+   - Any dates are outside the nanosecond-precision range.

    Otherwise pandas-compatible dates from a standard calendar will be
    represented with the ``np.datetime64[ns]`` data type, enabling the use of a
    :py:class:`pandas.DatetimeIndex` or arrays with dtype ``np.datetime64[ns]``
    and their full set of associated features.

+   As of pandas version 2.0.0, pandas supports non-nanosecond precision datetime
+   values. For the time being, xarray still automatically casts datetime values
+   to nanosecond-precision for backwards compatibility with older pandas
+   versions; however, this is something we would like to relax going forward.
+   See :issue:`7493` for more discussion.
+
 For example, you can create a DataArray indexed by a time
 coordinate with dates from a no-leap calendar and a
 :py:class:`~xarray.CFTimeIndex` will automatically be used:
@@ -235,6 +241,6 @@ For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:

     da.resample(time="81T", closed="right", label="right", offset="3T").mean()

-.. _Timestamp-valid range: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
+.. _nanosecond-precision range: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
 .. _ISO 8601 standard: https://en.wikipedia.org/wiki/ISO_8601
 .. _partial datetime string indexing: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#partial-string-indexing
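
The "approximately between years 1678 and 2262" figure quoted in the documentation above can be checked with a rough, standard-library-only sketch. It assumes that a nanosecond-precision datetime is a signed 64-bit count of nanoseconds since 1970-01-01, and it ignores the one value reserved for NaT. The names `EPOCH`, `MAX_NS`, and `max_offset` are illustrative and do not come from the commit.

```python
from datetime import datetime, timedelta

# Assumption: nanosecond-precision datetimes are stored as a signed 64-bit
# count of nanoseconds relative to the Unix epoch.
EPOCH = datetime(1970, 1, 1)
MAX_NS = 2**63 - 1  # largest value a signed 64-bit integer can hold

# datetime only resolves microseconds, so truncate the nanosecond count.
max_offset = timedelta(microseconds=MAX_NS // 1000)

upper = EPOCH + max_offset
lower = EPOCH - max_offset

print(lower.year, upper.year)  # prints "1677 2262"
```

The lower bound falls late in 1677, so 1678 is the first full calendar year that fits, which is consistent with the range given in the text.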

doc/whats-new.rst

+6
@@ -88,6 +88,12 @@ Internal Changes
 - Added a config.yml file with messages for the welcome bot when a Github user creates their first ever issue or pull request or has their first PR merged. (:issue:`7685`, :pull:`7685`)
   By `Nishtha P <https://github.com/nishthap981>`_.

+- Ensure that only nanosecond-precision :py:class:`pd.Timestamp` objects
+  continue to be used internally under pandas version 2.0.0. This is mainly to
+  ease the transition to this latest version of pandas. It should be relaxed
+  when addressing :issue:`7493`. By `Spencer Clark
+  <https://github.com/spencerkclark>`_ (:issue:`7707`, :pull:`7731`).
+
 .. _whats-new.2023.03.0:

 v2023.03.0 (March 22, 2023)

setup.cfg

+1 -1
@@ -76,7 +76,7 @@ include_package_data = True
 python_requires = >=3.9
 install_requires =
     numpy >= 1.21  # recommended to use >= 1.22 for full quantile method support
-    pandas >= 1.4, <2
+    pandas >= 1.4
     packaging >= 21.3

 [options.extras_require]

xarray/coding/cftime_offsets.py

+10 -3
@@ -57,7 +57,12 @@
     format_cftime_datetime,
 )
 from xarray.core.common import _contains_datetime_like_objects, is_np_datetime_like
-from xarray.core.pdcompat import NoDefault, count_not_none, no_default
+from xarray.core.pdcompat import (
+    NoDefault,
+    count_not_none,
+    nanosecond_precision_timestamp,
+    no_default,
+)
 from xarray.core.utils import emit_user_level_warning

 try:
@@ -1286,8 +1291,10 @@ def date_range_like(source, calendar, use_cftime=None):
     if is_np_datetime_like(source.dtype):
         # We want to use datetime fields (datetime64 object don't have them)
         source_calendar = "standard"
-        source_start = pd.Timestamp(source_start)
-        source_end = pd.Timestamp(source_end)
+        # TODO: the strict enforcement of nanosecond precision Timestamps can be
+        # relaxed when addressing GitHub issue #7493.
+        source_start = nanosecond_precision_timestamp(source_start)
+        source_end = nanosecond_precision_timestamp(source_end)
     else:
         if isinstance(source, CFTimeIndex):
             source_calendar = source.calendar

xarray/coding/cftimeindex.py

+1 -1
@@ -613,7 +613,7 @@ def to_datetimeindex(self, unsafe=False):
         ------
         ValueError
             If the CFTimeIndex contains dates that are not possible in the
-            standard calendar or outside the pandas.Timestamp-valid range.
+            standard calendar or outside the nanosecond-precision range.

         Warns
         -----

xarray/coding/times.py

+18 -4
@@ -23,6 +23,7 @@
 from xarray.core import indexing
 from xarray.core.common import contains_cftime_datetimes, is_np_datetime_like
 from xarray.core.formatting import first_n_items, format_timestamp, last_item
+from xarray.core.pdcompat import nanosecond_precision_timestamp
 from xarray.core.pycompat import is_duck_dask_array
 from xarray.core.variable import Variable

@@ -224,7 +225,9 @@ def _decode_datetime_with_pandas(
     delta, ref_date = _unpack_netcdf_time_units(units)
     delta = _netcdf_to_numpy_timeunit(delta)
     try:
-        ref_date = pd.Timestamp(ref_date)
+        # TODO: the strict enforcement of nanosecond precision Timestamps can be
+        # relaxed when addressing GitHub issue #7493.
+        ref_date = nanosecond_precision_timestamp(ref_date)
     except ValueError:
         # ValueError is raised by pd.Timestamp for non-ISO timestamp
         # strings, in which case we fall back to using cftime
@@ -391,7 +394,9 @@ def infer_datetime_units(dates) -> str:
         dates = to_datetime_unboxed(dates)
         dates = dates[pd.notnull(dates)]
         reference_date = dates[0] if len(dates) > 0 else "1970-01-01"
-        reference_date = pd.Timestamp(reference_date)
+        # TODO: the strict enforcement of nanosecond precision Timestamps can be
+        # relaxed when addressing GitHub issue #7493.
+        reference_date = nanosecond_precision_timestamp(reference_date)
     else:
         reference_date = dates[0] if len(dates) > 0 else "1970-01-01"
         reference_date = format_cftime_datetime(reference_date)
@@ -432,14 +437,16 @@ def cftime_to_nptime(times, raise_on_invalid: bool = True) -> np.ndarray:
     If raise_on_invalid is True (default), invalid dates trigger a ValueError.
     Otherwise, the invalid element is replaced by np.NaT."""
     times = np.asarray(times)
+    # TODO: the strict enforcement of nanosecond precision datetime values can
+    # be relaxed when addressing GitHub issue #7493.
     new = np.empty(times.shape, dtype="M8[ns]")
     for i, t in np.ndenumerate(times):
         try:
             # Use pandas.Timestamp in place of datetime.datetime, because
             # NumPy casts it safely it np.datetime64[ns] for dates outside
             # 1678 to 2262 (this is not currently the case for
             # datetime.datetime).
-            dt = pd.Timestamp(
+            dt = nanosecond_precision_timestamp(
                 t.year, t.month, t.day, t.hour, t.minute, t.second, t.microsecond
             )
         except ValueError as e:
@@ -498,6 +505,10 @@ def convert_time_or_go_back(date, date_type):

     This is meant to convert end-of-month dates into a new calendar.
     """
+    # TODO: the strict enforcement of nanosecond precision Timestamps can be
+    # relaxed when addressing GitHub issue #7493.
+    if date_type == pd.Timestamp:
+        date_type = nanosecond_precision_timestamp
     try:
         return date_type(
             date.year,
@@ -641,7 +652,10 @@ def encode_cf_datetime(

         delta_units = _netcdf_to_numpy_timeunit(delta)
         time_delta = np.timedelta64(1, delta_units).astype("timedelta64[ns]")
-        ref_date = pd.Timestamp(_ref_date)
+
+        # TODO: the strict enforcement of nanosecond precision Timestamps can be
+        # relaxed when addressing GitHub issue #7493.
+        ref_date = nanosecond_precision_timestamp(_ref_date)

         # If the ref_date Timestamp is timezone-aware, convert to UTC and
         # make it timezone-naive (GH 2649).

xarray/core/pdcompat.py

+13
@@ -39,6 +39,7 @@
 from typing import Literal

 import pandas as pd
+from packaging.version import Version

 from xarray.coding import cftime_offsets

@@ -91,3 +92,15 @@ def _convert_base_to_offset(base, freq, index):
         return base * freq.as_timedelta() // freq.n
     else:
         raise ValueError("Can only resample using a DatetimeIndex or CFTimeIndex.")
+
+
+def nanosecond_precision_timestamp(*args, **kwargs) -> pd.Timestamp:
+    """Return a nanosecond-precision Timestamp object.
+
+    Note this function should no longer be needed after addressing GitHub issue
+    #7493.
+    """
+    if Version(pd.__version__) >= Version("2.0.0"):
+        return pd.Timestamp(*args, **kwargs).as_unit("ns")
+    else:
+        return pd.Timestamp(*args, **kwargs)
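
As a rough illustration of what the helper above is for, the sketch below builds timestamps from a string and from positional fields and normalizes both to nanosecond precision. It is not the commit's code: `to_nanosecond_timestamp` is a hypothetical stand-in that uses feature detection (`hasattr`) rather than the `packaging` version comparison used in the commit, on the assumption that `Timestamp.as_unit` is only available in pandas 2.0 and later.

```python
import pandas as pd


def to_nanosecond_timestamp(*args, **kwargs) -> pd.Timestamp:
    """Hypothetical stand-in for the helper added in this commit."""
    ts = pd.Timestamp(*args, **kwargs)
    # Swapped-in technique: detect as_unit instead of comparing versions.
    # Assumption: the method only exists on pandas >= 2.0.
    if hasattr(ts, "as_unit"):
        return ts.as_unit("ns")
    # Older pandas only has nanosecond-precision Timestamps.
    return ts


# Both the string form and the positional form are forwarded unchanged,
# mirroring the *args / **kwargs signature of the helper in the commit.
ts = to_nanosecond_timestamp("2000-01-01")
same = to_nanosecond_timestamp(2000, 1, 1)

print(ts.value)  # nanoseconds since 1970-01-01
print(ts == same)
```

Forwarding `*args` and `**kwargs` is what lets call sites such as `cftime_to_nptime` pass individual date fields while others pass a single string or `datetime64` value.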

xarray/tests/test_cftime_offsets.py

+1
@@ -1373,6 +1373,7 @@ def test_date_range_like_same_calendar():
     assert src is out


+@pytest.mark.filterwarnings("ignore:Converting non-nanosecond")
 def test_date_range_like_errors():
     src = date_range("1899-02-03", periods=20, freq="D", use_cftime=False)
     src = src[np.arange(20) != 10]  # Remove 1 day so the frequency is not inferable.

xarray/tests/test_concat.py

+1
@@ -297,6 +297,7 @@ def test_concat_multiple_datasets_with_multiple_missing_variables() -> None:
     assert_identical(actual, expected)


+@pytest.mark.filterwarnings("ignore:Converting non-nanosecond")
 def test_concat_type_of_missing_fill() -> None:
     datasets = create_typed_datasets(2, seed=123)
     expected1 = concat(datasets, dim="day", fill_value=dtypes.NA)

xarray/tests/test_conventions.py

+2
@@ -168,6 +168,7 @@ def test_do_not_overwrite_user_coordinates(self) -> None:
         with pytest.raises(ValueError, match=r"'coordinates' found in both attrs"):
             conventions.encode_dataset_coordinates(orig)

+    @pytest.mark.filterwarnings("ignore:Converting non-nanosecond")
     def test_emit_coordinates_attribute_in_attrs(self) -> None:
         orig = Dataset(
             {"a": 1, "b": 1},
@@ -185,6 +186,7 @@ def test_emit_coordinates_attribute_in_attrs(self) -> None:
         assert enc["b"].attrs.get("coordinates") == "t"
         assert "coordinates" not in enc["b"].encoding

+    @pytest.mark.filterwarnings("ignore:Converting non-nanosecond")
     def test_emit_coordinates_attribute_in_encoding(self) -> None:
         orig = Dataset(
             {"a": 1, "b": 1},
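
The `filterwarnings` markers added to the tests above match warnings by the start of their message. The following standard-library sketch shows the same prefix-matching idea with `warnings.filterwarnings`. The full message text in `MESSAGE` and the `convert` function are illustrative assumptions; only the prefix "Converting non-nanosecond" comes from the commit.

```python
import warnings

# Illustrative message text; only its leading words matter for the filter.
MESSAGE = "Converting non-nanosecond precision datetime values to nanosecond precision."


def convert():
    """Hypothetical function that emits the conversion warning."""
    warnings.warn(MESSAGE, UserWarning)
    return "converted"


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    # The message argument is a regex matched against the start of the
    # warning text, so a short prefix is enough to silence it.
    warnings.filterwarnings("ignore", message="Converting non-nanosecond")
    result = convert()
    warnings.warn("some unrelated warning", UserWarning)

# Only the unrelated warning is recorded; the conversion warning is ignored.
recorded = [str(w.message) for w in caught]
print(recorded)
```

Matching on a prefix keeps the filter narrow: other warnings raised by the same tests still surface.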
