Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Relax nanosecond datetime restriction in CF time decoding #9618

Open
wants to merge 164 commits into
base: main
Choose a base branch
from

Conversation

kmuehlbauer
Copy link
Contributor

@kmuehlbauer kmuehlbauer commented Oct 13, 2024

This is another attempt to resolve #7493. This goes a step further than #9580.

The idea of this PR is to automatically infer the needed resolutions for decoding/encoding and only keep the constraints pandas imposes ("s" - lowest resolution, "ns" - highest resolution). There is still the idea of a default resolution, but this should only take precedence if it doesn't clash with the automatic inference. This can be discussed, though. Update: I've implemented time-unit-kwarg a first try to have default resolution on decode, which will override the current inferred resolution only to higher resolution (eg. 's' -> 'ns'). To work towards #4490 the time decoding options (decode_time and use_cftime are bundled within CFDatetimeCoder which is distributed via decode_times kwarg. use_cftime-kwarg is deprecated.

For sanity checking, and also for my own good, I've created a documentation page on time-coding in the internal dev section. Any suggestions (especially grammar) or ideas for enhancements are much appreciated.

There still might be room for consolidation of functions/methods (mostly in coding/times.py), but I have to leave it alone for some days. I went down that rabbit hole and need to relax, too 😬.

Looking forward to get your insights here, @spencerkclark, @ChrisBarker-NOAA, @pydata/xarray.

Todo:

  • floating point handling
  • update decoding tests to iterate over time_units (where appropriate)
  • CFDatetimeCoder as input for decode_times kwarg
  • ...

@kmuehlbauer
Copy link
Contributor Author

Nice, mypy 1.12 is out and breaks our typing, 😭.

@TomNicholas
Copy link
Member

Nice, mypy 1.12 is out and breaks our typing, 😭

Can we pin it in the CI temporarily?

@TomNicholas TomNicholas mentioned this pull request Oct 14, 2024
4 tasks
@kmuehlbauer
Copy link
Contributor Author

Can we pin it in the CI temporarily?

Yes, 1.11.2 was the last version.

@kmuehlbauer kmuehlbauer force-pushed the any-time-resolution-2 branch from ca5050d to f7396cf Compare October 14, 2024 16:09
@kmuehlbauer kmuehlbauer marked this pull request as ready for review October 14, 2024 18:05
@kmuehlbauer
Copy link
Contributor Author

This is now ready for a first round of review. I think this is already in a quite usable state.

But no rush, this should be thoroughly tested.

@spencerkclark
Copy link
Member

Sounds good @kmuehlbauer! I’ll try and take an initial look this weekend.

@kmuehlbauer
Copy link
Contributor Author

Slowly approaching the gates...

I think I've addressed all suggestions and review comments. Good chance that I've missed something, though.

One final unresolved issue is with polyfit/integration (cc @dcherian). There seems to be some hardcoding of "ns". Unfortunately this is not my home turf, so maybe someone more experienced should take this on.

The current state of this PR should be backward compatible wrt to time decoding. There are slight changes (and also enhancements/bug fixes). All non-nano enhancements are opt-in (time_unit-kwarg to xr.coders.CFDatetimeCoder. I've left todo's in the code where to pick up in future PR's. If one of those todo's should be fixed in this PR, please have another review.

I'll need to summarize the changes in the whats-new.rst.

@shoyer
Copy link
Member

shoyer commented Jan 9, 2025

One final unresolved issue is with polyfit/integration (cc @dcherian). There seems to be some hardcoding of "ns". Unfortunately this is not my home turf, so maybe someone more experienced should take this on.

As long as the unit tests are passing, let's open an issue for this and leave it for later. This PR is big enough already. It would be best to continue this work in smaller follow-on PRs.

My cursory examination of datetime_to_numeric() (called by interp and integrate internally) is that it should be totally with different datetime units. In general we only have to worry about extremely unsafe cases where code assumes a datetime array already has ns precision without checking.

@dcherian
Copy link
Contributor

dcherian commented Jan 9, 2025

One final unresolved issue is with polyfit/integration (cc @dcherian). There seems to be some hardcoding of "ns". Unfortunately this is not my home turf, so maybe someone more experienced should take this on.

I don't think there are changes to make in polyfit per se. We should just call this out in the release notes and the docstrings. Note that the unit matters for integrate and differentiate too.

I agree with doing this in a followup.

@dcherian
Copy link
Contributor

dcherian commented Jan 9, 2025

Stephan's right, we force ns when converting to numeric for polyfit/polyval so this PR isn't breaking things

data=datetime_to_numeric(x.data, offset=offset, datetime_unit="ns"),

I pushed a couple of extra tests to make sure we don't unintentionally break this in the future.

EDIT: opened #9937

@kmuehlbauer
Copy link
Contributor Author

I went over the docs once again, also added whats-new.rst entry. Some more cosmetic changes. I'd say this is finished. Please let me know, if there is something missing.

Copy link
Member

@spencerkclark spencerkclark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all of your hard work on this @kmuehlbauer 🙏.

I think we are basically there! I just want to make sure we are not committing anything that we know is a breaking change or otherwise does not work out of the box. I'm approving, but take a look at kmuehlbauer#3, which fixes a couple issues related to timedelta decoding / encoding.

xarray/tests/test_interp.py Outdated Show resolved Hide resolved
xarray/tests/test_groupby.py Outdated Show resolved Hide resolved
xarray/tests/test_dataset.py Outdated Show resolved Hide resolved
xarray/tests/test_dataset.py Outdated Show resolved Hide resolved
@@ -595,8 +624,9 @@ def test_infer_cftime_datetime_units(calendar, date_args, expected) -> None:
],
)
def test_cf_timedelta(timedeltas, units, numbers) -> None:
# todo: check, if this test is OK
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost—I realize my comments about enabling non-nanosecond timedelta decoding, but defaulting to nanosecond resolution decoding could only be satisfied by implementing a time_unit option in encode_cf_timedelta...I gave it a try in kmuehlbauer#3.

Let me know if that looks OK. Your refactoring of the timedelta handling in 0308672 really does the heavy lifting 👍.

xarray/coding/times.py Show resolved Hide resolved
kmuehlbauer and others added 3 commits January 13, 2025 07:39
* Fix timedelta encoding overflow issue; always decode to ns resolution

* Implement time_unit for decode_cf_timedelta

* Reduce diff
@kmuehlbauer
Copy link
Contributor Author

Thanks again @spencerkclark! I've merged your kmuehlbauer#3 as it is essential for this PR to keep consistency and backwards compatibility.

doc/whats-new.rst Outdated Show resolved Hide resolved
xarray/tests/test_coding_times.py Outdated Show resolved Hide resolved
xarray/tests/test_coding_times.py Outdated Show resolved Hide resolved
doc/whats-new.rst Outdated Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
10 participants