Support .sel with method kwarg for slices #10711
this looks pretty good!! do we need to be concerned about |
either way, selecting from coordinates named |
…TomNicholas/xarray into support_method_sel_float_coords
sorry, you're right! |
This omission was definitely back in the day!
I no longer recall exactly why, but I think the main concern was about potentially ambiguous behavior. In particular, default slicing works similarly to method='backfill' for the left bound and method='pad' for the right bound, which isn't possible to express with a single method argument.
Thinking about this now, I think this is probably safe and would be a welcome new feature, though it may be worth considering supporting the "default" method=None with an explicit tolerance.
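To make the ambiguity above concrete, here is a minimal sketch using plain pandas (the index values and bounds are made up for illustration, and this is not code from the PR). It compares default label slicing against per-bound lookups with 'backfill' and 'pad', and against a single 'nearest' lookup for both bounds.

```python
import pandas as pd

index = pd.Index([10.0, 20.0, 30.0, 40.0])

# Default label slicing with bounds that fall between index values.
default_slice = index.slice_indexer(14.0, 36.0)

# The same positions recovered from two separate lookups:
# 'backfill' picks the first value at or after the left bound,
# 'pad' picks the last value at or before the right bound.
left = index.get_indexer([14.0], method="backfill")[0]
right = index.get_indexer([36.0], method="pad")[0]

# A single method applies to both bounds, so it selects a different range.
both_nearest = index.get_indexer([14.0, 36.0], method="nearest")

print(default_slice, left, right, both_nearest)
```

With these values the default slice covers 20.0 and 30.0 only, while 'nearest' on both bounds reaches out to 10.0 and 40.0.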
xarray/core/indexes.py
Outdated
slice_index_bounds = index.get_indexer(
    [slice_label_start, slice_label_stop], method=method, tolerance=tolerance
)
You need to handle the "no match" case, for which get_indexer() returns -1.
I handled this in d05ab1a, with the implementation now returning an empty slice if either endpoint is not found.
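As a rough illustration of the "no match" handling described here, the sketch below uses a hypothetical helper, slice_from_bounds, which is not the function in the PR. It assumes a unique, sorted pandas index and treats the stop label as inclusive, as label-based slicing does.

```python
import pandas as pd


def slice_from_bounds(index, start, stop, method, tolerance=None):
    """Hypothetical helper for illustration; not the implementation in the PR."""
    positions = index.get_indexer([start, stop], method=method, tolerance=tolerance)
    if (positions == -1).any():
        # get_indexer signals "no match" with -1. Used directly as a slice
        # bound, -1 would silently mean "last element", so return an
        # empty slice instead.
        return slice(0, 0)
    # +1 keeps the point at the stop label inside the selection.
    return slice(int(positions[0]), int(positions[1]) + 1)


index = pd.Index([10.0, 20.0, 30.0, 40.0])

matched = slice_from_bounds(index, 19.0, 31.0, method="nearest")
# 19.0 is 1.0 away from its nearest label, which exceeds the tolerance.
unmatched = slice_from_bounds(index, 19.0, 31.0, method="nearest", tolerance=0.5)
print(matched, unmatched)
```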
xarray/core/indexes.py
Outdated
f"{coord_name!r} with a slice over integer positions; the index is "
"unsorted or non-unique"

# +1 needed to emulate behaviour of xarray sel with slice without method kwarg, which is inclusive of point at stop label
What happens if the index is non-unique?
The behaviour of the existing implementation is confusing. This passes:
data_non_unique = xr.Dataset(
coords={"lat": ("lat", [20.1, 21.1, 21.1, 22.1, 22.1, 23.1])}
)
expected = xr.Dataset(coords={"lat": ("lat", [21.1, 21.1, 22.1, 22.1])})
actual = data_non_unique.sel(lat=slice(21.1, 22.1))
assert_identical(expected, actual)
but it relies upon calling pandas.Index.slice_indexer on a non-unique index, despite the docstring of that method saying "Index needs to be ordered and unique."!
Also, triggering my implementation (using actual = data_non_unique.sel(lat=slice(21.0, 22.2), method="nearest")) currently fails at .get_indexer with pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects. This restriction seems reasonable, but is not documented in the docstring of .get_indexer.
So to get the current (intuitive) behaviour we are relying on undefined behaviour in pandas, and we can't support method/tolerance for slices on non-unique indexers using .get_indexer because of undocumented restrictions in pandas 🙃
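The two pandas behaviours contrasted in this comment can be reproduced without xarray. The sketch below reuses the lat values from the test above on a bare pandas.Index; the behaviour may differ between pandas versions.

```python
import pandas as pd

non_unique = pd.Index([20.1, 21.1, 21.1, 22.1, 22.1, 23.1])

# slice_indexer returns a positional slice on this sorted index,
# even though it contains duplicates.
positions = non_unique.slice_indexer(21.1, 22.1)

# get_indexer refuses to work on a non-unique index.
try:
    non_unique.get_indexer([21.0, 22.2], method="nearest")
    raised = False
except pd.errors.InvalidIndexError:
    raised = True

print(positions, raised)
```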
At least I can clearly catch this case, which I've done in 199765e. That means this PR now reduces the NotImplementedError surface from "any Index passed a slice with method" to "any non-unique Index passed a slice with method".
I added support for this in 8cce331, and a test that confirms that with a big enough value of |
I feel like I must have missed some forbidden case here - it seemed too simple to effectively replace the use of the pandas.Index.slice_indexer method.
.sel with method doesn't work for slices #10710
whats-new.rst
New functions/methods are listed in api.rst