Skip to content

Conversation

TomNicholas
Copy link
Member

@TomNicholas TomNicholas commented Sep 6, 2025

I feel like I must have missed some forbidden case here - it seemed too simple to effectively replace the use of the pandas.Index.slice_indexer method...

@TomNicholas TomNicholas changed the title Support method sel float coords Support .sel with method kwarg for slices Sep 6, 2025
@max-sixty
Copy link
Collaborator

this looks pretty good!!

do we need to be concerned about method as a dim we're selecting over?

@TomNicholas
Copy link
Member Author

do we need to be concerned about method as a dim we're selecting over?

method is already in the API, I didn't touch that part, so I assume (hope) that is already handled at a higher-level somewhere.

@keewis
Copy link
Collaborator

keewis commented Sep 7, 2025

either way, selecting from coordinates named method can still be done by passing a dict

@max-sixty
Copy link
Collaborator

method is already in the API, I didn't touch that part, so I assume (hope) that is already handled at a higher-level somewhere.

sorry, you're right!

Copy link
Member

@shoyer shoyer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This omission was definitely back in the day!

I no longer recall exactly why, but I think the main concern was about potentially ambiguous behavior. In particular, default slicing works similar to method='backfill' for the left bound and method='pad' for the right bound, which isn't possible to express with a single method argument.

Thinking about this now, I think this is probably safe and would be a welcome new feature, though it may be worth considering supporting the "default" method=None with an explicit tolerance.

Comment on lines 569 to 571
slice_index_bounds = index.get_indexer(
[slice_label_start, slice_label_stop], method=method, tolerance=tolerance
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to handle the "no match" case, for which get_indexer() returns -1.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I handled this in d05ab1a, with the implementation now returning an empty slice if either endpoint is not found.

f"{coord_name!r} with a slice over integer positions; the index is "
"unsorted or non-unique"

# +1 needed to emulate behaviour of xarray sel with slice without method kwarg, which is inclusive of point at stop label
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the index is non-unique?

Copy link
Member Author

@TomNicholas TomNicholas Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The behaviour of the existing implementation is confusing. This passes:

data_non_unique = xr.Dataset(
    coords={"lat": ("lat", [20.1, 21.1, 21.1, 22.1, 22.1, 23.1])}
)
expected = xr.Dataset(coords={"lat": ("lat", [21.1, 21.1, 22.1, 22.1])})
actual = data_non_unique.sel(lat=slice(21.1, 22.1))
assert_identical(expected, actual)

but it relies upon calling pandas.Index.slice_indexer on a non-unique index, despite the docstring of that method saying "Index needs to be ordered and unique."!

Also, triggering my implementation (using actual = data_non_unique.sel(lat=slice(21.0, 22.2), method="nearest")) currently fails at .get_indexer with pandas.errors.InvalidIndexError: Reindexing only valid with uniquely valued Index objects. This restriction seems reasonable, but is not documented in the docstring of .get_indexer.

So to get the current (intuitive) behaviour we are relying on undefined behaviour in pandas, and we can't support method/tolerance for slices on non-unique indexers using .get_indexer because of undocumented restrictions in pandas 🙃

Copy link
Member Author

@TomNicholas TomNicholas Sep 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At least I can clearly catch this case, which I've done in 199765e. That means this PR now reduces the NotImplementedError surface from "any Index passed a slice with method" to "any non-unique Index passed a slice with method".

@TomNicholas
Copy link
Member Author

it may be worth considering supporting the "default" method=None with an explicit tolerance.

I added support for this in 8cce331, and a test that confirms that with a big enough value of tolerance this does reduce to the "default" behaviour, comparing against when tolerance is not supplied.

@TomNicholas TomNicholas requested a review from shoyer September 8, 2025 21:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

.sel with method doesn't work for slices
4 participants