Support extension array indexes #9671

Open (wants to merge 233 commits into base: main)


ilan-gold (Contributor)

Identical to kmuehlbauer#1. The diff here is probably not very helpful for reviewing the changes, since https://github.com/kmuehlbauer/xarray/tree/any-time-resolution-2 already contains most of them.

kmuehlbauer and others added 30 commits October 18, 2024 07:31
…ore/variable.py to use any-precision datetime/timedelta with automatic inference of resolution
…t resolution, fix code and tests to allow this
… more carefully, for now using pd.Series to convert `OMm` type datetimes/timedeltas (will result in ns precision)
…rray` series creating an extension array when `.array` is accessed
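The last commit message above concerns accessing `.array` on a pandas Series. As a minimal sketch of the plain-pandas behavior involved (not code from this PR): `Series.array` returns a pandas ExtensionArray, while `Series.to_numpy()` returns a NumPy ndarray. Nanosecond resolution is used so the example behaves the same across pandas versions.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray

# A datetime64 Series; ns resolution behaves the same in pandas 1.x and 2.x.
s = pd.Series(np.array(["2000-01-01", "2000-01-02"], dtype="datetime64[ns]"))

# `.array` hands back a pandas ExtensionArray (a DatetimeArray for this dtype) ...
ext = s.array
print(type(ext).__name__, isinstance(ext, ExtensionArray))

# ... whereas `.to_numpy()` hands back a plain NumPy array.
arr = s.to_numpy()
print(type(arr).__name__, arr.dtype)
```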
@@ -104,17 +104,11 @@ def index_flat(request):
index fixture, but excluding MultiIndex cases.
"""
key = request.param
if key in ["bool-object", "bool-dtype", "nullable_bool", "repeats"]:
Contributor

There seems to be some weird broadcasting behaviour here.

dcherian (Contributor)

Sorry, this is a total mess. Apparently IndexVariable and Variable now behave differently, and I'm not sure why.

@@ -945,7 +944,7 @@ def load(self, **kwargs):
--------
dask.array.compute
"""
-self._data = to_duck_array(self._data, **kwargs)
+self._data = _maybe_wrap_data(to_duck_array(self._data, **kwargs))
Contributor

Maybe we should just return the PandasExtensionArray wrapper class, but I'm wary of exposing that to users.

Contributor

OK, I did this. It seems much neater, at the expense of exposing the PandasExtensionArray wrapper class.
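A minimal sketch of the wrapping idea discussed in this thread, under a simplified design: `ExtensionArrayWrapper` and `maybe_wrap_data` are hypothetical names, not xarray's `PandasExtensionArray` or `_maybe_wrap_data`. The helper wraps pandas extension arrays (for example a `Categorical`) and passes other arrays through untouched; indexing re-wraps array results so the extension dtype survives.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray


class ExtensionArrayWrapper:
    """Hypothetical stand-in for a PandasExtensionArray-style wrapper."""

    def __init__(self, array):
        if not isinstance(array, ExtensionArray):
            raise TypeError(f"expected a pandas ExtensionArray, got {type(array)!r}")
        self.array = array

    @property
    def dtype(self):
        return self.array.dtype

    @property
    def shape(self):
        return self.array.shape

    def __len__(self):
        return len(self.array)

    def __getitem__(self, key):
        result = self.array[key]
        # Re-wrap array results so indexing keeps the extension dtype;
        # scalar results are returned unchanged.
        if isinstance(result, ExtensionArray):
            return type(self)(result)
        return result


def maybe_wrap_data(data):
    """Hypothetical helper: wrap pandas extension arrays, pass everything else through."""
    if isinstance(data, ExtensionArray):
        return ExtensionArrayWrapper(data)
    return data


wrapped = maybe_wrap_data(pd.Categorical(["a", "b", "a"]))
print(type(wrapped[[0, 2]]).__name__, wrapped[[0, 2]].dtype)
print(wrapped[1])
```

The trade-off mirrors the comment above: callers now see the wrapper type instead of a bare pandas array, but the extension dtype is kept through indexing rather than being lost to a NumPy conversion.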

* main:
  Bump scientific-python/upload-nightly-action in the actions group (pydata#10192)
  Add new whats-new section (pydata#10190)
  release 2025.03.1 (pydata#10188)
  Support zarr `write_empty_chunks` for zarr-python 3 and up (pydata#10177)
dcherian changed the title from "(fix): extension array indexers" to "Support extension array indexes" on Apr 1, 2025
ilan-gold (Contributor Author)

@dcherian Could you give a bit of background into the changes you pushed? I'm not really following.

> Sorry, this is a total mess. Apparently IndexVariable and Variable now behave differently, and I'm not sure why.

Did I do something wrong in the PR without realizing it, e.g., bypassing the tests? It would be great to understand!

dcherian (Contributor) commented Apr 1, 2025

No, you didn't do anything wrong per se.

  1. I wanted the pandas-specific logic to live inside indexing.py as much as possible (and definitely not in namedarray/core.py), so moving that logic exposed some other warts. The solution for now is to expose the PandasExtensionArray wrapper class.
  2. The groupby_bins tests needed to be updated because previously an IntervalArray was cast to a NumPy object array of tuples.
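To illustrate point 2 with plain pandas (not xarray's own code path): converting an IntervalArray to NumPy produces an `object`-dtype array, so the `IntervalDtype` is lost, whereas keeping the extension array preserves it.

```python
import numpy as np
import pandas as pd

# Three intervals: (0, 1], (1, 2], (2, 3]
intervals = pd.arrays.IntervalArray.from_breaks([0, 1, 2, 3])

# Kept as an extension array, the interval dtype is preserved.
print(intervals.dtype)

# Cast to NumPy, the result is a generic object array; the interval dtype is gone.
as_numpy = np.asarray(intervals)
print(as_numpy.dtype)
```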

dcherian mentioned this pull request Apr 4, 2025
13 tasks
dcherian added 3 commits April 4, 2025 16:16
* main:
  Fix sparse dask repr test (pydata#10200)
  Apply ruff preview rule RUF046 (pydata#10199)
  DOC: Remove mention of netcdf pypi package (pydata#10197)
@@ -102,7 +102,7 @@ def replace_duck_with_extension_array(args) -> list:
return type(self)[type(res)](res)
return res

-def __array_ufunc__(ufunc, method, *inputs, **kwargs):
+def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
Contributor

The test failure in test_units is related to how this implementation handles the `equal` ufunc.

@ilan-gold can you take a look please?

Contributor Author

The signature is correct: https://numpy.org/devdocs/user/basics.subclassing.html

I have seen that error before, but I can't remember what it means; maybe a segfault...

Contributor

I saw that, but we include `self` in `__array_ufunc__` elsewhere in Xarray 🤷🏾‍♂️

Contributor Author

OK! Hopefully my fix helps, but it was hard to make heads or tails of what changed because the diff was very messy (there were a merge from main or two). Please open a PR into my branch in the future; that would make things easier :)
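For reference, a minimal sketch of an `__array_ufunc__` override on a hypothetical wrapper class (`DuckWrapper` is illustrative, not xarray's implementation). NumPy invokes the override as a bound method, so the definition must take `self` first; if `self` is omitted, Python binds the wrapper instance to the `ufunc` parameter and every later argument shifts by one. The override unwraps wrapped inputs and dispatches on `method`, so plain calls (such as `np.equal`) and ufunc methods (such as `np.add.reduce`) both reach the underlying array.

```python
import numpy as np


class DuckWrapper:
    """Hypothetical minimal wrapper around an array-like (not xarray's class)."""

    def __init__(self, array):
        self.array = array

    # NumPy calls this as a bound method, so `self` must come first;
    # without it, the wrapper instance would be bound to `ufunc`.
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap any wrapped inputs so the underlying array handles the ufunc.
        unwrapped = [x.array if isinstance(x, DuckWrapper) else x for x in inputs]
        # NumPy always passes `out` (when given) as a tuple; unwrap it too.
        if "out" in kwargs:
            kwargs["out"] = tuple(
                o.array if isinstance(o, DuckWrapper) else o for o in kwargs["out"]
            )
        # `method` is "__call__", "reduce", "accumulate", ...; dispatch on it.
        return getattr(ufunc, method)(*unwrapped, **kwargs)


w = DuckWrapper(np.array([1, 2, 3]))
print(np.equal(w, 2))    # method="__call__"
print(np.add.reduce(w))  # method="reduce"
```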

Comment on lines +337 to +340
try:
    nbytes_str = f" {render_human_readable_nbytes(variable.nbytes)}"
except TypeError:
    nbytes_str = " ?"
dcherian (Contributor), Apr 9, 2025

This is a bit ugly, but the alternative is to define nbytes on the pandas wrappers, which breaks many reprs in our tests and doctests :/
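A minimal sketch of the repr fallback above, assuming the formatter raises `TypeError` when handed something that is not an integer byte count. `human_readable_nbytes` is a hypothetical stand-in for xarray's `render_human_readable_nbytes`, and `SimpleNamespace` objects stand in for variables.

```python
import operator
from types import SimpleNamespace


def human_readable_nbytes(nbytes) -> str:
    """Hypothetical formatter: decimal units, three significant digits."""
    # operator.index raises TypeError for non-integers such as None.
    value = float(operator.index(nbytes))
    for unit in ("B", "kB", "MB"):
        if value < 1000:
            return f"{value:.3g}{unit}"
        value /= 1000
    return f"{value:.3g}GB"


def nbytes_suffix(variable) -> str:
    """Return ' <size>' for the repr, or ' ?' when the size cannot be rendered."""
    try:
        return f" {human_readable_nbytes(variable.nbytes)}"
    except TypeError:
        return " ?"


print(repr(nbytes_suffix(SimpleNamespace(nbytes=24))))
print(repr(nbytes_suffix(SimpleNamespace(nbytes=1500))))
print(repr(nbytes_suffix(SimpleNamespace(nbytes=None))))
```

Catching only `TypeError`, as the snippet in the PR does, lets other failures (for example an `AttributeError` from a missing `nbytes`) propagate instead of being silently rendered as `?`.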

8 participants