-
-
Notifications
You must be signed in to change notification settings - Fork 18.4k
BUG: Pyarrow timestamp support for map() function #61236
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
BUG: Pyarrow timestamp support for map() function #61236
Conversation
pandas/core/algorithms.py
Outdated
try: | ||
# Convert elements to pandas.Timestamp (or datetime64[ns]) | ||
arr = arr.astype("datetime64[ns]") | ||
except Exception: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the wrong place to fix this. This should be fixed in ArrowExtensionArray.map
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks but the required fix is in ArrowExtensionArray
pandas/core/arrays/arrow/array.py
Outdated
@@ -1483,6 +1483,8 @@ def to_numpy( | |||
def map(self, mapper, na_action: Literal["ignore"] | None = None): | |||
if is_numeric_dtype(self.dtype): | |||
return map_array(self.to_numpy(), mapper, na_action=na_action) | |||
elif self.dtype == "timestamp[ns][pyarrow]": | |||
return map_array(self.to_numpy(dtype=object), mapper, na_action=na_action) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you avoid the type cast to object
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried using datetime64[ns]
instead of object
, but some tests expect Python objects (pd.Timestamp, ) and do not pass. I think keeping object
helps preserve that expected behavior. Let me know if you'd prefer adjusting the test instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the failing test would need adjustment (we get a better result when we don't return object
)
|
||
def test_map_arrow_timestamp_dict(): | ||
# GH 61231 | ||
pytest.importorskip("pyarrow", minversion="10.0.1") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pytest.importorskip("pyarrow", minversion="10.0.1") | |
pytest.importorskip("pyarrow") |
@@ -1483,6 +1483,10 @@ def to_numpy( | |||
def map(self, mapper, na_action: Literal["ignore"] | None = None): | |||
if is_numeric_dtype(self.dtype): | |||
return map_array(self.to_numpy(), mapper, na_action=na_action) | |||
elif self.dtype == "timestamp[ns][pyarrow]": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding an elif
you can modify the existing if
statement as if is_numeric_dtype(self.dtype) or self.dtype.kind in "mM":
Added type annotations to new arguments/methods/functions.doc/source/whatsnew/vX.X.X.rst
file if fixing a bug or adding a new feature.