ENH: Better error message for trying to convert to "double," type #60685

Closed
1 of 3 tasks
benrutter opened this issue Jan 9, 2025 · 4 comments
Labels
Enhancement Needs Triage Issue that has not been reviewed by a pandas team member

Comments

@benrutter

benrutter commented Jan 9, 2025

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

I (don't ask how 😅) came across the fact that this:

import pandas as pd

pd.DataFrame({"a": [1, 2, 3]}).astype("hurdy-gurdy")

Raises a helpful 'TypeError: data type "hurdy-gurdy" not understood' message.

But this doesn't happen if the incorrect type is "double," (I'm not sure why but I assume something around the fact that pandas needs to check for types such as "double[pyarrow]").

This code:

pd.DataFrame({"a": [1, 2, 3]}).astype("double,")

Raises the error: "TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''".

Feature Description

Raise the same kind of error as for other invalid types, i.e. a message saying that "double," is not a valid type.

Alternative Solutions

Probably involves digging slightly into the part of the code that's throwing an error - I'd love to put in a PR if this is something that'd be accepted?

Additional Context

No response

@benrutter benrutter added Enhancement Needs Triage Issue that has not been reviewed by a pandas team member labels Jan 9, 2025
@Liam3851
Contributor

I can't reproduce under 2.2.3, just converts to float64. Can you specify under what versions this occurs? This could also be a numpy upstream issue as ultimately this gets delegated to numpy for numpy-typed data blocks (below is with numpy 1.26.4).

In [4]: df = pd.DataFrame({"a": [1, 2, 3]})

In [5]: df.dtypes
Out[5]:
a    int64
dtype: object

In [6]: df.astype("double,")
Out[6]:
     a
0  1.0
1  2.0
2  3.0

In [7]: df.astype("double,").dtypes
Out[7]:
a    float64
dtype: object

@parkine

parkine commented Feb 5, 2025

I was able to reproduce the error, and here’s what I found:

When you call astype("double,") on a DataFrame, pandas delegates the dtype handling to NumPy. NumPy interprets this string as a structured data type rather than a standard floating-point type.

According to NumPy's documentation, a structured dtype can be specified as a list of tuples, one tuple per field. Each tuple has the form (fieldname, datatype, shape), where shape is optional.

For example, a structured dtype can look like dtype([('f0', '<i8')]).

To investigate, I checked how the dtype is printed:

# Input
df = pd.DataFrame({"a": [1, 2, 3]}).astype("double,")
print(df.dtypes)

# Output
a    [('f0', '<f8')]
dtype: object

About this:

  • f0 is a default field name assigned by NumPy when none is provided.
  • <f8 represents a 64-bit floating-point number.
  • < denotes little-endian byte order. (See NumPy dtype documentation for details.)

So 'double,' is not rejected as an invalid dtype, because NumPy interprets it as a structured dtype with a single field.
However, when operations like isnan() are later applied, NumPy raises an error, since isnan() does not support structured dtypes.
This is why the error message differs from the "hurdy-gurdy" case, which is caught earlier, during dtype parsing.
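
The behaviour described above can be checked with NumPy alone, without pandas. The following is a minimal sketch, not the code path pandas actually takes: it parses "double," directly, then calls np.isnan on an array of that dtype. The variable names are illustrative. The result depends on the NumPy version, as the differing reports in this thread suggest, so no fixed output is shown.

```python
import numpy as np

# Parse the dtype string with the trailing comma, as in the issue.
dt = np.dtype("double,")
print(np.__version__, repr(dt))

# dt.names is None for a plain dtype and a tuple of field names
# for a structured dtype.
is_structured = dt.names is not None

arr = np.zeros(3, dtype=dt)
try:
    np.isnan(arr)
    outcome = "isnan succeeded"
except TypeError:
    # np.isnan has no implementation for structured (void) dtypes.
    outcome = "isnan raised TypeError"

print("structured:", is_structured, "|", outcome)
```

If the dtype comes back as structured, the isnan call is expected to fail with the TypeError quoted in the original report; if it comes back as plain float64, the call succeeds.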

This is not about an existing datatype with some extra character; it's specifically about the comma.
For example, if I try using a dot instead of a comma:

# Input
df = pd.DataFrame({"a": [1, 2, 3]}).astype("double.")
# Output
TypeError: data type 'double.' not understood

This is similar to the "hurdy-gurdy" case.
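
To compare the comma and dot cases side by side, here is a small sketch that asks NumPy how it parses each string. The helper name describe is made up for this example and is not part of NumPy or pandas. How "double," is classified may differ between NumPy versions, so the printed output is not reproduced here.

```python
import numpy as np

def describe(spec):
    """Return a short label for how NumPy parses a dtype string."""
    try:
        dt = np.dtype(spec)
    except TypeError as exc:
        # Unknown type names are rejected during parsing.
        return f"rejected: {exc}"
    kind = "structured" if dt.names is not None else "plain"
    return f"{kind}: {dt!r}"

for spec in ["double", "double,", "double.", "hurdy-gurdy"]:
    print(f"{spec!r:>14} -> {describe(spec)}")
```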

I believe this is not entirely a bug, but I'm not sure how to work with structured data types in general.

Additionally, I'd like to ask if you have an aarch64 processor instead of an x86_64 processor. My laptop is aarch64, and I was able to reproduce the same error message. However, when I tried on an x86_64 machine, I couldn't reproduce it as @Liam3851 mentioned. I wonder if the difference is due to the processor type or some other reason.

@benrutter
Author

@parkine that's so interesting! Thanks for digging into it.

That makes it sound like raising a more helpful error would either need to happen within numpy or pandas would have to have some kind of extra "is this a valid datatype" check before handing over to numpy or pyarrow, which probably would add additional complexity that isn't worth it just for a more helpful error message.
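
For illustration, the kind of extra "is this a valid datatype" check mentioned above could look roughly like the sketch below. This is purely hypothetical: require_plain_dtype is an invented name, not an existing pandas or NumPy function, and it assumes that a structured dtype is never what the caller wants, which would not hold for code that uses structured dtypes on purpose.

```python
import numpy as np

def require_plain_dtype(spec: str) -> np.dtype:
    """Hypothetical pre-check: reject strings NumPy parses as structured dtypes."""
    dt = np.dtype(spec)  # raises TypeError for unknown names such as "hurdy-gurdy"
    if dt.names is not None:
        # The string parsed, but as a structured dtype with named fields.
        raise TypeError(
            f"data type {spec!r} not understood "
            f"(NumPy parsed it as the structured dtype {dt})"
        )
    return dt

print(require_plain_dtype("double"))

try:
    require_plain_dtype("f8,i4")  # two fields, so always structured
except TypeError as exc:
    print(exc)
```

Whether require_plain_dtype("double,") raises depends on how the installed NumPy version parses the trailing comma.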

I'm on x86_64, so I guess the message might be related to something else? Potentially numpy version (numpy version 2.1.3).

Either way, happy to close this issue off since it sounds like handling it within pandas is probably not worth it?

@parkine

parkine commented Feb 6, 2025

Happy to help!

I don’t think they could’ve fixed it, since this seems more like a numpy “feature” than a bug. The datatype you used wasn’t technically wrong, it was just interpreted differently by numpy. What’s interesting is how numpy decides whether to treat "double," as a typo or as a structured datatype.

Thank you for checking. Like you said, it’s probably a version issue. The one that didn’t throw an error was NumPy 1.26.4, but version 2.2.2 did. I was thinking it'd be a pandas version issue and didn’t even think to check numpy.

Maybe the newer version enforces structured dtypes more strictly? Not entirely sure, but if I figure it out, I’ll drop a comment!


3 participants