[Enh]: cast expr in SparkLike #1743

Open
lucas-nelson-uiuc opened this issue Jan 6, 2025 · 3 comments

@lucas-nelson-uiuc
Contributor

We would like to learn about your use case. For example, if this feature is needed to adopt Narwhals in an open source project, could you please enter the link to it below?

No response

Please describe the purpose of the new feature or describe the problem to solve.

I'm currently working on implementing the cast expression in SparkLike. I wanted to use this issue to discuss and list the key considerations around Spark's data types.

Suggest a solution if possible.

So far, I'm able to implement a handful of data types. However, I noticed that some types cannot (yet) be implemented or I'm uncertain how they'd be implemented.

Able to implement (currently testing)

  • Float64, Float32, Int64, Int32, Int16, Decimal
  • String
  • Boolean
  • ArrayType (see comment below)
  • Struct, Field
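
As a rough illustration of the simple cases in the list above, the mapping could be expressed as a lookup from dtype name to Spark SQL type name. This is only a sketch: `SPARK_TYPE_NAMES` and `spark_type_name` are hypothetical names, not part of Narwhals or PySpark, and dtypes are represented by plain strings to keep the example free of dependencies. Decimal, ArrayType, and Struct are left out because they need extra parameters (precision and scale, element type, fields).

```python
# Hypothetical lookup table: Narwhals dtype name -> Spark SQL type name.
# Dtypes are plain strings here so the sketch runs without Narwhals or PySpark.
SPARK_TYPE_NAMES = {
    "Float64": "double",
    "Float32": "float",
    "Int64": "bigint",
    "Int32": "int",
    "Int16": "smallint",
    "String": "string",
    "Boolean": "boolean",
}


def spark_type_name(dtype_name: str) -> str:
    """Return the Spark SQL type name for a dtype name, or raise if unmapped."""
    try:
        return SPARK_TYPE_NAMES[dtype_name]
    except KeyError:
        msg = f"No Spark SQL type is mapped for dtype {dtype_name!r}."
        raise NotImplementedError(msg) from None


print(spark_type_name("Int64"))
```

The returned strings are meant for PySpark's `Column.cast`, which accepts a type name string as well as a `DataType` instance.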

Cannot (yet) implement

  • No native support for unsigned integers (UInt8, UInt16, UInt32, UInt64)
  • No native support for categorical types (Enum, Categorical)
  • No direct parameterization of datetime information (would need to set in/get from Spark configuration)
  • No native support for other dtypes (Object, Unknown)

Unsure how to implement

  • pyspark.sql.types.StructField contains more than just name and dtype - is it worth updating narwhals.dtypes.Field to have these additional (optional) parameters to accommodate PySpark?
    • nullable: whether the field can be null (None) or not
    • metadata: additional information about the field
  • I think PySpark's ArrayType functions like Polars' List type (at least it doesn't have a width constraint)
    • Should we map pyspark.sql.types.ArrayType to narwhals.dtypes.List?
    • Is there a way we could implement an Array type that aligns with Polars?
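
To make the StructField question above concrete, here is a sketch of what a field with the two extra optional parameters might look like. `FieldSketch` is a hypothetical stand-in, not the actual `narwhals.dtypes.Field`, and the dtype is a plain string to keep the example self-contained. The defaults are chosen to mirror PySpark's `StructField`, where `nullable` defaults to `True` and `metadata` is optional.

```python
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass(frozen=True)
class FieldSketch:
    """Hypothetical field type carrying PySpark's extra StructField attributes."""

    name: str
    dtype: str  # placeholder for a real dtype object
    nullable: bool = True  # whether the field can be null (None) or not
    metadata: Dict[str, Any] = field(default_factory=dict)  # extra field info


# Existing call sites that pass only name and dtype would keep working.
plain = FieldSketch("id", "Int64")
strict = FieldSketch("id", "Int64", nullable=False, metadata={"comment": "primary key"})
print(plain)
print(strict)
```

Because both parameters are optional, backends that have no notion of nullability or metadata could simply ignore them.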

Let me know what anyone thinks about the above. Feel free to add onto this with other types we could add in. Thanks!

If you have tried alternatives, please describe them below.

No response

Additional information that may help us understand your needs.

No response

@FBruzzesi
Member

Hey @lucas-nelson-uiuc, thanks for taking the time on this. I would be very happy to see a cast implementation for PySpark as well.

Coming to your considerations:

Able to implement (currently testing)

Amazing 🚀👌

Cannot (yet) implement

I think it is fine to raise an error when casting to these types for now and evaluate later on.
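
A minimal sketch of that approach, assuming dtypes are identified by name. `UNSUPPORTED_IN_SPARK` and `ensure_castable` are hypothetical names, not existing Narwhals helpers:

```python
# Dtype names listed under "Cannot (yet) implement" in the issue description.
UNSUPPORTED_IN_SPARK = frozenset(
    {"UInt8", "UInt16", "UInt32", "UInt64", "Enum", "Categorical", "Object", "Unknown"}
)


def ensure_castable(dtype_name: str) -> None:
    """Raise NotImplementedError if the target dtype has no Spark equivalent yet."""
    if dtype_name in UNSUPPORTED_IN_SPARK:
        msg = f"Casting to {dtype_name} is not yet supported for the PySpark backend."
        raise NotImplementedError(msg)


ensure_castable("Int64")  # passes silently
try:
    ensure_castable("UInt8")
except NotImplementedError as exc:
    print(exc)
```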

pyspark.sql.types.StructField contains more than just name and dtype - is it worth updating narwhals.dtypes.Field to have these additional (optional) parameters to accommodate PySpark?

Polars should default to nullable, so we can keep that flag on. I'm not sure how to go about handling metadata.

Should we map pyspark.sql.types.ArrayType to narwhals.dtypes.List?

Yes I would say so, also with nullable flag!

Is there a way we could implement an Array type that aligns with Polars?

Not sure. We can either raise for now or validate that each element has the same length.
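
The second option could look roughly like the following, shown on plain Python lists rather than a Spark column. `common_width` is a hypothetical helper, and skipping null elements is an assumption rather than something decided in this thread:

```python
from typing import Optional, Sequence


def common_width(rows: Sequence[Optional[Sequence[object]]]) -> Optional[int]:
    """Return the shared length of all non-null elements, or raise if they differ.

    Returns None when there are no non-null elements to measure.
    """
    widths = {len(row) for row in rows if row is not None}
    if len(widths) > 1:
        msg = f"Elements have differing lengths: {sorted(widths)}"
        raise ValueError(msg)
    return widths.pop() if widths else None


print(common_width([[1, 2], [3, 4], None]))
```

On a real Spark DataFrame the same check would more likely be written as a column expression than as a Python loop.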

@EdAbati
Collaborator

EdAbati commented Jan 15, 2025

Hey @lucas-nelson-uiuc, thank you for making the issue. I pushed some old commits today about casting to basic types and I didn't realise you might also be working on this. Sorry for that 😕 The PR only covers basic types, so your work on ArrayType, Struct, etc. is very much welcome and needed. Also feel free to add feedback in the PR if I missed something.

@lucas-nelson-uiuc
Contributor Author

No problem ^ thanks for getting the PR started!


3 participants