[Enh]: Spark Expr missing methods #1714
Comments
Hey @FBruzzesi , Working on implementing scalar methods like Planning on working on the following methods - want to first check if my thought process is "correct".
Thinking of implementing two patterns for these methods:
# if predicate-based (e.g. drop_nulls, which uses predicate function `F.isnull`)
def method(self) -> Self:
    def _method(_input: Column) -> Column:
        from pyspark.sql import functions as F  # noqa: N812

        return F.explode(F.filter(F.array(_input), <predicate_func>))

    return self._from_call(_method, "method", returns_scalar=False)

# if not predicate-based (e.g. unique, which uses array function `F.array_distinct`)
def method(self) -> Self:
    def _method(_input: Column) -> Column:
        from pyspark.sql import functions as F  # noqa: N812

        return F.explode(<array_func>(F.array(_input)))

    return self._from_call(_method, "method", returns_scalar=False)

Not sure how expensive doing this is or if it collides with future API developments. Lmk what you think |
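For reference, a minimal pure-Python model of the two patterns above, to make the row-level semantics easier to reason about. This is not PySpark: `array`, `explode`, and `array_distinct` here are hypothetical stand-ins for `F.array`, `F.explode`, and `F.array_distinct`, written under the assumption that `F.array(col)` builds a one-element array per row and that `F.explode` emits no rows for an empty array.

```python
from typing import Callable, List, Optional

Value = Optional[int]


def array(value: Value) -> List[Value]:
    # Stand-in for F.array(col): each row becomes a one-element array.
    return [value]


def explode(arrays: List[List[Value]]) -> List[Value]:
    # Stand-in for F.explode: one output row per element, none for empty arrays.
    return [item for arr in arrays for item in arr]


def array_distinct(arr: List[Value]) -> List[Value]:
    # Stand-in for F.array_distinct: removes duplicates within a single array.
    return list(dict.fromkeys(arr))


def predicate_pattern(column: List[Value], keep: Callable[[Value], bool]) -> List[Value]:
    # Models F.explode(F.filter(F.array(col), keep)).
    return explode([[x for x in array(v) if keep(x)] for v in column])


def array_func_pattern(
    column: List[Value], array_func: Callable[[List[Value]], List[Value]]
) -> List[Value]:
    # Models F.explode(array_func(F.array(col))).
    return explode([array_func(array(v)) for v in column])


# Keeping only non-null values acts like a row filter.
dropped = predicate_pattern([1, None, 3], lambda x: x is not None)
# array_distinct only ever sees one element per row in this model.
deduped = array_func_pattern([1, 1, 2], array_distinct)
print(dropped)
print(deduped)
```

In this model the predicate-based pattern behaves like a row filter, while the array-function pattern leaves duplicates across rows in place, because each per-row array holds a single element. If real PySpark behaves the same way, `unique` would likely need something that works across rows rather than within one, which is worth confirming before relying on the second pattern.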
Thanks @lucas-nelson-uiuc for your efforts here! Can we leave the row-order-dependent ones out for now and make sure we've got everything done from the others first? There are some broader API decisions we need to make for those |
got a working version for the following - all supports the Polars examples and
|
Amazing stuff @lucas-nelson-uiuc! Looking forward to those as well!
if "pyspark" in str(constructor):
    request.applymarker(pytest.mark.xfail) |
FYI I am working on |
tried adding
lmk if I'm missing something |
Methods which are row order dependent are not included.
Methods within a namespace are included only if the namespace at least exists, otherwise it means that all namespace methods are missing.
Methods with one asterisk (*) change the length but don't aggregate - these are deprioritized for now
High priority:
Namespaces: