
[Enh]: Spark Expr missing methods #1714

Open
18 of 37 tasks
FBruzzesi opened this issue Jan 3, 2025 · 6 comments
Labels
enhancement (New feature or request), good first issue (Good for newcomers, but anyone is welcome to submit a pull request!), help wanted (Extra attention is needed), pyspark (Issue is related to pyspark backend)

Comments

@FBruzzesi
Member

FBruzzesi commented Jan 3, 2025

Methods that are row-order dependent are not included.
Methods within a namespace are listed individually only if the namespace itself already exists; otherwise, it means all of that namespace's methods are missing.

Methods marked with one asterisk (*) change the length but don't aggregate; these are deprioritized for now. A rough sketch of how a few of the high-priority methods map onto plain PySpark expressions follows the lists below.

High priority:

  • abs
  • all
  • any
  • clip
  • is_between
  • is_duplicated
  • is_finite
  • is_in
  • is_nan
  • is_unique
  • len
  • median
  • n_unique
  • null_count
  • over
  • round
  • skew
  • drop_nulls (*)
  • fill_null (if a strategy is provided, otherwise it is order dependent)
  • filter (*)
  • mode (*)
  • quantile
  • replace_strict
  • unique (*)

Namespaces:

  • name (**)
  • cat (**)
  • list (**)
  • dt (**)
    • to_string
    • total_microseconds
    • total_milliseconds
    • total_minutes
    • total_nanoseconds
    • total_seconds
  • str (**)
    • replace
    • to_datetime
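
Purely as an illustration for anyone picking these up (this is not the narwhals implementation, and the column name and literals are made up), a few of the high-priority methods map fairly directly onto plain PySpark column expressions:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F  # noqa: N812

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1.0,), (-5.0,), (None,)], ["a"])

    df.select(
        F.abs(F.col("a")).alias("abs"),                                          # abs
        F.col("a").between(-2, 2).alias("is_between"),                           # is_between, inclusive bounds
        F.least(F.greatest(F.col("a"), F.lit(-1.0)), F.lit(1.0)).alias("clip"),  # clip(-1.0, 1.0)
    ).show()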
FBruzzesi added the enhancement, help wanted, and good first issue labels on Jan 3, 2025
@lucas-nelson-uiuc
Contributor

lucas-nelson-uiuc commented Jan 4, 2025

Hey @FBruzzesi ,

Working on implementing scalar methods like any and all - should be ready to push later today.

Planning to work on the following methods; I want to first check whether my thought process is "correct".

  • arg_true
  • drop_nulls
  • filter
  • gather_every
  • sort
  • unique

Thinking of implementing two patterns for these methods:

# if predicate-based (e.g. drop_nulls, which uses predicate function `F.isnull`)
def method(self) -> Self:
    def _method(_input: Column) -> Column:
        from pyspark.sql import functions as F  # noqa: N812

        return F.explode(F.filter(F.array(_input), <predicate_func>))

    return self._from_call(_method, "method", returns_scalar=False)


# if not predicate-based (e.g. unique, which uses array function `F.array_distinct`)
def method(self) -> Self:
    def _method(_input: Column) -> Column:
        from pyspark.sql import functions as F  # noqa: N812

        return F.explode(<array_func>(F.array(_input)))

    return self._from_call(_method, "method", returns_scalar=False)

Not sure how expensive this is or whether it collides with future API developments - let me know what you think.
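
To make the first (predicate-based) pattern concrete, here is a standalone sketch of what it computes for drop_nulls, assuming a local SparkSession; note that the predicate presumably has to be negated (~F.isnull) so that nulls are actually dropped:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F  # noqa: N812

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1,), (None,), (3,)], ["a"])

    # wrap the column in a one-element array, drop null elements, then explode back to rows
    df.select(
        F.explode(F.filter(F.array(F.col("a")), lambda x: ~F.isnull(x))).alias("a")
    ).show()  # the row where "a" is null disappears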

@MarcoGorelli
Member

Thanks @lucas-nelson-uiuc for your efforts here.

Can we leave the row-order-dependent ones out for now and make sure we've got everything done from the others first? There are some broader API decisions we need to make for those.

@lucas-nelson-uiuc
Contributor

lucas-nelson-uiuc commented Jan 10, 2025

Got a working version for the following; all support the Polars examples and the expr_and_series tests:

  • filter
  • drop_nulls
  • replace_strict
  • fill_null (only strategy='zero' and strategy='one' seem like v1 additions)
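
For reference, a minimal standalone sketch (not necessarily what the PR does) of an order-independent fill_null, e.g. strategy='zero', in plain PySpark:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F  # noqa: N812

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(1.0,), (None,)], ["a"])

    # coalesce falls back to the literal wherever "a" is null
    df.select(F.coalesce(F.col("a"), F.lit(0.0)).alias("a")).show()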

@FBruzzesi
Member Author

Amazing stuff @lucas-nelson-uiuc! Looking forward to those as well!
Notice that we have now merged the pyspark tests into the main test suite; to run a test you will just need to remove the following snippet from the dedicated feature test:

    if "pyspark" in str(constructor):
        request.applymarker(pytest.mark.xfail)

@FBruzzesi
Member Author

FYI I am working on SparkLikeNamespace methods

@lucas-nelson-uiuc
Contributor

Tried adding is_nan in #1802 but noticed two things:

  • nw._spark_like.expr.cast is not yet fully developed - this causes the tests to fail
  • Spark handles division by zero by returning null instead of NaN - this also causes the test to fail
    • should the Spark implementation of is_nan check for both NaN and NULL?

Let me know if I'm missing something.
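
For what it's worth, here is a small standalone sketch of the two candidate behaviours (assuming a local SparkSession); whether narwhals should use the NULL-inclusive version is exactly the open question above:

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F  # noqa: N812

    spark = SparkSession.builder.getOrCreate()
    df = spark.createDataFrame([(float("nan"),), (None,), (1.0,)], ["a"])

    df.select(
        F.isnan(F.col("a")).alias("is_nan_strict"),                           # flags NaN only
        (F.isnan(F.col("a")) | F.col("a").isNull()).alias("is_nan_or_null"),  # flags NaN or NULL
    ).show()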
